Added Pawel's origin commit message into the osggpucull source as it examples a lot about how the example works
This commit is contained in:
@@ -13,6 +13,137 @@
|
||||
*
|
||||
*/
|
||||
|
||||
/** osggpucull example.
|
||||
|
||||
A geometry instancing rendering algorithm consisting of two consequent phases :
|
||||
|
||||
- first phase is a GLSL shader performing object culling and LOD picking ( a culling shader ).
|
||||
Every culled object is represented as GL_POINT in the input osg::Geometry.
|
||||
The output of the culling shader is a set of object LODs that need to be rendered.
|
||||
The output is stored in texture buffer objects. No pixel is drawn to the screen
|
||||
because GL_RASTERIZER_DISCARD mode is used.
|
||||
|
||||
- second phase draws osg::Geometry containing merged LODs using glDrawArraysIndirect()
|
||||
function. Information about quantity of instances to render, its positions and other
|
||||
parameters is sourced from texture buffer objects filled in the first phase.
|
||||
|
||||
The example uses various OpenGL 4.2 features such as texture buffer objects,
|
||||
atomic counters, image units and functions defined in GL_ARB_shader_image_load_store
|
||||
extension to achieve its goal and thus will not work on graphic cards with older OpenGL
|
||||
versions.
|
||||
|
||||
The example was tested on Linux and Windows with NVidia 570 and 580 cards.
|
||||
The tests on AMD cards were not conducted ( due to lack of it ).
|
||||
The tests were performed using OSG revision 14088.
|
||||
|
||||
The main advantages of this rendering method :
|
||||
- instanced rendering capable of drawing thousands of different objects with
|
||||
almost no CPU intervention ( cull and draw times are close to 0 ms ).
|
||||
- input objects may be sourced from any OSG graph ( for example - information about
|
||||
object points may be stored in a PagedLOD graph. This way we may cover the whole
|
||||
countries with trees, buildings and other objects ).
|
||||
Furthermore if we create osgDB plugins that generate data on the fly, we may
|
||||
generate information for every grass blade for that country.
|
||||
- every object may have its own parameters and thus may be distinct from other objects
|
||||
of the same type.
|
||||
- relatively low memory footprint ( single object information is stored in a few
|
||||
vertex attributes ).
|
||||
- no GPU->CPU roundtrip typical for such methods ( method uses atomic counters
|
||||
and glDrawArraysIndirect() function instead of OpenGL queries. This way
|
||||
information about quantity of rendered objects never goes back to CPU.
|
||||
The typical GPU->CPU roundtrip cost is about 2 ms ).
|
||||
- this example also shows how to render dynamic objects ( objects that may change
|
||||
its position ) with moving parts ( like car wheels or airplane propellers ) .
|
||||
The obvious extension to that dynamic method would be the animated crowd rendering.
|
||||
- rendered objects may be easily replaced ( there is no need to process the whole
|
||||
OSG graphs, because these graphs store only positional information ).
|
||||
|
||||
The main disadvantages of a method :
|
||||
- the maximum quantity of objects to render must be known beforehand
|
||||
( because texture buffer objects holding data between phases have constant size ).
|
||||
- OSG statistics are flawed ( they don't know anymore how many objects are drawn ).
|
||||
- osgUtil::Intersection does not work
|
||||
|
||||
Example application may be used to make some performance tests, so below you
|
||||
will find some extended parameter description :
|
||||
--skip-dynamic - skip rendering of dynamic objects if you only want to
|
||||
observe static object statistics
|
||||
--skip-static - the same for static objects
|
||||
--dynamic-area-size - size of the area for dynamic rendering. Default = 1000 meters
|
||||
( square 1000m x 1000m ). Along with density defines
|
||||
how many dynamic objects is there in the example.
|
||||
--static-area-size - the same for static objects. Default = 2000 meters
|
||||
( square 2000m x 2000m ).
|
||||
|
||||
Example application defines some parameters (density, LOD ranges, object's triangle count).
|
||||
You may manipulate its values using below described modifiers:
|
||||
--density-modifier - density modifier in percent. Default = 100%.
|
||||
Density ( along with LOD ranges ) defines maximum
|
||||
quantity of rendered objects. registerType() function
|
||||
accepts maximum density ( in objects per square kilometer )
|
||||
as its parameter.
|
||||
--lod-modifier - defines the LOD ranges. Default = 100%.
|
||||
--triangle-modifier - defines the number of triangles in finally rendered objects.
|
||||
Default = 100 %.
|
||||
--instances-per-cell - for static rendering the application builds OSG graph using
|
||||
InstanceCell class ( this class is a modified version of Cell class
|
||||
from osgforest example - it builds simple quadtree from a list
|
||||
of static instances ). This parameter defines maximum number
|
||||
of instances in a single osg::Group in quadtree.
|
||||
If, for example, you modify it to value=100, you will see
|
||||
really big cull time in OSG statistics ( because resulting
|
||||
tree generated by InstanceCell will be very deep ).
|
||||
Default value = 4096 .
|
||||
--export-objects - write object geometries and quadtree of instances to osgt files
|
||||
for later analysis.
|
||||
--use-multi-draw - use glMultiDrawArraysIndirect() instead of glDrawArraysIndirect() in a
|
||||
draw shader. Thanks to this we may render all ( different ) objects
|
||||
using only one draw call. Requires OpenGL version 4.3.
|
||||
|
||||
This application is inspired by Daniel Rákos work : "GPU based dynamic geometry LOD" that
|
||||
may be found under this address : http://rastergrid.com/blog/2010/10/gpu-based-dynamic-geometry-lod/
|
||||
There are however some differences :
|
||||
- Daniel Rákos uses GL queries to count objects to render, while this example
|
||||
uses atomic counters ( no GPU->CPU roundtrip )
|
||||
- this example does not use transform feedback buffers to store intermediate data
|
||||
( it uses texture buffer objects instead ).
|
||||
- I use only the vertex shader to cull objects, whereas Daniel Rákos uses vertex shader
|
||||
and geometry shader ( because only geometry shader can send more than one primitive
|
||||
to transform feedback buffers ).
|
||||
- objects in the example are drawn using glDrawArraysIndirect() function,
|
||||
instead of glDrawElementsInstanced().
|
||||
|
||||
Finally there are some things to consider/discuss :
|
||||
- the whole algorithm exploits nice OpenGL feature that any GL buffer
|
||||
may be bound as any type of buffer ( in our example a buffer is once bound
|
||||
as a texture buffer object, and later is bound as GL_DRAW_INDIRECT_BUFFER ).
|
||||
osg::TextureBuffer class has one handy method to do that trick ( bindBufferAs() ),
|
||||
and new primitive sets use osg::TextureBuffer as input.
|
||||
For now I added new primitive sets to example ( DrawArraysIndirect and
|
||||
MultiDrawArraysIndirect defined in examples/osggpucull/DrawIndirectPrimitiveSet.h ),
|
||||
but if Robert will accept its current implementations ( I mean - primitive
|
||||
sets that have osg::TextureBuffer in constructor ), I may add it to
|
||||
osg/include/PrimitiveSet header.
|
||||
- I used BufferTemplate class writen and published by Aurelien in submission forum
|
||||
some time ago. For some reason this class never got into osg/include, but is
|
||||
really needed during creation of UBOs, TBOs, and possibly SSBOs in the future.
|
||||
I added std::vector specialization to that template class.
|
||||
- I needed to create similar osg::Geometries with variable number of vertices
|
||||
( to create different LODs in my example ). For this reason I've written
|
||||
some code allowing me to create osg::Geometries from osg::Shape descendants.
|
||||
This code may be found in ShapeToGeometry.* files. Examples of use are in
|
||||
osggpucull.cpp . The question is : should this code stay in example, or should
|
||||
it be moved to osgUtil ?
|
||||
- this remark is important for NVidia cards on Linux and Windows : if
|
||||
you have "Sync to VBlank" turned ON in nvidia-settings and you want to see
|
||||
real GPU times in OSG statistics window, you must set the power management
|
||||
settings to "Prefer maximum performance", because when "Adaptive mode" is used,
|
||||
the graphic card's clock may be slowed down by the driver during program execution
|
||||
( On Linux when OpenGL application starts in adaptive mode, clock should work
|
||||
as fast as possible, but after one minute of program execution, the clock slows down ).
|
||||
This happens when GPU time in OSG statistics window is shorter than 3 ms.
|
||||
*/
|
||||
|
||||
#include <osg/Vec4i>
|
||||
#include <osg/Quat>
|
||||
#include <osg/Geometry>
|
||||
|
||||
Reference in New Issue
Block a user