From 74226e5904f5f7ea6a81999d977f55a3a7e52655 Mon Sep 17 00:00:00 2001
From: Robert Osfield
Date: Mon, 6 Nov 2017 20:49:23 +0000
Subject: [PATCH] Added Pawel's origin commit message into the osggpucull source as it examples a lot about how the example works

---
 examples/osggpucull/osggpucull.cpp | 131 +++++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)

diff --git a/examples/osggpucull/osggpucull.cpp b/examples/osggpucull/osggpucull.cpp
index 4b87433b6..7b8bd8a1d 100644
--- a/examples/osggpucull/osggpucull.cpp
+++ b/examples/osggpucull/osggpucull.cpp
@@ -13,6 +13,137 @@
  *
  */
 
+ /** osggpucull example.
+
+  A geometry instancing rendering algorithm consisting of two consecutive phases :
+
+  - first phase is a GLSL shader performing object culling and LOD picking ( a culling shader ).
+    Every object submitted for culling is represented as a GL_POINT in the input osg::Geometry.
+    The output of the culling shader is a set of object LODs that need to be rendered.
+    The output is stored in texture buffer objects. No pixel is drawn to the screen
+    because GL_RASTERIZER_DISCARD mode is used.
+
+  - second phase draws osg::Geometry containing merged LODs using glDrawArraysIndirect()
+    function. Information about quantity of instances to render, their positions and other
+    parameters is sourced from texture buffer objects filled in the first phase.
+
+  The example uses various OpenGL 4.2 features such as texture buffer objects,
+  atomic counters, image units and functions defined in GL_ARB_shader_image_load_store
+  extension to achieve its goal and thus will not work on graphics cards with older OpenGL
+  versions.
+
+  The example was tested on Linux and Windows with NVidia 570 and 580 cards.
+  No tests were conducted on AMD cards ( none were available ).
+  The tests were performed using OSG revision 14088.
+
+  The main advantages of this rendering method :
+  - instanced rendering capable of drawing thousands of different objects with
+    almost no CPU intervention ( cull and draw times are close to 0 ms ).
+  - input objects may be sourced from any OSG graph ( for example - information about
+    object points may be stored in a PagedLOD graph. This way we may cover
+    whole countries with trees, buildings and other objects ).
+    Furthermore if we create osgDB plugins that generate data on the fly, we may
+    generate information for every grass blade in such a country.
+  - every object may have its own parameters and thus may be distinct from other objects
+    of the same type.
+  - relatively low memory footprint ( single object information is stored in a few
+    vertex attributes ).
+  - no GPU->CPU roundtrip typical for such methods ( method uses atomic counters
+    and glDrawArraysIndirect() function instead of OpenGL queries. This way
+    information about quantity of rendered objects never goes back to CPU.
+    The typical GPU->CPU roundtrip cost is about 2 ms ).
+  - this example also shows how to render dynamic objects ( objects that may change
+    their position ) with moving parts ( like car wheels or airplane propellers ).
+    The obvious extension to that dynamic method would be animated crowd rendering.
+  - rendered objects may be easily replaced ( there is no need to process the whole
+    OSG graphs, because these graphs store only positional information ).
+
+  The main disadvantages of the method :
+  - the maximum quantity of objects to render must be known beforehand
+    ( because texture buffer objects holding data between phases have constant size ).
+  - OSG statistics are flawed ( they no longer know how many objects are drawn ).
+  - osgUtil::Intersection does not work
+
+  Example application may be used to make some performance tests, so below you
+  will find some extended parameter description :
+  --skip-dynamic       - skip rendering of dynamic objects if you only want to
+                         observe static object statistics
+  --skip-static        - the same for static objects
+  --dynamic-area-size  - size of the area for dynamic rendering. Default = 1000 meters
+                         ( square 1000m x 1000m ). Along with density defines
+                         how many dynamic objects there are in the example.
+  --static-area-size   - the same for static objects. Default = 2000 meters
+                         ( square 2000m x 2000m ).
+
+  Example application defines some parameters (density, LOD ranges, object's triangle count).
+  You may manipulate their values using the modifiers described below:
+  --density-modifier   - density modifier in percent. Default = 100%.
+                         Density ( along with LOD ranges ) defines maximum
+                         quantity of rendered objects. registerType() function
+                         accepts maximum density ( in objects per square kilometer )
+                         as its parameter.
+  --lod-modifier       - defines the LOD ranges. Default = 100%.
+  --triangle-modifier  - defines the number of triangles in finally rendered objects.
+                         Default = 100%.
+  --instances-per-cell - for static rendering the application builds OSG graph using
+                         InstanceCell class ( this class is a modified version of Cell class
+                         from osgforest example - it builds simple quadtree from a list
+                         of static instances ). This parameter defines maximum number
+                         of instances in a single osg::Group in quadtree.
+                         If, for example, you modify it to value=100, you will see
+                         really big cull time in OSG statistics ( because resulting
+                         tree generated by InstanceCell will be very deep ).
+                         Default value = 4096.
+  --export-objects     - write object geometries and quadtree of instances to osgt files
+                         for later analysis.
+  --use-multi-draw     - use glMultiDrawArraysIndirect() instead of glDrawArraysIndirect() in a
+                         draw shader. Thanks to this we may render all ( different ) objects
+                         using only one draw call. Requires OpenGL version 4.3.
+
+  This application is inspired by Daniel Rákos' work : "GPU based dynamic geometry LOD" that
+  may be found at this address : http://rastergrid.com/blog/2010/10/gpu-based-dynamic-geometry-lod/
+  There are however some differences :
+  - Daniel Rákos uses GL queries to count objects to render, while this example
+    uses atomic counters ( no GPU->CPU roundtrip )
+  - this example does not use transform feedback buffers to store intermediate data
+    ( it uses texture buffer objects instead ).
+  - I use only the vertex shader to cull objects, whereas Daniel Rákos uses vertex shader
+    and geometry shader ( because only geometry shader can send more than one primitive
+    to transform feedback buffers ).
+  - objects in the example are drawn using glDrawArraysIndirect() function,
+    instead of glDrawElementsInstanced().
+
+  Finally there are some things to consider/discuss :
+  - the whole algorithm exploits a nice OpenGL feature that any GL buffer
+    may be bound as any type of buffer ( in our example a buffer is once bound
+    as a texture buffer object, and later is bound as GL_DRAW_INDIRECT_BUFFER ).
+    osg::TextureBuffer class has one handy method to do that trick ( bindBufferAs() ),
+    and new primitive sets use osg::TextureBuffer as input.
+    For now I added new primitive sets to example ( DrawArraysIndirect and
+    MultiDrawArraysIndirect defined in examples/osggpucull/DrawIndirectPrimitiveSet.h ),
+    but if Robert accepts their current implementation ( I mean - primitive
+    sets that have osg::TextureBuffer in constructor ), I may add them to
+    osg/include/PrimitiveSet header.
+  - I used BufferTemplate class written and published by Aurelien in submission forum
+    some time ago. For some reason this class never got into osg/include, but is
+    really needed during creation of UBOs, TBOs, and possibly SSBOs in the future.
+    I added std::vector specialization to that template class.
+  - I needed to create similar osg::Geometries with variable number of vertices
+    ( to create different LODs in my example ). For this reason I've written
+    some code allowing me to create osg::Geometries from osg::Shape descendants.
+    This code may be found in ShapeToGeometry.* files. Examples of use are in
+    osggpucull.cpp. The question is : should this code stay in example, or should
+    it be moved to osgUtil ?
+  - this remark is important for NVidia cards on Linux and Windows : if
+    you have "Sync to VBlank" turned ON in nvidia-settings and you want to see
+    real GPU times in OSG statistics window, you must set the power management
+    settings to "Prefer maximum performance", because when "Adaptive mode" is used,
+    the graphics card's clock may be slowed down by the driver during program execution
+    ( On Linux when OpenGL application starts in adaptive mode, clock should work
+    as fast as possible, but after one minute of program execution, the clock slows down ).
+    This happens when GPU time in OSG statistics window is shorter than 3 ms.
+*/
+
 #include
 #include
 #include
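
Reviewer's note: the two-phase flow described in the comment above can be pictured with a small CPU-side analogy. This is only a sketch under stated assumptions, not the example's shader code. The four-field command layout is the one the OpenGL specification defines for glDrawArraysIndirect(); everything else (`Instance`, `Lod`, `CullResult`, `cullAndPickLod`) is a hypothetical name invented for illustration. `std::atomic` stands in for the GLSL atomic counters, LOD picking is reduced to a plain distance test with no frustum culling, and the fixed `maxInstances` capacity mirrors the constant-size texture buffer objects mentioned among the disadvantages.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// One glDrawArraysIndirect() command, laid out as the OpenGL specification defines it.
struct DrawArraysIndirectCommand
{
    std::uint32_t count;         // vertices per instance
    std::uint32_t instanceCount; // written by the culling phase
    std::uint32_t first;         // first vertex of this LOD in the merged geometry
    std::uint32_t baseInstance;  // used here as the LOD's offset in the instance buffer
};

// Hypothetical per-object record (the real example stores more parameters).
struct Instance { float x, y, z; };

// Hypothetical LOD description: selected when minDist <= distance < maxDist.
struct Lod
{
    float minDist, maxDist;
    std::uint32_t first, count;
    std::uint32_t maxInstances;  // capacity must be fixed before culling starts
};

struct CullResult
{
    std::vector<DrawArraysIndirectCommand> commands; // one command per LOD
    std::vector<Instance> instances;                 // fixed-size storage, all LODs
};

// Phase 1 analogy: every input point is tested and appended to the LOD it falls in.
CullResult cullAndPickLod(const std::vector<Instance>& points,
                          const std::vector<Lod>& lods,
                          float eyeX, float eyeY, float eyeZ)
{
    CullResult result;
    std::vector<std::atomic<std::uint32_t>> counters(lods.size());
    std::uint32_t totalSlots = 0;
    for (std::size_t i = 0; i < lods.size(); ++i)
    {
        assert(lods[i].minDist <= lods[i].maxDist);
        counters[i].store(0);
        result.commands.push_back({lods[i].count, 0u, lods[i].first, totalSlots});
        totalSlots += lods[i].maxInstances;
    }
    result.instances.resize(totalSlots);

    for (const Instance& p : points)
    {
        const float dx = p.x - eyeX, dy = p.y - eyeY, dz = p.z - eyeZ;
        const float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
        for (std::size_t i = 0; i < lods.size(); ++i)
        {
            if (dist < lods[i].minDist || dist >= lods[i].maxDist) continue;
            // Reserve a slot the way an atomic counter increment would.
            const std::uint32_t slot = counters[i].fetch_add(1);
            if (slot < lods[i].maxInstances) // drop objects that exceed the capacity
                result.instances[result.commands[i].baseInstance + slot] = p;
        }
    }

    // Phase 2 would read these counts directly from the buffer; no CPU readback is needed.
    for (std::size_t i = 0; i < lods.size(); ++i)
        result.commands[i].instanceCount =
            std::min(counters[i].load(), lods[i].maxInstances);
    return result;
}
```

In the real example the command array lives in a GL buffer that is first written through a texture buffer object and later bound as GL_DRAW_INDIRECT_BUFFER, so the instance counts never travel back to the CPU.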