From 74226e5904f5f7ea6a81999d977f55a3a7e52655 Mon Sep 17 00:00:00 2001
From: Robert Osfield <robert@openscenegraph.com>
Date: Mon, 6 Nov 2017 20:49:23 +0000
Subject: [PATCH] Added Pawel's origin commit message into the osggpucull
 source as it examples a lot about how the example works

---
 examples/osggpucull/osggpucull.cpp | 131 +++++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)

diff --git a/examples/osggpucull/osggpucull.cpp b/examples/osggpucull/osggpucull.cpp
index 4b87433b6..7b8bd8a1d 100644
--- a/examples/osggpucull/osggpucull.cpp
+++ b/examples/osggpucull/osggpucull.cpp
@@ -13,6 +13,137 @@
  *
 */
 
+ /** osggpucull example.
+
+  A geometry instancing rendering algorithm consisting of two consequent phases :
+
+    - first phase is a GLSL shader performing object culling and LOD picking ( a culling shader ).
+      Every culled object is represented as GL_POINT in the input osg::Geometry.
+      The output of the culling shader is a set of object LODs that need to be rendered.
+      The output is stored in texture buffer objects. No pixel is drawn to the screen
+      because GL_RASTERIZER_DISCARD mode is used.
+
+    - second phase draws osg::Geometry containing merged LODs using glDrawArraysIndirect()
+      function. Information about quantity of instances to render, its positions and other
+      parameters is sourced from texture buffer objects filled in the first phase.
+
+    The example uses various OpenGL 4.2 features such as texture buffer objects,
+    atomic counters, image units and functions defined in GL_ARB_shader_image_load_store
+    extension to achieve its goal and thus will not work on graphic cards with older OpenGL
+    versions.
+
+    The example was tested on Linux and Windows with NVidia 570 and 580 cards.
+    The tests on AMD cards were not conducted ( due to lack of it ).
+    The tests were performed using OSG revision 14088.
+
+    The main advantages of this rendering method :
+    - instanced rendering capable of drawing thousands of different objects with
+      almost no CPU intervention  ( cull and draw times are close to 0 ms ).
+    - input objects may be sourced from any OSG graph ( for example - information about
+      object points may be stored in a PagedLOD graph. This way we may cover the whole
+      countries with trees, buildings and other objects ).
+      Furthermore if we create osgDB plugins that generate data on the fly, we may
+      generate information for every grass blade for that country.
+    - every object may have its own parameters and thus may be distinct from other objects
+      of the same type.
+    - relatively low memory footprint ( single object information is stored in a few
+      vertex attributes ).
+    - no GPU->CPU roundtrip typical for such methods ( method uses atomic counters
+      and glDrawArraysIndirect() function instead of OpenGL queries. This way
+      information about quantity of rendered objects never goes back to CPU.
+      The typical GPU->CPU roundtrip cost is about 2 ms ).
+    - this example also shows how to render dynamic objects ( objects that may change
+      its position ) with moving parts ( like car wheels or airplane propellers ) .
+      The obvious extension to that dynamic method would be the animated crowd rendering.
+    - rendered objects may be easily replaced ( there is no need to process the whole
+      OSG graphs, because these graphs store only positional information ).
+
+    The main disadvantages of a method :
+    - the maximum quantity of objects to render must be known beforehand
+      ( because texture buffer objects holding data between phases have constant size ).
+    - OSG statistics are flawed ( they don't know anymore how many objects are drawn ).
+    - osgUtil::Intersection does not work
+
+    Example application may be used to make some performance tests, so below you
+    will find some extended parameter description :
+    --skip-dynamic       - skip rendering of dynamic objects if you only want to
+                           observe static object statistics
+    --skip-static        - the same for static objects
+    --dynamic-area-size  - size of the area for dynamic rendering. Default = 1000 meters
+                           ( square 1000m x 1000m ). Along with density defines
+                           how many dynamic objects is there in the example.
+    --static-area-size   - the same for static objects. Default = 2000 meters
+                           ( square 2000m x 2000m ).
+
+    Example application defines some parameters (density, LOD ranges, object's triangle count).
+    You may manipulate its values using below described modifiers:
+    --density-modifier   - density modifier in percent. Default = 100%.
+                           Density ( along with LOD ranges ) defines maximum
+                           quantity of rendered objects. registerType() function
+                           accepts maximum density ( in objects per square kilometer )
+                           as its parameter.
+    --lod-modifier       - defines the LOD ranges. Default = 100%.
+    --triangle-modifier  - defines the number of triangles in finally rendered objects.
+                           Default = 100 %.
+    --instances-per-cell - for static rendering the application builds OSG graph using
+                           InstanceCell class ( this class is a modified version of Cell class
+                           from osgforest example - it builds simple quadtree from a list
+                           of static instances ). This parameter defines maximum number
+                           of instances in a single osg::Group in quadtree.
+                           If, for example, you modify it to value=100, you will see
+                           really big cull time in OSG statistics ( because resulting
+                           tree generated by InstanceCell will be very deep ).
+                           Default value = 4096 .
+    --export-objects     - write object geometries and quadtree of instances to osgt files
+                           for later analysis.
+    --use-multi-draw     - use glMultiDrawArraysIndirect() instead of glDrawArraysIndirect() in a
+                           draw shader. Thanks to this we may render all ( different ) objects
+                           using only one draw call. Requires OpenGL version 4.3.
+
+    This application is inspired by Daniel Rákos work : "GPU based dynamic geometry LOD" that
+    may be found under this address : http://rastergrid.com/blog/2010/10/gpu-based-dynamic-geometry-lod/
+    There are however some differences :
+    - Daniel Rákos uses GL queries to count objects to render, while this example
+      uses atomic counters ( no GPU->CPU roundtrip )
+    - this example does not use transform feedback buffers to store intermediate data
+      ( it uses texture buffer objects instead ).
+    - I use only the vertex shader to cull objects, whereas Daniel Rákos uses vertex shader
+      and geometry shader ( because only geometry shader can send more than one primitive
+      to transform feedback buffers ).
+    - objects in the example are drawn using glDrawArraysIndirect() function,
+      instead of glDrawElementsInstanced().
+
+    Finally there are some things to consider/discuss  :
+    - the whole algorithm exploits nice OpenGL feature that any GL buffer
+      may be bound as any type of buffer ( in our example a buffer is once bound
+      as a texture buffer object, and later is bound as GL_DRAW_INDIRECT_BUFFER ).
+      osg::TextureBuffer class has one handy method to do that trick ( bindBufferAs() ),
+      and new primitive sets use osg::TextureBuffer as input.
+      For now I added new primitive sets to example ( DrawArraysIndirect and
+      MultiDrawArraysIndirect defined in examples/osggpucull/DrawIndirectPrimitiveSet.h ),
+      but if Robert will accept its current implementations ( I mean - primitive
+      sets that have osg::TextureBuffer in constructor ), I may add it to
+      osg/include/PrimitiveSet header.
+    - I used BufferTemplate class writen and published by Aurelien in submission forum
+      some time ago. For some reason this class never got into osg/include, but is
+      really needed during creation of UBOs, TBOs, and possibly SSBOs in the future.
+      I added std::vector specialization to that template class.
+    - I needed to create similar osg::Geometries with variable number of vertices
+      ( to create different LODs in my example ). For this reason I've written
+      some code allowing me to create osg::Geometries from osg::Shape descendants.
+      This code may be found in ShapeToGeometry.* files. Examples of use are in
+      osggpucull.cpp . The question is : should this code stay in example, or should
+      it be moved to osgUtil ?
+    - this remark is important for NVidia cards on Linux and Windows : if
+      you have "Sync to VBlank" turned ON in nvidia-settings and you want to see
+      real GPU times in OSG statistics window, you must set the power management
+      settings to "Prefer maximum performance", because when "Adaptive mode" is used,
+      the graphic card's clock may be slowed down by the driver during program execution
+      ( On Linux when OpenGL application starts in adaptive mode, clock should work
+      as fast as possible, but after one minute of program execution, the clock slows down ).
+      This happens when GPU time in OSG statistics window is shorter than 3 ms.
+*/
+
 #include <osg/Vec4i>
 #include <osg/Quat>
 #include <osg/Geometry>