Detector Geometry in Opticks

Detector Geometry in Opticks :
GPU Optical Simulation with NVIDIA® OptiX™

Open source, https://bitbucket.org/simoncblyth/opticks

Simon C Blyth, IHEP, CAS — Compute Accelerator Forum Meeting, March 10 2021, Virtual


Outline

/env/presentation/newtons-opticks.png
 


Outline Extras

/env/presentation/newtons-opticks.png
 

JUNO Optical Photon Simulation Problem...









NVIDIA Marbles At Night RTX Demo


NVIDIA Marbles At Night RTX Demo 2



Ampere : 2nd Generation RTX

NVIDIA:
"...triple double over Turing..."

GPU Ray Tracing (RT) APIs Converging

Three Similar Interfaces over same RTX tech:

NVIDIA OptiX (Linux, Windows) [2009]

Vulkan RT (Linux, Windows) [final spec 2020]

Microsoft DXR : DirectX 12 Ray Tracing (Windows) [2018]



Metal Ray Tracing API (macOS) [introduced 2020[1]]

[1] https://developer.apple.com/videos/play/wwdc2020/10012/


RTX Execution Pipeline : Common to DirectX RT, Vulkan NV RT, OptiX

Acceleration Structure (AS) traversal is central to pipeline performance

/env/presentation/nvidia/rtx_api_execution_pipeline_half.png

"The RTX Shader Binding Table (SBT) Three Ways", Will Usher

RG : Ray Generation

IS : Intersect

CH : Closest Hit

AH : Any Hit

MS : Miss

GPU Opticks

RG
Cerenkov
Scintillation
"bounce" loop
IS
primitives, CSG
CH
IS->RG

Spatial Index Acceleration Structure













Two-Level Hierarchy : Instance transforms (TLAS) over Geometry (BLAS)

OptiX supports multiple instance levels : IAS->IAS->GAS BUT: Simple two-level is faster : works in hardware RT Cores

AS
Acceleration Structure
TLAS (IAS)
4x4 transforms, refs to BLAS
BLAS (GAS)
triangles : vertices, indices
custom primitives : AABB
AABB
axis-aligned bounding box

SBT : Shader Binding Table

Flexibly binds together:

  1. geometry objects
  2. shader programs
  3. data for shader programs

Hidden in OptiX 1-6 APIs


Geant4OpticksWorkflow


NPY Serialization : Fundamental to Opticks Geometry Model

Separate address space -> cudaMemcpy
upload/download : host(CPU)<->device(GPU)

Array-oriented : separate data from compute

Opticks/NPY pkg : Array Interface Using glm::mat4 glm::vec4

Opticks/GGeo classes implemented with NPY arrays

[1] http://www.numpy.org/neps/nep-0001-npy-format.html


Persist NumPy Arrays to .npy Files from Python, Examine File Format

IPython persisting NumPy arrays:

In [1]: a = np.arange(10)           # array of 10 long (int64) 
In [2]: a
Out[2]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [3]: np.save("/tmp/a.npy", a )   # serialize array to file 

In [4]: a2 = np.load("/tmp/a.npy")  # load array into memory 
In [5]: assert np.all( a == a2 )    # check all elements the same 

IPython examine NPY file format:

In [6]: !xxd /tmp/a.npy             # xxd hexdump file contents 
                                    # minimal header : type and shape 

00000000: 934e 554d 5059 0100 7600 7b27 6465 7363  .NUMPY..v.{'desc
00000010: 7227 3a20 273c 6938 272c 2027 666f 7274  r': '<i8', 'fort
00000020: 7261 6e5f 6f72 6465 7227 3a20 4661 6c73  ran_order': Fals
00000030: 652c 2027 7368 6170 6527 3a20 2831 302c  e, 'shape': (10,
00000040: 292c 207d 2020 2020 2020 2020 2020 2020  ), }
00000050: 2020 2020 2020 2020 2020 2020 2020 2020
00000060: 2020 2020 2020 2020 2020 2020 2020 2020
00000070: 2020 2020 2020 2020 2020 2020 2020 200a                 .
00000080: 0000 0000 0000 0000 0100 0000 0000 0000  ................
00000090: 0200 0000 0000 0000 0300 0000 0000 0000  ................
..

Translation 1st Step : Geant4 -> Opticks/GGeo : 1->1 conversions

Structural volumes : G4PVPlacement ->

GVolume
JUNO: tree of ~300,000 GVolume

Solid shapes : G4VSolid ->

GMesh (collected into GMeshLib)
arrays: vertices, indices
ref to NCSG
NCSG
tree of NNode (CSG constituents)

Material/surface properties as function of wavelength

Translation steered by X4 package

https://bitbucket.org/simoncblyth/opticks/src/master/extg4/X4PhysicalVolume.hh


Translation 2nd Step : Opticks/GGeo Instancing : "Factorizes" Geometry

Structural volumes vs solid shapes
distinction for convenience only, distinction is movable

JUNO: ~300,000 GVolume : mostly small repeated groups (PMTs)

GGeo/GInstancer

  1. GVolume progeny digest : shapes+transforms -> subtree ident.
  2. find repeated digests, disqualifying repeats inside others
  3. label all nodes with repeat index, non-repeated remainder : 0

For each repeat+remainder create GMergedMesh:

GMergedMesh -> IAS+GAS

https://bitbucket.org/simoncblyth/opticks/src/master/ggeo/GInstancer.hh


Optimizing Geometry : Split BLAS to avoid overlapping bbox

/env/presentation/nvidia/optimize/split_blas_half.png

Optimization : deciding where to draw lines between:

  1. structure and solid (IAS and GAS)
  2. solids within GAS (bbox choice to minimize traversal intersection tests)

Where those lines are drawn defines the AS

https://developer.nvidia.com/blog/best-practices-using-nvidia-rtx-ray-tracing/


Optimizing Geometry : Merge BLAS when lots of overlaps

/env/presentation/nvidia/optimize/merge_blas_half.png

https://developer.nvidia.com/blog/best-practices-using-nvidia-rtx-ray-tracing/


Ray Intersection with Transformed Object -> Geometry Instancing

/env/presentation/instancing/ray_intersection_in_two_spaces_p308_shirley_ch13_half.png

Fig 13.5 "Realistic Ray Tracing", Peter Shirley

Advantages apply equally to acceleration structures

Equivalent Intersects -> same t

  1. ray with ellipsoid : M*p
  2. M-1 ray with sphere : p

Local Frame Advantages

  1. simpler intersect (sphere vs ellipsoid)
  2. closer to origin -> better precision

Geometry Instancing Advantages

Requirements


OpenGL Instancing 2


OptiX Instancing


G4VSolid -> CUDA Intersect Functions for ~10 Primitives

/env/presentation/tboolean_parade_sep2017.png

Sphere, Cylinder, Disc, Cone, Convex Polyhedron, Hyperboloid, Torus, ...


intersect_analytic.cu : for all shapes/concatenated-shapes/primitives

OptiXRap/cu/intersect_analytic.cu

425 RT_PROGRAM void intersect(int primIdx)
426 {
427     const Prim& prim    = primBuffer[primIdx];
428
429     unsigned partOffset  = prim.partOffset() ;
430     unsigned numParts    = prim.numParts() ;
431     unsigned primFlag    = prim.primFlag() ;
432
433     if(primFlag == CSG_FLAGNODETREE)
434     {
435         evaluative_csg( prim, primIdx );
436     }
474 }
229 RT_PROGRAM void bounds (int primIdx, float result[6])
230 {
251     optix::Aabb* aabb = (optix::Aabb*)result;
252     *aabb = optix::Aabb();
253
254     uint4 identity =
        identityBuffer[instance_index*primitive_count+primIdx] ;
...
271     const Prim& prim    = primBuffer[primIdx];
...
294     if(primFlag == CSG_FLAGNODETREE || primFlag == CSG_FLAGINVISIBLE )
295     {
301         csg_bounds_prim(primIdx, prim, aabb);
318     }
385 }

https://bitbucket.org/simoncblyth/opticks/src/master/optixrap/cu/intersect_analytic.cu


G4Boolean -> CUDA/OptiX Intersection Program Implementing CSG

Complete Binary Tree, pick between pairs of nearest intersects:

UNION tA < tB Enter B Exit B Miss B
Enter A ReturnA LoopA ReturnA
Exit A ReturnA ReturnB ReturnA
Miss A ReturnB ReturnB ReturnMiss
[1] Ray Tracing CSG Objects Using Single Hit Intersections, Andrew Kensler (2006)
with corrections by author of XRT Raytracer http://xrt.wikidot.com/doc:csg
[2] https://bitbucket.org/simoncblyth/opticks/src/tip/optixrap/cu/csg_intersect_boolean.h
Similar to binary expression tree evaluation using postorder traverse.

Opticks : Translates G4 Geometry to GPU, Without Approximation

G4 Structure Tree -> Instance+Global Arrays -> OptiX

Group structure into repeated instances + global remainder:

instancing -> huge memory savings for JUNO PMTs



        
        

j1808_top_rtx


j1808_top_ogl


Opticks Analytic Daya Bay Near Site, GPU Raytrace (3)


Opticks Analytic Daya Bay Near Site, GPU Raytrace (1)


scan-pf-1_Opticks_vs_Geant4 2





JUNO analytic, 400M photons from center Speedup
Geant4 Extrap. 95,600 s (26 hrs)  
Opticks RTX ON (i) 58 s 1650x

Useful Speedup > 1500x : But Why Not Giga Rays/s ? (1 Photon ~10 Rays)













OptiX Performance Tools and Tricks, David Hart, NVIDIA https://developer.nvidia.com/siggraph/2019/video/sig915-vid


Summary : Opticks Detector Geometry

/env/presentation/1px.png

Opticks : state-of-the-art GPU ray tracing applied to optical photon simulation and integrated with Geant4, giving a leap in performance that eliminates memory and time bottlenecks.

/env/presentation/1px.png
  • Drastic speedup -> better detector understanding -> greater precision
    • any simulation limited by optical photons can benefit
    • more photon limited -> more overall speedup (99% -> 100x)
/env/presentation/1px.png
https://bitbucket.org/simoncblyth/opticks code repository
https://simoncblyth.bitbucket.io presentations and videos
https://groups.io/g/opticks forum/mailing list archive
email:opticks+subscribe@groups.io subscribe to mailing list

Constructive Solid Geometry (CSG) : Shapes defined "by construction"

Simple by construction definition, implicit geometry.

CSG expressions

3D Parametric Ray : ray(t) = r0 + t rDir

Ray Geometry Intersection

How to pick exactly ?


CSG : Which primitive intersect to pick ?

Classical Roth diagram approach

Computational requirements:

BUT : High performance on GPU requires:

Classical approach not appropriate on GPU


CSG Complete Binary Tree Serialization -> simplifies GPU side

Geant4 solid -> CSG binary tree (leaf primitives, non-leaf operators, 4x4 transforms on any node)

Serialize to complete binary tree buffer:

Height 3 complete binary tree with level order indices:

                                                   depth     elevation

                     1                               0           3

          10                   11                    1           2

     100       101        110        111             2           1

 1000 1001  1010 1011  1100 1101  1110  1111         3           0

postorder_next(i,elevation) = i & 1 ? i >> 1 : (i << elevation) + (1 << elevation) ; // from pattern of bits

Postorder tree traverse visits all nodes, starting from leftmost, such that children are visited prior to their parents.


Evaluative CSG intersection Pseudocode : recursion emulated

fullTree = PACK( 1 << height, 1 >> 1 )  // leftmost, parent_of_root(=0)
tranche.push(fullTree, ray.tmin)

while (!tranche.empty)         // stack of begin/end indices 
{
    begin, end, tmin <- tranche.pop  ; node <- begin ;
    while( node != end )                   // over tranche of postorder traversal 
    {
        elevation = height - TREE_DEPTH(node) ;
        if(is_primitive(node)){ isect <- intersect_primitive(node, tmin) ;  csg.push(isect) }
        else{
            i_left, i_right = csg.pop, csg.pop            // csg stack of intersect normals, t 
            l_state = CLASSIFY(i_left, ray.direction, tmin)
            r_state = CLASSIFY(i_right, ray.direction, tmin)
            action = LUT(operator(node), leftIsCloser)(l_state, r_state)

            if(      action is ReturnLeft/Right)     csg.push(i_left or i_right)
            else if( action is LoopLeft/Right)
            {
                left = 2*node ; right = 2*node + 1 ;
                endTranche = PACK( node,  end );
                leftTranche = PACK(  left << (elevation-1), right << (elevation-1) )
                rightTranche = PACK(  right << (elevation-1),  node  )
                loopTranche = action ? leftTranche : rightTranche

                tranche.push(endTranche, tmin)
                tranche.push(loopTranche, tminAdvanced )  // subtree re-traversal with changed tmin 
                break ; // to next tranche
            }
        }
        node <- postorder_next(node, elevation)         // bit twiddling postorder 
    }
}
isect = csg.pop();         // winning intersect  

https://bitbucket.org/simoncblyth/opticks/src/tip/optixrap/cu/csg_intersect_boolean.h


CSG Deep Tree : JUNO "fastener"

/env/presentation/x016_deeptree.png

CSG Deep Tree : height 11 before balancing, too deep for GPU raytrace

 NTreeAnalyse height 11 count 25    ( un : union,  cy : cylinder, di : difference )
                                                                                       un

                                                                               un              di

                                                                       un          cy      cy      cy

                                                               un          cy

                                                       un          cy

                                               un          cy

                                       un          cy

                               un          cy

                       un          cy

               un          cy

       di          cy

   cy      cy

CSG trees are non-unique


CSG Deep Tree : Positivize tree using De Morgan's laws

1st step to allow balancing : Positivize : remove CSG difference di operators

                                                     ...    ...

                                               un          cy

                                       un          cy

                               un          cy

                       un          cy

               un          cy

       di          cy

   cy      cy

                                                     ...    ...

                                               un          cy

                                       un          cy

                               un          cy

                       un          cy

               un          cy

       in          cy

   cy      !cy


CSG Deep Tree : height 4 after balancing, OK for GPU raytrace

NTreeAnalyse height 4 count 25
                                                               un

                               un                                                      un

               un                              un                      un                      in

       un              un              un              un          cy          in          cy     !cy

   cy      cy      cy      cy      cy      cy      cy      cy              cy     !cy

un : union, in : intersect, cy : cylinder, !cy : complemented cylinder

Balancing positive tree:

  1. classify tree operators and their placement
    • mono-operator trees can easily be rearranged as union un and intersection in operators are commutative
    • mono-operator above bileaf level can also easily be rearranged as the bileaf can be split off and combined
  2. create complete binary tree of appropriate size filled with placeholders
  3. populate the tree replacing placeholders
  4. prune (pull primitives up to avoid placeholder pairings)

Not a general balancer : but succeeds with all CSG solid trees from Daya Bay and JUNO so far

https://bitbucket.org/simoncblyth/opticks/src/default/npy/NTreeBalance.cpp


CSG Examples


Torus : much more difficult/expensive than other primitives

3D parametric ray : ray(x,y,z;t) = rayOrigin + t * rayDirection

High order equation

Best Solution : replace torus


Torus : different artifacts as change implementation/params/viewpoint

/env/presentation/torus_cloud_artifact_2017_08_14.png /env/presentation/torus_cuts_artifact_2017_08_08.png /env/presentation/torus_fan_artifact_2017_07_28.png