Opticks : GPU Optical Photon Simulation using NVIDIA OptiX 7 and NVIDIA CUDA


Open source, https://bitbucket.org/simoncblyth/opticks

Simon C Blyth, IHEP, CAS — (Presented by Dr Tao Lin, IHEP, CAS) — CHEP, Norfolk VA, 8 May 2023


Outline



JUNO_Intro_2


Optical Photon Simulation Problem...


Optical Photon Simulation ≈ Ray Traced Image Rendering

simulation : photon parameters at sensors (PMTs)
rendering : pixel values at image plane

Much in common : geometry, light sources, optical physics

Many Applications of ray tracing :


NVIDIA Ada : 3rd Generation RTX


Hardware accelerated Ray tracing (RT Cores) in the Data Center

NVIDIA L4 Tensor Core GPU (Data Center, low profile+power, released 2023/03)


NVIDIA® OptiX™ Ray Tracing Engine -- Accessible GPU Ray Tracing

OptiX makes GPU ray tracing accessible

OptiX features

https://developer.nvidia.com/rtx/ray-tracing/optix

User provides (Green):


NVIDIA OptiX 7 : Entirely new thin API => Full Opticks Re-implementation

NVIDIA OptiX 6->7 : drastically slimmed down

Advantages of 6->7 transition

BUT: demanded full re-implementation of Opticks


Geant4 + Opticks + NVIDIA OptiX 7 : Hybrid Workflow

https://bitbucket.org/simoncblyth/opticks

Opticks API : split according to dependency -- Optical photons are GPU "resident", only hits need to be copied to CPU memory


Geant4 + Opticks + NVIDIA OptiX 7 : Hybrid Workflow 2


Primary Packages and Structs Of Re-Implemented Opticks

SysRap : many small CPU/GPU headers
  • stree.h,snode.h : geometry base types
  • sctx.h sphoton.h : event base types
  • NP.hh : serialization into NumPy .npy format files
QUDARap
  • QSim : optical photon simulation steering
  • QScint,QCerenkov,QProp,... : modular CUDA implementation
U4
  • U4Tree : convert geometry into stree.h
  • U4 : collect gensteps, return hits
CSG
  • CSGFoundry/CSGSolid/CSGPrim/CSGNode geometry model
  • csg_intersect_tree.h csg_intersect_node.h csg_intersect_leaf.h : CPU/GPU intersection functions
CSGOptiX
  • CSGOptiX.h : manages geometry conversion from CSG to OptiX 7 IAS and GAS, and pipeline creation
  • CSGOptiX7.cu : compiled into PTX that becomes the OptiX 7 pipeline
    • includes QUDARap headers for simulation
    • includes csg_intersect_tree.h,.. headers for CSG intersection
G4CX
  • G4CXOpticks : Top level Geant4 geometry interface
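
The NP.hh bullet above mentions serialization into NumPy .npy files. As a rough illustration of what that file format contains, here is a minimal Python sketch that builds NPY version 1.0 bytes using only the standard library. It is not the NP.hh implementation: the function name npy_bytes, the float32-only restriction and the (N,4,4) photon array shape used in the example are assumptions made for illustration.

```python
import struct

MAGIC = b"\x93NUMPY"  # fixed magic string that starts every .npy file


def npy_bytes(shape, values):
    """Serialize little-endian float32 values of the given shape as NPY v1.0 bytes.

    Hypothetical helper for illustration only, not part of Opticks.
    """
    count = 1
    for dim in shape:
        count *= dim
    if len(values) != count:
        raise ValueError("shape does not match number of values")
    # The header is a Python dict literal; tuple repr gives "(2, 4, 4)" or "(3,)".
    header = "{'descr': '<f4', 'fortran_order': False, 'shape': %s, }" % repr(tuple(shape))
    # Pad with spaces so that magic + version + length field + header
    # (including the final newline) is a multiple of 64 bytes.
    unpadded = len(MAGIC) + 2 + 2 + len(header) + 1
    header += " " * (-unpadded % 64) + "\n"
    return (MAGIC + bytes([1, 0]) + struct.pack("<H", len(header))
            + header.encode("latin1") + struct.pack("<%df" % count, *values))


# Example: two records of 4x4 floats (an assumed layout for a small photon array).
photons = npy_bytes((2, 4, 4), [float(i) for i in range(32)])
```

Writing these bytes to a file with a .npy extension should give something that numpy.load can read, which is the property that makes the format convenient for debugging and analysis from Python.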

Full re-implementation of Opticks for NVIDIA OptiX 7 API

Old simulation (OptiXRap)
  • implemented on top of old OptiX API
  • monolithic .cu
  • GPU only implementation
  • deep stack of support code
  • most code in GPU only context, even when not needing OptiX or CUDA
  • testing : GPU only, coarse
  • limited CPU/GPU code sharing
  • time-consuming manual random alignment conducted via debugger

New simulation (QUDARap/qsim.h + CSGOptiX, CSG)
  • pure CUDA implementation
  • OptiX use kept separate, just for intersection
  • many small headers
  • many GPU+CPU headers
  • shallow stack : QUDARap depends only on SysRap
  • strict code segregation
    • code not needing GPU in SysRap not QUDARap
  • testing : CPU+GPU, fine-grained
  • curand mocking on CPU
  • maximal sharing : SEvt.hh, sphoton.h, ...
  • new systematic approach to random alignment

Goals of re-implementation : flexible, modular GPU simulation, easily testable, less code


Two-Level Hierarchy : Instance transforms (TLAS) over Geometry (BLAS)

OptiX supports multiple instance levels : IAS->IAS->GAS

BUT: Simple two-level is faster : works in hardware RT Cores

AS : Acceleration Structure
TLAS (aka IAS) : 4x4 transforms, refs to BLAS
BLAS (aka GAS) :
  • triangles : vertices, indices
  • custom primitives : AABB
AABB : axis-aligned bounding box
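
To make the two-level idea concrete, here is a minimal Python sketch, assuming a toy scene in which each "GAS" is just a list of sphere radii centred on the local origin and each "IAS" entry is a translation plus a GAS index. Real instance transforms are full 4x4 matrices and real traversal uses hardware-accelerated bounding volume hierarchies rather than the linear scan shown here; all names (intersect_sphere, trace, GAS, IAS) are hypothetical.

```python
import math


def intersect_sphere(origin, direction, radius):
    """Nearest positive t where a ray hits a sphere centred at the local origin.

    Assumes direction is normalized. Returns None on a miss.
    """
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 0:
            return t
    return None


# Bottom level: shared geometry, defined once in its own local frame.
GAS = [[1.0], [0.5, 2.0]]

# Top level: instances placing that geometry, as (translation, GAS index).
IAS = [((0.0, 0.0, 10.0), 0), ((0.0, 0.0, 20.0), 0), ((5.0, 0.0, 0.0), 1)]


def trace(origin, direction, ias=IAS, gas_list=GAS):
    """Return (t, instance index, primitive index) of the nearest hit, or None."""
    best = None
    for inst_idx, (translation, gas_idx) in enumerate(ias):
        # Move the ray into the instance frame instead of moving the geometry.
        local_origin = tuple(o - t for o, t in zip(origin, translation))
        for prim_idx, radius in enumerate(gas_list[gas_idx]):
            t = intersect_sphere(local_origin, direction, radius)
            if t is not None and (best is None or t < best[0]):
                best = (t, inst_idx, prim_idx)
    return best
```

The point of the sketch is that GAS 0 is stored once but appears twice in the scene: only the small per-instance transform is duplicated, not the geometry.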

SBT : Shader Binding Table

Flexibly binds together:

  1. geometry objects
  2. shader programs
  3. data for shader programs

Hidden in OptiX 1-6 APIs


Geometry Model Translation : Geant4 => CSGFoundry => NVIDIA OptiX 7

Geant4 Geometry Model (JUNO: 300k PV, deep hierarchy)

PV   G4VPhysicalVolume            placed, refs LV
LV   G4LogicalVolume              unplaced, refs SO
SO   G4VSolid, G4BooleanSolid     binary tree of SO "nodes"

Opticks CSGFoundry Geometry Model (index references)

struct       Notes                                                       Geant4 Equivalent
CSGFoundry   vectors of the below, easily serialized + uploaded          None
qat4         4x4 transform refs CSGSolid using "spare" 4th column
CSGSolid     refs sequence of CSGPrim                                    Groups of nearby PV, LV + Remainder
             eg JUNO CSGSolid numPrim [3089, 5, 11, 14, 6, 1, 1, 1, 1, 130]
CSGPrim      bbox, refs sequence of CSGNode, root of CSG Tree of nodes   root G4VSolid
CSGNode      CSG node parameters (JUNO: ~23k CSGNode)                    node G4VSolid
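
The "index references" idea can be sketched in a few lines of Python, using the JUNO numPrim values quoted above. This is only an illustration under assumptions: it supposes that each CSGSolid refers to a contiguous run of CSGPrim in one flat vector via an offset and a count, and the names prim_offset and prim_range are hypothetical rather than actual CSGFoundry members.

```python
from itertools import accumulate

# numPrim per CSGSolid, as quoted for JUNO in the table above.
num_prim = [3089, 5, 11, 14, 6, 1, 1, 1, 1, 130]

# Assumed layout: all CSGPrim live in one flat vector, so each solid only
# needs (offset, count) to refer to its own contiguous run of prims.
prim_offset = [0] + list(accumulate(num_prim))[:-1]


def prim_range(solid_idx):
    """Indices into the flat prim vector that belong to the given solid."""
    start = prim_offset[solid_idx]
    return range(start, start + num_prim[solid_idx])


total_prim = sum(num_prim)
```

Because every reference is an integer index rather than a pointer, vectors laid out like this can be written to disk or copied to the GPU as plain arrays without any fix-up step.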

NVIDIA OptiX 7 Geometry Acceleration Structures (JUNO: 1 IAS + 10 GAS, 2-level hierarchy)

IAS : Instance Acceleration Structure, JUNO: 1 IAS created from vector of ~50k qat4
GAS : Geometry Acceleration Structure, JUNO: 10 GAS created from 10 CSGSolid (which refs CSGPrim, CSGNode)

[9]cxr_i0_t8,_-1 : EXCLUDE SLOWEST


QUDARap : CUDA Optical Simulation Implementation

                          CPU            GPU header
context steering          QSim.hh        qsim.h
curandState setup         QRng.hh        qrng.h
property interpolation    QProp.hh       qprop.h
event handling            QEvent.hh      qevent.h
Cerenkov generation       QCerenkov.hh   qcerenkov.h
Scintillation generation  QScint.hh      qscint.h
texture handling          QTex.hh

Aims of counterpart code organization:


n-ary CSG Compound "List-Nodes" => Much Smaller CSG trees

CSG_CONTIGUOUS Union : user guarantees contiguous
  • like G4MultiUnion of prim only
CSG_DISCONTIGUOUS Union : user guarantees no overlaps
  • => simple, low resource intersect
  • eg "union of holes" to be CSG subtracted
CSG_OVERLAP Intersection : user guarantees overlap
  • eg general G4Sphere: inner radius, thetacut, phicut

Communicate shape more precisely => better suited intersect alg => less resources => faster

Generalized Opticks CSG into three levels : tree < node < leaf (avoids recursion in intersect)


CSG_CONTIGUOUS Union : n-ary (not bin-ary) CSG intersection

  1. zeroth pass : find nearest_enter and count first exits
  2. if zero exits => outside compound => return nearest_enter
  3. first pass : collect enter distances, farthest_exit
  4. order enter indices making enter distances ascend
    • n-ary : store, sort enters (cf bin-ary : compare two)
    • no tree overheads, but must store+sort distances
  5. second pass : loop over enters in distance order
    • contiguous requirement : enter < farthest_exit so far
    • find Exits for Enters that qualify as contiguous, update farthest_exit
  6. return farthest_exit that qualifies as contiguous
         +----------------+     +-------------------+                  DISJOINT MUST BE DISQUALIFIED
         |B               |     |D                  |
    +----|----+      +----|-----|----+       +------|----------+             +-----------+
    |A   |    |      |C   |     |    |       |E     |          |             |           |
    |    |    |      |    |     |    |       |      |          |             |           |
    | 0 E1    X2     E3  X4    E5   X6      E7     X8        [X9]           E10         X11
    |    |    |      |    |     |    |       |      |          |             |           |
    |    |    |      |    |     |    |       |      |          |             |           |
    +----|----+      +----|-----|----+       +------|----------+             +-----------+
         |                |     |                   |
         +----------------+     +-------------------+

         E           E          E            E                               E
              X           X          X              X          X                         X
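
The numbered steps and the diagram above can be sketched in one dimension, treating each convex constituent as an interval (t_enter, t_exit) along the ray. This is a simplified Python illustration under that assumption, not the Opticks CUDA code: the function name and interval representation are hypothetical, and the zeroth and first passes are merged into one loop for brevity.

```python
def intersect_contiguous_union(intervals, t_min=0.0):
    """Distance to the surface of a contiguous union along a ray, or None on a miss.

    intervals : list of (t_enter, t_exit), one per convex constituent.
    """
    nearest_enter = None
    farthest_exit = None
    enters = []
    # Zeroth and first passes merged: classify the first intersect of each constituent.
    for idx, (t_in, t_out) in enumerate(intervals):
        if t_out <= t_min:
            continue  # constituent is entirely behind the ray start
        if t_in > t_min:
            enters.append((t_in, idx))  # first intersect is an enter
            if nearest_enter is None or t_in < nearest_enter:
                nearest_enter = t_in
        elif farthest_exit is None or t_out > farthest_exit:
            farthest_exit = t_out  # ray starts inside : first intersect is an exit
    if farthest_exit is None:
        return nearest_enter  # zero exits => outside compound
    # Order the enters by ascending distance.
    enters.sort()
    # Second pass: extend farthest_exit through constituents that are entered
    # before the current farthest exit, so they are contiguous with it.
    for t_in, idx in enters:
        if t_in >= farthest_exit:
            break  # disjoint : this and all later enters are disqualified
        farthest_exit = max(farthest_exit, intervals[idx][1])
    return farthest_exit


# Numbers read off the diagram: A..E overlap in a chain, the last box is disjoint.
diagram = [(-1, 2), (1, 4), (3, 6), (5, 8), (7, 9), (10, 11)]
```

With the ray starting inside A, the chain of overlaps carries the exit out to 9, matching [X9] in the diagram, while the box spanning E10 to X11 is skipped because its enter lies beyond the farthest qualifying exit.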
 

CSG_DISCONTIGUOUS Union : CSG intersection

User guarantees : absolutely no overlapping between constituents

 +-------+          +-------+          +-------+          +-------+         +-------+
 |       |          |       |          |       |          |       |         |       |
 |       |          |       |          |       |          |       |         |       |
 +-------+          +-------+          +-------+          +-------+         +-------+

 +-------+          +-------+          +-------+          +-------+         +-------+
 |       |          |       |          |       |          |       |         |       |
 |       |          |       |          |       |          |       |         |       |
 +-------+          +-------+          +-------+          +-------+         +-------+
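
Under the no-overlap guarantee the intersect reduces to picking the nearest constituent intersect, with no sorting or state to carry. A minimal Python sketch, again modelling each constituent as a hypothetical (t_enter, t_exit) interval along the ray rather than reproducing the Opticks implementation:

```python
def intersect_discontiguous_union(intervals, t_min=0.0):
    """Nearest intersect beyond t_min over non-overlapping constituents, or None."""
    nearest = None
    for t_in, t_out in intervals:
        if t_in > t_min:
            t = t_in      # ray starts outside this constituent : hit on entering
        elif t_out > t_min:
            t = t_out     # ray starts inside this constituent : hit on exiting
        else:
            continue      # constituent is entirely behind the ray start
        if nearest is None or t < nearest:
            nearest = t
    return nearest


# A row of separated boxes, as in the diagram above.
boxes = [(1, 2), (4, 5), (7, 8), (10, 11), (13, 14)]
```

Compared with the contiguous case there is nothing to store or sort, which is why the slide describes this as a simple, low resource intersect.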

 

Multi-Layer Thin Film (A,R,T) Calculation using TMM (Custom4 Package)

(A,R,T) : absorption, reflection, transmission
TMM : transfer matrix method

C4OpBoundaryProcess.hh : G4OpBoundaryProcess with C4CustomART.h
C4CustomART.h : integrates custom boundary process and TMM calculation
C4MultiLayrStack.h : CPU/GPU TMM calculation of (A,R,T)

based on complex refractive indices and layer thicknesses

  • GPU : using thrust::complex, CPU : using std::complex
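
As a rough illustration of what a TMM calculation of (A,R,T) from complex refractive indices and layer thicknesses involves, here is a minimal Python sketch of the textbook characteristic-matrix method. It is not the C4MultiLayrStack.h code. It assumes normal incidence only, non-absorbing entrance and exit media, and the convention that an absorbing index is written n - ik, that is complex(n, -k) in Python. The function name stack_art and its parameters are hypothetical.

```python
import cmath


def stack_art(n_in, layers, n_out, wavelength):
    """Return (A, R, T) for a thin-film stack at normal incidence.

    n_in, n_out : real refractive indices of the entrance and exit media
    layers      : list of (complex index, thickness), ordered from the entrance side,
                  with thickness in the same length unit as wavelength
    """
    m00, m01, m10, m11 = 1.0, 0.0, 0.0, 1.0  # running product of layer matrices
    for n, d in layers:
        delta = 2 * cmath.pi * n * d / wavelength  # complex phase thickness
        c, s = cmath.cos(delta), cmath.sin(delta)
        l00, l01, l10, l11 = c, 1j * s / n, 1j * n * s, c
        # Right-multiply the running product by this layer's characteristic matrix.
        m00, m01, m10, m11 = (m00 * l00 + m01 * l10, m00 * l01 + m01 * l11,
                              m10 * l00 + m11 * l10, m10 * l01 + m11 * l11)
    bb = m00 + m01 * n_out
    cc = m10 + m11 * n_out
    denom = n_in * bb + cc
    r = (n_in * bb - cc) / denom  # amplitude reflection coefficient
    t = 2 * n_in / denom          # amplitude transmission coefficient
    R = abs(r) ** 2
    T = (n_out.real / n_in.real) * abs(t) ** 2
    return 1 - R - T, R, T        # absorption follows from energy conservation


# Example: bare vacuum-to-glass interface with no layers at all.
A, R, T = stack_art(1.0, [], 1.5, 500.0)
```

With no layers the result reduces to the familiar Fresnel values for a single interface, and a quarter-wave layer whose index is the square root of the substrate index drives R towards zero, which makes both cases handy sanity checks for any implementation.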

Custom4: Simplifies JUNO PMT Optical Model + Geometry
