Opticks : Innovation in Optical Photon Simulation

Opticks : Innovation in Optical Photon Simulation via
state-of-the-art GPU Ray Tracing from NVIDIA® OptiX™

Open source, https://bitbucket.org/simoncblyth/opticks

Simon C Blyth, IHEP, CAS — Jan 2022, Hong Kong Workshop: Innovation in HEP Detectors & Computing

JUNO Optical Photon Simulation Problem...

Optical Photon Simulation ≈ Ray Traced Image Rendering

Much in common : geometry, light sources, optical physics

Many Applications of ray tracing :

Ray-tracing vs Rasterization

[Images: NVIDIA illustrations of rasterization and of ray tracing]

Path Tracing in Production 1

Path Tracing in Production 2

The Rendering Equation 1

The Rendering Equation 2

The Rendering Equation 3

Samples per Pixel 1

Samples per Pixel 2

Optical Simulation : Computer Graphics vs Physics

CG Rendering "Simulation"                   Particle Physics Simulation
simulates: image formation, vision          simulates photons: generation, propagation, detection
(red, green, blue)                          wavelength range eg 400-700 nm
ignore polarization                         polarization vector propagated throughout
participating media: clouds,fog,fire [1]    bulk scattering: Rayleigh, Mie
human exposure times                        nanosecond time scales
equilibrium assumption                      transient phenomena
ignores light speed, time                   arrival time crucial, speed of light : 30 cm/ns

Despite differences many techniques+hardware+software directly applicable to physics eg:

Potentially Useful CG techniques for "billion photon simulations"

[1] search for: "Volumetric Rendering Equation"

SIGGRAPH 2018 : Announcing World's First Ray Tracing GPU

10 Giga Rays/s


Project Sol

Ampere : 2nd Generation RTX

NVIDIA Ampere (2020):
"...triple double over Turing (2018, 10 GigaRays/s)..."

NVIDIA Marbles At Night RTX Demo

GTC 2020, NVIDIA Marbles at Night RTX Demo

GPU Ray Tracing (RT) APIs Give Access to NVIDIA RTX

Three Similar Interfaces over same RTX tech:

NVIDIA OptiX (Linux, Windows) [2009]

Vulkan RT (Linux, Windows) [final spec 2020]

Microsoft DXR : DirectX 12 Ray Tracing (Windows) [2018]

Metal Ray Tracing API (macOS) [introduced 2020] [1]

[1] https://developer.apple.com/videos/play/wwdc2020/10012/

Spatial Index Acceleration Structure

NVIDIA® OptiX™ Ray Tracing Engine -- http://developer.nvidia.com/optix

OptiX makes GPU ray tracing accessible

NVIDIA expertise:


User provides (Yellow):

[1] Turing+ GPUs eg NVIDIA TITAN RTX

NVIDIA OptiX 7 : Entirely new thin API (Introduced Aug 2019)

NVIDIA OptiX 6->7 : drastically slimmed down


More control/flexibility over everything.

  • Fully benefit from future GPUs
  • Keep pace with state-of-the-art GPU ray tracing

Demands much more developer effort than OptiX 6

  • Major re-implementation of Opticks required

LATEST: Opticks transition from 6->7 is ongoing


Geant4 + Opticks Workflow

Opticks : Translates G4 Optical Physics to CUDA/OptiX

OptiX : single-ray programming model -> line-by-line translation

CUDA Ports of Geant4 classes
  • G4Cerenkov (only generation loop)
  • G4Scintillation (only generation loop)
  • G4OpAbsorption
  • G4OpRayleigh
  • G4OpBoundaryProcess (only a few surface types)
Modify Cherenkov + Scintillation Processes
  • collect genstep, copy to GPU for generation
  • avoids copying millions of photons to GPU
Scintillator Reemission
  • fraction of bulk absorbed "reborn" within same thread
  • wavelength generated by reemission texture lookup
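A reemission lookup of this kind can be pictured as inverse-CDF sampling: the spectrum is integrated once, inverted onto a uniform grid, and a uniform random number then selects the wavelength. The sketch below is a CPU illustration only, assuming a piecewise-linear spectrum; the function names, the table size and the example spectrum are hypothetical, not taken from Opticks.

```python
from bisect import bisect_left
import random


def build_icdf(wavelengths, intensities, nbins=1024):
    """Tabulate wavelength as a function of cumulative probability u in [0, 1]."""
    # Trapezoidal cumulative integral of the spectrum, then normalise to 1.
    cdf = [0.0]
    for i in range(1, len(wavelengths)):
        width = wavelengths[i] - wavelengths[i - 1]
        cdf.append(cdf[-1] + 0.5 * (intensities[i] + intensities[i - 1]) * width)
    cdf = [c / cdf[-1] for c in cdf]

    table = []
    for k in range(nbins):
        u = k / (nbins - 1)
        j = bisect_left(cdf, u)  # first index with cdf[j] >= u
        if j == 0:
            table.append(wavelengths[0])
        else:
            # Linear interpolation inside the segment (an approximation).
            f = (u - cdf[j - 1]) / (cdf[j] - cdf[j - 1])
            table.append(wavelengths[j - 1] + f * (wavelengths[j] - wavelengths[j - 1]))
    return table


def sample_wavelength(table, u):
    """Interpolated table lookup, standing in for a GPU texture fetch."""
    x = u * (len(table) - 1)
    i = min(int(x), len(table) - 2)
    f = x - i
    return table[i] * (1.0 - f) + table[i + 1] * f


# Hypothetical triangular spectrum peaking at 430 nm.
icdf = build_icdf([380.0, 430.0, 500.0], [0.0, 1.0, 0.0])
wavelength_nm = sample_wavelength(icdf, random.random())
```

Building the table once and sampling with a single interpolated lookup keeps the per-photon cost constant, which is the property that makes a texture attractive for this job.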
Opticks (OptiX/Thrust GPU interoperation)
  • OptiX : upload gensteps
  • Thrust : seeding, distribute genstep indices to photons
  • OptiX : launch photon generation and propagation
  • Thrust : pullback photons that hit PMTs
  • Thrust : index photon step sequences (optional)
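The seeding step can be sketched on the CPU as follows: each genstep carries a photon count, and every photon slot needs the index of the genstep that will generate it. This is a minimal illustration in plain Python, not the Thrust implementation; the function name is hypothetical and the per-slot search is just one way to arrange the work so that each slot is independent.

```python
from bisect import bisect_right
from itertools import accumulate


def seed_photons(photon_counts):
    """Return, for each photon slot, the index of its parent genstep."""
    # Exclusive end offset of every genstep in the photon buffer.
    ends = list(accumulate(photon_counts))
    total = ends[-1] if ends else 0
    # Each slot finds its genstep independently, as a GPU thread could.
    return [bisect_right(ends, slot) for slot in range(total)]


# Three gensteps producing 3, 0 and 2 photons.
seeds = seed_photons([3, 0, 2])
```

Only the small genstep array crosses to the GPU; the photon buffer itself is filled in place from these seeds.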

G4VSolid -> CUDA Intersect Functions for ~10 Primitives


Sphere, Cylinder, Disc, Cone, Convex Polyhedron, Hyperboloid, Torus, ...
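As an illustration of what a primitive intersect function computes, here is a minimal ray-sphere sketch in Python. It is not the Opticks CUDA code: the function name and signature are hypothetical, and the direction is assumed to be normalised.

```python
import math


def intersect_sphere(origin, direction, center, radius, t_min=0.0):
    """Return the nearest ray parameter t > t_min, or None for a miss."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(x * d for x, d in zip(oc, direction))   # half the linear coefficient
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c                                # quarter discriminant
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):                # near root first
        if t > t_min:
            return t
    return None


t_outside = intersect_sphere((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), 1.0)
t_inside = intersect_sphere((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), 1.0)
```

The t_min argument matters for CSG: advancing it past a previous hit is how a ray is made to find the next intersect with the same shape.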

G4Boolean -> CUDA/OptiX Intersection Program Implementing CSG

Complete Binary Tree, pick between pairs of nearest intersects:

UNION, tA < tB    Enter B    Exit B     Miss B
Enter A           ReturnA    LoopA      ReturnA
Exit A            ReturnA    ReturnB    ReturnA
Miss A            ReturnB    ReturnB    ReturnMiss
[1] Ray Tracing CSG Objects Using Single Hit Intersections, Andrew Kensler (2006)
with corrections by author of XRT Raytracer http://xrt.wikidot.com/doc:csg
[2] https://bitbucket.org/simoncblyth/opticks/src/tip/optixrap/cu/csg_intersect_boolean.h
Similar to binary expression tree evaluation using postorder traverse.
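The UNION table above can be written as a small lookup keyed on the pair of intersect states. The sketch below encodes only the tA < tB case given on the slide; the names are hypothetical, and the classification by the sign of the ray direction dotted with the outward normal is an assumption about how Enter and Exit are told apart.

```python
from enum import Enum


class Hit(Enum):
    ENTER = "Enter"
    EXIT = "Exit"
    MISS = "Miss"


# UNION table for tA < tB, copied row by row. Keys are (state of A, state of B).
UNION_A_CLOSER = {
    (Hit.ENTER, Hit.ENTER): "ReturnA",
    (Hit.ENTER, Hit.EXIT): "LoopA",
    (Hit.ENTER, Hit.MISS): "ReturnA",
    (Hit.EXIT, Hit.ENTER): "ReturnA",
    (Hit.EXIT, Hit.EXIT): "ReturnB",
    (Hit.EXIT, Hit.MISS): "ReturnA",
    (Hit.MISS, Hit.ENTER): "ReturnB",
    (Hit.MISS, Hit.EXIT): "ReturnB",
    (Hit.MISS, Hit.MISS): "ReturnMiss",
}


def classify(t, cos_theta):
    """Assumed convention: t is None for a miss, cos_theta = direction . outward normal."""
    if t is None:
        return Hit.MISS
    return Hit.ENTER if cos_theta < 0.0 else Hit.EXIT


def union_action_a_closer(state_a, state_b):
    """Action for a UNION node when the A intersect is the nearer one."""
    return UNION_A_CLOSER[(state_a, state_b)]


action = union_action_a_closer(classify(2.0, -1.0), classify(3.0, +1.0))
```

LoopA is read here as: advance past the current A intersect and intersect A again, since an entry into A while still inside B cannot be the surface of the union.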

Opticks : Translates G4 Geometry to GPU, Without Approximation

G4 Structure Tree -> Instance+Global Arrays -> OptiX

Group structure into repeated instances + global remainder:

instancing -> huge memory savings for JUNO PMTs


Translation 1st Step : Geant4 -> Opticks/GGeo : 1->1 conversions

Structural volumes : G4PVPlacement ->

JUNO: tree of ~300,000 GVolume

Solid shapes : G4VSolid ->

GMesh (collected into GMeshLib)
  • arrays: vertices, indices
  • ref to NCSG : tree of NNode (CSG constituents)

Material/surface properties as function of wavelength

Translation steered by X4 package


Translation 2nd Step : Opticks/GGeo Instancing : "Factorizes" Geometry

Structural volumes vs solid shapes
distinction for convenience only, distinction is movable

JUNO: ~300,000 GVolume : mostly small repeated groups (PMTs)


  1. GVolume progeny digest : shapes+transforms -> subtree ident.
  2. find repeated digests, disqualifying repeats inside others
  3. label all nodes with repeat index, non-repeated remainder : 0

For each repeat+remainder create GMergedMesh:

GMergedMesh -> IAS+GAS


"CSGFoundry" : Shared CPU/GPU Geometry Model (OptiX pre-7 & 7)

model, GPU upload
simple headers common to pre-7/7/CPU-testing
Convert Opticks/GGeo -> CSGFoundry
Simulation excluding geometry, generation
OptiX 7 + pre-7 geometry : depends on CSG, QUDARap

GAS : Geometry Acceleration Structure

IAS : Instance Acceleration Structure

CSG : Constructive Solid Geometry

Two-Level Hierarchy : Instance transforms (IAS) over Geometry (GAS)

OptiX supports multiple instance levels : IAS->IAS->GAS
BUT: Simple two-level is faster : works in hardware RT Cores

Acceleration Structure
  IAS (aka TLAS)
    4x4 transforms, refs to GAS
  GAS (aka BLAS)
    custom primitives : AABB (axis-aligned bounding box)
    triangles : vertices, indices
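Custom primitives are described to the acceleration structure only by their bounding boxes, so the basic question during traversal is whether a ray crosses an AABB. The standard slab test answers it; the sketch below is a plain Python illustration with a hypothetical function name, not code from OptiX, where this test runs inside the driver and RT Cores.

```python
def ray_aabb(origin, direction, bmin, bmax):
    """Slab test: return (t_near, t_far) along the ray, or None for a miss."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, bmin, bmax):
        if d == 0.0:
            # Ray parallel to this slab: it must start between the two planes.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far


hit = ray_aabb((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
miss = ray_aabb((5.0, 0.0, -5.0), (0.0, 0.0, 1.0), (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
```

Only when the box test passes does the user-supplied intersect function for the primitive run, which is why overlarge bounding boxes cost performance.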

SBT : Shader Binding Table

Flexibly binds together:

  1. geometry objects
  2. shader programs
  3. data for shader programs

Hidden in OptiX 1-6 APIs

Opticks Generality

Opticks Generality 2



-e t0, : NOT 0 : 3084:sWorld : exclude global remainder volumes


Comparison of ray traced render times of different geometry
simple way to find issues, eg over complex CSG, overlarge BBox


Same viewpoint inside JUNO Central Detector, vary included volumes
ray trace performance very sensitive to geometry and its modelling => BVH structure

[Dec 2021] JUNO : OptiX 7 Ray Trace Times ~2M-pix : TITAN RTX

idx  -e        time(s)  relative  enabled geometry description 3dbec4dc
  0  5,        0.0004   0.0643    ONLY: 1:sStrutBallhead
  1  9,        0.0004   0.0658    ONLY: 130:sPanel
  2  7,        0.0005   0.0782    ONLY: 1:base_steel
  3  8,        0.0006   0.0966    ONLY: 1:uni_acrylic1
  4  6,        0.0006   0.1009    ONLY: 1:uni1
  5  1,        0.0009   0.1476    ONLY: 5:PMT_3inch_pmt_solid           FAST cf 20in
  6  4,        0.0015   0.2386    ONLY: 4:mask_PMT_20inch_vetosMask
  7  3,        0.0033   0.5373    ONLY: 5:HamamatsuR12860sMask          SLOW cf 3in
  8  0,        0.0040   0.6556    ONLY: 3084:sWorld
  9  2,        0.0040   0.6627    ONLY: 5:NNVTMCPPMTsMask               SLOW cf 3in
 10  t4,       0.0050   0.8307    EXCL: 4:mask_PMT_20inch_vetosMask
 11  t2,       0.0051   0.8391    EXCL: 5:NNVTMCPPMTsMask
 12  t3,       0.0052   0.8514    EXCL: 5:HamamatsuR12860sMask
 13  t6,       0.0053   0.8799    EXCL: 1:uni1
 14  t7,       0.0054   0.8809    EXCL: 1:base_steel
 15  t0        0.0054   0.8843    ALL
 16  t5,       0.0054   0.8843    EXCL: 1:sStrutBallhead
 17  t9,       0.0054   0.8855    EXCL: 130:sPanel
 18  t1,       0.0054   0.8860    EXCL: 5:PMT_3inch_pmt_solid
 19  t8,       0.0055   0.9013    EXCL: 1:uni_acrylic1
 20  t0,       0.0059   0.9753    EXCL: 3084:sWorld
 21  1,2,3,4   0.0061   1.0000    ONLY PMT
 22  t8,0      0.0062   1.0217    EXCL: 1:uni_acrylic1 3084:sWorld
Validation of Opticks Simulation by Comparison with Geant4

Bi-simulations of all JUNO solids, with millions of photons

  • mis-aligned histories : mostly < 0.25%, < 0.50% for largest solids
  • deviant photons within matched history : < 0.05% (500/1M)

Primary sources of problems

Primary cause : float vs double

Geant4 uses double everywhere, Opticks only sparingly (observed double costing 10x slowdown with RTX)
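The effect of single precision is easy to reproduce on the CPU by rounding values to float32. In the sketch below the position of 20000 mm and the step of 0.0005 mm are hypothetical numbers chosen only to show the scale of the effect; they are not taken from the validation results.

```python
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


position_mm = 20000.0    # hypothetical coordinate far from the origin
step_mm = 0.0005         # hypothetical sub-micron displacement

as_double = position_mm + step_mm
as_float = to_float32(to_float32(position_mm) + to_float32(step_mm))

lost_in_float = as_float == position_mm      # the step vanishes in float32
kept_in_double = as_double != position_mm    # but survives in double
```

Near 20000 the spacing between adjacent float32 values is about 0.002, so any displacement below half of that is rounded away, which is one way two otherwise identical simulations can diverge at a boundary.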




Performance : Scanning from 1M to 400M Photons

Full JUNO Analytic Geometry j1808v5

Production Mode : does the minimum

Multi-Event Running, Measure:

avg time between successive launches, including overheads: (upload gensteps + launch + download hits)
avg of 10 OptiX launches


scan-pf-1 : Opticks vs Geant4

JUNO analytic, 400M photons from center                         Speedup
Geant4 Extrap.                             95,600 s (26 hrs)
Opticks RTX ON (i)                         58 s                 1650x

scan-pf-1 : Opticks Speedup

JUNO analytic, 400M photons from center                         Speedup
Opticks RTX ON (i)                         58 s                 1650x
Opticks RTX OFF (i)                        275 s                350x
Geant4 Extrap.                             95,600 s (26 hrs)

Useful Speedup > 1500x : But Why Not Giga Rays/s ? (1 Photon ~10 Rays)
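The numbers quoted above can be checked with a few lines of arithmetic. The sketch uses only figures from the slides (times for 400M photons, about 10 rays per photon, 10 GigaRays/s quoted for Turing); the variable names are illustrative.

```python
geant4_s = 95_600.0          # Geant4, extrapolated
rtx_on_s = 58.0              # Opticks, RTX ON
rtx_off_s = 275.0            # Opticks, RTX OFF
photons = 400e6
rays_per_photon = 10         # rough figure from the slide title
quoted_rays_per_s = 10e9     # 10 GigaRays/s quoted for Turing

speedup_on = geant4_s / rtx_on_s        # about 1648, quoted as 1650x
speedup_off = geant4_s / rtx_off_s      # about 348, quoted as 350x
rtx_gain = rtx_off_s / rtx_on_s         # about 4.7x from enabling RTX

# Effective ray rate of the simulation, overheads included.
rays_per_s = photons * rays_per_photon / rtx_on_s
fraction_of_quoted = rays_per_s / quoted_rays_per_s
```

The effective rate comes out near 70 million rays per second, under 1% of the quoted peak, which is the gap the slide title asks about. The measured time includes genstep upload and hit download, and each photon step does more than trace a ray.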

OptiX Performance Tools and Tricks, David Hart, NVIDIA https://developer.nvidia.com/siggraph/2019/video/sig915-vid