Opticks : GPU ray trace accelerated optical photon simulation


Open source, https://bitbucket.org/simoncblyth/opticks

Simon C Blyth, IHEP, CAS — CHEP, Krakow, Poland — 21 October 2024


Outline

[Figure: newtons-opticks.png]


(JUNO) Optical Photon Simulation Problem...


Optical photons limit many simulations => lots of interest in Opticks

EXPT          Notes
Reactor neutrino
  Daya Bay    neutrino oscillations
  JUNO        mass hierarchy + oscillations => NVIDIA CN contacts
Long baseline neutrino beam
  DUNE        Fermilab -> Sanford, LAr TPC => assistance from Fermilab Geant4 group
Neutrinoless double beta decay, dark matter, other searches
  LZ          LUX-ZEPLIN dark matter experiment, Sanford => NVIDIA US contacts
  LEGEND      Large Enriched Germanium Experiment, Gran Sasso/SNOLAB
  SABRE       dark matter direct detection, Australia
  AMoRE       Mo-based Rare process Experiment, South Korea
  nEXO        next Enriched Xenon Observatory, LLNL
  NEXT-CRAB0  High Pressure Gaseous Xenon TPC with a Direct VUV Camera Based Readout
Neutrino telescope
  KM3NeT      Cubic Kilometre Neutrino Telescope, Mediterranean
  IceCube     IceCube Neutrino Observatory, South Pole
Air shower: gamma-ray and cosmic-ray observatory
  LHAASO      Large High Altitude Air Shower Observatory, Sichuan
Accelerator
  LHCb-RICH   LHCb ring imaging Cherenkov sub-detector, CERN => NVIDIA EU contacts

Optical Photon Simulation ≈ Ray Traced Image Rendering

simulation : photon parameters at sensors (PMTs)
rendering  : pixel values at image plane

Much in common : geometry, light sources, optical physics

Many Applications of ray tracing :


NVIDIA RTX Generations

ray trace performance : ~2x every ~2 years


NVIDIA® OptiX™ Ray Tracing Engine -- Accessible GPU Ray Tracing

OptiX makes GPU ray tracing accessible

OptiX features

User provides (Green):

Latest Release : NVIDIA® OptiX™ 8.0.0 (Aug 2023) NEW:


Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow

https://bitbucket.org/simoncblyth/opticks

Opticks API : split according to dependency -- Optical photons are GPU "resident", only hits need to be copied to CPU memory
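Keeping photons GPU resident and copying back only hits amounts to a stream compaction: select the photons flagged as hits, then transfer just that subset. The C++ sketch below shows the selection step on the CPU with std::copy_if. The Photon struct and the SURFACE_DETECT flag bit are hypothetical stand-ins, not the Opticks photon layout. On the GPU the same selection could be done with thrust::copy_if on device vectors, so that only the compacted hits need a device-to-host copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical per-photon record (the real Opticks layout differs).
struct Photon {
    float pos[3];
    float time;
    uint32_t flags;   // bitfield of history flags
};

// Hypothetical flag bit marking a photon that ended on a sensor (a "hit").
constexpr uint32_t SURFACE_DETECT = 1u << 6;

// Stream compaction: keep only the hits. With photons GPU resident,
// only the output of this selection would need to reach CPU memory.
std::vector<Photon> selectHits(const std::vector<Photon>& photons) {
    std::vector<Photon> hits;
    std::copy_if(photons.begin(), photons.end(), std::back_inserter(hits),
                 [](const Photon& p) { return (p.flags & SURFACE_DETECT) != 0; });
    return hits;
}
```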


Geometry Model Translation : Geant4 => CSGFoundry => NVIDIA OptiX 7/8

Geant4 Geometry Model (JUNO: 400k PV, deep hierarchy)

PV   G4VPhysicalVolume          placed, refs LV
LV   G4LogicalVolume            unplaced, refs SO
SO   G4VSolid, G4BooleanSolid   binary tree of SO "nodes"

Opticks CSGFoundry Geometry Model (index references)

struct       Notes                                                                Geant4 Equivalent
CSGFoundry   vectors of the below, easily serialized + uploaded + used on GPU     None
qat4         4x4 transform, refs CSGSolid using "spare" 4th column (becomes IAS)  transforms from PV
CSGSolid     refs sequence of CSGPrim                                             grouped vols + remainder
CSGPrim      bbox, refs sequence of CSGNode, root of CSG tree of nodes            root G4VSolid
CSGNode      CSG node parameters (JUNO: ~23k CSGNode)                             node G4VSolid
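The "index references" of the CSGFoundry model can be pictured as flat vectors in which each parent stores an (offset, count) range into the vector below it, so the whole geometry is a handful of contiguous arrays that serialize and upload easily. This is a minimal sketch under that assumption; the struct and field names are hypothetical, and the real CSGSolid/CSGPrim/CSGNode carry much more (bounding boxes, node parameters, transform references).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified structs: parents reference children by index range.
struct Node  { int typecode; };                            // CSG node parameters (simplified)
struct Prim  { int nodeOffset, numNode; };                 // root of a CSG tree of nodes
struct Solid { std::string label; int primOffset, numPrim; };

struct Foundry {
    std::vector<Solid> solid;
    std::vector<Prim>  prim;
    std::vector<Node>  node;

    // Append a solid whose prims have the given numbers of nodes.
    int addSolid(const std::string& label, const std::vector<int>& nodesPerPrim) {
        Solid s{label, int(prim.size()), int(nodesPerPrim.size())};
        for (int n : nodesPerPrim) {
            prim.push_back(Prim{int(node.size()), n});
            for (int i = 0; i < n; ++i) node.push_back(Node{0});
        }
        solid.push_back(s);
        return int(solid.size()) - 1;
    }

    // Total nodes referenced by one solid, found purely by following indices.
    int nodeCount(int solidIdx) const {
        const Solid& s = solid[solidIdx];
        int total = 0;
        for (int p = s.primOffset; p < s.primOffset + s.numPrim; ++p) total += prim[p].numNode;
        return total;
    }
};
```

Because nothing holds a pointer, the same vectors can be copied to the GPU verbatim and the indices remain valid there.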

NVIDIA OptiX 7/8 Geometry Acceleration Structures (JUNO: 1 IAS + 10 GAS, 2-level hierarchy)

IAS   Instance Acceleration Structure   JUNO: 1 IAS created from vector of ~50k qat4
GAS   Geometry Acceleration Structure   JUNO: 10 GAS created from 10 CSGSolid (each refs CSGPrim, CSGNode)

JUNO : Geant4 ~400k volumes "factorized" into 1 OptiX IAS referencing ~10 GAS
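One way to picture the "factorization": count how often each solid is placed, turn solids repeated at least some threshold number of times into instanced GAS, merge everything else into a remainder GAS, and record every instance as a transform whose otherwise constant 4th column carries the GAS index. The sketch below assumes that reading. The threshold, the Tran4 struct and bit-copying the index into element [0][3] are illustrative choices, not the actual Opticks qat4 encoding or factorization criteria.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical 4x4 transform in row-vector convention: the translation sits in
// row 3, so column 3 is always (0,0,0,1) and its first three entries are "spare".
struct Tran4 {
    std::array<std::array<float, 4>, 4> m{};
    Tran4() { for (int i = 0; i < 4; ++i) m[i][i] = 1.f; }   // start as identity
};

// Store/read an integer GAS index by copying its bits (not a float conversion).
void setGasIdx(Tran4& t, uint32_t gas) { std::memcpy(&t.m[0][3], &gas, sizeof gas); }
uint32_t getGasIdx(const Tran4& t) { uint32_t g; std::memcpy(&g, &t.m[0][3], sizeof g); return g; }

struct Placement { std::string solid; float x, y, z; };

// Solids placed at least minRepeat times become instanced GAS 1..N; everything
// else is merged into a "remainder" GAS 0 that gets one identity instance.
std::vector<Tran4> factorize(const std::vector<Placement>& pvs, int minRepeat,
                             std::map<std::string, uint32_t>& gasOf) {
    std::map<std::string, int> count;
    for (const auto& p : pvs) ++count[p.solid];
    uint32_t next = 1;
    for (const auto& kv : count) gasOf[kv.first] = (kv.second >= minRepeat) ? next++ : 0u;

    std::vector<Tran4> inst;
    Tran4 remainder;                 // identity transform: remainder stays in place
    setGasIdx(remainder, 0u);
    inst.push_back(remainder);
    for (const auto& p : pvs) {
        uint32_t g = gasOf[p.solid];
        if (g == 0) continue;        // remainder volumes are merged, not instanced
        Tran4 t;
        t.m[3][0] = p.x; t.m[3][1] = p.y; t.m[3][2] = p.z;
        setGasIdx(t, g);
        inst.push_back(t);
    }
    return inst;
}
```

The resulting vector of transforms plays the role of the instance list from which a single IAS is built, with each entry pointing at one of a few GAS.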


[Figure: Ada_cxr_overview_emm_t0_elv_t_moi__ALL.jpg]


Analytic + triangulated geometry

NEW FEATURE
Integration of analytic + triangulated geometry
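Combining analytic and triangulated geometry means each ray is tested against both kinds of primitive and the nearest intersection wins. OptiX handles this inside its acceleration-structure traversal; the CPU sketch below only illustrates the principle, using a closed-form ray-sphere intersect as the analytic primitive and the Möller-Trumbore algorithm for a triangle. The scene (one sphere, one triangle) and all names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <initializer_list>
#include <limits>
#include <optional>

struct V3 { double x, y, z; };
V3 operator-(V3 a, V3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
V3 cross(V3 a, V3 b) { return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x}; }

// Analytic: smallest t > tmin with |o + t d - c| = r.
std::optional<double> hitSphere(V3 o, V3 d, V3 c, double r, double tmin) {
    V3 oc = o - c;
    double a = dot(d, d), b = dot(oc, d), cc = dot(oc, oc) - r * r;
    double disc = b * b - a * cc;
    if (disc < 0) return std::nullopt;
    double s = std::sqrt(disc);
    for (double t : {(-b - s) / a, (-b + s) / a})
        if (t > tmin) return t;
    return std::nullopt;
}

// Triangulated: Moller-Trumbore ray/triangle intersection.
std::optional<double> hitTriangle(V3 o, V3 d, V3 v0, V3 v1, V3 v2, double tmin) {
    V3 e1 = v1 - v0, e2 = v2 - v0, p = cross(d, e2);
    double det = dot(e1, p);
    if (std::fabs(det) < 1e-12) return std::nullopt;   // ray parallel to triangle
    double inv = 1.0 / det;
    V3 s = o - v0;
    double u = dot(s, p) * inv;
    if (u < 0 || u > 1) return std::nullopt;
    V3 q = cross(s, e1);
    double v = dot(d, q) * inv;
    if (v < 0 || u + v > 1) return std::nullopt;
    double t = dot(e2, q) * inv;
    if (t <= tmin) return std::nullopt;
    return t;
}

// Closest hit across both geometry kinds; infinity means a miss.
double closestHit(V3 o, V3 d, double tmin) {
    double best = std::numeric_limits<double>::infinity();
    if (auto t = hitSphere(o, d, {0, 0, 10}, 1.0, tmin)) best = std::min(best, *t);
    if (auto t = hitTriangle(o, d, {-1, -1, 5}, {1, -1, 5}, {0, 1, 5}, tmin)) best = std::min(best, *t);
    return best;
}
```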

[Figure: cxr_min__eye_1,0,0__zoom_1__tmin_0.5__sSurftube_0V1_0:0:-1.jpg]

Interactive ray traced visualization via OpenGL/OptiX interop

initial viewpoint, geometry exclusions via envvars

WASDQE+mouse 3D navigation
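Setting the initial viewpoint via envvars needs little more than std::getenv and a tolerant parser. In the sketch below, the envvar name EYE, the comma-separated "x,y,z" format (echoing the eye_1,0,0 in the render filenames) and the fallback eye position are all assumptions, not a documented Opticks interface.

```cpp
#include <array>
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

using Vec3 = std::array<float, 3>;

// Parse "x,y,z"; return the fallback if the text is missing or malformed.
Vec3 parseVec3(const char* text, const Vec3& fallback) {
    if (text == nullptr) return fallback;
    std::stringstream ss(text);
    Vec3 v{};
    std::string item;
    for (int i = 0; i < 3; ++i) {
        if (!std::getline(ss, item, ',')) return fallback;        // too few components
        try { v[i] = std::stof(item); } catch (...) { return fallback; }
    }
    if (std::getline(ss, item, ',')) return fallback;             // too many components
    return v;
}

// Hypothetical envvar name and default: EYE="1,0,0" sets the initial eye position.
Vec3 initialEye() { return parseVec3(std::getenv("EYE"), Vec3{-1.f, -1.f, 0.f}); }
```

Falling back silently on bad input keeps an interactive session usable; printing a warning in that case would be a reasonable refinement.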


[Figure: Ada_cxr_min__eye_1,0,0__zoom_1__tmin_0.5__sSurftube_0V1_0:0:-100000.jpg]

Render on NVIDIA RTX 5000 Ada Generation in 0.0060 s (not 0.0200 s)


GuideTube : Torus Triangulated

GuideTube (39*2*2 = 156 G4Torus)
split in phi segments, radius breaks

Intersect with torus expensive on GPU

Triangulation using G4Polyhedron

G4Poly..::SetNumberOfRotationSteps

                       NumberOfRotationSteps
HepPolyhedron default  24
Top right              48
Bottom right           480

Adjustable: precision of intersect, number of triangles

GPUs evolved for triangles => fast even with many
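The precision/triangle-count trade-off can be put into numbers with elementary geometry. If a circle of radius r is approximated by n segments with vertices on the circle, the largest gap between chord and arc (the sagitta) is r(1 - cos(π/n)), falling roughly as 1/n², while a torus tessellated with n steps around both of its circles needs about 2n² triangles. The sketch assumes vertices on the true surface, equal step counts in both directions and each quad facet split into two triangles; it is not the G4Polyhedron implementation.

```cpp
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

// Largest gap between a circle of radius r and its inscribed n-gon (the sagitta).
double chordError(double r, int n) { return r * (1.0 - std::cos(kPi / n)); }

// Triangles for a full torus, assuming n steps around both circles
// and each quad facet split into two triangles.
long torusTriangles(int n) { return 2L * n * n; }
```

Under these assumptions the three settings in the table (24, 48, 480 steps) give 1152, 4608 and 460800 triangles per torus, and the 480-step sagitta is about 400 times smaller than the default one.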


List-node avoids deep CSG trees

Problematic deep CSG tree without list-node

+------------------------------------------+
|                                          |
|                                          |
|                           U              |
|                          / \             |
|                         /   \            |
|                        /     S           |
|                       U     / \          |
|                      / \   I   J         |
|                     U   H                |
|                    / \                   |
|                   U   G                  |
|                  / \                     |
|                 U   F                    |
|                / \                       |
|               U   E                      |
|              / \                         |
|             U   D                        |
|            / \                           |
|           U   C                          |
|          / \                             |
|         A   B                            |
|                                          |
+------------------------------------------+

U : Union
S : Subtraction
A-J : Tubs (cylinder) primitive

Simple G4MultiUnion is translated to Opticks list-node
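Why deep trees hurt can be quantified if a CSG tree is stored as a complete binary tree (a common GPU-friendly layout): a tree of height h then needs 2^(h+1) - 1 node slots regardless of how many are used. The sketch below rebuilds the tree drawn above, measures its height, and gathers the leaves of the union-only subtree, which is exactly what a list-node would hold. The Node struct and helpers are illustrative, not Opticks code, and storing the list-node's children outside the binary tree is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <memory>
#include <string>
#include <vector>

// Minimal CSG tree: op is 'U' (union), 'S' (subtraction) or 'P' (primitive leaf).
struct Node {
    char op;
    std::string name;                 // primitive name, used when op == 'P'
    std::shared_ptr<Node> l, r;
};
using NodeP = std::shared_ptr<Node>;

NodeP prim(const std::string& n) { return std::make_shared<Node>(Node{'P', n, nullptr, nullptr}); }
NodeP op(char o, NodeP a, NodeP b) { return std::make_shared<Node>(Node{o, "", a, b}); }

// Height of the tree: leaves have height 0.
int height(const NodeP& n) {
    if (n->op == 'P') return 0;
    return 1 + std::max(height(n->l), height(n->r));
}

// Node slots needed if a tree of height h is stored as a complete binary tree.
long completeSize(int h) { return (1L << (h + 1)) - 1; }

// Gather the leaves of a union-only subtree (the content of a list-node).
// Returns false if any non-union operator is met.
bool collectUnionLeaves(const NodeP& n, std::vector<std::string>& out) {
    if (n->op == 'P') { out.push_back(n->name); return true; }
    if (n->op != 'U') return false;
    return collectUnionLeaves(n->l, out) && collectUnionLeaves(n->r, out);
}

// The tree drawn above: (((((((A U B) U C) U D) U E) U F) U G) U H) U (I - J).
NodeP slideTree() {
    NodeP chain = op('U', prim("A"), prim("B"));
    for (const char* p : {"C", "D", "E", "F", "G", "H"}) chain = op('U', chain, prim(p));
    return op('U', chain, op('S', prim("I"), prim("J")));
}
```

The drawn tree has height 8, i.e. 511 complete-tree slots for only 19 real nodes. Replacing the union chain by one list-node holding A to H leaves U(list, S(I,J)) of height 2, i.e. 7 slots.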


Pure Optical TorchGenstep scan : 1M to 100M photons

TEST=medium_scan ~/opticks/cxs_min.sh

Generate optical-only events with 1M->100M photons starting from the CD (central detector) center; gather and save only hits.

OPTICKS_RUNNING_MODE=SRM_TORCH  ## "Torch" running enables num_photon scan
OPTICKS_NUM_PHOTON=M1,10,20,30,40,50,60,70,80,90,100
OPTICKS_NUM_EVENT=11
OPTICKS_EVENT_MODE=Hit
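The OPTICKS_NUM_PHOTON value appears to list one photon count per event. Reading "M1,10,20,...,100" as a leading M (millions) applied to every comma-separated entry gives 1M, 10M, ..., 100M, which matches both OPTICKS_NUM_EVENT=11 and the 1M to 100M range of the scan. This interpretation is an assumption; the sketch below is not the Opticks parser.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Assumed format: optional leading 'M' meaning millions, then comma-separated
// integers, e.g. "M1,10,20" -> 1000000, 10000000, 20000000.
std::vector<int64_t> parsePhotonCounts(std::string spec) {
    int64_t scale = 1;
    if (!spec.empty() && spec[0] == 'M') {
        scale = 1000000;
        spec.erase(0, 1);
    }
    std::vector<int64_t> counts;
    std::stringstream ss(spec);
    std::string item;
    while (std::getline(ss, item, ','))
        counts.push_back(std::stoll(item) * scale);   // std::stoll throws on bad input
    return counts;
}
```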

Compare simulation scans on two Dell Precision Workstations:

GPU (VRAM)               Arch    GPU Release  CUDA(RT) Cores  RTX Gen  Driver  CUDA  OptiX
NVIDIA TITAN RTX (24G)   Turing  Dec 2018     4,608 (72)      1st      515.43  11.7  7.5
NVIDIA RTX 5000 (32G)    Ada     Aug 2023     12,800 (100)    3rd      550.76  12.4  8.0

[Figure: ALL1_scatter_10M_photon_22pc_hit_alt.png]

4.5M hits from a 20M photon TorchGenstep: 4.4 s with NVIDIA TITAN RTX (1st gen RTX), 1.1 s with NVIDIA RTX 5000 Ada (3rd gen RTX)

[Figure: AB_Substamp_ALL_Etime_vs_Photon_rtx_gen1_gen3.png]

Event time (s) vs number of photons PH (millions); G1 = 1st gen RTX (NVIDIA TITAN RTX), G3 = 3rd gen RTX (NVIDIA RTX 5000 Ada)
PH(M)  G1 (s)  G3 (s)  G1/G3
1      0.47    0.14    3.28
10     0.44    0.13    3.48
20     4.39    1.10    3.99
30     8.87    2.26    3.93
40     13.29   3.38    3.93
50     18.13   4.49    4.03
60     22.64   5.70    3.97
70     27.31   6.78    4.03
80     32.24   7.99    4.03
90     37.92   9.33    4.06
100    41.93   10.42   4.03

Optical simulation ~4x faster from 1st to 3rd gen RTX (3rd gen Ada: 100M photons simulated in ~10 seconds) [TMM PMT model]


How much of the parallelized speedup is actually useful to the overall speedup?

Optical photon simulation: parallelizable fraction P ~ 99% of CPU time

Traditional simulation use:


[Figure: amdahl_p_sensitive.png]

[Figure: parallel/amdahl.png]
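Amdahl's law makes the question concrete: if a fraction P of the run time is parallelized and sped up by a factor s, the overall speedup is S = 1/((1 - P) + P/s). With P ~ 0.99, as quoted for optical photon simulation, S can never exceed 1/(1 - P) = 100, however large s becomes. A minimal C++ sketch:

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: overall speedup when fraction p of the work is sped up by factor s.
double amdahl(double p, double s) { return 1.0 / ((1.0 - p) + p / s); }

// Upper bound on the overall speedup as s -> infinity.
double amdahlLimit(double p) { return 1.0 / (1.0 - p); }
```

For example, a 100x faster optical stage (s = 100) yields only about 50x overall, because the remaining ~1% of non-optical work then takes as long as the optical part.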

Acknowledgements


Ilker Parmaksiz, NEXT-CRAB0 Prototype

New active bug-reporting Opticks user: Ilker Parmaksiz