Huge CPU Memory+Time Expense
EXPT | Reactor neutrino |
Daya Bay | neutrino oscillations |
JUNO | mass hierarchy + oscillations => NVIDIA CN Contacts |
Long baseline neutrino beam | |
DUNE | Fermilab->Sanford, LAr TPC, => Assistance from Fermilab Geant4 Group |
Neutrinoless double beta decay, dark matter, other search | |
LZ | LUX-ZEPLIN dark matter experiment, Sanford => NVIDIA US Contacts |
LEGEND | Large Enriched Germanium Experiment, Gran Sasso/SNOLAB |
SABRE | dark matter direct-detection, Australia |
AMoRE | Mo-based Rare process Experiment, S.Korea |
nEXO | next Enriched Xenon Observatory, LLNL |
NEXT-CRAB0 | High Pressure Gaseous Xenon TPC with a Direct VUV Camera Based Readout |
Neutrino telescope | |
KM3NeT | Cubic Kilometre Neutrino Telescope, Mediterranean |
IceCube | IceCube Neutrino Observatory, South Pole |
Air shower : gamma-ray and cosmic-ray observatory | |
LHAASO | Large High Altitude Air Shower Observatory, Sichuan |
Accelerator | |
LHCb-RICH | LHCb ring imaging Cherenkov sub-detector, CERN => NVIDIA EU Contacts |
Not a Photo, a Calculation
Much in common : geometry, light sources, optical physics
Many Applications of ray tracing :
ray trace performance : ~2x every ~2 years
Flexible Ray Tracing Pipeline
Green: User Programs, Grey: Fixed function/HW
Analogous to OpenGL rasterization pipeline
OptiX makes GPU ray tracing accessible
OptiX features
User provides (Green):
Latest Release : NVIDIA® OptiX™ 8.0.0 (Aug 2023) NEW:
https://bitbucket.org/simoncblyth/opticks |
Opticks API : split according to dependency -- Optical photons are GPU "resident", only hits need to be copied to CPU memory
CSGFoundry Model
Geant4 Geometry Model (JUNO: 400k PV, deep hierarchy)
PV | G4VPhysicalVolume | placed, refs LV |
LV | G4LogicalVolume | unplaced, refs SO |
SO | G4VSolid,G4BooleanSolid | binary tree of SO "nodes" |
Opticks CSGFoundry Geometry Model (index references)
struct | Notes | Geant4 Equivalent |
---|---|---|
CSGFoundry | vectors of the below, easily serialized + uploaded + used on GPU | None |
qat4 | 4x4 transform refs CSGSolid using "spare" 4th column (becomes IAS) | Transforms ref from PV |
CSGSolid | refs sequence of CSGPrim | Grouped Vols + Remainder |
CSGPrim | bbox, refs sequence of CSGNode, root of CSG Tree of nodes | root G4VSolid |
CSGNode | CSG node parameters (JUNO: ~23k CSGNode) | node G4VSolid |
NVIDIA OptiX 7/8 Geometry Acceleration Structures (JUNO: 1 IAS + 10 GAS, 2-level hierarchy)
IAS | Instance Acceleration Structures | JUNO: 1 IAS created from vector of ~50k qat4 (JUNO) |
GAS | Geometry Acceleration Structures | JUNO: 10 GAS created from 10 CSGSolid (which refs CSGPrim,CSGNode ) |
JUNO : Geant4 ~400k volumes "factorized" into 1 OptiX IAS referencing ~10 GAS
Full JUNO, Opticks, OptiX 7.5/8.0
raytrace 2M pixels | |
---|---|
TITAN RTX (1st) | 0.0118s (85 fps) |
RTX 5000 Ada (3rd) | 0.0031s (323 fps) |
Interactive ray traced visualization via OpenGL/OptiX interop
initial viewpoint, geometry exclusions via envvars
WASDQE+mouse 3D navigation
Render on NVIDIA RTX 5000 Ada Generation in 0.0060 s (not 0.0200 s)
Intersect with torus expensive on GPU
Triangulation using G4Polyhedron
G4Poly..::SetNumberOfRotationSteps
NumberOfRotationSteps | |
---|---|
HepPolyhedron Default | 24 |
Top Right | 48 |
Bottom Right | 480 |
Adjustable: precision of intersect, number of triangles
GPUs evolved for triangles => fast even with many
With list-node : shrink CSG tree
+--------------------------------------+
|              U                       |
|             / \                      |
|            /   \                     |
|           S     U[A,B,C,D,E,F,G,H]   |
|          / \                         |
|         I   J                        |
+--------------------------------------+
Problematic deep CSG tree without list-node
+------------------------------------------+
|                      U                   |
|                     / \                  |
|                    /   \                 |
|                   /     S                |
|                  U     / \               |
|                 / \   I   J              |
|                U   H                     |
|               / \                        |
|              U   G                       |
|             / \                          |
|            U   F                         |
|           / \                            |
|          U   E                           |
|         / \                              |
|        U   D                             |
|       / \                                |
|      U   C                               |
|     / \                                  |
|    A   B                                 |
+------------------------------------------+
U : Union
S : Subtraction
A-J : Tubs (cylinder) primitive
Simple G4MultiUnion is translated to Opticks list-node
TEST=medium_scan ~/opticks/cxs_min.sh
Generate optical only events with 1M->100M photons starting from CD center, gather and save only Hits.
OPTICKS_RUNNING_MODE=SRM_TORCH    ## "Torch" running enables num_photon scan
OPTICKS_NUM_PHOTON=M1,10,20,30,40,50,60,70,80,90,100
OPTICKS_NUM_EVENT=11
OPTICKS_EVENT_MODE=Hit
Compare simulation scans on two Dell Precision Workstations:
GPU (VRAM) | Arch | GPU Release | CUDA(RT) Cores | RTX Gen | Driver | CUDA | OptiX |
---|---|---|---|---|---|---|---|
NVIDIA TITAN RTX(24G) | Turing | Dec 2018 | 4,608(72) | 1st | 515.43 | 11.7 | 7.5 |
NVIDIA RTX 5000(32G) | Ada | Aug 2023 | 12,800(100) | 3rd | 550.76 | 12.4 | 8.0 |
PH(M) | G1 | G3 | G1/G3 |
---|---|---|---|
1 | 0.47 | 0.14 | 3.28 |
10 | 0.44 | 0.13 | 3.48 |
20 | 4.39 | 1.10 | 3.99 |
30 | 8.87 | 2.26 | 3.93 |
40 | 13.29 | 3.38 | 3.93 |
50 | 18.13 | 4.49 | 4.03 |
60 | 22.64 | 5.70 | 3.97 |
70 | 27.31 | 6.78 | 4.03 |
80 | 32.24 | 7.99 | 4.03 |
90 | 37.92 | 9.33 | 4.06 |
100 | 41.93 | 10.42 | 4.03 |
Optical simulation 4x faster 1st->3rd gen RTX, (3rd gen, Ada : 100M photons simulated in 10 seconds) [TMM PMT model]
Amdahl's "Law" : Expected Speedup
Overall speed limited by serial portion
optical photon simulation, P ~ 99% of CPU time
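The limit set by the serial portion can be made concrete with Amdahl's formula, S(n) = 1 / ((1 - P) + P/n). The following is a minimal Python sketch: P = 0.99 is taken from the slide, while the function names and the parallel speedup factors n are illustrative assumptions, not measurements.

```python
def amdahl_speedup(p, n):
    """Overall speedup when a fraction p of the runtime is accelerated by a factor n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p):
    """Upper bound on the overall speedup as n -> infinity."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


if __name__ == "__main__":
    P = 0.99  # optical photon fraction of CPU time, from the slide
    for n in (10, 100, 1000, 10000):  # illustrative speedup factors (assumed)
        print(f"n={n:>6}  overall speedup = {amdahl_speedup(P, n):6.1f}")
    print(f"limit for P={P}: {amdahl_limit(P):.0f}x")
```

With P ~ 99% the overall speedup cannot exceed about 100x however fast the optical part becomes, because the remaining 1% of serial work then dominates.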
Traditional simulation use:
Extra Benefits of Adopting Opticks
=> using Opticks improves CPU simulation too !!
Opticks : state-of-the-art GPU ray traced optical simulation integrated with Geant4, with automated geometry translation into GPU optimized form.
https://bitbucket.org/simoncblyth/opticks | day-to-day code repository |
https://simoncblyth.bitbucket.io | presentations and videos |
https://groups.io/g/opticks | forum/mailing list archive |
email: opticks+subscribe@groups.io | subscribe to mailing list |
simon.c.blyth@gmail.com | any questions |
Opticks History, early Chroma use, triangles...
Opticks reincarnated
Tri. : easy + fast on GPU, BUT:
Attempt to use mainly triangulated detector geometry : eventually led nowhere
Chroma : S.Siebert, A.LaTorre
Chroma tracks photons thru triangle-mesh geometry, using BVH acceleration structure, authors claim:
With a CUDA GPU Chroma has propagated 2.5M photons per second in a detector with 29k PMTs. This is 200x faster than GEANT4.
Issues:
Made efficiency fixes for MBP mobile GPU use (only CUDA capable device available to me):
https://bitbucket.org/simoncblyth/chroma/
CAVEAT : I LAST USED CHROMA IN 2015
Chroma : Disadvantages
Chroma : Fundamental Problem, triangles only
Geant4 analytic -> Triangles ? Problematic
(g4daeview.py) Chroma Raycast of Daya Bay geometry (3x3 CUDA kernel launches, 1.8s for 1.23M pixels, GeForce 750M GPU)
Split launch + use CUDA/OpenGL interop => enable mobile GPU render
(g4daeview.py) OpenGL rasterized render of triangulated geometry
(g4daeview.py) Chroma GPU photon propagation at 12 nanoseconds. The photons are generated by Geant4 simulation of a 100 GeV muon travelling from right to left. Photon colors indicate reemission (green), absorption (red), specular reflection (magenta), scattering (blue), no history (white).
OptiX raycast [50x Chroma]
MBP mobile GPU DYB raycast | |
---|---|
NVIDIA OptiX 5 | Chroma |
0.033s | 1.8s |
Why switch to NVIDIA OptiX ?
Initially used tri. with OptiX, later analytic CSG
"Opticks" started as synthesis:
Package name "Opticks", taken from world changing publication:
(GGeoView) Cerenkov photons from a 100 GeV muon travelling from right to left across Dayabay AD. Primaries are simulated by Geant4, Cerenkov "steps" of the primaries are transferred to the GPU. The dots represent OptiX calculated first intersections of GPU generated photons with colors corresponding to material boundaries: (red) GdDopedLS/Acrylic (green) LiquidScintillator/Acrylic, (blue) Acrylic/LiquidScintillator, (white) IwsWater:UnstStainlessSteel, (grey) others. The red lines represent the positions and directions of the "steps" with an arbitrary scaling for visibility.
Opticks History (2016) : Handling Huge Geometry (JUNO) with instancing
Instancing in OptiX and OpenGL avoids repetition of geometry data on GPU for repeated elements (eg PMTs). [Image is composite of OpenGL rasterized event representation and OptiX raytraced triangulated geom]
triangulated geometry : not practical for general simulation, but very useful for fast visualization
Analytic PMT (no triangles)
Near clipped, orthographic projection : gives cutaway raytrace render
NVIDIA OptiX provided no intersect (just accel. intersect)
Partition PMT at constituent joins (semi-manually)
Daya Bay Opticks Propagation : Triangulated geometry with Analytic PMT [composite OptiX raytrace geometry + OpenGL rasterized Cerenkov photons]
Torus artifacts
3D parametric ray : ray(x,y,z;t) = rayOrigin + t * rayDirection
High order equation
Best Solution : replace torus
CSG Binary Tree
Primitives combined via binary operators
Simple by construction definition, implicit geometry.
CSG expressions
3D Parametric Ray : ray(t) = r0 + t rDir
Ray Geometry Intersection
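As an illustration of intersecting the parametric ray ray(t) = r0 + t rDir with a primitive, here is a minimal Python sketch for a sphere. Substituting the ray into |p - center|^2 = radius^2 gives a quadratic in t. The function name and the t_min parameter are assumptions for illustration and are not taken from the Opticks code.

```python
import math


def ray_sphere_intersect(r0, rdir, center, radius, t_min=0.0):
    """Return the smallest t > t_min with |r0 + t*rdir - center| == radius, else None."""
    # Ray origin relative to the sphere center
    ox, oy, oz = (r0[i] - center[i] for i in range(3))
    dx, dy, dz = rdir
    # Quadratic a*t^2 + b*t + c = 0 from substituting the ray into the sphere equation
    a = dx * dx + dy * dy + dz * dz
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return None  # degenerate direction, or the ray misses the sphere
    sq = math.sqrt(disc)
    for t in ((-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)):  # near root first
        if t > t_min:
            return t
    return None


if __name__ == "__main__":
    origin, direction = (0.0, 0.0, -5.0), (0.0, 0.0, 1.0)
    print(ray_sphere_intersect(origin, direction, (0.0, 0.0, 0.0), 1.0))
    print(ray_sphere_intersect(origin, direction, (0.0, 0.0, 0.0), 1.0, t_min=4.5))
```

Raising t_min past the first hit returns the next intersect along the same ray, which is the kind of re-query a CSG combination needs when the nearest intersect of one constituent has to be skipped.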
How to pick exactly ?
Outside/Inside Unions
dot(normal,rayDir) -> Enter/Exit
Pick between pairs of nearest intersects, eg:
UNION tA < tB | Enter B | Exit B | Miss B |
---|---|---|---|
Enter A | ReturnA | LoopA | ReturnA |
Exit A | ReturnA | ReturnB | ReturnA |
Miss A | ReturnB | ReturnB | ReturnMiss |
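The UNION table above can be encoded directly as a lookup keyed on the Enter/Exit/Miss states of the two nearest intersects. This is a minimal Python sketch, not the Opticks implementation. It covers only the tA < tB case shown in the table, the type and function names are hypothetical, and the sign convention in classify (outward-pointing normals, so a negative dot product means Enter) is an assumption.

```python
from enum import Enum


class State(Enum):
    ENTER = "Enter"
    EXIT = "Exit"
    MISS = "Miss"


class Action(Enum):
    RETURN_A = "ReturnA"
    RETURN_B = "ReturnB"
    RETURN_MISS = "ReturnMiss"
    LOOP_A = "LoopA"


# Keys are (state of A, state of B); values copied from the UNION table for tA < tB.
UNION_A_CLOSER = {
    (State.ENTER, State.ENTER): Action.RETURN_A,
    (State.ENTER, State.EXIT): Action.LOOP_A,
    (State.ENTER, State.MISS): Action.RETURN_A,
    (State.EXIT, State.ENTER): Action.RETURN_A,
    (State.EXIT, State.EXIT): Action.RETURN_B,
    (State.EXIT, State.MISS): Action.RETURN_A,
    (State.MISS, State.ENTER): Action.RETURN_B,
    (State.MISS, State.EXIT): Action.RETURN_B,
    (State.MISS, State.MISS): Action.RETURN_MISS,
}


def classify(normal, ray_dir):
    """Classify a hit as Enter or Exit from the sign of dot(normal, rayDir).

    Assumes outward-pointing surface normals; a miss is handled by the caller.
    """
    d = sum(n * r for n, r in zip(normal, ray_dir))
    return State.ENTER if d < 0.0 else State.EXIT


def union_action(state_a, state_b):
    """Look up what to do for a union when the A intersect is the nearer one."""
    return UNION_A_CLOSER[(state_a, state_b)]


if __name__ == "__main__":
    a = classify((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))  # ray heads into the surface
    b = classify((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))   # ray leaves the surface
    print(a.value, b.value, union_action(a, b).value)
```

A table-driven form keeps the per-ray logic free of nested branches. LoopA is read here as "advance past the A intersect and intersect A again", which is an interpretation of the table entry rather than something the slide spells out.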
Bit Twiddling Navigation
Geant4 solid -> CSG binary tree (leaf primitives, non-leaf operators, 4x4 transforms on any node)
Serialize to complete binary tree buffer:
Height 3 complete binary tree with level order indices:
                                                     depth  elevation
                        1                              0        3
            10                      11                 1        2
      100        101          110        111           2        1
  1000  1001  1010  1011   1100  1101  1110  1111      3        0
postorder_next(i,elevation) = i & 1 ? i >> 1 : (i << elevation) + (1 << elevation) ; // from pattern of bits
Postorder tree traverse visits all nodes, starting from leftmost, such that children are visited prior to their parents.
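The postorder_next expression can be exercised on the CPU to check the traversal order. A minimal Python sketch follows, assuming 1-based level-order indices, elevation = height - depth, and that the traverse ends when the root maps to index 0. The helper postorder_sequence is hypothetical and added only for illustration.

```python
def postorder_next(i, elevation):
    """Next 1-based level-order index in a postorder traverse of a complete binary tree."""
    # Odd index: a right child, so step up to the parent.
    # Even index: a left child, so jump to the leftmost leaf under the right sibling.
    return i >> 1 if i & 1 else (i << elevation) + (1 << elevation)


def postorder_sequence(height):
    """Collect all node indices of a complete binary tree of given height in postorder."""
    i = 1 << height  # leftmost leaf, e.g. 0b1000 for height 3
    seq = []
    while i:  # the root (index 1) maps to 0, which ends the traverse
        seq.append(i)
        depth = i.bit_length() - 1  # depth of node i, the root has depth 0
        i = postorder_next(i, height - depth)
    return seq


if __name__ == "__main__":
    print([format(i, "b") for i in postorder_sequence(3)])
```

No stack or parent pointers are needed: the next node follows from the bit pattern of the current index alone, which suits a GPU thread working on a flat serialized tree buffer.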
Pure analytic CSG Daya Bay near geometry, auto-converted from Geant4 to Opticks GPU geometry, NVIDIA OptiX GPU raytrace render [no triangles]
Pure analytic CSG JUNO geometry, auto-converted from Geant4 to Opticks GPU geometry, NVIDIA OptiX GPU raytrace render [no triangles] (GGeoView)
Approximate triangulated JUNO geometry [note impingement of torus guide tube and acrylic "sphere"], OpenGL rasterized render (GGeoView)
A+B photon histories => SEvt
Opticks Event : sysrap/SEvt.hh |
---|
sevent.h sctx.h sphoton.h srec.h ... |
serialize to NumPy .npy arrays |
=> A-B comparison, matplotlib/pyvista plotting
array | shape | notes |
---|---|---|
inphoton | (n,4,4) | input photons |
photon | (n,4,4) | final photons |
record | (n,32,4,4) | photon histories |
seq | (n,2,2) | uint64 histories |
aux | (n,32,4,4) | extra point info |
sframe | (4,4,4) | target M2W W2M |
Record of every point of every photon
A and B always same photon counts (due to gensteps)
Primary Issue : double vs float, also:
After debugging : differences at the fraction of a percent level
Statistical Chi-squared comparison of photon history occurrence between two simulations
c2sum/c2n:c2per(C2CUT) 280.88/188:1.494 (30)
np.c_[siq,_quo,siq,sabo2,sc2,sabo1][0:25] ## A-B history frequency chi2 comparison
0 TO BT BT BT BT SD 33322 33343 0.0066 1 2
1 TO BT BT BT BT SA 28160 28070 0.1441 8 0
2 TO BT BT BT BT BT SR SA 6270 6268 0.0003 10363 10565
3 TO BT BT BT BT BT SA 4552 4649 1.0226 8398 8433
4 TO BT BT BT BT BT SR BR SR SA 1154 1186 0.4376 21156 21014
5 TO BT BT BT BT BT SR BR SA 923 989 2.2782 20241 20201
6 TO BT BT BT BT BR BT BT BT BT BT BT AB 946 958 0.0756 10389 8432
7 TO BT BT BT BT BT SR SR SA 901 942 0.9121 10399 10410
8 TO BT BT AB 878 895 0.1630 26 102
9 TO BT BT BT BT BT SR BT BT BT BT BT BT BT AB 615 635 0.3200 20974 22027
10 TO BT BT BT BT BR BT BT BT BT AB 571 601 0.7679 8459 9208
11 TO BT BT BT BT BR BT BT BT BT BT BT BT BT SA 533 537 0.0150 7312 7299
12 TO BT BT BT BT BR BT BT BT BT BT BT BT BT BT BT BT BT SD 503 396 12.7353 12018 11465
13 TO BT BT BT BT BR BT BT BT BT BT BT BT BT SD 480 497 0.2958 7974 7967
14 TO BT BT BT BT BR BT BT BT BT BT BT BT BT BT BT BT BT SA 412 411 0.0012 11467 11471
15 TO BT BT BT BT BT SR SR SR SA 383 396 0.2169 10362 10368
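The per-history chi2 values in the listing are consistent with (a - b)^2 / (a + b) for the A and B counts. For example 33322 and 33343 give 0.0066, and 503 and 396 give 12.7353. The following minimal Python sketch reproduces that arithmetic. How the cut of 30 is applied (here: a history contributes only when a + b exceeds the cut) is an assumption, and the function names are hypothetical.

```python
def chi2_per_history(a, b):
    """Chi2 contribution of one photon history with counts a (sim A) and b (sim B)."""
    total = a + b
    return (a - b) ** 2 / total if total > 0 else 0.0


def chi2_summary(counts, cut=30):
    """Return (c2sum, c2n, c2per) over histories whose combined count exceeds cut.

    counts is an iterable of (a, b) pairs; the cut rule is an assumed convention.
    """
    terms = [chi2_per_history(a, b) for a, b in counts if a + b > cut]
    c2sum = sum(terms)
    c2n = len(terms)
    return c2sum, c2n, (c2sum / c2n if c2n else 0.0)


if __name__ == "__main__":
    # First few (A, B) count pairs copied from the listing above
    sample = [(33322, 33343), (28160, 28070), (6270, 6268), (4552, 4649)]
    for a, b in sample:
        print(f"{a:6d} {b:6d} {chi2_per_history(a, b):8.4f}")
    print("c2sum/c2n:c2per  %.2f/%d:%.3f" % chi2_summary(sample))
```

A c2per near 1 indicates that the history frequencies of the two simulations agree within statistical fluctuations. A single history with a large contribution, such as the 12.7 entry above, points at where to look for a discrepancy.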
When causes of discrepancy cannot be identified statistically
Comparison of two independent optical simulation implementations : ideal way to find issues
Full photon step point details enable debug, here from input photons
Green : start position (100k input photons)
Red : end position, Cyan : other position
Opticks geometry translation is general, but work is often needed to:
Potential to make your simulation unlimited by optical photons
While Opticks is changing rapidly...
More active users very welcome,
New active bug reporting Opticks user : Ilker Parmaksiz
had contacts with ~5 LZ people