Huge CPU Memory+Time Expense
EXPT | Reactor neutrino |
Daya Bay | neutrino oscillations |
JUNO | mass hierarchy + oscillations => NVIDIA CN Contacts |
Long baseline neutrino beam | |
DUNE | Fermilab->Sanford, LAr TPC => Assistance from Fermilab Geant4 Group |
Neutrinoless double beta decay, dark matter, other search | |
LZ | LUX-ZEPLIN dark matter experiment, Sanford => NVIDIA US Contacts |
LEGEND | Large Enriched Germanium Experiment, Gran Sasso/SNOLAB |
SABRE | dark matter direct-detection, Australia |
AMoRE | Mo-based Rare process Experiment, S.Korea |
nEXO | next Enriched Xenon Observatory, LLNL |
NEXT-CRAB0 | High Pressure Gaseous Xenon TPC with a Direct VUV Camera Based Readout |
Neutrino telescope | |
KM3NeT | Cubic Kilometre Neutrino Telescope, Mediterranean |
IceCube | IceCube Neutrino Observatory, South Pole |
Air shower : gamma-ray and cosmic-ray observatory | |
LHAASO | Large High Altitude Air Shower Observatory, Sichuan |
Accelerator | |
LHCb-RICH | LHCb ring imaging Cherenkov sub-detector, CERN => NVIDIA EU Contacts |
Not a Photo, a Calculation
Much in common : geometry, light sources, optical physics
Many Applications of ray tracing :
ray trace performance : ~2x every ~2 years
Flexible Ray Tracing Pipeline
Green: User Programs, Grey: Fixed function/HW
Analogous to OpenGL rasterization pipeline
OptiX makes GPU ray tracing accessible
OptiX features
User provides (Green):
Latest Release : NVIDIA® OptiX™ 8.0.0 (Aug 2023) NEW:
https://bitbucket.org/simoncblyth/opticks |
Opticks API : split according to dependency -- Optical photons are GPU "resident", only hits need to be copied to CPU memory
CSGFoundry Model
Geant4 Geometry Model (JUNO: 400k PV, deep hierarchy)
PV | G4VPhysicalVolume | placed, refs LV |
LV | G4LogicalVolume | unplaced, refs SO |
SO | G4VSolid,G4BooleanSolid | binary tree of SO "nodes" |
Opticks CSGFoundry Geometry Model (index references)
struct | Notes | Geant4 Equivalent |
---|---|---|
CSGFoundry | vectors of the below, easily serialized + uploaded + used on GPU | None |
qat4 | 4x4 transform refs CSGSolid using "spare" 4th column (becomes IAS) | Transforms ref from PV |
CSGSolid | refs sequence of CSGPrim | Grouped Vols + Remainder |
CSGPrim | bbox, refs sequence of CSGNode, root of CSG Tree of nodes | root G4VSolid |
CSGNode | CSG node parameters (JUNO: ~23k CSGNode) | node G4VSolid |
NVIDIA OptiX 7/8 Geometry Acceleration Structures (JUNO: 1 IAS + 10 GAS, 2-level hierarchy)
IAS | Instance Acceleration Structures | JUNO: 1 IAS created from vector of ~50k qat4 |
GAS | Geometry Acceleration Structures | JUNO: 10 GAS created from 10 CSGSolid (which ref CSGPrim, CSGNode) |
JUNO : Geant4 ~400k volumes "factorized" into 1 OptiX IAS referencing ~10 GAS
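The index-referenced layout described above can be sketched in a few lines. This is a minimal illustration only: the class and field names (`Node`, `Prim`, `Solid`, `Instance`, `Foundry`, `add_solid`, `nodes_of_instance`) are hypothetical stand-ins, not the Opticks structs, and the solid index is kept as a separate field rather than packed into the 4th column of the transform.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:                 # stand-in for CSGNode: one primitive or operator
    typecode: str


@dataclass
class Prim:                 # stand-in for CSGPrim: contiguous run of nodes
    node_offset: int
    num_node: int


@dataclass
class Solid:                # stand-in for CSGSolid: contiguous run of prims
    prim_offset: int
    num_prim: int


@dataclass
class Instance:             # stand-in for qat4: transform plus solid index
    transform: List[float]  # 16 floats, row-major 4x4
    solid_index: int


@dataclass
class Foundry:              # flat vectors only, so easy to serialize/upload
    node: List[Node] = field(default_factory=list)
    prim: List[Prim] = field(default_factory=list)
    solid: List[Solid] = field(default_factory=list)
    inst: List[Instance] = field(default_factory=list)

    def add_solid(self, prims):
        """prims: one list of node typecodes per prim. Returns solid index."""
        prim_offset = len(self.prim)
        for codes in prims:
            self.prim.append(Prim(len(self.node), len(codes)))
            self.node.extend(Node(c) for c in codes)
        self.solid.append(Solid(prim_offset, len(prims)))
        return len(self.solid) - 1

    def nodes_of_instance(self, i):
        """Follow the index references: instance -> solid -> prims -> nodes."""
        s = self.solid[self.inst[i].solid_index]
        out = []
        for p in self.prim[s.prim_offset:s.prim_offset + s.num_prim]:
            out.extend(self.node[p.node_offset:p.node_offset + p.num_node])
        return out


IDENTITY = [1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            0.0, 0.0, 0.0, 1.0]

fd = Foundry()
pmt = fd.add_solid([["union", "sphere", "cylinder"], ["sphere"]])
for _ in range(3):          # three placements share one solid definition
    fd.inst.append(Instance(list(IDENTITY), pmt))
```

Many instances reference one solid, so the repeated geometry is stored once; this mirrors the factorization of many volumes into few GAS referenced from one IAS.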
Full JUNO, Opticks, OptiX 7.5/8.0
raytrace 2M pixels | |
---|---|
TITAN RTX (1st) | 0.0118s (85 fps) |
RTX 5000 Ada (3rd) | 0.0031s (323 fps) |
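The fps figures follow directly from the per-frame render times. A small sketch of the arithmetic, using only the two times quoted in the table (the dictionary keys are illustrative labels):

```python
# Per-frame render times in seconds, taken from the table above.
frame_time_s = {"1st gen": 0.0118, "3rd gen": 0.0031}

# Frames per second is the reciprocal of the frame time.
fps = {gen: 1.0 / t for gen, t in frame_time_s.items()}

# Render speedup between the two GPU generations.
speedup = frame_time_s["1st gen"] / frame_time_s["3rd gen"]

for gen, value in fps.items():
    print(f"{gen}: {value:.0f} fps")
print(f"speedup: {speedup:.2f}x")
```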
Interactive ray traced visualization via OpenGL/OptiX interop
initial viewpoint, geometry exclusions via envvars
WASDQE+mouse 3D navigation
Render on NVIDIA RTX 5000 Ada Generation in 0.0060 s (not 0.0200 s)
Intersect with torus expensive on GPU
Triangulation using G4Polyhedron
G4Poly..::SetNumberOfRotationSteps
NumberOfRotationSteps | |
---|---|
HepPolyhedron Default | 24 |
Top Right | 48 |
Bottom Right | 480 |
Adjustable: precision of intersect, number of triangles
GPUs evolved for triangles => fast even with many
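The precision versus triangle count trade-off can be estimated with a simple geometric sketch. It assumes that NumberOfRotationSteps is the number of straight segments used for a full turn and that the vertices lie on the true surface; under those assumptions the largest gap between a chord and the circle is r(1 - cos(pi/N)). The function name `max_chord_deviation` is hypothetical.

```python
import math


def max_chord_deviation(radius, steps):
    """Largest distance between a circle and an inscribed regular polygon
    with `steps` sides (the sagitta of one chord)."""
    return radius * (1.0 - math.cos(math.pi / steps))


# Step counts quoted in the table above, evaluated for unit radius.
for steps in (24, 48, 480):
    dev = max_chord_deviation(1.0, steps)
    print(f"{steps:4d} steps: max deviation {dev:.2e} x radius")
```

The deviation falls roughly as 1/N^2, so doubling the steps cuts the error by about a factor of four.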
With list-node : shrink CSG tree
            U
           / \
          /   \
         S     U[A,B,C,D,E,F,G,H]
        / \
       I   J
Problematic deep CSG tree without list-node
                      U
                     / \
                    /   \
                   /     S
                  U     / \
                 / \   I   J
                U   H
               / \
              U   G
             / \
            U   F
           / \
          U   E
         / \
        U   D
       / \
      U   C
     / \
    A   B

U : Union
S : Subtraction
A-J : Tubs (cylinder) primitive
Simple G4MultiUnion is translated to Opticks list-node
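A small sketch of why the list-node helps, comparing the height of the two trees drawn above. Trees are written as nested tuples and the list-node is treated as a single leaf. The slot count assumes the tree is stored as a complete binary tree, which is an assumption made for this illustration, not something stated in these slides.

```python
def height(tree):
    """Height of a binary CSG tree; leaves (strings) have height 0."""
    if isinstance(tree, str):
        return 0
    _op, left, right = tree
    return 1 + max(height(left), height(right))


def left_deep_union(leaves):
    """Chain of binary unions: (((A U B) U C) U D) ..."""
    tree = leaves[0]
    for leaf in leaves[1:]:
        tree = ("U", tree, leaf)
    return tree


def complete_tree_slots(h):
    """Node slots in a complete binary tree of height h (assumed storage)."""
    return 2 ** (h + 1) - 1


tubs = list("ABCDEFGH")
subtraction = ("S", "I", "J")

deep = ("U", left_deep_union(tubs), subtraction)       # without list-node
flat = ("U", subtraction, "U[A,B,C,D,E,F,G,H]")        # list-node as one leaf

for name, tree in (("deep", deep), ("list-node", flat)):
    h = height(tree)
    print(f"{name}: height {h}, complete-tree slots {complete_tree_slots(h)}")
```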
TEST=medium_scan ~/opticks/cxs_min.sh
Generate optical only events with 1M->100M photons starting from CD center, gather and save only Hits.
OPTICKS_RUNNING_MODE=SRM_TORCH ## "Torch" running enables num_photon scan
OPTICKS_NUM_PHOTON=M1,10,20,30,40,50,60,70,80,90,100
OPTICKS_NUM_EVENT=11
OPTICKS_EVENT_MODE=Hit
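One way to read the photon-count spec, as a hedged sketch: it assumes the leading `M` scales every entry to millions, which is inferred from the "1M->100M" description and the event count of 11. `parse_num_photon` is a hypothetical helper, not the Opticks parser.

```python
def parse_num_photon(spec):
    """Expand a spec like 'M1,10,20' into photon counts per event.

    Assumption: a leading 'M' means all comma separated values are in
    millions; without it the values are taken as plain counts.
    """
    scale = 1
    if spec.startswith("M"):
        scale = 1_000_000
        spec = spec[1:]
    return [int(token) * scale for token in spec.split(",")]


counts = parse_num_photon("M1,10,20,30,40,50,60,70,80,90,100")
print(len(counts), "events,", counts[0], "to", counts[-1], "photons")
```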
Compare simulation scans on two Dell Precision Workstations:
GPU (VRAM) | Arch | GPU Release | CUDA(RT) Cores | RTX Gen | Driver | CUDA | OptiX |
---|---|---|---|---|---|---|---|
NVIDIA TITAN RTX(24G) | Turing | Dec 2018 | 4,608(72) | 1st | 515.43 | 11.7 | 7.5 |
NVIDIA RTX 5000(32G) | Ada | Aug 2023 | 12,800(100) | 3rd | 550.76 | 12.4 | 8.0 |
PH(M) | G1 | G3 | G1/G3 |
---|---|---|---|
1 | 0.47 | 0.14 | 3.28 |
10 | 0.44 | 0.13 | 3.48 |
20 | 4.39 | 1.10 | 3.99 |
30 | 8.87 | 2.26 | 3.93 |
40 | 13.29 | 3.38 | 3.93 |
50 | 18.13 | 4.49 | 4.03 |
60 | 22.64 | 5.70 | 3.97 |
70 | 27.31 | 6.78 | 4.03 |
80 | 32.24 | 7.99 | 4.03 |
90 | 37.92 | 9.33 | 4.06 |
100 | 41.93 | 10.42 | 4.03 |
Optical simulation 4x faster from 1st to 3rd gen RTX (3rd gen, Ada : 100M photons simulated in 10 seconds) [TMM PMT model]
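The headline numbers can be cross-checked from the scan table. This sketch assumes the G1 and G3 columns are times in seconds, consistent with the "100M photons in 10 seconds" statement. Ratios recomputed from the rounded table values can differ slightly from the listed G1/G3 column.

```python
# (photons in millions, G1 time, G3 time), copied from the table above.
scan = [
    (1, 0.47, 0.14), (10, 0.44, 0.13), (20, 4.39, 1.10), (30, 8.87, 2.26),
    (40, 13.29, 3.38), (50, 18.13, 4.49), (60, 22.64, 5.70),
    (70, 27.31, 6.78), (80, 32.24, 7.99), (90, 37.92, 9.33),
    (100, 41.93, 10.42),
]

# Speedup of 3rd gen over 1st gen for each row.
ratios = [g1 / g3 for _ph, g1, g3 in scan]

# Average over the larger launches, where the ratio has settled.
large = [r for (ph, _g1, _g3), r in zip(scan, ratios) if ph >= 20]
mean_large = sum(large) / len(large)

# Throughput on the 3rd gen GPU for the largest launch.
ph_m, _g1, g3 = scan[-1]
photons_per_second = ph_m * 1e6 / g3

print(f"mean G1/G3 for >= 20M photons: {mean_large:.2f}")
print(f"3rd gen throughput at {ph_m}M: {photons_per_second / 1e6:.1f}M photons/s")
```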
Amdahl's "Law" : Expected Speedup
Overall speed limited by serial portion
optical photon simulation, P ~ 99% of CPU time
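Amdahl's law as a short sketch: with P the fraction of time that can be accelerated and n the speedup of that fraction, the overall speedup is 1 / ((1 - P) + P / n). The function name `amdahl_speedup` is illustrative; P = 0.99 is the value quoted above.

```python
def amdahl_speedup(p, n):
    """Overall speedup when a fraction p of the time is accelerated n-fold."""
    return 1.0 / ((1.0 - p) + p / n)


P = 0.99  # optical photon simulation share of CPU time

for n in (10, 100, 1000, 10000):
    print(f"parallel part {n:>5}x faster -> overall {amdahl_speedup(P, n):6.1f}x")

# The serial portion caps the gain: as n grows the limit is 1 / (1 - P).
print(f"upper limit: {1.0 / (1.0 - P):.0f}x")
```

Even with an arbitrarily fast GPU the overall gain cannot exceed 1 / (1 - P), so the remaining serial portion becomes the limit.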
Traditional simulation use:
Extra Benefits of Adopting Opticks
=> using Opticks improves CPU simulation too !!
Opticks : state-of-the-art GPU ray traced optical simulation integrated with Geant4, with automated geometry translation into GPU optimized form.
https://bitbucket.org/simoncblyth/opticks | day-to-day code repository |
https://simoncblyth.bitbucket.io | presentations and videos |
https://groups.io/g/opticks | forum/mailing list archive |
email: opticks+subscribe@groups.io | subscribe to mailing list |
simon.c.blyth@gmail.com | any questions |
New active bug reporting Opticks user : Ilker Parmaksiz