Huge CPU Memory+Time Expense
| Experiment | Description |
|---|---|
| Reactor neutrino | |
| Daya Bay | neutrino oscillations |
| JUNO | mass hierarchy + oscillations => NVIDIA CN Contacts |
| Long baseline neutrino beam | |
| DUNE | Fermilab->Sanford, LAr TPC => Assistance from Fermilab Geant4 Group |
| Neutrinoless double beta decay, dark matter, other search | |
| LZ | LUX-ZEPLIN dark matter experiment, Sanford => NVIDIA US Contacts |
| LEGEND | Large Enriched Germanium Experiment, Gran Sasso/SNOLAB |
| SABRE | dark matter direct-detection, Australia |
| AMoRE | Mo-based Rare process Experiment, S.Korea |
| nEXO | next Enriched Xenon Observatory, LLNL |
| NEXT-CRAB0 | High Pressure Gaseous Xenon TPC with a Direct VUV Camera Based Readout |
| Neutrino telescope | |
| KM3NeT | Cubic Kilometre Neutrino Telescope, Mediterranean |
| IceCube | IceCube Neutrino Observatory, South Pole |
| Air shower : gamma-ray and cosmic-ray observatory | |
| LHAASO | Large High Altitude Air Shower Observatory, Sichuan |
| Accelerator | |
| LHCb-RICH | LHCb ring imaging Cherenkov sub-detector, CERN => NVIDIA EU Contacts |
Not a Photo, a Calculation
Much in common : geometry, light sources, optical physics
Many Applications of ray tracing :
Flexible Ray Tracing Pipeline
Green: User Programs, Grey: Fixed function/HW
Analogous to OpenGL rasterization pipeline
OptiX makes GPU ray tracing accessible
OptiX features
User provides (Green):
Latest Release : NVIDIA® OptiX™ 8.0.0 (Aug 2023) NEW:
| https://bitbucket.org/simoncblyth/opticks |
Opticks API : split according to dependency -- Optical photons are GPU "resident", only hits need to be copied to CPU memory
CSGFoundry Model
Geant4 Geometry Model (JUNO: 400k PV, deep hierarchy)
| PV | G4VPhysicalVolume | placed, refs LV |
| LV | G4LogicalVolume | unplaced, refs SO |
| SO | G4VSolid,G4BooleanSolid | binary tree of SO "nodes" |
Opticks CSGFoundry Geometry Model (index references)
| struct | Notes | Geant4 Equivalent |
|---|---|---|
| CSGFoundry | vectors of the below, easily serialized + uploaded + used on GPU | None |
| qat4 | 4x4 transform refs CSGSolid using "spare" 4th column (becomes IAS) | Transforms ref from PV |
| CSGSolid | refs sequence of CSGPrim | Grouped Vols + Remainder |
| CSGPrim | bbox, refs sequence of CSGNode, root of CSG Tree of nodes | root G4VSolid |
| CSGNode | CSG node parameters (JUNO: ~23k CSGNode) | node G4VSolid |
NVIDIA OptiX 7/8 Geometry Acceleration Structures (JUNO: 1 IAS + 10 GAS, 2-level hierarchy)
| IAS | Instance Acceleration Structure | JUNO: 1 IAS created from vector of ~50k qat4 |
| GAS | Geometry Acceleration Structure | JUNO: 10 GAS created from 10 CSGSolid (which ref CSGPrim, CSGNode) |
JUNO : Geant4 ~400k volumes "factorized" into 1 OptiX IAS referencing ~10 GAS
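The index-referencing idea in the tables above can be sketched in a few lines. This is a simplified illustration, not the Opticks layout: the class names `Node`, `Prim`, `Solid`, `Instance` and `Foundry` and all field names are hypothetical stand-ins for CSGNode, CSGPrim, CSGSolid, qat4 and CSGFoundry, and the node parameters are made-up numbers.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:            # stand-in for CSGNode: parameters of one CSG tree node
    params: Tuple[float, ...]


@dataclass
class Prim:            # stand-in for CSGPrim: a contiguous run of nodes
    node_offset: int
    num_node: int


@dataclass
class Solid:           # stand-in for CSGSolid: a contiguous run of prims
    prim_offset: int
    num_prim: int


@dataclass
class Instance:        # stand-in for qat4: a placement plus a solid index
    translation: Tuple[float, float, float]
    solid_index: int


@dataclass
class Foundry:         # stand-in for CSGFoundry: flat vectors, no pointers
    nodes: List[Node] = field(default_factory=list)
    prims: List[Prim] = field(default_factory=list)
    solids: List[Solid] = field(default_factory=list)
    instances: List[Instance] = field(default_factory=list)

    def add_solid(self, prim_node_params):
        """Append one solid; prim_node_params holds, per prim, a list of node parameter tuples."""
        prim_offset = len(self.prims)
        for node_params in prim_node_params:
            node_offset = len(self.nodes)
            self.nodes.extend(Node(p) for p in node_params)
            self.prims.append(Prim(node_offset, len(node_params)))
        self.solids.append(Solid(prim_offset, len(prim_node_params)))
        return len(self.solids) - 1

    def nodes_of_instance(self, i):
        """Follow the indices: instance -> solid -> prims -> nodes."""
        solid = self.solids[self.instances[i].solid_index]
        out = []
        for prim in self.prims[solid.prim_offset: solid.prim_offset + solid.num_prim]:
            out.extend(self.nodes[prim.node_offset: prim.node_offset + prim.num_node])
        return out


fd = Foundry()
# One repeated solid made of two prims (1 node + 2 nodes), illustrative values only.
pmt = fd.add_solid([[(0.0, 0.0, 0.0, 250.0)],
                    [(0.0, 0.0, -100.0, 50.0), (0.0, 0.0, -150.0, 60.0)]])
# Many placements share the one solid: only the transform is repeated.
fd.instances = [Instance((float(x), 0.0, 0.0), pmt) for x in range(1000)]
print(len(fd.instances), len(fd.solids), len(fd.prims), len(fd.nodes))
```

Because every reference is an integer offset into a flat list, the whole structure can be copied as plain arrays, which is the property the CSGFoundry row describes as "easily serialized + uploaded + used on GPU".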
Full JUNO, Opticks, OptiX 7.5/8.0
| raytrace 3.7M pixels | |
|---|---|
| 0.0118s (85 fps) | |
| 0.0031s (323 fps) | |
Cutaway ray traced render of JUNO CD
Mostly Analytic CSG
Guide Tube Torus Triangulated
WaterPool HBeam Overlaps FIXED with simpler approach
FastenerAcrylic translated to "list-node"
Testing triangulated GuideTube + XJ + SJ solids
Interactive ray traced visualization via OpenGL/OptiX interop
initial viewpoint, geometry exclusions via envvars
WASDQE+mouse 3D navigation
Render on NVIDIA RTX 5000 Ada Generation in 0.0060 s (not 0.0200 s)
Intersect with torus expensive on GPU
Triangulation using G4Polyhedron
G4Poly..::SetNumberOfRotationSteps
| NumberOfRotationSteps | |
|---|---|
| HepPolyhedron Default | 24 |
| Top Right | 48 |
| Bottom Right | 480 |
Adjustable: precision of intersect, number of triangles
GPUs evolved for triangles => fast even with many
RTX : Uses "builtin" RT Core triangle intersect
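To give a feel for the precision versus triangle count trade-off, the sketch below computes how far an inscribed regular polygon departs from the circle it approximates. It assumes the rotation steps correspond to straight segments around one full revolution; the actual G4Polyhedron tessellation of a torus may differ in detail, and the function name `max_radial_deviation` is hypothetical.

```python
import math


def max_radial_deviation(radius, steps):
    """Largest gap between a circle and an inscribed regular polygon with `steps` sides."""
    return radius * (1.0 - math.cos(math.pi / steps))


# Step counts taken from the table above, unit radius for easy scaling.
for steps in (24, 48, 480):
    print(steps, max_radial_deviation(1.0, steps))
```

The deviation shrinks roughly with the square of the step count, so doubling the steps cuts the geometric error by about a factor of four while the triangle count only doubles.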
curand XORWOW workaround
XORWOW is default curand generator
Opticks longstanding workaround:
Workaround works, BUT:
Philox4_32_10 is alternative curand generator
Advantages:
Statistical quality of Philox randoms is comparable to XORWOW [1]
[1] cuRAND generator tests https://docs.nvidia.com/cuda/curand/testing.html
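A counter-based generator such as Philox computes each value as a function of a key and a counter, so no per-photon state has to be prepared or stored ahead of time. The sketch below illustrates only that property. It is not Philox and not the curand API: SHA-256 is used as a stand-in mixing function, and the name `counter_uniform` and its arguments are hypothetical.

```python
import hashlib
import struct


def counter_uniform(seed, photon_index, draw_index):
    """Uniform float in [0, 1) determined only by (seed, photon_index, draw_index)."""
    packed = struct.pack("<QQQ", seed, photon_index, draw_index)
    block = hashlib.sha256(packed).digest()       # stand-in for the Philox rounds
    (bits,) = struct.unpack("<Q", block[:8])
    return (bits >> 11) / float(1 << 53)          # keep 53 bits, the precision of a double


# The stream of any photon can be produced directly, in any order, with no saved state.
first = [counter_uniform(42, 1_000_000_000, k) for k in range(3)]
again = [counter_uniform(42, 1_000_000_000, k) for k in range(3)]
assert first == again
```

Since nothing depends on earlier draws, the number of photons is not tied to a pre-built table of generator states.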
Opticks Unlimited
Philox counter based RNG + Out-of-core:
=> simulate billion-photon evt, no special setup
Use sliced genstep array in: QSim::simulate
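The slicing step can be pictured as grouping consecutive gensteps so that the photons of each group fit within the available photon slots. The following is a minimal sketch under that assumption, not the QSim::simulate implementation; `slice_gensteps`, `photon_counts` and `max_slot` are hypothetical names and the counts are made-up numbers.

```python
def slice_gensteps(photon_counts, max_slot):
    """Group consecutive gensteps into (start, stop) ranges whose photon totals fit max_slot."""
    slices, start, total = [], 0, 0
    for i, n in enumerate(photon_counts):
        if n > max_slot:
            raise ValueError("a single genstep exceeds max_slot")
        if total + n > max_slot:
            slices.append((start, i))     # close the current slice before genstep i
            start, total = i, 0
        total += n
    if start < len(photon_counts):
        slices.append((start, len(photon_counts)))
    return slices


counts = [100] * 10                       # ten gensteps of 100 photons each
print(slice_gensteps(counts, 250))        # prints [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
```

Each range would then be simulated in its own kernel launch, with the hits downloaded before the next slice starts.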
[NP::MakeMetaKVS_ranges2_table num_specs 8
SEvt__Init_RUN_META ==> CSGFoundry__Load_HEAD 655 ## init
CSGFoundry__Load_HEAD ==> CSGFoundry__Load_TAIL 4,235,189 ## load_geom
CSGOptiX__Create_HEAD ==> CSGOptiX__Create_TAIL 266,810 ## upload_geom
A000_QSim__simulate_HEAD ==> A000_QSim__simulate_LBEG 251 ## slice_genstep
A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST 23,137,923 ## simulate slice
A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN 3,975,867 ## download slice
A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST 23,449,227 REP 46,587,150 ## simulate slice
A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN 3,924,104 REP 7,899,971 ## download slice
A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST 23,736,442 REP 70,323,592 ## simulate slice
A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN 4,108,315 REP 12,008,286 ## download slice
A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST 23,850,920 REP 94,174,512 ## simulate slice
A000_QSim__simulate_POST ==> A000_QSim__simulate_DOWN 4,119,275 REP 16,127,561 ## download slice
A000_QSim__simulate_LEND ==> A000_QSim__simulate_PCAT 15,900,158 ## concat slices
A000_QSim__simulate_BRES ==> A000_QSim__simulate_TAIL 117,551,399 ## save arrays
TOTAL: 248,256,535
]NP::MakeMetaKVS_ranges2_table num_keys:69
Auto Config based on VRAM
[1] SEventConfig::HeuristicMaxSlot_Rounded
| Out-of-core optical simulation | |
|---|---|
| four kernel executions, total time | 94 s |
| four hit slice downloads, total time | 16 s |
| saving 216M hits (13GB .npy file) | 117 s |
| loading geometry from /cvmfs | 4 s |
| total time | 248 s |
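The table can be cross-checked against the log with a few lines of arithmetic. The sketch assumes the log durations are in microseconds, which is consistent with the 248 s total; the stage labels are taken from the log comments.

```python
# Stage durations in microseconds, copied from the log above.
stages_us = {
    "init": 655,
    "load_geom": 4_235_189,
    "upload_geom": 266_810,
    "slice_genstep": 251,
    "simulate": 23_137_923 + 23_449_227 + 23_736_442 + 23_850_920,
    "download": 3_975_867 + 3_924_104 + 4_108_315 + 4_119_275,
    "concat": 15_900_158,
    "save": 117_551_399,
}
total_us = sum(stages_us.values())

# Print the stages from slowest to fastest, with their share of the total.
for name, us in sorted(stages_us.items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} {us / 1e6:8.1f} s {100.0 * us / total_us:5.1f} %")
print(f"{'total':14s} {total_us / 1e6:8.1f} s")
```

Saving the hit arrays accounts for just under half of the total, more than the four kernel executions combined.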
TEST=medium_scan ~/opticks/cxs_min.sh
Generate optical only events with 1M->100M photons starting from CD center, gather and save only Hits.
OPTICKS_RUNNING_MODE=SRM_TORCH ## "Torch" running enables num_photon scan
OPTICKS_NUM_PHOTON=M1,10,20,30,40,50,60,70,80,90,100
OPTICKS_NUM_EVENT=11
OPTICKS_EVENT_MODE=Hit
Compare simulation scans on two Dell Precision Workstations:
| GPU (VRAM) | Arch | GPU Release | CUDA(RT) Cores | RTX Gen | Driver | CUDA | OptiX |
|---|---|---|---|---|---|---|---|
| NVIDIA TITAN RTX(24G) | Turing | Dec 2018 | 4,608(72) | 1st | 515.43 | 11.7 | 7.5 |
| NVIDIA RTX 5000(32G) | Ada | Aug 2023 | 12,800(100) | 3rd | 550.76 | 12.4 | 8.0 |
| Photons (M) | G1 time (s) | G3 time (s) | G1/G3 |
|---|---|---|---|
| 1 | 0.47 | 0.14 | 3.28 |
| 10 | 0.44 | 0.13 | 3.48 |
| 20 | 4.39 | 1.10 | 3.99 |
| 30 | 8.87 | 2.26 | 3.93 |
| 40 | 13.29 | 3.38 | 3.93 |
| 50 | 18.13 | 4.49 | 4.03 |
| 60 | 22.64 | 5.70 | 3.97 |
| 70 | 27.31 | 6.78 | 4.03 |
| 80 | 32.24 | 7.99 | 4.03 |
| 90 | 37.92 | 9.33 | 4.06 |
| 100 | 41.93 | 10.42 | 4.03 |
Optical simulation 4x faster from 1st to 3rd gen RTX (3rd gen, Ada: 100M photons simulated in 10 seconds) [TMM PMT model]
Amdahl's "Law" : Expected Speedup
Overall speed limited by serial portion
optical photon simulation, P ~ 99% of CPU time
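Amdahl's law gives the overall speedup as S(n) = 1 / ((1 - P) + P/n), where P is the fraction of runtime that is accelerated and n is the speedup of that fraction. A small sketch with P = 0.99 from the line above; the function name `amdahl_speedup` is chosen here for illustration.

```python
def amdahl_speedup(p, n):
    """Overall speedup when a fraction p of the runtime is accelerated by a factor n."""
    return 1.0 / ((1.0 - p) + p / n)


# P = 0.99: the optical photon fraction quoted above.
for n in (10, 100, 1000, 1e9):
    print(n, round(amdahl_speedup(0.99, n), 1))
```

However large n becomes, the result approaches 1/(1 - P) = 100, so once the photon propagation is fast enough the remaining 1% serial portion sets the overall limit.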
Traditional simulation use:
Extra Benefits of Adopting Opticks
=> using Opticks improves CPU simulation too !!
Opticks : state-of-the-art GPU ray traced optical simulation integrated with Geant4, with automated geometry translation into GPU optimized form.
| https://bitbucket.org/simoncblyth/opticks | day-to-day code repository |
| https://simoncblyth.bitbucket.io | presentations and videos |
| https://groups.io/g/opticks | forum/mailing list archive |
| email: opticks+subscribe@groups.io | subscribe to mailing list |
| simon.c.blyth@gmail.com | any questions |
New active bug reporting (+leak finding/fixing) Opticks user : Ilker Parmaksiz
Performance of an Optical TPC Geant4 Simulation with Opticks GPU-Accelerated Photon Propagation
NEXT Collaboration, I.Parmaksiz, Feb 18, 2025, https://doi.org/10.48550/arXiv.2502.13215
had contacts with ~5 LZ people
| GPU vendor | compute framework | ray trace framework | hardware RT | notes |
|---|---|---|---|---|
| NVIDIA | CUDA(2007-) | OptiX(2009-) | RTX/RT Core (2018-) | |
| Apple | Metal/MPS(2014-) | Metal/MPS(2020-) | From M3 (2023-) | |
| AMD | ROCm(2016-) | RadeonRays, HIP-RT(2022-) | From Radeon RX 6000 (2020-) | |
| Intel | oneAPI(?2020-) | Embree? | From Arc Alchemist (2022-) | uses SYCL |
| Huawei | ? | mobile only | mobile only | |
| Cross-vendor | Vulkan compute shaders | Vulkan ray trace extension | NVIDIA/AMD/Intel/? | Depends on vendor drivers |
| OpenCL | dead? | | | |
| OpenMP | new support for GPU offloading | | | |
Other GPU vendors such as Samsung and Qualcomm are mostly focussed on mobile.