Opticks Experience of GPU Optical Photon Simulation with NVIDIA OptiX

Opticks GPU Optical Simulation with NVIDIA® OptiX™ - Development Experience : Problems and Successes

Open source, https://bitbucket.org/simoncblyth/opticks

Simon C Blyth, IHEP, CAS — HSF Simulation Working Group Meeting, 27 May 2020


Outline

[Figure: Newton's Opticks]


JUNO Optical Photon Simulation Problem...

CPU vs GPU architectures, Latency vs Throughput

[Figure: CPU vs GPU architecture (NVIDIA)]

Waiting for memory reads/writes is a major source of latency.

CPU : latency-oriented : Minimize time to complete a single task : avoid latency with caching
  • complex : caching system, branch prediction, speculative execution, ...
GPU : throughput-oriented : Maximize total work per unit time : hide latency with parallelism
  • many simple processing cores, hardware multithreading, SIMD (single instruction multiple data)
  • simpler : lots of compute (ALU), at expense of cache+control
  • design assumes abundant parallelism
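Why the GPU design needs abundant parallelism can be made quantitative with Little's law from queueing theory: the number of independent requests that must be in flight to keep a pipeline busy equals latency multiplied by throughput. The numbers below are illustrative assumptions, not measurements of any particular CPU or GPU.

```latex
\[
N_{\text{in flight}} = L \times T ,
\qquad
L = 500\ \text{cycles},\quad T = 1\ \text{request/cycle}
\;\Rightarrow\;
N_{\text{in flight}} = 500 .
\]
```

A CPU shrinks L with caches; a GPU accepts a large L and instead supplies a large N from many hardware threads.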

Effective use of a totally different processor architecture -> total reorganization of data and computation

Understanding Throughput-oriented Architectures https://cacm.acm.org/magazines/2010/11/100622-understanding-throughput-oriented-architectures/fulltext


Understanding GPU Graphical Origins -> Effective GPU Computation

GPUs evolved to rasterize 3D graphics at 30/60 fps

Simple Array Data Structures (N-million,4)

Constant "Uniform" 4x4 matrices : scaling+rotation+translation
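A minimal host-side sketch of this pattern: one constant 4x4 matrix applied to every row of an (N,4) array, with w = 1 for positions (which get translated) and w = 0 for directions (which do not). The names Mat4, Vec4, scale_translate and transform_all are hypothetical; on a GPU the outer loop would become one thread per row.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types: a row-major 4x4 "uniform" matrix and one (x,y,z,w) row
// of an (N,4) array. w = 1 marks a position, w = 0 a direction.
using Mat4 = std::array<float, 16>;
using Vec4 = std::array<float, 4>;

// Uniform scale s followed by translation (tx,ty,tz); a rotation would occupy
// the same upper-left 3x3 block as the scale.
Mat4 scale_translate(float s, float tx, float ty, float tz) {
    return {s,   0.f, 0.f, tx,
            0.f, s,   0.f, ty,
            0.f, 0.f, s,   tz,
            0.f, 0.f, 0.f, 1.f};
}

// Apply the same matrix to every row independently. On a GPU the loop over i
// becomes one thread per row, and m is the constant shared by all threads.
std::vector<Vec4> transform_all(const Mat4& m, const std::vector<Vec4>& rows) {
    std::vector<Vec4> out(rows.size());
    for (std::size_t i = 0; i < rows.size(); ++i) {
        for (int r = 0; r < 4; ++r) {
            float acc = 0.f;
            for (int c = 0; c < 4; ++c) acc += m[4 * r + c] * rows[i][c];
            out[i][r] = acc;
        }
    }
    return out;
}
```

Scaling by 2 and translating by (10,0,0) maps the position (1,2,3,1) to (12,4,6,1), while the direction (1,0,0,0) becomes (2,0,0,0) and ignores the translation.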

Graphical Experience Informs Fast Computation on GPUs


Optical Photon Simulation ≈ Ray Traced Image Rendering

Much in common : geometry, light sources, optical physics


Many Applications of ray tracing :


Ray-tracing vs Rasterization

[Figures: NVIDIA diagrams of rasterization and ray tracing, from "TURING BUILT FOR RTX"]

Spatial Index Acceleration Structure
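The basic query an acceleration structure answers is whether a ray overlaps a node's axis-aligned bounding box, so that whole subtrees of geometry can be skipped. Below is a sketch of the standard "slab" test with hypothetical Ray/AABB types; OptiX performs this traversal internally (in RT cores on RTX GPUs), so user code does not normally write it.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>

// Hypothetical minimal layouts: ray origin/direction and box corners.
struct Ray  { float o[3]; float d[3]; };
struct AABB { float lo[3]; float hi[3]; };

// Slab test: clip the parameter interval [tmin, tmax] against the pair of
// planes bounding the box on each axis. The ray hits the box if the interval
// is still non-empty after all three axes. A BVH traversal runs this per node.
bool hit_aabb(const Ray& r, const AABB& b, float tmin, float tmax) {
    for (int a = 0; a < 3; ++a) {
        float inv = 1.0f / r.d[a];            // IEEE +-inf when d[a] == 0
        float t0 = (b.lo[a] - r.o[a]) * inv;
        float t1 = (b.hi[a] - r.o[a]) * inv;
        if (inv < 0.0f) std::swap(t0, t1);    // ray travels toward -axis
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
        if (tmax < tmin) return false;        // intervals no longer overlap
    }
    return true;
}
```

A ray starting at z = -5 and heading along +z enters the unit box at t = 4; the same ray shifted to x = 3, or started at z = +5, misses it.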

NVIDIA® OptiX™ Ray Tracing Engine -- http://developer.nvidia.com/optix

OptiX makes GPU ray tracing accessible

NVIDIA expertise:

Opticks provides (Yellow):

[1] Turing RTX GPUs


[Figure: Geant4 + Opticks workflow]


Opticks : Translates G4 Optical Physics to CUDA/OptiX

OptiX : single-ray programming model -> line-by-line translation

CUDA Ports of Geant4 classes
  • G4Cerenkov (only generation loop)
  • G4Scintillation (only generation loop)
  • G4OpAbsorption
  • G4OpRayleigh
  • G4OpBoundaryProcess (only a few surface types)
Modify Cherenkov + Scintillation Processes
  • collect gensteps (generation step parameters), copy to GPU for photon generation
  • avoids copying millions of photons to GPU
Scintillator Reemission
  • a fraction of bulk-absorbed photons are "reborn" (reemitted) within the same thread
  • wavelength generated by reemission texture lookup
Opticks (OptiX/Thrust GPU interoperation)
  • OptiX : upload gensteps
  • Thrust : seeding, distribute genstep indices to photons
  • OptiX : launch photon generation and propagation
  • Thrust : pullback photons that hit PMTs
  • Thrust : index photon step sequences (optional)
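The "distribute genstep indices to photons" step can be illustrated with a scan followed by a search: an inclusive prefix sum of the per-genstep photon counts gives each genstep's end offset, and an upper-bound search maps every photon index back to the genstep that generates it. This is a hypothetical host-side sketch of one way to do it, not the Opticks implementation; Thrust provides device versions of both primitives (thrust::inclusive_scan and the vectorized thrust::upper_bound).

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: returns, for each photon, the index of its genstep.
// num_photons[g] is the photon count of genstep g.
std::vector<int> photon_to_genstep(const std::vector<int>& num_photons) {
    for (int n : num_photons) assert(n >= 0);

    // ends[g] = one past the last photon index of genstep g (inclusive scan)
    std::vector<int> ends(num_photons.size());
    std::partial_sum(num_photons.begin(), num_photons.end(), ends.begin());
    const int total = ends.empty() ? 0 : ends.back();

    std::vector<int> genstep_index(total);
    for (int p = 0; p < total; ++p) {
        // the first genstep whose end lies beyond p owns photon p;
        // gensteps with zero photons are skipped automatically
        auto it = std::upper_bound(ends.begin(), ends.end(), p);
        genstep_index[p] = static_cast<int>(it - ends.begin());
    }
    return genstep_index;
}
```

For counts {3, 0, 2, 1} the result is {0, 0, 0, 2, 2, 3}: genstep 1 produces no photons and never appears. Each GPU thread can then read its genstep parameters with a single indexed load.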

G4Solid -> CUDA Intersect Functions for ~10 Primitives

[Figure: tboolean parade of CSG primitives, Sep 2017]

Sphere, Cylinder, Disc, Cone, Convex Polyhedron, Hyperboloid, Torus, ...
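Each primitive needs a function returning the nearest intersect distance along the ray. Below is a sketch for a sphere centered at the origin, assuming a normalized direction; the name and signature are hypothetical, and a production version would also report the surface normal and handle float cancellation in the quadratic more carefully.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: nearest t > tmin where the ray o + t*d (|d| = 1)
// meets a sphere of radius r centered at the origin.
bool intersect_sphere(const float o[3], const float d[3], float r,
                      float tmin, float& t_hit) {
    assert(r > 0.f);
    // |o + t d|^2 = r^2 with |d| = 1  ->  t^2 + 2 b t + c = 0
    float b = o[0] * d[0] + o[1] * d[1] + o[2] * d[2];
    float c = o[0] * o[0] + o[1] * o[1] + o[2] * o[2] - r * r;
    float disc = b * b - c;
    if (disc < 0.f) return false;             // ray misses the sphere
    float sdisc = std::sqrt(disc);
    float t_near = -b - sdisc;                // entering root
    float t_far  = -b + sdisc;                // exiting root
    if (t_near > tmin) { t_hit = t_near; return true; }
    if (t_far  > tmin) { t_hit = t_far;  return true; }  // origin inside
    return false;
}
```

From z = -5 along +z a unit sphere is entered at t = 4; from the center the exit at t = 1 is returned instead, which is the distinction CSG needs between entering and exiting intersects.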


G4Boolean -> CUDA/OptiX Intersection Program Implementing CSG

Complete Binary Tree, pick between pairs of nearest intersects:

UNION (tA < tB)   Enter B    Exit B     Miss B
Enter A           ReturnA    LoopA      ReturnA
Exit A            ReturnA    ReturnB    ReturnA
Miss A            ReturnB    ReturnB    ReturnMiss
[1] Ray Tracing CSG Objects Using Single Hit Intersections, Andrew Kensler (2006)
with corrections by the author of the XRT Raytracer http://xrt.wikidot.com/doc:csg
[2] https://bitbucket.org/simoncblyth/opticks/src/master/optixrap/cu/csg_intersect_boolean.h
Similar to binary expression tree evaluation using a postorder traversal.
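The UNION decision table can be encoded directly as a lookup. Because union is symmetric, the table for tA < tB also covers tB < tA once the roles of A and B are swapped, so this sketch indexes it by "near" and "far" intersect. The enum and function names are hypothetical, and misses are assumed to carry t = +infinity so that they always sort as the far intersect. Only the per-pair decision is shown; re-intersecting a looped constituent and the tree traversal around it are omitted.

```cpp
#include <cassert>
#include <limits>

enum class State  { Enter, Exit, Miss };
enum class Action { ReturnA, ReturnB, LoopA, LoopB, ReturnMiss };

// Hypothetical encoding of the UNION table relative to the nearer intersect.
enum class Rel { ReturnNear, ReturnFar, LoopNear, ReturnMiss };

static const Rel kUnion[3][3] = {
    //                far: Enter            Exit            Miss
    /* near Enter */ {Rel::ReturnNear, Rel::LoopNear,  Rel::ReturnNear},
    /* near Exit  */ {Rel::ReturnNear, Rel::ReturnFar, Rel::ReturnNear},
    /* near Miss  */ {Rel::ReturnFar,  Rel::ReturnFar, Rel::ReturnMiss},
};

// Decide what to do with the current pair of intersects on A and B.
// Misses are expected to carry t = +infinity.
Action union_action(State a, float tA, State b, float tB) {
    const bool a_near = tA <= tB;
    const State near_s = a_near ? a : b;
    const State far_s  = a_near ? b : a;
    switch (kUnion[static_cast<int>(near_s)][static_cast<int>(far_s)]) {
        case Rel::ReturnNear: return a_near ? Action::ReturnA : Action::ReturnB;
        case Rel::ReturnFar:  return a_near ? Action::ReturnB : Action::ReturnA;
        case Rel::LoopNear:   return a_near ? Action::LoopA   : Action::LoopB;
        default:              return Action::ReturnMiss;
    }
}
```

For example, entering A at t = 1 while exiting B at t = 2 gives LoopA: the ray is already inside B, so A's surface there is interior to the union and must be skipped.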

CSG Complete Binary Tree Serialization -> simplifies GPU side

Geant4 solid -> CSG binary tree (leaf primitives, non-leaf operators, 4x4 transforms on any node)

Serialize to complete binary tree buffer:

Height 3 complete binary tree with 1-based level-order indices, written in binary:

                                                   depth     elevation

                     1                               0           3

          10                   11                    1           2

     100       101        110        111             2           1

 1000 1001  1010 1011  1100 1101  1110  1111         3           0

postorder_next(i,elevation) = i & 1 ? i >> 1 : (i << elevation) + (1 << elevation) ; // from pattern of bits

A postorder traversal visits all nodes, starting from the leftmost leaf, such that children are visited before their parents.
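The postorder_next formula lets the GPU walk the tree with neither a stack nor recursion. Below is a sketch that traverses a complete binary tree with it while tracking elevation: stepping up to the parent increments elevation, and jumping from a left child to the leftmost leaf under its right sibling resets elevation to 0. The function name postorder is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: postorder sequence of 1-based level-order indices
// (root = 1, children of i are 2i and 2i+1) using the bit-pattern step.
std::vector<unsigned> postorder(unsigned height) {
    assert(height <= 30u);          // keep shifted indices within 32 bits
    std::vector<unsigned> order;
    unsigned i = 1u << height;      // leftmost leaf
    unsigned elevation = 0;         // height - depth of node i
    while (i != 0) {                // next of the root is 1 >> 1 == 0: done
        order.push_back(i);
        if (i & 1u) {               // right child (or root): up to parent
            i >>= 1;
            ++elevation;
        } else {                    // left child: leftmost leaf of right sibling
            i = (i << elevation) + (1u << elevation);
            elevation = 0;
        }
    }
    return order;
}
```

For height 3 this yields 8, 9, 4, 10, 11, 5, 2, 12, 13, 6, 14, 15, 7, 3, 1, i.e. 1000, 1001, 100, 1010, ... in the binary labels of the tree diagram, ending at the root.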


Opticks : Translates G4 Geometry to GPU, Without Approximation

G4 Structure Tree -> Instance+Global Arrays -> OptiX

Group structure into repeated instances + global remainder:

instancing -> huge memory savings for JUNO PMTs
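One way to picture the split into repeated instances and a global remainder: label each structure-tree node with a digest of its subtree, count the digests, and promote any digest repeated often enough to an instance, keeping one copy of its geometry plus one transform per placement. This is a hypothetical simplification (the Node/Split types, the digest strings and the min_repeat threshold are all assumptions) that ignores nesting, i.e. that the daughters of a repeated subtree are themselves repeated and should not be instanced separately.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Node {
    std::string digest;   // hypothetical: hash of the subtree's solids/materials
    int transform_index;  // index of this placement's 4x4 global transform
};

struct Split {
    std::map<std::string, std::vector<int>> instances;  // digest -> transforms
    std::vector<int> global;                             // remainder node indices
};

// Hypothetical sketch: digests seen at least min_repeat times become instances.
Split split_instances(const std::vector<Node>& nodes, std::size_t min_repeat) {
    assert(min_repeat > 0);
    std::map<std::string, std::size_t> counts;
    for (const Node& n : nodes) ++counts[n.digest];

    Split s;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        const Node& n = nodes[i];
        if (counts[n.digest] >= min_repeat)
            s.instances[n.digest].push_back(n.transform_index);  // one transform per copy
        else
            s.global.push_back(static_cast<int>(i));
    }
    return s;
}
```

The memory saving comes from storing a repeated PMT's geometry once: each further copy costs only a 4x4 transform.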

[Figure: j1808_top_rtx, RTX ray traced render]

[Figure: j1808_top_ogl, OpenGL render]


Validation of Opticks Simulation by Comparison with Geant4

Bi-simulations of all JUNO solids, with millions of photons

Misaligned histories : mostly < 0.25%, < 0.50% for the largest solids
Deviant photons within matched histories : < 0.05% (500/1M)

Primary sources of problems

Primary cause : float vs double

Geant4 uses double everywhere, Opticks only sparingly (double precision observed to cost a 10x slowdown with RTX)
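The float vs double sensitivity comes from resolution at detector scale. This sketch measures one unit in the last place (ULP) with std::nextafter at a hypothetical coordinate of 20,000 mm (20 m): float resolves only about 2 micrometers there, double about 4e-12 mm, so a photon very close to a surface can end up on different sides in the two simulations.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// One unit in the last place (ULP): the gap from x to the next
// representable value above it, in float and in double.
float ulp(float x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

double ulp(double x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}
```

At 20,000 mm the float ULP is 2^-9 mm, so a float position stepped by 0.0009 mm (below half a ULP) does not move at all, whereas the same step in double is retained.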

Conclude


[Figure: scan-pf-check-GUI-TO-SC-BT5-SD]

[Figure: scan-pf-check-GUI-TO-BT5-SD]


Performance : Scanning from 1M to 400M Photons

Full JUNO Analytic Geometry j1808v5

Production Mode : does the minimum

Multi-Event Running, Measure:

interval : avg time between successive launches, including overheads (upload gensteps + launch + download hits)
launch : avg of 10 OptiX launches

NVIDIA Quadro RTX 8000 (48 GB)

Thanks to NVIDIA China for loaning the card

[Figure: scan-pf-1_NHit]

[Figure: scan-pf-1_Opticks_vs_Geant4]

JUNO analytic, 400M photons from center    Time                Speedup
Geant4 Extrap.                             95,600 s (26 hrs)
Opticks RTX ON (i)                         58 s                1650x

[Figure: scan-pf-1_Opticks_Speedup]

JUNO analytic, 400M photons from center    Time                Speedup
Opticks RTX ON (i)                         58 s                1650x
Opticks RTX OFF (i)                        275 s               350x
Geant4 Extrap.                             95,600 s (26 hrs)
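As a cross-check, each speedup is the ratio of the extrapolated Geant4 time to the Opticks time, and the RTX gain is the ratio of the two Opticks times:

```latex
\[
\frac{95\,600\ \text{s}}{58\ \text{s}} \approx 1648 \approx 1650\times ,
\qquad
\frac{95\,600\ \text{s}}{275\ \text{s}} \approx 348 \approx 350\times ,
\qquad
\frac{275\ \text{s}}{58\ \text{s}} \approx 4.7 \approx 5\times .
\]
```

The last ratio is the roughly 5x RTX speedup for the analytic geometry; 95,600 s is about 26.6 hours.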

[Figure: scan-pf-1_RTX_Speedup]

5x Speedup from RTX with JUNO analytic geometry

Opticks Experience : Main Operational Problem : Manpower

Lots of interest, very little contribution, why?

Tool Innovation is Disincentivized?

Why is GPU simulation development difficult?


Opticks Experience : Main Technical Problem : Geometry Translation

Intersection Performance -> Simulation Performance, Drivers:

Analytic Geometry : translate volume -> surface-based model

Coincident faces (even in CSG boolean constituents)

Analytic Torus Intersection


Opticks Experience : Problems with using NVIDIA OptiX

Optimization Issues

Linux GPU Cluster (e.g. Tesla V100) Deployment Issues

[1] NVIDIA RTX Server with 8x NVIDIA Quadro RTX 8000 : probably restricted to car, design, film companies ...
[2] NVIDIA Quadro RTX 8000 PCIe Server Card (Passive)


Opticks Experience : Benefits from using NVIDIA OptiX

NVIDIA OptiX 3,4,5,6

NVIDIA OptiX 6

1 or 2 Releases per Year


Summary


Opticks : state-of-the-art GPU ray tracing applied to optical photon simulation and integrated with Geant4, giving a leap in performance that eliminates memory and time bottlenecks.

  • Drastic speedup -> better detector understanding -> greater precision
    • any simulation limited by optical photons can benefit
    • more photon limited -> more overall speedup (99% -> 100x)
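The "99% -> 100x" figure follows Amdahl's law: if a fraction f of the CPU time is optical photon simulation and only that part is accelerated by a factor s, the overall speedup S is bounded by 1/(1-f). Plugging in f = 0.99 and, for illustration, the 1650x measured in the JUNO scan:

```latex
\[
S = \frac{1}{(1-f) + f/s},
\qquad
f = 0.99,\ s = 1650:\quad S = \frac{1}{0.01 + 0.0006} \approx 94,
\qquad
\lim_{s \to \infty} S = \frac{1}{0.01} = 100 .
\]
```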
https://bitbucket.org/simoncblyth/opticks code repository
https://simoncblyth.bitbucket.io presentations and videos
https://groups.io/g/opticks forum/mailing list archive
email:opticks+subscribe@groups.io subscribe to mailing list