Optical Photon Simulation with NVIDIA OptiX

http://simoncblyth.bitbucket.io/env/presentation/optical_photon_simulation_with_nvidia_optix.html (July 2015) http://simoncblyth.bitbucket.io/env/presentation/gpu_accelerated_geant4_simulation.html (Jan 2015)

Simon C Blyth, National Taiwan University
July 2015

Why not Chroma ?

Chroma Features

My additions to Chroma

Missing Features


https://bitbucket.org/simoncblyth/chroma (my fork)

Introducing NVIDIA OptiX Ray Tracing Engine [C++/C/CUDA]

OptiX provides: CUDA compiler optimized for Ray Tracing

NVIDIA expertise on efficient GPU/multi-GPU usage



Parallels between Realistic Image Synthesis and Optical Simulation

Realistic image creation uses physically based techniques and material definitions. Obvious parallels:

Same rate determining step: geometry intersection

Applying techniques/hardware developed for fast ray tracing can be hugely beneficial to optical photon simulation.

Chroma Raycast with entire geometry in view

Render Split into 3x3 CUDA kernel launches, 1 thread per pixel, ~1.8s for 1.23M pixels, 2.4M tris (with [1])

[1]MacBook Pro (2013), NVIDIA GeForce GT 750M 2048 MB (384 cores); Workstation GPU performance expected to scale by core count

OptiX raycast performance

DBNS geometry raycast comparison using mobile GPU

Performance improvement ~50x

OptiX Performance Scaling with GPU cores

OptiX sample rendering with 2 GPU IHEP workstation,

Performance linear with GPU cores, compared to laptop:

Future scaling possibilities, with VCA

OptiX apps can connect to remote Visual Computing Appliances

Clusters of ~10 VCAs are in use by design/advertising companies for interactive product rendering.

http://www.nvidia.com/object/visual-computing-appliance.html (8 Maxwell GPUs)


OptiX Programming Model

OptiX provides a ray tracing pipeline analogous to OpenGL rasterization pipeline.

Higher level API than pure CUDA, eg:

Optical Photon Simulation port currently using:

Ray Generation
entry/exit, Cerenkov/Scintillation generation
Triangle mesh intersection, boundary index lookup
Closest Hit
determine ray to boundary orientation


OptiX Adoption Costs

Adoption of OptiX is compelling

Costs of adoption

New C++ Packages Replacing Chroma

Basis packages

Geometry packages

GPU library interface packages

Main package


Selection of GPU development details

Some details of GPU developments described over the next pages

Optical Physics

Supplying the OptiX Programs

Handling Outputs

Porting Optical Physics from Geant4/Chroma into OptiX

Photon p ; State s ; PerRayData prd ;
while(bounce < bounce_max)  // PSEUDO-CODE

  ray = optix::make_Ray(p.pos, p.dir,...)
  rtTrace(geom, ray, prd)
  if(!prd.boundary) break  // MISS

  cmd = propagate_to_boundary(p, s)
  if(cmd == BREAK)    break     // ABSORB
  if(cmd == CONTINUE) continue  // REEMIT, SCATTER
  // survivors pass to boundary

     cmd = propagate_at_surface(p, s, g)
     if(cmd == BREAK) break         // SURFACE_ABSORB, SURFACE_DETECT
     propagate_at_boundary(p, s)    // BOUNDARY_REFLECT BOUNDARY_TRANSMIT


Optical Physics Implementation

Rayleigh Scattering
Direct port of G4OpRayleigh (Xin Qian patch)
  • Treated as subset of absorption, conferring rebirth
  • wavelength from reemission texture lookup
Boundary Reflect/Transmit
  • Snell's law rearranged to avoid transcendentals
  • Russian Roulette treatment of S or P polarization (simpler than G4)
propagate_at_surface: Absorb, Detect, Reflect Diffuse/Specular
  • surface properties still being debugged
  • G4 Unified model (SPECULARLOBE/SPIKE etc..) not yet ported



Random Number Generation in OptiX programs

cuRAND library from CUDA toolkit features:

cuRAND Initialization demands large stack size

Stack sizes 10x typical for OptiX programs were needed, resulting in slow OptiX running.


Packaged solution into CUDAWrap

Fast material/surface property lookup from boundary texture

AssimpWrap creates GGeo boundary instances and labels triangles with boundary indices, boundaries contain:

Properties are interpolated onto a common wavelength domain

Interleaved properties used to create single boundary texture 2d (wavelength, qty line) containing ~50 boundaries, 4 float4 each. CUDA tex2d property lookup:

float nmi = (nm - wavelength_domain.x)/wavelength_domain.z + 0.5f ;
float4 material1 = tex2D(wavelength_texture, nmi, line + 0.5f );

float refractive_index = material1.x ;
float absorption_length = material1.y ;
float scattering_length = material1.z ;

https://github.com/simoncblyth/assimp (my fork of Assimp)

Reemission wavelength lookup from Inverted CDF texture

Inverting Reemission CDF allows using texture lookup to obtain reemission wavelength from uniform random throws. Using 4096 probability bins.


Recording the steps of ~3 million photons

Up to 10 steps of the photon propagation are recorded.

Photon buffer : 4 * float4 = 512 bits/photon

Record buffer : 2 * short4 = 2*16*4 = 128 bits/record

Compression uses known domains of position (geometry center, extent), time (0:200ns), wavelength, polarization.

Union trickery allows recording ints into floats

GGeoView M1 Points

Indexing photon flag/material sequences


Selecting photons by flag/material sequences, requires indexing integer sequences.

Indexing history/material sequences for 3M photons:

Packaged indexing into ThrustRap ThrustIdx

GGeoView Flag Selection

Introducing CUDA Thrust


Distributed with CUDA

GPU performance without developing CUDA kernels

Mobile GPU Timings for Cerenkov and Scintillation photons

                    Cerenkov   Ck*4.59 Scintillation
           photons     0.61M      2.8M      2.8M
      genstep size      736K                1.3M
      photons size       37M                172M
      records size       97M                430M
   createOpenGLCtx     0.692         -     0.599
      loadGeometry     1.570         -     1.302
    interpGeometry     0.211         -     0.190
         initOptiX     4.216         -     6.521

       loadGenstep     0.011         -     0.014
 hostEvtAllocation **  3.540    16.275    16.179
         uploadEvt     0.232     1.066     0.552

 generatePropagate ++  1.404     6.453     7.907

       evtDownload **  0.348     1.602     1.780
           evtSave **  0.437     2.008     2.006

     sequenceIndex     0.134     0.614     0.359
                   ** scales by photon count

Operation with JUNO Geometry ?

The large number of PMTs may require a more memory efficient geometry representation using OptiX features:

Memory access (not calculation) typically limits GPU performance, improving memory efficiency expected to improve performance.

Next Steps

Test New Framework with IHEP 4-GPU workstation (together with Tao Lin)

Optical Photon Simulation

G4DAE Geometry Exporter

"Backup" : Details for Reference

On the following pages:

GGeo/OptiX Generated Scintillation Photons cf Geant4

GGeo/OptiX using inverted CDF reemission wavelength lookups (4096 bins)


GGeo/OptiX Generated Cerenkov Photons cf Geant4

Geant4/DetSim wavelength distribution has a blip at 200nm, corresponding to edge of water refractive index properties.


GGeoView Cerenkov Geom M1

C++ Infrastructure : foundation packages


Array persistency/manipulations inspired by NumPy, using NPY serialization format

  • 11 classes: G4StepNPY, PhotonsNPY, NPY, ...

Asynchronous IO of Geant4 Steps, Photons, Hits. Communicates with remote G4DAEOpticks process, receiving steps and replying with hits.

  • 7 classes : numpydelegate, udp_server, ...

cuRAND init and persist curandState (pure CUDA)

  • avoids large stack size requirement of cuRAND init within OptiX
  • 5 classes : cuRANDWrapper, LaunchSequence, LaunchCommon, ..




C++ Infrastructure : domain packages


GPU Geometry representation, NPY persistency

  • 22 classes: GNode, GMaterial, GProperty, ...

G4DAE -> GGeo geometry

  • 7 classes : AssimpGGeo, AssimpTree, ...

GGeo -> OptiX geometry, OptiX launch control

  • 7 classes : OptiXEngine, OptixGeometry, ...

OpenGL shader based 3D visualization

  • 29 classes : Scene, View, Camera, Rdr, Shdr, ...





propagate_to_boundary : ABSORB(REEMIT) / SCATTER / survive

 __device__ int propagate_to_boundary( Photon& p, State& s, curandState &rng)
      float absorption_distance = -s.material1.y*logf(curand_uniform(&rng));   // .y:absorption_length
      float scattering_distance = -s.material1.z*logf(curand_uniform(&rng));   // .z:scattering_length

      if (absorption_distance <= scattering_distance)
          if (absorption_distance <= s.distance_to_boundary)
              p.time += absorption_distance/(SPEED_OF_LIGHT/s.material1.x);    // .x:refractive_index
              p.position += absorption_distance*p.direction;

              if (curand_uniform(&rng) < s.material1.w) // .w:reemission_prob
                   // non-scintillators have zero reemission_prob
                  p.wavelength = reemission_lookup(curand_uniform(&rng));
                  p.direction = uniform_sphere(&rng);
                  p.polarization = normalize(cross(uniform_sphere(&rng), p.direction));

                  s.flag = BULK_REEMIT ;
                  return CONTINUE;
                  s.flag = BULK_ABSORB ;
                  return BREAK;
          //  otherwise sail to boundary
     // scattering ..

Comparison of GGeo/OptiX Generated Scintillation Photon Distributions

Position, direction, polarization XYZ + time, wavelength, weight


Comparison of GGeo/OptiX Generated Cerenkov Photon Distributions

Position, direction, polarization XYZ + time, wavelength, weight