- Optical Photon Simulation Problem...
- Huge CPU Memory+Time Expense

- Ray tracing and Rasterization
- Rendering + Photon Simulation : Limited by Ray Geometry Intersection
- NVIDIA OptiX Ray Tracing Engine
- Bounding Volume Hierarchy (BVH) algorithm

- GPU Geometry starts from ray-primitive intersection
- Ray intersection with general CSG binary trees, on GPU
- Torus : much more difficult/expensive than other primitives
- CSG : (Cylinder - Torus) PMT neck : spurious intersects
- CSG : Alternative PMT neck designs
- Translate Geant4 Geometry to GPU, Without Approximation
- Translate Geant4 Optical Physics to GPU (OptiX/CUDA)
- Random Aligned Validation : Direct comparison of GPU/CPU NumPy arrays
- Progress on Validation : JUNO Geometry Issues
- Progress on Ease-of-use
- Opticks : Aims to Revolutionize JUNO Muon Simulation
- Links

Huge CPU Memory+Time Expense

**JUNO Muon Simulation Bottleneck**

- ~99% CPU time, memory constraints

**Optical photons : naturally parallel, simple :**

- produced by Cerenkov+Scintillation
- yield only Photomultiplier hits

**-> Hybrid Solution : Geant4 + Opticks**

Not a Photo, a Calculation

**Ray Tracing Tools can Help Optical Photon Simulation**

- industry continuously improving ray trace performance
- NVIDIA Turing GPU : raytrace dedicated "RT Cores"
- Up to *"11 GigaRays per second"* per GPU

https://nvidianews.nvidia.com/news/nvidia-reveals-the-titan-of-turing-titan-rtx

OptiX Raytracing Pipeline

Analogous to OpenGL rasterization pipeline:

**OptiX makes GPU ray tracing accessible**

- **accelerates** ray-geometry intersections
- simple : single-ray programming model
- "...free to use within any application..."

**NVIDIA expertise:**

- ~linear scaling with CUDA cores across multiple GPUs
- acceleration structure creation + traversal (Blue)
- instanced sharing of geometry + acceleration structures
- compiler optimized for GPU ray tracing
- regular updates, profit from new GPU features:
- NVIDIA RTX™ with Volta, Turing GPUs

`https://developer.nvidia.com/rtx`

**User provides (Yellow):**

- ray generation
- geometry bounding box, intersects

- 3D parametric ray : **ray(x,y,z;t) = rayOrigin + t * rayDirection**
- implicit equation of primitive : **f(x,y,z) = 0**
- -> polynomial in **t**, roots: **t > t_min** -> intersection positions + surface normals
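The three bullets above can be illustrated with the simplest primitive. The sketch below is not taken from Opticks: it assumes a sphere as the implicit surface, and the function name `intersect_sphere` is hypothetical. Substituting the parametric ray into the sphere equation gives a quadratic in **t**, and the nearest root beyond **t_min** yields the position and outward normal.

```python
import math

def intersect_sphere(origin, direction, center, radius, t_min=0.0):
    """Return (t, normal) for the nearest root t > t_min, or None on a miss."""
    # Substituting ray(t) = origin + t*direction into |p - center|^2 - radius^2 = 0
    # gives the quadratic a t^2 + 2 b t + c = 0.
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - a * c
    if disc < 0.0:
        return None                                # no real roots: ray misses
    sq = math.sqrt(disc)
    for t in ((-b - sq) / a, (-b + sq) / a):       # ascending order since a > 0
        if t > t_min:
            pos = [o + t * d for o, d in zip(origin, direction)]
            normal = [(p - c) / radius for p, c in zip(pos, center)]
            return t, normal
    return None

hit = intersect_sphere((-5.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0)
print(hit)
```

Raising `t_min` to the first root makes the same call return the exit point instead, which is the mechanism the CSG algorithm relies on when it re-intersects.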

Outside/Inside Unions

dot(normal,rayDir) -> Enter/Exit

- **A + B** : boundary not inside other
- **A * B** : boundary inside other

Pick between pairs of nearest intersects, eg:

| UNION tA < tB | Enter B | Exit B | Miss B |
|---|---|---|---|
| Enter A | ReturnA | LoopA | ReturnA |
| Exit A | ReturnA | ReturnB | ReturnA |
| Miss A | ReturnB | ReturnB | ReturnMiss |
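A minimal sketch of how the union table above can drive an intersect loop. It is an illustration under stated assumptions, not the Opticks implementation: the primitives are hypothetical spheres restricted to rays along the x axis, the tB < tA case is assumed to follow from the same table with the operands swapped (union is commutative), and a Loop action only advances **t_min** of the operand being looped.

```python
import math

ENTER, EXIT, MISS = "Enter", "Exit", "Miss"

# UNION_TABLE[state_of_closer_hit][state_of_other_hit], transcribed from the table above
UNION_TABLE = {
    ENTER: {ENTER: "ReturnFirst", EXIT: "LoopFirst", MISS: "ReturnFirst"},
    EXIT: {ENTER: "ReturnFirst", EXIT: "ReturnSecond", MISS: "ReturnFirst"},
    MISS: {ENTER: "ReturnSecond", EXIT: "ReturnSecond", MISS: "ReturnMiss"},
}

def sphere_x(cx, radius):
    """Hypothetical primitive: sphere centred at x=cx, for rays along +x or -x only."""
    def intersect(ox, dx, t_min):
        roots = sorted(((cx - radius - ox) / dx, (cx + radius - ox) / dx))
        for t in roots:
            if t > t_min:
                nx = (ox + t * dx - cx) / radius       # outward normal at the hit
                return t, (ENTER if nx * dx < 0 else EXIT)
        return math.inf, MISS
    return intersect

def intersect_union(a, b, ox, dx, t_min=0.0):
    ta_min = tb_min = t_min
    for _ in range(16):                                # bounded loop, no recursion
        ta, sa = a(ox, dx, ta_min)
        tb, sb = b(ox, dx, tb_min)
        a_first = ta <= tb
        (t1, s1), (t2, s2) = ((ta, sa), (tb, sb)) if a_first else ((tb, sb), (ta, sa))
        action = UNION_TABLE[s1][s2]
        if action == "ReturnMiss":
            return None
        if action == "ReturnFirst":
            return t1
        if action == "ReturnSecond":
            return t2
        # LoopFirst: the closer hit lies inside the other solid, so step past it
        if a_first:
            ta_min = ta
        else:
            tb_min = tb
    return None

A, B = sphere_x(0.0, 1.0), sphere_x(1.0, 1.0)          # overlapping unit spheres
print(intersect_union(A, B, -5.0, 1.0))                # from outside, enters A first
print(intersect_union(A, B, 1.5, -1.0))                # starts inside B only, needs one Loop
```

In the second call the entry into A at x=1 is hidden inside B, so the table returns Loop; after advancing past it, the exit from A at x=-1 is reported as the union boundary.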

*Nearest hit intersect algorithm* [1] avoids state

- sometimes Loop : advance **t_min**, re-intersect both
- classification shows if inside/outside

*Evaluative* [2] implementation emulates recursion:

- recursion not allowed in OptiX intersect programs
- bit twiddle traversal of complete binary tree
- stacks of postorder slices and intersects
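The non-recursive traversal idea can be sketched as follows. This is an assumption-laden illustration rather than the code in `csg_intersect_boolean.h`: it uses 1-based level-order node indices, derives the postorder sequence with bit operations, and evaluates a simple point-inside test with a stack instead of ray intersects. The names `postorder_sequence`, `inside_csg` and `sphere` are hypothetical.

```python
def postorder_sequence(height):
    """Postorder visit order of a complete binary tree, 1-based level-order indices.

    height=0 is a single node; node i has children 2i and 2i+1.
    """
    node = 1 << height                     # leftmost leaf
    order = []
    while True:
        order.append(node)
        if node == 1:                      # root is always last in postorder
            return order
        if node & 1:                       # right child: parent comes next
            node >>= 1
        else:                              # left child: leftmost leaf under right sibling
            node += 1
            depth = node.bit_length() - 1
            node <<= height - depth

def inside_csg(tree, height, point):
    """tree[i-1] is an operator name for internal nodes or a predicate for leaves."""
    ops = {
        "union": lambda l, r: l or r,
        "intersection": lambda l, r: l and r,
        "difference": lambda l, r: l and not r,
    }
    stack = []
    for node in postorder_sequence(height):
        item = tree[node - 1]
        if callable(item):                 # leaf primitive
            stack.append(item(point))
        else:                              # operator: both child results are on the stack
            right = stack.pop()
            left = stack.pop()
            stack.append(ops[item](left, right))
    return stack.pop()

def sphere(cx, radius):
    return lambda p: (p[0] - cx) ** 2 + p[1] ** 2 + p[2] ** 2 <= radius ** 2

# (A union B) difference (C intersection D), stored in level order
tree = ["difference", "union", "intersection",
        sphere(0.0, 1.0), sphere(1.0, 1.0), sphere(1.5, 1.0), sphere(2.5, 1.0)]

print(postorder_sequence(2))
print(inside_csg(tree, 2, (0.0, 0.0, 0.0)), inside_csg(tree, 2, (1.8, 0.0, 0.0)))
```

Because children always precede their parent in postorder, a single explicit stack is enough to replace the call stack that recursion would need.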

- Identical geometry to Geant4
- solving the same polynomials
- near perfect intersection match

- [1] Ray Tracing CSG Objects Using Single Hit Intersections, Andrew Kensler (2006)
- with corrections by author of XRT Raytracer http://xrt.wikidot.com/doc:csg
- [2] https://bitbucket.org/simoncblyth/opticks/src/tip/optixrap/cu/csg_intersect_boolean.h
- Similar to binary expression tree evaluation using postorder traverse.

Torus artifacts

3D parametric ray : **ray(x,y,z;t) = rayOrigin + t * rayDirection**

- ray-torus intersection -> solve quartic polynomial in **t**
- A t^4 + B t^3 + C t^2 + D t + E = 0

**Solving Quartics**

- requires double precision
- very large difference between coefficients
- varying ray -> wide range of coefficients
- numerically problematic
- several mathematical approaches tried : no clear winner
- adopted approach[1] avoids artifacts in primitives, but still has issues in CSG combinations
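To make the coefficient spread concrete, the sketch below builds the quartic for a torus with its axis along z, using the standard implicit form (x² + y² + z² + R² - r²)² = 4R²(x² + y²) and a unit-length ray direction. It only constructs and checks the coefficients; it is not the depressed quartic + resolvent cubic solver, and the function names and test geometry are made up for illustration.

```python
def torus_quartic_coefficients(origin, direction, R, r):
    """Coefficients (A, B, C, D, E) for a torus with axis z, major radius R, minor radius r.

    Assumes the ray direction has unit length.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    b = ox * dx + oy * dy + oz * dz                    # origin . direction
    k = ox * ox + oy * oy + oz * oz + R * R - r * r
    A = 1.0
    B = 4.0 * b
    C = 4.0 * b * b + 2.0 * k - 4.0 * R * R * (dx * dx + dy * dy)
    D = 4.0 * b * k - 8.0 * R * R * (ox * dx + oy * dy)
    E = k * k - 4.0 * R * R * (ox * ox + oy * oy)
    return A, B, C, D, E

def quartic(coeffs, t):
    A, B, C, D, E = coeffs
    return (((A * t + B) * t + C) * t + D) * t + E     # Horner evaluation

# Ray along +x through the torus centre plane: crosses the tube at x = -11, -9, 9, 11
coeffs = torus_quartic_coefficients((-20.0, 0.0, 0.0), (1.0, 0.0, 0.0), R=10.0, r=1.0)
print(coeffs)
print([quartic(coeffs, t) for t in (9.0, 11.0, 29.0, 31.0)])
```

Even for this benign configuration the coefficients run from 1 to roughly 9e4, and they grow with the fourth power of the distance from the ray origin to the torus, which is consistent with the precision problems listed above.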

**Best Solution : avoid Torus**

- avoids the expense as well as the problems
- eg model PMT neck with hyperboloid or polycone, not cylinder-torus

[1] Depressed quartic + resolvent cubic

https://bitbucket.org/simoncblyth/opticks/src/tip/optixrap/cu/csg_intersect_torus.h

https://bitbucket.org/simoncblyth/opticks/src/tip/optixrap/cu/SolveQuartic.h

- Wide variety of artifacts as the viewpoint changes, changing the quartic coefficients

OptiX Raytrace and OpenGL rasterized wireframe comparing neck models:

- Ellipsoid + Hyperboloid + Cylinder
- Ellipsoid + (Cylinder - Torus) + Cylinder

- unfortunately Geant4 does not support z-cut hyperboloid, so use polycone ?

**Best Solution : use simpler model for optically unimportant PMT neck**

Hyperboloid and Cone defined using *closest point on ellipse to center of torus circle*

- Cylinder-Torus : purple line, Cone : green, **simplest**
- Hyperboloid : dashed magenta, works with *Opticks*, BUT *G4Hype* has no z-range flexibility

https://bitbucket.org/simoncblyth/opticks/src/tip/ana/x018_torus_hyperboloid_plt.py

Direct Geometry Translation

- 2018: **full reimplementation of translation**
  - automatic geo-management, simplifying usage
  - unified handling of analytic and triangulated
  - no separate export+import stages
  - substantial reduction in Opticks code
  - dependencies eliminated : Assimp, G4DAE
  - G4 geometry[1] auto-translated to Opticks CSG
  - geocache persisted, staleness check by digest

**Direct Geometry : Geant4 "World" -> Opticks CSG -> GPU**

- much simpler : fully automated geo-management

**Material/Surface/Scintillator properties**

- interpolated to standard wavelength domain
- interleaved into "boundary" texture
- "reemission" texture for wavelength generation
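A rough sketch of what interpolating onto a common wavelength domain and interleaving might look like. The domain, the property tables and the function name `interpolate_to_domain` are all made up for illustration (they are not JUNO or Opticks values), and end values are assumed to be clamped outside the tabulated range.

```python
def interpolate_to_domain(wavelengths, values, domain):
    """Linearly interpolate a property table onto a common wavelength domain.

    wavelengths must be ascending; outside the table the end values are used.
    """
    out = []
    for w in domain:
        if w <= wavelengths[0]:
            out.append(values[0])
        elif w >= wavelengths[-1]:
            out.append(values[-1])
        else:
            i = 1
            while wavelengths[i] < w:                  # first table point at or beyond w
                i += 1
            w0, w1 = wavelengths[i - 1], wavelengths[i]
            f = (w - w0) / (w1 - w0)
            out.append(values[i - 1] + f * (values[i] - values[i - 1]))
    return out

# Hypothetical common domain and made-up property tables
domain = [300.0, 400.0, 500.0, 600.0]                  # nm
refractive_index = interpolate_to_domain([200.0, 600.0], [1.6, 1.4], domain)
absorption_length = interpolate_to_domain(
    [350.0, 450.0, 550.0], [1000.0, 3000.0, 2000.0], domain)

# Interleave so that one lookup at a wavelength bin returns all properties together
boundary_row = list(zip(refractive_index, absorption_length))
print(boundary_row)
```

Putting every property on the same domain is what allows a single texture coordinate to fetch all of them at once during propagation.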

**Structure**

- repeated geometry instances identified (progeny digests)
- instance transforms used in OptiX/OpenGL geometry
- merge CSG trees into global + instance buffers
- export meshes to glTF 2.0 for 3D visualization
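The idea of finding repeated geometry from progeny digests can be sketched as below. This is only an illustration under simplifying assumptions: volumes are plain dictionaries, the digest covers shape names only (relative transforms and materials are ignored), and the geometry is invented. What the actual Opticks digest includes is not stated here.

```python
import hashlib
from collections import Counter

def progeny_digest(node, digests):
    """Digest of a volume's shape name plus the digests of everything below it."""
    h = hashlib.md5(node["shape"].encode())
    for child in node.get("children", []):
        h.update(progeny_digest(child, digests).encode())
    digest = h.hexdigest()
    digests.append(digest)                 # record one digest per volume visited
    return digest

def pmt():
    # hypothetical repeated assembly
    return {"shape": "pmt_body", "children": [{"shape": "cathode"}, {"shape": "dynode"}]}

world = {"shape": "world", "children": [{"shape": "target"}, pmt(), pmt(), pmt()]}

digests = []
progeny_digest(world, digests)
repeats = {d: n for d, n in Counter(digests).items() if n > 1}
print(sorted(repeats.values()))
```

Sub-volumes of a repeated assembly also show up as repeats, so a real implementation would need an extra step to keep only the outermost repeated volume as the instance.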

**Ease of Use**

- easy geometry : just handover "World"

- [1] G4 primitives used need corresponding Opticks implementations; contributions for any unsupported geometry are welcome

GPU Resident Photons

**Seeded on GPU**

- associate photons -> *gensteps* (via seed buffer)

**Generated on GPU, using genstep param:**

- number of photons to generate
- start/end position of step

*gensteps* : hybrid CPU+GPU generation

**Propagated on GPU**

- Only photons hitting PMTs copied to CPU
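The seeding step can be pictured with a serial CPU equivalent. In Opticks this is done on the GPU with Thrust; the sketch below only shows the resulting data layout, with made-up photon counts and a hypothetical function name `make_seeds`.

```python
from itertools import accumulate

def make_seeds(photon_counts):
    """seeds[photon_index] = index of the genstep that photon is generated from."""
    seeds = []
    for genstep_index, count in enumerate(photon_counts):
        seeds.extend([genstep_index] * count)
    return seeds

photon_counts = [3, 0, 2, 4]               # made-up number of photons per genstep
seeds = make_seeds(photon_counts)

# first photon slot belonging to each genstep (exclusive prefix sum of the counts)
offsets = [0] + list(accumulate(photon_counts))[:-1]

print(seeds)
print(offsets)
```

Each GPU thread can then read its own seed to find the genstep parameters it should generate from, so the photons themselves never need to exist on the CPU.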

Thrust: **high level C++ access to CUDA**

OptiX : single-ray programming model -> line-by-line translation

**CUDA Ports of Geant4 classes**

- G4Cerenkov (only generation loop)
- G4Scintillation (only generation loop)
- G4OpAbsorption
- G4OpRayleigh
- G4OpBoundaryProcess (only a few surface types)

**Modify Cerenkov + Scintillation Processes**

- collect *genstep*, copy to GPU for generation
- avoids copying millions of photons to GPU

**Scintillator Reemission**

- fraction of bulk absorbed "reborn" within same thread
- wavelength generated by reemission texture lookup
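One common way to generate wavelengths by table lookup is inverse-CDF sampling, sketched below on the CPU. It is an assumption that the reemission texture works along these lines; the spectrum is made up, and `build_inverse_cdf` and `sample_wavelength` are hypothetical names. The lookup uses linear interpolation to mimic a 1D texture fetch.

```python
import random

def build_inverse_cdf(wavelengths, intensities, nbins=8):
    """Tabulate wavelength against cumulative probability u in [0, 1].

    Assumes ascending wavelengths and strictly positive intensities.
    """
    cdf = [0.0]
    for i in range(1, len(wavelengths)):   # trapezoidal integration of the spectrum
        area = 0.5 * (intensities[i] + intensities[i - 1]) * (wavelengths[i] - wavelengths[i - 1])
        cdf.append(cdf[-1] + area)
    cdf = [c / cdf[-1] for c in cdf]

    table = []
    for j in range(nbins + 1):
        u = j / nbins
        i = 1
        while cdf[i] < u:                  # segment of the CDF containing u
            i += 1
        f = (u - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
        table.append(wavelengths[i - 1] + f * (wavelengths[i] - wavelengths[i - 1]))
    return table

def sample_wavelength(table, u):
    """Linearly interpolated lookup at coordinate u in [0, 1]."""
    x = u * (len(table) - 1)
    i = min(int(x), len(table) - 2)
    return table[i] + (x - i) * (table[i + 1] - table[i])

# made-up flat spectrum between 400 and 500 nm
table = build_inverse_cdf([400.0, 500.0], [1.0, 1.0], nbins=4)
rng = random.Random(42)
samples = [sample_wavelength(table, rng.random()) for _ in range(5)]
print(table)
```

With the table built once on the CPU, each reemitted photon needs only one uniform random number and one lookup to obtain its new wavelength.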

**Opticks (OptiX/Thrust GPU interoperation)**

- **OptiX** : upload gensteps
- **Thrust** : seeding, distribute genstep indices to photons
- **OptiX** : launch photon generation and propagation
- **Thrust** : pullback photons that hit PMTs
- **Thrust** : index photon step sequences (optional)

bi-simulation direct matching

**Align CPU/GPU Random Number Sequences**

- G4 random engine providing *cuRAND* sequence

**"Align" CPU/GPU codes (some jumps)**

- simplest possible direct comparison

**Simple geometries**

- same geometry, same physics, same results

**JUNO geometry** : issue iteration in progress

**tboolean-box simple geometry test : compare Opticks events**

- 100k photons : position, time, polarization : 1.2M floats
- 34 deviations > 1e-4 (mm or ns), largest 4e-4
- deviants all involve scattering (more flops?)

    In [11]: pdv = np.where(dv > 0.0001)[0]
    In [12]: ab.dumpline(pdv)
    0 1230 : TO BR SC BT BR BT SA
    1 2413 : TO BT BT SC BT BR BR BT SA
    2 9041 : TO BT SC BR BR BR BR BT SA
    3 14510 : TO SC BT BR BR BT SA
    4 14747 : TO BT SC BR BR BR BR BR BR BR
    5 14747 : TO BT SC BR BR BR BR BR BR BR
    ...

    In [20]: ab.b.ox[pdv,0]                         In [21]: ab.a.ox[pdv,0]
    Out[20]:                                        Out[21]:
    A()sliced                                       A()sliced
    A([ [-191.6262, -240.3634, 450. , 5.566 ],      A([ [-191.626 , -240.3634, 450. , 5.566 ],
        [ 185.7708, -133.8457, 450. , 7.3141],          [ 185.7708, -133.8456, 450. , 7.3141],
        [-450. , -104.4142, 311.143 , 9.0581],          [-450. , -104.4142, 311.1431, 9.0581],
        [ 83.6955, 208.9171, -450. , 5.6188],           [ 83.6954, 208.9172, -450. , 5.6188],
        [ 32.8972, 150. , 24.9922, 7.6757],             [ 32.8973, 150. , 24.992 , 7.6757],
        [ 32.8972, 150. , 24.9922, 7.6757],             [ 32.8973, 150. , 24.992 , 7.6757],
        [ 450. , -186.7449, 310.6051, 5.0707],          [ 450. , -186.7451, 310.605 , 5.0707],
        [ 299.2227, 318.1443, -450. , 4.8717],          [ 299.2229, 318.144 , -450. , 4.8717],
        ...
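A self-contained sketch of this kind of deviation check with NumPy. The arrays here are made up (three photons with injected differences) and stand in for the final photon records of the two simulations; how `dv` is computed inside the Opticks analysis code is an assumption.

```python
import numpy as np

# made-up final photon records (x, y, z, t) from simulation "a", shape (nphoton, 4)
a = np.array([[-191.6260, -240.3634, 450.0000, 5.5660],
              [  10.0000,   20.0000, 450.0000, 3.0000],
              [ 450.0000, -186.7451, 310.6050, 5.0707]])

# simulation "b": identical apart from deviations injected into photons 0 and 2
b = a.copy()
b[0, 0] += 2e-4
b[2, 1] += 4e-4

cut = 1e-4                                  # mm or ns
dv = np.abs(a - b).max(axis=1)              # largest deviation within each photon record
pdv = np.where(dv > cut)[0]                 # indices of deviant photons

print(pdv, dv[pdv])
```

Because the random sequences are aligned, photon `i` in one array corresponds to photon `i` in the other, so an element-wise difference is a meaningful comparison.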

Dual pro/dev bi-simulation

- production executable : *Opticks* embedded inside *Geant4* app, with minimal *G4Opticks* API
- development executable : *Geant4* embedded inside *Opticks*, steals *geocache* + *gensteps* from production, does fully instrumented propagations
- best-of-both-worlds
- same bi-propagations duplicated in production + development environments

**5/40 JUNO solids with CSG translation issues**

- PMT_20inch_body/pmt_solid : use of "cylinder - torus"
  - causes spurious intersects
  - fix : polycone neck
  - z-cut hyperboloid not supported by Geant4
- PMT_20inch_inner1/2_solid : uses depth 4 tree (31 nodes) where 1 primitive sufficient
  - profligate modelling
  - fix : z-cut ellipsoid cathode
- sAirTT : box subtract a cylinder with coincident face
  - fix : grow subtracted cylinder to avoid coincidence

**Next step** : make these geometry changes then proceed to next issue

Aim of Geant4+Opticks Examples

- exposure to *Opticks* for all *Geant4* users
- quickstart for *Geant4* + *Opticks* learning / evaluation
- demonstrate utility of scintillation + Cerenkov *gensteps* to *Geant4* members

*Geant4* + *Opticks* History

- **2014** : 19th Geant4 Collaboration Meeting, Okinawa
  - proto-Opticks (G4DAE) presented
- **2017** : 22nd Geant4 Collaboration Meeting, Australia
  - presented Opticks (CSG on GPU) to plenary session, discussions on how *Opticks* might be integrated with *Geant4*, conclude on an advanced example as initial target
- **2018** : 23rd CHEP Conference, Bulgaria
  - discussions with Geant4 EM/Optical coordinator reach agreement on high level structure of example
  - primary concern from *Geant4* members is that ongoing support for the examples will be provided

**Ease-of-use was focus of 2018 developments**

- Direct Geometry Translation -> automated geometry management
- Modern CMake with BCM[1] -> automated configuration
- *G4Opticks* API -> simple embedding of *Opticks* within *G4* apps

[1] Boost CMake 3.5+ modules : configure direct dependencies only

https://github.com/BoostCMake/cmake_modules

https://github.com/simoncblyth/bcm

100 GeV muon, millions of photons

State-of-the-art GPU ray tracing[2] applied to optical simulation

- replaces Geant4 optical simulation with GPU equivalent
- translate G4 geometry to GPU without approximation[3]
- port G4 optical physics to CUDA[4]

**Optical photons generated+propagated entirely on GPU**

- only photons hitting PMTs require CPU memory
- optical photon CPU memory --> ~zero

- muon workload perfect for GPUs, *Opticks* > 1000x *Geant4*
- optical photon CPU time --> ~zero

**Status : validation iteration ongoing**

- validation by direct comparison of random sequence aligned GPU and CPU simulations
- minor PMT geometry simplifications needed to proceed to next iteration

[1] Open source project http://bitbucket.org/simoncblyth/opticks

[2] NVIDIA OptiX ray tracing engine

[3] using innovative Constructive Solid Geometry implementation on GPU

[4] scattering, boundary, reemission, absorption

- https://simoncblyth.bitbucket.io
  - Opticks presentations and videos
- https://juno.ihep.ac.cn/cgi-bin/Dev_DocDB/ShowDocument?docid=3927
  - Opticks visualization screen capture movie of JUNO
- https://groups.io/g/opticks
  - Opticks mailing list archive
- opticks+subscribe@groups.io
  - send email to this address, to subscribe
- https://simoncblyth.bitbucket.io/opticks/index.html
  - Opticks installation instructions
- https://bitbucket.org/simoncblyth/opticks
  - Opticks code repository