JUNO+Opticks : Progress Towards Production

Open source, https://bitbucket.org/simoncblyth/opticks

Opticks replaces Geant4 optical photon simulation with an equivalent implementation that benefits from state-of-the-art GPU ray tracing from NVIDIA OptiX. All optically relevant aspects of Geant4 context must be translated+copied to GPU:

geometry : solids, structure, material+surface properties

generation : Cerenkov + Scintillation (using Gensteps from Geant4)

propagation : Rayleigh scattering, absorption, reemission, boundary

Achieving+maintaining equivalence is time consuming, BUT:

transformative performance benefits >1000x Geant4

Simon C Blyth — 29 March 2022

High Level Overview : 2022 January/February

January : Updating to JUNO trunk geometry

x3 slowdown compared to ~December trunk

many unintended overlaps : demonstrated with new 2D slice renders

XJfixtureConstruction : spurious internal intersects + performance bottleneck

Anti-coincidence fixes do not remove all spurious intersects

Reveals incompatibility between Opticks CSG intersection and balanced trees

switch OFF tree balancing avoids issue, but very bad for performance : NEED BETTER SOLUTION

February : Generalize Opticks CSG to handle "list-nodes" : n-ary (not binary) CSG

attempt to fix balancing (eg postorder preserving balance) and/or changing CSG intersection alg unsuccessful (difficult as problem only with high node count solids)

balancing was always a hack (not a solution) for handling high node count solids

conclude that although it may be possible to get balancing to work the best longterm solution is to take the simpler route of avoiding large trees by implementing "list-nodes"

list-nodes avoid large trees, tree overheads, tree balancing

CSG_CONTIGUOUS Union (Similar to G4MultiUnion)

CSG_DISCONTIGUOUS Union

CSG_OVERLAP Intersection

High Level Overview : 2022 March

March : Geometry Optimization + Migrating Sim from Old to New Workflow

fix "global leaf transform" bug following list-node CSG generalization
- add X4VolumeMaker G4OKVolumeTest for volume level debugging
act on several suggestions from NVIDIA engineers regards OptiX 7 usage (PRD pointer trick, launch flags, stacksize config)
- performance stays same (geometry otherwise bottlenecked?)
investigate x3 slower geometry compared to December
- bottleneck is global compound solid (~3000 Prim)
- implement dynamic CSG Prim include/exclude for fine-grained slowdown investigation
- apply dynamic geometry to do render scan : quantifying slow Prim
mid-March : switch gears from geometry to physics
- Simply skipping slow XJfixtureConstruction is faster way to get into production
  - BUT: ray trace bottlenecks so far always from geometry not physics
  - This is why my focus has been on a sequence of geometry issues over many months
- fill_state, rayleigh_scatter, propagate_to_boundary, propagate_at_boundary : brought over to qsim.h workflow
- QSimTest : fine grained testing as bring over functionality to new QUDARap/QSim workflow
  - random aligned comparison qsim-vs-bst : 1-in-a-million level match for S/P/X-polarization/normal-incidence/TIR
  - bst: opticks/examples/Geant4/BoundaryStandalone mocked Geant4 environment for Boundary unit testing

Month-by-month progress notes	https://bitbucket.org/simoncblyth/opticks/src/master/notes/progress.rst
Day-by-day commit messages	https://bitbucket.org/simoncblyth/opticks/commits/

https://bitbucket.org/simoncblyth/opticks/src/master/notes/progress.rst

Majority of recent months dominated by sequence of geometry issues

/env/presentation/notes/progress/2022_mar.png

Big Picture of Multi Year Transition to new workflow (OptiX 7 compatible)

Package	Role
CSG	GPU/CPU geometry model Inst/Solid/Prim/Node
CSG_GGeo	Translates GGeo into CSG
GeoChain	Geometry Translation Debug
QUDARap	CUDA Simulation
CSGOptiX	Geom (CSG) + Sim (QUDARap)

Design goals:

fine-grained GPU simulation testing QSimTest
CPU testing of GPU code (eg CSG)
works with OptiX 7, 6, 5 (7 is prime target)

Non-monolithic flexibility, easier: test + customize

eg JUNO MultiFilm PMT optical model
many tests just use CUDA, no OptiX

Selection of most important Opticks packages

Package	Indicative Functionality Examples
SysRap	SStr.hh, SSys.hh, NP.hh array
BoostRap	less need with C++17 (longterm:remove)
NPY	old array, primitives (longterm:remove)
OpticksCore	argument handling, central steering
GGeo	geometry model, GInstancer geo-factorization
ExtG4	geometry translation from G4 to GGeo
CFG4	Opticks Geant4 comparison eg CRecorder
CUDARap	curandState persist (removing)
ThrustRap	genstep->photon, photon index (removing)
OptiXRap	monolithic pre-7 simulation (removing)
G4Opticks	High Level Interface

Cherry-picking from removed packages into new ones

qudarap/QSimTest.cc fine-grained GPU testing qsim.h methods

Major design goal of new workflow : fine-grained testing (much better than monolithic)
some tests generate photon arrays, others mutate input arrays of photons

switch(type){
    case RNG_SEQUENCE:                  rng_sequence(num, ni_tranche_size)         ; break ;
    case WAVELENGTH_S  :                wavelength('S', num)                       ; break ;
    case WAVELENGTH_C  :                wavelength('C', num)                       ; break ;
    case SCINT_PHOTON_P:                scint_photon(num);                         ; break ;
    case CERENKOV_PHOTON_K:             cerenkov_photon(num, print_id);            ; break ;
    case CERENKOV_PHOTON_ENPROP_E:      cerenkov_photon_enprop(num, print_id);     ; break ;
    case GENERATE_PHOTON_G:             generate_photon();                         ; break ;
    case BOUNDARY_LOOKUP_ALL_A:         boundary_lookup_all()                      ; break ;
    case BOUNDARY_LOOKUP_LINE_WATER_W:  boundary_lookup_line("Water", x0, x1, nx)  ; break ;
    case BOUNDARY_LOOKUP_LINE_LS_L:     boundary_lookup_line("LS",    x0, x1, nx)  ; break ;
    case PROP_LOOKUP_Y:                 prop_lookup(-1, -1.f,16.f,1701)            ; break ;
    case FILL_STATE_0:                  fill_state(0)                              ; break ;
    case RAYLEIGH_SCATTER_ALIGN:        photon_launch_generate(num, type)          ; break ;
    case PROPAGATE_TO_BOUNDARY:         photon_launch_generate(8,   type)          ; break ;
    case PROPAGATE_AT_SURFACE:          photon_launch_generate(8,   type)          ; break ;
    case PROPAGATE_AT_BOUNDARY:
    case PROPAGATE_AT_BOUNDARY_NORMAL_INCIDENCE:
                                        photon_launch_generate(num, type)          ; break ;
    case HEMISPHERE_S_POLARIZED:
    case HEMISPHERE_P_POLARIZED:
    case HEMISPHERE_X_POLARIZED:
                                        photon_launch_generate(num, type)          ; break ;
    case PROPAGATE_AT_BOUNDARY_S_POLARIZED:
    case PROPAGATE_AT_BOUNDARY_P_POLARIZED:
    case PROPAGATE_AT_BOUNDARY_X_POLARIZED:
                                             photon_launch_mutate(num, type)       ; break ;
}

Standalone Geant4 Tests : Cerenkov, RayLeigh, Boundary

Provide photon arrays to compare against Opticks

Mocking just enough Geant4 environment to invoke PostStepDoIt of process

Common Inputs to both Opticks and Geant4 tests:

pre-cooked random numbers from curand (using OpticksRandom)
input photons (generated by other tests)
other params (eg refractive indices, momentum, polarization) via envvars

Small amount of Geant4 (and Opticks) code being run..

easy to match every random consumption
straightforward direct photon array A-B comparison
simple environment to debug discrepancies

`hemisphere_s_polarized_pvplt_polarized_p0.png`

1M photons all incident at origin (red:momentum, grey:polarization)

S-polarization direction transverse to plane of incidence (and momentum)

`hemisphere_p_polarized_pvplt_polarized_p0.png`

1M photons all incident at origin (red:momentum, grey:polarization)

P-polarization direction in plane of incidence, transverse to momentum

`hemisphere_x_polarized_pvplt_polarized_p0.png`

1M photons all incident at origin (red:momentum, grey:polarization)

"X"-polarized : equal admixture of S and P polarized

Boundary matched QSimTest.sh and G4OpBoundaryProcessTest.sh

epsilon:BoundaryStandalone blyth$ TEST=propagate_at_boundary_s_polarized NUM=1000000  ./G4OpBoundaryProcessTest.sh cf
a.shape (1000000, 4, 4) : /tmp/blyth/opticks/QSimTest/propagate_at_boundary_s_polarized/p.npy
b.shape (1000000, 4, 4) : /tmp/blyth/opticks/G4OpBoundaryProcessTest/propagate_at_boundary_s_polarized/p.npy

a_flag=a[:,3,3].view(np.uint32)                    : [2048 2048 2048 ... 2048 2048 1024]
b_flag=b[:,3,3].view(np.uint32)                    : [2048 2048 2048 ... 2048 2048 1024]
w_flag=np.where( a_flag != b_flag )                : (array([209411]),)   ## 1 REFLECT/TRANSMIT flag mismatch
ua_flag=np.unique(a_flag, return_counts=True)      : (array([1024, 2048], dtype=uint32), array([ 45024, 954976]))
ub_flag=np.unique(b_flag, return_counts=True)      : (array([1024, 2048], dtype=uint32), array([ 45025, 954975]))
a_TransCoeff=a[:,1,3]                              : [0.784 0.799 0.588 ... 0.853 0.481 0.959]
b_TransCoeff=b[:,1,3]                              : [0.784 0.799 0.588 ... 0.853 0.481 0.959]
w_TransCoeff=np.where( np.abs( a_TransCoeff - b_TransCoeff) > 1e-6 ) : (array([], dtype=int64),)
a_flat=a[:,0,3]                                    : [0.438 0.46  0.25  ... 0.557 0.184 0.992]
b_flat=b[:,0,3]                                    : [0.438 0.46  0.25  ... 0.557 0.184 0.992]
w_flat=np.where(a_flat != b_flat)                  : (array([], dtype=int64),)
w_ab0=np.where( np.abs(a[:,0] - b[:,0]) > 1e-6 )   : (array([], dtype=int64), array([], dtype=int64))
w_ab1=np.where( np.abs(a[:,1] - b[:,1]) > 1e-6 )   : (array([209411, 209411, 209411]), array([0, 1, 2])) ## diff direction
w_ab2=np.where( np.abs(a[:,2] - b[:,2]) > 1e-6 )   : (array([209411, 209411]), array([0, 1])) ## diff polarization
w_ab3=np.where( np.abs(a[:,3] - b[:,3]) > 1e-6 )   : (array([], dtype=int64), array([], dtype=int64))

In [1]: a[209411]
Out[1]:                                                        u_reflect > TransCoeff 
array([[ -0.136,  -0.264,  -0.955,   0.955],                       In [3]: a[209411,0,3] > a[209411,1,3]
       [ -0.09 ,  -0.176,  -0.98 ,   0.955],                       Out[3]: False   BOUNDARY_TRANSMIT 
       [  0.89 ,  -0.456,  -0.   , 500.   ],
       [  0.89 ,  -0.456,   0.   ,   0.   ]], dtype=float32)

In [2]: b[209411]
Out[2]:
array([[ -0.136,  -0.264,  -0.955,   0.955],                       In [4]: b[209411,0,3] > b[209411,1,3]
       [ -0.136,  -0.264,   0.955,   0.955],                       Out[4]: True     BOUNDARY_REFLECT  $
       [ -0.89 ,   0.456,  -0.   , 500.   ],
       [  0.89 ,  -0.456,   0.   ,   0.   ]], dtype=float32)
                                                   One-in-a-million on TransCoeff REFLECT/TRANSMIT cut edge
In [5]: a[209411,0,3] == b[209411,0,3]                             In [6]: a[209411,1,3] - b[209411,1,3]
Out[5]: True                                                       Out[6]: 5.9604645e-08

`cxr_overview_emm_t0_moi_-1_confirmed_fix.jpg`

./cxr_overview.sh: 2022 January 0.0126 (cf 0.0054 from 2021 Dec)

This Dec->Jan slowdown motivated development of dynamic Prim selection

Dynamic Prim Selection:: in memory CSGCopy with selection => construct any combination of Prim

Dynamic Prim Selection Speed Scan : One "G4VSolid" Geometries

cd ~/opticks/CSGOptiX
./grabsnap.sh --reverse --selectspec not_elv_t --candle t103 --rst

idx	-e	time(s)	relative	enabled geometry description 41c046fe
0	103	0.0225	3.0917	ONLY: solidXJfixture
1	112	0.0051	0.7009	ONLY: NNVTMCPPMTTail
2	105	0.0027	0.3644	ONLY: HamamatsuR12860Tail
3	111	0.0025	0.3422	ONLY: NNVTMCPPMTsMask
4	117	0.0022	0.3051	ONLY: NNVTMCPPMTsMask_virtual
5	0	0.0018	0.2496	ONLY: sTopRock_domeAir
6	1	0.0018	0.2474	ONLY: sTopRock_dome
7	94	0.0018	0.2406	ONLY: sTarget
8	123	0.0017	0.2392	ONLY: sChimneyAcrylic
9	127	0.0017	0.2378	ONLY: sInnerWater
10	128	0.0017	0.2366	ONLY: sReflectorInCD
11	95	0.0017	0.2363	ONLY: sAcrylic
12	14	0.0016	0.2160	ONLY: sAirTT
13	15	0.0016	0.2142	ONLY: sExpHall
14	109	0.0015	0.2025	ONLY: HamamatsuR12860_PMT_20inch_pmt_solid_1_4
15	104	0.0013	0.1847	ONLY: HamamatsuR12860sMask

SNAP_LIMIT=16 ./grabsnap.sh --reverse --selectspec not_elv_t --jpg --out
./image_grid.sh  ; ./pub1.sh path-returned-by-above

`image_grid_elv_scan_dyn_prim_scan.jpg`

Slowest 16 single selected solid renders: the worst solidXJfixture gives a blank render

Dynamic Prim Selection Speed Scan : Exclude G4VSolid Geometries

cd ~/opticks/CSGOptiX
./grabsnap.sh --selectspec only_elv_t --candle t103 --rst

idx	-e	time(s)	relative	enabled geometry description 41c046fe
0	t103	0.0073	1.0000	EXCL: solidXJfixture
1	t1	0.0118	1.6169	EXCL: sTopRock_dome
2	t0	0.0119	1.6306	EXCL: sTopRock_domeAir
3	t112	0.0119	1.6311	EXCL: NNVTMCPPMTTail
4	t	0.0119	1.6317	ALL
5	t127	0.0119	1.6393	EXCL: sInnerWater
6	t97	0.0120	1.6488	EXCL: sStrut
7	t29	0.0121	1.6558	EXCL: GLw2.equ_bt01_FlangeI_Web_FlangeII
8	t136	0.0121	1.6563	EXCL: sPoolLining
9	t95	0.0121	1.6576	EXCL: sAcrylic
10	t128	0.0121	1.6576	EXCL: sReflectorInCD
11	t105	0.0121	1.6657	EXCL: HamamatsuR12860Tail
12	t60	0.0121	1.6670	EXCL: GLb3.bt09_FlangeI_Web_FlangeII
13	t135	0.0121	1.6673	EXCL: sOuterWaterPool
14	t104	0.0122	1.6684	EXCL: HamamatsuR12860sMask
15	t137	0.0122	1.6687	EXCL: sBottomRock

SNAP_LIMIT=16 ./grabsnap.sh --selectspec only_elv_t --jpg --out
./image_grid.sh  ; ./pub1.sh  path-returned-by-above

`image_grid_elv_scan_exclude_grid.jpg`

Fastest 16 single excluded solid renders: unsuprisingly difficult to see any difference : other than performance

Prediction : 2022 April

Experience suggests that predictions are impossible due to:

changing geometry + geometry issues

BUT with caveat of excluding those

migrating existing functionality from old to new workflow is less predictable

Expect completed new workflow by end of April, what remains:

propagate_at_surface (the more complex propagate_at_boundary took 3~4 days)
photon step-by-step history recording (for A-B testing)
- not needed in production, but essential for validation
event handling : stream compaction for downloading hits
integrate new CSGOptiX workflow into G4Opticks (gensteps in, hits out)

Towards end of April

New workflow performance testing
Try to integrate MultiFilm PMT model (using LUT texture approach being developed by Yuxiang)
Prepare release for Installation onto GPU cluster

Prediction : After April : Geometry Optimization + Production Optimization

performance so far always limited by geometry

intersecting with geometry much more resource intensive that physics simulation
very sensitive to any geometry issues

Geometry Optimization : Measure Simulation+Render performance as make changes:

Expect geometry optimization needed to maximize performance even with XJfixtureConstruction skipped

adjust instancing criteria -> different geometry factorization
- more instancing expected to be faster but there is some balancing against overheads
apply CSG_CONTIGUOUS list-node to slower solids
try again suggestions from NVIDIA Engineers (PRD trick, flags, stack config)
- worthwhile to retry as bottlenecks eliminated : as differences may become visible

Production Optimization : New Territory, Experience needed to direct expts

Now many GPUs per node ?
Split/combine events depending on GPU memory
- large launches filling GPU memory expected to make better use of GPU
- BUT: what about implications for overall cluster throughput ?

Divider

"Extra" Slides

Following slides give background on

XJfixtureConstruction Solid
- two types of spurious intersects
- Tree balancing
Review Opticks CSG Tree Intersection algorithm
CSG list-nodes implementation, n-ary (not bin-ary) CSG intersection
- CSG_CONTIGUOUS Union (simular to G4MultiUnion)
- CSG_DISCONTIGUOUS Union (very fast imp for guaranteed discontiguous)
- CSG_OVERLAP Intersection (eg for G4Sphere with radius,phi,theta ranges)

Investigate XJfixtureConstruction Solid

spurious intersects not all removed by coincident face avoidance
reveals compatibility issue between CSG intersect alg and tree balancing
motivates development of list-nodes to remove need to balance
- CSG_CONTIGUOUS
- CSG_DISCONTIGUOUS
- CSG_OVERLAP
support for list-nodes : major generalization of Opticks CSG

cxr_geochain_XJfixtureConstruction_ALL_jpg

EYE=3,0.5,-1 UP=0,0,1 ZOOM=4 TMIN=0 PUB=1 ./cxr_geochain.sh

holy geometry : celtic cross (tubs + FOUR box) atop an altar (TWO box): union of annular tubs and SIX boxes

cxr_geochain_XJfixtureConstruction_ALL_otherside_jpg

EYE=-3,0.5,-1 UP=0,0,1 ZOOM=4 TMIN=0 PUB=otherside ./cxr_geochain.sh

cxr_geochain_XJfixtureConstruction_ALL_upper_jpg

EYE=-3,0.5,1 UP=0,0,1 ZOOM=4 TMIN=0 PUB=upper ./cxr_geochain.sh

cxr_geochain_XJfixtureConstruction_ALL_speckle_jpg

EYE=-3,0.5,0.8 UP=0,0,1 ZOOM=4 TMIN=2.7 CAM=0 PUB=speckle ./cxr_geochain.sh

XJfixtureConstruction CSG Tree (Geant4 Height 5, Opticks Height 6)

Geant4 : 6 G4Box, 1 G4Tubs with inner, 6 G4UnionSolid
Opticks : G4Tubs with inner => 2 Cylinder + Difference

NODE:13 PRIM:7

                                                                        Uni
                                                                        U
                                                                        9


                                                        Uni                             Uni
                                                        U                               U
                                                        7                               11


                                        Uni                     Box             Box             Box
                                        U                       U               U               U
                                        5                       8               10              12


                        Uni                     Box
                        U                       U
                        3                       6


        Uni                     Box
        U                       U
        1                       4


Tub             Box
U               U
0               2


0                  0.00            0.00           50.00          -50.00            0.00            0.00 zdelta

6                 11.50           11.50           65.00          -35.00           40.00           65.00 az1
-6               -11.50          -11.50           35.00          -65.00          -40.00          -65.00 az0

XJfixtureConstruction CSG Tree After Balancing (Height 3)


                                                                       un
                                                                      1
                                                                         0.00
                                                                        -0.00

                               un                                                                              un
                              2                                                                               3
                                 0.00                                                                            0.00
                                -0.00                                                                           -0.00

           un                                      un                                      in                                      un
          4                                       5                                       6                                       7
             0.00                                    0.00                                    0.00                                    0.00
            -0.00                                   -0.00                                   -0.00                                   -0.00

 bo                  bo                  bo                  bo                  cy                 !cy                  bo                  bo
8                   9                   10                  11                  12                  13                  14                  15
   6.50                6.50                6.50                6.50                6.50                6.57              -16.50               -6.50
  -6.50               -6.50               -6.50               -6.50               -6.50               -6.57              -33.50              -16.50

cxs_custom_XJfixtureConstructionXY

COMP=custom_XY ./cxs_solidXJfixture.sh: cross section view of standalone solidXJfixture
All interior red intersects are spurious : not easy to fix.: probable cause of history discrepancy

So many spurious intersects : Easier to start over with fewer constituent solids

`XJfixtureConstruction_YZ_two_types_of_spurious.jpg0`

`XJfixtureConstruction_YZ_two_types_of_spurious.jpg1`

`XJfixtureConstruction_XY_spurious_internal.jpg0`

rays generated in XY plane from genstep positions (yellow dots): yellow lines and arrows : rays and spurious internal intersects

`XJfixtureConstruction_XY_spurious_internal.jpg1`

"sd" signed distance, abs(sd) > epsilon
yellow : rays with spurious intersects
incompatibility of CSG alg with balanced trees ?

Fix attempts, so far unsuccessful

change CSG intersection alg
change balancing to preserve postorder sequence

Single hit CSG intersection Alg

Postorder sequence binary choice of intersects:

entire tree prior to "this" constituent
"this" constituent

Sometimes alg needs to "LOOP" to find "otherside"

Balancing => incomplete prior tree => spurious internal intersects

Algorithm for Ray Intersection onto Shape defined by CSG Tree

Tree intersect is constituent node intersect, which depends on:

CSG operators (UNION, INTERSECTION, DIFFERENCE)
comparison of distances to the constituent intersects

Algorithm:

classify constituent intersects Enter/Exit/Miss
determine action combining classifications using six lookup tables for UNION, INTERSECTION and DIFFERENCE operators and which intersect is closer.
actions RETURN_A RETURN_B RETURN_MISS push binary decision onto stack for use at higher level of tree
actions LOOP_A LOOP_B advance t_min for one constituent to get otherside and compare again

Surface-centric CSG (Opticks)

no use of "is this point inside the shape" queries

LOW RESOURCE USAGE : small stack of ~4*float4 isect

Volume-centric CSG (Geant4)

always asking "is this point inside the shape"

CSG Intersection Algorithm : UNION Lookup Table and Examples

+-----------------+-----------------+-----------------+-----------------+       UX1 : UNION FROM INSIDE ONE                     UX2 : UNION FROM OUTSIDE 0,1,2
| UNION           |                 |                 |                 |                                                       UX3 : UNION FROM INSIDE BOTH 3,4,5
|                 | B Enter         | B Exit          | B Miss          |
| A Closer        |                 |                 |                 |                    +-----------------------+                     +-----------------------+
| B Closer        |                 |                 |                 |                    |                     B |                     |                     B |
|                 |                 |                 |                 |         +------------------+               |          +------------------+               |
+-----------------+-----------------+-----------------+-----------------+         | A        |       |               |          | A        |       |               |
|                 |                 |                 |                 |         |          |       |               |          |          |       |               |
| A Enter         |                 |                 |                 |         |          |       |               |          |          |       |               |
|                 | RETURN_A        | LOOP_A          | RETURN_A        |         |   0- - - 1 - - - 2 - - - - - - -[3]     0 -[1]- - - -  2       |               |
|                 | RETURN_B        | RETURN_B        | RETURN_A        |         |          E       X               X          E          E   3 - 4 - - - - - - -[5]
|                 |                 |                 |                 |         |          |       |               |          |          |       X               X
+-----------------+-----------------+-----------------+-----------------+         |          +-------|---------------+          |          +-------|---------------+
|                 |                 |                 |                 |         |                  |                          |                  |
| A Exit          |                 |                 |                 |         |                  |                          |                  |
|                 | RETURN_A        | RETURN_B        | RETURN_A        |         +------------------+                          +------------------+
|                 | LOOP_B          | RETURN_A        | RETURN_A        |
|                 |                 |                 |                 |
+-----------------+-----------------+-----------------+-----------------+       0: origin                                     0: origin
|                 |                 |                 |                 |       1: B Enter                                    1: A Enter
| A Miss          |                 |                 |                 |       2: A Exit                                     2: B Enter
|                 | RETURN_B        | RETURN_B        | RETURN_MISS     |       1,2: B Closer        ==> LOOP_B               1,2: A Closer        ==> RETURN_A
|                 | RETURN_B        | RETURN_B        | RETURN_MISS     |       3: B Exit
|                 |                 |                 |                 |       2,3: A closer        ==> RETURN_B             3: origin
+-----------------+-----------------+-----------------+-----------------+                                                     4: A Exit
                                                                                                                              5: B Exit
Ordering states (Closer,Further)                                                                                              4,5 A Closer         ==> RETURN_B

(A Enter, B Enter)                                                                    UX4 : DISJOINT UNION  0,[1],2
    origin outside both -> return closest
                                                                                        +---------------------+                +---------------------+
(A Exit, B Exit)                                                                        | A                   |                |                   B |
    origin inside both  -> return furthest                                              |                     |                |                     |
                                                                                        |                     |                |                     |
(A Exit, B Enter)                                                                       |                     |                |                     |
    origin inside A                                                                     |                     |                |                     |
                                                                                        |         0- - - - - [1] -  - - - - - -2                     |
    * if A closer return A (means disjoint)                                             |                     X                E                     |
    * if B closer loop B (find otherside of B, then compare again)                      |                     |                |                     |
                                                                                        |                     |                |                     |
                                                                                        |                     |                |                     |
                                                                                        +---------------------+                +---------------------+

                                                                                    0: origin   1: A Exit   2: B Enter    1,2: A Closer => RETURN_A [1]

CSG Intersection Algorithm : INTERSECTION Lookup Table and Examples

+-----------------+-----------------+-----------------+-----------------+        Ordering states (Closer,Further)
| INTERSECTION    |                 |                 |                 |
|                 | B Enter         | B Exit          | B Miss          |        (Enter, Exit) -> Return Closer One : the Enter
| A Closer        |                 |                 |                 |
| B Closer        |                 |                 |                 |        (Exit, Exit) -> Return Closer Exit
|                 |                 |                 |                 |
+-----------------+-----------------+-----------------+-----------------+        (Exit, Enter) -> Loop Closer (the Exit)
|                 |                 |                 |                 |            One might first imagine that having just exited so there will be no otherside ?
| A Enter         |                 |                 |                 |            But that assumes simple convex constituent shapes whereas the alg needs to
|                 | LOOP_A          | RETURN_A        | RETURN_MISS     |            handle less simple shapes like torus or annulus
|                 | LOOP_B          | LOOP_B          | RETURN_MISS     |
|                 |                 |                 |                 |        (Enter, Enter) -> Loop Closer
+-----------------+-----------------+-----------------+-----------------+            Find otherside of Closer Enter
|                 |                 |                 |                 |            (by advancing t_min, intersecting that constituent again, comparing again)
| A Exit          |                 |                 |                 |
|                 | LOOP_A          | RETURN_A        | RETURN_MISS     |
|                 | RETURN_B        | RETURN_B        | RETURN_MISS     |                    +-------------+
|                 |                 |                 |                 |                    | A           |
+-----------------+-----------------+-----------------+-----------------+                    |      +------|-------+      0:Origin
|                 |                 |                 |                 |                    |      |      |     B |      1:A_Enter
| A Miss          |                 |                 |                 |              0- - -1 - - [2]- - -3       |      2:B_Enter
|                 | RETURN_MISS     | RETURN_MISS     | RETURN_MISS     |                    E      E      X       |      (1:A_Enter,2:B_Enter) => A_Closer => LOOP_A => 3
|                 | RETURN_MISS     | RETURN_MISS     | RETURN_MISS     |                    |      |      |       |      3:A Exit
|                 |                 |                 |                 |                    +------|------+       |      (2:B Enter,3:A Exit) => RETURN_B 2:B_Enter
+-----------------+-----------------+-----------------+-----------------+                           |              |
                                                                                                    +--------------+

`XJfixtureConstructionUnbalanced_XY_unbalanced_ok.jpg0`

`XJfixtureConstructionUnbalanced_XY_unbalanced_ok.jpg1`

CSG Compound "List-Nodes" => Much Smaller CSG trees

compound list nodes (references sub-node sequence by subNum subOffset)
greatly reduces size of tree

CSG_CONTIGUOUS Union: similar to G4MultiUnion : n-ary CSG intersection (needs distance sorting)
CSG_DISCONTIGUOUS Union: user guarantees no overlaps
CSG_OVERLAP Intersection: user guarantees overlap, aims to handle general G4Sphere with inner radius, thetacut and phicut

Communicating shape more precisely => better suited alg => less resources => faster

Generalized Opticks CSG into three levels : tree < node < leaf

Three levels avoids recursion in intersect (disallowed by NVIDIA OptiX)

CSG_CONTIGUOUS Union : n-ary (not bin-ary) CSG intersection

https://bitbucket.org/simoncblyth/opticks/src/master/CSG/csg_intersect_node.h intersect_node_contiguous

zeroth pass : find nearest_enter and count exits
zero exits => outside compound => return nearest_enter
first pass : collect enter distances, farthest_exit
order enter indices making enter distances ascend
- n-ary : store, sort enters (cf bin-ary : compare two)
- no tree overheads, but must store+sort distances
2nd pass : loop over enters in distance order
- contiguous requirement : tminAdvanced < farthest_exit
- find Exits for Enters that qualify as contiguous
return farthest_exit that qualifies as contiguous

         +----------------+     +-------------------+                  DISJOINT MUST BE DISQUALIFIED
         |B               |     |D                  |
    +----|----+      +----|-----|----+       +------|----------+             +-----------+
    |A   |    |      |C   |     |    |       |E     |          |             |           |
    |    |    |      |    |     |    |       |      |          |             |           |
    | 0 E1    X2     E3  X4    E5   X6      E7     X8        [X9]           E10         X11
    |    |    |      |    |     |    |       |      |          |             |           |
    |    |    |      |    |     |    |       |      |          |             |           |
    +----|----+      +----|-----|----+       +------|----------+             +-----------+
         |                |     |                   |
         +----------------+     +-------------------+

         E           E          E            E                               E
              X           X          X              X          X                         X

CSG_DISCONTIGUOUS Union : CSG intersection

https://bitbucket.org/simoncblyth/opticks/src/master/CSG/csg_intersect_node.h intersect_node_discontiguous

User guarantees : absolutely no overlapping between constituents

 +-------+          +-------+          +-------+          +-------+         +-------+
 |       |          |       |          |       |          |       |         |       |
 |       |          |       |          |       |          |       |         |       |
 +-------+          +-------+          +-------+          +-------+         +-------+

 +-------+          +-------+          +-------+          +-------+         +-------+
 |       |          |       |          |       |          |       |         |       |
 |       |          |       |          |       |          |       |         |       |
 +-------+          +-------+          +-------+          +-------+         +-------+

=> very simple low resource intersection : closest Enter or Exit
More closely suiting algorithm to geometry => better performance
this can help with "holes" subtracted from another solid : the "holes" usually do not overlap

CSG_OVERLAP Intersection : CSG intersection

https://bitbucket.org/simoncblyth/opticks/src/master/CSG/csg_intersect_node.h intersect_node_overlap

User guarantees : all constituents overlap => ( farthest_enter or nearest_exit ) with all overlap requirement:

farthest_enter < nearest_exit and max(enter_count, exit_count) == num_sub

                                                   1X,2M -> MISS
                                    +----------------1X-------------+
                                    |               /             C |
                                    |              0                |
                          +---------------------------------+       |
                          |         |                     B |       |
               +----------------------------+         0- - -1 - - - 2     2X,1M -> MISS
               | A        |         | . ABC |               X       X
               |          |         | . . . |               |       |
      0 - - - -1- - - - - 2- - - - [3]- - - 4               |       |
               E          E         E . . . X               |       |
               |          |         | . . . |               |       |
               |          |         | .0- -[1]- - - - - - - 2 - - - 3 - -
               |          |         | . . . X               X       X
               |          |         +-------------------------------+
               |          |                 |               |
         0 - - 1 - - - - -2 - - - - - - - - 3 - - - - - - - 4
               E          E   (2E,1M)-> M   X               X   (2X,1M) -> M
               |          +---------------------------------+
               |                            |
               +----------------------------+

Avoid tree overheads for suitable shapes (eg general sphere:inner-radius,phicut,thetacut)

`AltXJfixtureConstruction_XY_internal_spurious.jpg`

`AltXJfixtureConstruction_XY_sorted_enter_fix.jpg`