Opticks replaces Geant4 optical photon simulation with an equivalent implementation that benefits from state-of-the-art GPU ray tracing from NVIDIA OptiX. All optically relevant aspects of Geant4 context must be translated+copied to GPU:
- geometry : solids, structure, material+surface properties
- generation : Cerenkov + Scintillation (using Gensteps from Geant4)
- propagation : Rayleigh scattering, absorption, reemission, boundary
Achieving+maintaining equivalence is time consuming, BUT:
- transformative performance benefits >1000x Geant4
January : Updating to JUNO trunk geometry
- x3 slowdown compared to ~December trunk
- many unintended overlaps : demonstrated with new 2D slice renders
- XJfixtureConstruction : spurious internal intersects + performance bottleneck
- Anti-coincidence fixes do not remove all spurious intersects
- Reveals incompatibility between Opticks CSG intersection and balanced trees
- switch OFF tree balancing avoids issue, but very bad for performance : NEED BETTER SOLUTION
February : Generalize Opticks CSG to handle "list-nodes" : n-ary (not binary) CSG
- attempt to fix balancing (eg postorder preserving balance) and/or changing CSG intersection alg unsuccessful (difficult as problem only with high node count solids)
- balancing was always a hack (not a solution) for handling high node count solids
- conclude that although it may be possible to get balancing to work the best longterm solution is to take the simpler route of avoiding large trees by implementing "list-nodes"
- list-nodes avoid large trees, tree overheads, tree balancing
- CSG_CONTIGUOUS Union (Similar to G4MultiUnion)
- CSG_DISCONTIGUOUS Union
- CSG_OVERLAP Intersection
March : Geometry Optimization + Migrating Sim from Old to New Workflow
| Month-by-month progress notes | https://bitbucket.org/simoncblyth/opticks/src/master/notes/progress.rst |
| Day-by-day commit messages | https://bitbucket.org/simoncblyth/opticks/commits/ |
Majority of recent months dominated by sequence of geometry issues
New Workflow Packages
| Package | Role |
|---|---|
| CSG | GPU/CPU geometry model Inst/Solid/Prim/Node |
| CSG_GGeo | Translates GGeo into CSG |
| GeoChain | Geometry Translation Debug |
| QUDARap | CUDA Simulation |
| CSGOptiX | Geom (CSG) + Sim (QUDARap) |
Design goals:
Non-monolithic flexibility, easier: test + customize
Selection of most important Opticks packages
| Package | Indicative Functionality Examples |
|---|---|
| SysRap | SStr.hh, SSys.hh, NP.hh array |
| BoostRap | less need with C++17 (longterm:remove) |
| NPY | old array, primitives (longterm:remove) |
| OpticksCore | argument handling, central steering |
| GGeo | geometry model, GInstancer geo-factorization |
| ExtG4 | geometry translation from G4 to GGeo |
| CFG4 | Opticks Geant4 comparison eg CRecorder |
| CUDARap | curandState persist (removing) |
| ThrustRap | genstep->photon, photon index (removing) |
| OptiXRap | monolithic pre-7 simulation (removing) |
| G4Opticks | High Level Interface |
Cherry-picking from removed packages into new ones
switch(type){
case RNG_SEQUENCE: rng_sequence(num, ni_tranche_size) ; break ;
case WAVELENGTH_S : wavelength('S', num) ; break ;
case WAVELENGTH_C : wavelength('C', num) ; break ;
case SCINT_PHOTON_P: scint_photon(num); ; break ;
case CERENKOV_PHOTON_K: cerenkov_photon(num, print_id); ; break ;
case CERENKOV_PHOTON_ENPROP_E: cerenkov_photon_enprop(num, print_id); ; break ;
case GENERATE_PHOTON_G: generate_photon(); ; break ;
case BOUNDARY_LOOKUP_ALL_A: boundary_lookup_all() ; break ;
case BOUNDARY_LOOKUP_LINE_WATER_W: boundary_lookup_line("Water", x0, x1, nx) ; break ;
case BOUNDARY_LOOKUP_LINE_LS_L: boundary_lookup_line("LS", x0, x1, nx) ; break ;
case PROP_LOOKUP_Y: prop_lookup(-1, -1.f,16.f,1701) ; break ;
case FILL_STATE_0: fill_state(0) ; break ;
case RAYLEIGH_SCATTER_ALIGN: photon_launch_generate(num, type) ; break ;
case PROPAGATE_TO_BOUNDARY: photon_launch_generate(8, type) ; break ;
case PROPAGATE_AT_SURFACE: photon_launch_generate(8, type) ; break ;
case PROPAGATE_AT_BOUNDARY:
case PROPAGATE_AT_BOUNDARY_NORMAL_INCIDENCE:
photon_launch_generate(num, type) ; break ;
case HEMISPHERE_S_POLARIZED:
case HEMISPHERE_P_POLARIZED:
case HEMISPHERE_X_POLARIZED:
photon_launch_generate(num, type) ; break ;
case PROPAGATE_AT_BOUNDARY_S_POLARIZED:
case PROPAGATE_AT_BOUNDARY_P_POLARIZED:
case PROPAGATE_AT_BOUNDARY_X_POLARIZED:
photon_launch_mutate(num, type) ; break ;
}
Provide photon arrays to compare against Opticks
Mocking just enough Geant4 environment to invoke PostStepDoIt of process
Common Inputs to both Opticks and Geant4 tests:
Small amount of Geant4 (and Opticks) code being run..
1M photons all incident at origin (red:momentum, grey:polarization)
S-polarization direction transverse to plane of incidence (and momentum)
1M photons all incident at origin (red:momentum, grey:polarization)
P-polarization direction in plane of incidence, transverse to momentum
1M photons all incident at origin (red:momentum, grey:polarization)
"X"-polarized : equal admixture of S and P polarized
epsilon:BoundaryStandalone blyth$ TEST=propagate_at_boundary_s_polarized NUM=1000000 ./G4OpBoundaryProcessTest.sh cf a.shape (1000000, 4, 4) : /tmp/blyth/opticks/QSimTest/propagate_at_boundary_s_polarized/p.npy b.shape (1000000, 4, 4) : /tmp/blyth/opticks/G4OpBoundaryProcessTest/propagate_at_boundary_s_polarized/p.npy a_flag=a[:,3,3].view(np.uint32) : [2048 2048 2048 ... 2048 2048 1024] b_flag=b[:,3,3].view(np.uint32) : [2048 2048 2048 ... 2048 2048 1024] w_flag=np.where( a_flag != b_flag ) : (array([209411]),) ## 1 REFLECT/TRANSMIT flag mismatch ua_flag=np.unique(a_flag, return_counts=True) : (array([1024, 2048], dtype=uint32), array([ 45024, 954976])) ub_flag=np.unique(b_flag, return_counts=True) : (array([1024, 2048], dtype=uint32), array([ 45025, 954975])) a_TransCoeff=a[:,1,3] : [0.784 0.799 0.588 ... 0.853 0.481 0.959] b_TransCoeff=b[:,1,3] : [0.784 0.799 0.588 ... 0.853 0.481 0.959] w_TransCoeff=np.where( np.abs( a_TransCoeff - b_TransCoeff) > 1e-6 ) : (array([], dtype=int64),) a_flat=a[:,0,3] : [0.438 0.46 0.25 ... 0.557 0.184 0.992] b_flat=b[:,0,3] : [0.438 0.46 0.25 ... 0.557 0.184 0.992] w_flat=np.where(a_flat != b_flat) : (array([], dtype=int64),) w_ab0=np.where( np.abs(a[:,0] - b[:,0]) > 1e-6 ) : (array([], dtype=int64), array([], dtype=int64)) w_ab1=np.where( np.abs(a[:,1] - b[:,1]) > 1e-6 ) : (array([209411, 209411, 209411]), array([0, 1, 2])) ## diff direction w_ab2=np.where( np.abs(a[:,2] - b[:,2]) > 1e-6 ) : (array([209411, 209411]), array([0, 1])) ## diff polarization w_ab3=np.where( np.abs(a[:,3] - b[:,3]) > 1e-6 ) : (array([], dtype=int64), array([], dtype=int64)) In [1]: a[209411] Out[1]: u_reflect > TransCoeff array([[ -0.136, -0.264, -0.955, 0.955], In [3]: a[209411,0,3] > a[209411,1,3] [ -0.09 , -0.176, -0.98 , 0.955], Out[3]: False BOUNDARY_TRANSMIT [ 0.89 , -0.456, -0. , 500. ], [ 0.89 , -0.456, 0. , 0. ]], dtype=float32) In [2]: b[209411] Out[2]: array([[ -0.136, -0.264, -0.955, 0.955], In [4]: b[209411,0,3] > b[209411,1,3] [ -0.136, -0.264, 0.955, 0.955], Out[4]: True BOUNDARY_REFLECT $ [ -0.89 , 0.456, -0. , 500. ], [ 0.89 , -0.456, 0. , 0. ]], dtype=float32) One-in-a-million on TransCoeff REFLECT/TRANSMIT cut edge In [5]: a[209411,0,3] == b[209411,0,3] In [6]: a[209411,1,3] - b[209411,1,3] Out[5]: True Out[6]: 5.9604645e-08
This Dec->Jan slowdown motivated development of dynamic Prim selection
cd ~/opticks/CSGOptiX ./grabsnap.sh --reverse --selectspec not_elv_t --candle t103 --rst
ELV including single Prim
JUNO Geometry (January trunk) has 140 solids
| idx | -e | time(s) | relative | enabled geometry description 41c046fe |
|---|---|---|---|---|
| 0 | 103 | 0.0225 | 3.0917 | ONLY: solidXJfixture |
| 1 | 112 | 0.0051 | 0.7009 | ONLY: NNVTMCPPMTTail |
| 2 | 105 | 0.0027 | 0.3644 | ONLY: HamamatsuR12860Tail |
| 3 | 111 | 0.0025 | 0.3422 | ONLY: NNVTMCPPMTsMask |
| 4 | 117 | 0.0022 | 0.3051 | ONLY: NNVTMCPPMTsMask_virtual |
| 5 | 0 | 0.0018 | 0.2496 | ONLY: sTopRock_domeAir |
| 6 | 1 | 0.0018 | 0.2474 | ONLY: sTopRock_dome |
| 7 | 94 | 0.0018 | 0.2406 | ONLY: sTarget |
| 8 | 123 | 0.0017 | 0.2392 | ONLY: sChimneyAcrylic |
| 9 | 127 | 0.0017 | 0.2378 | ONLY: sInnerWater |
| 10 | 128 | 0.0017 | 0.2366 | ONLY: sReflectorInCD |
| 11 | 95 | 0.0017 | 0.2363 | ONLY: sAcrylic |
| 12 | 14 | 0.0016 | 0.2160 | ONLY: sAirTT |
| 13 | 15 | 0.0016 | 0.2142 | ONLY: sExpHall |
| 14 | 109 | 0.0015 | 0.2025 | ONLY: HamamatsuR12860_PMT_20inch_pmt_solid_1_4 |
| 15 | 104 | 0.0013 | 0.1847 | ONLY: HamamatsuR12860sMask |
SNAP_LIMIT=16 ./grabsnap.sh --reverse --selectspec not_elv_t --jpg --out ./image_grid.sh ; ./pub1.sh path-returned-by-above
cd ~/opticks/CSGOptiX ./grabsnap.sh --selectspec only_elv_t --candle t103 --rst
ELV excluding single Prim
| idx | -e | time(s) | relative | enabled geometry description 41c046fe |
|---|---|---|---|---|
| 0 | t103 | 0.0073 | 1.0000 | EXCL: solidXJfixture |
| 1 | t1 | 0.0118 | 1.6169 | EXCL: sTopRock_dome |
| 2 | t0 | 0.0119 | 1.6306 | EXCL: sTopRock_domeAir |
| 3 | t112 | 0.0119 | 1.6311 | EXCL: NNVTMCPPMTTail |
| 4 | t | 0.0119 | 1.6317 | ALL |
| 5 | t127 | 0.0119 | 1.6393 | EXCL: sInnerWater |
| 6 | t97 | 0.0120 | 1.6488 | EXCL: sStrut |
| 7 | t29 | 0.0121 | 1.6558 | EXCL: GLw2.equ_bt01_FlangeI_Web_FlangeII |
| 8 | t136 | 0.0121 | 1.6563 | EXCL: sPoolLining |
| 9 | t95 | 0.0121 | 1.6576 | EXCL: sAcrylic |
| 10 | t128 | 0.0121 | 1.6576 | EXCL: sReflectorInCD |
| 11 | t105 | 0.0121 | 1.6657 | EXCL: HamamatsuR12860Tail |
| 12 | t60 | 0.0121 | 1.6670 | EXCL: GLb3.bt09_FlangeI_Web_FlangeII |
| 13 | t135 | 0.0121 | 1.6673 | EXCL: sOuterWaterPool |
| 14 | t104 | 0.0122 | 1.6684 | EXCL: HamamatsuR12860sMask |
| 15 | t137 | 0.0122 | 1.6687 | EXCL: sBottomRock |
SNAP_LIMIT=16 ./grabsnap.sh --selectspec only_elv_t --jpg --out ./image_grid.sh ; ./pub1.sh path-returned-by-above
Experience suggests that predictions are impossible due to:
BUT with caveat of excluding those
Expect completed new workflow by end of April, what remains:
Towards end of April
performance so far always limited by geometry
Geometry Optimization : Measure Simulation+Render performance as make changes:
Expect geometry optimization needed to maximize performance even with XJfixtureConstruction skipped
Production Optimization : New Territory, Experience needed to direct expts
Following slides give background on
EYE=3,0.5,-1 UP=0,0,1 ZOOM=4 TMIN=0 PUB=1 ./cxr_geochain.sh
EYE=-3,0.5,-1 UP=0,0,1 ZOOM=4 TMIN=0 PUB=otherside ./cxr_geochain.sh
EYE=-3,0.5,1 UP=0,0,1 ZOOM=4 TMIN=0 PUB=upper ./cxr_geochain.sh
EYE=-3,0.5,0.8 UP=0,0,1 ZOOM=4 TMIN=2.7 CAM=0 PUB=speckle ./cxr_geochain.sh
NODE:13 PRIM:7
Uni
U
9
Uni Uni
U U
7 11
Uni Box Box Box
U U U U
5 8 10 12
Uni Box
U U
3 6
Uni Box
U U
1 4
Tub Box
U U
0 2
0 0.00 0.00 50.00 -50.00 0.00 0.00 zdelta
6 11.50 11.50 65.00 -35.00 40.00 65.00 az1
-6 -11.50 -11.50 35.00 -65.00 -40.00 -65.00 az0
un
1
0.00
-0.00
un un
2 3
0.00 0.00
-0.00 -0.00
un un in un
4 5 6 7
0.00 0.00 0.00 0.00
-0.00 -0.00 -0.00 -0.00
bo bo bo bo cy !cy bo bo
8 9 10 11 12 13 14 15
6.50 6.50 6.50 6.50 6.50 6.57 -16.50 -6.50
-6.50 -6.50 -6.50 -6.50 -6.50 -6.57 -33.50 -16.50
So many spurious intersects : Easier to start over with fewer constituent solids
Two types of spurious
White internal intersects : sd ~ 0
Yellow internal intersects : abs(sd) > epsilon
Balanced tree spurious intersects
Fix attempts, so far unsuccessful
Single hit CSG intersection Alg
Postorder sequence binary choice of intersects:
Sometimes alg needs to "LOOP" to find "otherside"
Constituent Intersect Classification
Tree intersect is constituent node intersect, which depends on:
Algorithm:
no use of "is this point inside the shape" queries
+-----------------+-----------------+-----------------+-----------------+ UX1 : UNION FROM INSIDE ONE UX2 : UNION FROM OUTSIDE 0,1,2
| UNION | | | | UX3 : UNION FROM INSIDE BOTH 3,4,5
| | B Enter | B Exit | B Miss |
| A Closer | | | | +-----------------------+ +-----------------------+
| B Closer | | | | | B | | B |
| | | | | +------------------+ | +------------------+ |
+-----------------+-----------------+-----------------+-----------------+ | A | | | | A | | |
| | | | | | | | | | | | |
| A Enter | | | | | | | | | | | |
| | RETURN_A | LOOP_A | RETURN_A | | 0- - - 1 - - - 2 - - - - - - -[3] 0 -[1]- - - - 2 | |
| | RETURN_B | RETURN_B | RETURN_A | | E X X E E 3 - 4 - - - - - - -[5]
| | | | | | | | | | | X X
+-----------------+-----------------+-----------------+-----------------+ | +-------|---------------+ | +-------|---------------+
| | | | | | | | |
| A Exit | | | | | | | |
| | RETURN_A | RETURN_B | RETURN_A | +------------------+ +------------------+
| | LOOP_B | RETURN_A | RETURN_A |
| | | | |
+-----------------+-----------------+-----------------+-----------------+ 0: origin 0: origin
| | | | | 1: B Enter 1: A Enter
| A Miss | | | | 2: A Exit 2: B Enter
| | RETURN_B | RETURN_B | RETURN_MISS | 1,2: B Closer ==> LOOP_B 1,2: A Closer ==> RETURN_A
| | RETURN_B | RETURN_B | RETURN_MISS | 3: B Exit
| | | | | 2,3: A closer ==> RETURN_B 3: origin
+-----------------+-----------------+-----------------+-----------------+ 4: A Exit
5: B Exit
Ordering states (Closer,Further) 4,5 A Closer ==> RETURN_B
(A Enter, B Enter) UX4 : DISJOINT UNION 0,[1],2
origin outside both -> return closest
+---------------------+ +---------------------+
(A Exit, B Exit) | A | | B |
origin inside both -> return furthest | | | |
| | | |
(A Exit, B Enter) | | | |
origin inside A | | | |
| 0- - - - - [1] - - - - - - -2 |
* if A closer return A (means disjoint) | X E |
* if B closer loop B (find otherside of B, then compare again) | | | |
| | | |
| | | |
+---------------------+ +---------------------+
0: origin 1: A Exit 2: B Enter 1,2: A Closer => RETURN_A [1]
+-----------------+-----------------+-----------------+-----------------+ Ordering states (Closer,Further)
| INTERSECTION | | | |
| | B Enter | B Exit | B Miss | (Enter, Exit) -> Return Closer One : the Enter
| A Closer | | | |
| B Closer | | | | (Exit, Exit) -> Return Closer Exit
| | | | |
+-----------------+-----------------+-----------------+-----------------+ (Exit, Enter) -> Loop Closer (the Exit)
| | | | | One might first imagine that having just exited so there will be no otherside ?
| A Enter | | | | But that assumes simple convex constituent shapes whereas the alg needs to
| | LOOP_A | RETURN_A | RETURN_MISS | handle less simple shapes like torus or annulus
| | LOOP_B | LOOP_B | RETURN_MISS |
| | | | | (Enter, Enter) -> Loop Closer
+-----------------+-----------------+-----------------+-----------------+ Find otherside of Closer Enter
| | | | | (by advancing t_min, intersecting that constituent again, comparing again)
| A Exit | | | |
| | LOOP_A | RETURN_A | RETURN_MISS |
| | RETURN_B | RETURN_B | RETURN_MISS | +-------------+
| | | | | | A |
+-----------------+-----------------+-----------------+-----------------+ | +------|-------+ 0:Origin
| | | | | | | | B | 1:A_Enter
| A Miss | | | | 0- - -1 - - [2]- - -3 | 2:B_Enter
| | RETURN_MISS | RETURN_MISS | RETURN_MISS | E E X | (1:A_Enter,2:B_Enter) => A_Closer => LOOP_A => 3
| | RETURN_MISS | RETURN_MISS | RETURN_MISS | | | | | 3:A Exit
| | | | | +------|------+ | (2:B Enter,3:A Exit) => RETURN_B 2:B_Enter
+-----------------+-----------------+-----------------+-----------------+ | |
+--------------+
Proceed with Unbalanced Trees ?
Speedup Unbalanced ?
Communicating shape more precisely => better suited alg => less resources => faster
Generalized Opticks CSG into three levels : tree < node < leaf
Three levels avoids recursion in intersect (disallowed by NVIDIA OptiX)
Alg works : but many TODOs
+----------------+ +-------------------+ DISJOINT MUST BE DISQUALIFIED
|B | |D |
+----|----+ +----|-----|----+ +------|----------+ +-----------+
|A | | |C | | | |E | | | |
| | | | | | | | | | | |
| 0 E1 X2 E3 X4 E5 X6 E7 X8 [X9] E10 X11
| | | | | | | | | | | |
| | | | | | | | | | | |
+----|----+ +----|-----|----+ +------|----------+ +-----------+
| | | |
+----------------+ +-------------------+
E E E E E
X X X X X X
User guarantees : absolutely no overlapping between constituents
+-------+ +-------+ +-------+ +-------+ +-------+ | | | | | | | | | | | | | | | | | | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | | | | | | | | | | | | | | | | | | +-------+ +-------+ +-------+ +-------+ +-------+
User guarantees : all constituents overlap => ( farthest_enter or nearest_exit ) with all overlap requirement:
farthest_enter < nearest_exit and max(enter_count, exit_count) == num_sub
1X,2M -> MISS
+----------------1X-------------+
| / C |
| 0 |
+---------------------------------+ |
| | B | |
+----------------------------+ 0- - -1 - - - 2 2X,1M -> MISS
| A | | . ABC | X X
| | | . . . | | |
0 - - - -1- - - - - 2- - - - [3]- - - 4 | |
E E E . . . X | |
| | | . . . | | |
| | | .0- -[1]- - - - - - - 2 - - - 3 - -
| | | . . . X X X
| | +-------------------------------+
| | | |
0 - - 1 - - - - -2 - - - - - - - - 3 - - - - - - - 4
E E (2E,1M)-> M X X (2X,1M) -> M
| +---------------------------------+
| |
+----------------------------+
Avoid tree overheads for suitable shapes (eg general sphere:inner-radius,phicut,thetacut)