Opticks replaces Geant4 optical photon simulation with an equivalent implementation that benefits from state-of-the-art GPU ray tracing from NVIDIA OptiX. All optically relevant aspects of Geant4 context must be translated+copied to GPU:
- geometry : solids, structure, material+surface properties
- generation : Cerenkov + Scintillation (using Gensteps from Geant4)
- propagation : Rayleigh scattering, absorption, reemission, boundary
Achieving+maintaining equivalence is time consuming, BUT:
- transformative performance benefits >1000x Geant4
January : Updating to JUNO trunk geometry
- x3 slowdown compared to ~December trunk
- many unintended overlaps : demonstrated with new 2D slice renders
- XJfixtureConstruction : spurious internal intersects + performance bottleneck
- Anti-coincidence fixes do not remove all spurious intersects
- Reveals incompatibility between Opticks CSG intersection and balanced trees
- switch OFF tree balancing avoids issue, but very bad for performance : NEED BETTER SOLUTION
February : Generalize Opticks CSG to handle "list-nodes" : n-ary (not binary) CSG
- attempt to fix balancing (eg postorder preserving balance) and/or changing CSG intersection alg unsuccessful (difficult as problem only with high node count solids)
- balancing was always a hack (not a solution) for handling high node count solids
- conclude that although it may be possible to get balancing to work the best longterm solution is to take the simpler route of avoiding large trees by implementing "list-nodes"
- list-nodes avoid large trees, tree overheads, tree balancing
- CSG_CONTIGUOUS Union (Similar to G4MultiUnion)
- CSG_DISCONTIGUOUS Union
- CSG_OVERLAP Intersection
March : Geometry Optimization + Migrating Sim from Old to New Workflow
Month-by-month progress notes | https://bitbucket.org/simoncblyth/opticks/src/master/notes/progress.rst |
Day-by-day commit messages | https://bitbucket.org/simoncblyth/opticks/commits/ |
Majority of recent months dominated by sequence of geometry issues
New Workflow Packages
Package | Role |
---|---|
CSG | GPU/CPU geometry model Inst/Solid/Prim/Node |
CSG_GGeo | Translates GGeo into CSG |
GeoChain | Geometry Translation Debug |
QUDARap | CUDA Simulation |
CSGOptiX | Geom (CSG) + Sim (QUDARap) |
Design goals:
Non-monolithic flexibility, easier: test + customize
Selection of most important Opticks packages
Package | Indicative Functionality Examples |
---|---|
SysRap | SStr.hh, SSys.hh, NP.hh array |
BoostRap | less need with C++17 (longterm:remove) |
NPY | old array, primitives (longterm:remove) |
OpticksCore | argument handling, central steering |
GGeo | geometry model, GInstancer geo-factorization |
ExtG4 | geometry translation from G4 to GGeo |
CFG4 | Opticks Geant4 comparison eg CRecorder |
CUDARap | curandState persist (removing) |
ThrustRap | genstep->photon, photon index (removing) |
OptiXRap | monolithic pre-7 simulation (removing) |
G4Opticks | High Level Interface |
Cherry-picking from removed packages into new ones
switch(type){ case RNG_SEQUENCE: rng_sequence(num, ni_tranche_size) ; break ; case WAVELENGTH_S : wavelength('S', num) ; break ; case WAVELENGTH_C : wavelength('C', num) ; break ; case SCINT_PHOTON_P: scint_photon(num); ; break ; case CERENKOV_PHOTON_K: cerenkov_photon(num, print_id); ; break ; case CERENKOV_PHOTON_ENPROP_E: cerenkov_photon_enprop(num, print_id); ; break ; case GENERATE_PHOTON_G: generate_photon(); ; break ; case BOUNDARY_LOOKUP_ALL_A: boundary_lookup_all() ; break ; case BOUNDARY_LOOKUP_LINE_WATER_W: boundary_lookup_line("Water", x0, x1, nx) ; break ; case BOUNDARY_LOOKUP_LINE_LS_L: boundary_lookup_line("LS", x0, x1, nx) ; break ; case PROP_LOOKUP_Y: prop_lookup(-1, -1.f,16.f,1701) ; break ; case FILL_STATE_0: fill_state(0) ; break ; case RAYLEIGH_SCATTER_ALIGN: photon_launch_generate(num, type) ; break ; case PROPAGATE_TO_BOUNDARY: photon_launch_generate(8, type) ; break ; case PROPAGATE_AT_SURFACE: photon_launch_generate(8, type) ; break ; case PROPAGATE_AT_BOUNDARY: case PROPAGATE_AT_BOUNDARY_NORMAL_INCIDENCE: photon_launch_generate(num, type) ; break ; case HEMISPHERE_S_POLARIZED: case HEMISPHERE_P_POLARIZED: case HEMISPHERE_X_POLARIZED: photon_launch_generate(num, type) ; break ; case PROPAGATE_AT_BOUNDARY_S_POLARIZED: case PROPAGATE_AT_BOUNDARY_P_POLARIZED: case PROPAGATE_AT_BOUNDARY_X_POLARIZED: photon_launch_mutate(num, type) ; break ; }
Provide photon arrays to compare against Opticks
Mocking just enough Geant4 environment to invoke PostStepDoIt of process
Common Inputs to both Opticks and Geant4 tests:
Small amount of Geant4 (and Opticks) code being run..
1M photons all incident at origin (red:momentum, grey:polarization)
S-polarization direction transverse to plane of incidence (and momentum)
1M photons all incident at origin (red:momentum, grey:polarization)
P-polarization direction in plane of incidence, transverse to momentum
1M photons all incident at origin (red:momentum, grey:polarization)
"X"-polarized : equal admixture of S and P polarized
epsilon:BoundaryStandalone blyth$ TEST=propagate_at_boundary_s_polarized NUM=1000000 ./G4OpBoundaryProcessTest.sh cf a.shape (1000000, 4, 4) : /tmp/blyth/opticks/QSimTest/propagate_at_boundary_s_polarized/p.npy b.shape (1000000, 4, 4) : /tmp/blyth/opticks/G4OpBoundaryProcessTest/propagate_at_boundary_s_polarized/p.npy a_flag=a[:,3,3].view(np.uint32) : [2048 2048 2048 ... 2048 2048 1024] b_flag=b[:,3,3].view(np.uint32) : [2048 2048 2048 ... 2048 2048 1024] w_flag=np.where( a_flag != b_flag ) : (array([209411]),) ## 1 REFLECT/TRANSMIT flag mismatch ua_flag=np.unique(a_flag, return_counts=True) : (array([1024, 2048], dtype=uint32), array([ 45024, 954976])) ub_flag=np.unique(b_flag, return_counts=True) : (array([1024, 2048], dtype=uint32), array([ 45025, 954975])) a_TransCoeff=a[:,1,3] : [0.784 0.799 0.588 ... 0.853 0.481 0.959] b_TransCoeff=b[:,1,3] : [0.784 0.799 0.588 ... 0.853 0.481 0.959] w_TransCoeff=np.where( np.abs( a_TransCoeff - b_TransCoeff) > 1e-6 ) : (array([], dtype=int64),) a_flat=a[:,0,3] : [0.438 0.46 0.25 ... 0.557 0.184 0.992] b_flat=b[:,0,3] : [0.438 0.46 0.25 ... 0.557 0.184 0.992] w_flat=np.where(a_flat != b_flat) : (array([], dtype=int64),) w_ab0=np.where( np.abs(a[:,0] - b[:,0]) > 1e-6 ) : (array([], dtype=int64), array([], dtype=int64)) w_ab1=np.where( np.abs(a[:,1] - b[:,1]) > 1e-6 ) : (array([209411, 209411, 209411]), array([0, 1, 2])) ## diff direction w_ab2=np.where( np.abs(a[:,2] - b[:,2]) > 1e-6 ) : (array([209411, 209411]), array([0, 1])) ## diff polarization w_ab3=np.where( np.abs(a[:,3] - b[:,3]) > 1e-6 ) : (array([], dtype=int64), array([], dtype=int64)) In [1]: a[209411] Out[1]: u_reflect > TransCoeff array([[ -0.136, -0.264, -0.955, 0.955], In [3]: a[209411,0,3] > a[209411,1,3] [ -0.09 , -0.176, -0.98 , 0.955], Out[3]: False BOUNDARY_TRANSMIT [ 0.89 , -0.456, -0. , 500. ], [ 0.89 , -0.456, 0. , 0. ]], dtype=float32) In [2]: b[209411] Out[2]: array([[ -0.136, -0.264, -0.955, 0.955], In [4]: b[209411,0,3] > b[209411,1,3] [ -0.136, -0.264, 0.955, 0.955], Out[4]: True BOUNDARY_REFLECT $ [ -0.89 , 0.456, -0. , 500. ], [ 0.89 , -0.456, 0. , 0. ]], dtype=float32) One-in-a-million on TransCoeff REFLECT/TRANSMIT cut edge In [5]: a[209411,0,3] == b[209411,0,3] In [6]: a[209411,1,3] - b[209411,1,3] Out[5]: True Out[6]: 5.9604645e-08
This Dec->Jan slowdown motivated development of dynamic Prim selection
cd ~/opticks/CSGOptiX ./grabsnap.sh --reverse --selectspec not_elv_t --candle t103 --rst
ELV including single Prim
JUNO Geometry (January trunk) has 140 solids
idx | -e | time(s) | relative | enabled geometry description 41c046fe |
---|---|---|---|---|
0 | 103 | 0.0225 | 3.0917 | ONLY: solidXJfixture |
1 | 112 | 0.0051 | 0.7009 | ONLY: NNVTMCPPMTTail |
2 | 105 | 0.0027 | 0.3644 | ONLY: HamamatsuR12860Tail |
3 | 111 | 0.0025 | 0.3422 | ONLY: NNVTMCPPMTsMask |
4 | 117 | 0.0022 | 0.3051 | ONLY: NNVTMCPPMTsMask_virtual |
5 | 0 | 0.0018 | 0.2496 | ONLY: sTopRock_domeAir |
6 | 1 | 0.0018 | 0.2474 | ONLY: sTopRock_dome |
7 | 94 | 0.0018 | 0.2406 | ONLY: sTarget |
8 | 123 | 0.0017 | 0.2392 | ONLY: sChimneyAcrylic |
9 | 127 | 0.0017 | 0.2378 | ONLY: sInnerWater |
10 | 128 | 0.0017 | 0.2366 | ONLY: sReflectorInCD |
11 | 95 | 0.0017 | 0.2363 | ONLY: sAcrylic |
12 | 14 | 0.0016 | 0.2160 | ONLY: sAirTT |
13 | 15 | 0.0016 | 0.2142 | ONLY: sExpHall |
14 | 109 | 0.0015 | 0.2025 | ONLY: HamamatsuR12860_PMT_20inch_pmt_solid_1_4 |
15 | 104 | 0.0013 | 0.1847 | ONLY: HamamatsuR12860sMask |
SNAP_LIMIT=16 ./grabsnap.sh --reverse --selectspec not_elv_t --jpg --out ./image_grid.sh ; ./pub1.sh path-returned-by-above
cd ~/opticks/CSGOptiX ./grabsnap.sh --selectspec only_elv_t --candle t103 --rst
ELV excluding single Prim
idx | -e | time(s) | relative | enabled geometry description 41c046fe |
---|---|---|---|---|
0 | t103 | 0.0073 | 1.0000 | EXCL: solidXJfixture |
1 | t1 | 0.0118 | 1.6169 | EXCL: sTopRock_dome |
2 | t0 | 0.0119 | 1.6306 | EXCL: sTopRock_domeAir |
3 | t112 | 0.0119 | 1.6311 | EXCL: NNVTMCPPMTTail |
4 | t | 0.0119 | 1.6317 | ALL |
5 | t127 | 0.0119 | 1.6393 | EXCL: sInnerWater |
6 | t97 | 0.0120 | 1.6488 | EXCL: sStrut |
7 | t29 | 0.0121 | 1.6558 | EXCL: GLw2.equ_bt01_FlangeI_Web_FlangeII |
8 | t136 | 0.0121 | 1.6563 | EXCL: sPoolLining |
9 | t95 | 0.0121 | 1.6576 | EXCL: sAcrylic |
10 | t128 | 0.0121 | 1.6576 | EXCL: sReflectorInCD |
11 | t105 | 0.0121 | 1.6657 | EXCL: HamamatsuR12860Tail |
12 | t60 | 0.0121 | 1.6670 | EXCL: GLb3.bt09_FlangeI_Web_FlangeII |
13 | t135 | 0.0121 | 1.6673 | EXCL: sOuterWaterPool |
14 | t104 | 0.0122 | 1.6684 | EXCL: HamamatsuR12860sMask |
15 | t137 | 0.0122 | 1.6687 | EXCL: sBottomRock |
SNAP_LIMIT=16 ./grabsnap.sh --selectspec only_elv_t --jpg --out ./image_grid.sh ; ./pub1.sh path-returned-by-above
Experience suggests that predictions are impossible due to:
BUT with caveat of excluding those
Expect completed new workflow by end of April, what remains:
Towards end of April
performance so far always limited by geometry
Geometry Optimization : Measure Simulation+Render performance as make changes:
Expect geometry optimization needed to maximize performance even with XJfixtureConstruction skipped
Production Optimization : New Territory, Experience needed to direct expts
Following slides give background on
EYE=3,0.5,-1 UP=0,0,1 ZOOM=4 TMIN=0 PUB=1 ./cxr_geochain.sh
EYE=-3,0.5,-1 UP=0,0,1 ZOOM=4 TMIN=0 PUB=otherside ./cxr_geochain.sh
EYE=-3,0.5,1 UP=0,0,1 ZOOM=4 TMIN=0 PUB=upper ./cxr_geochain.sh
EYE=-3,0.5,0.8 UP=0,0,1 ZOOM=4 TMIN=2.7 CAM=0 PUB=speckle ./cxr_geochain.sh
NODE:13 PRIM:7 Uni U 9 Uni Uni U U 7 11 Uni Box Box Box U U U U 5 8 10 12 Uni Box U U 3 6 Uni Box U U 1 4 Tub Box U U 0 2 0 0.00 0.00 50.00 -50.00 0.00 0.00 zdelta 6 11.50 11.50 65.00 -35.00 40.00 65.00 az1 -6 -11.50 -11.50 35.00 -65.00 -40.00 -65.00 az0
un 1 0.00 -0.00 un un 2 3 0.00 0.00 -0.00 -0.00 un un in un 4 5 6 7 0.00 0.00 0.00 0.00 -0.00 -0.00 -0.00 -0.00 bo bo bo bo cy !cy bo bo 8 9 10 11 12 13 14 15 6.50 6.50 6.50 6.50 6.50 6.57 -16.50 -6.50 -6.50 -6.50 -6.50 -6.50 -6.50 -6.57 -33.50 -16.50
So many spurious intersects : Easier to start over with fewer constituent solids
Two types of spurious
White internal intersects : sd ~ 0
Yellow internal intersects : abs(sd) > epsilon
Balanced tree spurious intersects
Fix attempts, so far unsuccessful
Single hit CSG intersection Alg
Postorder sequence binary choice of intersects:
Sometimes alg needs to "LOOP" to find "otherside"
Constituent Intersect Classification
Tree intersect is constituent node intersect, which depends on:
Algorithm:
no use of "is this point inside the shape" queries
+-----------------+-----------------+-----------------+-----------------+ UX1 : UNION FROM INSIDE ONE UX2 : UNION FROM OUTSIDE 0,1,2 | UNION | | | | UX3 : UNION FROM INSIDE BOTH 3,4,5 | | B Enter | B Exit | B Miss | | A Closer | | | | +-----------------------+ +-----------------------+ | B Closer | | | | | B | | B | | | | | | +------------------+ | +------------------+ | +-----------------+-----------------+-----------------+-----------------+ | A | | | | A | | | | | | | | | | | | | | | | | A Enter | | | | | | | | | | | | | | RETURN_A | LOOP_A | RETURN_A | | 0- - - 1 - - - 2 - - - - - - -[3] 0 -[1]- - - - 2 | | | | RETURN_B | RETURN_B | RETURN_A | | E X X E E 3 - 4 - - - - - - -[5] | | | | | | | | | | | X X +-----------------+-----------------+-----------------+-----------------+ | +-------|---------------+ | +-------|---------------+ | | | | | | | | | | A Exit | | | | | | | | | | RETURN_A | RETURN_B | RETURN_A | +------------------+ +------------------+ | | LOOP_B | RETURN_A | RETURN_A | | | | | | +-----------------+-----------------+-----------------+-----------------+ 0: origin 0: origin | | | | | 1: B Enter 1: A Enter | A Miss | | | | 2: A Exit 2: B Enter | | RETURN_B | RETURN_B | RETURN_MISS | 1,2: B Closer ==> LOOP_B 1,2: A Closer ==> RETURN_A | | RETURN_B | RETURN_B | RETURN_MISS | 3: B Exit | | | | | 2,3: A closer ==> RETURN_B 3: origin +-----------------+-----------------+-----------------+-----------------+ 4: A Exit 5: B Exit Ordering states (Closer,Further) 4,5 A Closer ==> RETURN_B (A Enter, B Enter) UX4 : DISJOINT UNION 0,[1],2 origin outside both -> return closest +---------------------+ +---------------------+ (A Exit, B Exit) | A | | B | origin inside both -> return furthest | | | | | | | | (A Exit, B Enter) | | | | origin inside A | | | | | 0- - - - - [1] - - - - - - -2 | * if A closer return A (means disjoint) | X E | * if B closer loop B (find otherside of B, then compare again) | | | | | | | | | | | | +---------------------+ +---------------------+ 0: origin 1: A Exit 2: B Enter 1,2: A Closer => RETURN_A [1]
+-----------------+-----------------+-----------------+-----------------+ Ordering states (Closer,Further) | INTERSECTION | | | | | | B Enter | B Exit | B Miss | (Enter, Exit) -> Return Closer One : the Enter | A Closer | | | | | B Closer | | | | (Exit, Exit) -> Return Closer Exit | | | | | +-----------------+-----------------+-----------------+-----------------+ (Exit, Enter) -> Loop Closer (the Exit) | | | | | One might first imagine that having just exited so there will be no otherside ? | A Enter | | | | But that assumes simple convex constituent shapes whereas the alg needs to | | LOOP_A | RETURN_A | RETURN_MISS | handle less simple shapes like torus or annulus | | LOOP_B | LOOP_B | RETURN_MISS | | | | | | (Enter, Enter) -> Loop Closer +-----------------+-----------------+-----------------+-----------------+ Find otherside of Closer Enter | | | | | (by advancing t_min, intersecting that constituent again, comparing again) | A Exit | | | | | | LOOP_A | RETURN_A | RETURN_MISS | | | RETURN_B | RETURN_B | RETURN_MISS | +-------------+ | | | | | | A | +-----------------+-----------------+-----------------+-----------------+ | +------|-------+ 0:Origin | | | | | | | | B | 1:A_Enter | A Miss | | | | 0- - -1 - - [2]- - -3 | 2:B_Enter | | RETURN_MISS | RETURN_MISS | RETURN_MISS | E E X | (1:A_Enter,2:B_Enter) => A_Closer => LOOP_A => 3 | | RETURN_MISS | RETURN_MISS | RETURN_MISS | | | | | 3:A Exit | | | | | +------|------+ | (2:B Enter,3:A Exit) => RETURN_B 2:B_Enter +-----------------+-----------------+-----------------+-----------------+ | | +--------------+
Proceed with Unbalanced Trees ?
Speedup Unbalanced ?
Communicating shape more precisely => better suited alg => less resources => faster
Generalized Opticks CSG into three levels : tree < node < leaf
Three levels avoids recursion in intersect (disallowed by NVIDIA OptiX)
Alg works : but many TODOs
+----------------+ +-------------------+ DISJOINT MUST BE DISQUALIFIED |B | |D | +----|----+ +----|-----|----+ +------|----------+ +-----------+ |A | | |C | | | |E | | | | | | | | | | | | | | | | | 0 E1 X2 E3 X4 E5 X6 E7 X8 [X9] E10 X11 | | | | | | | | | | | | | | | | | | | | | | | | +----|----+ +----|-----|----+ +------|----------+ +-----------+ | | | | +----------------+ +-------------------+ E E E E E X X X X X X
User guarantees : absolutely no overlapping between constituents
+-------+ +-------+ +-------+ +-------+ +-------+ | | | | | | | | | | | | | | | | | | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | | | | | | | | | | | | | | | | | | +-------+ +-------+ +-------+ +-------+ +-------+
User guarantees : all constituents overlap => ( farthest_enter or nearest_exit ) with all overlap requirement:
farthest_enter < nearest_exit and max(enter_count, exit_count) == num_sub
1X,2M -> MISS +----------------1X-------------+ | / C | | 0 | +---------------------------------+ | | | B | | +----------------------------+ 0- - -1 - - - 2 2X,1M -> MISS | A | | . ABC | X X | | | . . . | | | 0 - - - -1- - - - - 2- - - - [3]- - - 4 | | E E E . . . X | | | | | . . . | | | | | | .0- -[1]- - - - - - - 2 - - - 3 - - | | | . . . X X X | | +-------------------------------+ | | | | 0 - - 1 - - - - -2 - - - - - - - - 3 - - - - - - - 4 E E (2E,1M)-> M X X (2X,1M) -> M | +---------------------------------+ | | +----------------------------+
Avoid tree overheads for suitable shapes (eg general sphere:inner-radius,phicut,thetacut)