/cvmfs/opticks.ihep.ac.cn/ok/releases/Opticks-v0.2.1/x86_64-CentOS7-gcc1120-geant4_10_04_p02-dbg
Package | Notes |
---|---|
Geant4 10.4.2 | (Opticks with newer Geant4 already in use elsewhere) |
Custom4 0.1.8 | small package, but deeply coupled with Geant4 + JUNOSW + Opticks |
OptiX 7.5 CUDA 11.7 | straightforward update; new features not yet exploited |
Opticks-v0.2.1 | latest release https://github.com/simoncblyth/opticks/releases/tag/v0.2.1 |
Updated environment setup + distribution scripts:
Testing the distribution using ctest:
cd $OPTICKS_PREFIX/tests
ctest -N                                                   # list tests
ctest --output-on-failure                                  # run all
ctest -R CSGFoundry_CreateFromSimTest --output-on-failure  # run selected
EYE=0,1.5,0 TMIN=1.3 ZOOM=4 ~/opticks/cxr_min.sh ## CSGOptiXRMTest
Using GEOM J23_1_0_rc3_ok0
Deferred geometry, switched off with tut_detsim.py options:
Option | Geometry disabled | Notes |
---|---|---|
--no-guide_tube | guide tube | OptiX 7.1 has curves: thought they might enable G4Torus translation, but the docs show they are one-sided; so instead triangulate the torus [T]? |
--debug-disable-xj | XJfixture XJanchor | Deep CSG trees: development needed to see if "listnode" (similar to G4MultiUnion) can provide a solution |
--debug-disable-sj | SJCLSanchor SJFixture SJReceiver SJFixture | |
--debug-disable-fa | FastenerAcrylic | |
Virtual surface shifts used to avoid degeneracy (defaults shown in comments):
export Tub3inchPMTV3Manager__VIRTUAL_DELTA_MM=0.10             ## 1.e-3
export HamamatsuMaskManager__MAGIC_virtual_thickness_MM=0.10   ## 0.05
export NNVTMaskManager__MAGIC_virtual_thickness_MM=0.10        ## 0.05
sigma_alpha/polish ground surface handling ?
[T] The quartic analytic solution for the torus is painful: instead simply use an appropriate triangulated approximation, more precise than the analytic one with much less pain
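The triangulation route in [T] can be sketched with a uniform parametric grid: vertices lie exactly on the torus, and the flat triangles between them approximate it with an error that shrinks as the segment counts grow. This is a minimal numpy sketch under that assumption; `triangulate_torus` and `max_chord_error` are hypothetical names, and how Opticks actually triangulates G4Torus is not shown here.

```python
import numpy as np

def triangulate_torus(R, r, n_major, n_minor):
    """Hypothetical helper: (vertices, triangles) for a torus about the z axis.

    R is the major radius, r the tube radius; n_major and n_minor are the
    numbers of segments around the axis and around the tube.
    """
    u = np.linspace(0.0, 2.0 * np.pi, n_major, endpoint=False)  # angle around z axis
    v = np.linspace(0.0, 2.0 * np.pi, n_minor, endpoint=False)  # angle around the tube
    uu, vv = np.meshgrid(u, v, indexing="ij")                   # shape (n_major, n_minor)
    x = (R + r * np.cos(vv)) * np.cos(uu)
    y = (R + r * np.cos(vv)) * np.sin(uu)
    z = r * np.sin(vv)
    vertices = np.stack([x, y, z], axis=-1).reshape(-1, 3)      # index = i * n_minor + j

    tris = []
    for i in range(n_major):
        i1 = (i + 1) % n_major                                  # wrap around the axis
        for j in range(n_minor):
            j1 = (j + 1) % n_minor                              # wrap around the tube
            a, b = i * n_minor + j, i1 * n_minor + j
            c, d = i1 * n_minor + j1, i * n_minor + j1
            tris.append((a, b, c))                              # two triangles per grid quad
            tris.append((a, c, d))
    return vertices, np.array(tris, dtype=np.int64)

def max_chord_error(radius, n):
    """Sagitta: max gap between a circle of this radius and its n-segment polygon."""
    return radius * (1.0 - np.cos(np.pi / n))
```

The sagitta bound indicates how to pick segment counts for a required precision: doubling the segments cuts the chord error by roughly a factor of four.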
idx | control script | initialization time (seconds) | Notes |
---|---|---|---|
[1] | ~/j/okjob.sh | 149 | JUNOSW+Opticks (tut_detsim.py "main") |
[2] | ~/opticks/g4cx/tests/G4CXTest_GEOM.sh | 127 | InputPhoton, TorchGenstep, NOT YET InputGenstep |
[3] | ~/opticks/CSGOptiX/cxs_min.sh | <2 | InputPhoton, TorchGenstep, InputGenstep |
G4CXTest.cc using G4CXApp.h:
#include "OPTICKS_LOG.hh"
#include "G4CXApp.h"
int main(int argc, char** argv)
{
OPTICKS_LOG(argc, argv);
return G4CXApp::Main();
}
Enables pure optical simulation comparison
Test | Status |
---|---|
InputPhotons targeting PMTs | chi2 matched, no known issues |
TorchGenstep from CD center | chi2 marginal : chimney issue ? |
QCF qcf :
a.q 1000000 b.q 1000000
c2sum : 567.1130 c2n : 506.0000 c2per: 1.1208 C2CUT: 200

CHI2 ISSUE WITH TORCH RUNNING
c2sum/c2n:c2per(C2CUT) 567.11/506:1.121 (200) pv[0.031,< 0.05 : NOT:null-hyp ]

INPUT PHOTONS TARGETING PMT CHI2 OK

np.c_[siq,_quo,siq,sabo2,sc2,sabo1][0:40] ## A-B history frequency chi2 comparison
[[' 0' 'TO AB ' ' 0' '126549 126732' ' 0.1322' ' 2 5']
 [' 1' 'TO BT BT BT BT BT BT SD ' ' 1' ' 70494 70173' ' 0.7325' ' 18 2']
 [' 2' 'TO BT BT BT BT BT BT SA ' ' 2' ' 57103 56944' ' 0.2217' ' 5 25']
 [' 3' 'TO SC AB ' ' 3' ' 51434 51739' ' 0.9016' ' 4 9']
 [' 4' 'TO SC BT BT BT BT BT BT SD ' ' 4' ' 35878 36119' ' 0.8067' ' 58 45']
 [' 5' 'TO SC BT BT BT BT BT BT SA ' ' 5' ' 29676 30164' ' 3.9797' ' 124 4']
 [' 6' 'TO SC SC AB ' ' 6' ' 19993 19499' ' 6.1794' ' 137 124']
 [' 7' 'TO BT BT SA ' ' 7' ' 18932 18837' ' 0.2390' ' 71 14']
 [' 8' 'TO RE AB ' ' 8' ' 18319 18272' ' 0.0604' ' 9 64']
 [' 9' 'TO SC SC BT BT BT BT BT BT SD ' ' 9' ' 15454 15701' ' 1.9582' ' 19 85']
 ['10' 'TO SC SC BT BT BT BT BT BT SA ' '10' ' 12785 12696' ' 0.3109' ' 24 3']
 ['11' 'TO BT BT AB ' '11' ' 10993 11100' ' 0.5182' ' 72 188']
 ['12' 'TO BT AB ' '12' ' 9250 9727' '11.9897' ' 36 96']  ## ABSLEN ACRYLIC ?
 ['13' 'TO BT BT BT BT BT BT BT SA ' '13' ' 7476 7627' ' 1.5097' ' 176 162']
 ['14' 'TO SC SC SC AB ' '14' ' 7544 7545' ' 0.0001' ' 90 84']
 ['15' 'TO RE BT BT BT BT BT BT SD ' '15' ' 7419 7364' ' 0.2046' ' 197 6']
 ['16' 'TO SC RE AB ' '16' ' 7137 7191' ' 0.2035' ' 110 93']
 ['17' 'TO RE BT BT BT BT BT BT SA ' '17' ' 7126 7104' ' 0.0340' ' 48 181']
 ['18' 'TO SC BT BT AB ' '18' ' 6419 6527' ' 0.9010' ' 153 89']
 ['19' 'TO BT BT BT BT BT BT BT SR SA ' '19' ' 6385 6367' ' 0.0254' ' 16 139']
 ['20' 'TO BT BT BT BT SD ' '20' ' 6146 6190' ' 0.1569' ' 13 99']
 ['21' 'TO SC SC SC BT BT BT BT BT BT SD ' '21' ' 6148 6175' ' 0.0592' ' 145 194']
 ['22' 'TO SC BT BT SA ' '22' ' 6087 6170' ' 0.5620' ' 120 185']
 ['23' 'TO SC BT AB ' '23' ' 5589 5782' ' 3.2758' ' 8 17']
 ['24' 'TO BT BT DR BT SA ' '24' ' 5449 5543' ' 0.8039' ' 600 246']
 ['25' 'TO RE RE AB ' '25' ' 5538 5420' ' 1.2707' ' 267 125']
 ['26' 'TO BT BT BT SA ' '26' ' 5532 5259' ' 6.9066' ' 745 7']
 ['27' 'TO SC SC SC BT BT BT BT BT BT SA ' '27' ' 5084 4974' ' 1.2030' ' 23 31']
 ['28' 'TO SC BT BT BT BT BT BT BT SA ' '28' ' 4609 4610' ' 0.0001' ' 20 63']
 ['29' 'TO BT BT BT BT BT BT BR BT BT BT BT BT BT BT BT SD ' '29' ' 3809 3813' ' 0.0021' ' 362 812']
 ['30' 'TO RE SC AB ' '30' ' 3660 3565' ' 1.2491' ' 54 30']
 ['31' 'TO SC RE BT BT BT BT BT BT SD ' '31' ' 3192 3134' ' 0.5318' ' 292 136']
 ['32' 'TO SC BT BT BT BT BT BT BT SR SA ' '32' ' 3145 3173' ' 0.1241' ' 243 419']
 ['33' 'TO BT BT BT BT BT BT BT SD ' '33' ' 3168 3138' ' 0.1427' ' 181 424']
 ['34' 'TO BT BT BT BT BT BT BR BT BT BT BT BT BT BT BT SA ' '34' ' 3142 3163' ' 0.0699' ' 22 257']
 ['35' 'TO BT BT BT BT BT BT BT SR SR SA ' '35' ' 3043 3096' ' 0.4576' ' 286 1591']
 ['36' 'TO SC SC BT BT AB ' '36' ' 2878 2987' ' 2.0257' ' 636 252']
 ['37' 'TO SC RE BT BT BT BT BT BT SA ' '37' ' 2877 2960' ' 1.1802' ' 151 301']
 ['38' 'TO BT BT BT BT AB ' '38' ' 2857 2834' ' 0.0930' ' 225 228']
 ['39' 'TO SC BT BT BT BT SD ' '39' ' 2841 2800' ' 0.2980' ' 224 323']]
np.c_[siq,_quo,siq,sabo2,sc2,sabo1][bzero] ## in A but not B
[['1107' 'TO BT BT BT BT BT BT BT BT SD ' '1107' ' 41 0' ' 0.0000' ' 11355 -1']
 ['1305' 'TO BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT' '1305' ' 33 0' ' 0.0000' ' 11040 -1']
 ['1623' 'TO BT BT DR BT BT BT SD ' '1623' ' 26 0' ' 0.0000' ' 1930 -1']
 ['2375' 'TO BT BT BT BT BT BT BR BT BT BT BT BT BT BT BT BT SD ' '2375' ' 17 0' ' 0.0000' ' 10972 -1']
 ['3264' 'TO SC BT BT BT BT BT BT BR BT BT BT BT BT BT BT BT BT SD ' '3264' ' 12 0' ' 0.0000' ' 22140 -1']]

In [1]: w = a.q_startswith("TO BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT") ; w
Out[1]:
array([ 11040,  15219, 118322, 152607, 165838, 215978, 299136, 374379,
       395244, 422394, 427598, 434101, 443666, 445392, 479186, 531698,
       549984, 592656, 604821, 637582, 656052, 736283, 777988, 789501,
       821402, 837105, 853410, 898084, 903045, 923645, 927731, 974750,
       989689])

In [2]: a.f.record[w[0],:,0] ## PHOTON STEP POINT POSITIONS : ALL SIMILAR : GOING UP THE CHIMNEY
Out[2]:
array([[   -1.594,    0.835,    99.984,    0.   ],  ## photon step point (x,y,z,t)
       [ -284.142,  148.854, 17823.998,   81.302],
       [ -315.513,  165.289, 20000.   ,   88.563],
       [ -332.369,  174.119, 21750.   ,   94.401],
       [ -332.383,  174.127, 21752.   ,   94.407],
       [ -344.817,  180.641, 23500.   ,  103.396],
       [ -368.795,  193.202, 25752.   ,  110.911],
       [ -409.762,  214.663, 29599.7  ,  123.75 ],
       [ -409.764,  214.664, 29599.85 ,  123.75 ],
       ...
       [ -412.414,  216.053, 29848.799,  124.581],
       [ -412.424,  216.058, 29849.7  ,  124.584],
       [ -412.426,  216.059, 29849.85 ,  124.585],
       [ -412.532,  216.114, 29859.85 ,  124.618],
       [ -412.534,  216.115, 29860.   ,  124.618],
       [ -412.543,  216.12 , 29860.9  ,  124.621],
       [ -412.55 ,  216.124, 29861.5  ,  124.623],
       [ -412.56 ,  216.129, 29862.5  ,  124.627]], dtype=float32)
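The per-history chi2 column above appears to follow the usual two-sample form c2 = (a-b)^2/(a+b): row 12 gives (9250-9727)^2/(9250+9727) ≈ 11.99, matching the listed 11.9897, and c2per = c2sum/c2n = 567.11/506 ≈ 1.121. A minimal numpy sketch under that assumption; `history_chi2` is a hypothetical name, not the qcf implementation, and treating C2CUT as a minimum on a+b for inclusion in the sum is an assumption.

```python
import numpy as np

def history_chi2(a, b, c2cut=200):
    """Hypothetical helper: chi2 between A and B history counts (same ordering).

    Each history contributes (a-b)^2/(a+b). Only histories with a+b > c2cut
    enter c2sum and c2n (assumed meaning of C2CUT).
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    tot = a + b
    c2 = np.zeros_like(tot)
    nz = tot > 0                                   # avoid 0/0 for empty histories
    c2[nz] = (a[nz] - b[nz]) ** 2 / tot[nz]
    keep = tot > c2cut                             # low-statistics histories excluded
    c2sum = float(c2[keep].sum())
    c2n = int(keep.sum())
    c2per = c2sum / c2n if c2n else 0.0            # chi2 per degree of freedom
    return c2, c2sum, c2n, c2per
```

Histories in A but not B, like the bzero rows, get c2 = a but fall below the cut at these counts. If SciPy is available, a p-value comparable to the pv figure could presumably be obtained with `scipy.stats.chi2.sf(c2sum, c2n)`; whether qcf computes it exactly this way is not shown here.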
[2] G4CXTest_GEOM.sh | |
---|---|
1M photon Torch Genstep at CD center | |
Red | End points |
Green | Step points |
Cyan | Hit points |
TEST=large_scan ~/opticks/cxs_min.sh
Generate 20 optical-only events with 0.1M to 100M photons starting from the CD center; gather and save only hits:
OPTICKS_RUNNING_MODE=SRM_TORCH
OPTICKS_NUM_PHOTON=H1:10,M2,3,5,7,10,20,40,60,80,100
OPTICKS_NUM_EVENT=20
OPTICKS_EVENT_MODE=Hit
Test Hardware | Notes |
---|---|
DELL Precision Workstation with NVIDIA TITAN RTX(24G) | Primary test hardware |
DELL Precision Workstation with NVIDIA TITAN V(12G) | VRAM limited |
DELL Precision Workstation with NVIDIA Quadro RTX 8000 (48G) | TODO : try for 400M photons |
GPU cluster nodes with NVIDIA V100 (32GB) | Basic function tests only so far |
~/o/cxs_min.sh ## 2.2M hits from 10M photon TorchGenstep, 3.1 seconds
Release preprocessor macros: adds PRODUCTION, removes DEBUG_TAG, DEBUG_PIDX, ...
Examine the flattened kernel source of CSGOptiX/CSGOptiX7.cu (103k lines, with all includes expanded):
~/opticks/preprocessor.sh > /tmp/out.cc ## using gcc -E -C -P
Grepping kernel PTX (Parallel Thread Execution: assembly-like code)
Grep the PTX for doubles and printf, then remove them from the source, e.g. with the opticks-ptx bash function:
grep \\.f64 $OPTICKS_PREFIX/ptx/CSGOptiX_generated_CSGOptiX7.cu.ptx
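The grep can be extended into a small report that also flags printf. A minimal Python sketch, assuming that double-precision use shows up as `.f64` type suffixes and that device-side printf is lowered to calls to `vprintf` (the usual CUDA lowering). `scan_ptx` and `scan_ptx_file` are hypothetical names, not the opticks-ptx function.

```python
import re
from pathlib import Path

F64_PATTERN = re.compile(r"\.f64\b")          # double-precision type suffix, e.g. cvt.f64.f32
PRINTF_PATTERN = re.compile(r"\bvprintf\b")   # assumed lowering of device printf

def scan_ptx(text):
    """Return 1-based line numbers with .f64 usage and with vprintf references."""
    f64_lines, printf_lines = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if F64_PATTERN.search(line):
            f64_lines.append(lineno)
        if PRINTF_PATTERN.search(line):
            printf_lines.append(lineno)
    return f64_lines, printf_lines

def scan_ptx_file(path):
    """Convenience wrapper reading a .ptx file such as CSGOptiX_generated_CSGOptiX7.cu.ptx."""
    return scan_ptx(Path(path).read_text())
```

Note that the `.extern .func ... vprintf` declaration line also matches, so a nonzero printf count may include one declaration even after all calls are removed.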
Opticks SEvt Metadata
Opticks Event => folders of NumPy .npy (NPFold.h/NP.hh)
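Since an Opticks event is persisted as folders of NumPy .npy files, the arrays can also be inspected with plain numpy. A minimal sketch, not the NPFold.h/NP.hh API: it recursively loads every .npy file into nested dicts keyed by file stem and ignores other files; `load_npy_folder` is a hypothetical name and the file names used when testing it are made up.

```python
from pathlib import Path

import numpy as np

def load_npy_folder(fold):
    """Hypothetical helper: load a folder of .npy files into nested dicts.

    Sub-folders become nested dicts; non-.npy files are skipped.
    """
    fold = Path(fold)
    out = {}
    for p in sorted(fold.iterdir()):
        if p.is_dir():
            out[p.name] = load_npy_folder(p)   # recurse into sub-folders
        elif p.suffix == ".npy":
            out[p.stem] = np.load(p)           # e.g. "hit.npy" -> out["hit"]
    return out
```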
sreport executable:
Usage on workstation/GPU job and laptop:
~/o/cxs_min.sh ## create SEvt
Laptop, rsync small metadata summary from remote, then plot:
JOB=N7 ~/o/sreport.sh grab
JOB=N7 PLOT=Substamp_ALL_Etime_vs_Photon ~/o/sreport.sh
Effective automated reporting+plotting are essential for optimization
Debug : 0.341 seconds per million photons
Release : 0.314 seconds per million photons
...
A018_QSim__simulate_PREL : 1701933491020126,19102924,1300668 2023-12-07T15:18:11.020126 92,039,765 92,038,118 2,598
A018_QSim__simulate_POST : 1701933526625966,19102924,1300668 2023-12-07T15:18:46.625966 127,645,605 127,643,958 35,605,840
SEvt__endIndex_A018 : 1701933526626230,19102924,1300668 2023-12-07T15:18:46.626230 127,645,869 127,644,222 264
SEvt__endOfEvent_LAST_EGPU : 1701933531837026,19102924,1300668 2023-12-07T15:18:51.837026 132,856,665 132,855,018 5,210,796
SEvt__EndOfRun : 1701933531837143,19102924,1300668 2023-12-07T15:18:51.837143 132,856,782 132,855,135 117
A018_QSim__simulate_TAIL : 1701933531837486,19102924,1300668 2023-12-07T15:18:51.837486 132,857,125 132,855,478 343
CSGOptiX__SimulateMain_TAIL : 1701933531837541,19102924,1300668 2023-12-07T15:18:51.837541 132,857,180 132,855,533 55

juncture:4 [SEvt__Init_RUN_META,SEvt__BeginOfRun,SEvt__EndOfRun,SEvt__Init_RUN_META] time ranges between junctures
SEvt__Init_RUN_META : -1 : 0 : 2023-12-07T15:16:38.980361 JUNCTURE
SEvt__BeginOfRun : 22,181,663 : 22,181,663 : 2023-12-07T15:17:01.162024 JUNCTURE
SEvt__EndOfRun : 110,675,119 : 132,856,782 : 2023-12-07T15:18:51.837143 JUNCTURE
SEvt__Init_RUN_META : -132,856,782 : 0 : 2023-12-07T15:16:38.980361 JUNCTURE

ranges:6 time ranges between pairs of stamps
SEvt__Init_RUN_META ==> CSGFoundry__Load_HEAD 1,774 ## init
CSGFoundry__Load_HEAD ==> CSGFoundry__Load_TAIL 1,325,321 ## load_geom
CSGOptiX__Create_HEAD ==> CSGOptiX__Create_TAIL 20,854,325 ## upload_geom
A000_QSim__simulate_HEAD ==> A000_QSim__simulate_PREL 19,450 ## upload_genstep
A000_QSim__simulate_PREL ==> A000_QSim__simulate_POST 55,697 ## simulate
A000_QSim__simulate_POST ==> A000_QSim__simulate_TAIL 7,686 ## download
A001_QSim__simulate_HEAD ==> A001_QSim__simulate_PREL 1,037 ## upload_genstep
A001_QSim__simulate_PREL ==> A001_QSim__simulate_POST 103,109 ## simulate
A001_QSim__simulate_POST ==> A001_QSim__simulate_TAIL 11,304 ## download
A002_QSim__simulate_HEAD ==> A002_QSim__simulate_PREL 1,022 ## upload_genstep
A002_QSim__simulate_PREL ==> A002_QSim__simulate_POST 112,313 ## simulate
A002_QSim__simulate_POST ==> A002_QSim__simulate_TAIL 16,068 ## download
A003_QSim__simulate_HEAD ==> A003_QSim__simulate_PREL 988 ## upload_genstep
...
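The range lines lend themselves to totals per phase, e.g. summed simulate time versus summed download time. The stamps appear to be microseconds since the epoch (A018 PREL to POST is 35,605,840 between 15:18:11 and 15:18:46), so the durations are taken as microseconds. A minimal parsing sketch under that assumption; `sum_ranges` is a hypothetical name, not part of sreport.

```python
import re
from collections import defaultdict

# Matches range lines such as:
#   "A001_QSim__simulate_PREL ==> A001_QSim__simulate_POST 103,109 ## simulate"
RANGE_LINE = re.compile(r"^(\S+)\s+==>\s+(\S+)\s+([\d,]+)\s+##\s+(\w+)")

def sum_ranges(text):
    """Hypothetical helper: total microseconds per '##' label; other lines skipped."""
    totals = defaultdict(int)
    for line in text.splitlines():
        m = RANGE_LINE.match(line.strip())
        if m:
            label = m.group(4)
            totals[label] += int(m.group(3).replace(",", ""))  # drop thousands separators
    return dict(totals)
```

Applied to the A000 to A002 lines above, the simulate total is 55,697 + 103,109 + 112,313 = 271,119 microseconds.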
"Debug" : rather slow hit downloads ?
Unclear why "Release" downloads so much faster than "Debug"
Now back to [2] G4CXTest_GEOM.sh optical only comparison
Only got to 80M : due to U4Recorder memory leak
"Release" benefits B:U4Recorder more than A:CSGOptiX
U4Recorder leaking badly! [Geant4 propagation recorded into Opticks SEvt]
([3] cxs_min.sh) Pure Opticks (no Geant4 or U4Recorder) : no leak
B:U4Recorder / A:CSGOptiX : ratio only ~190 !
Absolute comparison with ancient Opticks measurements? [Below: presented at CHEP 2019, 58 s for 400M photons]
JUNO analytic, 400M photons from center | Time | Speedup | Notes |
---|---|---|---|
Geant4 Extrap. | 95,600 s (26 hrs) | | Ancient (2019) |
Opticks RTX ON (i) | 58 s | 1650x | Ancient (2019) |
Current Opticks | 124 s (~2x slower) | "770x" | extrapolated from 31 s for 100M |
Practically everything differs between these measurements; nevertheless, it's natural to compare them
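The 124 s figure is a linear extrapolation (31 s for 100M photons, times 4), and 31 s is consistent with the Release rate of 0.314 s per million photons quoted earlier. The arithmetic behind the speedup column, as a sketch; the helper names are hypothetical, and linear scaling with photon count up to 400M is an assumption, not a measurement.

```python
def extrapolate_seconds(measured_s, measured_photons, target_photons):
    """Hypothetical helper: assume simulation time scales linearly with photon count."""
    return measured_s * target_photons / measured_photons

def speedup(reference_s, candidate_s):
    """Ratio of reference time to candidate time."""
    return reference_s / candidate_s

GEANT4_EXTRAP_S = 95_600                              # 400M photons, 2019 extrapolation
current_s = extrapolate_seconds(31.0, 100e6, 400e6)   # 31 s per 100M -> 124 s per 400M

print(current_s, round(speedup(GEANT4_EXTRAP_S, current_s)), round(speedup(GEANT4_EXTRAP_S, 58.0)))
```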
~300 ns photon lifetime limit ?
OPTICKS_MAX_BOUNCE=32  ## current
OPTICKS_MAX_NS=300     ## IDEA
Expected Primary Cause of 2x slowdown : "bouncy" POM
Use cxs_min_scan.sh to vary OPTICKS_MAX_BOUNCE from 0->32
Slow hit increase above MAX_BOUNCE 20
Using ~/o/cxs_min.sh script with:
OPTICKS_RUNNING_MODE : SRM_TORCH
OPTICKS_EVENT_MODE : HitPhoton (picked with VERSION 3)
OPTICKS_NUM_PHOTON : H1 (100K)
OPTICKS_MAX_PHOTON : M1
Workstation:
~/o/cxs_min_scan.sh ## o is symbolic link to opticks
Laptop:
~/o/cxs_min.sh grab
PLOT=Substamp_ONE_maxb_scan PICK=A ~/o/sreport.sh
PLOT=Substamp_ONE_maxb_scan PICK=A ~/o/sreport.sh mpcap
PLOT=Substamp_ONE_maxb_scan PICK=A PUB=expensive_tail ~/o/sreport.sh mppub
vi ~/opticks/notes/issues/OPTICKS_MAX_BOUNCE_scanning.rst ## notes
TODO: check performance with MAX_TIME = 200,300,400 ns
Small truncation bump at 32
sequence nibbles
sseq.h and seq.npy sequence array:
Using ~/o/cxs_min.sh script with:
OPTICKS_RUNNING_MODE : SRM_TORCH
OPTICKS_EVENT_MODE : HitPhotonSeq
OPTICKS_NUM_PHOTON : M1
OPTICKS_MAX_PHOTON : M1
Workstation:
VERSION=4 ~/o/cxs_min.sh
Laptop:
VERSION=4 ~/o/cxs_min.sh grab
VERSION=4 MODE=2 PLOT=seqnib ~/o/cxs_min.sh ana
VERSION=4 MODE=2 PLOT=seqnib ~/o/cxs_min.sh mpcap
VERSION=4 MODE=2 PLOT=seqnib PUB=small_truncation_bump ~/o/cxs_min.sh mppub
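Step histories such as "TO BT BT SA" are stored as sequences of 4-bit nibbles. A hypothetical Python sketch of that kind of packing: each step flag maps to a nonzero 4-bit code (0 marks "no step"), 16 codes fit in a 64-bit word, and 2 words give 32 slots, an assumption chosen to match OPTICKS_MAX_BOUNCE=32. The code table is invented for illustration and does not reproduce the real sseq.h values. Histories longer than the available slots are cut off, the kind of truncation that would pile up in the last slot.

```python
SLOTS_PER_WORD = 16   # 64-bit word / 4 bits per nibble
NUM_WORDS = 2         # assumption: 2 words -> 32 slots

# Invented code table for illustration only (not the Opticks flag values).
CODES = {name: i + 1 for i, name in enumerate(
    ["TO", "BT", "BR", "SR", "DR", "SC", "RE", "AB", "SA", "SD"])}
NAMES = {code: name for name, code in CODES.items()}

def pack(history):
    """Pack a space-separated history into NUM_WORDS 64-bit ints, truncating extra steps."""
    words = [0] * NUM_WORDS
    for slot, name in enumerate(history.split()[: SLOTS_PER_WORD * NUM_WORDS]):
        w, nib = divmod(slot, SLOTS_PER_WORD)
        words[w] |= CODES[name] << (4 * nib)   # slot n lives in bits 4n..4n+3 of its word
    return words

def unpack(words):
    """Inverse of pack: stop at the first empty (zero) nibble."""
    names = []
    for w in words:
        for nib in range(SLOTS_PER_WORD):
            code = (w >> (4 * nib)) & 0xF
            if code == 0:
                return " ".join(names)
            names.append(NAMES[code])
    return " ".join(names)
```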
Amdahl's law: S(n), expected overall speedup
Optical photon simulation: P ~ 99% of CPU time
Must consider the processing "big picture"
Very dependent on the parallel fraction P
Theoretical overall speedup for various parallel fractions and parallelized speedups:
Parallel Fraction | Overall (100x parallelized speedup) | Overall (1000x parallelized speedup) | Notes |
---|---|---|---|
95% | 17x | 20x | Little benefit beyond ~100x parallelized speedup |
96% | 20x | 24x | |
97% | 25x | 32x | |
98% | 34x | 48x | Substantial benefit from more parallelized speedup |
99% | 50x | 91x | |
In [4]: Amdahl.Overall_Speedup(np.array([100,1000]),0.95)
Out[4]: array([16.807, 19.627])

In [5]: Amdahl.Overall_Speedup(np.array([100,1000]),0.99)
Out[5]: array([50.251, 90.992])
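Amdahl's law gives the overall speedup as S = 1 / ((1 - P) + P / s) for parallel fraction P and parallelized speedup s. As s grows, S approaches 1/(1 - P): 20x for P = 95% and 100x for P = 99%, which is why the 95% row barely gains from 1000x. A standalone numpy sketch reproducing the numbers above; `overall_speedup` is a hypothetical name, not the `Amdahl.Overall_Speedup` used in the session.

```python
import numpy as np

def overall_speedup(parallelized_speedup, parallel_fraction):
    """Amdahl's law: S = 1 / ((1 - P) + P / s)."""
    s = np.asarray(parallelized_speedup, dtype=np.float64)
    P = parallel_fraction
    return 1.0 / ((1.0 - P) + P / s)

# Rebuild the table rows: rounded overall speedups for s = 100x and 1000x
for P in (0.95, 0.96, 0.97, 0.98, 0.99):
    print(f"{P:.0%}", np.round(overall_speedup([100, 1000], P)).astype(int))
```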
Try OptiX-IR (Intermediate Representation) as an alternative to PTX (new in OptiX 7.1)