rainbow_cfg4.py performance notes
=====================================

Seconds for rainbow 1M
-----------------------

CAVEAT: using laptop GPU with only 384 cores, desktop GPUs expected x20-30 faster

Disabling step-by-step recording gives a large improvement for Opticks, about x3,
but has little impact on cfg4-. The result match between G4 and Op remains unchanged.
Seems like reducing the number and size of buffers in context is a big win for Opticks.

With step by step and sequence recording::

    Op   4.6     5.8     # Opticks timings rather fickle to slight code changes, maybe stack
    G4   56.8    55.9

Just final photon recording::

    Op   1.8
    G4   47.9


CAVEAT: above Op is Op/INTEROP
--------------------------------

This behavior is specific to Opticks INTEROP mode using OpenGL buffers;
in compute mode with OptiX buffers there is almost no difference between
enabling step-by-step recording and not. It seems like OpenGL constrains
performance once total buffer size gets too big.


Matching curand buffer to requirement
---------------------------------------

* tried using 1M cuRAND buffer matching the requirement rather than using
  default 3M all the time, saw no change in propagation time

::

    # change ggeoview-rng-max value down to 1M
    ggeoview-rng-prep    # create the states cache
    # opticks-/OpticksCfg.hh accordingly


Compute Mode, ie no OpenGL
-----------------------------

Revived "--compute" mode of ggv binary, which uses OptiX owned buffers
as opposed to the usual interop approach of using OpenGL buffers.

Timings with and without step recording are similar in compute mode.
This is very different from interop mode, where cutting down on buffers gives big wins.

::

    Op   0.75    0.65
    G4   57.     56.

A related cmp mode controlled by the "--cmp" option uses a different computeTest binary;
it is not operational and there is little motivation for it now that "--compute" mode works.

Could create package without OpenGL dependencies if there is a need.


::

    ggv-;ggv-rainbow --compute
    ggv-;ggv-rainbow --compute --nostep
    ggv-;ggv-rainbow --compute --nostep --dbg

* look at how time scales with photon count


Split the prelaunch from launch timings
-----------------------------------------

Kernel validation, compilation and prelaunch do not need to be done for each event,
so they can be excluded from the timings.

Doing this gives::

    Op (interop mode)          1.509
    Op (--compute)             0.290
    Op (--compute --nostep)    0.294     # skipping step recording not advantageous
    Op (--compute)             0.1416    # hmm some performance instability

In ">console" login mode "ggv-rainbow" gives an error that no GPU is available.

Immediately after login::

    Op (--compute)             0.148


Testing in Console Mode
-------------------------

::

    /usr/local/env/opticks/rainbow/mdtorch/5/20160102_170136/t_delta.ini:propagate=0.14798854396212846
    /usr/local/env/opticks/rainbow/mdtorch/5/20160102_171121/t_delta.ini:propagate=0.44531063502654433   # try >console mode
    /usr/local/env/opticks/rainbow/mdtorch/5/20160102_171142/t_delta.ini:propagate=0.45501201006118208
    /usr/local/env/opticks/rainbow/mdtorch/5/20160102_171156/t_delta.ini:propagate=0.33855076995678246
    /usr/local/env/opticks/rainbow/mdtorch/5/20160102_171213/t_delta.ini:propagate=0.46851423906628042
    /usr/local/env/opticks/rainbow/mdtorch/5/20160102_171226/t_delta.ini:propagate=0.33861030195839703
    /usr/local/env/opticks/rainbow/mdtorch/5/20160102_171527/t_delta.ini:propagate=1.5933509200112894    # GUI interop mode
    /usr/local/env/opticks/rainbow/mdtorch/5/20160102_171548/t_delta.ini:propagate=0.27229616406839341   # GUI --compute mode

Immediately after switching back to automatic graphics switching, then shortly after that::

    0.142
    0.293


To do the nostep check
------------------------

After standard comparison::

    ggv-;ggv-rainbow
    ggv-;ggv-rainbow --cfg4

* recompile optixrap- without RECORD define
* run with --nostep option::

      ggv-;ggv-rainbow --nostep
      ggv-;ggv-rainbow --cfg4 --nostep
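
The propagate times listed under "Testing in Console Mode" have the form
``<dir>/<timestamp>/t_delta.ini:propagate=<seconds>``. A minimal sketch of how such
lines might be summarized follows. The helper names ``parse_propagate`` and ``speedup``
are hypothetical and not part of Opticks, and the sketch assumes the timestamp
directory always has the ``YYYYMMDD_HHMMSS`` form seen in those listings.

```python
import re
from datetime import datetime

# Assumed line form: ".../<YYYYMMDD_HHMMSS>/t_delta.ini:propagate=<seconds>",
# optionally followed by a trailing "# comment".
LINE_RE = re.compile(r"/(\d{8}_\d{6})/t_delta\.ini:propagate=([0-9.eE+-]+)")


def parse_propagate(lines):
    """Return (datetime, seconds) tuples, skipping lines that do not match."""
    rows = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue
        stamp = datetime.strptime(m.group(1), "%Y%m%d_%H%M%S")
        rows.append((stamp, float(m.group(2))))
    return rows


def speedup(slow, fast):
    """Ratio of two propagate times, slow over fast."""
    return slow / fast


# Two lines copied from the console mode listing.
sample = [
    "/usr/local/env/opticks/rainbow/mdtorch/5/20160102_171527/t_delta.ini:propagate=1.5933509200112894    # GUI interop mode",
    "/usr/local/env/opticks/rainbow/mdtorch/5/20160102_171548/t_delta.ini:propagate=0.27229616406839341   # GUI --compute mode",
]

rows = parse_propagate(sample)
for stamp, secs in rows:
    print(stamp.isoformat(), f"{secs:.3f}")
print(f"interop/compute ratio: {speedup(rows[0][1], rows[1][1]):.2f}")
```

Given the instability noted in the timings, a ratio from a single pair of runs is only indicative.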