rainbow_cfg4.py performance notes

Seconds for rainbow 1M

CAVEAT: using laptop GPU with only 384 cores, desktop GPUs expected x20-30 faster

Disabling step-by-step recording has large improvement factor for Opticks of about x3 but not much impact on cfg4-. The result match between G4 and Op remains unchanged.

Seems like reducing the number and size of buffers in context is a big win for Opticks.

With step by step and sequence recording:

Op   4.6    5.8      # Opticks timings rather fickle to slight code changes, maybe stack
G4   56.8  55.9

Just final photon recording:

Op    1.8
G4   47.9

CAVEAT: above Op is Op/INTEROP

Actually this behavior is for Opticks INTEROP mode using OpenGL buffers, in compute mode with OptiX buffers there is almost no difference between enabling step-by-step recording and not. It seems like OpenGL constrains performance once total buffer size gets too big.

Matching curand buffer to requirement

  • tried using 1M cuRAND buffer matching the requirement rather than using default 3M all the time, saw no change in propagation time
# change ggeoview-rng-max value down to 1M

ggeoview-rng-prep  # create the states cache

#  opticks-/OpticksCfg.hh accordingly

Compute Mode, ie no OpenGL

Revived “–compute” mode of ggv binary which uses OptiX owned buffers as opposed to the usual interop approach of using OpenGL buffers. Both with and without step recording is giving similar times in compute mode. This is very different from interop mode where cutting down on buffers gives big wins.

Op   0.75  0.65
G4   57.   56.

A related cmp mode controlled by “–cmp” option uses different computeTest binary, is not operational and little motivation now that “–compute” mode works. Could create package without OpenGL dependencies if there is a need.

ggv-;ggv-rainbow --compute
ggv-;ggv-rainbow --compute --nostep
ggv-;ggv-rainbow --compute --nostep --dbg
  • look at how time scales with photon count

Split the prelaunch from launch timings

Kernel validation, compilation and prelaunch does not need to be done for each event so can exclude it from timings.

Doing this get:

Op (interop mode)         1.509
Op (--compute)            0.290
Op (--compute --nostep)   0.294     # skipping step recording not advantageous
Op (--compute)            0.1416    # hmm some performance instability

In “>console” login mode “ggv-rainbow” gives error that no GPU available

Immediately after login getting:

Op (--compute)            0.148

Testing in Console Mode

/usr/local/env/opticks/rainbow/mdtorch/5/20160102_170136/t_delta.ini:propagate=0.14798854396212846

/usr/local/env/opticks/rainbow/mdtorch/5/20160102_171121/t_delta.ini:propagate=0.44531063502654433  # try >console mode
/usr/local/env/opticks/rainbow/mdtorch/5/20160102_171142/t_delta.ini:propagate=0.45501201006118208
/usr/local/env/opticks/rainbow/mdtorch/5/20160102_171156/t_delta.ini:propagate=0.33855076995678246
/usr/local/env/opticks/rainbow/mdtorch/5/20160102_171213/t_delta.ini:propagate=0.46851423906628042
/usr/local/env/opticks/rainbow/mdtorch/5/20160102_171226/t_delta.ini:propagate=0.33861030195839703

/usr/local/env/opticks/rainbow/mdtorch/5/20160102_171527/t_delta.ini:propagate=1.5933509200112894   # GUI interop mode
/usr/local/env/opticks/rainbow/mdtorch/5/20160102_171548/t_delta.ini:propagate=0.27229616406839341  # GUI --compute mode

Immediately after switching back to automatic graphics switching, then shortly after that:

0.142
0.293

To do the nostep check

After standard comparison:

ggv-;ggv-rainbow
ggv-;ggv-rainbow --cfg4
  • recompile optixrap- without RECORD define

  • run with –nostep option:

    ggv-;ggv-rainbow --nostep
    ggv-;ggv-rainbow --cfg4 --nostep