running_with_more_photons

New Opticks : How to increase the maximum number of photons/gensteps ?

The defaults can be changed by setting environment variables:

OPTICKS_MAX_GENSTEP
OPTICKS_MAX_PHOTON
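
As a minimal sketch, assuming both variables accept plain integer counts (the value 3000000 is purely illustrative), the limits could be raised for the current shell session before launching the application:

```bash
# Illustrative values only: raise both limits to 3 million.
# Assumption: the variables are read as plain integers at startup.
export OPTICKS_MAX_GENSTEP=3000000
export OPTICKS_MAX_PHOTON=3000000

# Show what the application will inherit.
env | grep '^OPTICKS_MAX_'
```

A MaxPhoton of 3 million only helps if a QCurandState file of at least that size exists.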

Those defaults can be overridden in the code by calling SEventConfig static functions prior to calling G4CXOpticks::SetGeometry:

SEventConfig::SetMaxGenstep
SEventConfig::SetMaxPhoton

In addition to this configuration, you must create QCurandState files that cover the MaxPhoton you set. This can be done with the qudarap-prepare-installation bash function:

qudarap-prepare-sizes-Linux-(){  echo ${OPTICKS_QUDARAP_RNGMAX:-1,3,10} ; }
qudarap-prepare-sizes-Darwin-(){ echo ${OPTICKS_QUDARAP_RNGMAX:-1,3} ; }
qudarap-prepare-sizes(){ $FUNCNAME-$(uname)- | tr "," "\n"  ; }

qudarap-prepare-installation()
{
   local sizes=$(qudarap-prepare-sizes)
   local size
   for size in $sizes ; do
       QCurandState_SPEC=$size:0:0  QCurandStateTest
       rc=$? ; [ $rc -ne 0 ] && return $rc
   done
   return 0
}
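
The size list can be previewed before any states are generated. The sketch below is a hypothetical dry-run helper, not part of Opticks: it mirrors the loop above using the Linux default and prints the QCurandState_SPEC values instead of launching QCurandStateTest. That the sizes are in millions of states is an assumption inferred from the defaults.

```bash
# Hypothetical dry-run helper (not part of Opticks): prints the QCurandState_SPEC
# values that the loop above would use, without launching QCurandStateTest.
qudarap_dryrun_specs() {
    local size
    # Same default as the Linux variant of qudarap-prepare-sizes above.
    for size in $(echo "${OPTICKS_QUDARAP_RNGMAX:-1,3,10}" | tr ',' ' ') ; do
        echo "QCurandState_SPEC=$size:0:0"
    done
}

# Preview an override that adds a 100 entry to the defaults.
OPTICKS_QUDARAP_RNGMAX=1,3,10,100 qudarap_dryrun_specs
```

Once the preview looks right, the same override can be placed in front of the real function: OPTICKS_QUDARAP_RNGMAX=1,3,10,100 qudarap-prepare-installation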

Check your environment with the SEventConfigTest binary:

epsilon:~ blyth$ SEventConfigTest
test_EstimateAlloc@20:
SEventConfig::Desc
       OPTICKS_EVENT_MODE          EventMode  : Default
     OPTICKS_RUNNING_MODE        RunningMode  : 0
                            RunningModeLabel  : SRM_DEFAULT
     OPTICKS_G4STATE_SPEC        G4StateSpec  : 1000:38
                            G4StateSpecNotes  : 38=2*17+4 is appropriate for MixMaxRng
    OPTICKS_G4STATE_RERUN       G4StateRerun  : -1
      OPTICKS_MAX_GENSTEP         MaxGenstep  : 1000000
       OPTICKS_MAX_PHOTON          MaxPhoton  : 1000000
     OPTICKS_MAX_SIMTRACE        MaxSimtrace  : 1000000     MaxCurandState  : 1000000
       OPTICKS_MAX_BOUNCE          MaxBounce  : 9
                              MaxBounceNotes  : MaxBounceLimit:31, MaxRecordLimit:32 (see sseq.h)
       OPTICKS_MAX_RECORD          MaxRecord  : 0
          OPTICKS_MAX_REC             MaxRec  : 0
          OPTICKS_MAX_SEQ             MaxSeq  : 0
          OPTICKS_MAX_PRD             MaxPrd  : 0
          OPTICKS_MAX_TAG             MaxTag  : 0
         OPTICKS_MAX_FLAT            MaxFlat  : 0
         OPTICKS_HIT_MASK            HitMask  : 64
                                HitMaskLabel  : SD
       OPTICKS_MAX_EXTENT          MaxExtent  : 1000
         OPTICKS_MAX_TIME            MaxTime  : 10
          OPTICKS_RG_MODE             RGMode  : 2
                                 RGModeLabel  : simulate
        OPTICKS_COMP_MASK           CompMask  : 262
                               CompMaskLabel  : genstep,photon,hit
         OPTICKS_OUT_FOLD            OutFold  : $DefaultOutputDir
         OPTICKS_OUT_NAME            OutName  : -
OPTICKS_PROPAGATE_EPSILON   PropagateEpsilon  :     0.0500
     OPTICKS_INPUT_PHOTON        InputPhoton  : -
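
Only two lines of this dump matter when raising the limits. A small filter, sketched here as a hypothetical helper that reads the dump from stdin, keeps just those:

```bash
# Hypothetical helper: keep only the MaxGenstep and MaxPhoton lines of the
# SEventConfig::Desc dump read from stdin.
show_max_limits() {
    grep -E 'OPTICKS_MAX_(GENSTEP|PHOTON) '
}

# Typical use, once the Opticks binaries are on PATH:
#   SEventConfigTest | show_max_limits
```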

New Opticks : Background on QCurandState files

The QCurandState files allow curand to be used in the simulation without initializing it every time, which is much faster than calling curand_init on every run. If curand were initialized during simulation, performance would be factors of 100 worse, because the large stack that curand_init needs means that far fewer threads can run at the same time.

  • Opticks uses curand to generate randoms on device.
  • Initializing curand requires calling curand_init, which needs a lot of stack, far more than ray tracing and simulation need. To avoid initializing curand every time, with the large stacks and few threads in flight that entails, the initialization is done at Opticks installation time and the resulting curandState objects are saved to file.
  • When a simulation runs, the QCurandState files are loaded and uploaded to the device.
  • The QCurandState files are necessary. They do not contain random numbers, they contain initialized curandState objects needed to generate random numbers on device.

I plan at some point to support splitting the QCurandState into multiple smaller files. This would avoid the inconvenience of very large (> 4GB) files, and would also avoid having to recreate all the states when you want to extend the size.

Old Opticks

Not Enough RNG issue:

-------------------------
Number of Scintillation Photons:  12177
Number of Cerenkov Photons:  3543830

-------->Storing hits in the ROOT file: in this event there are 37929 hits in the tracker chambers:
HC: volTPCActive_lArTPC_HC

###[ EventAction::EndOfEventAction G4Opticks.propagateOpticalPhotons

2020-11-19 09:41:50.832 FATAL [20312] [OpticksEvent::resize@1100] NOT ENOUGH RNG : USE OPTION --rngmax 3/10/100  num_photons 3556007 rng_max 3000000
G4OpticksTest: /home/wenzel/gputest/opticks/optickscore/OpticksEvent.cc:1106: void OpticksEvent::resize(): Assertion `enoughRng && " need to prepare and persist more RNG states up to maximual per propagation number"' failed.
Aborted (core dumped)
--------------------------

Initializing cuRAND requires a large GPU stack size, so it is done in separate CUDA launches that cudarap-prepare-installation runs during installation:

epsilon:opticks blyth$ t opticks-prepare-installation
opticks-prepare-installation ()
{
    local msg="=== $FUNCNAME :";
    echo $msg generating RNG seeds into installcache;
    cudarap-;
    cudarap-prepare-installation
}
epsilon:opticks blyth$ cudarap-
epsilon:opticks blyth$ t cudarap-prepare-installation
cudarap-prepare-installation ()
{
    local size;
    cudarap-prepare-sizes | while read size; do
        CUDARAP_RNGMAX_M=$size cudarap-prepare-rng-;
    done
}
epsilon:opticks blyth$ t cudarap-prepare-rng-
cudarap-prepare-rng- ()
{
    local msg="=== $FUNCNAME :";
    local path=$(cudarap-rngpath);
    [ -f "$path" ] && echo $msg path $path exists already && return 0;
    CUDARAP_RNG_DIR=$(cudarap-rngdir) CUDARAP_RNG_MAX=$(cudarap-rngmax) $(cudarap-ibin)
}

Running cudarap-prepare-installation again will just list the curandState files:

[blyth@localhost ~]$ cudarap-
[blyth@localhost ~]$ cudarap-prepare-installation
=== cudarap-prepare-rng- : path /home/blyth/.opticks/rngcache/RNG/cuRANDWrapper_1000000_0_0.bin exists already
=== cudarap-prepare-rng- : path /home/blyth/.opticks/rngcache/RNG/cuRANDWrapper_3000000_0_0.bin exists already
=== cudarap-prepare-rng- : path /home/blyth/.opticks/rngcache/RNG/cuRANDWrapper_10000000_0_0.bin exists already
[blyth@localhost ~]$

Bash functions for creation of larger files are:

cudarap-prepare-rng-400M(){ CUDARAP_RNGMAX_M=400 cudarap-prepare-rng- ; }
cudarap-prepare-rng-200M(){ CUDARAP_RNGMAX_M=200 cudarap-prepare-rng- ; }
cudarap-prepare-rng-100M(){ CUDARAP_RNGMAX_M=100 cudarap-prepare-rng- ; }
cudarap-prepare-rng-10M(){  CUDARAP_RNGMAX_M=10  cudarap-prepare-rng- ; }
cudarap-prepare-rng-2M(){   CUDARAP_RNGMAX_M=2   cudarap-prepare-rng- ; }
cudarap-prepare-rng-1M(){   CUDARAP_RNGMAX_M=1   cudarap-prepare-rng- ; }

The curandState file that is used depends on the --rngmax option. From okc/OpticksCfg.cc it is apparent that the default rngmax is 3M:

    m_rngmax(3),
    m_rngmaxscale(1000000),

Assuming you have the 10M already saved you can increase the maximum number of photons Opticks can handle with, eg:

--rngmax 10

It is also possible to change the seed and offset from their defaults of zero with:

--rngseed 1
--rngoffset 42

If 10M photons is insufficient use the below to initialize more curandState slots, eg for 100M:

CUDARAP_RNGMAX_M=100 cudarap-prepare-rng-
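
Before passing --rngmax it can help to confirm that the matching file is already in the cache. The following is a rough sketch with a hypothetical helper: the directory and the file naming pattern are inferred from the listing above, and reading the last two fields as seed and offset is an assumption.

```bash
# Hypothetical helper (not part of Opticks): build the expected cache path.
# Assumptions: the directory and the cuRANDWrapper_<count>_<seed>_<offset>.bin
# pattern are inferred from the listing above; the meaning of the last two
# fields as seed and offset is a guess.
rng_cache_path() {
    local millions=$1 seed=${2:-0} offset=${3:-0}
    echo "$HOME/.opticks/rngcache/RNG/cuRANDWrapper_$(( millions * 1000000 ))_${seed}_${offset}.bin"
}

path=$(rng_cache_path 10)
if [ -f "$path" ] ; then
    echo "found $path : --rngmax 10 should be usable"
else
    echo "missing $path : create it with CUDARAP_RNGMAX_M=10 cudarap-prepare-rng-"
fi
```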

When using embedded Opticks within G4Opticks

Typically the executable's command line is not parsed by Opticks when using an embedded Opticks, as is the case with G4Opticks. Opticks is instantiated when the G4Opticks::setGeometry method is called, so to change the Opticks configuration invoke G4Opticks::setEmbeddedCommandlineExtra prior to calling G4Opticks::setGeometry, for example:

const char* extra = "--rngmax 10 --rngseed 1 --rngoffset 42" ;
m_g4ok->setEmbeddedCommandlineExtra(extra);

What is the maximum number of photons that can be handled at once ?

The maximum is limited by GPU VRAM. Each photon takes 112 bytes:

  • 64 bytes (4*4*4 bytes for 16 32-bit floats/ints) of parameters
  • 48 bytes of curandState.

400M photons, corresponding to about 45G, has been found to be close to the maximum possible when using a 48G VRAM GPU (NVIDIA Quadro RTX 8000).
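
The arithmetic behind these figures can be checked with shell integer math. This is a sketch only: the function names are hypothetical, gigabytes are taken as decimal (10^9 bytes), and the estimate ignores geometry and every other GPU allocation.

```bash
# Bytes of VRAM for N million photons, using the per-photon figures above:
# 64 bytes of parameters plus 48 bytes of curandState.
photon_vram_bytes() {
    local millions=$1
    echo $(( millions * 1000000 * (64 + 48) ))
}

# Upper bound on photon count for a VRAM size given in decimal gigabytes.
# Assumption: ignores geometry, hits and every other GPU allocation.
max_photons_for_vram_gb() {
    local gb=$1
    echo $(( gb * 1000000000 / (64 + 48) ))
}

photon_vram_bytes 400        # prints 44800000000, i.e. 44.8 GB
max_photons_for_vram_gb 48   # prints 428571428
```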

oxrap/ORng : populates rng_states in the OptiX GPU context

/**
ORng
====

Uploads persisted curand rng_states to GPU.
Canonical instance m_orng is ctor resident of OPropagator.

Work is mainly done by cudarap-/cuRANDWrapper

TODO: investigate Thrust based alternatives for curand initialization
      potential for eliminating cudawrap-

**/
...
void ORng::init()
{
    unsigned rng_max = m_ok->getRngMax();
...

    // OptiX owned RNG states buffer (not CUDA owned)
    m_rng_states = m_context->createBuffer( RT_BUFFER_INPUT, RT_FORMAT_USER);

    m_rng_states->setElementSize(sizeof(curandState));

    if(num_mask == 0)
    {
        m_rng_states->setSize(rng_max);

        curandState* host_rng_states = static_cast<curandState*>( m_rng_states->map() );

        m_rng_wrapper->setItems(rng_max); // why ? to identify which cache file to load i suppose

        m_rng_wrapper->LoadIntoHostBuffer(host_rng_states, rng_max );

        m_rng_states->unmap();
    }
    else
    {
        m_rng_states->setSize(num_mask);

        curandState* host_rng_states = static_cast<curandState*>( m_rng_states->map() );

        m_rng_wrapper->setItems(rng_max); // still need to load the full cache

        m_rng_wrapper->LoadIntoHostBufferMasked(host_rng_states, m_mask ) ; // but make partial copy

        m_rng_states->unmap();
    }

    m_context["rng_states"]->setBuffer(m_rng_states);
}

oxrap/OPropagator : instantiates ORng

OPropagator::OPropagator(Opticks* ok, OEvent* oevt, OpticksEntry* entry)
    :
    m_log(new SLog("OPropagator::OPropagator","", LEVEL)),
    m_ok(ok),
    m_oevt(oevt),
    m_ocontext(m_oevt->getOContext()),
    m_context(m_ocontext->getContext()),
    m_orng(new ORng(m_ok, m_ocontext)),
    m_propagateoverride(m_ok->getPropagateOverride()),
    m_nopropagate(false),
    m_entry(entry),
    m_entry_index(entry->getIndex()),
    m_prelaunch(false),