Propagate Timeouts

Objectives

  • robustify propagation
  • bookkeeping
    • local worker lockfile to prevent two propagators running simultaneously on the same node
    • python metadata dict containing propagator name and parameters : could double as the lockfile content
    • how to do this remotely? multipart zmq message with the 2nd frame containing metadata ? (see the sketch after this list)
      • probably JSON (e.g. https://github.com/vivkin/gason) as it is easy to translate from a python dict and parsers are available in every language : speed is not a great concern as there is not much metadata
    • sqlite based recording of
      • propagation parameters and times
      • photon/hits/hitlist digests
    • exercise bookkeeping by checking dependency on params like max_steps and cohort size
  • hit checking
    • flag analysis : NAN_ABORT trace
    • many hits with slot 22 : suggests truncation ; check that it is just storage truncation, not step truncation
    • correlate the photons with the hits
    • hit time distributions
    • hit wavelength distributions
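
A minimal pyzmq sketch of the multipart idea: payload in the first frame, JSON metadata in the second. The endpoint and field names here are illustrative assumptions, not an established protocol:

import json
import zmq

metadata = {"propagator": "propagate_hit", "max_steps": 100, "max_slots": 30}
photon_bytes = b"..."   # placeholder for the serialized photon payload

context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:5002")   # endpoint is illustrative

# frame 0: photon payload, frame 1: JSON-encoded metadata dict
socket.send_multipart([photon_bytes, json.dumps(metadata).encode()])

reply_payload, reply_meta = socket.recv_multipart()
reply_metadata = json.loads(reply_meta.decode())

JSON keeps the metadata readable from the C++ side (via gason) without requiring a shared schema compiler.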

mock001 : normal incidence NAN_ABORT

Perfect normal incidence resulted in a NAN_ABORT. This was observed with a vertical photon incident on the horizontal AD table on which the ADs sit. Worked around by special-casing normal incidence.

mock001 : axis aligned photons issue

Need to log in to the GPU machine in “>console”-only mode in order to use the debugger. It is most convenient to do that over ssh, which allows multiple connections and copy-and-paste.

(chroma_env)delta:g4daeview blyth$ g4daechroma.sh --cuda-gdb

Suspect a handful of stuck threads:

(cuda-gdb) info threads
  4 Thread 0x261b of process 44071  0x00007fff9292664a in -[PAStackshot initWithGlobalTrace].tracebuf () from /usr/lib/system/libsystem_kernel.dylib
  3 Thread 0x240b of process 44071  0x00007fff9292664a in -[PAStackshot initWithGlobalTrace].tracebuf () from /usr/lib/system/libsystem_kernel.dylib
  2 Thread 0x23f7 of process 44071  0x00007fff92921a1a in -[PAStackshot initWithGlobalTrace].tracebuf () from /usr/lib/system/libsystem_kernel.dylib
* 1 Thread 0x2203 of process 44071  0x00007fff89fc1b2d in pthread_threadid_np () from /usr/lib/system/libsystem_pthread.dylib
(cuda-gdb) kill
Kill the program being debugged? (y or n) y
[Termination of CUDA Kernel 9 (init_rng<<<(1025,1,1),(64,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 8 (reduce_kernel_stage2<<<(1,1,1),(512,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 7 (reduce_kernel_stage1<<<(3,1,1),(512,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 5 (init_rng<<<(1025,1,1),(64,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 4 (write_size<<<(1,1,1),(1,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 3 (write_size<<<(1,1,1),(1,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 2 (write_size<<<(1,1,1),(1,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 1 (write_size<<<(1,1,1),(1,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 0 (write_size<<<(1,1,1),(1,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 10 (propagate_hit<<<(2,1,1),(64,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 6 (propagate_hit<<<(66,1,1),(64,1,1)>>>) on Device 0]
(cuda-gdb)

The suspicion turned out to be correct: axis-aligned photons (specifically the z-aligned vertical ones) were never reaching an intersection. They just kept testing against boxes; infinities meant the x,y information was effectively not used. The workaround was to special-case axis-aligned photons.
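
The hazard is plain IEEE arithmetic: a zero direction component gives an infinite reciprocal, and zero times infinity is NaN, which compares false against everything, so slab-style box tests built from these values lose the x,y constraints. A minimal numpy illustration (not the actual chroma intersection code):

import numpy as np

# z-aligned direction: the x and y components are exactly zero
direction = np.array([0.0, 0.0, -1.0])

with np.errstate(divide='ignore', invalid='ignore'):
    invdir = 1.0 / direction          # -> [ inf  inf  -1.]
    print(invdir)
    print(0.0 * invdir[0])            # 0 * inf -> nan

print(np.nan < 0.0, np.nan > 0.0)     # False False : nan poisons comparisons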

Code Improvements

  • provide a high-level API for ease of use
    • single header “G4DAEChroma.hh”
    • route all normal operations through a “chroma” instance ?
    • reposition photons and hits outside transport

Dependencies

  • eliminate cnpy to avoid dependency duplication
    • need to migrate G4DAETransformCache to numpy.hpp
  • cordon ROOT + ZMQRoot + ChromaPhotonList usage behind a preprocessor definition

DONE

  • reproducibility/seeding : established via reset_rng_states, which resets the RNG states before every propagation
  • compare vbo to non-vbo propagation : match made
  • move propagation relevant constants into daedirectconfig.py so both propagations use the same values
  • getting sensor ids back to caller
  • is the transform to local yielding expected coordinates ? yes, values are small as expected for the local frame
# vbo propagation
# photon_id / slot / history / pmtid

In [23]: h[:,3,:4].view(np.int32)
Out[23]:
array([[      62,       22,       68, 16844311],
       [      74,       22,       84, 16844311],
       [      82,       22,       68, 16844810],
       [      93,       22,      580, 16843276],
       [     164,       22,       68, 16843021],
       [     170,       11,       68, 16844291],
       [     194,        5,      516, 16844033],
       [     248,        6,      516, 16843800],

Issues

timeouts

Some photon lists, like mock001, propagate successfully; others, like mock007, cause timeouts.

PyCUDA errors that manifest as timeouts can be due to the GPU equivalent of a segfault, which kills the context; the timeout then follows because the host no longer has a context to talk to on the device.

Are certain photon parameters causing “segfaults” on the GPU ?

 File "/usr/local/env/chroma_env/lib/python2.7/site-packages/env/geant4/geometry/collada/g4daeview/daephotons.py", line 222, in propagate
    self.propagator.interop_propagate( vbo, max_steps=max_steps, max_slots=max_slots )
  File "/usr/local/env/chroma_env/lib/python2.7/site-packages/env/geant4/geometry/collada/g4daeview/daephotonspropagator.py", line 192, in interop_propagate
    self.propagate( vbo_dev_ptr, max_steps=max_steps, max_slots=max_slots )
  File "/usr/local/env/chroma_env/lib/python2.7/site-packages/env/geant4/geometry/collada/g4daeview/daephotonspropagator.py", line 160, in propagate
    t = get_time()
  File "/usr/local/env/chroma_env/lib/python2.7/site-packages/pycuda/driver.py", line 453, in get_call_time
    end.synchronize()
pycuda._driver.LaunchError: cuEventSynchronize failed: launch timeout
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuEventDestroy failed: launch timeout
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuEventDestroy failed: launch timeout
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuGLUnmapBufferObject failed: launch timeout
(chroma_env)delta:g4daeview blyth$
(chroma_env)delta:g4daeview blyth$
(chroma_env)delta:g4daeview blyth$ g4daeview.sh --load mock007
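
One pragmatic guard is to catch the LaunchError at the kernel call site and treat the context as dead, recording which photon list provoked it instead of attempting further device calls. A sketch, with kernel, grid, block and args standing for the prepared-call setup shown in the excerpts below:

import logging
import pycuda.driver as cuda

log = logging.getLogger(__name__)

try:
    get_time = kernel.prepared_timed_call(grid, block, *args)
    t = get_time()   # synchronizes on the end event: where the timeout surfaces
except cuda.LaunchError:
    # GPU-side "segfault": the context is dead and every subsequent device
    # call will also fail, so log the offending sample and bail out
    log.fatal("launch timeout, context presumed dead")
    raise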

mock photons

Using the transform cache, samples of photons were prepared with directions oriented with respect to the PMTs, e.g. bullseye photons aimed straight at a PMT.

To visualize the initial photons, load with -P/--nopropagate:

g4daeview.sh --load mock002 --nopropagate --geometry-regexp PmtHemiCathode
//transport->GetPhotons()->Save("mock002");  // ldir +y
//transport->GetPhotons()->Save("mock003");  // ldir +x
//transport->GetPhotons()->Save("mock004");  // ldir +z
//transport->GetPhotons()->Save("mock005");  // lpos (0,0,100) ldir (0,0,-1)  try to shoot directly at PMT
//transport->GetPhotons()->Save("mock006");  // lpos (0,0,500) ldir (0,0,-1)  try to shoot directly at PMT
//transport->GetPhotons()->Save("mock007");  // lpos (0,0,1500) ldir (0,0,-1)  try to shoot directly at PMT

vbo vs non-vbo hit count difference

In [36]: h.shape
Out[36]: (146, 4, 4)

In [37]: h = ph("h1")

In [38]: h.shape
Out[38]: (33, 4, 4)

mocknuwa propagation testing over the network

While running:

# non-vbo  propagation using propagate_hit.cu gpu/photon_hit.py GPUPhotonsHit
g4daechroma.sh

# vbo propagation with the GUI
g4daeview.sh --live
g4daeview.sh --zmqendpoint=tcp://localhost:5002

# the broker
czmq-;czmq-broker-local

Provoke a propagation with:

mocknuwa.sh 1

file based propagation testing

Debug propagation with:

daedirectpropagation.sh mock001

Visualize initial positions by holding propagation:

g4daeview.sh --load mock002 --nopropagate --geometry-regexp PmtHemiCathode
udp.py --load mock002
udp.py --load mock003
udp.py --propagate

vbo propagation

Kernel invoked from interop_propagate in daephotons.py:

182     def propagate(self, max_steps=100):
...
216         vbo = self.renderer.pbuffer
217
218         self.propagator.update_constants()
219
220         if not self.config.args.propagate:
221             log.warn("propagation is inhibited by config: -P/--nopropagate ")
222         else:
223             log.warn("propagation proceeding")
224             self.propagator.interop_propagate( vbo, max_steps=max_steps, max_slots=max_slots )
225         pass
226
227         propagated = vbo.read()

Kernel call in daephotonspropagator.py:

 92     def propagate(self,
 93                   vbo_dev_ptr,
 94                   max_steps=100,
 95                   max_slots=30,
 96                   use_weights=False,
 97                   scatter_first=0):
 98         """
...
145                     grid=(blocks, 1)
146                     args = ( np.int32(first_photon),
147                              np.int32(photons_this_round),
148                              self.input_queue_gpu[1:].gpudata,
149                              self.output_queue_gpu.gpudata,
150                              self.ctx.rng_states,
151                              vbo_dev_ptr,
152                              np.int32(nsteps),
153                              np.int32(max_slots),
154                              np.int32(use_weights),
155                              np.int32(scatter_first),
156                              self.ctx.gpu_geometry.gpudata)
157
158                     get_time = self.kernel.prepared_timed_call( grid, block, *args )

cuda/propagate_vbo.cu:

488 __global__ void
489 propagate_vbo( int first_photon,
490                int nthreads,
491                unsigned int *input_queue,
492                unsigned int *output_queue,
493                curandState *rng_states,
494                float4 *vbo,
495                int max_steps,
496                int max_slots,
497                int use_weights,
498                int scatter_first,
499                Geometry *g)
500 {

Hmm, can I access the maps from the Geometry struct GPU-side ? Nope, not there:

54 struct Geometry
55 {
56     float3 *vertices;
57     uint3 *triangles;
58     unsigned int *material_codes;
59     unsigned int *colors;
60     uint4 *primary_nodes;
61     uint4 *extra_nodes;
62     Material **materials;
63     Surface **surfaces;
64     float3 world_origin;
65     float world_scale;
66     int nprimary_nodes;
67 };

 4 struct Detector
 5 {
 6     // Order in decreasing size to avoid alignment problems
 7     int *solid_id_to_channel_index;

non-vbo propagation

Must use GPUDetector (not GPUGeometry) to have the mapping arrays.

gpu/detector.py:

16 class GPUDetector(GPUGeometry):
17     def __init__(self, detector, wavelengths=None, print_usage=False):
18         GPUGeometry.__init__(self, detector, wavelengths=wavelengths, print_usage=False)
19
20         self.solid_id_to_channel_index_gpu = \
21             ga.to_gpu(detector.solid_id_to_channel_index.astype(np.int32))
22         self.solid_id_to_channel_id_gpu = \
23             ga.to_gpu(detector.solid_id_to_channel_id.astype(np.int32))
24

gpu/photon_hit.py:

176         solid_id_map_gpu = gpu_geometry.solid_id_map
177         solid_id_to_channel_id_gpu = gpu_geometry.solid_id_to_channel_id_gpu
178
...
197                     grid = (blocks, 1)
198                     args = (
199                         np.int32(first_photon),
200                         np.int32(photons_this_round),
201                         self.input_queue_gpu[1:].gpudata,
202                         self.output_queue_gpu.gpudata,
203                         rng_states,
204                         self.pos.gpudata,
205                         self.dir.gpudata,
206                         self.wavelengths.gpudata,
207                         self.pol.gpudata,
208                         self.t.gpudata,
209                         self.flags.gpudata,
210                         self.last_hit_triangles.gpudata,
211                         self.weights.gpudata,
212                         np.int32(nsteps),
213                         np.int32(use_weights),
214                         np.int32(scatter_first),
215                         gpu_geometry.gpudata,
216                         solid_id_map_gpu.gpudata,
217                         solid_id_to_channel_id_gpu.gpudata,
218                             )
219                     get_time = self.propagate_hit_kernel.prepared_timed_call( grid, block, *args )
220                     t = get_time()

cuda/propagate_hit.cu:

118 // iiPPPPPPPPPPPiiiP
119
120 __global__ void
121 propagate_hit(
122       int first_photon,
123       int nthreads,
124       unsigned int *input_queue,
125       unsigned int *output_queue,
126       curandState *rng_states,
127       float3 *positions,
128       float3 *directions,
129       float *wavelengths,
130       float3 *polarizations,
131       float *times,
132       unsigned int *histories,
133       int *last_hit_triangles,
134       float *weights,
135       int max_steps,
136       int use_weights,
137       int scatter_first,
138       Geometry *g,
139       int* solid_map,
140       int* solid_id_to_channel_id )
141 {
...
233     if ((p.history & SURFACE_DETECT) != 0) {
234
235         //
236         // kludgy mis-use of lht for outputting
237         // various things like
238         //       solid_id:    index like, zero based
239         //       channel_id:  the pmtid, encoding site/ad/ring/...
240         //
241         int triangle_id = last_hit_triangles[photon_id];
242         if (triangle_id > -1) {
243             int solid_id = solid_map[triangle_id];
244             int channel_id = solid_id_to_channel_id[solid_id];
245             last_hit_triangles[photon_id] = channel_id ;
246         } else {
247             last_hit_triangles[photon_id] = -2 ;
248         }
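
Host-side, detected channels can then be pulled out of the returned arrays: select photons whose history has the SURFACE_DETECT bit set and read the channel id that the kernel stashed into last_hit_triangles. A sketch with assumed names (the flag value is an assumption about chroma's event flags):

import numpy as np

SURFACE_DETECT = 0x1 << 2   # assumption: bit position as defined in chroma.event

# photons: a GPUPhotonsHit instance after propagate_hit has run
flags = photons.flags.get()               # photon histories, copied to host
lht = photons.last_hit_triangles.get()    # now carries channel_id for hits

hit = (flags & SURFACE_DETECT) != 0
channel_ids = lht[hit]   # pmtids encoding site/ad/ring/... per the kernel above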

threading sensor ids back to caller (vbo)

In [7]: h = ph("h1")

In [8]: a = h[:,3,0].view(np.int32)

In [9]: b = h[:,3,1].view(np.int32)

In [10]: c = h[:,3,2].view(np.int32)

In [11]: a[a != 0]
Out[11]:
array([ 750,  276,  816,  342,  486,  702, 1044,  936,  696,  696, 1050,
       1194,  372,  390,  756, 1086,  762, 1134,  786,  726, 1026,  408,
        912,   48,  102,   78,  756,  942,  954, 1164,  108,  876, 1092,
        702,  504,  414,  702,  498,  522,  546,  768,  324, 1086, 1008,

        ...

In [13]: np.set_printoptions(formatter={'int':hex})

In [14]: b[b != 0]
Out[14]:
array([0x1010516, 0x101020f, 0x1010609, 0x1010302, 0x1010402, 0x101050e,
       0x1010717, 0x1010705, 0x101050d, 0x101050d, 0x1010718, 0x1010818,
       0x1010307, 0x101030a, 0x1010517, 0x1010806, 0x1010518, 0x101080e,
       0x1010604, 0x1010512, 0x1010714, 0x101030d, 0x1010701, 0x1010101,
       ...


In [16]: np.set_printoptions(formatter={'int':None})

In [17]: c[c != 0]
Out[17]: array([888, 888, 888, ..., 888, 888, 888], dtype=int32)

threading sensor ids back to caller (non-vbo)

In [12]: a = ph("1")

In [13]: h = ph("h1")

In [14]: a.shape
Out[14]: (4165, 4, 4)

In [15]: h.shape
Out[15]: (52, 4, 4)

In [16]: np.set_printoptions(formatter={'int':hex})

In [17]: h[:,3,3]
Out[17]:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], dtype=float32)

The ids are bit-stashed in a float32 slot, so they print as zeros until reinterpreted:

In [18]: h[:,3,3].view(np.int32)
Out[18]:
array([0x1010516, 0x1010302, 0x1010402, 0x1010717, 0x1010718, 0x1010517,
       0x1010518, 0x1010701, 0x1010106, 0x1010706, 0x1010708, 0x101010b,
       0x101050e, 0x101040c, 0x1010601, 0x1010201, 0x101020d, 0x101020d,
       0x1010502, 0x1010209, 0x101070d, 0x1010602, 0x1010715, 0x1010108,
       0x1010407, 0x1010418, 0x101040b, 0x101060c, 0x1010709, 0x1010409,
       0x101050d, 0x101050d, 0x1010613, 0x1010707, 0x1010516, 0x101020d,
       0x1010201, 0x1010308, 0x101040f, 0x101010e, 0x1010109, 0x1010417,
       0x101050c, 0x1010309, 0x1010213, 0x101050c, 0x1010402, 0x101040e,
       0x1010716, 0x1010315, 0x101010f, 0x1010416], dtype=int32)

Hmm, for comparison we need the photon index in the hits array.
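
In the vbo case the layout noted earlier (photon_id / slot / history / pmtid in h[:,3,:4]) already carries that index, so hits can be matched back to their originating photons; the non-vbo path would need the same index threaded through. A sketch for the vbo arrays, assuming that layout:

import numpy as np

h = ph("h1")   # hits, shape (nhit, 4, 4), ph as used in the sessions above
a = ph("1")    # all propagated photons, shape (nphoton, 4, 4)

photon_id = h[:, 3, 0].view(np.int32)   # photon_id slot of the vbo layout
pmtid = h[:, 3, 3].view(np.int32)

originals = a[photon_id]   # original photon records corresponding to each hit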