G4DAEChroma.cpp : ZMQResponder, CUDA kernel call
====================================================

Planned C++ implementation of `g4daechroma.py` with few dependencies that
just collects photon byte streams, deserializes them into arrays, copies
them to the GPU, runs the kernel, copies the results back, serializes and
replies.

Progress
---------

* `chromacpp-` preliminary look at directory traversal and reading npy files
* `--geocache` is now operational, so the cache is available

Motivation
-----------

* installation simplicity: just a few Chroma/CUDA kernel calls and vector copies
* convenience of forking from the Geant4 process to run the GPU propagation
  on the same machine as the generation
* speed

Dependencies
-------------

* CUDA (+thrust?)
* ZMQ
* (de)serialization

Initialization
----------------

#. load geometry/materials/surface data from some persisted/cached format
#. copy geometry/materials/surface data to the GPU, building GPU structs
   holding the GPU pointers from the copies

Propagation
--------------

#. ZMQ poll for bytes
#. deserialize bytes into photon data objects of some type:

   * container object with std::vector members
   * container with numpy array members
   * single numpy array, using union int/float trickery for simple decoding from C++

#. copy arrays to GPU
#. kernel call
#. get arrays back to CPU
#. recompose transport object with propagated photons
#. serialize
#. ZMQ reply
#. avoid an intermediate `ChromaPhotonList` and instead load photons directly
   into the transport class

A code sketch of this cycle follows the Potential Dependencies section below.

NPY Geometry Cache
--------------------

* see :doc:`/numpy/numpy_persistency`

Do not want to recreate the pycollada parsing in C++, so need to do that in
python and persist the gleaned data in the numpy NPY serialization format,
which can be read from C/C++.

The intermediate geometry cache format is written by the graphical
`g4daeview.py` and can be used as an initialization speedup for the viewer.
Performance is not a major issue for initialization, so just use NPY, as it
is convenient to write from python/numpy and to read in C++ with CNPY
(a reading sketch appears at the end of this page).

Use a directory structure with single npy files::

    materials/000/absorption_length.npy
                  reemission_probability.npy
                  ...
              001/absorption_length.npy
              ...
    surfaces/001/...
    geometry/000/vertices.npy
                 triangles.npy

Potential Dependencies
------------------------

CuPP
^^^^^

#. http://www.plm.eecs.uni-kassel.de/CuPP/

* facilitates CUDA from C++, like pycuda does from python?
* probably unnecessary

boost::program_options
^^^^^^^^^^^^^^^^^^^^^^^^^

* http://stackoverflow.com/questions/7399688/cuda-with-boost
* maybe unnecessary, handle config via bash script and envvars

`boost::numeric::ublas::vector`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* http://stackoverflow.com/questions/9059774/boostnumericublasvector-internal-data-storage-pointer

::

    boost::numeric::ublas::vector<double> vector;
    double* ptr = &vector[0];

    cudaMemcpy(device_dest, vector.data().begin(),
               vector.data().size() * sizeof(double), cudaMemcpyHostToDevice);

cuda::thrust
^^^^^^^^^^^^^^

* comes with CUDA, so not really an extra dependency
* http://docs.nvidia.com/cuda/thrust/
* allows hiding the memcpy

::

    // copy host_vector H to device_vector D
    thrust::host_vector<int> H(4);
    thrust::device_vector<int> D = H;

CUDA Unified Memory
^^^^^^^^^^^^^^^^^^^^

* needs CUDA 6 and compute capability 3; the MBP should be capable of this
* http://devblogs.nvidia.com/parallelforall/unified-memory-in-cuda-6/
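The Propagation cycle above can be expressed with little more than libzmq and
the CUDA runtime. A minimal sketch follows, assuming a flat 4-floats-per-photon
record layout, a placeholder `propagate_photons` kernel and a `tcp://*:5555`
endpoint; all three are illustrative stand-ins, not the real Chroma interface::

    // minimal REP loop: a blocking recv stands in for the ZMQ poll
    #include <zmq.h>
    #include <cuda_runtime.h>
    #include <vector>

    // placeholder kernel: the real one would consult the geometry/material
    // structs built at initialization
    __global__ void propagate_photons(float* pos, unsigned nphotons)
    {
        unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < nphotons) pos[4*i+3] += 1.f;   // illustrative: bump the time slot
    }

    int main()
    {
        void* ctx = zmq_ctx_new();
        void* rep = zmq_socket(ctx, ZMQ_REP);
        zmq_bind(rep, "tcp://*:5555");

        while (true)
        {
            zmq_msg_t request;
            zmq_msg_init(&request);
            zmq_msg_recv(&request, rep, 0);         // wait for photon bytes

            size_t nbytes = zmq_msg_size(&request);
            unsigned nphotons = nbytes / (4 * sizeof(float));

            float* d_pos;                           // copy arrays to GPU
            cudaMalloc((void**)&d_pos, nbytes);
            cudaMemcpy(d_pos, zmq_msg_data(&request), nbytes, cudaMemcpyHostToDevice);
            zmq_msg_close(&request);

            unsigned threads = 256;                 // kernel call
            unsigned blocks = (nphotons + threads - 1) / threads;
            propagate_photons<<<blocks, threads>>>(d_pos, nphotons);
            cudaDeviceSynchronize();

            std::vector<float> out(nphotons * 4);   // get arrays back to CPU
            cudaMemcpy(out.data(), d_pos, nbytes, cudaMemcpyDeviceToHost);
            cudaFree(d_pos);

            zmq_send(rep, out.data(), nbytes, 0);   // serialize and reply
        }

        zmq_close(rep);
        zmq_ctx_destroy(ctx);
        return 0;
    }
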
Other Libs
------------

#. pyublas : integrates Boost.UBlas and Boost.Python

   * http://documen.tician.de/pyublas/
   * allows filling arrays in C++ that can be viewed as numpy arrays at the
     python level off the same data, **NO COPYING**
   * what about serialization?

#. Boost.Python

   * https://github.com/abingham/boost_python_tutorial
   * http://www.quora.com/How-do-I-convert-C++-vector-to-NumPy-array-using-Boost-Python
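On the C++ reading side, a single cached array from the directory layout in
the NPY Geometry Cache section could be loaded with CNPY
(https://github.com/rogersce/cnpy). A minimal sketch, assuming the
`data<T>()`/`shape` accessors of current CNPY and that the array was written
as float32::

    #include "cnpy.h"
    #include <cstdio>

    int main()
    {
        // path follows the cache layout above; the dtype is an assumption
        cnpy::NpyArray arr = cnpy::npy_load("materials/000/absorption_length.npy");

        const float* vals = arr.data<float>();  // pointer into the loaded buffer
        size_t n = 1;
        for (size_t d = 0; d < arr.shape.size(); ++d) n *= arr.shape[d];

        std::printf("absorption_length: %zu values, first %g\n",
                    n, n ? vals[0] : 0.0);
        return 0;
    }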