DAEVertexBuffer Dev Notes =========================== Interop -------- Motivation: To avoid buffer recreation for visualization need to get Chroma/CUDA to use the OpenGL buffers. Interop steps gleaned from raycaster PBO usage: #. import pycuda.gl as cuda_gl # special GL enabled CUDA context (handle in DAEChromaContext) #. cuda_pbo = cuda_gl.BufferObject(long(pbo)) # needs CUDA context, pbo is OpenGL buffer id #. cuda_pbo.unregister() # ends CUDA responsibility, allows OpenGL access for drawing #. pbo_mapping = cuda_pbo.map() # CUDA takes responsibility #. pbo_device_ptr = pbo_mapping.device_ptr() # pointer to device memory to pass as kernel call argument #. pbo_mapping.unmap() # after the kernel call, declares end of CUDA usage Currently are using the simple and inefficient technique of recreating the VBOs whenever need to change anything. Ruminations #. having separate VBOs for drawing lines and points is inconvenient, especially when have CUDA in the mix * should be able to draw points from the lines VBO via indices control (ie have one vertex array and which is used with two different indices arrays) * how about flag control photon selections ? glumpy VertexBuffer glues together vertices and indices as pair of ctor arguments : they do not need to go together. ie could split that off and do the indices selection on CPU in similar manner to current ? But what about on GPU selection, it just needs bit field comparison to uniform argument mask or history bitfield ... hmm but still would need to pull out the selection bits CPU side to setup the OpenGL indices ? So little gain. Does OpenGL have some concept of a mask array ? Can control color in the kernel, so make unselected invisible (set alpha to 0) * :google:`OpenGL CUDA vertex selection` * http://www.opengl.org/wiki/Vertex_Rendering * 2nd point of line pair can be calculated in the kernel following CUDA propagation, with line length being a uniform NB need to keep the slot, initially fill it from numpy for the non-CUDA installs #. need to retain drawing of initial photon positions/directions for non-chroma installs Too Many Moving parts ~~~~~~~~~~~~~~~~~~~~~~ #. OpenGL vertex attributes description of initial numpy ndarray #. shader attribute binding to access them from shaders #. what is PyCUDA GPUArray actually doing, as using OpenGL buffers direct from PyCUDA replaces this #. visualization gymnastics should not substantially impact without-vis propagation AoS or SoA ~~~~~~~~~~~~ * http://stackoverflow.com/questions/17924705/structure-of-arrays-vs-array-of-structures-in-cuda Data flow ~~~~~~~~~~~ #. ROOT deserializes the ChromaPhotonList bytes arriving from file or ZMQ into a ChromaPhotonList instance (a collection std::vector and std::vector) #. copy `pos`, `dir`, `wavelength`, `t` etc into numpy arrays inside a chroma.event.Photons instance #. copy subset of those arrays into "pdata" numpy ndarray The underlying data is coming from a numpy ndarray. Perhaps pycuda has solved the problem already ? http://documen.tician.de/pycuda/array.html Maybe, but for interop need to make pycuda use the OpenGL buffers. What needs to be shared between CUDA and OpenGL ? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #. position #. momdir (for lines) #. wavelength #. propagation flags, for selection/history Single VBO approach (array-of-structs) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This approach bangs into CUDA alignment/padding complexities device side and arriving at the struct that matches the OpenGL buffer layout. * http://www.igorsevo.com/Article.aspx?article=Million+particles+in+CUDA+and+OpenGL * http://on-demand.gputechconf.com/gtc/2013/presentations/S3477-GPGPU-In-Film-Production.pdf :: glBindBuffer(GL_ARRAY_BUFFER, vbo); glVertexPointer(3, GL_FLOAT, 16, 0); glColorPointer(4,GL_UNSIGNED_BYTE,16,(GLvoid*)12); In CUDA stuffs the colors into a float using a union:: union Color { float c; uchar4 components; }; Color temp; temp.components = make_uchar4(128/length,(int)(255/(velocity*51)),255,10); pos[y*width+x].w = temp.c; Ahha, users of OpenGL compute shaders face the same issues * http://stackoverflow.com/questions/21342814/rendering-data-in-opengl-vertices-and-compute-shaders * http://www.opengl.org/registry/doc/glspec43.core.20130214.pdf * 7.6.2.2 - Standard Uniform Block Layout NVIDIA Example ~~~~~~~~~~~~~~~~ * /usr/local/env/cuda/NVIDIA_CUDA-5.5_Samples/2_Graphics/simpleGL Targetted googling ~~~~~~~~~~~~~~~~~~~~~ * :google:`cuda kernel float4* VBO` andyswarm ^^^^^^^^^^^ #. color and position both as float4 with colors offset after position #. Advantage is can use `float4 *dptr` just like simpleGL example. * http://www.evl.uic.edu/aej/525/code/andySwarm.cu * http://www.evl.uic.edu/aej/525/code/andySwarm_kernel.cu :: // render from the vbo glBindBuffer(GL_ARRAY_BUFFER, vbo); glVertexPointer(4, GL_FLOAT, 0, 0); glColorPointer(4, GL_FLOAT, 0, (GLvoid *) (mesh_width * mesh_height * sizeof(float)*4)); Separate VBO approach (struct-of-arrays) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This approach avoids the struct problems at expense of high level bookkeeping for the multiple VBOs. Potentially an OpenGL draw performance hit too. * http://www.drdobbs.com/parallel/cuda-supercomputing-for-the-masses-part/225200412?pgno=6 Example uses separate VBOs for position and color and does manual linear addressing to change them from CUDA. Then OpenGL draws by binding to the multiple different VBO. This is nice and simple at expense of lots of VBOs :: __global__ void kernel(float4* pos, uchar4 *colorPos, unsigned int width, unsigned int height, float time) { unsigned int x = blockIdx.x*blockDim.x + threadIdx.x; unsigned int y = blockIdx.y*blockDim.y + threadIdx.y; ... // write output vertex pos[y*width+x] = make_float4(u, w, v, 1.0f); colorPos[y*width+x].w = 0; colorPos[y*width+x].x = 255.f *0.5*(1.f+sinf(w+x)); colorPos[y*width+x].y = 255.f *0.5*(1.f+sinf(x)*cosf(y)); colorPos[y*width+x].z = 255.f *0.5*(1.f+sinf(w+time/10.f)); } The splitting between arrays is done at glBindBuffer:: void renderCuda(int drawMode) { glBindBuffer(GL_ARRAY_BUFFER, vertexVBO.vbo); glVertexPointer(4, GL_FLOAT, 0, 0); glEnableClientState(GL_VERTEX_ARRAY); glBindBuffer(GL_ARRAY_BUFFER, colorVBO.vbo); glColorPointer(4, GL_UNSIGNED_BYTE, 0, 0); glEnableClientState(GL_COLOR_ARRAY); glBindBuffer ~~~~~~~~~~~~~~ * http://www.khronos.org/opengles/sdk/docs/man/xhtml/glBindBuffer.xml glBindBuffer lets you create or use a named buffer object. Calling glBindBuffer with target set to GL_ARRAY_BUFFER or GL_ELEMENT_ARRAY_BUFFER and buffer set to the name of the new buffer object binds the buffer object name to the target. When a buffer object is bound to a target, the previous binding for that target is automatically broken. When vertex array pointer state is changed by a call to glVertexAttribPointer, the current buffer object binding (GL_ARRAY_BUFFER_BINDING) is copied into the corresponding client state for the vertex attrib array being changed, one of the indexed GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDINGs. While a non-zero buffer object is bound to the GL_ARRAY_BUFFER target, the vertex array pointer parameter that is traditionally interpreted as a pointer to client-side memory is instead interpreted as an offset within the buffer object measured in basic machine units. OpenGL Drawing Techniques --------------------------- glDrawElements ~~~~~~~~~~~~~~~ Buffer offset default of 0 corresponds to glumpy original None, (ie (void*)0 ) the integer value is converted with `ctypes.c_void_p(offset)` allowing partial buffer drawing. * http://pyopengl.sourceforge.net/documentation/manual-3.0/glDrawElements.html * http://stackoverflow.com/questions/11132716/how-to-specify-buffer-offset-with-pyopengl * http://pyopengl.sourceforge.net/documentation/pydoc/OpenGL.arrays.vbo.html * http://www.opengl.org/discussion_boards/showthread.php/151386-VBO-BUFFER_OFFSET-and-glDrawElements-broken A C example of glDrawElements from /Developer/NVIDIA/CUDA-5.5/samples/5_Simulations/smokeParticles/SmokeRenderer.cpp:: glDrawElements(GL_POINTS, count, GL_UNSIGNED_INT, (void *)(start*sizeof(unsigned int))); # start is an int ==================== ============== type ==================== ============== GL_UNSIGNED_BYTE 0:255 GL_UNSIGNED_SHORT, 0:65535 GL_UNSIGNED_INT 0:4.295B ==================== ============== =================== ==================================== mode =================== ==================================== GL_POINTS GL_LINE_STRIP GL_LINE_LOOP GL_LINES GL_TRIANGLE_STRIP GL_TRIANGLE_FAN GL_TRIANGLES GL_QUAD_STRIP GL_QUADS GL_POLYGON =================== ==================================== The what letters, 'pnctesf' define the meaning of the arrays via enabling appropriate attributes. ================== ================== ================== ===== gl***Pointer GL_***_ARRAY Att names * ================== ================== ================== ===== Color COLOR color c EdgeFlag EDGE_FLAG edge_flag e FogCoord FOG_COORD fog_coord f Normal NORMAL normal n SecondaryColor SECONDARY_COLOR secondary_color s TexCoord TEXTURE_COORD tex_coord t Vertex VERTEX position p VertexAttrib N/A ================== ================== ================== ===== glDrawElements offset ~~~~~~~~~~~~~~~~~~~~~~~~ #. **glDrawElements offset applies to the entire indices array**, * ie it controls where to start getting indices from. * for offsets within each element have to use VertexAttrib offsets. glPushClientAttrib ~~~~~~~~~~~~~~~~~~~ http://www.opengl.org/sdk/docs/man2/xhtml/glPushClientAttrib.xml :: void glPushClientAttrib(GLbitfield mask); glPushClientAttrib takes one argument, a mask that indicates which groups of client-state variables to save on the client attribute stack. Symbolic constants are used to set bits in the mask. mask is typically constructed by specifying the bitwise-or of several of these constants together. The special mask GL_CLIENT_ALL_ATTRIB_BITS can be used to save all stackable client state. The symbolic mask constants and their associated GL client state are as follows (the second column lists which attributes are saved): GL_CLIENT_PIXEL_STORE_BIT Pixel storage modes GL_CLIENT_VERTEX_ARRAY_BIT Vertex arrays (and enables) glMultiDrawElements ~~~~~~~~~~~~~~~~~~~~ * http://stackoverflow.com/questions/11286964/glmultidrawelements-values-to-pass-to-indices-parameter-glvoid * http://stackoverflow.com/questions/5354710/drawing-polygons-with-varying-vertex-count-in-opengl-es * not in OpenGL ES 3.0 (highest in iOS so far), is there an alternate that avoids looping over draw calls ? * https://www.opengl.org/discussion_boards/showthread.php/176891-glMultiDrawElements-VBO * MultiDraw with VBO * indices and counts arrays are client side, but the indices array holds byte offsets into device side buffer Read OpenGL buffer back into numpy arrays ------------------------------------------- :: def read_array_buffer_0(self): """ * http://www.opengl.org/sdk/docs/man2/xhtml/glMapBuffer.xml * http://mail.scipy.org/pipermail/numpy-discussion/2008-September/037131.html ctypes.sizeof(ctypes.c_float) == 4 numpy.ctypeslib.as_array Writing into a mapped OpenGL VBO without pyopengl sugar ? * http://comments.gmane.org/gmane.comp.python.opengl.user/2069 :: # Map the buffer object to a pointer vbo_pointer = ctypes.cast(gl.glMapBuffer(gl.GL_ARRAY_BUFFER, gl.GL_WRITE_ONLY), ctypes.POINTER(ctypes.c_ubyte)) vbo_array = numpy.ctypeslib.as_array(vbo_pointer, (buffer_size,)) # numpy array from pointer vbo_array[0:data_size_to_copy] = data.view(dtype='uint8').ravel() gl.glUnmapBuffer(gl.GL_ARRAY_BUFFER) :: data = ctypes.string_at(gl.glMapBuffer(gl.GL_ARRAY_BUFFER, gl.GL_READ_ONLY), self.vertices.nbytes ) x = np.frombuffer(int_asbuffer(ctypes.addressof(data.contents), n*8)) * http://stackoverflow.com/questions/7543675/how-to-convert-pointer-to-c-array-to-python-array """ pass log.info("read_array_buffer") nbytes = self.vertices_copy.nbytes dtype = np.dtype(np.float32) itemsize = dtype.itemsize count = nbytes//itemsize # integer assert nbytes == count * itemsize ptr = gl.glMapBuffer(gl.GL_ARRAY_BUFFER, gl.GL_READ_ONLY) ArrayType = ctypes.c_float * count ap = ctypes.cast(y, ctypes.POINTER(ArrayType)) #vbo_data = ctypes.string_at(ptr, nbytes ) #x = np.frombuffer(int_asbuffer(ctypes.addressof(vbo_data),nbytes),dtype=dtype) #print x gl.glUnmapBuffer(gl.GL_ARRAY_BUFFER) #buffer_size = self.vertices_copy.nbytes #vbo_pointer = ctypes.cast(gl.glMapBuffer(gl.GL_ARRAY_BUFFER, gl.GL_READ_ONLY), ctypes.POINTER(ctypes.c_ubyte)) #vbo_array = np.ctypeslib.as_array(vbo_pointer, (buffer_size,)) # numpy array from pointer #vbo_array[0:data_size_to_copy] = data.view(dtype='uint8').ravel() #gl.glUnmapBuffer(gl.GL_ARRAY_BUFFER) def read_array_buffer( self ): """ * :google:`glMapBuffer numpy` * http://blog.vrplumber.com/b/2009/09/01/hack-to-map-vertex/ Map the given buffer into a numpy array... """ log.info("read_array_buffer") target = gl.GL_ARRAY_BUFFER access = gl.GL_READ_ONLY nbytes = self.vertices.nbytes func = ctypes.pythonapi.PyBuffer_FromMemory func.restype = ctypes.py_object ptr = gl.glMapBuffer( target, access ) buf = func( ctypes.c_void_p(ptr), nbytes ) arr = np.frombuffer( buf, 'B' ) self.propagated = arr.view(dtype=self.vertices.dtype) gl.glUnmapBuffer(target)