Chroma Geometry Source Overview ================================= GPU memory usage of chroma geometry ------------------------------------- For simple memory check make sure a non-split load. 2nd `gpu_used` numbers show change with prior. :: (chroma_env)delta:MockNuWa blyth$ python mocknuwa.py a : start a_gpu_used : 1269.06M a_min_free_gpu_mem : 300.00M b : after materials,surfaces b_gpu_used : 1270.11M 1.05M c : after colors, idmap c_gpu_used : 1299.48M 29.37M ## large? for colors and idmap, maybe context setup for kernels ? d : after nodes d_gpu_used : 1354.40M 54.92M ## matches expectation for nodes d_nodes_nbytes : 54.91M d_nodes_itemsize : 16 d_nodes_count : 3.43M 16 * 3431850 = 54909600 ( 54.91 M) d_split_index : 3.43M d_extra_nodes_count : 1 d_splitting : 0 e_triangles_nbytes : 28.83M e_triangles_itemsize : 12 e_triangles_count : 2.40M 12 * 2402432 = 28829184 ( 28.83 M) e : after triangles e_gpu_used : 1383.23M 28.84M ## matches expectation for triangles e_triangles_gpu : 1 f_vertices_nbytes : 14.60M f_vertices_itemsize : 12 f_vertices_count : 1.22M 12 * 1216452 = 14597424 ( 14.60 M) f : after vertices f_gpu_used : 1397.91M 14.68M ## matches expectation for vertices f_vertices_gpu : 1 g : after geometry struct g_gpu_used : 1397.91M 0 (chroma_env)delta:MockNuWa blyth$ :: In [12]: 1397.91 - 1269.06 Out[12]: 128.85000000000014 # only ~129 MB where is geometry populated ----------------------------- So where is `Geometry` populated:: (chroma_env)delta:cuda blyth$ grep geometry_types.h *.* bvh.cu:#include "geometry_types.h" geometry.h:#include "geometry_types.h" # device funcs that query the geometry, accessing nodes/triangles etc.. From python with `chroma/gpu/geometry.py:GPUGeometry` using `pycuda.gpuarray` and `chroma.gpu.tools.make_gpu_struct` chroma/cuda/geometry_types.h ------------------------------ :: 46 struct Node 47 { 48 float3 lower; 49 float3 upper; 50 unsigned int child; 51 unsigned int nchild; 52 }; 53 54 struct Geometry 55 { 56 float3 *vertices; 57 uint3 *triangles; 58 unsigned int *material_codes; 59 unsigned int *colors; 60 uint4 *primary_nodes; 61 uint4 *extra_nodes; 62 Material **materials; 63 Surface **surfaces; 64 float3 world_origin; 65 float world_scale; 66 int nprimary_nodes; 67 }; * http://stackoverflow.com/a/4838734 chroma/gpu/geometry.py ------------------------ GPUGeometry class that converts into GPU side geometry using CUDA types from `geometry_types.h` :: simon:chroma blyth$ find . -name '*.py' -exec grep -H GPUGeometry {} \; ./camera.py: self.gpu_geometry = gpu.GPUGeometry(self.geometry) ./camera.py: gpu_geometry = gpu.GPUGeometry(geometry, print_usage=False) ./camera.py: gpu_geometry = gpu.GPUGeometry(geometry) ./gpu/detector.py:from chroma.gpu.geometry import GPUGeometry ./gpu/detector.py:class GPUDetector(GPUGeometry): ./gpu/detector.py: GPUGeometry.__init__(self, detector, wavelengths=wavelengths, print_usage=False) ./gpu/geometry.py:class GPUGeometry(object): ./sim.py: self.gpu_geometry = gpu.GPUGeometry(detector) (chroma_env)delta:chroma blyth$ find ../bin -type f -exec grep -H GPUGeometry {} \; ../bin/chroma-bvh:from chroma.gpu.geometry import GPUGeometry ../bin/chroma-bvh: gpu_geometry = GPUGeometry(geometry) GPUGeometry ~~~~~~~~~~~~~ GPU `Geometry` struct is constructed and populated from the below python using * `pycuda.gpuarray` * http://documen.tician.de/pycuda/array.html * `chroma.gpu.tools.make_gpu_struct` `chroma/gpu/geometry.py`:: 13 class GPUGeometry(object): 14 def __init__(self, geometry, wavelengths=None, print_usage=False, min_free_gpu_mem=300e6): 15 if wavelengths is None: 16 wavelengths = standard_wavelengths 17 18 try: 19 wavelength_step = np.unique(np.diff(wavelengths)).item() 20 except ValueError: 21 raise ValueError('wavelengths must be equally spaced apart.') 22 23 geometry_source = get_cu_source('geometry_types.h') 24 material_struct_size = characterize.sizeof('Material', geometry_source) 25 surface_struct_size = characterize.sizeof('Surface', geometry_source) 26 geometry_struct_size = characterize.sizeof('Geometry', geometry_source) 27 .. The materials/surfaces/mesh from the python geometry object are transferred over to the GPU and `Geometry` struct is created to hold the pointers. Some fiddly splitting of data between GPU and CPU ?:: 196 def Mapped(array): 197 '''Analog to pycuda.driver.InOut(), but indicates this array 198 is memory mapped to the device space and should not be copied. 199 200 To simplify coding, Mapped() will pass anything with a gpudata 201 member, like a gpuarray, through unchanged. 202 ''' 203 if hasattr(array, 'gpudata'): 204 return array 205 else: 206 return np.intp(array.base.get_device_pointer()) In [91]: np.intp Out[91]: numpy.int32 Heavy lifting of getting geometry onto GPU:: 134 self.vertices = mapped_empty(shape=len(geometry.mesh.vertices), 135 dtype=ga.vec.float3, 136 write_combined=True) 137 self.triangles = mapped_empty(shape=len(geometry.mesh.triangles), 138 dtype=ga.vec.uint3, 139 write_combined=True) 140 self.vertices[:] = to_float3(geometry.mesh.vertices) 141 self.triangles[:] = to_uint3(geometry.mesh.triangles) 142 143 self.world_origin = ga.vec.make_float3(*geometry.bvh.world_coords.world_origin) 144 self.world_scale = np.float32(geometry.bvh.world_coords.world_scale) 145 146 material_codes = (((geometry.material1_index & 0xff) << 24) | 147 ((geometry.material2_index & 0xff) << 16) | 148 ((geometry.surface_index & 0xff) << 8)).astype(np.uint32) 149 self.material_codes = ga.to_gpu(material_codes) /// packing 3 single byte indices (0:255) into unint32 with low byte empty In [110]: "0x%x" % (((0xaa & 0xff) << 24) | ((0xbb & 0xff) << 16) | ((0xcc & 0xff) << 8)) Out[110]: '0xaabbcc00' 150 colors = geometry.colors.astype(np.uint32) 151 self.colors = ga.to_gpu(colors) 152 self.solid_id_map = ga.to_gpu(geometry.solid_id.astype(np.uint32)) 153 154 # Limit memory usage by splitting BVH into on and off-GPU parts 155 gpu_free, gpu_total = cuda.mem_get_info() 156 node_array_usage = geometry.bvh.nodes.nbytes 157 158 # Figure out how many elements we can fit on the GPU, 159 # but no fewer than 100 elements, and no more than the number of actual nodes 160 n_nodes = len(geometry.bvh.nodes) 161 split_index = min( 162 max(int((gpu_free - min_free_gpu_mem) / geometry.bvh.nodes.itemsize),100), 163 n_nodes 164 ) 165 166 self.nodes = ga.to_gpu(geometry.bvh.nodes[:split_index]) 167 n_extra = max(1, (n_nodes - split_index)) # forbid zero size 168 self.extra_nodes = mapped_empty(shape=n_extra, 169 dtype=geometry.bvh.nodes.dtype, 170 write_combined=True) 171 if split_index < n_nodes: 172 logger.info('Splitting BVH between GPU and CPU memory at node %d' % split_index) 173 self.extra_nodes[:] = geometry.bvh.nodes[split_index:] 174 175 # See if there is enough memory to put the and/ortriangles back on the GPU 176 gpu_free, gpu_total = cuda.mem_get_info() 177 if self.triangles.nbytes < (gpu_free - min_free_gpu_mem): 178 self.triangles = ga.to_gpu(self.triangles) 179 logger.info('Optimization: Sufficient memory to move triangles onto GPU') 180 181 gpu_free, gpu_total = cuda.mem_get_info() 182 if self.vertices.nbytes < (gpu_free - min_free_gpu_mem): 183 self.vertices = ga.to_gpu(self.vertices) 184 logger.info('Optimization: Sufficient memory to move vertices onto GPU') 185 186 self.gpudata = make_gpu_struct(geometry_struct_size, 187 [Mapped(self.vertices), 188 Mapped(self.triangles), 189 self.material_codes, 190 self.colors, self.nodes, 191 Mapped(self.extra_nodes), 192 self.material_pointer_array, 193 self.surface_pointer_array, 194 self.world_origin, 195 self.world_scale, 196 np.int32(len(self.nodes))]) material and surface indices packed into material_codes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Hmm, there are loadsa materials and surfaces, so whats this `material_codes` qty ?:: 145 146 material_codes = (((geometry.material1_index & 0xff) << 24) | 147 ((geometry.material2_index & 0xff) << 16) | 148 ((geometry.surface_index & 0xff) << 8)).astype(np.uint32) 149 self.material_codes = ga.to_gpu(material_codes) /// packing 3 single byte indices (0:255) into unint32 with low byte empty In [110]: "0x%x" % (((0xaa & 0xff) << 24) | ((0xbb & 0xff) << 16) | ((0xcc & 0xff) << 8)) Out[110]: '0xaabbcc00' In [130]: (np.array([(1<<8)-1, (1<<8), (1<<8)+1 ]) & 0xff ).astype(np.uint32) Out[130]: array([255, 0, 1], dtype=uint32) chroma/geometry.py:: 301 def flatten(self): 302 """ 303 Create the flat list of triangles (and triangle properties) 304 from the list of solids in this geometry. 305 306 This does not build the BVH! If you want to use the geometry 307 for rendering or simulation, you should call build() instead. 308 """ 309 310 # Don't run this function twice! 311 if hasattr(self, 'mesh'): 312 return 313 314 nv = np.cumsum([0] + [len(solid.mesh.vertices) for solid in self.solids]) 315 nt = np.cumsum([0] + [len(solid.mesh.triangles) for solid in self.solids]) 316 317 vertices = np.empty((nv[-1],3), dtype=np.float32) 318 triangles = np.empty((nt[-1],3), dtype=np.uint32) 319 320 321 logger.info('Flattening detector mesh...') 322 logger.info(' triangles: %d' % len(triangles)) 323 logger.info(' vertices: %d' % len(vertices)) 324 325 326 for i, solid in enumerate(self.solids): 327 vertices[nv[i]:nv[i+1]] = \ 328 np.inner(solid.mesh.vertices, self.solid_rotations[i]) + self.solid_displacements[i] 329 triangles[nt[i]:nt[i+1]] = solid.mesh.triangles + nv[i] 330 331 # Different solids are very unlikely to share vertices, so this goes much faster 332 self.mesh = Mesh(vertices, triangles, remove_duplicate_vertices=False) 333 334 self.colors = np.concatenate([solid.color for solid in self.solids]) 335 336 self.solid_id = np.concatenate([filled_array(i, shape=len(solid.mesh.triangles), dtype=np.uint32) for i, solid in enumerate(self.solids)]) 337 338 self.unique_materials = list(np.unique(np.concatenate([solid.unique_materials for solid in self.solids]))) 339 340 material_lookup = dict(zip(self.unique_materials, range(len(self.unique_materials)))) // // map from material to its unique index // 341 342 self.material1_index = np.concatenate([solid.material1_indices(material_lookup) for solid in self.solids]) 343 // array of material1 indices of every triangle in every solid // 344 self.material2_index = np.concatenate([solid.material2_indices(material_lookup) for solid in self.solids]) 345 346 self.unique_surfaces = list(np.unique(np.concatenate([solid.unique_surfaces for solid in self.solids]))) 347 348 surface_lookup = dict(zip(self.unique_surfaces, range(len(self.unique_surfaces)))) 349 350 self.surface_index = np.concatenate([solid.surface_indices(surface_lookup) for solid in self.solids]) 351 352 try: 353 self.surface_index[self.surface_index == surface_lookup[None]] = -1 354 except KeyError: 355 pass 356 chroma/cuda/photon.h:: 79 __device__ void 80 fill_state(State &s, Photon &p, Geometry *g) 81 { 82 p.last_hit_triangle = intersect_mesh(p.position, p.direction, g, 83 s.distance_to_boundary, 84 p.last_hit_triangle); 85 86 if (p.last_hit_triangle == -1) { 87 p.history |= NO_HIT; 88 return; 89 } 90 91 Triangle t = get_triangle(g, p.last_hit_triangle); 92 93 unsigned int material_code = g->material_codes[p.last_hit_triangle]; 94 95 int inner_material_index = convert(0xFF & (material_code >> 24)); 96 int outer_material_index = convert(0xFF & (material_code >> 16)); 97 s.surface_index = convert(0xFF & (material_code >> 8)); In [135]: "0x%x" % (0xff & (0xaabbcc00 >> 24 )) Out[135]: '0xaa' In [136]: "0x%x" % (0xff & (0xaabbcc00 >> 16 )) Out[136]: '0xbb' In [137]: "0x%x" % (0xff & (0xaabbcc00 >> 8 )) Out[137]: '0xcc' 98 99 float3 v01 = t.v1 - t.v0; 00 float3 v12 = t.v2 - t.v1; 01 02 s.surface_normal = normalize(cross(v01, v12)); 03 04 Material *material1, *material2; 05 if (dot(s.surface_normal,-p.direction) > 0.0f) { 06 // outside to inside 07 material1 = g->materials[outer_material_index]; 08 material2 = g->materials[inner_material_index]; 09 10 s.inside_to_outside = false; 11 } 12 else { 13 // inside to outside 14 material1 = g->materials[inner_material_index]; 15 material2 = g->materials[outer_material_index]; 16 s.surface_normal = -s.surface_normal; 17 18 s.inside_to_outside = true; 19 } 20 21 s.refractive_index1 = interp_property(material1, p.wavelength, 22 material1->refractive_index); 23 s.refractive_index2 = interp_property(material2, p.wavelength, 24 material2->refractive_index); 25 s.absorption_length = interp_property(material1, p.wavelength, 26 material1->absorption_length); 27 s.scattering_length = interp_property(material1, p.wavelength, 28 material1->scattering_length); 29 s.reemission_prob = interp_property(material1, p.wavelength, 30 material1->reemission_prob); 31 32 s.material1 = material1; 33 } // fill_state chroma/loader.py ------------------ `def load_geometry_from_string` :: 28 "filename.stl" or "filename.stl.bz2" - Create a geometry from a 29 3D mesh on disk. This model will not be cached, but the 30 BVH can be, depending on whether update_bvh_cache is True. chroma/stl.py --------------- Parse STL files (simple format of vertices and triangles) into Mesh objects. chroma/geometry.py -------------------- * Is the below wavelength comment outdated ? My impression was that the wavelengths used are held in the material/surface structs and interpolated as appropriate.:: 15 # all material/surface properties are interpolated at these 16 # wavelengths when they are sent to the gpu 17 standard_wavelengths = np.arange(60, 810, 20).astype(np.float32) 18 19 class Mesh(object): 20 "Triangle mesh object." 21 def __init__(self, vertices, triangles, remove_duplicate_vertices=False): 22 vertices = np.asarray(vertices, dtype=np.float32) 23 triangles = np.asarray(triangles, dtype=np.int32) Python side geometry * `Geometry`, a detector_material and a list of Solids, rotations and displacements * `flatten` method determines global unique_materials, unique_surfaces from those for each solid * `Solid`, attaches materials, surfaces, and colors to each triangle in the Mesh object argument * `Mesh` , comprising arrays of vertices and triangles * `Material`, with name and wavelength dependant property arrays: * refractive_index * absorption_length * scattering_length * reemission_prob * reemission_cdf * density * composition * `Surface`, with name and model and wavelength dependant optical property arrays: * detect/absort/reemit/reflect_diffuse/reflect_specular/eta/k/reemission_cdf/thickness/transmissive Q: where all these properties getting set ? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * not in the STL, thats a very simple list of vertices/triangles Chroma geometry construction currently done in "ad-hoc" python such as `chroma/demo/__init__.py`, Not out of some "standard" file format, like G4DAE COLLADA+metadata Chroma BVH class chroma/bvh/bvh.py ----------------------------------- A bounding volume hierarchy for a triangle mesh. For the purposes of Chroma, a BVH is a tree with the following properties: * Each node consists of an axis-aligned bounding box, a child ID number, and a boolean flag indicating whether the node is a leaf. The bounding box is represented as a lower and upper bound for each Cartesian axis. chroma/cuda/geometry_types.h ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :: 46 struct Node 47 { 48 float3 lower; 49 float3 upper; 50 unsigned int child; 51 unsigned int nchild; 52 }; * All nodes are stored in a 1D array with the root node first. * A node with a bounding box that has no surface area (upper and lower bounds equal for all axes) is a dummy node that should be ignored. Dummy nodes are used to pad the tree to satisfy the fixed degree requirement described below, and have no children. * If the node is a leaf, then the child ID number refers to the ID number of the triangle this node contains. * If the node is not a leaf (an "inner" node), then the child ID number indicates the offset in the node array of the first child. The other children of this node will be stored immediately after the first child. * All inner nodes have the same number of children, called the "degree" (technically the "out-degree") of the tree. This avoid the requirement to save the degree with the node. * For simplicity, we also require nodes at the same depth in the tree to be contiguous, and the layers to be in order of increasing depth. * All nodes satisfy the **bounding volume hierarchy constraint**: their bounding boxes contain the bounding boxes of all their children. For space reasons, the BVH bounds are internally represented using 16-bit unsigned fixed point coordinates. Normally, we would want to hide that from you, but we would like to avoid rounding issues and high memory usage caused by converting back and forth between floating point and fixed point representations. For similar reasons, the node array is stored in a packed record format that can be directly mapped to the GPU. In general, you will not need to manipulate the contents of the BVH node array directly. chroma/cuda/mesh.h -------------------- Stack based recursive tree walk:: 36 /* Finds the intersection between a ray and `geometry`. If the ray does 37 intersect the mesh and the index of the intersected triangle is not equal 38 to `last_hit_triangle`, set `min_distance` to the distance from `origin` to 39 the intersection and return the index of the triangle which the ray 40 intersected, else return -1. */ 41 __device__ int 42 intersect_mesh(const float3 &origin, const float3& direction, Geometry *g, 43 float &min_distance, int last_hit_triangle = -1) 44 { 45 int triangle_index = -1; 46