Table Of Contents

This Page

Boolean CSG on GPU

TODO: numerical/chi2 history comparison with CFG4 booleans

TODO : python trees into opticks

new way without container:

2017-03-20 17:52:45.163 INFO  [525230] [GSolid::Dump@172] GMergedMesh::combine (source solids) numSolid 2
2017-03-20 17:52:45.163 INFO  [525230] [GNode::dump@180] GNode::dump
GNode::dump idx 0 nchild 0
2017-03-20 17:52:45.163 INFO  [525230] [GNode::dump@187] mesh.numSolids 0 mesh.ce.0 gfloat4      0.000      0.000      0.000      0.000
2017-03-20 17:52:45.163 INFO  [525230] [GNode::dump@180] GNode::dump
GNode::dump idx 1 nchild 0
2017-03-20 17:52:45.163 INFO  [525230] [GNode::dump@187] mesh.numSolids 0 mesh.ce.0 gfloat4      0.000      0.000      0.000    186.383
2017-03-20 17:52:45.163 INFO  [525230] [GMesh::allocate@604] GMesh::allocate numVertices 2520 numFaces 840 numSolids 2
2017-03-20 17:52:45.163 INFO  [525230] [GMesh::allocate@635] GMesh::allocate DONE
2017-03-20 17:52:45.163 INFO  [525230] [GMergedMesh::dumpSolids@606] GMergedMesh::combine (combined result)  ce0 gfloat4      0.000      0.000      0.000    186.383
    0 ce             gfloat4      0.000      0.000      0.000    186.383  bb bb min   -186.383   -186.383   -186.383  max    186.383    186.383    186.383
    1 ce             gfloat4      0.000      0.000      0.000    186.383  bb bb min   -186.383   -186.383   -186.383  max    186.383    186.383    186.383
    0 ni[nf/nv/nidx/pidx] (  0,  0,  0,4294967295)  id[nidx,midx,bidx,sidx]  (  0,  0,123,  0)
    1 ni[nf/nv/nidx/pidx] (840,2520,  1,4294967295)  id[nidx,midx,bidx,sidx]  (  1,  1,124,  0)
2017-03-20 17:52:45.163 INFO  [525230] [GParts::Summary@580] GGeoTest::combineSolids --dbganalytic num_parts 4 num_prim 0

old way with container:

2017-03-20 17:50:05.320 INFO  [524398] [GSolid::Dump@172] GMergedMesh::combine (source solids) numSolid 4
2017-03-20 17:50:05.320 INFO  [524398] [GNode::dump@180] GNode::dump
GNode::dump idx 0 nchild 0
2017-03-20 17:50:05.320 INFO  [524398] [GNode::dump@187] mesh.numSolids 0 mesh.ce.0 gfloat4      0.000      0.000      0.000   1000.000
2017-03-20 17:50:05.320 INFO  [524398] [GNode::dump@180] GNode::dump
GNode::dump idx 0 nchild 0
2017-03-20 17:50:05.320 INFO  [524398] [GNode::dump@187] mesh.numSolids 0 mesh.ce.0 gfloat4      0.000      0.000      0.000    500.000
2017-03-20 17:50:05.320 INFO  [524398] [GNode::dump@180] GNode::dump
GNode::dump idx 0 nchild 0
2017-03-20 17:50:05.320 INFO  [524398] [GNode::dump@187] mesh.numSolids 0 mesh.ce.0 gfloat4      0.000      0.000      0.000    150.000
2017-03-20 17:50:05.320 INFO  [524398] [GNode::dump@180] GNode::dump
GNode::dump idx 0 nchild 0
2017-03-20 17:50:05.320 INFO  [524398] [GNode::dump@187] mesh.numSolids 0 mesh.ce.0 gfloat4      0.000      0.000      0.000      1.000
2017-03-20 17:50:05.320 INFO  [524398] [GMesh::allocate@604] GMesh::allocate numVertices 3912 numFaces 1316 numSolids 4
2017-03-20 17:50:05.320 INFO  [524398] [GMesh::allocate@635] GMesh::allocate DONE
2017-03-20 17:50:05.320 FATAL [524398] [GMergedMesh::mergeSolid@460] GMergedMesh::mergeSolid mismatch  nodeIndex 0 m_cur_solid 1
2017-03-20 17:50:05.320 FATAL [524398] [GMergedMesh::mergeSolid@460] GMergedMesh::mergeSolid mismatch  nodeIndex 0 m_cur_solid 2
2017-03-20 17:50:05.320 FATAL [524398] [GMergedMesh::mergeSolid@460] GMergedMesh::mergeSolid mismatch  nodeIndex 0 m_cur_solid 3
2017-03-20 17:50:05.321 INFO  [524398] [GMergedMesh::dumpSolids@606] GMergedMesh::combine (combined result)  ce0 gfloat4      0.000      0.000      0.000   1000.000
    0 ce             gfloat4      0.000      0.000      0.000   1000.000  bb bb min  -1000.000  -1000.000  -1000.000  max   1000.000   1000.000   1000.000
    1 ce             gfloat4      0.000      0.000      0.000    500.000  bb bb min   -500.000   -500.000   -500.000  max    500.000    500.000    500.000
    2 ce             gfloat4      0.000      0.000      0.000    150.000  bb bb min   -150.000   -150.000   -150.000  max    150.000    150.000    150.000
    3 ce             gfloat4      0.000      0.000      0.000    200.000  bb bb min   -200.000   -200.000   -200.000  max    200.000    200.000    200.000
    0 ni[nf/nv/nidx/pidx] ( 12, 24,  0,4294967295)  id[nidx,midx,bidx,sidx]  (  0,  0,  0,  0)
    1 ni[nf/nv/nidx/pidx] ( 12, 24,  0,4294967295)  id[nidx,midx,bidx,sidx]  (  0,  0,  0,  0)
    2 ni[nf/nv/nidx/pidx] ( 12, 24,  0,4294967295)  id[nidx,midx,bidx,sidx]  (  0,  0,  0,  0)
    3 ni[nf/nv/nidx/pidx] (1280,3840,  0,4294967295)  id[nidx,midx,bidx,sidx]  (  0,  0,  0,  0)
2017-03-20 17:50:05.321 INFO  [524398] [GParts::Summary@580] GGeoTest::combineSolids --dbganalytic num_parts 4 num_prim 0
 part  0 : node  0 type  6 boundary [123] Rock//perfectAbsorbSurface/Vacuum
 part  1 : node  1 type  2 boundary [124] Vacuum///GlassSchottF2
 part  2 : node  1 type  6 boundary [124] Vacuum///GlassSchottF2
 part  3 : node  1 type  5 boundary [124] Vacuum///GlassSchottF2

TODO : randbox photon source

  • very handy python side for intersect testing, implement in CUDA too
  • also some option to load python generated rays into OptiX would allow same rays testing

TODO : more natural test tree construction

  • authoring CSG trees in levelorder is difficult, find more natural approach

TODO: support transforms in op nodes

  • G4 boolean transforms apply to right side only, follow that.

TODO: support empties, ie non-perfect trees

TODO : algorithm characteristic comparisons

Extended period of boolean CSG debugging has resulted in several CUDA/OptiX and python implementations

Need to compare their characteristics

  • maximum csg and tranche stack size
  • performance

Boolean Debugging

FIXED : Root Termination problem

Issue: ERROR_XOR_SIDE, ERROR_POP_EMPTY only for very overlapped height 3 trees

The problem was an bug in the translation of the recursive CSG algorithm into an iterative version for use on GPU. The bug resulted in the root being visited twice for very overlapped geometries.

Reiteration within another reiteration needs to terminate at the caller not at the root.

slavish.py

Using 10k rays generated in a random box, succeed to get a few errors python side, made reproducible by setting np.random.seed.

slavish.py
[2017-03-08 13:45:35,836] p62090 {/Users/blyth/opticks/dev/csg/intersectTest.py:180} INFO - np.random.seed 0
ierr: 0x00001008 iray:   785
ierr: 0x00001008 iray:  1972
ierr: 0x00001008 iray:  3546
ierr: 0x00001008 iray:  7119
ierr: 0x00001008 iray:  7325
ierr: 0x00001008 iray:  8894
[2017-03-08 13:45:53,221] p62090 {/Users/blyth/opticks/dev/csg/intersectTest.py:229} WARNING - $TMP/tboolean-csg-four-box-minus-sphere : compare : i_discrep {'d': IIS([ 785,  785,  785, 1972, 1972, 1972, 3546, 3546, 3546, 7119, 7119, 7119, 7325, 7325, 7325, 8894, 8894, 8894]), 'ipos': IIS([ 785,  785,  785, 1972, 1972, 1972, 3546, 3546, 3546, 7119, 7119, 7119, 7325, 7325, 7325, 8894, 8894, 8894]), 't': array([ 785, 1972, 3546, 7119, 7325, 8894]), 'o': IIS([ 785,  785,  785, 1972, 1972, 1972, 3546, 3546, 3546, 7119, 7119, 7119, 7325, 7325, 7325, 8894, 8894, 8894]), 'n': IIS([ 785,  785,  785, 1972, 1972, 1972, 3546, 3546, 3546, 7119, 7119, 7119, 7325, 7325, 7325, 8894, 8894, 8894])} r_discrep: {}

Cause is same for all, iterative misses loopers that recursive does:

[2017-03-08 14:34:06,284] p62256 {/Users/blyth/opticks/dev/csg/intersectTest.py:185} INFO - np.random.seed 0
[2017-03-08 14:34:06,286] p62256 {/Users/blyth/opticks/dev/csg/slavish.py:267} INFO -    785 I : tranche begin 0 end 7
   785 I : nodeIdx  4
   785 I : nodeIdx  5
   785 I : nodeIdx  2
   785 I : nodeIdx  6
   785 I : nodeIdx  7
   785 I : nodeIdx  3
   785 I : nodeIdx  1
ierr: 0x00001008 tst.iray:   785

   785 R : nodeIdx  4
   785 R : nodeIdx  5
   785 R : nodeIdx  2
   785 R : nodeIdx  6
   785 R : nodeIdx  7
   785 R : nodeIdx  3
   785 R : nodeIdx  6*   REPEAT RIGHT SUBTREE
   785 R : nodeIdx  7*
   785 R : nodeIdx  3*
   785 R : nodeIdx  1

[2017-03-08 14:34:06,289] p62256 {/Users/blyth/opticks/dev/csg/slavish.py:267} INFO -   1972 I : tranche begin 0 end 7
  1972 I : nodeIdx  4
  1972 I : nodeIdx  5
  1972 I : nodeIdx  2
  1972 I : nodeIdx  6
  1972 I : nodeIdx  7
  1972 I : nodeIdx  3
  1972 I : nodeIdx  1
ierr: 0x00001008 tst.iray:  1972
  1972 R : nodeIdx  4
  1972 R : nodeIdx  5
  1972 R : nodeIdx  2
  1972 R : nodeIdx  6
  1972 R : nodeIdx  7
  1972 R : nodeIdx  3
  1972 R : nodeIdx  4*  REPEAT LEFT SUBTREE
  1972 R : nodeIdx  5*
  1972 R : nodeIdx  2*
  1972 R : nodeIdx  1
[2017-03-08 14:34:06,292] p62256 {/Users/blyth/opticks/dev/csg/slavish.py:267} INFO -   3546 I : tranche begin 0 end 7
  3546 I : nodeIdx  4
  3546 I : nodeIdx  5
  3546 I : nodeIdx  2
ierr: 0x00001008 tst.iray:  3546
  3546 R : nodeIdx  4
  3546 R : nodeIdx  5
  3546 R : nodeIdx  5*   REPEAT A BILEAF
  3546 R : nodeIdx  2
  3546 R : nodeIdx  6
  3546 R : nodeIdx  7
  3546 R : nodeIdx  3
  3546 R : nodeIdx  1
[2017-03-08 14:34:06,295] p62256 {/Users/blyth/opticks/dev/csg/slavish.py:267} INFO -   7119 I : tranche begin 0 end 7
  7119 I : nodeIdx  4
  7119 I : nodeIdx  5
  7119 I : nodeIdx  2
  7119 I : nodeIdx  6
  7119 I : nodeIdx  7
  7119 I : nodeIdx  3
  7119 I : nodeIdx  1
ierr: 0x00001008 tst.iray:  7119
  7119 R : nodeIdx  4
  7119 R : nodeIdx  5
  7119 R : nodeIdx  2
  7119 R : nodeIdx  6
  7119 R : nodeIdx  7
  7119 R : nodeIdx  3
  7119 R : nodeIdx  4*
  7119 R : nodeIdx  5*
  7119 R : nodeIdx  2*
  7119 R : nodeIdx  1

CSG Errors

Very overlapped geometry like : tboolean-csg-four-box-minus-sphere gives errors, shown below. Dumping the launch_index and comparing between runs suggests the issue is reproducible.

Returning the improper

0x1008 -> 1008 -> ERROR_RHS_END_EMPTY
0x100c -> 100c -> ERROR_LHS_END_NONEMPTY ERROR_RHS_END_EMPTY
      0x1 -> 1 -> ERROR_LHS_POP_EMPTY

Origin shows not primary rays causing errors:

2017-03-08 11:02:28.525 INFO  [457301] [OPropagator::prelaunch@149] 1 : (0;100000,1) prelaunch_times vali,comp,prel,lnch  0.0000 1.0982 0.1492 0.0000
intersect_csg primIdx_ 1 ierr 1008 launch_index (  175,    0) li.x(26) 19 ray.direction (     0.865,    -0.354,    -0.354) ray.origin (   -50.111,   -37.211,    -4.933)
intersect_csg primIdx_ 1 ierr 100c launch_index (  249,    0) li.x(26) 15 ray.direction (    -0.000,     0.434,    -0.901) ray.origin (    35.866,   -53.215,    50.111)
intersect_csg primIdx_ 1 ierr 1008 launch_index (  615,    0) li.x(26) 17 ray.direction (    -0.000,    -0.901,     0.434) ray.origin (    28.152,    50.111,     9.413)
intersect_csg primIdx_ 1 ierr 100c launch_index (   11,    0) li.x(26) 11 ray.direction (     0.434,    -0.000,    -0.901) ray.origin (    -6.774,    44.818,    50.111)
intersect_csg primIdx_ 1 ierr 100c launch_index (  323,    0) li.x(26) 11 ray.direction (     0.434,    -0.000,    -0.901) ray.origin (    -1.145,    31.434,    50.111)
intersect_csg primIdx_ 1 ierr 100c launch_index (  387,    0) li.x(26) 23 ray.direction (     0.354,    -0.865,     0.354) ray.origin (    42.450,    50.111,   -55.690)
intersect_csg primIdx_ 1 ierr 1008 launch_index (  406,    0) li.x(26) 16 ray.direction (    -0.000,     0.901,     0.434) ray.origin (   -37.924,   -50.111,     0.866)
intersect_csg primIdx_ 1 ierr 1008 launch_index (  397,    0) li.x(26)  7 ray.direction (     0.901,    -0.434,    -0.000) ray.origin (   -50.111,   -14.494,    17.463)
intersect_csg primIdx_ 1 ierr 1008 launch_index ( 1286,    0) li.x(26) 12 ray.direction (     0.434,    -0.000,     0.901) ray.origin (  -158.749,   -45.161,   -50.111)
intersect_csg primIdx_ 1 ierr 1008 launch_index (  207,    0) li.x(26) 25 ray.direction (     0.354,     0.354,     0.865) ray.origin (  -146.598,   -51.685,   -50.111)
intersect_csg primIdx_ 1 ierr 1008 launch_index (  584,    0) li.x(26) 12 ray.direction (     0.901,    -0.000,     0.434) ray.origin (   -50.111,   -16.444,    17.319)
intersect_csg primIdx_ 1 ierr 1008 launch_index (  662,    0) li.x(26) 12 ray.direction (     0.901,    -0.000,     0.434) ray.origin (   -50.111,   -17.234,    15.378)
intersect_csg primIdx_ 1 ierr 1008 launch_index (  666,    0) li.x(26) 16 ray.direction (    -0.000,     0.901,     0.434) ray.origin (   -25.323,   -50.111,     1.325)
intersect_csg primIdx_ 1 ierr    1 launch_index ( 1325,    0) li.x(26) 25 ray.direction (     0.354,     0.865,     0.354) ray.origin (    31.793,   -50.111,   -10.657)
intersect_csg primIdx_ 1 ierr 100c launch_index ( 1519,    0) li.x(26) 11 ray.direction (     0.434,    -0.000,    -0.901) ray.origin (    10.308,    21.809,    50.111)
intersect_csg primIdx_ 1 ierr 1008 launch_index (   99,    0) li.x(26) 21 ray.direction (    -0.354,    -0.865,     0.354) ray.origin (    52.533,   150.111,   -37.067)
intersect_csg primIdx_ 1 ierr 1008 launch_index ( 1968,    0) li.x(26) 18 ray.direction (    -0.865,    -0.354,    -0.354) ray.origin (    50.111,   -41.536,    21.572)
intersect_csg primIdx_ 1 ierr 1008 launch_index (  967,    0) li.x(26)  5 ray.direction (    -0.000,    -0.000,     1.000) ray.origin (   -47.721,   -40.248,  -250.111)
intersect_csg primIdx_ 1 ierr 100c launch_index (  141,    0) li.x(26) 11 ray.direction (     0.434,    -0.000,    -0.901) ray.origin (    26.544,     3.120,    50.111)
intersect_csg primIdx_ 1 ierr 100c launch_index (  985,    0) li.x(26) 23 ray.direction (     0.779,    -0.007,     0.627) ray.origin (    38.651,    13.330,   -10.936)
intersect_csg primIdx_ 1 ierr 1008 tloop   0 launch_index ( 1005,  365) li.x(26) 17 ray.direction (    -0.990,    -0.111,     0.089) ray.origin (    80.850,   -27.053,   -58.984)
intersect_csg primIdx_ 1 ierr 1008 tloop   0 launch_index ( 1006,  365) li.x(26) 18 ray.direction (    -0.990,    -0.110,     0.089) ray.origin (    80.850,   -27.053,   -58.984)
intersect_csg primIdx_ 1 ierr 1008 tloop   0 launch_index ( 1007,  365) li.x(26) 19 ray.direction (    -0.990,    -0.109,     0.089) ray.origin (    80.850,   -27.053,   -58.984)
intersect_csg primIdx_ 1 ierr 1008 tloop   0 launch_index ( 1004,  367) li.x(26) 16 ray.direction (    -0.990,    -0.112,     0.091) ray.origin (    80.850,   -27.053,   -58.984)
intersect_csg primIdx_ 1 ierr 1008 tloop   0 launch_index ( 1005,  367) li.x(26) 17 ray.direction (    -0.990,    -0.111,     0.091) ray.origin (    80.850,   -27.053,   -58.984)
intersect_csg primIdx_ 1 ierr 1008 tloop   0 launch_index ( 1006,  367) li.x(26) 18 ray.direction (    -0.990,    -0.110,     0.091) ray.origin (    80.850,   -27.053,   -58.984)

PRINT BUFFER -1 OVERFLOW
intersect_csg primIdx_ 1 ierr    1 tloop   2 launch_index (  920,  383) li.x(26) 10 ray.direction (    -0.978,    -0.184,     0.102) ray.origin (    82.681,   -27.666,   -60.320)
intersect_csg primIdx_ 1 ierr    1 tloop   2 launch_index (  921,  383) li.x(26) 11 ray.direction (    -0.978,    -0.183,     0.102) ray.origin (    82.681,   -27.666,   -60.320)
intersect_csg primIdx_ 1 ierr    1 tloop   2 launch_index (  922,  383) li.x(26) 12 ray.direction (    -0.978,    -0.182,     0.102) ray.origin (    82.681,   -27.666,   -60.320)
intersect_csg primIdx_ 1 ierr    1 tloop   2 launch_index (  923,  383) li.x(26) 13 ray.direction (    -0.978,    -0.182,     0.102) ray.origin (    82.681,   -27.666,   -60.320)
intersect_csg primIdx_ 1 ierr    1 tloop   2 launch_index (  924,  383) li.x(26) 14 ray.direction (    -0.978,    -0.181,     0.102) ray.origin (    82.681,   -27.666,   -60.320)
intersect_csg primIdx_ 1 ierr    1 tloop   2 launch_index (  925,  383) li.x(26) 15 ray.direction (    -0.978,    -0.180,     0.102) ray.origin (    82.681,   -27.666,   -60.320)
intersect_csg primIdx_ 1 ierr    1 tloop   2 launch_index (  926,  383) li.x(26) 16 ray.direction (    -0.978,    -0.179,     0.102) ray.origin (    82.681,   -27.666,   -60.320)

DONE: boolean csg tree implementation

OptiX array

Hmm seems everything other than very simple things need to go into buffers.

C : Two meanings of static

  • static global variables and functions, scope limited to definining file
  • static local variables, typically use compile time reserved data segment of memory rather than transient call stack

CUDA guide : static local variables within function

Within the body of a __device__ or __global__ function, only __shared__ variables or variables without any device memory qualifiers may be declared with static storage class.

Within the body of a __device__ __host__ function, only unannotated static variables (i.e., without device memory qualifiers) may be declared with static storage class. Unannotated function-scope static variables have the same restrictions as __device__ variables defined in namespace scope. They cannot have a non-empty constructor or a non-empty destructor, if they are of class type (see Device Memory Qualifiers).

  • hmm, this explains why I had to remove ctors/dtors in my simple structs
struct S1_t { int x; };
struct S2_t { int x; __device__ S2_t(void) { x = 10; } };
struct S3_t { int x; __device__ S3_t(int p) : x(p) { } };
__device__ void f1() {
         static int i1; // OK
         static int i2 = 11; // OK
         static S1_t i3; // OK
         static S1_t i4 = {22}; // OK
         static __shared__ int i5; // OK
         int x = 33;
         static int i6 = x; // error: dynamic initialization is not allowed
         static S1_t i7 = {x}; // error: dynamic initialization is not allowed
         static S2_t i8; // error: dynamic initialization is not allowed
         static S3_t i9(44); // error: dynamic initialization is not allowed
}
  • restriction to non-dynamic static local variables in device kernels makes sense, otherwise each of the millions of threads would need it own data segment
  • With compile time defined restriction can just have one used for all threads

OptiX/CUDA static variables

NB the below is an example of dynamic use of local static variables, so can only work host side.

/Developer/OptiX/SDK/optixTutorial/random.h:

69 // Multiply with carry
70 static __host__ __inline__ unsigned int mwc()
71 {
72   static unsigned long long r[4];
73   static unsigned long long carry;
74   static bool init = false;
75   if( !init ) {
76     init = true;
77     unsigned int seed = 7654321u, seed0, seed1, seed2, seed3;
78     r[0] = seed0 = lcg2(seed);
79     r[1] = seed1 = lcg2(seed0);
80     r[2] = seed2 = lcg2(seed1);
81     r[3] = seed3 = lcg2(seed2);
82     carry = lcg2(seed3);
83   }
84
85   unsigned long long sum = 2111111111ull * r[3] +
86                            1492ull       * r[2] +
87                            1776ull       * r[1] +
88                            5115ull       * r[0] +
89                            1ull          * carry;
90   r[3]   = r[2];
91   r[2]   = r[1];
92   r[1]   = r[0];
93   r[0]   = static_cast<unsigned int>(sum);        // lower half
94   carry  = static_cast<unsigned int>(sum >> 32);  // upper half
95   return static_cast<unsigned int>(r[0]);
96 }

Adding node transforms

Matrix manip, optixu_matrix_namespace.h

OptiX : const float3

2112 OPTIXU_INLINE RT_HOSTDEVICE float luminanceCIE(const float3& rgb)
2113 {
2114   const float3 cie_luminance = { 0.2126f, 0.7152f, 0.0722f };
2115   return  dot( rgb, cie_luminance );
2116 }

OptiX float4 as a very short stack

simon:optixu blyth$ grep ByIndex optixu_math_namespace.h
OPTIXU_INLINE RT_HOSTDEVICE float getByIndex(const float1& v, int i)
OPTIXU_INLINE RT_HOSTDEVICE void setByIndex(float1& v, int i, float x)
OPTIXU_INLINE RT_HOSTDEVICE float getByIndex(const float2& v, int i)
OPTIXU_INLINE RT_HOSTDEVICE void setByIndex(float2& v, int i, float x)
OPTIXU_INLINE RT_HOSTDEVICE float getByIndex(const float3& v, int i)
OPTIXU_INLINE RT_HOSTDEVICE void setByIndex(float3& v, int i, float x)
OPTIXU_INLINE RT_HOSTDEVICE float getByIndex(const float4& v, int i)
OPTIXU_INLINE RT_HOSTDEVICE void setByIndex(float4& v, int i, float x)
OPTIXU_INLINE RT_HOSTDEVICE int getByIndex(const int1& v, int i)
OPTIXU_INLINE RT_HOSTDEVICE void setByIndex(int1& v, int i, int x)
OPTIXU_INLINE RT_HOSTDEVICE int getByIndex(const int2& v, int i)
OPTIXU_INLINE RT_HOSTDEVICE void setByIndex(int2& v, int i, int x)
OPTIXU_INLINE RT_HOSTDEVICE int getByIndex(const int3& v, int i)
OPTIXU_INLINE RT_HOSTDEVICE void setByIndex(int3& v, int i, int x)
OPTIXU_INLINE RT_HOSTDEVICE int getByIndex(const int4& v, int i)
OPTIXU_INLINE RT_HOSTDEVICE void setByIndex(int4& v, int i, int x)
OPTIXU_INLINE RT_HOSTDEVICE unsigned int getByIndex(const uint1& v, unsigned int i)
OPTIXU_INLINE RT_HOSTDEVICE void setByIndex(uint1& v, int i, unsigned int x)
OPTIXU_INLINE RT_HOSTDEVICE unsigned int getByIndex(const uint2& v, unsigned int i)
OPTIXU_INLINE RT_HOSTDEVICE void setByIndex(uint2& v, int i, unsigned int x)
OPTIXU_INLINE RT_HOSTDEVICE unsigned int getByIndex(const uint3& v, unsigned int i)
OPTIXU_INLINE RT_HOSTDEVICE void setByIndex(uint3& v, int i, unsigned int x)
OPTIXU_INLINE RT_HOSTDEVICE unsigned int getByIndex(const uint4& v, unsigned int i)
OPTIXU_INLINE RT_HOSTDEVICE void setByIndex(uint4& v, int i, unsigned int x)

Lookup tables in C

Perfect tree traversal has lots of constants, also boolean_act and boolean_table decision logic has lots of if statements with a small range of input values.

This kinda thing seems suited to small static lookup tables, to avoid computation every time. Of course with CUDA its not at all sure there will be any benefit, as GPUs favor computation over memory access.

Whats missing for opticks csg tree ?

  • postorder tree threading, leftmost operator starting point
  • stack of float4(quad) for tranches, holding tmin and begin/end tree indices
  • stack of float4 holding normal and t

Needs to be almost complete tree anyhow for easy serializing

  • so postorder can be hardcoded for different tree depths

depth 1, triplet:

In [21]: Node.postorder_r(root1, nodes=[])
Out[21]: [s2.s, s3.s, I1.Intersection(s2.s,s3.s)]

In [22]: root1.txt
Out[22]:
root1
     I1
      o
 s2      s3
  o       o

depth 2, septuplet:

In [15]: Node.postorder_r(root2, nodes=[])
Out[15]:
[s4.s,
 s5.s,
 I2.Intersection(s4.s,s5.s),
 s6.s,
 s7.s,
 I3.Intersection(s6.s,s7.s),
 U1.Union(I2.Intersection(s4.s,s5.s),I3.Intersection(s6.s,s7.s))]

In [16]: root2.txt
Out[16]:
root2
             U1
              o
     I2              I3
      o               o
 s4      s5      s6      s7
  o       o       o       o

depth 3, 15-tuplet:

In [17]: Node.postorder_r(root3, nodes=[])
Out[17]:
[s8.s,                            i  = 8
 s9.s,                            i+1 = 9                  add 1 to get to right sibling
 I4.Intersection(s8.s,s9.s),      (i+1)/2 = 4              divide by 2, up to parent
 s10.s,                           ( (i+1)/2) + 1)*2 = 10   add 1, multip by 2
 s11.s,                           ((i/2) + 1)*2 + 1 = 11
 I5.Intersection(s10.s,s11.s),
 U2.Union(I4.Intersection(s8.s,s9.s),I5.Intersection(s10.s,s11.s)),
 s12.s,
 s13.s,
 I6.Intersection(s12.s,s13.s),
 s14.s,
 s15.s,
 I7.Intersection(s14.s,s15.s),
 U3.Union(I6.Intersection(s12.s,s13.s),I7.Intersection(s14.s,s15.s)),
 U1.Union(U2.Union(I4.Intersection(s8.s,s9.s),I5.Intersection(s10.s,s11.s)),U3.Union(I6.Intersection(s12.s,s13.s),I7.Intersection(s14.s,s15.s)))]

In [18]: root3.txt
Out[18]:
root3
                             U1
                              o
             U2                              U3
              o                               o
     I4              I5              I6              I7
      o               o               o               o
 s8      s9     s10     s11     s12     s13     s14     s15
  o       o       o       o       o       o       o       o
  • 4, 5, 2, 6, 7, 3, 1
  • unsigned long long postorder_depth3 = 0x1376254 (64 bits)

Simpler to fly above the leaves:

In [26]: Node.postorder_r(root3, nodes=[], leaf=False)
Out[26]:
[I4.Intersection(s8.s,s9.s),
 I5.Intersection(s10.s,s11.s),
 U2.Union(I4.Intersection(s8.s,s9.s),I5.Intersection(s10.s,s11.s)),
 I6.Intersection(s12.s,s13.s),
 I7.Intersection(s14.s,s15.s),
 U3.Union(I6.Intersection(s12.s,s13.s),I7.Intersection(s14.s,s15.s)),
 U1.Union(U2.Union(I4.Intersection(s8.s,s9.s),I5.Intersection(s10.s,s11.s)),U3.Union(I6.Intersection(s12.s,s13.s),I7.Intersection(s14.s,s15.s)))]
  • If T has a total of N nodes, the number of internal nodes is I = (N – 1)/2
  • 1 + 2 + 4 + 8 + ... + 2^d = tot_d
  • 1 + ( 2 + 4 + 8 + 16 + ... + 2^d ) + 2^(d+1) = 1 + 2*tot_d
  • tot_d + 2^(d+1) = 1 + 2*tot_d
  • tot_d = 2^(d+1) - 1
  • internal nodes, [( 2^(d+1) - 1 ) - 1] / 2 -> 2^d - 1
  • better to base things from the depth, as might want to support gaps on the last row
  • depth number of nodes number of leaves
  • d = 0, 2^1 - 1 = 1
  • d = 1, 2^2 - 1 = 3
  • d = 2, 2^3 - 1 = 7
  • d = 3, 2^4 - 1 = 15
  • d = 4, 2^5 - 1 = 31

Tree Threading ?

  • GCSG (which should probably be renamed GCSGPmt) does something similar using a NPY buffer (created in python) as the input
  • most methods require an item index
32 #include "GGEO_API_EXPORT.hh"
33 class GGEO_API GCSG {
34     public:
..
62     public:
63         unsigned int getNumItems();
64     public:
65         float getX(unsigned int i);
66         float getY(unsigned int i);
67         float getZ(unsigned int i);
68         float getOuterRadius(unsigned int i);
69         float getInnerRadius(unsigned int i);
70         float getSizeZ(unsigned int i);
71         float getStartTheta(unsigned int i);
72         float getDeltaTheta(unsigned int i);
73     public:
74         unsigned int getTypeCode(unsigned int i);
75         bool isUnion(unsigned int i);
76         bool isIntersection(unsigned int i);
77         bool isSphere(unsigned int i);
78         bool isTubs(unsigned int i);
79
80         unsigned int getNodeIndex(unsigned int i);  // 1-based index, 0:unset
81         unsigned int getParentIndex(unsigned int i);  // 1-based index, 0:unset
82         unsigned int getSpare(unsigned int i);
83
84         const char* getTypeName(unsigned int i);
85     public:
86         unsigned int getIndex(unsigned int i);
87         unsigned int getNumChildren(unsigned int i);
88         unsigned int getFirstChildIndex(unsigned int i);
89         unsigned int getLastChildIndex(unsigned int i);
90     private:
91         float        getFloat(unsigned int i, unsigned int j, unsigned int k);
92         unsigned int getUInt(unsigned int i, unsigned int j, unsigned int k);
93
94     private:
95         NPY<float>*        m_csg_buffer ;
96         GItemList*         m_materials ;
97         GItemList*         m_lvnames ;
98         GItemList*         m_pvnames ;

CsgInBox test geometry

152 tboolean-csg-notes(){ cat << EON
153
154 * CSG tree is defined in breadth first order
155
156 * parameters of boolean operations currently define adhoc box
157   intended to contain the geometry, TODO: calculate from bounds of the contained tree
158
159 * offsets arg identifies which nodes belong to which primitives by pointing
160   at the nodes that start each primitive
161
162 EON
163 }
164
165 tboolean-csg()
166 {
167     local material=$(tboolean-material)
168     local inscribe=$(python -c "import math ; print 1.3*200/math.sqrt(3)")
169     local radius=200
170
171     local test_config=(
172                       mode=CsgInBox
173                       analytic=1
174                       offsets=0,1     ##
175
176                       node=box          parameters=0,0,0,1000          boundary=Rock//perfectAbsorbSurface/Vacuum
177
178                       node=union        parameters=0,0,0,400           boundary=Vacuum///$material
179                       node=difference   parameters=0,0,100,300         boundary=Vacuum///$material
180                       node=difference   parameters=0,0,-100,300        boundary=Vacuum///$material
181                       node=box          parameters=0,0,100,$inscribe   boundary=Vacuum///$material
182                       node=sphere       parameters=0,0,100,$radius     boundary=Vacuum///$material
183                       node=box          parameters=0,0,-100,$inscribe  boundary=Vacuum///$material
184                       node=sphere       parameters=0,0,-100,$radius    boundary=Vacuum///$material
185
186                       )
187
188     echo "$(join _ ${test_config[@]})"
189 }

Where is the tree ?

278 bool GGeoTestConfig::isStartOfPrimitive(unsigned nodeIdx )
279 {
280     return std::find(m_offsets.begin(), m_offsets.end(), nodeIdx) != m_offsets.end() ;
281 }


237 GMergedMesh* GGeoTest::createCsgInBox()
238 {
239     std::vector<GSolid*> solids ;
240     unsigned int n = m_config->getNumElements();
241
242     unsigned numPrim = m_config->getNumOffsets();
243     LOG(info) << "GGeoTest::createCsgInBox"
244               << " nodes " << n
245               << " numPrim " << numPrim
246              ;
247
248     int primIdx(-1) ;
249
250     for(unsigned int i=0 ; i < n ; i++)
251     {
252         bool primStart = m_config->isStartOfPrimitive(i); // as identified by configured offsets
253         if(primStart)
254         {
255             primIdx++ ;
256         }
...
284         GParts* pts = solid->getParts();
285
286         pts->setIndex(0u, i);
287         pts->setNodeIndex(0u, primIdx );
288         pts->setFlags(0u, flags);
289         pts->setBndLib(m_bndlib);
290
291         solids.push_back(solid);
292     }
86 char GMaker::NodeCode(const char* nodename)
87 {
88     char sc = 'U' ;
89     if(     strcmp(nodename, BOX) == 0)     sc = 'B' ;
90     else if(strcmp(nodename, SPHERE) == 0)  sc = 'S' ;
91     else if(strcmp(nodename, ZSPHERE) == 0) sc = 'Z' ;
92     else if(strcmp(nodename, ZLENS) == 0)   sc = 'L' ;
93     else if(strcmp(nodename, PMT) == 0)     sc = 'P' ;  // not operational
94     else if(strcmp(nodename, PRISM) == 0)   sc = 'M' ;
95     else if(strcmp(nodename, INTERSECTION) == 0)   sc = 'I' ;
96     else if(strcmp(nodename, UNION) == 0)          sc = 'J' ;
97     else if(strcmp(nodename, DIFFERENCE) == 0)     sc = 'K' ;
98     return sc ;
99 }

Tree serialization

2017-03-01 15:31:06.796 INFO  [6205604] [GParts::dumpPrimInfo@530] OGeo::makeAnalyticGeometry pts (part_offset, parts_for_prim, prim_index, prim_flags) numPrim:2
2017-03-01 15:31:06.796 INFO  [6205604] [GParts::dumpPrimInfo@535]  (  0,  1,  0, 16)
2017-03-01 15:31:06.796 INFO  [6205604] [GParts::dumpPrimInfo@535]  (  1,  7,  1,  4)
2017-03-01 15:31:06.796 INFO  [6205604] [GParts::dump@731] GParts::dump ni 8
     0.0000      0.0000      0.0000   1000.0000
     0.0000       0 <-id       123 <-bnd       16 <-flg  SHAPE_PRIMITIVE   bn Rock//perfectAbsorbSurface/Vacuum
 -1000.0100  -1000.0100  -1000.0100           3 (PART_BOX)
  1000.0100   1000.0100   1000.0100           0 (nodeIndex)

     0.0000      0.0000      0.0000    400.0000
     0.0000       1 <-id       124 <-bnd        4 <-flg  SHAPE_UNION   bn Vacuum///GlassSchottF2
  -400.0100   -400.0100   -400.0100           3 (PART_BOX)
   400.0100    400.0100    400.0100           1 (nodeIndex)

     0.0000      0.0000    100.0000    300.0000
     0.0000       2 <-id       124 <-bnd        8 <-flg  SHAPE_DIFFERENCE   bn Vacuum///GlassSchottF2
  -300.0100   -300.0100   -300.0100           3 (PART_BOX)
   300.0100    300.0100    300.0100           1 (nodeIndex)

     0.0000      0.0000   -100.0000    300.0000
     0.0000       3 <-id       124 <-bnd        8 <-flg  SHAPE_DIFFERENCE   bn Vacuum///GlassSchottF2
  -300.0100   -300.0100   -300.0100           3 (PART_BOX)
   300.0100    300.0100    300.0100           1 (nodeIndex)

     0.0000      0.0000    100.0000    150.1111
     0.0000       4 <-id       124 <-bnd       16 <-flg  SHAPE_PRIMITIVE   bn Vacuum///GlassSchottF2
  -150.1211   -150.1211   -150.1211           3 (PART_BOX)
   150.1211    150.1211    150.1211           1 (nodeIndex)

     0.0000      0.0000    100.0000    200.0000
     0.0000       5 <-id       124 <-bnd       16 <-flg  SHAPE_PRIMITIVE   bn Vacuum///GlassSchottF2
  -200.0100   -200.0100   -200.0100           1 (PART_SPHERE)
   200.0100    200.0100    200.0100           1 (nodeIndex)

     0.0000      0.0000   -100.0000    150.1111
     0.0000       6 <-id       124 <-bnd       16 <-flg  SHAPE_PRIMITIVE   bn Vacuum///GlassSchottF2
  -150.1211   -150.1211   -150.1211           3 (PART_BOX)
   150.1211    150.1211    150.1211           1 (nodeIndex)

     0.0000      0.0000   -100.0000    200.0000
     0.0000       7 <-id       124 <-bnd       16 <-flg  SHAPE_PRIMITIVE   bn Vacuum///GlassSchottF2
  -200.0100   -200.0100   -200.0100           1 (PART_SPHERE)
   200.0100    200.0100    200.0100           1 (nodeIndex)

FIXED Issue : ray trace “near/tmin” clipping fails to see inside booleans

  • FIXED BY STARTING boolean tA_min and tB_min at ray.tmin

The usual behavior of near clipping enabling to see inside things is not working with booleans when the viewpoint is outside the boolean.

As approach a boolean solid the near point preceeds you... when it reaches the solid a circular-ish black hole forms, this gets bigger as proceed onwards the black filling most of the frame until the viewpoint gets into the boolean primitive bbox(?) and suddenly the blackness changes into a view of the insides. Once inside changing the near point works to clip how much of insides can see.

Tempted to use scene_epsilon in the below, but its not correct (or currently possible) for general intersection code to depend on a rendering only thing like scene_epsilon.

Begs the question how does non-boolean geometry manage to get near clipped ?

  • rays are shot with t_min set to scene_epsilon

Exploring optix_device.h find ray.tmin, this might provide a solution:

simon:include blyth$ grep tmin *.h
optix_device.h:  optix::rt_trace(*(unsigned int*)&topNode, ray.origin, ray.direction, ray.ray_type, ray.tmin, ray.tmax, &prd, sizeof(T));
optix_device.h:  * @param[in] tmin  t value of the ray to be checked
optix_device.h:static inline __device__ bool rtPotentialIntersection( float tmin )
optix_device.h:  return optix::rt_potential_intersection( tmin );
optix_device.h:              "  ray tmin      : %f\n"
simon:include blyth$

1811 template<class T>
1812 static inline __device__ void rtTrace( rtObject topNode, optix::Ray ray, T& prd )
1813 {
1814   optix::rt_trace(*(unsigned int*)&topNode, ray.origin, ray.direction, ray.ray_type, ray.tmin, ray.tmax, &prd, sizeof(T));
1815 }

YEP IT WORKS:

33 static __device__
34 void intersect_boolean( const uint4& prim, const uint4& identity )
..
61     //float tA_min = propagate_epsilon ;
62     //float tB_min = propagate_epsilon ;
63     float tA_min = ray.tmin ;
64     float tB_min = ray.tmin ;
65     float tA     = 0.f ;
66     float tB     = 0.f ;
33 static __device__
34 void intersect_boolean( const uint4& prim, const uint4& identity )
35 {
..
57     // _min 0.f rather than propagate_epsilon
58     // leads to missed boundaries when start photons on a boundary,
59     // see boolean_csg_on_gpu.rst
60
61     float tA_min = propagate_epsilon ;
62     float tB_min = propagate_epsilon ;
63     float tA     = 0.f ;
64     float tB     = 0.f ;

scene_epsilon

scene_epsilon is how the near clipping feeds into the rays:

45 RT_PROGRAM void pinhole_camera()
46 {
47
48   PerRayData_radiance prd;
49   prd.flag = 0u ;
50   prd.result = bad_color ;
51
52   float2 d = make_float2(launch_index) / make_float2(launch_dim) * 2.f - 1.f ;
53
54   optix::Ray ray = parallel == 0 ?
55                        optix::make_Ray( eye                 , normalize(d.x*U + d.y*V + W), radiance_ray_type, scene_epsilon, RT_DEFAULT_MAX)
56                      :
57                        optix::make_Ray( eye + d.x*U + d.y*V , normalize(W)                , radiance_ray_type, scene_epsilon, RT_DEFAULT_MAX)
58                      ;
59
simon:geant4_opticks_integration blyth$ opticks-find scene_epsilon
./optixrap/cu/pinhole_camera.cu:rtDeclareVariable(float,         scene_epsilon, , );
...
./optixrap/cu/pinhole_camera.cu:  // scene_epsilon is "t_min" but ray_direction is normalized,
./optixrap/cu/pinhole_camera.cu:  // scene_epsilon is the distance along the ray at which to start
./optixrap/OTracer.cc:    m_context[ "scene_epsilon"]->setFloat(m_composition->getNear());
./optixrap/OTracer.cc:    float scene_epsilon = m_composition->getNear();
./optixrap/OTracer.cc:    m_context[ "scene_epsilon"]->setFloat(scene_epsilon);
./ana/debug/genstep_sequence_material_mismatch.py:     328     m_context[ "scene_epsilon"]->setFloat(m_composition->getNear());

FIXED Issue : boolean insides invisible from outside

Not sure why, but fixed by using “absolute loop ctrl” instead of relative in intersect_boolean

159         else if(
160                      (action & AdvanceAAndLoop)
161                   ||
162                      ((action & AdvanceAAndLoopIfCloser) && tA <= tB )
163                 )
164         {
165
166 #ifdef BOOLEAN_DEBUG
167             if( (action & AdvanceAAndLoop) )                     debugA = 2 ;
168             if( (action & AdvanceAAndLoopIfCloser) && tA <= tB ) debugA = 3 ;
169 #endif
170
171             //ctrl = ctrl & ~LIVE_B  ;   // CAUSES INVISIBLE INSIDES
172             ctrl = LIVE_A  ;
173             tA_min = tA ;
174         }
175         else if(
176                      (action & AdvanceBAndLoop)
177                   ||
178                      ((action & AdvanceBAndLoopIfCloser) && tB <= tA )
179                 )
180         {
181             //ctrl = ctrl & ~LIVE_A  ;   // CAUSES INVISIBLE INSIDES
182             ctrl = LIVE_B ;
183             tB_min = tB ;
184         }
185
186      }     // while loop
187 }

tboolean-box-dented shows a hole where expect to see surface of concave hemi-spherical dent.

Using BOOLEAN_DEBUG to color the A and B intersects makes the problem clearer. Can only see innards when the viewpoint is inside.

tboolean-box-minus-sphere shows no insides:

106     local inscribe=$(python -c "import math ; print 1.3*200/math.sqrt(3)")
107     local test_config_1=(
108                  mode=BoxInBox
109                  analytic=1
110
111                  shape=box          parameters=0,0,0,1000          boundary=Rock//perfectAbsorbSurface/Vacuum
112
113                  shape=difference   parameters=0,0,0,300           boundary=Vacuum///$material
114                  shape=box          parameters=0,0,0,$inscribe     boundary=Vacuum///$material
115                  shape=sphere       parameters=0,0,0,200           boundary=Vacuum///$material
116
117                )

FIXED : Issue : cannot see booleans from inside

  • formerly saw that when navigating inside the union, see only container box not the union shape insides

Fixed by moving from:

if( valid_intersect )
{
    float tint = tmin > 0.f ? tmin : tmax ;  // pick the intersect
    tt = tint > tt_min ? tint : tt_min ;
    ...

To:

if( valid_intersect )
{
    //  just because the ray intersects the box doesnt
    //  mean want to see it, there are 3 possibilities
    //
    //                t_near       t_far
    //
    //                  |           |
    //        -----1----|----2------|------3---------->
    //                  |           |
    //
    tt =  tt_min < t_near ?
                           t_near
                        :
                           ( tt_min < t_far ? t_far : tt_min )

FIXED : Issue : ray trace of box shows slab intersects extending behind the box

Was due to intersect validity not handling axis aligned photons

  • checked the non-boolean box, thats working fine with no artifacts.
  • Using discaxial torch type to shoot photons from 26 positions and directions, so can feel the geometry in a numerical manner.
  • when on target, things look correct, the same as the non-boolen box when off target the invalid intersects manifest
local discaxial_hit=0,0,0
local discaxial_miss=0,0,300
local torch_config_discaxial=(
             type=discaxial
             photons=$photons
             frame=-1
             transform=$identity
             source=$discaxial_hit
             target=0,0,0
             time=0.1
             radius=110
             distance=200
             zenithazimuth=0,1,0,1
             material=Vacuum
             wavelength=$wavelength
           )

Axis aligned photon directions appear to be part of the problem at least:

421       else if( ts.type == T_DISCAXIAL )
422       {
423           unsigned long long photon_id = launch_index.x ;
424
425           //float3 dir = get_direction_26( photon_id % 26 );
426           //float3 dir = get_direction_6( photon_id % 6 );
427           //float3 dir = get_direction_6( photon_id % 4, -0.00001f );  // 1st 4: +X,-X,+Y,-Y   SPURIOUS INTERSECTS GONE
428           //float3 dir = get_direction_6( photon_id % 4, -0.f );       // 1st 4: +X,-X,+Y,-Y   SPURIOUS INTERSECTS GONE
429           float3 dir = get_direction_6( photon_id % 4, 0.f );          // 1st 4: +X,-X,+Y,-Y   SPURIOUS INTERSECTS BACK AGAIN
430
431           float r = radius*sqrtf(u1) ; // sqrt avoids pole bunchung
432           float3 discPosition = make_float3( r*cosPhi, r*sinPhi, 0.f );
433           rotateUz(discPosition, dir);
434
435           // ts.x0 should be placed inside the target when hits are desired
436           // wih DISCAXIAL mode
437           p.position = ts.x0 + distance*dir + discPosition ;
438           p.direction = -dir ;
439

Curious the direction zeros are all negative 0 resulting in -inf for both -X and +X directions:

ray.origin 200.000000 -11.247929 307.520966 ray.direction -1.000000 -0.000000 -0.000000 idir -1.000000 -inf -inf
ray.origin 200.000000 44.386002 262.619629 ray.direction -1.000000 -0.000000 -0.000000 idir -1.000000 -inf -inf
ray.origin 200.000000 -88.033470 321.681213 ray.direction -1.000000 -0.000000 -0.000000 idir -1.000000 -inf -inf
ray.origin 200.000000 -39.863480 244.735748 ray.direction -1.000000 -0.000000 -0.000000 idir -1.000000 -inf -inf
ray.origin -200.000000 97.620598 274.010651 ray.direction 1.000000 -0.000000 -0.000000 idir 1.000000 -inf -inf
ray.origin 200.000000 8.609403 199.297638 ray.direction -1.000000 -0.000000 -0.000000 idir -1.000000 -inf -inf
ray.origin -200.000000 -67.498100 266.557739 ray.direction 1.000000 -0.000000 -0.000000 idir 1.000000 -inf -inf
ray.origin -200.000000 78.251770 366.333496 ray.direction 1.000000 -0.000000 -0.000000 idir 1.000000 -inf -inf
ray.origin -200.000000 47.188507 215.060699 ray.direction 1.000000 -0.000000 -0.000000 idir 1.000000 -inf -inf

Using a delta 0.00001f get -1/delta and spurious interects remain:

ray.origin 200.000778 9.482430 213.216736 ray.direction -1.000000 -0.000010 -0.000010 idir -1.000000 -100000.000000 -100000.000000
ray.origin -199.999054 48.094410 346.568787 ray.direction 1.000000 -0.000010 -0.000010 idir 1.000000 -100000.000000 -100000.000000

Bizarrely switching to delta -0.00001f get 1/delta and the spurious intersects are gone:

ray.origin 199.999344 -88.035469 321.679199 ray.direction -1.000000 0.000010 0.000010 idir -1.000000 100000.000000 100000.000000
ray.origin 199.999222 9.478431 213.212708 ray.direction -1.000000 0.000010 0.000010 idir -1.000000 100000.000000 100000.000000
ray.origin 200.000000 49.761848 249.952194 ray.direction -1.000000 0.000010 0.000010 idir -1.000000 100000.000000 100000.000000
ray.origin 200.000748 39.745564 334.747955 ray.direction -1.000000 0.000010 0.000010 idir -1.000000 100000.000000 100000.000000
ray.origin -199.999298 -8.694067 238.793365 ray.direction 1.000000 0.000010 0.000010 idir 1.000000 100000.000000 100000.000000
ray.origin 199.999878 -76.475029 363.946503 ray.direction -1.000000 0.000010 0.000010 idir -1.000000 100000.000000 100000.000000
ray.origin 200.000290 44.076099 285.449768 ray.direction -1.000000 0.000010 0.000010 idir -1.000000 100000.000000 100000.000000

Same when using -0.f:

425           //float3 dir = get_direction_26( photon_id % 26 );
426           //float3 dir = get_direction_6( photon_id % 6 );
427           //float3 dir = get_direction_6( photon_id % 4, -0.00001f );     // 1st 4: +X,-X,+Y,-Y
428           float3 dir = get_direction_6( photon_id % 4, -0.f );     // 1st 4: +X,-X,+Y,-Y
429
430           float r = radius*sqrtf(u1) ; // sqrt avoids pole bunchung
431           float3 discPosition = make_float3( r*cosPhi, r*sinPhi, 0.f );
432           rotateUz(discPosition, dir);
433
434           // ts.x0 should be placed inside the target when hits are desired
435           // wih DISCAXIAL mode
436           p.position = ts.x0 + distance*dir + discPosition ;
437           p.direction = -dir ;
ray.origin 200.000000 14.684715 244.904205 ray.direction -1.000000 0.000000 0.000000 idir -1.000000 inf inf
ray.origin 200.000000 -68.328766 251.635269 ray.direction -1.000000 0.000000 0.000000 idir -1.000000 inf inf
ray.origin -200.000000 102.468193 335.907471 ray.direction 1.000000 0.000000 0.000000 idir 1.000000 inf inf
ray.origin 200.000000 -26.478765 307.570923 ray.direction -1.000000 0.000000 0.000000 idir -1.000000 inf inf
ray.origin 200.000000 -15.085106 304.063721 ray.direction -1.000000 0.000000 0.000000 idir -1.000000 inf inf
42    float3 idir = make_float3(1.f)/ray.direction ;
43    float3 t0 = (bmin - ray.origin)*idir;
44    float3 t1 = (bmax - ray.origin)*idir;
idir -1.000000 -inf -inf t0 300.000000 inf inf t1 100.000000 -inf inf
idir -1.000000 -inf -inf t0 300.000000 inf inf t1 100.000000 -inf inf
idir -1.000000 -inf -inf t0 300.000000 inf inf t1 100.000000 -inf inf
idir -1.000000 -inf -inf t0 300.000000 inf inf t1 100.000000 -inf inf
idir 1.000000  -inf -inf t0 100.000000 inf inf t1 300.000000 -inf inf
idir 1.000000  -inf -inf t0 100.000000 inf inf t1 300.000000 -inf inf
idir 1.000000  -inf -inf t0 100.000000 inf inf t1 300.000000 -inf inf
idir 1.000000  -inf -inf t0 100.000000 inf inf t1 300.000000 -inf inf

CUDA fminf/fmaxf/max infinity/nan handling ?

simon:include blyth$ grep fminf *.*
device_functions.h:__DEVICE_FUNCTIONS_STATIC_DECL__ float fminf(float x, float y);
device_functions.hpp:__DEVICE_FUNCTIONS_STATIC_DECL__ float fminf(float x, float y)
device_functions.hpp:  return __nv_fminf(x, y);
device_functions_decls.h:__DEVICE_FUNCTIONS_DECLS__ float __nv_fminf(float x, float y);
math_functions.h:extern __host__ __device__ __device_builtin__ float                  fminf(float x, float y) __THROW;
math_functions.h:extern __host__ __device__ __device_builtin__ _CRTIMP float  __cdecl fminf(float x, float y);
math_functions.h:__func__(float fminf(float a, float b));
math_functions.hpp:  return fminf(a, b);
math_functions.hpp:  return fminf(a, b);
math_functions.hpp:__func__(float fminf(float a, float b))
nppi_color_conversion.h: *  This code uses the fmaxf() and fminf() 32 bit floating point math functions.
nppi_color_conversion.h: *  Npp32f nMin = fminf(nNormalizedR, nNormalizedG);
nppi_color_conversion.h: *         nMin = fminf(nMin, nNormalizedB);
nppi_color_conversion.h: *  This code uses the fmaxf() and fminf() 32 bit floating point math functions.
nppi_color_conversion.h: *  Npp32f nTemp = fminf(nNormalizedR, nNormalizedG);
nppi_color_conversion.h: *         nTemp = fminf(nTemp, nNormalizedB);
simon:include blyth$
simon:include blyth$
simon:include blyth$ pwd
/Developer/NVIDIA/CUDA-7.0/include

FIXED Issue : boolean intersection “lens” : boundary disappears from inside

FIXED by starting tmin from propagate_epsilon, as during propagation photons start on boundaries

Using boolean sphere-sphere intersection to construct a lens.:

72 tboolean-testconfig()
73 {
74     local material=GlassSchottF2
75     #local material=MainH2OHale
76
77     local test_config=(
78                  mode=BoxInBox
79                  analytic=1
80
81                  shape=box      parameters=0,0,0,1200               boundary=Rock//perfectAbsorbSurface/Vacuum
82
83                  shape=intersection parameters=0,0,0,400            boundary=Vacuum///$material
84                  shape=sphere       parameters=0,0,-600,641.2          boundary=Vacuum///$material
85                  shape=sphere       parameters=0,0,600,641.2           boundary=Vacuum///$material
86
87                )
91      echo "$(join _ ${test_config[@]})"
92 }

Observe that photons reflecting inside the lens off the 2nd boundary do not intersect with the 1st boundary on their way back yielding “TO BT BR SA”

Similarly, and more directly, also have “TO BT SA” not seeing the 2nd boundary.

Initially thought the raytrace confirmed this as it looked OK from outside but when go inside the boundary disappears, but that turns out to be just near clipping.

tboolean-;tboolean--

FIXED Issue : lens not bending light

Fixed by passing the boundary index via the instanceIdentity attribute from intersection to closest hit progs.

approach

ggeo/GPmt.hh ggeo/GCSG.hh

Brings python prepared CSG tree for DYB PMT into GPmt member

Looks like GCSG is currently being translated into into partBuffer/solidBuffer representation prior to GPU ?

hemi-pmt.cu:

/// flag needed in solidBuffer
///
///   0:primitive
///   1:boolean-intersect
///   2:boolean-union
///   3:boolean-difference
///
/// presumably the numParts will be 2 for booleans
/// thence can do the sub-intersects and boolean logic
///
/// ...
/// need to elide the sub-solids from OptiX just passing booleans
/// in as a single solidBuffer entry with numParts = 2 ?
///
/// maybe change name solidBuffer->primBuffer
/// as booleans handled as OptiX primitives composed of two parts
///

1243 RT_PROGRAM void intersect(int primIdx)
1244 {
1245   const uint4& solid    = solidBuffer[primIdx];
1246   unsigned int numParts = solid.y ;
....
1252   uint4 identity = identityBuffer[instance_index] ;
1254
1255   for(unsigned int p=0 ; p < numParts ; p++)
1256   {
1257       unsigned int partIdx = solid.x + p ;
1258
1259       quad q0, q1, q2, q3 ;
1260
1261       q0.f = partBuffer[4*partIdx+0];
1262       q1.f = partBuffer[4*partIdx+1];
1263       q2.f = partBuffer[4*partIdx+2] ;
1264       q3.f = partBuffer[4*partIdx+3];
1265
1266       identity.z = q1.u.z ;  // boundary from partBuffer (see ggeo-/GPmt)
1267
1268       int partType = q2.i.w ;
1269
1270       // TODO: use enum
////     this is the NPart.hpp enum
////
1271       switch(partType)
1272       {
1273           case 0:
1274                 intersect_aabb(q2, q3, identity);
1275                 break ;
1276           case 1:
1277                 intersect_zsphere<false>(q0,q1,q2,q3,identity);
1278                 break ;