CUDA Context Cleanup
====================

Observe that GPU memory usage is high and stays high for minutes even when no
applications are actively using the GPU. Is this due to bad actors not cleaning
up their CUDA contexts?

* does host thread death automatically clean up any CUDA contexts that were opened?

Running with OSX in normal GUI mode
-------------------------------------

Immediately after restart with Finder, Safari and Terminal::

    delta:~ blyth$ cuda_info.sh
    timestamp                Mon Dec 1 13:25:26 2014
    tag                      default
    name                     GeForce GT 750M
    compute capability       (3, 0)
    memory total             2.1G
    memory used              447.8M
    memory free              1.7G
    delta:~ blyth$

After ~20 min in Safari, Terminal and Sys Prefs, a whole gig of GPU memory is gone::

    delta:~ blyth$ cuda_info.sh
    timestamp                Mon Dec 1 13:42:33 2014
    tag                      default
    name                     GeForce GT 750M
    compute capability       (3, 0)
    memory total             2.1G
    memory used              1.4G
    memory free              738.7M
    delta:~ blyth$

After sleeping during lunch, 1G of VRAM frees up::

    delta:env blyth$ cuda_info.sh
    timestamp                Mon Dec 1 14:55:09 2014
    tag                      default
    name                     GeForce GT 750M
    compute capability       (3, 0)
    memory total             2.1G
    memory used              440.8M
    memory free              1.7G
    delta:env blyth$

Pragmatic Solution
------------------

* run in ">console" mode when you need maximum GPU memory
* for GUI usage, restart the machine often to ensure there will be enough GPU memory
* TODO: add a configurable minimum GPU memory to g4daeview.py; it will try to run with
  mapped/pinned host memory, but performance is a factor 3-4 lower than when there is
  sufficient memory to go GPU resident

OSX VRAM
-----------

* https://forums.adobe.com/thread/1326404

  * some Adobe raycaster running into VRAM pressure

* http://www.anandtech.com/show/2804
* http://arstechnica.com/apple/2005/04/macosx-10-4/13/

  * about OSX VRAM usage, the number of open windows matters

* retina screen support is presumably eating lots of VRAM

Running without GUI using ">console" login
---------------------------------------------

GPU memory is almost entirely free::

    delta:~ blyth$ cuda_info.sh
    timestamp                Mon Nov 24 12:57:17 2014
    tag                      default
    name                     GeForce GT 750M
    compute capability       (3, 0)
    memory total             2.1G
    memory used              96.0M
    memory free              2.1G
    delta:~ blyth$

While running g4daechroma.sh::

    delta:~ blyth$ cuda_info.sh
    timestamp                Mon Nov 24 13:01:46 2014
    tag                      default
    name                     GeForce GT 750M
    compute capability       (3, 0)
    memory total             2.1G
    memory used              111.2M
    memory free              2.0G

Huh, the memory usage seems variable; sometimes 220M used is reported.
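The used/free figures in these reports presumably come from the driver-level memory
query, which also counts the allocation of the context created to ask the question.
A minimal sketch of obtaining a cuda_info-style report from Python with pycuda
(illustrative only, not necessarily how cuda_info.sh is actually implemented)::

    # sketch of a cuda_info-style report via pycuda (assumed approach, not the real cuda_info.sh)
    import datetime
    import pycuda.driver as cuda

    def fmt(nbytes):
        # format byte counts roughly the way the reports above do
        return "%.1fG" % (nbytes/1e9) if nbytes >= 1e9 else "%.1fM" % (nbytes/1e6)

    cuda.init()
    dev = cuda.Device(0)
    ctx = dev.make_context()                  # creating the context itself uses some GPU memory
    try:
        free, total = cuda.mem_get_info()     # driver-level figures, the same source as above
        print("timestamp            %s" % datetime.datetime.now().ctime())
        print("name                 %s" % dev.name())
        print("compute capability   %s" % (dev.compute_capability(),))
        print("memory total         %s" % fmt(total))
        print("memory used          %s" % fmt(total - free))
        print("memory free          %s" % fmt(free))
    finally:
        ctx.pop()                             # pop and drop the context so its memory is released
        ctx.detach()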
After one mocknuwa run::

    delta:~ blyth$ cuda_info.sh
    timestamp                Mon Nov 24 13:04:28 2014
    tag                      default
    name                     GeForce GT 750M
    compute capability       (3, 0)
    memory total             2.1G
    memory used              277.2M
    memory free              1.9G
    delta:~ blyth$

After a 2nd mocknuwa run, usage is not increasing::

    delta:~ blyth$ cuda_info.sh
    timestamp                Mon Nov 24 13:06:13 2014
    tag                      default
    name                     GeForce GT 750M
    compute capability       (3, 0)
    memory total             2.1G
    memory used              277.2M
    memory free              1.9G
    delta:~ blyth$

A ctrl-C interrupt of g4daechroma.py cleans up OK::

    delta:~ blyth$ cuda_info.sh
    timestamp                Mon Nov 24 13:07:50 2014
    tag                      default
    name                     GeForce GT 750M
    compute capability       (3, 0)
    memory total             2.1G
    memory used              96.0M
    memory free              2.1G
    delta:~ blyth$

Repeating::

    delta:~ blyth$ cuda_info.sh
    timestamp                Mon Nov 24 13:09:06 2014
    tag                      default
    name                     GeForce GT 750M
    compute capability       (3, 0)
    memory total             2.1G
    memory used              111.2M
    memory free              2.0G

Memory reporting from inside the process doesn't match the above::

    (chroma_env)delta:MockNuWa blyth$ python mocknuwa.py
    a_min_free_gpu_mem   :   300.00M   300000000
    b_node_array_usage   :    54.91M    54909600
    b_node_itemsize      :    16.00M          16
    b_split_index        :     3.43M     3431850
    b_n_extra            :     1.00M           1
    b_n_nodes            :     3.43M     3431850
    b_splitting          :     0.00M           0
    c_triangle_nbytes    :    28.83M    28829184
    c_triangle_gpu       :     1.00M           1
    d_vertices_nbytes    :    14.60M    14597424
    d_triangle_gpu       :     1.00M           1
    a_gpu_used           :    99.57M    99573760
    b_gpu_used           :   129.72M   129720320
    c_gpu_used           :   184.64M   184639488
    d_gpu_used           :   213.48M   213475328
    e_gpu_used           :   228.16M   228155392
    (chroma_env)delta:MockNuWa blyth$

Huh, GPUGeometry init only happens when the first evt arrives::

    2014-11-24 13:22:58,720 INFO    env.geant4.geometry.collada.g4daeview.daedirectpropagator:53 DAEDirectPropagator ctrl {u'reset_rng_states': 1, u'nthreads_per_block': 64, u'seed': 0, u'max_blocks': 1024, u'max_steps': 30, u'COLUMNS': u'max_blocks:i,max_steps:i,nthreads_per_block:i,reset_rng_states:i,seed:i'}
    2014-11-24 13:22:58,720 WARNING env.geant4.geometry.collada.g4daeview.daedirectpropagator:63 reset_rng_states
    2014-11-24 13:22:58,720 INFO    env.geant4.geometry.collada.g4daeview.daechromacontext:182 _set_rng_states
    2014-11-24 13:22:58,851 INFO    chroma.gpu.geometry:19 GPUGeometry.__init__ min_free_gpu_mem 300000000.0
    2014-11-24 13:22:59,073 INFO    chroma.gpu.geometry:206 Optimization: Sufficient memory to move triangles onto GPU
    2014-11-24 13:22:59,085 INFO    chroma.gpu.geometry:220 Optimization: Sufficient memory to move vertices onto GPU
    2014-11-24 13:22:59,085 INFO    chroma.gpu.geometry:248 device usage:
    ----------
    nodes            3.4M    54.9M
    total                    54.9M
    ----------
    device total              2.1G
    device used             228.2M
    device free               1.9G
    2014-11-24 13:22:59,089 INFO    env.geant4.geometry.collada.g4daeview.daechromacontext:177 _get_rng_states
    2014-11-24 13:22:59,090 INFO    env.geant4.geometry.collada.g4daeview.daechromacontext:132 setup_rng_states using seed 0
    2014-11-24 13:22:59,512 INFO    chroma.gpu.photon_hit:204 nwork 4165 step 0 max_steps 30 nsteps 30
    2014-11-24 13:23:00,157 INFO    chroma.gpu.photon_hit:242 step 0 propagate_hit_kernel times [0.6453909912109375]
    2014-11-24 13:23:00,319 INFO    env.geant4.geometry.collada.g4daeview.daedirectpropagator:86 daedirectpropagator:propagate returning photons_end.as_npl()

Timings are not stable, even when running in console mode with no memory or other
GPU user contention.
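One way to see whether the instability is in the kernels themselves or on the host
side of the propagation is to time with CUDA events, which measure elapsed device
time only. A minimal sketch with pycuda; the trivial kernel here is just a stand-in,
not the chroma propagate_hit kernel::

    # sketch of device-side kernel timing with CUDA events (pycuda); the kernel is a stand-in
    import numpy as np
    import pycuda.autoinit                      # creates a context and registers cleanup at exit
    import pycuda.driver as cuda
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void scale(float *a, float f, int n)
    {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if (i < n) a[i] *= f;
    }
    """)
    scale = mod.get_function("scale")

    n = 1 << 20
    a = gpuarray.to_gpu(np.ones(n, dtype=np.float32))

    start, stop = cuda.Event(), cuda.Event()
    start.record()
    scale(a.gpudata, np.float32(2.0), np.int32(n), block=(64, 1, 1), grid=((n + 63) // 64, 1))
    stop.record()
    stop.synchronize()
    print("kernel time %.3f ms" % start.time_till(stop))   # elapsed device time, independent of host jitter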
Stuck Python Process
----------------------

Killing an old stuck process frees some ~200M of GPU memory, but that still leaves
the question of how 1.7G is being used when the only visible apps are Finder and
Terminal.

::

    (chroma_env)delta:MockNuWa blyth$ cuda_info.py
    timestamp                Mon Nov 24 12:40:09 2014
    tag                      default
    name                     GeForce GT 750M
    compute capability       (3, 0)
    memory total             2.1G
    memory used              1.9G
    memory free              232.1M
    (chroma_env)delta:MockNuWa blyth$ ps aux | grep python
    blyth  69938  1.2  0.2  35266100  31340 s000  S+   3Nov14  126:41.78 python /Users/blyth/env/bin/daedirectpropagator.py mock001
    blyth   2313  0.0  0.0   2423368    284 s007  R+  12:40PM    0:00.00 grep python
    (chroma_env)delta:MockNuWa blyth$ kill -9 69938
    (chroma_env)delta:MockNuWa blyth$ ps aux | grep python
    blyth   2315  0.0  0.0   2423368    240 s007  R+  12:40PM    0:00.00 grep python
    (chroma_env)delta:MockNuWa blyth$ cuda_info.py
    timestamp                Mon Nov 24 12:40:47 2014
    tag                      default
    name                     GeForce GT 750M
    compute capability       (3, 0)
    memory total             2.1G
    memory used              1.7G
    memory free              400.0M

Search
-------

* :google:`CUDA Context Cleanup`
* https://devblogs.nvidia.com/parallelforall/pro-tip-clean-up-after-yourself-ensure-correct-profiling/

  If your application uses the CUDA Runtime API, call cudaDeviceReset() just before
  exiting, or when the application finishes making CUDA calls and using device data.

  If your application uses the CUDA Driver API, call cuProfilerStop() on each context
  to flush the profiling buffers before destroying the context with cuCtxDestroy().
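For the pycuda-based code here (chroma, g4daeview.py) that driver API advice
translates to popping and detaching the context before the interpreter exits.
A minimal sketch, assuming pycuda; the _cleanup handler is illustrative and does
the kind of cleanup pycuda.autoinit registers. It cannot help after a kill -9,
where reclaiming the memory is left to the driver::

    # sketch of explicit CUDA context cleanup at exit with pycuda (illustrative)
    import atexit
    import pycuda.driver as cuda

    cuda.init()
    context = cuda.Device(0).make_context()

    def _cleanup():
        # pop the context off this thread's stack, clear pycuda's per-context
        # caches, then drop our reference so the driver can destroy the context
        context.pop()
        from pycuda.tools import clear_context_caches
        clear_context_caches()
        context.detach()

    atexit.register(_cleanup)

    # ... allocate GPU arrays, compile kernels, propagate photons, etc. ...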