CUDA Error Handling including timeouts ======================================== Handling / Reset ? ------------------- * http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api * http://stackoverflow.com/questions/19632401/how-to-work-around-gpu-watchdog-timer-limitation-on-cuda-code-in-os-x * http://stackoverflow.com/questions/9602312/gpu-card-resets-after-2-seconds Compute and Graphics --------------------- Using GPU for both, forces use of timeouts. * https://devtalk.nvidia.com/default/topic/483643/cuda-the-launch-timed-out-and-was-terminated/ * https://devtalk.nvidia.com/search/more/sitecommentsearch/Launch%20timeout/ CUDA Driver API Errors -------------------------- * http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYPES.html#group__CUDA__TYPES `CUDA_ERROR_LAUNCH_FAILED = 700` An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. The context cannot be used, so it must be destroyed (and a new one should be created). All existing device memory allocations from this context are invalid and must be reconstructed if the program is to continue using CUDA. `CUDA_ERROR_LAUNCH_TIMEOUT = 702` This indicates that the device kernel took too long to execute. This can only occur if timeouts are enabled - see the device attribute CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT for more information. The context cannot be used (and must be destroyed similar to CUDA_ERROR_LAUNCH_FAILED). All existing device memory allocations from this context are invalid and must be reconstructed if the program is to continue using CUDA. `CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT = 17` Specifies whether there is a run time limit on kernels deviceQuery -------------- :: delta:w blyth$ cuda-samples-bin-deviceQuery | grep limit Run time limit on kernels: Yes