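Both the CUDA Driver API and the Runtime API are, in principle, designed around one GPU at a time,
with the parallelism coming from the many threads that run inside that GPU. I am not sure "context" is exactly the right name for whatever manages this control,
but the default context is created on the first GPU (device 0),
and on the Runtime API side the function cudaSetDevice(int device) restricts a thread to a specific GPU.
The Driver API has no such function, however,
so the approach there appears to be to work with the handles directly and to use OpenMP, OS threads, or similar
to run several CPU threads that drive several GPUs at the same time.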
8.2 Multi-GPU Programming
In order to issue work to a GPU, a context is established between a CPU thread and the GPU. Only one context can be active on GPU at a time. Similarly, a CPU thread can have one active context at a time. A context is established during the program’s first call to a function that changes state (such as cudaMalloc(), etc.), so one can force the creation of a context by calling cudaFree(0). Note that a context is created on GPU 0 by default, unless another GPU is selected explicitly prior to context creation with a cudaSetDevice() call. Context is destroyed either with a cudaThreadExit() call, or when the controlling CPU thread exits.
CUDA driver API allows a single CPU thread to manage multiple contexts (and therefore multiple GPUs) by pushing/popping contexts. In the remainder of the document we will focus on CUDA runtime API, which currently allows strictly one context per CPU thread.
In order to issue work to p GPUs concurrently, a program needs p CPU threads, each with its own context. Threads can be lightweight (pthreads, OpenMP, etc.) or heavyweight (MPI). Note that any CPU multi-threading or message-passing API or library can be used, as CPU thread management is completely orthogonal to CUDA.
For example, one can add GPU processing to an existing MPI application by porting the compute-intensive portions of the code without changing the communication structure.
Even though a GPU can execute calls from one context at a time, it can belong to multiple contexts. For example, it is possible for several CPU threads to establish contexts with the same GPU. This allows developing multi-GPU applications on a single GPU. GPU driver manages GPU switching between the contexts, as well as partitioning memory among the contexts (GPU memory allocated in one context cannot be accessed from another context).
[Source: CUDA_C_Best_Practices_Guide.pdf / Chapter 8]
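The rule in the excerpt, that issuing work to p GPUs takes p CPU threads, can be sketched in plain C++ as follows. This is only a structural sketch under stated assumptions: gpuWorker and runOnAllDevices are hypothetical names, the GPU work is a placeholder, and the real CUDA calls (cudaSetDevice, cudaGetDeviceCount, cudaMalloc) appear only in comments so the file builds without the toolkit.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-GPU job. With the real Runtime API, the first statement
// would be cudaSetDevice(device), issued before any call that creates a context.
void gpuWorker(int device, std::vector<int>& results) {
    // Placeholder for cudaMalloc / kernel launch / cudaMemcpy on this device.
    results[device] = device * 100;
}

// Launch one CPU thread per device and wait for all of them.
// In a real program deviceCount would come from cudaGetDeviceCount().
std::vector<int> runOnAllDevices(int deviceCount) {
    assert(deviceCount > 0);
    std::vector<int> results(deviceCount, -1);
    std::vector<std::thread> threads;
    for (int device = 0; device < deviceCount; ++device)
        threads.emplace_back(gpuWorker, device, std::ref(results));
    for (std::thread& t : threads)
        t.join();  // per the excerpt, a context goes away when its CPU thread exits
    return results;
}
```

Each thread writes only to its own slot of results, so no locking is needed here; a real solver that shares state between threads would have to add its own synchronization.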
For an example in the CUDA Toolkit SDK, the threadMigration sample seems to be the one to look at.
/******************************************************************************
*
* Module: threadMigration.cpp
*
* Description:
* Simple sample demonstrating multi-GPU/multithread functionality using
* the CUDA Context Management API. This API allows the a CUDA context to be
* associated with a CPU process. CUDA Contexts have a one-to-one correspondence
* with host threads. A host thread may have only one device context current
* at a time.
*
* Refer to the CUDA programming guide 4.5.3.3 on Context Management
*
******************************************************************************/
The MonteCarloMultiGPU sample also appears to contain an example, this one built on cutil.
//Start CPU thread for each GPU
for(gpuIndex = 0; gpuIndex < GPU_N; gpuIndex++)
    threadID[gpuIndex] = cutStartThread((CUT_THREADROUTINE)solverThread, &optionSolver[gpuIndex]);
printf("main(): waiting for GPU results...\n");
cutWaitForThreads(threadID, GPU_N);
cutStartThread comes from multithreading.h.
#if _WIN32
//Create thread
CUTThread cutStartThread(CUT_THREADROUTINE func, void *data){
    return CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)func, data, 0, NULL);
}
#endif
As this shows, the helper is simply a thin wrapper over the native threads of the host OS.
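The excerpt covers only the _WIN32 branch. On a POSIX system the same kind of wrapper would presumably sit on top of pthread_create and pthread_join; a minimal sketch under that assumption follows. Every name prefixed with sketch, and the doubleValue demo routine, is hypothetical and not taken from the SDK.

```cpp
#include <cassert>
#include <pthread.h>

typedef pthread_t SketchThread;               // hypothetical, plays the role of CUTThread
typedef void* (*SketchThreadRoutine)(void*);  // hypothetical, plays the role of CUT_THREADROUTINE

// Create a thread that runs func(data); the POSIX counterpart of CreateThread.
SketchThread sketchStartThread(SketchThreadRoutine func, void* data) {
    pthread_t thread;
    int rc = pthread_create(&thread, NULL, func, data);
    assert(rc == 0);  // a real wrapper should report the error instead
    (void)rc;
    return thread;
}

// Block until every thread in the array has finished.
void sketchWaitForThreads(const SketchThread* threads, int num) {
    for (int i = 0; i < num; ++i)
        pthread_join(threads[i], NULL);
}

// Demo routine standing in for a per-GPU solver: doubles the int it is given.
void* doubleValue(void* data) {
    int* value = static_cast<int*>(data);
    *value *= 2;
    return NULL;
}
```

Usage mirrors the MonteCarloMultiGPU loop: call sketchStartThread once per GPU with a pointer to that GPU's work item, then pass the array of handles to sketchWaitForThreads.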