80 posts in 'Programming/openCL & CUDA'

2011.01.02 CUDA Toolkit 3.2
2011.01.02 deviceQuery on 8600GT 512MB + CUDA hardware architecture
2010.12.07 CUDA on Linux
2010.12.05 Errors when compiling the CUDA examples
2010.12.05 CUDA / Visual Studio 2008
2010.12.01 CUDA + Visual Studio 2005
2010.11.14 nvcc for Windows: restrictions?
2010.11.11 PTX - Parallel Thread Execution
2010.11.06 Running ATI Stream / OpenCL on an Nvidia card!
2010.11.04 ATI STREAM - OpenCL documents

CUDA Toolkit 3.2

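When I first installed the toolkit, 3.2 was still an RC (Release Candidate), so I installed 3.1 instead.
The final 3.2 release is now out (the download page was last updated on December 22, 2010).

In short, most of the changes are performance improvements, and H.264 encode/decode libraries have been added.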

Last Updated: 12 / 22 / 2010

CUDA Toolkit 3.2 (November 2010)

Release Highlights

New and Improved CUDA Libraries

CUBLAS performance improved 50% to 300% on Fermi architecture GPUs, for matrix multiplication of all datatypes and transpose variations
CUFFT performance tuned for radix-3, -5, and -7 transform sizes on Fermi architecture GPUs, now 2x to 10x faster than MKL
New CUSPARSE library of GPU-accelerated sparse matrix routines for sparse/sparse and dense/sparse operations delivers 5x to 30x faster performance than MKL
New CURAND library of GPU-accelerated random number generation (RNG) routines, supporting Sobol quasi-random and XORWOW pseudo-random routines at 10x to 20x faster than similar routines in MKL
H.264 encode/decode libraries now included in the CUDA Toolkit

[Link: http://developer.nvidia.com/object/cuda_3_2_downloads.html]

Posted by 구차니

deviceQuery on 8600GT 512MB + CUDA hardware architecture

The deviceQuery sample, as its name says, queries the device and reports its specifications,
such as how many threads it supports.

Reading its output alongside NVIDIA_CUDA_C_ProgrammingGuide.pdf gives the following picture.

The GeForce 8800 GT (8800GT below) has 14 multiprocessors and 112 cores.
A grid contains blocks, and a block contains threads.

By simple arithmetic, each multiprocessor has 8 cores,
so 14 multiprocessors give 14 x 8 = 112 cores in total.
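The core count above is just the multiprocessor count multiplied by the cores per multiprocessor. Here is a minimal sketch of that arithmetic in host-side C++. The function names are made up for illustration, and the per-multiprocessor values (8 for compute capability 1.x, 32 for 2.0, 48 for 2.1) are taken as assumptions from the programming guide of that era, not read from the device.

```cpp
#include <cassert>

// Cores per multiprocessor for a given compute capability.
// Covers 1.x, 2.0 and 2.1 only; returns 0 for anything this sketch does not know.
int coresPerMultiprocessor(int major, int minor) {
    if (major == 1) return 8;
    if (major == 2 && minor == 0) return 32;
    if (major == 2 && minor == 1) return 48;
    return 0;
}

// Total core count = number of multiprocessors x cores per multiprocessor.
int totalCores(int multiprocessors, int major, int minor) {
    assert(multiprocessors >= 0);
    return multiprocessors * coresPerMultiprocessor(major, minor);
}
```

For the 8800 GT in this post, totalCores(14, 1, 1) reproduces the 112 cores that deviceQuery reports.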

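A grid is a one- or two-dimensional array of blocks,
and a block is a one-, two-, or three-dimensional array of threads.
A thread is the smallest unit of work.

Threads are scheduled in groups of 32 (the warp size),
and a single block can hold at most 512 threads.

The maximum block dimensions are 512x512x64 (three-dimensional), subject to the 512-threads-per-block limit,
and the maximum grid dimensions are 65535x65535x1 (two-dimensional).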
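These limits can be checked before a launch. The sketch below hard-codes the values deviceQuery prints for this card and tests a block or grid shape against them. Dim3 is a hypothetical stand-in for CUDA's dim3 so that the code builds without the CUDA headers, and the helper names are invented for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3.
struct Dim3 {
    unsigned x, y, z;
};

// Limits reported by deviceQuery for the card in this post (compute capability 1.1).
const unsigned kWarpSize = 32;
const unsigned kMaxThreadsPerBlock = 512;
const Dim3 kMaxBlockDim = {512, 512, 64};
const Dim3 kMaxGridDim = {65535, 65535, 1};

// A block is valid when each dimension is within its own limit
// and the total thread count does not exceed 512.
bool blockFits(Dim3 b) {
    if (b.x == 0 || b.y == 0 || b.z == 0) return false;
    if (b.x > kMaxBlockDim.x || b.y > kMaxBlockDim.y || b.z > kMaxBlockDim.z) return false;
    return b.x * b.y * b.z <= kMaxThreadsPerBlock;
}

// A grid is valid when each dimension is within its limit (z must be 1 on this device).
bool gridFits(Dim3 g) {
    if (g.x == 0 || g.y == 0 || g.z == 0) return false;
    return g.x <= kMaxGridDim.x && g.y <= kMaxGridDim.y && g.z <= kMaxGridDim.z;
}

// Number of warps needed for one block, rounding a partial warp up.
unsigned warpsPerBlock(Dim3 b) {
    return (b.x * b.y * b.z + kWarpSize - 1) / kWarpSize;
}
```

A 16x16 block passes (256 threads, 8 warps), while a 32x32 block is rejected: each dimension is within 512, but the 1024 threads exceed the per-block limit.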

At least, that is how I understand it.

A multithreaded program is partitioned into blocks of threads that execute independently from each
other, so that a GPU with more cores will automatically execute the program in less time than a GPU
with fewer cores.
=> In other words, a multithreaded program is split into blocks of threads that run independently of one another, so a GPU with more cores finishes the program sooner than a GPU with fewer cores.

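The host is the ordinary CPU environment and the device is the GPU environment.
Compilation is split in two: nvcc handles only the device code and hands everything else to an ordinary host compiler.

Below is a kernel launch. The launched function runs as device code,
so MatAdd must be declared with the __global__ qualifier.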

MatAdd<<<numBlocks, threadsPerBlock>>>(A, B, C);

The arguments inside <<< >>> must be of type int or dim3.

The following version declares them as dim3:

dim3 threadsPerBlock(16, 16);
dim3 numBlocks(N / threadsPerBlock.x, N / threadsPerBlock.y);
MatAdd<<<numBlocks, threadsPerBlock>>>(A, B, C);

This example uses blocks of 16x16 threads and a grid of (N/16) x (N/16) blocks, which assumes N is a multiple of 16.
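When N is not a multiple of 16, the integer division N / 16 drops the last partial block. A common fix is to round the block count up and guard against out-of-range indices inside the kernel. The sketch below emulates that on the CPU so it can run without a GPU: it visits every (block, thread) pair and applies the usual blockIdx * blockDim + threadIdx index formula. The function names and the flat row-major layout are assumptions made for this illustration, not the programming guide's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of blocks needed to cover n elements with blockSize threads per block (rounds up).
unsigned blocksFor(unsigned n, unsigned blockSize) {
    assert(blockSize > 0);
    return (n + blockSize - 1) / blockSize;
}

// CPU emulation of a 2D MatAdd launch over n x n matrices stored row-major in flat vectors.
std::vector<float> matAddEmulated(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  unsigned n, unsigned blockSize = 16) {
    assert(a.size() == static_cast<std::size_t>(n) * n && b.size() == a.size());
    std::vector<float> c(a.size(), 0.0f);
    const unsigned blocks = blocksFor(n, blockSize);
    for (unsigned by = 0; by < blocks; ++by)
        for (unsigned bx = 0; bx < blocks; ++bx)
            for (unsigned ty = 0; ty < blockSize; ++ty)
                for (unsigned tx = 0; tx < blockSize; ++tx) {
                    unsigned row = by * blockSize + ty;  // blockIdx.y * blockDim.y + threadIdx.y
                    unsigned col = bx * blockSize + tx;  // blockIdx.x * blockDim.x + threadIdx.x
                    if (row < n && col < n) {            // guard for the partial last block
                        std::size_t i = static_cast<std::size_t>(row) * n + col;
                        c[i] = a[i] + b[i];
                    }
                }
    return c;
}
```

With n = 33 and 16-thread blocks, blocksFor returns 3 where plain division would give 2, and the bounds check keeps the surplus threads of the last block from writing past the matrix.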

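A dim3 can be constructed from a single int, with the unspecified dimensions defaulting to 1, which is why a plain int is also accepted for the one-dimensional case.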

B.15 Execution Configuration
Any call to a __global__ function must specify the execution configuration for that call. The execution configuration defines the dimension of the grid and blocks that will be used to execute the function on the device, as well as the associated stream (see Section 3.3.10.1 for a description of streams).
When using the driver API, the execution configuration is specified through a series of driver function calls as detailed in Section 3.3.3.
When using the runtime API (Section 3.2), the execution configuration is specified by inserting an expression of the form <<< Dg, Db, Ns, S >>> between the function name and the parenthesized argument list, where:
- Dg is of type dim3 (see Section B.3.2) and specifies the dimension and size of the grid, such that Dg.x * Dg.y equals the number of blocks being launched; Dg.z must be equal to 1;
- Db is of type dim3 (see Section B.3.2) and specifies the dimension and size of each block, such that Db.x * Db.y * Db.z equals the number of threads per block;
- Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array as mentioned in Section B.2.3; Ns is an optional argument which defaults to 0;
- S is of type cudaStream_t and specifies the associated stream; S is an optional argument which defaults to 0.
As an example, a function declared as
__global__ void Func(float* parameter);
must be called like this:
Func<<< Dg, Db, Ns >>>(parameter);
The arguments to the execution configuration are evaluated before the actual function arguments and like the function arguments, are currently passed via shared memory to the device. The function call will fail if Dg or Db are greater than the maximum sizes allowed for the device as specified in Appendix G, or if Ns is greater than the maximum
amount of shared memory available on the device, minus the amount of shared memory required for static allocation, functions arguments (for devices of compute capability 1.x), and execution configuration.
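The last rule in the quoted section (Ns must fit in what remains of shared memory) can be written as a small check. This is only a sketch of the arithmetic: the function name is invented, and the byte counts for the static allocation, kernel arguments, and execution configuration are inputs the caller has to supply, since the quoted text does not give them.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when a request for dynamicBytes (the Ns argument) fits in the shared
// memory left after the static allocation, the kernel arguments (compute capability 1.x),
// and the execution configuration have been subtracted from the per-block total.
bool dynamicSharedFits(std::size_t totalBytes, std::size_t staticBytes,
                       std::size_t argumentBytes, std::size_t configBytes,
                       std::size_t dynamicBytes) {
    const std::size_t reserved = staticBytes + argumentBytes + configBytes;
    if (reserved > totalBytes) return false;  // nothing left for dynamic allocation
    return dynamicBytes <= totalBytes - reserved;
}
```

With the 16384 bytes per block that deviceQuery reports for this card, a kernel that statically uses 1024 bytes and (hypothetically) 16 bytes each for arguments and configuration could request at most 15328 bytes through Ns.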

All of this was supposed to help me analyze the sample output below,
but I feel like I am only getting more lost -_-

CUDA deviceQuery

D:\CUDA\NVIDIA GPU Computing SDK\C\bin\win32\Release\deviceQuery.exe Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: "GeForce 8800 GT"
CUDA Driver Version:                           3.20
CUDA Runtime Version:                          3.10
CUDA Capability Major revision number:         1
CUDA Capability Minor revision number:         1
Total amount of global memory:                 536543232 bytes
Number of multiprocessors:                     14
Number of cores:                               112
Total amount of constant memory:               65536 bytes
Total amount of shared memory per block:       16384 bytes
Total number of registers available per block: 8192
Warp size:                                     32
Maximum number of threads per block:           512
Maximum sizes of each dimension of a block:    512 x 512 x 64
Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
Maximum memory pitch:                          2147483647 bytes
Texture alignment:                             256 bytes
Clock rate:                                    1.50 GHz
Concurrent copy and execution:                 Yes
Run time limit on kernels:                     Yes
Integrated:                                    No
Support host page-locked memory mapping:       Yes
Compute mode:                                  Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution:                   No
Device has ECC support enabled:                No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.10, NumDevs = 1, Device = GeForce 8800 GT

PASSED

Press <Enter> to Quit...
-----------------------------------------------------------

OpenCL DeviceQuery

D:\CUDA\NVIDIA GPU Computing SDK\OpenCL\bin\Win32\Release>oclDeviceQuery.exe
oclDeviceQuery.exe Starting...

OpenCL SW Info:

CL_PLATFORM_NAME:      NVIDIA CUDA
CL_PLATFORM_VERSION:   OpenCL 1.0 CUDA 3.2.1
OpenCL SDK Revision:   6161726

OpenCL Device Info:

1 devices found supporting OpenCL:

---------------------------------
Device GeForce 8800 GT
---------------------------------
CL_DEVICE_NAME:                       GeForce 8800 GT
CL_DEVICE_VENDOR:                     NVIDIA Corporation
CL_DRIVER_VERSION:                    260.99
CL_DEVICE_TYPE:                       CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS:          14
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:   3
CL_DEVICE_MAX_WORK_ITEM_SIZES:        512 / 512 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE:        512
CL_DEVICE_MAX_CLOCK_FREQUENCY:        1500 MHz
CL_DEVICE_ADDRESS_BITS:               32
CL_DEVICE_MAX_MEM_ALLOC_SIZE:         128 MByte
CL_DEVICE_GLOBAL_MEM_SIZE:            511 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT:   no
CL_DEVICE_LOCAL_MEM_TYPE:             local
CL_DEVICE_LOCAL_MEM_SIZE:             16 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:   64 KByte
CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT:              1
CL_DEVICE_MAX_READ_IMAGE_ARGS:        128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS:       8
CL_DEVICE_SINGLE_FP_CONFIG:           INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

CL_DEVICE_IMAGE <dim>                 2D_MAX_WIDTH     4096
                                        2D_MAX_HEIGHT    32768
                                        3D_MAX_WIDTH     2048
                                        3D_MAX_HEIGHT    2048
                                        3D_MAX_DEPTH     2048

CL_DEVICE_EXTENSIONS:                 cl_khr_byte_addressable_store
                                        cl_khr_icd
                                        cl_khr_gl_sharing
                                        cl_nv_d3d9_sharing
                                        cl_nv_compiler_options
                                        cl_nv_device_attribute_query
                                        cl_nv_pragma_unroll
                                        cl_khr_global_int32_base_atomics
                                        cl_khr_global_int32_extended_atomics

CL_DEVICE_COMPUTE_CAPABILITY_NV:      1.1
NUMBER OF MULTIPROCESSORS:            14
NUMBER OF CUDA CORES:                 112
CL_DEVICE_REGISTERS_PER_BLOCK_NV:     8192
CL_DEVICE_WARP_SIZE_NV:               32
CL_DEVICE_GPU_OVERLAP_NV:             CL_TRUE
CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:     CL_TRUE
CL_DEVICE_INTEGRATED_MEMORY_NV:       CL_FALSE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 0

---------------------------------
2D Image Formats Supported (71)
---------------------------------
#     Channel Order   Channel Type

1     CL_R            CL_FLOAT
2     CL_R            CL_HALF_FLOAT
3     CL_R            CL_UNORM_INT8
4     CL_R            CL_UNORM_INT16
5     CL_R            CL_SNORM_INT16
6     CL_R            CL_SIGNED_INT8
7     CL_R            CL_SIGNED_INT16
8     CL_R            CL_SIGNED_INT32
9     CL_R            CL_UNSIGNED_INT8
10    CL_R            CL_UNSIGNED_INT16
11    CL_R            CL_UNSIGNED_INT32
12    CL_A            CL_FLOAT
13    CL_A            CL_HALF_FLOAT
14    CL_A            CL_UNORM_INT8
15    CL_A            CL_UNORM_INT16
16    CL_A            CL_SNORM_INT16
17    CL_A            CL_SIGNED_INT8
18    CL_A            CL_SIGNED_INT16
19    CL_A            CL_SIGNED_INT32
20    CL_A            CL_UNSIGNED_INT8
21    CL_A            CL_UNSIGNED_INT16
22    CL_A            CL_UNSIGNED_INT32
23    CL_RG           CL_FLOAT
24    CL_RG           CL_HALF_FLOAT
25    CL_RG           CL_UNORM_INT8
26    CL_RG           CL_UNORM_INT16
27    CL_RG           CL_SNORM_INT16
28    CL_RG           CL_SIGNED_INT8
29    CL_RG           CL_SIGNED_INT16
30    CL_RG           CL_SIGNED_INT32
31    CL_RG           CL_UNSIGNED_INT8
32    CL_RG           CL_UNSIGNED_INT16
33    CL_RG           CL_UNSIGNED_INT32
34    CL_RA           CL_FLOAT
35    CL_RA           CL_HALF_FLOAT
36    CL_RA           CL_UNORM_INT8
37    CL_RA           CL_UNORM_INT16
38    CL_RA           CL_SNORM_INT16
39    CL_RA           CL_SIGNED_INT8
40    CL_RA           CL_SIGNED_INT16
41    CL_RA           CL_SIGNED_INT32
42    CL_RA           CL_UNSIGNED_INT8
43    CL_RA           CL_UNSIGNED_INT16
44    CL_RA           CL_UNSIGNED_INT32
45    CL_RGBA         CL_FLOAT
46    CL_RGBA         CL_HALF_FLOAT
47    CL_RGBA         CL_UNORM_INT8
48    CL_RGBA         CL_UNORM_INT16
49    CL_RGBA         CL_SNORM_INT16
50    CL_RGBA         CL_SIGNED_INT8
51    CL_RGBA         CL_SIGNED_INT16
52    CL_RGBA         CL_SIGNED_INT32
53    CL_RGBA         CL_UNSIGNED_INT8
54    CL_RGBA         CL_UNSIGNED_INT16
55    CL_RGBA         CL_UNSIGNED_INT32
56    CL_BGRA         CL_UNORM_INT8
57    CL_BGRA         CL_SIGNED_INT8
58    CL_BGRA         CL_UNSIGNED_INT8
59    CL_ARGB         CL_UNORM_INT8
60    CL_ARGB         CL_SIGNED_INT8
61    CL_ARGB         CL_UNSIGNED_INT8
62    CL_INTENSITY    CL_FLOAT
63    CL_INTENSITY    CL_HALF_FLOAT
64    CL_INTENSITY    CL_UNORM_INT8
65    CL_INTENSITY    CL_UNORM_INT16
66    CL_INTENSITY    CL_SNORM_INT16
67    CL_LUMINANCE    CL_FLOAT
68    CL_LUMINANCE    CL_HALF_FLOAT
69    CL_LUMINANCE    CL_UNORM_INT8
70    CL_LUMINANCE    CL_UNORM_INT16
71    CL_LUMINANCE    CL_SNORM_INT16

---------------------------------
3D Image Formats Supported (71)
---------------------------------
#     Channel Order   Channel Type

1     CL_R            CL_FLOAT
2     CL_R            CL_HALF_FLOAT
3     CL_R            CL_UNORM_INT8
4     CL_R            CL_UNORM_INT16
5     CL_R            CL_SNORM_INT16
6     CL_R            CL_SIGNED_INT8
7     CL_R            CL_SIGNED_INT16
8     CL_R            CL_SIGNED_INT32
9     CL_R            CL_UNSIGNED_INT8
10    CL_R            CL_UNSIGNED_INT16
11    CL_R            CL_UNSIGNED_INT32
12    CL_A            CL_FLOAT
13    CL_A            CL_HALF_FLOAT
14    CL_A            CL_UNORM_INT8
15    CL_A            CL_UNORM_INT16
16    CL_A            CL_SNORM_INT16
17    CL_A            CL_SIGNED_INT8
18    CL_A            CL_SIGNED_INT16
19    CL_A            CL_SIGNED_INT32
20    CL_A            CL_UNSIGNED_INT8
21    CL_A            CL_UNSIGNED_INT16
22    CL_A            CL_UNSIGNED_INT32
23    CL_RG           CL_FLOAT
24    CL_RG           CL_HALF_FLOAT
25    CL_RG           CL_UNORM_INT8
26    CL_RG           CL_UNORM_INT16
27    CL_RG           CL_SNORM_INT16
28    CL_RG           CL_SIGNED_INT8
29    CL_RG           CL_SIGNED_INT16
30    CL_RG           CL_SIGNED_INT32
31    CL_RG           CL_UNSIGNED_INT8
32    CL_RG           CL_UNSIGNED_INT16
33    CL_RG           CL_UNSIGNED_INT32
34    CL_RA           CL_FLOAT
35    CL_RA           CL_HALF_FLOAT
36    CL_RA           CL_UNORM_INT8
37    CL_RA           CL_UNORM_INT16
38    CL_RA           CL_SNORM_INT16
39    CL_RA           CL_SIGNED_INT8
40    CL_RA           CL_SIGNED_INT16
41    CL_RA           CL_SIGNED_INT32
42    CL_RA           CL_UNSIGNED_INT8
43    CL_RA           CL_UNSIGNED_INT16
44    CL_RA           CL_UNSIGNED_INT32
45    CL_RGBA         CL_FLOAT
46    CL_RGBA         CL_HALF_FLOAT
47    CL_RGBA         CL_UNORM_INT8
48    CL_RGBA         CL_UNORM_INT16
49    CL_RGBA         CL_SNORM_INT16
50    CL_RGBA         CL_SIGNED_INT8
51    CL_RGBA         CL_SIGNED_INT16
52    CL_RGBA         CL_SIGNED_INT32
53    CL_RGBA         CL_UNSIGNED_INT8
54    CL_RGBA         CL_UNSIGNED_INT16
55    CL_RGBA         CL_UNSIGNED_INT32
56    CL_BGRA         CL_UNORM_INT8
57    CL_BGRA         CL_SIGNED_INT8
58    CL_BGRA         CL_UNSIGNED_INT8
59    CL_ARGB         CL_UNORM_INT8
60    CL_ARGB         CL_SIGNED_INT8
61    CL_ARGB         CL_UNSIGNED_INT8
62    CL_INTENSITY    CL_FLOAT
63    CL_INTENSITY    CL_HALF_FLOAT
64    CL_INTENSITY    CL_UNORM_INT8
65    CL_INTENSITY    CL_UNORM_INT16
66    CL_INTENSITY    CL_SNORM_INT16
67    CL_LUMINANCE    CL_FLOAT
68    CL_LUMINANCE    CL_HALF_FLOAT
69    CL_LUMINANCE    CL_UNORM_INT8
70    CL_LUMINANCE    CL_UNORM_INT16
71    CL_LUMINANCE    CL_SNORM_INT16

oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.0 CUDA 3.2.1, SDK Revision = 6161726, NumDevs = 1, Device = GeForce 8800 GT

System Info:

Local Time/Date = 20:52:54, 1/2/2011
CPU Arch: 0
CPU Level: 15
# of CPU processors: 2
Windows Build: 2600
Windows Ver: 5.1

PASSED

Press <Enter> to Quit...
-----------------------------------------------------------

CUDA on Linux

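Phew, that took some effort to compile -_-

Getting the samples to build takes a rough series of steps -_-
The explicit ldconfig call can probably be skipped if you install libglut3,
since the package installation runs ldconfig as a trigger.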

$ sudo vi /etc/ld.so.conf.d/libcuda.conf
/usr/local/cuda/lib

$ sudo ldconfig
$ sudo apt-get install libglut3

$ sudo ln -s /usr/lib/libglut.so.3 /usr/lib/libglut.so
$ sudo ln -s /usr/lib/libGLU.so.1 /usr/lib/libGLU.so
$ sudo ln -s /usr/lib/libX11.so.6 /usr/lib/libX11.so
$ sudo ln -s /usr/lib/libXi.so.6 /usr/lib/libXi.so
$ sudo ln -s /usr/lib/libXmu.so.6 /usr/lib/libXmu.so

Most of the problems came down to missing unversioned symbolic links for the shared libraries (.so).

2010/12/05 - [Programming/CUDA / openCL] - Errors when compiling the CUDA examples
2010/11/02 - [Programming/CUDA / openCL] - CUDA sample run results + SLI

Errors when compiling the CUDA examples

After installing CUDA on Linux, compiling the examples fails with the following error.

~/NVIDIA_GPU_Computing_SDK/C/src/deviceQuery$ make

/usr/bin/ld: cannot find -lcutil_i386

collect2: ld returned 1 exit status

make: *** [../../bin/linux/release/deviceQuery] Error 1

I first suspected a wrong path setting, but the passage below reminded me of advice I had seen about using an older gcc for compatibility -_-

NVIDIA Cuda

Before running the Makefile, you will need to install gcc 4.3 and g++ 4.3. This is because the NVIDIA Cuda SDK 3.0 has not yet worked with gcc 4.0 and g++ 4.0. There should be no issue compiling cuda files with gcc 4.3 and g++ 4.3 on newer NVIDIA Cuda SDK versions. For a successful compilation, please follow these steps:

...

3) Create a directory and create symlinks to gcc-4.3/g++-4.3

$ mkdir mygcc
$ cd mygcc
$ ln -s $(which g++-4.3) g++
$ ln -s $(which gcc-4.3) gcc

[Link: http://boinc.berkeley.edu/trac/wiki/GPUApp]

$ gcc --version
gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ g++ --version
g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ whereis g++
g++: /usr/bin/g++ /usr/share/man/man1/g++.1.gz

$ whereis gcc
gcc: /usr/bin/gcc /usr/lib/gcc /usr/share/man/man1/gcc.1.gz

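So the compilers are there. Why doesn't it work..

For now, forcing the issue,
I ran make in ~/NVIDIA_GPU_Computing_SDK/C instead,
and it compiled a fair amount before stopping with a complaint about GLU OTL

libcudart.so.3: cannot open shared object file: No such file or directory

When this error appeared,
simply setting LD_LIBRARY_PATH and running sudo ldconfig did not fix it.
I had to create libcuda.conf under /etc/ld.so.conf.d/, put the CUDA install path
/usr/local/cuda/lib
in it, and then run sudo ldconfig before the library was found.

Added 2010.12.07
Found libcutil_i386.a in the path below! (Not that it means much. Presumably it is built by the top-level make in ~/NVIDIA_GPU_Computing_SDK/C, which would explain why building deviceQuery on its own could not find it.)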

~/NVIDIA_GPU_Computing_SDK/C/lib$ ll
total 224
drwxr-xr-x 2 minimonk minimonk 4096 2010-12-05 23:58 ./
drwxr-xr-x 9 minimonk minimonk 4096 2010-12-05 23:58 ../
-rw-r--r-- 1 minimonk minimonk 142978 2010-12-05 23:58 libcutil_i386.a
-rw-r--r-- 1 minimonk minimonk 30512 2010-12-05 23:58 libparamgl_i386.a
-rw-r--r-- 1 minimonk minimonk 43034 2010-12-05 23:58 librendercheckgl_i386.a

~/NVIDIA_GPU_Computing_SDK/C/src/marchingCubes$ make
/usr/bin/ld: cannot find -lGLU
collect2: ld returned 1 exit status
make: *** [../../bin/linux/release/marchingCubes] Error 1

$ sudo find / -name "*GLU*"
/usr/lib/libGLU.so.1
/usr/lib/libGLU.so.1.3.070701

$ sudo ln -s /usr/lib/libGLU.so.1 /usr/lib/libGLU.so

~/NVIDIA_GPU_Computing_SDK/C/src/marchingCubes$ make
/usr/bin/ld: cannot find -lX11
collect2: ld returned 1 exit status
make[1]: *** [../../bin/linux/release/marchingCubes] Error 1

$ sudo ln -s /usr/lib/libX11.so.6 /usr/lib/libX11.so
$ sudo ln -s /usr/lib/libXi.so.6 /usr/lib/libXi.so
$ sudo ln -s /usr/lib/libXmu.so.6 /usr/lib/libXmu.so

/usr/bin/ld: cannot find -lglut
collect2: ld returned 1 exit status
make[1]: *** [../../bin/linux/release/marchingCubes] Error 1

$ sudo apt-get install libglut3
$ sudo ln -s /usr/lib/libglut.so.3 /usr/lib/libglut.so

Is Ubuntu the enemy here, or is it someone else?

CUDA / Visual Studio 2008

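I installed Visual Studio 2008
and built the CUDA examples on Windows, but could not find the executable!!!!

So I went digging through the property pages.. having only ever used VC6, I had no idea where the setting was tucked away.
Anyway, it is listed as Output File under "Configuration Properties - Linker".
(Damn, should I switch to the English version? I'm not used to the Korean UI ㅠ.ㅠ)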

CUDA + Visual Studio 2005

The only shady copy of Visual Studio I had at home was 2003, so I installed that first.

However, the build fails with an error and nothing runs.

Looking it up on Wikipedia:

Version history

Prior Visual Studio Version 4.0 there were Visual Basic 3, Visual C++, Visual FoxPro and Source Safe as separate products.

Product name	Internal version	.NET Framework version	Release date
Visual Studio	4.0	N/A	Spring 1995
Visual Studio 97	5.0	N/A	1997
Visual Studio 6.0	6.0	N/A	1998-06
Visual Studio .NET (2002)	7.0	1.0	2002-02-13
Visual Studio .NET 2003	7.1	1.1	2003-04-24
Visual Studio 2005	8.0	2.0	2005-11-07
Visual Studio 2008	9.0	3.5	2007-11-19
Visual Studio 2010	10.0	4.0	2010-04-12

[Link: http://en.wikipedia.org/wiki/Microsoft_Visual_Studio]

Visual Studio 2003 has internal version 7.1,

and Visual Studio 2005 has internal version 8.0.

In short, after burning a lot of precious time...

this was the day I learned that CUDA needs Visual Studio 2005 or later -_-

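nvcc for Windows: restrictions?

NVCC stands for NV (Nvidia) CC (C Compiler), and structurally it compiles as follows.
Host code is handed off to an ordinary C compiler (for example the Visual Studio command-line compiler or gcc),
while nvcc itself generates the device code (PTX for the CUDA device).

In other words, nvcc cannot compile a program on its own.
That is why it leans on Visual Studio on Windows and on gcc on Linux.

The "Purpose of nvcc" section says that all non-CUDA compilation steps are forwarded to a general-purpose C compiler,
and that on Windows this compiler is an instance of the Microsoft Visual Studio compiler, cl.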

Purpose of nvcc

This compilation trajectory involves several splitting, compilation, preprocessing,
and merging steps for each CUDA source file, and several of these steps are subtly
different for different modes of CUDA compilation (such as compilation for device
emulation, or the generation of device code repositories). It is the purpose of the
CUDA compiler driver nvcc to hide the intricate details of CUDA compilation from
developers. Additionally, instead of being a specific CUDA compilation driver,
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of
conventional compiler options, such as for defining macros and include/library
paths, and for steering the compilation process. All non-CUDA compilation steps
are forwarded to a general purpose C compiler that is supported by nvcc, and on
Windows platforms, where this compiler is an instance of the Microsoft Visual Studio
compiler, nvcc will translate its options into appropriate ‘cl’ command syntax. This
extended behavior plus ‘cl’ option translation is intended for support of portable
application build and make scripts across Linux and Windows platforms.

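As for my own machine..
Visual Studio 6.0 is installed, and because of my personal aversion to .NET, none of the 2002 to 2008 versions are.

Anyway, the host compiler for the Windows platform is listed as
"Microsoft Visual Studio compiler, cl".
I don't know whether support starts with VS2002, but cl here is cl.exe,
the command-line compiler that ships with Visual Studio.
I wondered whether it meant openCL, but it does not -_-

Also, the supported build environments include Windows + MinGW shell.
That is the shell, not gcc -_- so on Windows there is no way around installing Visual Studio as the compiler.

I went through pages 13

and 14 of the nvcc document

and tried adjusting the options, but it does not work without VS (and VS 6.0 does not work either).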

[Link: http://moss.csc.ncsu.edu/~mueller/cluster/nvidia/2.0/nvcc_2.0.pdf]

PTX - Parallel Thread Execution

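PTX comes up again and again in the CUDA documentation,
but the annoying part is that the term is explained in a completely separate document (PTXISA.pdf) -_-

After some hurried searching,
it turns out to be a kind of assembly language for CUDA devices,
and it uses virtual (pseudo) registers
to map onto the large number of physical registers (presumably because they exist per thread).

It is both a file extension and a language..

As a rough analogy:
just as *.c is compiled into *.s and that is then assembled into *.o,
*.cu is compiled so that the host code comes out as *.s and the CUDA code as *.ptx, or something like that?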

[Link: http://en.wikipedia.org/wiki/Parallel_Thread_Execution]

Running ATI Stream / OpenCL on an Nvidia card!

A refreshingly blunt error

C:\> ConstantBandwidth.exe
Error: clCreateContextFromType failed. Error code : CL_DEVICE_NOT_FOUND

The SDK seems to recognize only the ATI side and cannot make proper use of the nVidia GPU.

Or perhaps it is because the sample programs, although they use OpenCL, expect something different from nVidia's OpenCL?
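One possible explanation, offered as a guess and not confirmed from the sample source, is that the sample looks for a GPU only on the ATI Stream platform. On a machine where the only GPU belongs to the NVIDIA platform, that search comes back empty, which is what CL_DEVICE_NOT_FOUND means. The sketch below models the idea with plain C++ structs. Platform, DeviceType, and both pick functions are hypothetical stand-ins and none of this is the real OpenCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum DeviceType { kCpu, kGpu };

// Hypothetical stand-in for an OpenCL platform and the device types it exposes.
struct Platform {
    std::string vendor;
    std::vector<DeviceType> devices;
};

bool hasDevice(const Platform& p, DeviceType wanted) {
    for (std::size_t i = 0; i < p.devices.size(); ++i)
        if (p.devices[i] == wanted) return true;
    return false;
}

// Strategy A: consider only one vendor's platform, as a vendor-specific sample might.
// Returns the platform index, or -1 (the "device not found" case).
int pickFromVendor(const std::vector<Platform>& platforms,
                   const std::string& vendor, DeviceType wanted) {
    for (std::size_t i = 0; i < platforms.size(); ++i)
        if (platforms[i].vendor == vendor && hasDevice(platforms[i], wanted))
            return static_cast<int>(i);
    return -1;
}

// Strategy B: accept the first platform from any vendor that has the wanted device type.
int pickFromAny(const std::vector<Platform>& platforms, DeviceType wanted) {
    for (std::size_t i = 0; i < platforms.size(); ++i)
        if (hasDevice(platforms[i], wanted)) return static_cast<int>(i);
    return -1;
}
```

With an NVIDIA platform that exposes a GPU and an AMD platform that exposes only a CPU, strategy A asked for an AMD GPU returns -1, while strategy B finds the NVIDIA GPU at index 0.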

Number of platforms:                1
Platform Profile:                FULL_PROFILE
Platform Version:                OpenCL 1.1 ATI-Stream-v2.2 (302)
Platform Name:                    ATI Stream
Platform Vendor:                Advanced Micro Devices, Inc.
Platform Extensions:            cl_khr_icd cl_amd_event_callback cl_khr_d3d10_sharing

Platform Name:                    ATI Stream
Number of devices:                2
Device Type:                    CL_DEVICE_TYPE_CPU
Device ID:                    4098
Max compute units:                4
Max work items dimensions:            3
    Max work items[0]:                1024
    Max work items[1]:                1024
    Max work items[2]:                1024
Max work group size:                1024
Preferred vector width char:            16
Preferred vector width short:            8
Preferred vector width int:            4
Preferred vector width long:            2
Preferred vector width float:            4
Preferred vector width double:        0
Max clock frequency:                2393Mhz
Address bits:                    32
Max memory allocation:            536870912
Image support:                No
Max size of kernel argument:            4096
Alignment (bits) of base address:        1024
Minimum alignment (bytes) for any datatype:    128
Single precision floating point capability
    Denorms:                    Yes
    Quiet NaNs:                    Yes
    Round to nearest even:            Yes
    Round to zero:                Yes
    Round to +ve and infinity:            Yes
    IEEE754-2008 fused multiply-add:        No
Cache type:                    Read/Write
Cache line size:                64
Cache size:                    32768
Global memory size:                1073741824
Constant buffer size:                65536
Max number of constant args:            8
Local memory type:                Global
Local memory size:                32768
Profiling timer resolution:            427
Device endianess:                Little
Available:                    Yes
Compiler available:                Yes
Execution capabilities:
    Execute OpenCL kernels:            Yes
    Execute native function:            Yes
Queue properties:
    Out-of-Order:                No
    Profiling :                    Yes
Platform ID:                    00C3D40C
Name:                        Intel(R) Core(TM) i5 CPU       M 450 @ 2.40GHz
Vendor:                    GenuineIntel
Driver version:                2.0
Profile:                    FULL_PROFILE
Version:                    OpenCL 1.1 ATI-Stream-v2.2 (302)
Extensions:                    cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_printf cl_khr_d3d10_sharing
Device Type:                    CL_DEVICE_TYPE_GPU
Device ID:                    4098
Max compute units:                2
Max work items dimensions:            3
    Max work items[0]:                128
    Max work items[1]:                128
    Max work items[2]:                128
Max work group size:                128
Preferred vector width char:            16
Preferred vector width short:            8
Preferred vector width int:            4
Preferred vector width long:            2
Preferred vector width float:            4
Preferred vector width double:        0
Max clock frequency:                720Mhz
Address bits:                    32
Max memory allocation:            134217728
Image support:                No
Max size of kernel argument:            1024
Alignment (bits) of base address:        32768
Minimum alignment (bytes) for any datatype:    128
Single precision floating point capability
    Denorms:                    No
    Quiet NaNs:                    Yes
    Round to nearest even:            Yes
    Round to zero:                Yes
    Round to +ve and infinity:            Yes
    IEEE754-2008 fused multiply-add:        Yes
Cache type:                    None
Cache line size:                0
Cache size:                    0
Global memory size:                268435456
Constant buffer size:                65536
Max number of constant args:            8
Local memory type:                Global
Local memory size:                16384
Profiling timer resolution:            1
Device endianess:                Little
Available:                    Yes
Compiler available:                Yes
Execution capabilities:
    Execute OpenCL kernels:            Yes
    Execute native function:            No
Queue properties:
    Out-of-Order:                No
    Profiling :                    Yes
Platform ID:                    00C3D40C
Name:                        ATI RV710
Vendor:                    Advanced Micro Devices, Inc.
Driver version:                CAL 1.4.838
Profile:                    FULL_PROFILE
Version:                    OpenCL 1.0 ATI-Stream-v2.2 (302)
Extensions:                    cl_khr_icd cl_khr_gl_sharing cl_amd_device_attribute_query cl_khr_d3d10_sharing

Passed!

Number of platforms:                2
Platform Profile:                FULL_PROFILE
Platform Version:                OpenCL 1.0 CUDA 3.2.1
Platform Name:                    NVIDIA CUDA
Platform Vendor:                NVIDIA Corporation
Platform Extensions:            cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
Platform Profile:                FULL_PROFILE
Platform Version:                OpenCL 1.1 ATI-Stream-v2.2 (302)
Platform Name:                    ATI Stream
Platform Vendor:                Advanced Micro Devices, Inc.
Platform Extensions:            cl_khr_icd cl_amd_event_callback

Platform Name:                    NVIDIA CUDA
Number of devices:                2
Device Type:                    CL_DEVICE_TYPE_GPU
Device ID:                    4318
Max compute units:                4
Max work items dimensions:            3
    Max work items[0]:                512
    Max work items[1]:                512
    Max work items[2]:                64
Max work group size:                512
Preferred vector width char:            1
Preferred vector width short:            1
Preferred vector width int:            1
Preferred vector width long:            1
Preferred vector width float:            1
Preferred vector width double:        0
Max clock frequency:                1350Mhz
Address bits:                    5347096844566560
Max memory allocation:            134217728
Image support:                Yes
Max number of images read arguments:    128
Max number of images write arguments:    8
Max image 2D width:            4096
Max image 2D height:            32768
Max image 3D width:            2048
Max image 3D height:    2048
Max image 3D depth:            2048
Max samplers within kernel:        16
Max size of kernel argument:            4352
Alignment (bits) of base address:        2048
Minimum alignment (bytes) for any datatype:    128
Single precision floating point capability
    Denorms:                    No
    Quiet NaNs:                    Yes
    Round to nearest even:            Yes
    Round to zero:                Yes
    Round to +ve and infinity:            Yes
    IEEE754-2008 fused multiply-add:        Yes
Cache type:                    None
Cache line size:                0
Cache size:                    0
Global memory size:                268107776
Constant buffer size:                65536
Max number of constant args:            9
Local memory type:                Scratchpad
Local memory size:                16384
Profiling timer resolution:            1000
Device endianess:                Little
Available:                    Yes
Compiler available:                Yes
Execution capabilities:
    Execute OpenCL kernels:            Yes
    Execute native function:            No
Queue properties:
    Out-of-Order:                Yes
    Profiling :                    Yes
Platform ID:                    003E8750
Name:                        GeForce 8600 GT
Vendor:                    NVIDIA Corporation
Driver version:                260.99
Profile:                    FULL_PROFILE
Version:                    OpenCL 1.0 CUDA
Extensions:                    cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
Device Type:                    CL_DEVICE_TYPE_GPU
Device ID:                    4318
Max compute units:                4
Max work items dimensions:            3
    Max work items[0]:                512
    Max work items[1]:                512
    Max work items[2]:                64
Max work group size:                512
Preferred vector width char:            1
Preferred vector width short:            1
Preferred vector width int:            1
Preferred vector width long:            1
Preferred vector width float:            1
Preferred vector width double:        0
Max clock frequency:                1188Mhz
Address bits:                    5347096844566560
Max memory allocation:            134217728
Image support:                Yes
Max number of images read arguments:    128
Max number of images write arguments:    8
Max image 2D width:            4096
Max image 2D height:            32768
Max image 3D width:            2048
Max image 3D height:    2048
Max image 3D depth:            2048
Max samplers within kernel:        16
Max size of kernel argument:            4352
Alignment (bits) of base address:        2048
Minimum alignment (bytes) for any datatype:    128
Single precision floating point capability
    Denorms:                    No
    Quiet NaNs:                    Yes
    Round to nearest even:            Yes
    Round to zero:                Yes
    Round to +ve and infinity:            Yes
    IEEE754-2008 fused multiply-add:        Yes
Cache type:                    None
Cache line size:                0
Cache size:                    0
Global memory size:                268107776
Constant buffer size:                65536
Max number of constant args:            9
Local memory type:                Scratchpad
Local memory size:                16384
Profiling timer resolution:            1000
Device endianess:                Little
Available:                    Yes
Compiler available:                Yes
Execution capabilities:
    Execute OpenCL kernels:            Yes
    Execute native function:            No
Queue properties:
    Out-of-Order:                Yes
    Profiling :                    Yes
Platform ID:                    003E8750
Name:                        GeForce 8600 GT
Vendor:                    NVIDIA Corporation
Driver version:                260.99
Profile:                    FULL_PROFILE
Version:                    OpenCL 1.0 CUDA
Extensions:                    cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics

Error : Bytes mismatch!
Error : glSharing mismatch!
Error : images mismatch!
Error : printf mismatch!
Error : deviceAttributeQuery mismatch!
Failed!
Platform Name:                    ATI Stream
Number of devices:                1
Device Type:                    CL_DEVICE_TYPE_CPU
Device ID:                    4098
Max compute units:                2
Max work items dimensions:            3
    Max work items[0]:                1024
    Max work items[1]:                1024
    Max work items[2]:                1024
Max work group size:                1024
Preferred vector width char:            16
Preferred vector width short:            8
Preferred vector width int:            4
Preferred vector width long:            2
Preferred vector width float:            4
Preferred vector width double:        0
Max clock frequency:                2211Mhz
Address bits:                    32
Max memory allocation:            536870912
Image support:                No
Max size of kernel argument:            4096
Alignment (bits) of base address:        1024
Minimum alignment (bytes) for any datatype:    128
Single precision floating point capability
    Denorms:                    Yes
    Quiet NaNs:                    Yes
    Round to nearest even:            Yes
    Round to zero:                Yes
    Round to +ve and infinity:            Yes
    IEEE754-2008 fused multiply-add:        No
Cache type:                    Read/Write
Cache line size:                64
Cache size:                    65536
Global memory size:                1073741824
Constant buffer size:                65536
Max number of constant args:            8
Local memory type:                Global
Local memory size:                32768
Profiling timer resolution:            279
Device endianess:                Little
Available:                    Yes
Compiler available:                Yes
Execution capabilities:
    Execute OpenCL kernels:            Yes
    Execute native function:            Yes
Queue properties:
    Out-of-Order:                No
    Profiling :                    Yes
Platform ID:                    01DFD40C
Name:                        AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
Vendor:                    AuthenticAMD
Driver version:                2.0
Profile:                    FULL_PROFILE
Version:                    OpenCL 1.1 ATI-Stream-v2.2 (302)
Extensions:                    cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_printf

Error : Bytes mismatch!
Error : glSharing mismatch!
Error : images mismatch!
Error : printf mismatch!
Error : deviceAttributeQuery mismatch!
Failed!
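
The byte counts in the dump above are easier to compare once converted to MiB/KiB. Below is a minimal sketch, not part of the SDK: `parse_clinfo` and `to_mib` are hypothetical helper names, and the sample text is copied from the GeForce 8600 GT entry above. It assumes every field is a single `Key: value` line; when several devices are listed, later values would overwrite earlier ones, so feed it one device block at a time.

```python
# Hypothetical helpers for reading CLInfo-style "Key:   value" output.
# The sample lines are copied from the GeForce 8600 GT block above.
SAMPLE = """\
Name:                        GeForce 8600 GT
Max memory allocation:            134217728
Global memory size:                268107776
Local memory size:                16384
"""


def parse_clinfo(text):
    """Return a dict mapping each field name to its raw string value."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; section headers without a value are skipped.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def to_mib(byte_count):
    """Convert a byte count (string or int) to mebibytes."""
    return int(byte_count) / (1024 * 1024)


fields = parse_clinfo(SAMPLE)
global_mib = to_mib(fields["Global memory size"])
alloc_mib = to_mib(fields["Max memory allocation"])
local_kib = int(fields["Local memory size"]) / 1024

print(f'{fields["Name"]}: global {global_mib:.4f} MiB, '
      f'max alloc {alloc_mib:.0f} MiB, local {local_kib:.0f} KiB')
```

Read this way, the 8600 GT reports slightly under 256 MiB of global memory, with a single allocation limited to 128 MiB.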


Posted by 구차니

ATI STREAM - OpenCL documents

Only two documents (offline PDF files) are included by default when the SDK is installed:

ATI_Stream_SDK_FAQ.pdf

ATI_Stream_SDK_Getting_Started_Guide_v2.2.pdf

Everything else has to be downloaded from the web -_-

ATI Stream SDK Installation Notes (v2.2) [PDF 60.1KB]
ATI Stream SDK Developer Release Notes (v2.2) [PDF 64.3KB]
ATI Stream SDK Samples Release Notes (v2.2) [PDF 42.4KB]
ATI Stream SDK Frequently Asked Questions (FAQ) (v2.2) [PDF 42.7KB]
ATI Stream SDK Getting Started Guide (v2.2) [PDF 58.2KB]
ATI Stream SDK OpenCL™ Programming Guide (v1.05) [PDF 1.41MB]
OpenCL™ 1.1 Specification (revision 33) [PDF 3.34MB]
OpenCL™ 1.0 Specification (revision 48) [PDF 2.47MB]
ATI Compute Abstraction Layer (CAL) Programming Guide (v2.0) [PDF 1.12MB]
ATI_Intermediate_Language_(IL)_Specification (v2.0d) [PDF 1.93MB]
AMD Evergreen-Family Instruction Set Architecture (v1.0c) [PDF 2.41MB]
AMD R600/R700/Evergreen Assembly Language Format [PDF 83.1KB]

[Link : http://developer.amd.com/gpu/ATIStreamSDK/pages/Documentation.aspx]


Posted by 구차니

구차니의 잡동사니 모음
