
Jetson Nano nvcc build

구차니 2026. 4. 6. 21:16

Hmm.. building it as a .cpp file strangely doesn't work

$ nvcc tt.cpp
tt.cpp: In function ‘void kernel_test(int*, int*, int*)’:
tt.cpp:14:12: error: ‘threadIdx’ was not declared in this scope
  int idx = threadIdx.x +blockIdx.x * blockDim.x + (gridDim.x * blockDim.x) * (blockIdx.y * blockDim.y + threadIdx.y);
            ^~~~~~~~~
tt.cpp:14:12: note: suggested alternative: ‘pthread_t’
  int idx = threadIdx.x +blockIdx.x * blockDim.x + (gridDim.x * blockDim.x) * (blockIdx.y * blockDim.y + threadIdx.y);
            ^~~~~~~~~
            pthread_t
tt.cpp:14:25: error: ‘blockIdx’ was not declared in this scope
  int idx = threadIdx.x +blockIdx.x * blockDim.x + (gridDim.x * blockDim.x) * (blockIdx.y * blockDim.y + threadIdx.y);
                         ^~~~~~~~
tt.cpp:14:25: note: suggested alternative: ‘clock’
  int idx = threadIdx.x +blockIdx.x * blockDim.x + (gridDim.x * blockDim.x) * (blockIdx.y * blockDim.y + threadIdx.y);
                         ^~~~~~~~
                         clock
tt.cpp:14:38: error: ‘blockDim’ was not declared in this scope
  int idx = threadIdx.x +blockIdx.x * blockDim.x + (gridDim.x * blockDim.x) * (blockIdx.y * blockDim.y + threadIdx.y);
                                      ^~~~~~~~
tt.cpp:14:38: note: suggested alternative: ‘clock’
  int idx = threadIdx.x +blockIdx.x * blockDim.x + (gridDim.x * blockDim.x) * (blockIdx.y * blockDim.y + threadIdx.y);
                                      ^~~~~~~~
                                      clock
tt.cpp:14:52: error: ‘gridDim’ was not declared in this scope
  int idx = threadIdx.x +blockIdx.x * blockDim.x + (gridDim.x * blockDim.x) * (blockIdx.y * blockDim.y + threadIdx.y);
                                                    ^~~~~~~
tt.cpp: At global scope:
tt.cpp:18:11: error: ‘::main’ must return ‘int’
 void main()
           ^
tt.cpp: In function ‘int main()’:
tt.cpp:61:15: error: expected primary-expression before ‘<’ token
  kernel_test<<<block,thread>>>(dev_a,dev_b,dev_c);
               ^
tt.cpp:61:30: error: expected primary-expression before ‘>’ token
  kernel_test<<<block,thread>>>(dev_a,dev_b,dev_c);
                              ^

 

Hmm.. so main has to return int under nvcc too (this is a C++ rule rather than something CUDA-specific)

$ nvcc tt.cu
tt.cu(18): warning: return type of function "main" must be "int"

tt.cu(18): warning: return type of function "main" must be "int"

tt.cu:18:11: error: ‘::main’ must return ‘int’
 void main()
           ^

[Link : https://mangkyu.tistory.com/84]
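Putting the two error logs together: nvcc picks the language from the file extension, so a .cpp file goes to the host compiler as plain C++, where threadIdx, blockIdx and the <<<...>>> launch syntax do not exist. Renaming to .cu (or passing -x cu) fixes that part, and main still has to return int. Below is a minimal sketch of a .cu file that should build. Only the kernel_test signature, the index expression and the launch line come from the logs; the file name, the kernel body (element-wise add) and the grid/block sizes are assumptions for illustration.

```cuda
// tt_sketch.cu (hypothetical file name). Build: nvcc tt_sketch.cu
#include <cstdio>
#include <cuda_runtime.h>

// Signature as seen in the error log; the body is an assumed element-wise add.
__global__ void kernel_test(int *a, int *b, int *c)
{
    // Same flattened 2D index expression as line 14 in the log.
    int idx = threadIdx.x + blockIdx.x * blockDim.x
            + (gridDim.x * blockDim.x) * (blockIdx.y * blockDim.y + threadIdx.y);
    c[idx] = a[idx] + b[idx];
}

int main()  // int, not void
{
    // Names follow the launch line in the log; the sizes are assumptions.
    const dim3 block(4, 4);    // blocks in the grid
    const dim3 thread(8, 8);   // threads per block
    const int n = 4 * 4 * 8 * 8;            // one element per thread
    const size_t bytes = n * sizeof(int);

    static int a[n], b[n], c[n];
    for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 2 * i; }

    int *dev_a, *dev_b, *dev_c;
    cudaMalloc(&dev_a, bytes);
    cudaMalloc(&dev_b, bytes);
    cudaMalloc(&dev_c, bytes);
    cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);

    kernel_test<<<block, thread>>>(dev_a, dev_b, dev_c);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Compare against the values the host expects.
    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (c[i] != 3 * i) ++bad;
    printf("mismatches: %d\n", bad);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return bad != 0;
}
```

If the file has to keep its .cpp name, nvcc -x cu tt.cpp should make nvcc treat it as CUDA source.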

 

Single core

$ ./a.out
cpu Time : 0.206937
gpu Time : 0.000106

 

Huh..? Why is it slower when running on multiple cores?!?!

$ nvcc  -Xcompiler -fopenmp tt.cu -o a.out.mp
jetson@nano-4gb-jp451:~$ ./a.out.mp
cpu Time : 0.231175
gpu Time : 0.000088

[Link : https://forums.developer.nvidia.com/t/how-use-openmp-in-cu-file/2918/10]
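The timing code of tt.cu is not shown here, so the following is only a guess at what might be going on. If the CPU time is taken with clock(), it reports processor time summed over all threads, so an OpenMP run can look the same or slower even when it finishes sooner on the wall clock. Likewise, kernel launches are asynchronous, so a GPU timer stopped right after the launch mostly measures launch overhead. A minimal sketch that prints both kinds of measurement side by side; the kernel, the array size and all variable names are assumptions, not the original code.

```cuda
// timing_sketch.cu (hypothetical). Build as above:
//   nvcc -Xcompiler -fopenmp timing_sketch.cu -o a.out.mp
#include <cstdio>
#include <ctime>
#include <vector>
#include <omp.h>
#include <cuda_runtime.h>

// Assumed stand-in kernel: element-wise add with a bounds check.
__global__ void add_kernel(const int *a, const int *b, int *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;   // assumed problem size
    const size_t bytes = n * sizeof(int);
    std::vector<int> a(n, 1), b(n, 2), c(n);

    // CPU: clock() is processor time (all threads summed),
    // omp_get_wtime() is elapsed wall-clock time.
    clock_t c0 = clock();
    double  w0 = omp_get_wtime();
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
    double  w1 = omp_get_wtime();
    clock_t c1 = clock();
    printf("cpu clock() : %f\n", (double)(c1 - c0) / CLOCKS_PER_SEC);
    printf("cpu wall    : %f\n", w1 - w0);

    int *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);

    // GPU: the launch returns immediately; the kernel has only
    // finished once cudaDeviceSynchronize() returns.
    double g0 = omp_get_wtime();
    add_kernel<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    double g1 = omp_get_wtime();
    cudaDeviceSynchronize();
    double g2 = omp_get_wtime();
    printf("gpu launch only : %f\n", g1 - g0);
    printf("gpu launch+sync : %f\n", g2 - g0);

    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return 0;
}
```

Even with wall-clock timing, a simple memory-bound loop like this may not gain much from extra cores, so the numbers are worth checking on the actual board before drawing conclusions.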

 

2014.01.17 - [Programming/openCL & CUDA] - cuda + openmp example