There really is a lot to prepare...
MPEG-2/VC-1 support
- Decode Acceleration for G8x, G9x (requires Compute 1.1 or higher)
- Full Bitstream Decode for MCP79, MCP89, G98, GT2xx, GF1xx
- MPEG-2 CUDA-accelerated decode on GPUs with 8 or more SMs (64 CUDA cores); a run-time check for this is sketched below. (Windows)
- Supports HD (1080i/p) playback including Blu-ray content
- R185+ (Windows), R260+ (Linux)

H.264/AVCHD support
- Baseline, Main, and High Profile, up to Level 4.1
- Full bitstream decoding in hardware including HD (1080i/p) Blu-ray content
- Supports B-frames, bitrates up to 45 Mbps
- Available on NVIDIA GPUs: G8x, G9x, MCP79, MCP89, G98, GT2xx, GF1xx
- R185+ (Windows), R260+ (Linux)

[Source: CUDA_VideoDecoder_Library.pdf]
Supported on all CUDA-enabled GPUs with 32 scalar processor cores or more. [Source: CUDA_VideoEncoder_Library.pdf]
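The SM/core-count requirements quoted above can be verified at run time with the CUDA runtime API. The sketch below is not part of either library; it only reads cudaGetDeviceProperties() and assumes 8 scalar cores per SM, which holds for the compute 1.x parts listed (Fermi-class GF1xx parts have more cores per SM).

// Sketch: check the decoder's "8+ SMs" and the encoder's "32+ cores" requirements.
// Assumes 8 scalar cores per SM (compute capability 1.x only).
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    int cores = prop.multiProcessorCount * 8;  // assumption: compute 1.x
    printf("%s: %d SMs, ~%d cores\n", prop.name, prop.multiProcessorCount, cores);
    printf("MPEG-2 CUDA decode (8+ SMs): %s\n", prop.multiProcessorCount >= 8 ? "yes" : "no");
    printf("CUDA encoder (32+ cores):    %s\n", cores >= 32 ? "yes" : "no");
    return 0;
}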
Device 0: "ION"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 268435456 bytes
Number of multiprocessors: 2
Number of cores: 16
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.10 GHz
Concurrent copy and execution: No
Run time limit on kernels: No
Integrated: Yes
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
[Link : http://forums.nvidia.com/index.php?showtopic=100288]
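The same information can be pulled from code. The listing below is a trimmed-down deviceQuery-style dump of my own, a sketch built on cudaGetDeviceProperties(), not the SDK's deviceQuery sample:

// Minimal deviceQuery-style dump (a sketch, not the SDK sample itself).
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, dev);
        printf("Device %d: \"%s\"\n", dev, p.name);
        printf("  Capability:              %d.%d\n", p.major, p.minor);
        printf("  Global memory:           %lu bytes\n", (unsigned long)p.totalGlobalMem);
        printf("  Multiprocessors:         %d\n", p.multiProcessorCount);
        printf("  Constant memory:         %lu bytes\n", (unsigned long)p.totalConstMem);
        printf("  Shared memory per block: %lu bytes\n", (unsigned long)p.sharedMemPerBlock);
        printf("  Registers per block:     %d\n", p.regsPerBlock);
        printf("  Warp size:               %d\n", p.warpSize);
        printf("  Max threads per block:   %d\n", p.maxThreadsPerBlock);
        printf("  Clock rate:              %.2f GHz\n", p.clockRate * 1e-6);  // clockRate is in kHz
    }
    return 0;
}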
if (((year % 400) == 0) || (((year % 4) == 0) && ((year % 100) != 0))) {
    // leap year
} else {
    // common year
}
The Gregorian leap-year rule is a correction of the Julian calendar, in which a leap year falls exactly every four years. The exact rule: years divisible by 4 are leap years, except that years divisible by 100 are not, unless they are also divisible by 400 (a self-contained version is sketched after the snippets below).
[Link : http://ko.wikipedia.org/wiki/윤년]
if (month == 2 && ((year % 4 == 0 && year % 100 != 0) || year % 400 == 0)) {
    maxDay = 29;
}
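Putting the two snippets together, a minimal self-contained version in C (the helper names is_leap_year and days_in_month are mine, not from the original post):

#include <stdio.h>

// Divisible by 4 and not by 100, or divisible by 400.
int is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

// Number of days in a month, accounting for leap-year February.
int days_in_month(int year, int month)
{
    static const int days[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    if (month == 2 && is_leap_year(year))
        return 29;
    return days[month - 1];
}

int main(void)
{
    printf("%d %d %d\n", is_leap_year(2000), is_leap_year(1900), is_leap_year(2012));
    printf("Feb 2012 has %d days\n", days_in_month(2012, 2));
    return 0;  // expected output: "1 0 1" and "Feb 2012 has 29 days"
}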
5.3.2 Device Memory Accesses (p. 70)
5.3.2.1 Global Memory (p. 70)
5.3.2.2 Local Memory (p. 72)
5.3.2.3 Shared Memory (p. 72)
5.3.2.4 Constant Memory (p. 73)
5.3.2.5 Texture and Surface Memory (p. 73)

[Source: CUDA C Programming guide.pdf]
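As a quick illustration of the memory spaces that section of the guide covers, the kernel below (a sketch of my own, not taken from the guide) touches global, shared, constant, and per-thread register/local storage; texture and surface memory are omitted because their setup APIs vary across CUDA versions. It assumes blocks of at most 256 threads.

// Illustrative only: one declaration per memory space from the outline above.
__constant__ float coeff[16];              // constant memory (read-only in kernels)

__global__ void memorySpaces(const float *in, float *out, int n)
{
    __shared__ float tile[256];            // shared memory, visible to the whole block
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;   // global memory -> shared memory
    __syncthreads();

    float scaled = tile[threadIdx.x] * coeff[threadIdx.x % 16];  // per-thread register/local
    if (i < n)
        out[i] = scaled;                   // back to global memory
}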
Device 0: "GeForce 8800 GT"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.10
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 536543232 bytes
Number of multiprocessors: 14
Number of cores: 112
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

2011/01/02 - [Programming/openCL / CUDA] - deviceQuery on 8600GT 512MB + CUDA 하드웨어 구조
Matrix multiplication without shared memory, from the CUDA C Programming Guide:

// Matrices are stored in row-major order:
// M(row, col) = *(M.elements + row * M.width + col)
typedef struct {
int width;
int height;
float* elements;
} Matrix;
// Thread block size
#define BLOCK_SIZE 16
// Forward declaration of the matrix multiplication kernel
__global__ void MatMulKernel(const Matrix, const Matrix, Matrix);
// Matrix multiplication - Host code
// Matrix dimensions are assumed to be multiples of BLOCK_SIZE
void MatMul(const Matrix A, const Matrix B, Matrix C)
{
// Load A and B to device memory
Matrix d_A;
d_A.width = A.width; d_A.height = A.height;
size_t size = A.width * A.height * sizeof(float);
cudaMalloc(&d_A.elements, size);
cudaMemcpy(d_A.elements, A.elements, size,
cudaMemcpyHostToDevice);
Matrix d_B;
d_B.width = B.width; d_B.height = B.height;
size = B.width * B.height * sizeof(float);
cudaMalloc(&d_B.elements, size);
cudaMemcpy(d_B.elements, B.elements, size,
cudaMemcpyHostToDevice);
// Allocate C in device memory
Matrix d_C;
d_C.width = C.width; d_C.height = C.height;
size = C.width * C.height * sizeof(float);
cudaMalloc(&d_C.elements, size);
// Invoke kernel
dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE);
dim3 dimGrid(B.width / dimBlock.x, A.height / dimBlock.y);
MatMulKernel<<<dimGrid, dimBlock>>>(d_A, d_B, d_C);
// Read C from device memory
cudaMemcpy(C.elements, d_C.elements, size,
cudaMemcpyDeviceToHost);
// Free device memory
cudaFree(d_A.elements);
cudaFree(d_B.elements);
cudaFree(d_C.elements);
}
// Matrix multiplication kernel called by MatMul()
__global__ void MatMulKernel(Matrix A, Matrix B, Matrix C)
{
// Each thread computes one element of C
// by accumulating results into Cvalue
float Cvalue = 0;
int row = blockIdx.y * blockDim.y + threadIdx.y;
int col = blockIdx.x * blockDim.x + threadIdx.x;
for (int e = 0; e < A.width; ++e)
Cvalue += A.elements[row * A.width + e]
* B.elements[e * B.width + col];
C.elements[row * C.width + col] = Cvalue;
}
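Neither listing includes the host-side setup. The driver below is a minimal sketch of my own (sizes and fill values are illustrative, not from the guide); it uses the stride-less Matrix struct of the first listing and keeps the dimensions a multiple of BLOCK_SIZE, as MatMul() requires.

// Hypothetical driver for the MatMul() above; A is the identity, so C should equal B.
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int N = 256;                     // multiple of BLOCK_SIZE (16)
    Matrix A = { N, N, (float*)malloc(N * N * sizeof(float)) };
    Matrix B = { N, N, (float*)malloc(N * N * sizeof(float)) };
    Matrix C = { N, N, (float*)malloc(N * N * sizeof(float)) };

    for (int i = 0; i < N * N; ++i) {
        A.elements[i] = (i / N == i % N) ? 1.0f : 0.0f;   // identity matrix
        B.elements[i] = (float)i;                         // simple ramp
    }

    MatMul(A, B, C);
    printf("C[0]=%f C[last]=%f\n", C.elements[0], C.elements[N * N - 1]);

    free(A.elements); free(B.elements); free(C.elements);
    return 0;
}

The second listing below is the guide's shared-memory (tiled) version of the same multiplication: each block stages BLOCK_SIZE x BLOCK_SIZE tiles of A and B in shared memory, so each element is fetched from global memory BLOCK_SIZE times fewer than in the listing above.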
// Matrices are stored in row-major order:
// M(row, col) = *(M.elements + row * M.stride + col)
typedef struct {
int width;
int height;
int stride;
float* elements;
} Matrix;
// Get a matrix element
__device__ float GetElement(const Matrix A, int row, int col)
{
return A.elements[row * A.stride + col];
}
// Set a matrix element
__device__ void SetElement(Matrix A, int row, int col,
float value)
{
A.elements[row * A.stride + col] = value;
}
// Thread block size
#define BLOCK_SIZE 16
// Get the BLOCK_SIZE x BLOCK_SIZE sub-matrix Asub of A that is
// located col sub-matrices to the right and row sub-matrices down
// from the upper-left corner of A
__device__ Matrix GetSubMatrix(Matrix A, int row, int col)
{
Matrix Asub;
Asub.width = BLOCK_SIZE;
Asub.height = BLOCK_SIZE;
Asub.stride = A.stride;
Asub.elements = &A.elements[A.stride * BLOCK_SIZE * row
+ BLOCK_SIZE * col];
return Asub;
}
// Forward declaration of the matrix multiplication kernel
__global__ void MatMulKernel(const Matrix, const Matrix, Matrix);
// Matrix multiplication - Host code
// Matrix dimensions are assumed to be multiples of BLOCK_SIZE
void MatMul(const Matrix A, const Matrix B, Matrix C)
{
// Load A and B to device memory
Matrix d_A;
d_A.width = d_A.stride = A.width; d_A.height = A.height;
size_t size = A.width * A.height * sizeof(float);
cudaMalloc(&d_A.elements, size);
cudaMemcpy(d_A.elements, A.elements, size,
cudaMemcpyHostToDevice);
Matrix d_B;
d_B.width = d_B.stride = B.width; d_B.height = B.height;
size = B.width * B.height * sizeof(float);
cudaMalloc(&d_B.elements, size);
cudaMemcpy(d_B.elements, B.elements, size,
cudaMemcpyHostToDevice);
// Allocate C in device memory
Matrix d_C;
d_C.width = d_C.stride = C.width; d_C.height = C.height;
size = C.width * C.height * sizeof(float);
cudaMalloc(&d_C.elements, size);
// Invoke kernel
dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE);
dim3 dimGrid(B.width / dimBlock.x, A.height / dimBlock.y);
MatMulKernel<<<dimGrid, dimBlock>>>(d_A, d_B, d_C);
// Read C from device memory
cudaMemcpy(C.elements, d_C.elements, size,
cudaMemcpyDeviceToHost);
// Free device memory
cudaFree(d_A.elements);
cudaFree(d_B.elements);
cudaFree(d_C.elements);
}
// Matrix multiplication kernel called by MatMul()
__global__ void MatMulKernel(Matrix A, Matrix B, Matrix C)
{
// Block row and column
int blockRow = blockIdx.y;
int blockCol = blockIdx.x;
// Each thread block computes one sub-matrix Csub of C
Matrix Csub = GetSubMatrix(C, blockRow, blockCol);
// Each thread computes one element of Csub
// by accumulating results into Cvalue
float Cvalue = 0;
// Thread row and column within Csub
int row = threadIdx.y;
int col = threadIdx.x;
// Loop over all the sub-matrices of A and B that are
// required to compute Csub
// Multiply each pair of sub-matrices together
// and accumulate the results
for (int m = 0; m < (A.width / BLOCK_SIZE); ++m) {
// Get sub-matrix Asub of A
Matrix Asub = GetSubMatrix(A, blockRow, m);
// Get sub-matrix Bsub of B
Matrix Bsub = GetSubMatrix(B, m, blockCol);
// Shared memory used to store Asub and Bsub respectively
__shared__ float As[BLOCK_SIZE][BLOCK_SIZE];
__shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];
// Load Asub and Bsub from device memory to shared memory
// Each thread loads one element of each sub-matrix
As[row][col] = GetElement(Asub, row, col);
Bs[row][col] = GetElement(Bsub, row, col);
// Synchronize to make sure the sub-matrices are loaded
// before starting the computation
__syncthreads();
// Multiply Asub and Bsub together
for (int e = 0; e < BLOCK_SIZE; ++e)
Cvalue += As[row][e] * Bs[e][col];
// Synchronize to make sure that the preceding
// computation is done before loading two new
// sub-matrices of A and B in the next iteration
__syncthreads();
}
// Write Csub to device memory
// Each thread writes one element
SetElement(Csub, row, col, Cvalue);
}
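Both listings call cudaMalloc() and cudaMemcpy() without checking return values. A small checking macro (the name CUDA_CHECK is my own, not from the guide) makes failures visible while experimenting:

// Hypothetical helper: wraps a CUDA runtime call and aborts with a readable
// message if it fails.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Usage inside MatMul(), for example:
//     CUDA_CHECK(cudaMalloc(&d_A.elements, size));
//     CUDA_CHECK(cudaMemcpy(d_A.elements, A.elements, size, cudaMemcpyHostToDevice));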