arm-none-linux-gnueabi-gcc -mfpu=neon -ftree-vectorize -c vectorized.c
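With -ftree-vectorize passed as a flag
and the pointers marked __restrict in the source, GCC is supposed to generate NEON code by itself.
Maybe it's because I've dabbled in OpenMP a bit, but this smells a lot like OpenMP, lol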
$ vi auto.c
void add_ints(int * __restrict pa, int * __restrict pb, unsigned int n, int x)
{
    unsigned int i;
    for(i = 0; i < (n & ~3); i++)
        pa[i] = pb[i] + x;
}
$ arm-linux-gnueabihf-gcc -mfpu=neon -ftree-vectorize -c auto.c
$ arm-linux-gnueabihf-objdump -D auto.o

auto.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <add_ints>:
   0:   e52db004    push    {fp}            ; (str fp, [sp, #-4]!)
   4:   e28db000    add     fp, sp, #0
   8:   e24dd01c    sub     sp, sp, #28
   c:   e50b0010    str     r0, [fp, #-16]
  10:   e50b1014    str     r1, [fp, #-20]
  14:   e50b2018    str     r2, [fp, #-24]
  18:   e50b301c    str     r3, [fp, #-28]
  1c:   e3a03000    mov     r3, #0
  20:   e50b3008    str     r3, [fp, #-8]
  24:   ea00000e    b       64 <add_ints+0x64>
  28:   e51b3008    ldr     r3, [fp, #-8]
  2c:   e1a03103    lsl     r3, r3, #2
  30:   e51b2010    ldr     r2, [fp, #-16]
  34:   e0823003    add     r3, r2, r3
  38:   e51b2008    ldr     r2, [fp, #-8]
  3c:   e1a02102    lsl     r2, r2, #2
  40:   e51b1014    ldr     r1, [fp, #-20]
  44:   e0812002    add     r2, r1, r2
  48:   e5921000    ldr     r1, [r2]
  4c:   e51b201c    ldr     r2, [fp, #-28]
  50:   e0812002    add     r2, r1, r2
  54:   e5832000    str     r2, [r3]
  58:   e51b3008    ldr     r3, [fp, #-8]
  5c:   e2833001    add     r3, r3, #1
  60:   e50b3008    str     r3, [fp, #-8]
  64:   e51b3018    ldr     r3, [fp, #-24]
  68:   e3c32003    bic     r2, r3, #3
  6c:   e51b3008    ldr     r3, [fp, #-8]
  70:   e1520003    cmp     r2, r3
  74:   8affffeb    bhi     28 <add_ints+0x28>
  78:   e24bd000    sub     sp, fp, #0
  7c:   e49db004    pop     {fp}            ; (ldr fp, [sp], #4)
  80:   e12fff1e    bx      lr

Disassembly of section .comment:

00000000 <.comment>:
   0:   43434700    movtmi  r4, #14080      ; 0x3700
   4:   6328203a    teqvs   r8, #58         ; 0x3a
   8:   73736f72    cmnvc   r3, #456        ; 0x1c8
   c:   6c6f6f74    stclvs  15, cr6, [pc], #-464    ; fffffe44 <add_ints+0xfffffe44>
  10:   20474e2d    subcs   r4, r7, sp, lsr #28
  14:   616e696c    cmnvs   lr, ip, ror #18
  18:   312d6f72    teqcc   sp, r2, ror pc
  1c:   2e33312e    rsfcssp f3, f3, #0.5
  20:   2e342d31    mrccs   13, 1, r2, cr4, cr1, {1}
  24:   30322d38    eorscc  r2, r2, r8, lsr sp
  28:   302e3431    eorcc   r3, lr, r1, lsr r4
  2c:   202d2031    eorcs   r2, sp, r1, lsr r0
  30:   616e694c    cmnvs   lr, ip, asr #18
  34:   47206f72    ; <UNDEFINED> instruction: 0x47206f72
  38:   32204343    eorcc   r4, r0, #201326593      ; 0xc000001
  3c:   2e333130    mrccs   1, 1, r3, cr3, cr0, {1}
  40:   20293131    eorcs   r3, r9, r1, lsr r1
  44:   2e382e34    mrccs   14, 1, r2, cr8, cr4, {1}
  48:   30322033    eorscc  r2, r2, r3, lsr r0
  4c:   31303431    teqcc   r0, r1, lsr r4
  50:   28203630    stmdacs r0!, {r4, r5, r9, sl, ip, sp}
  54:   72657270    rsbvc   r7, r5, #112, 4
  58:   61656c65    cmnvs   r5, r5, ror #24
  5c:   00296573    eoreq   r6, r9, r3, ror r5

Disassembly of section .ARM.attributes:

00000000 <.ARM.attributes>:
   0:   00003241    andeq   r3, r0, r1, asr #4
   4:   61656100    cmnvs   r5, r0, lsl #2
   8:   01006962    tsteq   r0, r2, ror #18
   c:   00000028    andeq   r0, r0, r8, lsr #32
  10:   06003605    streq   r3, [r0], -r5, lsl #12
  14:   09010806    stmdbeq r1, {r1, r2, fp}
  18:   0c030a01    stceq   10, cr0, [r3], {1}
  1c:   14041201    strne   r1, [r4], #-513 ; 0x201
  20:   17011501    strne   r1, [r1, -r1, lsl #10]
  24:   19011803    stmdbne r1, {r0, r1, fp, ip}
  28:   1b021a01    blne    86834 <add_ints+0x86834>
  2c:   1e011c03    cdpne   12, 0, cr1, cr1, cr3, {0}
  30:   Address 0x00000030 is out of bounds.

[Link: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0002a/ch01s04s03.html]
Huh? I don't see vadd or anything like that -ㅁ-?
Is the Raspberry Pi compiler the problem?
Hmm.. I tried and it still doesn't work -ㅁ-? Does auto-vectorization just not work here? ㅠㅠ
-mfpu=name This specifies what floating-point hardware (or hardware emulation) is available on the target. Permissible names are: ‘vfp’, ‘vfpv3’, ‘vfpv3-fp16’, ‘vfpv3-d16’, ‘vfpv3-d16-fp16’, ‘vfpv3xd’, ‘vfpv3xd-fp16’, ‘neon’, ‘neon-fp16’, ‘vfpv4’, ‘vfpv4-d16’, ‘fpv4-sp-d16’, ‘neon-vfpv4’, ‘fpv5-d16’, ‘fpv5-sp-d16’, ‘fp-armv8’, ‘neon-fp-armv8’, and ‘crypto-neon-fp-armv8’. If -msoft-float is specified this specifies the format of floating-point values. If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=‘neon’), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.
[Link: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html]
[Link: https://www.raspberrypi.org/forums/viewtopic.php?f=33&t=98354]
+
2016.03.14
-ftree-vectorizer-verbose=1
[Link: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0002a/ch01s04s03.html]
Using the Vectorizer
Vectorization is enabled by the flag -ftree-vectorize and by default at -O3. To allow vectorization on powerpc* platforms also use -maltivec. On i?86 and x86_64 platforms use -msse/-msse2. To enable vectorization of floating point reductions use -ffast-math or -fassociative-math.
-ftree-vectorizer-verbose=2
[Link: https://gcc.gnu.org/projects/tree-ssa/vectorization.html]