NEON auto-vectorization

Programming/neon | 2015. 5. 6. 22:01


arm-none-linux-gnueabi-gcc -mfpu=neon -ftree-vectorize -c vectorized.c

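Supposedly, if you pass -ftree-vectorize as a flag

and mark the pointers in the source with __restrict, gcc generates the NEON code on its own.

Maybe it's because I've dabbled in OpenMP a bit.. this smells a lot like OpenMP, lol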

$ vi auto.c

void add_ints(int * __restrict pa, int * __restrict pb, unsigned int n, int x)
{
    unsigned int i;

    for(i = 0; i < (n & ~3); i++)
        pa[i] = pb[i] + x;
}

$ arm-linux-gnueabihf-gcc -mfpu=neon -ftree-vectorize -c auto.c

$ arm-linux-gnueabihf-objdump -D auto.o

auto.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <add_ints>:

0: e52db004 push {fp} ; (str fp, [sp, #-4]!)

4: e28db000 add fp, sp, #0

8: e24dd01c sub sp, sp, #28

c: e50b0010 str r0, [fp, #-16]

10: e50b1014 str r1, [fp, #-20]

14: e50b2018 str r2, [fp, #-24]

18: e50b301c str r3, [fp, #-28]

1c: e3a03000 mov r3, #0

20: e50b3008 str r3, [fp, #-8]

24: ea00000e b 64 <add_ints+0x64>

28: e51b3008 ldr r3, [fp, #-8]

2c: e1a03103 lsl r3, r3, #2

30: e51b2010 ldr r2, [fp, #-16]

34: e0823003 add r3, r2, r3

38: e51b2008 ldr r2, [fp, #-8]

3c: e1a02102 lsl r2, r2, #2

40: e51b1014 ldr r1, [fp, #-20]

44: e0812002 add r2, r1, r2

48: e5921000 ldr r1, [r2]

4c: e51b201c ldr r2, [fp, #-28]

50: e0812002 add r2, r1, r2

54: e5832000 str r2, [r3]

58: e51b3008 ldr r3, [fp, #-8]

5c: e2833001 add r3, r3, #1

60: e50b3008 str r3, [fp, #-8]

64: e51b3018 ldr r3, [fp, #-24]

68: e3c32003 bic r2, r3, #3

6c: e51b3008 ldr r3, [fp, #-8]

70: e1520003 cmp r2, r3

74: 8affffeb bhi 28 <add_ints+0x28>

78: e24bd000 sub sp, fp, #0

7c: e49db004 pop {fp} ; (ldr fp, [sp], #4)

80: e12fff1e bx lr

Disassembly of section .comment:

00000000 <.comment>:

0: 43434700 movtmi r4, #14080 ; 0x3700

4: 6328203a teqvs r8, #58 ; 0x3a

8: 73736f72 cmnvc r3, #456 ; 0x1c8

c: 6c6f6f74 stclvs 15, cr6, [pc], #-464 ; fffffe44 <add_ints+0xfffffe44>

10: 20474e2d subcs r4, r7, sp, lsr #28

14: 616e696c cmnvs lr, ip, ror #18

18: 312d6f72 teqcc sp, r2, ror pc

1c: 2e33312e rsfcssp f3, f3, #0.5

20: 2e342d31 mrccs 13, 1, r2, cr4, cr1, {1}

24: 30322d38 eorscc r2, r2, r8, lsr sp

28: 302e3431 eorcc r3, lr, r1, lsr r4

2c: 202d2031 eorcs r2, sp, r1, lsr r0

30: 616e694c cmnvs lr, ip, asr #18

34: 47206f72 ; <UNDEFINED> instruction: 0x47206f72

38: 32204343 eorcc r4, r0, #201326593 ; 0xc000001

3c: 2e333130 mrccs 1, 1, r3, cr3, cr0, {1}

40: 20293131 eorcs r3, r9, r1, lsr r1

44: 2e382e34 mrccs 14, 1, r2, cr8, cr4, {1}

48: 30322033 eorscc r2, r2, r3, lsr r0

4c: 31303431 teqcc r0, r1, lsr r4

50: 28203630 stmdacs r0!, {r4, r5, r9, sl, ip, sp}

54: 72657270 rsbvc r7, r5, #112, 4

58: 61656c65 cmnvs r5, r5, ror #24

5c: 00296573 eoreq r6, r9, r3, ror r5

Disassembly of section .ARM.attributes:

00000000 <.ARM.attributes>:

0: 00003241 andeq r3, r0, r1, asr #4

4: 61656100 cmnvs r5, r0, lsl #2

8: 01006962 tsteq r0, r2, ror #18

c: 00000028 andeq r0, r0, r8, lsr #32

10: 06003605 streq r3, [r0], -r5, lsl #12

14: 09010806 stmdbeq r1, {r1, r2, fp}

18: 0c030a01 stceq 10, cr0, [r3], {1}

1c: 14041201 strne r1, [r4], #-513 ; 0x201

20: 17011501 strne r1, [r1, -r1, lsl #10]

24: 19011803 stmdbne r1, {r0, r1, fp, ip}

28: 1b021a01 blne 86834 <add_ints+0x86834>

2c: 1e011c03 cdpne 12, 0, cr1, cr1, cr3, {0}

30: Address 0x00000030 is out of bounds.

[Link : http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0002a/ch01s04s03.html]

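Huh? I don't see vadd or anything of that sort in the output -ㅁ-?

Is the compiler for the Raspberry Pi the problem?

Hmm.. I tried again and it still doesn't work -ㅁ-? Does auto-vectorization just not work? ㅠㅠ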
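One thing worth checking first: neither command line above passes an -O level. GCC's documentation says most optimizations are disabled at -O0, or when no -O level is given, even if individual optimization flags such as -ftree-vectorize are specified. The listing above (every argument stored to the stack and reloaded on each iteration) looks like unoptimized output, so adding -O2 or -O3 is probably the first thing to try.

For reference, here is a rough hand-written sketch of what a vectorized version of the loop computes, using NEON intrinsics. The names add_ints_by4 and check_add_ints_by4 are made up for this sketch, and the non-NEON branch is only there so that it builds on any machine. The code GCC actually generates will differ.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical hand-vectorized counterpart of add_ints: 4 ints per step. */
void add_ints_by4(int * __restrict pa, const int * __restrict pb,
                  unsigned int n, int x)
{
    unsigned int i;
    unsigned int limit = n & ~3u;   /* same bound as the original loop */

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int32x4_t vx = vdupq_n_s32(x);  /* broadcast x into all 4 lanes */
    for (i = 0; i < limit; i += 4) {
        int32x4_t vb = vld1q_s32((const int32_t *)(pb + i)); /* load 4 ints */
        vst1q_s32((int32_t *)(pa + i), vaddq_s32(vb, vx));   /* add, store */
    }
#else
    /* Scalar fallback for targets without NEON. */
    for (i = 0; i < limit; i++)
        pa[i] = pb[i] + x;
#endif
}

/* Small self-check: the first (n & ~3) elements are updated,
   the remaining tail elements are left untouched. */
void check_add_ints_by4(void)
{
    int src[10], dst[10];
    unsigned int i;

    for (i = 0; i < 10; i++) {
        src[i] = (int)i * 3;
        dst[i] = -1;
    }
    add_ints_by4(dst, src, 10, 7);

    for (i = 0; i < 8; i++)
        assert(dst[i] == src[i] + 7);
    assert(dst[8] == -1 && dst[9] == -1);
}
```

If the compiler does vectorize the original add_ints, the disassembly should contain NEON instructions along the lines of vld1.32, vadd.i32 and vst1.32 instead of the ldr/add/str sequence shown above.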

-mfpu=name

This specifies what floating-point hardware (or hardware emulation) is available on the target. Permissible names are: ‘vfp’, ‘vfpv3’, ‘vfpv3-fp16’, ‘vfpv3-d16’, ‘vfpv3-d16-fp16’, ‘vfpv3xd’, ‘vfpv3xd-fp16’, ‘neon’, ‘neon-fp16’, ‘vfpv4’, ‘vfpv4-d16’, ‘fpv4-sp-d16’, ‘neon-vfpv4’, ‘fpv5-d16’, ‘fpv5-sp-d16’, ‘fp-armv8’, ‘neon-fp-armv8’, and ‘crypto-neon-fp-armv8’.

If -msoft-float is specified this specifies the format of floating-point values.

If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=‘neon’), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.

[Link : https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html]

[Link : https://www.raspberrypi.org/forums/viewtopic.php?f=33&t=98354]
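The -mfpu note quoted above does not affect the int loop in auto.c, but it does matter once the data is float. A minimal sketch of such a loop follows. scale_floats, check_scale_floats and the file name scale.c are hypothetical names for illustration. According to the quoted text, GCC would leave this loop scalar on a NEON target unless -funsafe-math-optimizations is given (-ffast-math implies that flag).

```c
#include <assert.h>

/* Hypothetical float counterpart of add_ints.
 * Assumed build line (not taken from the original post):
 *   arm-linux-gnueabihf-gcc -O3 -mfpu=neon -funsafe-math-optimizations -c scale.c
 */
void scale_floats(float * __restrict out, const float * __restrict in,
                  unsigned int n, float k)
{
    unsigned int i;

    for (i = 0; i < (n & ~3u); i++)
        out[i] = in[i] * k;
}

/* Self-check with values that are exact in binary floating point. */
void check_scale_floats(void)
{
    float in[5]  = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f };
    float out[5] = { -1.0f, -1.0f, -1.0f, -1.0f, -1.0f };

    scale_floats(out, in, 5, 0.5f);

    assert(out[0] == 0.5f);
    assert(out[1] == 1.0f);
    assert(out[2] == 1.5f);
    assert(out[3] == 2.0f);
    assert(out[4] == -1.0f);  /* tail element beyond (n & ~3) is untouched */
}
```

The trade-off is the one the documentation describes: NEON treats denormal values as zero, so results can differ slightly from strict IEEE 754 arithmetic.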

2016.03.14

-ftree-vectorizer-verbose=1

[Link : http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0002a/ch01s04s03.html]

Using the Vectorizer

Vectorization is enabled by the flag -ftree-vectorize and by default at -O3. To allow vectorization on powerpc* platforms also use -maltivec. On i?86 and x86_64 platforms use -msse/-msse2. To enable vectorization of floating point reductions use -ffast-math or -fassociative-math.

-ftree-vectorizer-verbose=2

[Link : https://gcc.gnu.org/projects/tree-ssa/vectorization.html]
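The last sentence of the quote is about loops like the one sketched below. A vectorized sum adds the elements in a different order than the scalar loop, and float addition is not associative, so GCC only vectorizes it when -ffast-math or -fassociative-math allows the reordering. sum_floats is a hypothetical name, and the build line in the comment is an assumption rather than something taken from the post.

```c
#include <assert.h>

/* Hypothetical float reduction.
 * Assumed build line, with the verbose flag from the note above so the
 * vectorizer reports what it did:
 *   arm-linux-gnueabihf-gcc -O3 -mfpu=neon -ffast-math \
 *       -ftree-vectorizer-verbose=2 -c sum.c
 * As far as I know, newer GCC releases report this through
 * -fopt-info-vec instead.
 */
float sum_floats(const float *p, unsigned int n)
{
    float s = 0.0f;
    unsigned int i;

    for (i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Self-check with small integers, whose sum is exact in any order. */
void check_sum_floats(void)
{
    float v[8] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f };

    assert(sum_floats(v, 8) == 36.0f);
    assert(sum_floats(v, 0) == 0.0f);
}
```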


Posted by 구차니
