Programming/neon | 2015. 5. 6. 22:01

arm-none-linux-gnueabi-gcc -mfpu=neon -ftree-vectorize -c vectorized.c

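Supposedly, if you pass -ftree-vectorize as a flag

and qualify the pointers with __restrict in the source, gcc generates NEON code on its own.

Maybe it's because I've played with OpenMP a little, but this smells like OpenMP lol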


$ vi auto.c

void add_ints(int * __restrict pa, int * __restrict pb, unsigned int n, int x)
{
    unsigned int i;

    for(i = 0; i < (n & ~3); i++)
        pa[i] = pb[i] + x;
}
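
The `n & ~3` in the loop bound clears the two low bits of `n`, so the trip count is always a multiple of four, which matches the four 32-bit lanes of a 128-bit NEON register. A side effect is that up to three trailing elements are silently skipped. A minimal sketch of what that means for a caller; `rounded_count`, `add_ints_all` and `check_add_ints` are hypothetical helpers that are not part of the original example:

```c
#include <assert.h>

/* Same function as in auto.c. */
void add_ints(int * __restrict pa, int * __restrict pb, unsigned int n, int x)
{
    unsigned int i;

    for (i = 0; i < (n & ~3); i++)
        pa[i] = pb[i] + x;
}

/* Hypothetical helper: how many elements add_ints actually processes. */
unsigned int rounded_count(unsigned int n)
{
    return n & ~3u;   /* round down to a multiple of 4 */
}

/* Hypothetical wrapper: finish the 0..3 leftover elements with scalar code. */
void add_ints_all(int * __restrict pa, int * __restrict pb, unsigned int n, int x)
{
    unsigned int i;

    add_ints(pa, pb, n, x);
    for (i = rounded_count(n); i < n; i++)
        pa[i] = pb[i] + x;
}

/* Hypothetical self-check showing the rounding behaviour for n = 7. */
void check_add_ints(void)
{
    int src[7] = {1, 2, 3, 4, 5, 6, 7};
    int dst[7] = {0};

    add_ints(dst, src, 7, 10);
    assert(dst[3] == 14 && dst[4] == 0);   /* only the first 4 were written */

    add_ints_all(dst, src, 7, 10);
    assert(dst[6] == 17);                  /* the tail is now filled in too */
}
```

This builds with any host gcc, so the logic can be checked on a PC before cross-compiling.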


$ arm-linux-gnueabihf-gcc -mfpu=neon -ftree-vectorize -c auto.c

$ arm-linux-gnueabihf-objdump -D auto.o


auto.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <add_ints>:
   0:   e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
   4:   e28db000        add     fp, sp, #0
   8:   e24dd01c        sub     sp, sp, #28
   c:   e50b0010        str     r0, [fp, #-16]
  10:   e50b1014        str     r1, [fp, #-20]
  14:   e50b2018        str     r2, [fp, #-24]
  18:   e50b301c        str     r3, [fp, #-28]
  1c:   e3a03000        mov     r3, #0
  20:   e50b3008        str     r3, [fp, #-8]
  24:   ea00000e        b       64 <add_ints+0x64>
  28:   e51b3008        ldr     r3, [fp, #-8]
  2c:   e1a03103        lsl     r3, r3, #2
  30:   e51b2010        ldr     r2, [fp, #-16]
  34:   e0823003        add     r3, r2, r3
  38:   e51b2008        ldr     r2, [fp, #-8]
  3c:   e1a02102        lsl     r2, r2, #2
  40:   e51b1014        ldr     r1, [fp, #-20]
  44:   e0812002        add     r2, r1, r2
  48:   e5921000        ldr     r1, [r2]
  4c:   e51b201c        ldr     r2, [fp, #-28]
  50:   e0812002        add     r2, r1, r2
  54:   e5832000        str     r2, [r3]
  58:   e51b3008        ldr     r3, [fp, #-8]
  5c:   e2833001        add     r3, r3, #1
  60:   e50b3008        str     r3, [fp, #-8]
  64:   e51b3018        ldr     r3, [fp, #-24]
  68:   e3c32003        bic     r2, r3, #3
  6c:   e51b3008        ldr     r3, [fp, #-8]
  70:   e1520003        cmp     r2, r3
  74:   8affffeb        bhi     28 <add_ints+0x28>
  78:   e24bd000        sub     sp, fp, #0
  7c:   e49db004        pop     {fp}            ; (ldr fp, [sp], #4)
  80:   e12fff1e        bx      lr

Disassembly of section .comment:

00000000 <.comment>:
   0:   43434700        movtmi  r4, #14080      ; 0x3700
   4:   6328203a        teqvs   r8, #58 ; 0x3a
   8:   73736f72        cmnvc   r3, #456        ; 0x1c8
   c:   6c6f6f74        stclvs  15, cr6, [pc], #-464    ; fffffe44 <add_ints+0xfffffe44>
  10:   20474e2d        subcs   r4, r7, sp, lsr #28
  14:   616e696c        cmnvs   lr, ip, ror #18
  18:   312d6f72        teqcc   sp, r2, ror pc
  1c:   2e33312e        rsfcssp f3, f3, #0.5
  20:   2e342d31        mrccs   13, 1, r2, cr4, cr1, {1}
  24:   30322d38        eorscc  r2, r2, r8, lsr sp
  28:   302e3431        eorcc   r3, lr, r1, lsr r4
  2c:   202d2031        eorcs   r2, sp, r1, lsr r0
  30:   616e694c        cmnvs   lr, ip, asr #18
  34:   47206f72                        ; <UNDEFINED> instruction: 0x47206f72
  38:   32204343        eorcc   r4, r0, #201326593      ; 0xc000001
  3c:   2e333130        mrccs   1, 1, r3, cr3, cr0, {1}
  40:   20293131        eorcs   r3, r9, r1, lsr r1
  44:   2e382e34        mrccs   14, 1, r2, cr8, cr4, {1}
  48:   30322033        eorscc  r2, r2, r3, lsr r0
  4c:   31303431        teqcc   r0, r1, lsr r4
  50:   28203630        stmdacs r0!, {r4, r5, r9, sl, ip, sp}
  54:   72657270        rsbvc   r7, r5, #112, 4
  58:   61656c65        cmnvs   r5, r5, ror #24
  5c:   00296573        eoreq   r6, r9, r3, ror r5

Disassembly of section .ARM.attributes:

00000000 <.ARM.attributes>:
   0:   00003241        andeq   r3, r0, r1, asr #4
   4:   61656100        cmnvs   r5, r0, lsl #2
   8:   01006962        tsteq   r0, r2, ror #18
   c:   00000028        andeq   r0, r0, r8, lsr #32
  10:   06003605        streq   r3, [r0], -r5, lsl #12
  14:   09010806        stmdbeq r1, {r1, r2, fp}
  18:   0c030a01        stceq   10, cr0, [r3], {1}
  1c:   14041201        strne   r1, [r4], #-513 ; 0x201
  20:   17011501        strne   r1, [r1, -r1, lsl #10]
  24:   19011803        stmdbne r1, {r0, r1, fp, ip}
  28:   1b021a01        blne    86834 <add_ints+0x86834>
  2c:   1e011c03        cdpne   12, 0, cr1, cr1, cr3, {0}
  30:   Address 0x00000030 is out of bounds.


[Link: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0002a/ch01s04s03.html]


Huh? I don't see vadd or any instructions of that kind.

Is the Raspberry Pi compiler the problem?


Hmm.. I tried it and it still doesn't work. Does auto-vectorization just not work? T_T
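
One likely cause, offered as a guess rather than a confirmed diagnosis: the command line above has no -O option, and the GCC manual states that most optimizations are disabled at -O0 even when individual optimization flags are given. The listing is consistent with that, since every variable is stored to the stack and reloaded on each iteration. The sketch below uses GCC's predefined `__OPTIMIZE__` macro and the ACLE `__ARM_NEON` macro (plus the older `__ARM_NEON__` spelling) to report what a given command line actually enabled. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* 1 if the compiler was run with an optimizing -O level
 * (GCC predefines __OPTIMIZE__ in that case), else 0. */
int is_optimizing(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* 1 if the target flags enable NEON, else 0. */
int has_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: write a one-line summary into buf. */
void describe_build(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "optimizing=%d neon=%d", is_optimizing(), has_neon());
}
```

If this reports optimizing=0 for the flags used on auto.c, retrying with something like `arm-linux-gnueabihf-gcc -O2 -mfpu=neon -ftree-vectorize -c auto.c` would be the next thing to try. Whether NEON instructions then show up is not verified here, and the Raspberry Pi toolchain may also need an ARMv7-A -march or -mcpu setting. Running objdump with lowercase -d instead of -D limits the disassembly to code sections, which avoids the meaningless .comment and .ARM.attributes listings.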

-mfpu=name

This specifies what floating-point hardware (or hardware emulation) is available on the target. Permissible names are: ‘vfp’, ‘vfpv3’, ‘vfpv3-fp16’, ‘vfpv3-d16’, ‘vfpv3-d16-fp16’, ‘vfpv3xd’, ‘vfpv3xd-fp16’, ‘neon’, ‘neon-fp16’, ‘vfpv4’, ‘vfpv4-d16’, ‘fpv4-sp-d16’, ‘neon-vfpv4’, ‘fpv5-d16’, ‘fpv5-sp-d16’, ‘fp-armv8’, ‘neon-fp-armv8’, and ‘crypto-neon-fp-armv8’.

If -msoft-float is specified this specifies the format of floating-point values.


If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=‘neon’), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision. 

[Link: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html]

    [Link: https://www.raspberrypi.org/forums/viewtopic.php?f=33&t=98354]
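
To illustrate the caveat quoted from the GCC manual, here is a minimal sketch of the same loop on float data. `add_floats` is a hypothetical variant of `add_ints`. Going by the manual text, the auto-vectorizer would only be expected to use NEON for it when -funsafe-math-optimizations (which -ffast-math also turns on) is added to the other flags.

```c
#include <assert.h>

/* Hypothetical float variant of add_ints from auto.c.
 * Assumed build line, not verified on the author's toolchain:
 *   arm-linux-gnueabihf-gcc -O2 -mfpu=neon -ftree-vectorize
 *       -funsafe-math-optimizations -c auto_float.c
 */
void add_floats(float * __restrict pa, float * __restrict pb, unsigned int n, float x)
{
    unsigned int i;

    for (i = 0; i < (n & ~3); i++)
        pa[i] = pb[i] + x;
}

/* Hypothetical self-check; the values are exactly representable in float. */
void check_add_floats(void)
{
    float src[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float dst[4] = {0.0f, 0.0f, 0.0f, 0.0f};

    add_floats(dst, src, 4, 0.5f);
    assert(dst[0] == 1.5f && dst[3] == 4.5f);
}
```

The integer version in auto.c is not affected by this restriction, so it is the simpler case to get working first.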



+

2016.03.14

-ftree-vectorizer-verbose=1

[Link: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0002a/ch01s04s03.html]


Using the Vectorizer


Vectorization is enabled by the flag -ftree-vectorize and by default at -O3. To allow vectorization on powerpc* platforms also use -maltivec. On i?86 and x86_64 platforms use -msse/-msse2. To enable vectorization of floating point reductions use -ffast-math or -fassociative-math.

-ftree-vectorizer-verbose=2

[Link: https://gcc.gnu.org/projects/tree-ssa/vectorization.html]
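
The remark about floating-point reductions can be sketched as follows. `sum_floats` is a hypothetical example: it accumulates into a single variable, and vectorizing it means adding the elements in a different order, which can change rounding. That is presumably why the quoted text asks for -ffast-math or -fassociative-math before such a loop is vectorized.

```c
#include <assert.h>

/* Hypothetical float reduction: each iteration depends on the previous sum.
 * Assumed build line, not verified on the author's toolchain:
 *   arm-linux-gnueabihf-gcc -O3 -mfpu=neon -ffast-math
 *       -ftree-vectorizer-verbose=2 -c sum.c
 * Newer GCC releases provide -fopt-info-vec for the same kind of report.
 */
float sum_floats(const float *p, unsigned int n)
{
    float sum = 0.0f;
    unsigned int i;

    for (i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

/* Hypothetical self-check; small values keep the sum exact in any order. */
void check_sum_floats(void)
{
    float data[5] = {1.0f, 2.0f, 3.0f, 4.0f, 0.5f};

    assert(sum_floats(data, 5) == 10.5f);
    assert(sum_floats(data, 0) == 0.0f);
}
```

The verbose report is the quickest way to see whether the vectorizer accepted or rejected a given loop, and why.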

Posted by 구차니