Program usage/gcc 2023. 8. 8. 11:17

On a whim I went back to check the CPU spec, and

huh? Not NEON but NEON MPE?

NEON™ media-processing engine
Single and double precision Vector Floating Point Unit (VFPU)

[Link : https://docs.xilinx.com/v/u/en-US/ds190-Zynq-7000-Overview]

 

So I dug through the Cortex-A9 NEON MPE instructions.

Looking up VADD, VSUB, VMUL and VDIV, it seems NEON only handles up to float, while double should be possible through VFP:

D  Double precision floating-point values
F  Single precision floating-point values
H  Half precision floating-point values
I  Integer values
P  Polynomials with single-bit coefficients
X  Operation is independent of data representation.

Name  Advanced SIMD  VFP   Description
VADD  I, F           F, D  Add
VDIV  -              F, D  Divide
VMUL  I, F, P        F, D  Multiply
VSUB  I, F           F, D  Subtract

[Link : https://developer.arm.com/documentation/ddi0409/i/instruction-timing/cortex-a9-neon-mpe-instructions?lang=en]

Changing the type did not help either, and I was tearing my hair out (you said float works!!! you said double works via VFP!!!)

main.c:187:2: missed: couldn't vectorize loop
main.c:177:6: missed: not vectorized: unsupported data-type double


main.c:187:2: missed: couldn't vectorize loop
main.c:177:6: missed: not vectorized: unsupported data-type float

 

Setting the forbidden flag (-ffast-math) makes it work. -_-

main.c:194:2: optimized: loop vectorized using 16 byte vectors
main.c:188:2: optimized: loop vectorized using 16 byte vectors

 

It ignores IEEE and also applies unsafe math optimizations, so I feel quite uneasy about using it...

In addition GCC offers the -ffast-math flag which is a shortcut for several options, presenting the least conforming but fastest math mode. It enables -fno-trapping-math, -funsafe-math-optimizations, -ffinite-math-only, -fno-errno-math, -fno-signaling-nans, -fno-rounding-math, -fcx-limited-range and -fno-signed-zeros. Each of these flags violates IEEE in a different way. -ffast-math also may disable some features of the hardware IEEE implementation such as the support for denormals or flush-to-zero behavior. An example for such a case is x86_64 with it's use of SSE and SSE2 units for floating point math. 

[Link : https://gcc.gnu.org/wiki/FloatingPointMath]
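A note on why the float loop was rejected too: GCC's ARM option documentation says that with -mfpu=neon the auto-vectorizer does not generate floating-point NEON code unless -funsafe-math-optimizations is also given, because NEON treats denormals as zero and so is not fully IEEE 754 compliant. That suggests a narrower flag than the whole -ffast-math bundle may be enough for float. For double, the table above lists only F under Advanced SIMD, so a double loop would have nothing to vectorize onto in NEON. A minimal sketch to try this with; the function name scale_add, the file name and the build lines are my assumptions, not the original main.c, and I have not run them on the Zynq toolchain:

```c
#include <stddef.h>

/*
 * Hypothetical element-wise float loop (not the original main.c).
 * Assumed build lines for an armv7-a compiler:
 *   gcc -O3 -march=armv7-a -mfpu=neon -fopt-info-vec -c scale_add.c
 *   gcc -O3 -march=armv7-a -mfpu=neon -fopt-info-vec -funsafe-math-optimizations -c scale_add.c
 * The second line adds only the flag that the GCC ARM documentation names
 * as the precondition for NEON floating-point auto-vectorization.
 */
void scale_add(float *restrict dst, const float *restrict src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = dst[i] + k * src[i];
}
```

Comparing the -fopt-info-vec output of the two build lines should show whether the smaller flag is sufficient for a given loop.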


Anyway, I had been struggling to find again the associative option I came across somewhere yesterday, then remembered it and took another look.

Not associative.. what does that mean?

As shown in Goldberg's paper, floating-point arithmetic is not associative.
Therefore results computed under -ffast-math inevitably contain some error.
Because of this, -ffast-math does not conform to the behavior defined by IEEE.

Given this, -ffast-math must not be used when exact values have to be computed.
But what if a roughly correct value is all you want?

[Link : https://www.cv-learn.com/20210107-gcc-ffast-math/]


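So it means that, because of float rounding error, the result can differ depending on how the operations are grouped..

associative (giving the same result regardless of how the parts of the expression are grouped, as in (a × b) × c = a × (b × c))

[Link : https://en.dict.naver.com/#/entry/enko/43a6bbaaacf546199c5d4c57b6b88ebb]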
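To make the grouping effect concrete, here is a small sketch with two made-up helpers, sum_left and sum_right, that add the same three floats in the two possible groupings. Each intermediate is stored in a float on purpose so that it is rounded to single precision.

```c
/* (a + b) + c: the intermediate is rounded to float before c is added. */
float sum_left(float a, float b, float c)
{
    float t = a + b;
    return t + c;
}

/* a + (b + c): same operands, different grouping. */
float sum_right(float a, float b, float c)
{
    float t = b + c;
    return a + t;
}
```

With a = 1e20f, b = -1e20f and c = 1.0f, sum_left returns 1 while sum_right returns 0, because -1e20f + 1.0f rounds back to -1e20f. A compiler that is allowed to reassociate may turn one form into the other.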


So I tried applying it in place of -ffast-math, but GCC reports that it was ignored because other options take precedence.

Which option is the one taking precedence?

-o -W -Wall -fopt-info-vec -march=armv7-a -mfpu=neon -O3 -fassociative-math

cc1: warning: ‘-fassociative-math’ disabled; other options take precedence


It is milder than -ffast-math, but it is pointless if it does not get applied..

-fassociative-math
Allow re-association of operands in series of floating-point operations. This violates the ISO C and C++ language standard by possibly changing computation result. NOTE: re-ordering may change the sign of zero as well as ignore NaNs and inhibit or create underflow or overflow (and thus cannot be used on code that relies on rounding behavior like (x + 2**52) - 2**52. May also reorder floating-point comparisons and thus may not be used when ordered comparisons are required. This option requires that both -fno-signed-zeros and -fno-trapping-math be in effect. Moreover, it doesn’t make much sense with -frounding-math. For Fortran the option is automatically enabled when both -fno-signed-zeros and -fno-trapping-math are in effect.

The default is -fno-associative-math.

[Link : https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html]
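The quoted documentation probably answers the precedence question: -fassociative-math requires both -fno-signed-zeros and -fno-trapping-math, and neither was on the command line, so the defaults (signed zeros and trapping math respected) win. A reduction is the typical loop that needs reassociation, since vectorizing it means adding partial sums in a different order. A sketch under those assumptions; sum_array and the build line are illustrative, and on armv7 NEON this may still not vectorize, because the GCC ARM documentation ties NEON floating-point vectorization to -funsafe-math-optimizations:

```c
#include <stddef.h>

/*
 * Assumed build line (not tested on the original project):
 *   gcc -O3 -march=armv7-a -mfpu=neon -fopt-info-vec
 *       -fno-signed-zeros -fno-trapping-math -fassociative-math -c sum.c
 */
float sum_array(const float *a, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i];   /* a vectorized version adds partial sums in another order */
    return sum;
}
```

If the warning disappears with the two extra flags, they were the "other options" in the message.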

Posted by 구차니
Program usage/gcc 2023. 1. 26. 19:55

Required

-O3 -ftree-vectorize

(In GCC releases before 12, -O2 does not enable -ftree-vectorize by default.)

 

Optional(?)

-mfpu=neon -fopt-info-vec[-all]

 

Even without specifying neon, a few loops get converted on the cortex-a9 (I am not sure whether that goes through vfp),

and

with neon specified, many more do.

 

[Link : https://developer.arm.../Compiling-NEON-Instructions/Vectorization/Enabling-auto-vectorization-in-GCC-compiler]

Program usage/gcc 2022. 12. 6. 16:59

With the fno prefix this one looks somehow familiar:

-fno-stack-protector

 

I spotted a new variant, so I looked it up;

as the name says, it provides a feature that detects whether a stack overrun occurs.

[Link : https://m.blog.naver.com/neos_rtos/220688072708]

[Link : https://developer.arm.com/documentation/101754/0618/armclang-Reference/armclang-Command-line-Options/-fstack-protector---fstack-protector-all---fstack-protector-strong---fno-stack-protector]
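For reference, the GCC documentation describes -fstack-protector-strong as adding a guard value check to functions that have local array definitions or references to local frame addresses. The sketch below is a made-up function (bounded_label_length is not from any of the linked pages) of the kind that would get such a check; it is deliberately written so that it does not overflow.

```c
#include <stddef.h>
#include <string.h>

/*
 * Assumed build line: gcc -O2 -fstack-protector-strong -c label.c
 * The local array makes this function a candidate for the stack guard.
 */
size_t bounded_label_length(const char *src)
{
    char label[16];                           /* local array on the stack */

    strncpy(label, src, sizeof label - 1);    /* copy at most 15 characters */
    label[sizeof label - 1] = '\0';           /* always terminate */
    return strlen(label);
}
```

If the copy were unbounded and src were longer than the buffer, the guard value would be overwritten and, on glibc systems, the program would abort with a "stack smashing detected" message when the function returns.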

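Program usage/gcc 2022. 6. 2. 14:47

The messages below seem to come from loops that could not be converted to SIMD instructions;

these are the kinds of diagnostics that came up.

 

Nested loops, or a conditional inside the loop, seem not to be allowed: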

tt.c:180:3: note: ===== analyze_loop_nest =====
tt.c:180:3: note: === vect_analyze_loop_form ===
tt.c:180:3: note: not vectorized: control flow in loop.
tt.c:180:3: note: bad loop form.


tt.c:61:3: note: ===== analyze_loop_nest =====
tt.c:61:3: note: === vect_analyze_loop_form ===
tt.c:61:3: note: not vectorized: multiple nested loops.
tt.c:61:3: note: bad loop form.
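Two rewrites that are commonly tried for these two messages, as a sketch only. The function names are made up, I do not know what tt.c looks like at those lines, and whether GCC accepts the result depends on the version and on what else is in the loop body (break, early return and function calls are harder than a plain if/else).

```c
#include <stddef.h>

/* A conditional expression instead of if/else: a form the compiler can
 * usually turn into a select or min operation rather than a branch. */
void clamp_max(int *restrict out, const int *restrict in, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (in[i] > limit) ? limit : in[i];
}

/* A rows x cols buffer stored contiguously can be walked with one flat
 * loop instead of two nested ones. */
void scale_flat(float *restrict pix, size_t rows, size_t cols, float k)
{
    const size_t total = rows * cols;

    for (size_t i = 0; i < total; i++)
        pix[i] *= k;
}
```

The restrict qualifiers are part of the sketch: they promise that the buffers do not overlap, which removes one more reason for the vectorizer to give up.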

 

From here on are the ones where I cannot tell what the problem is..

Unsupported pattern

tt.c:83:7: note: Unsupported pattern.
tt.c:83:7: note: not vectorized: unsupported use in stmt.
tt.c:83:7: note: unexpected pattern.

 

Unsupported data type. Looking at the code, when the comparison in the for statement

reads its bound through a pointer (->), the vectorizer seems unable to track the type?

tt.c:107:5: note: not vectorized: unsupported data-type
tt.c:107:5: note: can't determine vectorization factor.

 

I don't know what "no grouped stores" means.

val = data[];

out = data / 255;

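The code can be simplified to something like this; maybe it is because, with arrays and pointers mixed, the array index cannot be analyzed as a linear access?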

tt.c:106:3: note: not vectorized: no grouped stores in basic block.
tt.c:106:3: note: ===vect_slp_analyze_bb===
tt.c:106:3: note: ===vect_slp_analyze_bb===
tt.c:108:32: note: === vect_analyze_data_refs ===
tt.c:108:32: note: not vectorized: not enough data-refs in basic block.

 

No idea about this one..

tt.c:228:3: note: not vectorized: data ref analysis failed _47 = *_46;
tt.c:228:3: note: bad data references.

 

No idea!!!

tt.c:238:5: note: not vectorized: not suitable for gather load _47 = *_46;
tt.c:238:5: note: bad data references.
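A guess at what the last two messages point to, hedged because I cannot see the source line: `_47 = *_46` is a load through a computed pointer, and a "gather load" is a vector load whose element addresses are not consecutive. An indexed read such as table[idx[i]] is one common way to end up there. The two made-up functions below contrast that with a contiguous read.

```c
#include <stddef.h>

/* Indexed (gather) read: each address depends on the value of idx[i],
 * so the elements are not consecutive in memory. */
void lookup_indirect(float *restrict out, const float *restrict table,
                     const int *restrict idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = table[idx[i]];
}

/* Contiguous read: consecutive iterations touch consecutive addresses,
 * which is the access pattern vector loads are designed for. */
void copy_contiguous(float *restrict out, const float *restrict table, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = table[i];
}
```

If the original loop has the first shape, reordering the data so that it can be read in the second shape is the usual, if not always practical, way out.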

 

 

Anyway, it does not even get converted to AVX .. so it is even less likely to be code that can be optimized with NEON.

Program usage/gcc 2022. 3. 17. 12:05

While reading the weston source I saw an odd(?) string array declaration, so I checked it.

static const char * const connector_type_names[] = {
[DRM_MODE_CONNECTOR_Unknown]     = "Unknown",
[DRM_MODE_CONNECTOR_VGA]         = "VGA",
[DRM_MODE_CONNECTOR_DVII]        = "DVI-I",
[DRM_MODE_CONNECTOR_DVID]        = "DVI-D",
[DRM_MODE_CONNECTOR_DVIA]        = "DVI-A",
[DRM_MODE_CONNECTOR_Composite]   = "Composite",
[DRM_MODE_CONNECTOR_SVIDEO]      = "SVIDEO",
[DRM_MODE_CONNECTOR_LVDS]        = "LVDS",
[DRM_MODE_CONNECTOR_Component]   = "Component",
[DRM_MODE_CONNECTOR_9PinDIN]     = "DIN",
[DRM_MODE_CONNECTOR_DisplayPort] = "DP",
[DRM_MODE_CONNECTOR_HDMIA]       = "HDMI-A",
[DRM_MODE_CONNECTOR_HDMIB]       = "HDMI-B",
[DRM_MODE_CONNECTOR_TV]          = "TV",
[DRM_MODE_CONNECTOR_eDP]         = "eDP",
[DRM_MODE_CONNECTOR_VIRTUAL]     = "Virtual",
[DRM_MODE_CONNECTOR_DSI]         = "DSI",
[DRM_MODE_CONNECTOR_DPI]         = "DPI",
};

 

I get the idea.. but where on earth is this syntax defined...

$ cat str.c
#include <stdio.h>

static const char * const connector_type_names[] = {
        [0]     = "Unknown",
        [1]         = "VGA",
        [2]        = "DVI-I",
        [3]        = "DVI-D",
        [4]        = "DVI-A",
        [5]   = "Composite",
        [6]      = "SVIDEO",
        [7]        = "LVDS",
        [8]   = "Component",
        [9]     = "DIN",
        [10] = "DP",
        [11]       = "HDMI-A",
        [12]       = "HDMI-B",
        [13]          = "TV",
        [14]         = "eDP",
        [15]     = "Virtual",
        [16]         = "DSI",
        [17]         = "DPI",
};

void main()
{
        for(int i = 0; i < 10; i++)
                printf("%s\n",connector_type_names[i]);
}

$ gcc str.c
$ ./a.out
Unknown
VGA
DVI-I
DVI-D
DVI-A
Composite
SVIDEO
LVDS
Component
DIN

 

 

$ cat str.c
#include <stdio.h>

static const char * const connector_type_names[] = {
        [2]     = "Unknown",
        [1]         = "VGA",
        [0]        = "DVI-I",
        [3]        = "DVI-D",
        [4]        = "DVI-A",
        [5]   = "Composite",
        [6]      = "SVIDEO",
        [7]        = "LVDS",
        [8]   = "Component",
        [9]     = "DIN",
        [10] = "DP",
        [11]       = "HDMI-A",
        [12]       = "HDMI-B",
        [13]          = "TV",
        [14]         = "eDP",
        [15]     = "Virtual",
        [16]         = "DSI",
        [17]         = "DPI",
};

void main()
{
        for(int i = 0; i < 10; i++)
                printf("%s\n",connector_type_names[i]);
}

$ gcc str.c
$ ./a.out
DVI-I
VGA
Unknown
DVI-D
DVI-A
Composite
SVIDEO
LVDS
Component
DIN

 

 

+

It appears to be supported in ISO C99, and in GNU C90 as an extension.

In ISO C99 you can give the elements in any order, specifying the array indices or structure field names they apply to, and GNU C allows this as an extension in C90 mode as well. This extension is not implemented in GNU C++.
To specify an array index, write ‘[index] =’ before the element value. For example,

int a[6] = { [4] = 29, [2] = 15 };

[Link : https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html]
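One thing worth keeping in mind with this syntax: any index that is not listed is zero-initialized, so a string table can contain NULL holes. A small sketch with a made-up enum (the real DRM_MODE_CONNECTOR_* values come from the DRM headers, these are not them) and a lookup helper that guards against both holes and out-of-range ids:

```c
#include <stddef.h>

/* Hypothetical ids for illustration only. */
enum connector { CONN_UNKNOWN, CONN_VGA, CONN_HDMIA = 4, CONN_COUNT };

static const char *const connector_names[CONN_COUNT] = {
    [CONN_UNKNOWN] = "Unknown",
    [CONN_HDMIA]   = "HDMI-A",   /* order in the initializer does not matter */
    [CONN_VGA]     = "VGA",
    /* indices 2 and 3 are not listed, so they are NULL */
};

/* Returns a printable name, or "?" for holes and out-of-range ids. */
const char *connector_name(int id)
{
    if (id < 0 || id >= CONN_COUNT || connector_names[id] == NULL)
        return "?";
    return connector_names[id];
}
```

Passing a NULL hole straight to printf("%s") would be undefined behavior, which is why the helper checks for it.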

 

$ gcc -std=c89 str.c
str.c: In function ‘main’:
str.c:26:2: error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
  for(int i = 0; i < 10; i++)
  ^~~
str.c:26:2: note: use option -std=c99, -std=gnu99, -std=c11 or -std=gnu11 to compile your code

$ gcc -std=c90 str.c
str.c: In function ‘main’:
str.c:26:2: error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
  for(int i = 0; i < 10; i++)
  ^~~
str.c:26:2: note: use option -std=c99, -std=gnu99, -std=c11 or -std=gnu11 to compile your code

$ gcc -std=c99 str.c

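Program usage/gcc 2022. 2. 7. 16:07

Linking everything statically and only some parts dynamically,

versus leaving the dynamic parts alone and statically linking the needed (remaining) ones: is it the same thing?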

 

Using -static-libgcc does not fix errors caused by a glibc version mismatch (it only links libgcc statically, not glibc)

[Link : https://stackoverflow.com/questions/26304531]

 

[Link : https://kldp.org/node/136157]

  [Link : https://enst.tistory.com/entry/liblibcso6-version-GLIBC27-not-found]

 

 

[Link : https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html]

  [Link : https://www.codeproject.com/Questions/1241890/How-to-link-to-libc-statically]

 

If -lm has a version problem, just link that library statically!

$ g++ -std=c++11 -o classify classify.cc -I/home/pi/work/coral/libedgetpu/tflite/public -I/home/pi/work/coral/tensorflow -I/home/pi/work/coral/tensorflow/tensorflow/lite/tools/make/downloads/flatbuffers/include -L/home/pi/work/coral/tensorflow/tensorflow/lite/tools/make/gen/rpi_armv7l/lib -L/home/pi/work/coral/pycoral/libedgetpu_bin/throttled/armv7a -ltensorflow-lite -static-libgcc -l:libedgetpu.so.1.0 -lpthread -ldl /usr/lib/arm-linux-gnueabihf/libm.a

[Link : http://www.iamroot.org/xe/index.php?mid=Programming&document_srl=13406]

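Program usage/gcc 2021. 11. 18. 12:46

Tried it in gcc and wow.. this works?

Are variable names and struct type names kept in separate name spaces?

I thought they would both be treated the same as identifiers, but apparently not?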

 

$ cat t.c
#include <stdio.h>

struct help { int a; int b; int c;};

struct help a;
int help = 1;

void main()
{
        a.a = 5;
        a.b = 7;
        a.c = 9;
        printf("%d %d %d %d\n",help, a.a, a.b, a.c);
}

 

$ gcc t.c
$ ./a.out
1 5 7 9

 

[Link : https://cpp.hotexamples.com/site/file?hash=0x3b8107de9c96fe7f1b348af2ed3b6ff15fb2cb740a65ea0a91d33bcab75b25ae&fullName=wheatley-master/wlegl_handle.c&project=jekstrand/wheatley]

 

However, function names and variable names apparently share the same identifier name space, since this produces an error.

$ cat t.c
#include <stdio.h>

void help()
{
        printf("%s\n",__func__);
}

struct help { int a; int b; int c;};

struct help a;
int help = 1;

void main()
{
        a.a = 5;
        a.b = 7;
        a.c = 9;
        printf("%d %d %d %d\n",help, a.a, a.b, a.c);
        help();
}

 

$ gcc t.c
t.c:11:5: error: ‘help’ redeclared as different kind of symbol
 int help = 1;
     ^~~~
t.c:3:6: note: previous definition of ‘help’ was here
 void help()
      ^~~~
t.c: In function ‘main’:
t.c:19:2: error: called object ‘help’ is not a function or function pointer
  help();
  ^~~~
t.c:11:5: note: declared here
 int help = 1;
     ^~~~
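This matches how the C standard splits identifiers into name spaces: labels, tags (struct, union, enum), the members of each struct or union, and ordinary identifiers. Variables and functions are both ordinary identifiers, which is why the second test fails while the first one works. A compact sketch with a made-up function that uses the same spelling in all four name spaces at once:

```c
struct help { int help; };              /* tag "help" and member "help" */

int name_spaces(void)
{
    struct help help = { .help = 3 };   /* ordinary identifier "help" */

    goto help;                          /* label "help" */
help:
    return help.help;                   /* object "help", member "help" */
}
```

The compiler can always tell them apart from the syntax: a tag follows struct, a member follows . or ->, and a label follows goto or precedes a colon.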

Program usage/gcc 2021. 7. 8. 10:40

Too lazy for anything fancy, so a quick-and-dirty test

$ cat 1.c
#include <stdio.h>

void main()
{
        unsigned char a = 0xFF;
        char b =a;
        short c = a;
        short d = (char)a;
        short e = (int)a;

        int f = a;
        int g = (int)a;
        int h = (int)(char)a;

        printf("a %d\n",a);
        printf("a %d\n",(int)a);
        printf("a %d\n",(char)a);
        printf("b %d\n",b);
        printf("c %d\n",c);
        printf("d %d\n",d);
        printf("e %d\n",e);
        printf("f %d\n",f);
        printf("g %d\n",g);
        printf("h %d\n",h);
}

 

Here are the compiler versions, the architectures, and the results... what is this..?!?!

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.3.0-17ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
$ arm-linux-gnueabihf-gcc -v
Using built-in specs.
COLLECT_GCC=arm-linux-gnueabihf-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc-cross/arm-linux-gnueabihf/7/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-multiarch --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --disable-werror --enable-multilib --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=arm-linux-gnueabihf --program-prefix=arm-linux-gnueabihf- --includedir=/usr/arm-linux-gnueabihf/include
Thread model: posix
gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)
$ ./1
a 255
a 255
a -1
b -1
c 255
d -1
e 255
f 255
g 255
h -1
# /mnt/1
a 255
a 255
a 255
b 255
c 255
d 255
e 255
f 255
g 255

 

Tried on a Raspberry Pi 4. Is this a characteristic of compilers for the arm architecture?

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/8/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 8.3.0-6' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --disable-libphobos --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
Thread model: posix
gcc version 8.3.0 (Debian 8.3.0-6)
$ ./1
a 255
a 255
a 255
b 255
c 255
d 255
e 255
f 255
g 255
h 255

 

+

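I don't know whether just one of the two is enough, but giving both, or only -fsigned-char, does make the result come out as -1.

Reading the original text, --signed-chars appears to be an option for the RVCT compiler.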

The ANSI C standard specifies a range for both signed (at least -127 to +127) and unsigned (at least 0 to 255) chars. Simple chars are not specifically defined and it is compiler dependent whether they are signed or unsigned. Although the ARM architecture has the LDRSB instruction, that loads a signed byte into a 32-bit register with sign extension, the earliest versions of the architecture did not. It made sense at the time for the compiler to treat simple chars as unsigned, whereas on the x86 simple chars are, by default, treated as signed.
One workaround for users of GCC is to use the -fsigned-char command line switch or --signed-chars for RVCT, that forces all chars to become signed, but a better practice is to write portable code by declaring char variables appropriately. Unsigned char must be used for accessing memory as a block of bytes or for small unsigned integers. Signed char must be used for small signed integers and simple char must be used only for ASCII characters and strings. In fact, on an ARM core, it is usually better to use ints rather than chars, even for small values, for performance reasons. You can read more on this in Optimizing Code to Run on ARM Processors.

[Link : https://developer.arm.com/.../Miscellaneous-C-porting-issues/unsigned-char-and-signed-char]

 

LDRSB (Thumb*) - Load Register Signed Byte

[Link : http://qcd.phys.cmu.edu/QCDcluster/intel/vtune/reference/LDRSB_(Thumb).htm]

 

LDRB - Load Register Byte

[Link : http://qcd.phys.cmu.edu/QCDcluster/intel/vtune/reference/INST_LDRB.htm]

 

char -> signed char: -fsigned-char == -fno-unsigned-char
char -> unsigned char: -funsigned-char == -fno-signed-char

[Link : https://jooojub.github.io/gcc-options-fsigned-char/]

 

-fsigned-char
Let the type char be signed, like signed char.
Note that this is equivalent to -fno-unsigned-char, which is the negative form of -funsigned-char. Likewise, the option -fno-signed-char is equivalent to -funsigned-char.

-funsigned-char
Let the type char be unsigned, like unsigned char.
Each kind of machine has a default for what char should be. It is either like unsigned char by default or like signed char by default.
Ideally, a portable program should always use signed char or unsigned char when it depends on the signedness of an object. But many programs have been written to use plain char and expect it to be signed, or expect it to be unsigned, depending on the machines they were written for. This option, and its inverse, let you make such a program work with the opposite default.
The type char is always a distinct type from each of signed char or unsigned char, even though its behavior is always just like one of those two.

[Link : https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html]

 

+

2021.07.09

Huh? Using signed char does work. Wasn't char signed?!
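In line with the advice quoted above to declare the signedness explicitly, here is a sketch that does not depend on the plain-char default or on -fsigned-char. The helper names are made up. CHAR_MIN from <limits.h> is 0 when plain char is unsigned, so it can be used to check what a given compiler does.

```c
#include <limits.h>

/* Nonzero when plain char is signed on this target. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Sign-extend a byte to int regardless of the plain-char default.
 * Converting 0xFF to signed char is implementation-defined in C;
 * GCC documents it as wrapping, which gives -1. */
int widen_signed(unsigned char byte)
{
    return (signed char)byte;
}

/* Zero-extend a byte to int. */
int widen_unsigned(unsigned char byte)
{
    return byte;
}
```

With these, widen_signed(0xFF) gives -1 and widen_unsigned(0xFF) gives 255 on both the x86_64 and the arm compilers shown above, while plain_char_is_signed() is expected to differ between them.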

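Program usage/gcc 2021. 6. 30. 11:43

Apparently -O3 now enables -ftree-vectorize automatically.

Anyway, since I only did the computation and never printed the result, the compiler treated it as unused code and removed it, so no vadd showed up and I was stuck for quite a while..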

 

$ g++ -O3 -mavx autovector.cpp -fopt-info-vec-all
autovector.cpp:22:22: missed: couldn't vectorize loop
autovector.cpp:25:19: missed: not vectorized: complicated access pattern.
autovector.cpp:23:21: missed: couldn't vectorize loop
autovector.cpp:25:14: missed: not vectorized: complicated access pattern.
autovector.cpp:16:23: optimized: loop vectorized using 32 byte vectors
autovector.cpp:10:5: note: vectorized 1 loops in function.
autovector.cpp:15:43: missed: statement clobbers memory: now = std::chrono::_V2::system_clock::now ();
autovector.cpp:27:77: missed: statement clobbers memory: D.189348 = std::chrono::_V2::system_clock::now ();
autovector.cpp:28:2: missed: statement clobbers memory: __assert_fail ("result[2] == ( 2.0f + 0.1335f)+( 1.50f*2.0f + 0.9383f)-(0.33f*2.0f+0.1172f)+3*(float)(noTests-1)", "autovector.cpp", 28, "int main()");
/usr/include/c++/9/ostream:570:18: missed: statement clobbers memory: std::__ostream_insert<char, std::char_traits<char> > (&cout, "CG> message -channel \"exercise results\" Time used: ", 51);
/usr/include/c++/9/ostream:221:29: missed: statement clobbers memory: _46 = std::basic_ostream<char>::_M_insert<double> (&cout, _42);
/usr/include/c++/9/ostream:570:18: missed: statement clobbers memory: std::__ostream_insert<char, std::char_traits<char> > (_46, "s, N * noTests=", 15);
autovector.cpp:29:112: missed: statement clobbers memory: _35 = std::basic_ostream<char>::operator<< (_46, 2000000000);
/usr/include/c++/9/ostream:113:13: missed: statement clobbers memory: std::endl<char, std::char_traits<char> > (_35);
/usr/include/c++/9/iostream:74:25: missed: statement clobbers memory: std::ios_base::Init::Init (&__ioinit);
/usr/include/c++/9/iostream:74:25: missed: statement clobbers memory: __cxa_atexit (__dt_comp , &__ioinit, &__dso_handle);
$ gcc -mcpu=native -march=native -Q --help=target
The following options are target specific:
  -mabi=                                aapcs-linux
  -mabort-on-noreturn                   [disabled]
  -mandroid                             [disabled]
  -mapcs                                [disabled]
  -mapcs-frame                          [disabled]
  -mapcs-reentrant                      [disabled]
  -mapcs-stack-check                    [disabled]
  -march=                               armv7ve+vfpv3-d16
  -marm                                 [enabled]
  -masm-syntax-unified                  [disabled]
  -mbe32                                [enabled]
  -mbe8                                 [disabled]
  -mbig-endian                          [disabled]
  -mbionic                              [disabled]
  -mbranch-cost=                        -1
  -mcallee-super-interworking           [disabled]
  -mcaller-super-interworking           [disabled]
  -mcmse                                [disabled]
  -mcpu=                                cortex-a7
  -mfix-cortex-m3-ldrd                  [disabled]
  -mflip-thumb                          [disabled]
  -mfloat-abi=                          hard
  -mfp16-format=                        none
  -mfpu=                                vfp
  -mglibc                               [enabled]
  -mhard-float
  -mlittle-endian                       [enabled]
  -mlong-calls                          [disabled]
  -mmusl                                [disabled]
  -mneon-for-64bits                     [disabled]
  -mpic-data-is-text-relative           [enabled]
  -mpic-register=
  -mpoke-function-name                  [disabled]
  -mprint-tune-info                     [disabled]
  -mpure-code                           [disabled]
  -mrestrict-it                         [disabled]
  -msched-prolog                        [enabled]
  -msingle-pic-base                     [disabled]
  -mslow-flash-data                     [disabled]
  -msoft-float
  -mstructure-size-boundary=            8
  -mthumb                               [disabled]
  -mthumb-interwork                     [disabled]
  -mtls-dialect=                        gnu
  -mtp=                                 cp15
  -mtpcs-frame                          [disabled]
  -mtpcs-leaf-frame                     [disabled]
  -mtune=
  -muclibc                              [disabled]
  -munaligned-access                    [enabled]
  -mvectorize-with-neon-double          [disabled]
  -mvectorize-with-neon-quad            [enabled]
  -mword-relocations                    [disabled]

  Known ARM ABIs (for use with the -mabi= option):
    aapcs aapcs-linux apcs-gnu atpcs iwmmxt

  Known __fp16 formats (for use with the -mfp16-format= option):
    alternative ieee none

  Known ARM FPUs (for use with the -mfpu= option):
    auto crypto-neon-fp-armv8 fp-armv8 fpv4-sp-d16 fpv5-d16 fpv5-sp-d16 neon neon-fp-armv8 neon-fp16 neon-vfpv3 neon-vfpv4 vfp vfp3 vfpv2 vfpv3 vfpv3-d16 vfpv3-d16-fp16 vfpv3-fp16 vfpv3xd
    vfpv3xd-fp16 vfpv4 vfpv4-d16

  Valid arguments to -mtp=:
    auto cp15 soft

  Known floating-point ABIs (for use with the -mfloat-abi= option):
    hard soft softfp

  TLS dialect to use:
    gnu gnu2


[Link : https://www.raspberrypi.org/forums/viewtopic.php?t=155461]

[Link : https://www.codingame.com/playgrounds/283/sse-avx-vectorization/autovectorization]

 

+

$ cat neon.c
#include <stdio.h>

void main()
{
        int a[256];
        int b[256];
        int c[256];

        int i;
        for(i = 0; i < 256; i++)
        {
                a[i] = b[i] + c[i];
        }

        printf("%d %d %d\n", a[0], b[0], c[0]);
}

 

$ gcc -O3 neon.c -mfpu=neon
$ objdump -d a.out  | grep v
   10320:       e1a01000        mov     r1, r0
   10328:       e1a0300d        mov     r3, sp
   1032c:       f4610add        vld1.64 {d16-d17}, [r1 :64]!
   10330:       f4622add        vld1.64 {d18-d19}, [r2 :64]!
   10334:       f26008e2        vadd.i32        q8, q8, q9
   10338:       f4430add        vst1.64 {d16-d17}, [r3 :64]!
   10360:       e3a0b000        mov     fp, #0
   10364:       e3a0e000        mov     lr, #0
   1036c:       e1a0200d        mov     r2, sp
   1043c:       e3a03001        mov     r3, #1
   10454:       e1a07000        mov     r7, r0
   1046c:       e1a08001        mov     r8, r1
   10470:       e1a09002        mov     r9, r2
   10480:       e3a04000        mov     r4, #0
   1048c:       e1a02009        mov     r2, r9
   10490:       e1a01008        mov     r1, r8
   10494:       e1a00007        mov     r0, r7

 

$ gcc neon.c -mfpu=neon
$ objdump -d a.out  | grep v
   10318:       e3a0b000        mov     fp, #0
   1031c:       e3a0e000        mov     lr, #0
   10324:       e1a0200d        mov     r2, sp
   103f4:       e3a03001        mov     r3, #1
   10418:       e3a03000        mov     r3, #0
   10490:       e1a00000        nop                     ; (mov r0, r0)
   104a4:       e1a07000        mov     r7, r0
   104bc:       e1a08001        mov     r8, r1
   104c0:       e1a09002        mov     r9, r2
   104d0:       e3a04000        mov     r4, #0
   104dc:       e1a02009        mov     r2, r9
   104e0:       e1a01008        mov     r1, r8
   104e4:       e1a00007        mov     r0, r7

 

Added -fopt-info-vec-all. Probably because of -all, the output is enormous.

With just -fopt-info-vec it cleanly reports that the loop was vectorized.

$ gcc neon.c -mfpu=neon -fopt-info-vec -O3
neon.c:10:2: note: loop vectorized
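A variant of the test that avoids both pitfalls from this post, as a sketch: neon.c reads b and c without initializing them, and a loop whose result is never used can be removed entirely. Putting the loop in a function that writes through a pointer keeps the result observable, and the caller supplies defined inputs. The name add_arrays and the build line are my assumptions.

```c
/*
 * Assumed build line: gcc -O3 -mfpu=neon -fopt-info-vec -c add_arrays.c
 * The result escapes through dst, so the loop cannot be dropped as dead
 * code, and restrict tells the compiler the arrays do not overlap.
 */
void add_arrays(int *restrict dst, const int *restrict a,
                const int *restrict b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Disassembling the object file with objdump -d and looking for vadd.i32, as done above, should work the same way on this function.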

Program usage/gcc 2021. 6. 22. 19:15

I seem to remember that casting an unsigned char to int used to work without any problem, but now it does not,

so I am looking into whether this is an effect of C99 or C11.

 

[Link : https://gcc.gnu.org/onlinedocs/gcc/Characters-implementation.html#Characters-implementationl]

[Link : https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html]

[Link : https://gcc.gnu.org/wiki/NewWconversion]

 
