엥... 

라즈베리 파이 3B 64bit OS 에서는 잘 실행되는데

pi@raspberrypi:~ $ mpirun
--------------------------------------------------------------------------
mpirun could not find anything to do.

It is possible that you forgot to specify how many processes to run
via the "-np" argument.
--------------------------------------------------------------------------

pi@raspberrypi:~ $ mpirun --version
mpirun (Open MPI) 4.1.0

Report bugs to http://www.open-mpi.org/community/help/

 

odroid c2 에서는(aarch64) 에러가 난다

$ mpirun
Illegal instruction

$ mpirun --help
Illegal instruction

$ mpirun --version
mpirun (Open MPI) 4.0.3

Report bugs to http://www.open-mpi.org/community/help/

 

strace라던가 써서 추적해보려는데 패키지가 되먹질 않으니 짜증나네 -_ㅠ

 

odroid c2

geteuid()                               = 1000
openat(AT_FDCWD, "/proc/cpuinfo", O_RDONLY) = 3
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0x7fa18b7cfc} ---
+++ killed by SIGILL +++
Illegal instruction

 

라즈베리 파이

geteuid()                               = 1000
openat(AT_FDCWD, "/proc/cpuinfo", O_RDONLY) = 3
close(3)                                = 0
getuid()                                = 1000
geteuid()                               = 1000

 

도대체 고작 /proc/cpuinfo 여는걸로 죽다니 머지?

 

+

gdb로 해보니 뜬금없이 libopen-pal.so.40 에서 죽는다. 도대체 PAL 이 멀하는데 illegal instruction 을 띄울 작업을 하는거지?

$ gdb mpirun
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from mpirun...
(No debugging symbols found in mpirun)
(gdb) r
Starting program: /usr/bin/mpirun 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0x0000007fb7ec5cfc in ?? () from /usr/lib/aarch64-linux-gnu/libopen-pal.so.40

 

눈이 이상해졌나 open-pam 인 줄.. -_-

아무튼 libopen-pal 혹은 OPAL  이라는 녀석이 현재 문제인데

openmpi는 liboshmem(shared memory 관리?) libmpi 그리고 libopen-pal로 구성된다고 한다.

That is, they are compiled into separate libraries: liboshmem, libmpi, libopen-pal with a strict dependency order: OSHMEM depends on OMPI, OMPI depends on OPAL.

[링크:  https://docs.open-mpi.org/en/v5.0.x/developers/terminology.html]

 

open PAL은 체크포인트 와 프로그램 재시작을 할 수 있도록 해주는 녀석이니..

프로세서에 대해서 잘 알아야 프로그램을 멈추고 실행하게 할테니..

반대로 생각하면 odroid-c2에서 적용된 aarch64 버전이 amlogic 기반의 ap와는 맞지 않게 빌드된걸려나?

Open PAL can involuntarily checkpoint and restart sequential programs. Doing so requires that Open PAL was compiled with thread support and that the back-end checkpointing systems are available at run-time.

[링크 : https://www.open-mpi.org/doc/v4.1/man7/opal_crs.7.php]

 

+

odroid c2 에서 make -j4 로 openmpi-4.1.0 빌드에 걸린시간

중간에 gdb 설치한다고 순수하게 돌리면 24분쯤으로 되지 않았을까?

real 25m20.199s
user 52m9.680s
sys 5m27.920s

 

 

+

빡세게(?) 빌드까지 해서 겨우겨우 돌렸는데 왜 저런 에러가 또 나냐고 ㅠㅠ

Program received signal SIGILL, Illegal instruction.
0x0000007fb7ec236c in opal_timer_linux_find_freq ()
   from /usr/local/lib/libopen-pal.so.40

 

에러를 보니 먼가 함수명이 나와서 추적

$ grep -rn "opal_timer_linux_find_freq" .
Binary file ./opal/mca/timer/linux/.libs/timer_linux_component.o matches
Binary file ./opal/mca/timer/linux/.libs/libmca_timer_linux.a matches
./opal/mca/timer/linux/timer_linux_component.c:105:static int opal_timer_linux_find_freq(void)
./opal/mca/timer/linux/timer_linux_component.c:203:    ret = opal_timer_linux_find_freq();

$ cat opal/mca/timer/linux/timer_linux_component.c
static int opal_timer_linux_find_freq(void)
{
    FILE *fp;
    char *loc;
    float cpu_f;
    int ret;
    char buf[1024];

    fp = fopen("/proc/cpuinfo", "r");
    if (NULL == fp) {
        return OPAL_ERR_IN_ERRNO;
    }

    opal_timer_linux_freq = 0;

#if OPAL_ASSEMBLY_ARCH == OPAL_ARM64
        opal_timer_linux_freq = opal_sys_timer_freq();
#endif


$ grep -rni "opal_sys_timer_freq" .
./mca/timer/linux/timer_linux_component.c:121: opal_timer_linux_freq = opal_sys_timer_freq();
./include/opal/sys/arm64/timer.h:36:opal_sys_timer_freq(void)

$ cat openmpi-4.1.0/opal/include/opal/sys/arm64/timer.h
static inline opal_timer_t
opal_sys_timer_freq(void)
{
    opal_timer_t freq;
    __asm__ __volatile__ ("mrs %0,  CNTFRQ_EL0" : "=r" (freq));
    return (opal_timer_t)(freq);
}

 

레지스터 이름부터가 무시무시하네?

MRS
Move System Register.

Syntax
MRS Xt, (systemreg|Sop0_op1_Cn_Cm_op2)

Where:

Xt
Is the 64-bit name of the general-purpose destination register.
systemreg
Is a System register name.

The System register names are defined in 'AArch64 System Registers' in the System Register XML.

op0
Is an unsigned immediate, and can be either 2 or 3.
op1
Is a 3-bit unsigned immediate, in the range 0 to 7.
Cn
Is a name Cn, with n in the range 0 to 15.
Cm
Is a name Cm, with m in the range 0 to 15.
op2
Is a 3-bit unsigned immediate, in the range 0 to 7.
Usage
Move System Register allows the PE to read an AArch64 System register into a general-purpose register.

[링크 : https://developer.arm.com/documentation/dui0801/h/A64-General-Instructions/MRS]

 

Accessing CNTFRQ_EL0
Accesses to this register use the following encodings in the System register encoding space:

MRS <Xt>, CNTFRQ_EL0
op0

[링크 : https://developer.arm.com/documentation/ddi0601/2022-03/AArch64-Registers/CNTFRQ-EL0--Counter-timer-Frequency-register]

 

The two instructions you show above are therefore -

MSR HPFAR_EL2, X0

MSR PSTATEField_SP, #0

[링크 : https://reverseengineering.stackexchange.com/questions/14617/arm-understanding-msr-mrs-instructions]

 

보면.. uboot에도 있는 아주 평범한(?) 녀석인데.. 머지?

unsigned long timer_read_counter(void)
{
unsigned long cntpct;
unsigned long temp;

isb();
asm volatile("mrs %0, cntpct_el0" : "=r" (cntpct));
asm volatile("mrs %0, cntpct_el0" : "=r" (temp));
while (temp != cntpct) {
asm volatile("mrs %0, cntpct_el0" : "=r" (cntpct));
asm volatile("mrs %0, cntpct_el0" : "=r" (temp));
}

return cntpct;
}

[링크 : https://github.com/qemu/u-boot/blob/master/arch/arm/cpu/armv8/generic_timer.c]

 

%0은 이전의 dst 를 의미하는걸려나?

int val = 50;
__asm volatile ("MOV R0, %0":  : "r"(var) );

==> (컴파일러 해석) 

ldr r3, [r7, #4]

mov r0, r3

[링크 : https://dhpark1212.tistory.com/entry/ARM-GCC-Inline-assembly-coding]

 

amlogic 의 Cortex-A53인데 많이 다른가?

ODROID-C2 ODROID-C1+ RPi 3 Model B
CPU
Amlogic S905 SoC
4 x ARM Cortex-A53 1.5GHz
64bit ARMv8 Architecture @28nm
Amlogic S805 SoC
4 x ARM Cortex-A5 1.5GHz
32bit ARMv7 Architecture @28nm
Broadcom BCM2837
4 x ARM Cortex-A53 1.2Ghz
64bit ARMv7 Architecture @40nm

[링크 : https://www.hardkernel.com/ko/shop/odroid-c2/]

'프로그램 사용 > openFOAM' 카테고리의 다른 글

openFOAM + freecad + salome  (0) 2023.06.07
openFOAM tutorial with youtube  (0) 2023.05.24
openfoam on ubuntu  (0) 2023.05.24
openFOAM tutorial  (2) 2023.05.24
openfoam7 on ubuntu 18.04  (0) 2020.08.09
Posted by 구차니