45 posts in the '프로그램 사용/openHPC' category

  1. 2021.01.07 MPI/PMI
  2. 2021.01.06 warewulf - wwsh commands
  3. 2021.01.05 hpl/linpack openmpi slurm
  4. 2021.01.05 getvnfs extracting failed on openSUSE ㅠㅠ
  5. 2021.01.04 installing openHPC on openSUSE part 1
  6. 2021.01.04 slurm, pbs pro, torque/maui
  7. 2020.12.28 slurm gpu
  8. 2020.12.28 xcat does not support arm
  9. 2020.12.28 xcat stateful, stateless
  10. 2020.12.23 slurm.conf and cpu cores (1)

Is it a palindrome?

Searching for this really turns up nothing ㅠㅠ

 

[Link : https://slurm.schedmd.com/mpi_guide.html]

 

Process Management Interface Exascale (PMIx) 

[Link : https://openpmix.github.io/]

 

These process managers communicate with MPICH processes using a predefined interface called PMI (process management interface).

[Link : https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions]

 

+

[Link : https://slurm.schedmd.com/mpiplugins.html]
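For a quick check on the Slurm side, srun's --mpi option shows which PMI flavors a given build supports. A sketch only: it needs an actual Slurm cluster, pmix appears only if Slurm was compiled against PMIx, and ./a.out is a placeholder MPI program.

```shell
srun --mpi=list                    # list the MPI/PMI plugin types this Slurm build supports
srun --mpi=pmix -N 2 -n 4 ./a.out  # launch an MPI job via PMIx (./a.out is a placeholder)
```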

Posted by 구차니

I only followed the openhpc docs step by step,

so I have no idea how any of it actually works or is managed; time to dig into the structure ㅠㅠ

 

[Link : https://github.com/nicksan2c/wwsh]

[Link : https://dokuwiki.wesleyan.edu/doku.php?id=cluster:144]

 

+

> file /usr/bin/ww*
/usr/bin/wwan:        symbolic link to bluetooth
/usr/bin/wwbootstrap: Perl script text executable
/usr/bin/wwconfig:    Perl script text executable
/usr/bin/wwinit:      Bourne-Again shell script, ASCII text executable
/usr/bin/wwlivesync:  Perl script text executable
/usr/bin/wwmkchroot:  Bourne-Again shell script, ASCII text executable
/usr/bin/wwnodescan:  Perl script text executable
/usr/bin/wwsh:        Perl script text executable
/usr/bin/wwuseradd:   POSIX shell script, ASCII text executable
/usr/bin/wwvnfs:      Perl script text executable

 

wwsh list commands

> wwsh bootstrap list
BOOTSTRAP NAME            SIZE (M)      ARCH
5.3.18-lp152.57-default   58.0          x86_64

> wwsh ipmi list
NAME                AUTO    IPMI_IPADDR    IPMI_NETMASK
================================================================================
openhpc-1           no      UNDEF          UNDEF
openhpc-2           no      UNDEF          UNDEF

> wwsh node list
NAME                GROUPS              IPADDR              HWADDR
================================================================================
openhpc-1           UNDEF               10.0.2.4            08:00:27:4b:df:4f
openhpc-2           UNDEF               10.0.2.5            08:00:27:f9:ca:3f

> wwsh object list
NAME                       _TYPE
======================================================
passwd                     file
group                      file
shadow                     file
munge.key                  file
5.3.18-lp152.57-default    bootstrap
leap15.2                   vnfs
network                    file
openhpc-1                  node
dynamic_hosts              file
openhpc-2                  node

> wwsh provision list
NODE                VNFS            BOOTSTRAP             FILES
================================================================================
openhpc-1           leap15.2        5.3.18-lp152.57-de... dynamic_hosts,grou...
openhpc-2           leap15.2        5.3.18-lp152.57-de... dynamic_hosts,grou...

> wwsh vnfs list
VNFS NAME            SIZE (M)   ARCH       CHROOT LOCATION
leap15.2             480.1      x86_64     /opt/ohpc/admin/images/leap15.2

 

Help output

> wwsh help
Warewulf command line shell interface

Welcome to the Warewulf shell interface. This application allows you
to interact with the Warewulf backend database and modules via a
single interface.

  bootstrap        Manage your bootstrap images
  dhcp             Manage DHCP service and configuration
  events           Control how events are handled
  exit             Exit/leave the Warewulf shell
  file             Manage files within the Warewulf data store
  ipmi             Node IPMI configuration
  node             Node manipulation commands
  object           Generically manipulate all Warewulf data store entries
  output           Set the output verbosity level
  provision        Node provision manipulation commands
  pxe              Manage PXE configuration
  quit             Exit/leave the Warewulf shell
  ssh              Spawn parallel ssh connections to nodes.
  vnfs             Manage your VNFS images
USAGE:
     bootstrap <command> [options] [targets]

SUMMARY:
     This interface allows you to manage your bootstrap images within the Warewulf
     data store.

COMMANDS:

         import          Import a bootstrap image into Warewulf
         export          Export a bootstrap image to the local file system
         delete          Delete a bootstrap image from Warewulf
         list            Show all of the currently imported bootstrap images
         set             Set bootstrap attributes
         (re)build       Build (or rebuild) the tftp bootable image(s) on this host
         help            Show usage information
USAGE:
     dhcp <command>

SUMMARY:
        The DHCP command configures/reconfigures the DHCP service.

COMMANDS:

         update          Update the DHCP configuration, and restart the service
         restart         Restart the DHCP service
         help            Show usage information
USAGE:
     events [command]

SUMMARY:
     Control how/if events are handled.

COMMANDS:

     enable          Enable all events for this shell (default)
     disable         Disable the event handler
     help            Show usage information
USAGE:
     file <command> [options] [targets]

SUMMARY:
     The file command is used for manipulating file objects.  It allows you to
     import, export, create, and modify files within the Warewulf data store.
     File objects may be used to supply files to nodes at provision time,
     dynamically create files or scripts based on Warewulf data and more.

COMMANDS:
     import             Import a file into a file object
     export             Export file object(s)
     edit               Edit the file in the data store directly
     new                Create a new file in the data store
     set                Set file attributes/metadata
     show               Show the contents of a file
     list               List a summary of imported file(s)
     print              Print all file attributes
     (re)sync           Sync the data of a file object with its source(s)
     delete             Remove a node configuration from the data store
     help               Show usage information
USAGE:
     ipmi <command> [options] [targets]

SUMMARY:
    The ipmi command is used for setting node ipmi configuration attributes.

COMMANDS:

         set             Modify an existing node configuration
         list            List a summary of the node(s) ipmi configuration
         print           Print the full node(s) ipmi configuration
         poweron         Power on the list of nodes
         poweroff        Power off the list of nodes
         powercycle      Power cycle the list of nodes
         powerstatus     Print the power status of the nodes
         ident           Set chassis identify light to on for the nodes
         noident         Set chassis identify light to off for the nodes
         printsel        Print system event log for the nodes
         clearsel        Clear system event log for the nodes
         printsdr        Print sensor data records for the nodes
         console         Start a serial-over-lan console session.
                         NOTE: Requires that a serial console be defined
                         in kernel arguments, i.e. console=ttyS0,57600

         forcepxe        Force next boot from PXE
         forcedisk       Force next boot from first Hard Drive
         forcecdrom      Force next boot from CD-ROM
         forcebios       Force next boot into BIOS

         help            Show usage information
USAGE:
     node <command> [options] [targets]

SUMMARY:
     The node command is used for viewing and manipulating node objects.

COMMANDS:
         new             Create a new node configuration
         set             Modify an existing node configuration
         list            List a summary of nodes
         print           Print the node configuration
         delete          Remove a node configuration from the data store
         clone           Clone a node configuration to another node
         help            Show usage information
USAGE:
     object <command> [options] [targets]

SUMMARY:
     The object command provides an interface for generically manipulating all
     object types within the Warewulf data store.

COMMANDS:
     modify          Add, delete, and/or set object member variables
     print           Display object(s) and their members
     delete          Completely remove object(s) from the data store
     dump            Recursively dump objects in internal format
     jsondump        Recursively dump objects in json format
     canonicalize    Check and update objects to current standard format
     help            Show usage information
USAGE:
     output [command]

SUMMARY:
    This command sets the desired command output verbosity level.

COMMANDS:

         normal          The standard (and default) output level intended for
                         normal usage
         quiet           Only print warning, error, or critical messages.
         verbose         Increase verbosity over the normal output level
         debug           Show debugging messages (very verbose)
USAGE:
     provision <command> [options] [targets]

SUMMARY:
    The provision command is used for setting node provisioning attributes.

COMMANDS:

         set             Modify an existing node configuration
         list            List a summary of the node(s) provision configuration
         print           Print the full node(s) provision configuration
         help            Show usage information
USAGE:
     pxe <command> [options] [targets]

SUMMARY:
        Manage PXE configuration.

COMMANDS:

         update          Update the PXE configuration
         help            Show usage information
USAGE:
     ssh [nodes/targets] [command]

SUMMARY:
     Run ssh connections to node(s) in parallel by either node names, group
     or any other known lookup.
USAGE:
     vnfs <command> [options] [targets]

SUMMARY:
     This interface allows you to manage your VNFS images within the Warewulf
     data store.

COMMANDS:

         import          Import a VNFS image into Warewulf
         export          Export a VNFS image to the local file system
         delete          Delete a VNFS image from Warewulf
         list            Show all of the currently imported VNFS images
         set             Set any VNFS attributes
         help            Show usage information
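As a quick taste of the subcommands above, a sketch against the openhpc-* nodes from the earlier listings (requires a running Warewulf master):

```shell
wwsh ssh 'openhpc-*' uptime   # run a command on all matching nodes in parallel
wwsh node print openhpc-1     # dump one node's full configuration
```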

 

 

 


mpirun

 

 

[Link : http://www.brightcomputing.com/Blog/bid/151678/How-to-run-Linpack-across-a-Bright-Linux-cluster]

[Link : http://www.open-mpi.org/faq/?category=slurm]

 

+ 2021.01.07

[Link : https://juser.fz-juelich.de/record/851266/files/Parallel Programming (MPI) and Batch Usage (SLURM).pdf]

[Link : https://thelinuxcluster.com/.../running-linpack-hpl-test-on-linux-cluster-with-openmpi-and-intel-compilers/]

[Link : https://ulhpc-tutorials.readthedocs.io/en/latest/parallel/mpi/HPL/]

[Link : https://stackoverflow.com/questions/13999415/how-do-you-specify-nodes-on-mpiruns-command-line]

[Link : https://www.open-mpi.org/faq/?category=running#mpirun-scheduling]

 

Listing hosts on the mpirun command line

% mpirun -host node1,node1,node2,node2 ...

Specifying a hostfile to mpirun

% cat my_hosts
a slots=2 max_slots=20
b slots=2 max_slots=20
c slots=2 max_slots=20
d slots=2 max_slots=20

Suppose you issue the following command to run program a.out:


% mpirun -np 1 --hostfile my_hosts --host c a.out

[Link : https://docs.oracle.com/cd/E19923-01/820-6793-10/ExecutingPrograms.html]
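The hostfile format quoted above is easy to generate for larger clusters; a minimal sketch (hostnames a to d and the slot counts are just the placeholders from the Oracle example):

```shell
# Generate an Open MPI hostfile: one line per host with slot limits
for h in a b c d; do
  printf '%s slots=2 max_slots=20\n' "$h"
done > my_hosts

cat my_hosts
```

On a real cluster it would then be passed as e.g. `mpirun -np 8 --hostfile my_hosts ./a.out`.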


Kernel panic during boot; looking into it,

it seems the VNFS is not being attached properly..

 

 

After much trial and error I at least found where that checksum value comes from..

though I still have no idea how the VNFS is actually put together..

# wwsh object dump
Object #5:  OBJECT REF Warewulf::Vnfs=HASH(0x55f062096fd0) {
    "ARCH" (4) => "x86_64" (6)
    "CHECKSUM" (8) => "9f8b23f061ae257ba752e3861b3c4a08" (32)
    "CHROOT" (6) => "/opt/ohpc/admin/images/leap15.2" (31)
    "NAME" (4) => "leap15.2" (8)
    "SIZE" (4) => 503415145
    "_ID" (3) => 6
    "_TIMESTAMP" (10) => 1609744655
    "_TYPE" (5) => "vnfs" (4)
}

[Link : https://groups.io/g/OpenHPC-users/topic/stateful_provisioning_issues/7717941?p=]

 

# md5sum /srv/warewulf/bootstrap/x86_64/5/initfs.gz
94a63f3001bf9f738bd716a5ab71d61f  /srv/warewulf/bootstrap/x86_64/5/initfs.gz

Is this it...? The error came up during extracting, or am I pointing at the wrong place entirely?

 

+

2021.01.07

 

Come to think of it, why is the ID 6?

The section code that's sitting at is a 'wait "${EXTRACT_PID}"'.
EXTRACT_PID comes from this command:

  gunzip < /tmp/vnfs-download | bsdtar -pxf - 2>/dev/null &

So it seems that is failing somewhere. Do any other VNFS images work?

If you want to go in... you can extract the transport-http capability,
and edit the wwgetvfs script that's contained within. Remove that
"2>/dev/null" from that command, rebuild the capability file, and then
rebuild the bootstrap. That would hopefully point you to what's
actually throwing an error.

The ID always being 5 is correct. If it changed I would be worried.
That should be the Database ID of the VNFS. 

 

+

On a working openhpc/centos setup the ID is also 6, yet extracting completes normally.

 

The checksum is identical to the one shown here too; is SKIPPED not a problem.. or is SKIP the default?

Object #5:  OBJECT REF Warewulf::Vnfs=HASH(0x55e09d574470) {
    "ARCH" (4) => "x86_64" (6)
    "CHECKSUM" (8) => "3440bd638263bfb2e91e5f19b9afb51b" (32)
    "CHROOT" (6) => "/opt/ohpc/admin/images/centos8.2" (32)
    "NAME" (4) => "centos8.2" (9)
    "SIZE" (4) => 160341375
    "_ID" (3) => 6
    "_TIMESTAMP" (10) => 1608705691
    "_TYPE" (5) => "vnfs" (4)
}


As always!

Set the environment variables in .bashrc.

It is convenient to put them in /root's .bashrc (so they are still there after a later "sudo su -").

openSUSE brings the interface up as eth1?

export CHROOT=/opt/ohpc/admin/images/leap15.2
export eth_provision=eth1
export num_computes=2
export c_ip=("10.0.2.4" "10.0.2.5")
export c_mac=("08:00:27:4b:df:4f" "08:00:27:f9:ca:3f")
export c_name=("openhpc-1" "openhpc-2")
export compute_regex="openhpc-*"

export ntp_server=10.0.2.15
export sms_name=master
export sms_ip=10.0.2.15
export sms_eth_internal=eth1
export internal_netmask=255.0.0.0

2020/12/02 - [프로그램 사용/slurm] - openHPC install part 1?

 

+ DHCP hands out an IP but the TFTP file download fails; digging in,

it turns out that in SUSE Leap 15 the firewall service was renamed from SuSEfirewall2 to firewalld.

echo ${sms_ip} ${sms_name} >> /etc/hosts
systemctl disable firewalld
systemctl stop firewalld

 

 

Add the ohpc repository (oddly, via a direct rpm install?)

$ rpm -ivh http://repos.openhpc.community/OpenHPC/2/Leap_15/x86_64/ohpc-release-2-1.leap15.x86_64.rpm

 

Install the ohpc base packages.

First time using zypper.. following the docs blindly, the -n option makes the key prompt default to reject, so nothing gets installed.

Drop -n and answer t (trust temporarily) or a (trust always) at the prompt.

(Trying with sudo su, it seems -n defaults to yes for root but to reject for a normal user.)

> sudo zypper -n install ohpc-base ohpc-warewulf

New repository or package signing key received:

  Repository:       OpenHPC-2 - Base
  Key Name:         private OBS (key without passphrase) <defaultkey@localobs>
  Key Fingerprint:  5392744D 3C543ED5 784765E6 8A306019 DA565C6C
  Key Created:      Tue 17 Dec 2019 04:09:12 AM KST
  Key Expires:      (does not expire)
  Subkey:           210B8BF01271E2F2 2019-12-17 [does not expire]
  Rpm Name:         gpg-pubkey-da565c6c-5df7d658


Do you want to reject the key, trust temporarily, or trust always? [r/t/a/?] (r): r
Error building the cache:
[OpenHPC|http://repos.openhpc.community/OpenHPC/2/Leap_15] Valid metadata not found at specified URL
History:
 - Signature verification failed for repomd.xml
 - Can't provide /repodata/repomd.xml

Warning: Skipping repository 'OpenHPC-2 - Base' because of the above error.
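An alternative for the key prompt is zypper's --gpg-auto-import-keys option, which keeps -n usable. A sketch only; auto-trusting repository keys is a policy decision, so check before using it:

```shell
# Import the repo signing key non-interactively instead of hitting the (r) default
sudo zypper -n --gpg-auto-import-keys install ohpc-base ohpc-warewulf
```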

 

Tedious, so just paste everything straight through~ ㅋㅋ

3.3 Add provisioning services on master node
$ sudo su -
systemctl enable chronyd.service
echo "server ${ntp_server}" >> /etc/chrony.conf
echo "allow all" >> /etc/chrony.conf
systemctl restart chronyd

3.4 Add resource management services on master node
zypper -n install ohpc-slurm-server
cp /etc/slurm/slurm.conf.ohpc /etc/slurm/slurm.conf
perl -pi -e "s/ControlMachine=\S+/ControlMachine=${sms_name}/" /etc/slurm/slurm.conf

3.7 Complete basic Warewulf setup for master node
perl -pi -e "s/device = eth1/device = ${sms_eth_internal}/" /etc/warewulf/provision.conf
perl -pi -e "s/^DHCPD_INTERFACE=\S+/DHCPD_INTERFACE=${sms_eth_internal}/" /etc/sysconfig/dhcpd
systemctl enable tftp.socket
perl -pi -e "s#\#tftpdir = /var/lib/#tftpdir = /srv/#" /etc/warewulf/provision.conf
export MODFILE=/etc/apache2/conf.d/warewulf-httpd.conf
perl -pi -e "s#modules/mod_perl.so\$#/usr/lib64/apache2/mod_perl.so#" $MODFILE
perl -pi -e "s#modules/mod_version.so\$#/usr/lib64/apache2/mod_version.so#" $MODFILE

ip link set dev ${sms_eth_internal} up
ip address add ${sms_ip}/${internal_netmask} broadcast + dev ${sms_eth_internal}

systemctl enable mysql
systemctl restart mysql
systemctl enable apache2.service
systemctl restart apache2
systemctl enable dhcpd.service
systemctl enable tftp.socket
systemctl start tftp.socket

3.8.1 Build initial BOS image
mkdir -p -m 755 $CHROOT
mkdir -m 755 $CHROOT/dev
mknod -m 666 $CHROOT/dev/zero c 1 5
wwmkchroot -v opensuse-15.2 $CHROOT
cp -p /etc/zypp/repos.d/OpenHPC*.repo $CHROOT/etc/zypp/repos.d
zypper -n --root $CHROOT --no-gpg-checks --gpg-auto-import-keys refresh

3.8.2 Add OpenHPC components
zypper -n --root $CHROOT install ohpc-base-compute
cp -p /etc/resolv.conf $CHROOT/etc/resolv.conf
zypper -n --root $CHROOT --no-gpg-checks --gpg-auto-import-keys refresh
cp /etc/passwd /etc/group $CHROOT/etc
zypper -n --root $CHROOT install ohpc-slurm-client
chroot $CHROOT systemctl enable munge
echo SLURMD_OPTIONS="--conf-server ${sms_ip}" > $CHROOT/etc/sysconfig/slurmd
cp /opt/ohpc/pub/examples/udev/60-ipath.rules $CHROOT/etc/udev/rules.d/
zypper -n --root $CHROOT install chrony #(error occurs)
chroot $CHROOT systemctl enable chrony #(error occurs)
echo "server ${sms_ip}" >> $CHROOT/etc/chrony.conf
zypper -n --root $CHROOT install kernel-default
zypper -n --root $CHROOT install lmod-ohpc
chroot $CHROOT systemctl enable sshd.service
mv $CHROOT/etc/hostname $CHROOT/etc/hostname.orig

3.8.3 Customize system configuration
wwinit database
wwinit ssh_keys
echo "${sms_ip}:/home /home nfs nfsvers=3,nodev,nosuid 0 0" >> $CHROOT/etc/fstab
echo "${sms_ip}:/opt/ohpc/pub /opt/ohpc/pub nfs nfsvers=3,nodev 0 0" >> $CHROOT/etc/fstab
echo "/home *(rw,no_subtree_check,fsid=10,no_root_squash)" >> /etc/exports
echo "/opt/ohpc/pub *(ro,no_subtree_check,fsid=11)" >> /etc/exports
exportfs -a
systemctl restart nfs-server
systemctl enable nfs-server

3.8.5 Import files
wwsh file import /etc/passwd
wwsh file import /etc/group
wwsh file import /etc/shadow
wwsh file import /etc/munge/munge.key

3.9.1 Assemble bootstrap image
wwbootstrap `uname -r`

3.9.2 Assemble Virtual Node File System (VNFS) image
wwvnfs --chroot $CHROOT

3.9.3 Register nodes for provisioning
echo "GATEWAYDEV=${eth_provision}" > /tmp/network.$$
wwsh -y file import /tmp/network.$$ --name network
wwsh -y file set network --path /etc/sysconfig/network --mode=0644 --uid=0
for ((i=0; i<$num_computes; i++)) ; do
wwsh -y node new ${c_name[i]} --ipaddr=${c_ip[i]} --hwaddr=${c_mac[i]} -D ${eth_provision}
done
wwsh -y provision set "${compute_regex}" --vnfs=leap15.2 --bootstrap=`uname -r` \
--files=dynamic_hosts,passwd,group,shadow,munge.key,network
systemctl restart dhcpd
wwsh pxe update

 


Having only looked at slurm so far, I am now looking at other schedulers, and it gets complicated.

Apparently you have to consider combinations like torque/maui or torque/moab before a fair comparison with slurm is possible.

[Link : https://www.reddit.com/r/HPC/comments/5go1vr/differencesadvantages_of_slurm_vs_torque/]

 

openpbs is barely mentioned anywhere; I need to look into how it relates to pbs pro and torque.

[Link : https://community.openpbs.org/t/migrating-from-torque-maui-slurm-vs-pbs-pro/740]

 

+

maui is non-commercial

moab workload manager is said to be commercial, but the moab wiki page now redirects to maui.. did it disappear?

maui is no longer developed + supposedly cannot be used commercially.. (huh?)

 

[Link : https://en.wikipedia.org/wiki/Maui_Cluster_Scheduler]

[Link : https://sourceforge.net/projects/mauisched/]

[Link : https://en.wikipedia.org/wiki/TORQUE]


프로그램 사용/openHPC - 2020.12.28 15:51

How to configure GPUs in slurm.

Hmm... when will I ever get to actually set this up?

 

[Link : http://github.com/neurokernel/gpu-cluster-config/blob/master/slurm.conf]

[Link : http://slurm.schedmd.com/gres.conf.html]

[Link : http://stackoverflow.com/questions/60448280]
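From skimming the linked gres.conf page, the setup seems to boil down to two config fragments like the following. A sketch only: the node name gpu01, the GPU count, and the /dev/nvidia* paths are all made-up placeholders.

```
# slurm.conf: declare the GRES type and the per-node GPU count
GresTypes=gpu
NodeName=gpu01 Gres=gpu:2 Sockets=1 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN

# gres.conf on the node: map each GRES entry to a device file
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
```

Jobs would then request GPUs with something like `srun --gres=gpu:1`.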

 

 


프로그램 사용/openHPC - 2020.12.28 15:46

Hmm...

It supports powerpc but not arm...

 

[Link : http://github.com/openhpc/ohpc/wiki/2.X]

[Link : http://xcat-docs.readthedocs.io/en/stable/overview/support_matrix.html]


프로그램 사용/openHPC - 2020.12.28 15:32

openhpc provides two sets of xcat docs, so I looked up what the distinction means:

stateful means diskful,

and stateless means diskless.

 

So would what warewulf does over PXE be called stateless in xcat terms?

 

Provision machines in Diskful (stateful) and Diskless (stateless)

[Link : https://xcat-docs.readthedocs.io/en/stable/]

 

 


프로그램 사용/openHPC - 2020.12.23 17:07

Bottom line: in virtualbox, the number of virtual CPUs has to be set to the product of the values below.

On a physical server, take hyper-threading into account and set them accordingly.

 

 

The values below are the defaults, but openhpc sets them quite a bit higher?

Sockets=1 CoresPerSocket=1 ThreadsPerCore=1

 

/etc/slurm/slurm.conf.ohpc

Sockets=2 CoresPerSocket=8 ThreadsPerCore=2

 

Sockets has nothing to do with TCP sockets; it means the number of physical CPU sockets in the machine.

These days single-CPU multi-core is the norm, so 1 is usually fine.

 

CoresPerSocket is the number of physical cores in one physical CPU package.

 

ThreadsPerCore is the number of threads per core: 2 on Intel with HT enabled.

 

Sockets
Number of physical processor sockets/chips on the node (e.g. "2"). If Sockets is omitted, it will be inferred from CPUs, CoresPerSocket, and ThreadsPerCore. NOTE: If you have multi-core processors, you will likely need to specify these parameters. Sockets and SocketsPerBoard are mutually exclusive. If Sockets is specified when Boards is also used, Sockets is interpreted as SocketsPerBoard rather than total sockets. The default value is 1.

CoresPerSocket
Number of cores in a single physical processor socket (e.g. "2"). The CoresPerSocket value describes physical cores, not the logical number of processors per socket. NOTE: If you have multi-core processors, you will likely need to specify this parameter in order to optimize scheduling. The default value is 1.

ThreadsPerCore
Number of logical threads in a single physical core (e.g. "2"). Note that the Slurm can allocate resources to jobs down to the resolution of a core. If your system is configured with more than one thread per core, execution of a different job on each thread is not supported unless you configure SelectTypeParameters=CR_CPU plus CPUs; do not configure Sockets, CoresPerSocket or ThreadsPerCore. A job can execute a one task per thread from within one job step or execute a distinct job step on each of the threads. Note also if you are running with more than 1 thread per core and running the select/cons_res or select/cons_tres plugin then you will want to set the SelectTypeParameters variable to something other than CR_CPU to avoid unexpected results. The default value is 1.

[Link : https://slurm.schedmd.com/slurm.conf.html]

 


CPUs: Count of processors on each compute node. If CPUs is omitted, it will be inferred from: Sockets, CoresPerSocket, and ThreadsPerCore.

 Sockets: Number of physical processor sockets/chips on the node. If Sockets is omitted, it will be inferred from: CPUs, CoresPerSocket, and ThreadsPerCore.

 CoresPerSocket: Number of cores in a single physical processor socket. The CoresPerSocket value describes physical cores, not the logical number of processors per socket.

 ThreadsPerCore: Number of logical threads in a single physical core.

[Link : https://slurm.schedmd.com/configurator.html]
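So the "product" rule at the top is just Sockets * CoresPerSocket * ThreadsPerCore; a quick sanity check with the slurm.conf.ohpc values quoted above:

```shell
# Logical CPU count Slurm expects = Sockets * CoresPerSocket * ThreadsPerCore
sockets=2; cores_per_socket=8; threads_per_core=2   # values from slurm.conf.ohpc
echo $(( sockets * cores_per_socket * threads_per_core ))   # prints 32
```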


  1. show454544

    Thanks for the useful info.
    So for a server with two CPUs, each 4 cores / 8 threads,
    would I set sockets=2 CoresPerSocket=4
    ThreadsPerCore=8, and
    enter CPUs=16 for the CPU count??
    Thanks again for the info.

    2021.01.23 00:12