Chapter 25. Configuring SLI and Multi-GPU FrameRendering
______________________________________________________________________________
The NVIDIA Linux driver contains support for NVIDIA SLI FrameRendering and
NVIDIA Multi-GPU FrameRendering. Both of these technologies allow an OpenGL
application to take advantage of multiple GPUs to improve visual performance.
The distinction between SLI and Multi-GPU is straightforward. SLI is used to
leverage the processing power of GPUs across two or more graphics cards, while
Multi-GPU is used to leverage the processing power of two GPUs colocated on
the same graphics card. If you want to link together separate graphics cards,
you should use the "SLI" X config option. Likewise, if you want to link
together GPUs on the same graphics card, you should use the "MultiGPU" X
config option. If you have two cards, each with two GPUs, and you wish to link
them all together, you should use the "SLI" option.
25A. RENDERING MODES
In Linux, with two GPUs, SLI and Multi-GPU can both operate in one of three
modes: Alternate Frame Rendering (AFR), Split Frame Rendering (SFR), and
Antialiasing (AA). When AFR mode is active, one GPU draws the next frame while
the other one works on the frame after that. In SFR mode, each frame is split
horizontally into two pieces, with one GPU rendering each piece. The split
line is adjusted to balance the load between the two GPUs. AA mode splits
antialiasing work between the two GPUs. Both GPUs work on the same scene and
the result is blended together to produce the final frame. This mode is useful
for applications that spend most of their time processing with the CPU and
cannot benefit from AFR.
With four GPUs, the same options are applicable. AFR mode cycles through all
four GPUs, each GPU rendering a frame in turn. SFR mode splits the frame
horizontally into four pieces. AA mode splits the work between the four GPUs,
allowing antialiasing up to 64x. With four GPUs, SLI can also operate in an
additional mode, Alternate Frame Rendering of Antialiasing (AFR of AA). With
AFR of AA, pairs of GPUs render alternate frames, each GPU in a pair doing
half of the antialiasing work. Note that these scenarios apply whether you
have four separate cards or you have two cards, each with two GPUs.
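Which of these modes is used can be requested directly through the value
given to the "SLI" X config option (see Appendix B for the authoritative list
of values). As a rough sketch, with an arbitrary device identifier, the
relevant X configuration entry might look like this:

Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    Option     "SLI" "AFR"     # or "SFR", "AA", "AFR of AA"
EndSection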
Some GPU configurations additionally support a special SLI Mosaic Mode, which
transparently extends a single X screen across all of the available display
outputs on each GPU. See below for the exact set of configurations which can
be used with SLI Mosaic Mode.
25B. ENABLING MULTI-GPU
Multi-GPU is enabled by setting the "MultiGPU" option in the X configuration
file; see Appendix B for details about the "MultiGPU" option.
The nvidia-xconfig utility can be used to set the "MultiGPU" option, rather
than modifying the X configuration file by hand. For example:
% nvidia-xconfig --multigpu=on
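The command above simply writes the corresponding option into the X
configuration file; the resulting entry, inside the "Device" (or "Screen")
section, should look roughly like this:

Option "MultiGPU" "on"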
25C. ENABLING SLI
SLI is enabled by setting the "SLI" option in the X configuration file; see
Appendix B for details about the SLI option.
The nvidia-xconfig utility can be used to set the SLI option, rather than
modifying the X configuration file by hand. For example:
% nvidia-xconfig --sli=on
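nvidia-xconfig also accepts the rendering mode names described in section 25A
as values (the Mosaic example in the next section works the same way); for
instance:

% nvidia-xconfig --sli=AFR
% nvidia-xconfig --sli=AA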
25D. ENABLING SLI MOSAIC MODE
The simplest way to configure SLI Mosaic Mode using a grid of monitors is to
use 'nvidia-settings' (see Chapter 24). The steps to perform this
configuration are as follows:
1. Connect each of the monitors you would like to use to any connector from
any GPU used for SLI Mosaic Mode. If you are going to use fewer monitors
than there are connectors, connect one monitor to each GPU before adding
a second monitor to any GPU.
2. Install the NVIDIA display driver set.
3. Configure an X screen to use the "nvidia" driver on at least one of the
GPUs (see Chapter 6 for more information).
4. Start X.
5. Run 'nvidia-settings'. You should see a tab in the left pane of
nvidia-settings labeled "SLI Mosaic Mode Settings". Note that you may
need to expand the entry for the X screen you configured earlier.
6. Check the "Use SLI Mosaic Mode" check box.
7. Select the monitor grid configuration you'd like to use from the "display
configuration" dropdown.
8. Choose the resolution and refresh rate at which you would like to drive
each individual monitor.
9. Set any overlap you would like between the displays.
10. Click the "Save to X Configuration File" button. NOTE: If you don't have
permissions to write to your system's X configuration file, you will be
prompted to choose a location to save the file. After doing so, you MUST
copy the X configuration file into a location the X server will consider
upon startup (usually '/etc/X11/xorg.conf' for X.Org servers or
'/etc/X11/XF86Config' for XFree86 servers).
11. Exit nvidia-settings and restart your X server.
Alternatively, nvidia-xconfig can be used to configure SLI Mosaic Mode via a
command like 'nvidia-xconfig --sli=Mosaic --metamodes=METAMODES' where the
METAMODES string specifies the desired grid configuration. For example, a
command along the following lines:
% nvidia-xconfig --sli=Mosaic --metamodes="GPU-0.DFP-0: 1920x1024+0+0, GPU-0.DFP-1: 1920x1024+1920+0, GPU-1.DFP-0: 1920x1024+0+1024, GPU-1.DFP-1: 1920x1024+1920+1024"
will configure four DFPs in a 2x2 configuration, each running at 1920x1024,
with the two DFPs on GPU-0 driving the top two monitors of the 2x2
configuration, and the two DFPs on GPU-1 driving the bottom two monitors of
the 2x2 configuration.
See the description of the MetaModes X configuration option in Chapter 13 for
details. See
Appendix C for further details on GPU and Display Device Names.
25E. HARDWARE REQUIREMENTS
SLI functionality requires:
o Identical PCI-Express graphics cards
o A supported motherboard (with the exception of Quadro Plex)
o In most cases, a video bridge connecting the two graphics cards
o To use SLI Mosaic Mode, the GPUs must either be part of a Quadro Plex
Visual Computing System (VCS) Model IV or newer, or each GPU must be a
Quadro FX 5800, or a Fermi-based or newer Quadro GPU.
For the latest in supported SLI and Multi-GPU configurations, including SLI-
and Multi-GPU capable GPUs and SLI-capable motherboards, see
http://www.slizone.com.
25F. OTHER NOTES AND REQUIREMENTS
The following other requirements apply to SLI and Multi-GPU:
o Mobile GPUs are NOT supported
o SLI on Quadro-based graphics cards always requires a video bridge
o TwinView is also not supported with SLI or Multi-GPU. Only one display
can be used when SLI or Multi-GPU is enabled, with the exception of
Mosaic.
o If X is configured to use multiple screens and screen 0 has SLI or
Multi-GPU enabled, the other screens configured to use the nvidia driver
will be disabled. Note that if SLI or Multi-GPU is enabled, the GPUs used
by that configuration will be unavailable for single GPU rendering.
25G. FREQUENTLY ASKED SLI AND MULTI-GPU QUESTIONS
Q. Why is glxgears slower when SLI or Multi-GPU is enabled?
A. When SLI or Multi-GPU is enabled, the NVIDIA driver must coordinate the
operations of all GPUs when each new frame is swapped (made visible). For
most applications, this GPU synchronization overhead is negligible.
However, because glxgears renders so many frames per second, the GPU
synchronization overhead consumes a significant portion of the total time,
and the framerate is reduced.
Q. Why is Doom 3 slower when SLI or Multi-GPU is enabled?
A. The NVIDIA Accelerated Linux Graphics Driver does not automatically detect
the optimal SLI or Multi-GPU settings for games such as Doom 3 and Quake 4.
To work around this issue, the environment variable __GL_DOOM3 can be set
to tell OpenGL that Doom 3's optimal settings should be used. In Bash, this
can be done in the same command that launches Doom 3 so the environment
variable does not remain set for other OpenGL applications started in the
same session:
% __GL_DOOM3=1 doom3
Doom 3's startup script can also be modified to set this environment
variable:
#!/bin/sh
# Needed to make symlinks/shortcuts work.
# the binaries must run with correct working directory
cd "/usr/local/games/doom3/"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.
export __GL_DOOM3=1
exec ./doom.x86 "$@"
This environment variable is temporary and will be removed in the future.
Q. Why does SLI or MultiGPU fail to initialize?
A. There are several reasons why SLI or MultiGPU may fail to initialize. Most
of these should be clear from the warning message in the X log file; e.g.:
o "Unsupported bus type"
o "The video link was not detected"
o "GPUs do not match"
o "Unsupported GPU video BIOS"
o "Insufficient PCI-E link width"
The warning message "Unsupported PCI topology" is likely due to problems
with your Linux kernel. The NVIDIA driver must have access to the PCI
Bridge (often called the Root Bridge) that each NVIDIA GPU is connected to
in order to configure SLI or MultiGPU correctly. There are many kernels
that do not properly recognize this bridge and, as a result, do not allow
the NVIDIA driver to access this bridge. See the "How can I determine if my
kernel correctly detects my PCI Bridge?" FAQ below for details.
Below are some specific troubleshooting steps to help deal with SLI and
MultiGPU initialization failures.
o Make sure that ACPI is enabled in your kernel. NVIDIA's experience
has been that ACPI is needed for the kernel to correctly recognize
the Root Bridge. Note that in some cases, the kernel's version of
ACPI may still have problems and require an update to a newer kernel.
o Run 'lspci' to check that multiple NVIDIA GPUs can be identified by
the operating system; e.g.:
% /sbin/lspci | grep -i nvidia
If 'lspci' does not report all the GPUs that are in your system, then
this is a problem with your Linux kernel, and it is recommended that
you use a different kernel.
Please note: the 'lspci' utility may be installed in a location other
than '/sbin' on your system. If the above command fails with the
error: "'/sbin/lspci: No such file or directory'", please try:
% lspci | grep -i nvidia
, instead. You may also need to install your distribution's
"pciutils" package.
o Make sure you have the most recent SBIOS available for your
motherboard.
o The PCI-Express slots on the motherboard must provide a minimum link
width. Please make sure that the PCI Express slot(s) on your
motherboard meet the following requirements and that you have
connected the graphics board to the correct PCI Express slot(s); a command
for checking the negotiated link width is shown after this list:
o A dual-GPU board needs a minimum of 8 lanes (i.e. x8 or x16)
o A pair of single-GPU boards requires one of the following
supported link width combinations:
o x16 + x16
o x16 + x8
o x16 + x4
o x8 + x8
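One way to check the link width a slot has actually negotiated is to look at
the PCI Express capability reported by 'lspci -vv' (you may need to run this
as root; the bus address and the output below are only illustrative):

% /sbin/lspci -vv -s 0a:00.0 | grep -i "lnkcap\|lnksta"
        LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1
        LnkSta: Speed 2.5GT/s, Width x16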
Q. How can I determine if my kernel correctly detects my PCI Bridge?
A. As discussed above, the NVIDIA driver must have access to the PCI Bridge
that each NVIDIA GPU is connected to in order to configure SLI or MultiGPU
correctly. The following steps will identify whether the kernel correctly
recognizes the PCI Bridge:
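The bus topology the kernel sees can be printed with 'lspci -t'. The two
outputs below are illustrative only (trimmed, with bus numbers chosen to
match the discussion that follows); the first shows a GPU bus correctly
attached to a Root Bridge, the second shows the failure case:

% /sbin/lspci -t

good:
 -+-[0000:80]-+-0e.0-[81]----00.0
  \-[0000:00]-+-01.0-[0a]----00.0

bad:
 -+-[0000:81]---00.0
  \-[0000:00]-+-01.0-[0a]----00.0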
Note that in the first example, bus 81 is connected to Root Bridge
80, but that in the second example there is no Root Bridge 80 and bus
81 is incorrectly connected at the base of the device tree. In the
bad case, the only solution is to upgrade your kernel to one that
properly detects your PCI bus layout.
NVCC stands for NV(idia) CC (C Compiler), and structurally it goes through
the compilation process described below.
Host code compilation is handed off to an ordinary C compiler (for example
the Visual Studio command-line compiler or gcc), while nvcc itself generates
the machine code (PTX for the CUDA device).
In other words, nvcc on its own cannot compile a program from start to
finish; that is why it rides on Visual Studio on Windows and on gcc on Linux.
The "Purpose of nvcc" section quoted below says that everything that is not
CUDA is forwarded to a general-purpose C compiler, and that on Windows an
instance of MS Visual Studio's cl is used.
Purpose of nvcc
This compilation trajectory involves several splitting, compilation, preprocessing,
and merging steps for each CUDA source file, and several of these steps are subtly
different for different modes of CUDA compilation (such as compilation for device
emulation, or the generation of device code repositories). It is the purpose of the
CUDA compiler driver nvcc to hide the intricate details of CUDA compilation from
developers. Additionally, instead of being a specific CUDA compilation driver,
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of
conventional compiler options, such as for defining macros and include/library
paths, and for steering the compilation process. All non-CUDA compilation
steps are forwarded to a general purpose C compiler that is supported by
nvcc, and on Windows platforms, where this compiler is an instance of the
Microsoft Visual Studio compiler, nvcc will translate its options into
appropriate 'cl' command syntax. This extended behavior plus 'cl' option
translation is intended for support of portable application build and make
scripts across Linux and Windows platforms.
As for my own machine,
Visual Studio 6.0 is installed, and thanks to my personal aversion to .NET,
nothing like 2002 or 2008 is installed.
Anyway, for the host compiler on the Windows platform the document says
"Microsoft Visual Studio compiler, cl"...
I am not sure from which version (VS2002?) it is supported, but in any case
cl is the command-line MSVC compiler, the cl.exe binary.
For a moment I wondered whether it had something to do with OpenCL, but no,
it is not that either -_-
Also, the supported build environments list Windows + MinGW shell. That is
the shell, not gcc -_- In other words, on Windows there is no way around
installing Visual Studio as the compiler.
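For reference, nvcc does let you tell it where the host compiler lives; a
rough sketch (the paths and file names here are only examples):

% nvcc --compiler-bindir /usr/bin/gcc -o vecadd vecadd.cu
(Linux: hand the host-code compilation to a specific gcc)

> nvcc -ccbin "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin" vecadd.cu
(Windows: run from a Visual Studio command prompt and point -ccbin at cl)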
A 250GB hard drive I used long ago has started making noises, so I took
another look at its numbers.
(Of course, since the BIOS SMART check had already raised a warning a while
back, it has long been replaced and now just holds miscellaneous videos I
would not mind losing.)
Apparently something is wrong, because the row is highlighted in yellow!
But, as always, I cannot quite work out what Current / Worst / Threshold /
Data are supposed to mean.
Taken literally, Current should start at 0 and be able to rise up to Worst,
Threshold should be smaller than Worst, and in most cases a warning should
appear once Current climbs above Threshold...
And why does Data exist at all when there is already Current?
Anyway, first, a copy of the relevant part of the manual!
This function uses S.M.A.R.T. (Self-Monitoring Analysis and Reporting
Technology) to get information about the health of the hard disk.
The table shows the following parameters:
- ID: parameter which is being measured
- Current: current value
- Worst: the worst value which has been recorded since the hard disk was
first used
- Threshold: the value of any of the parameters should never get below the
threshold.
- Data: shows the usable data which belongs to the ID.
- Status: status of the parameter (OK or failed).
Power on time
The power on time is usually indicated in hours, but some manufacturers show
this time in minutes or even seconds.
Looking at the screenshot above, the supported IDs seem to differ from drive
to drive,
Current is the current value (hey, how is anyone supposed to understand it
when it is glossed over like that!),
Worst is the worst value recorded while using the drive (so it can still get
worse, then!),
Threshold is the value that none of the others should ever fall below (huh?
that barely translates!!!!),
and Data is the usable value belonging to the ID (so is Current a sort of
normalized version of the Data value?).
So "exceeded" here means the Current value has dropped to or below the
Threshold: Current is supposed to stay above Threshold, and it only ever
moves in one direction as the drive degrades.
The most basic information that SMART provides is the SMART status. It
provides only two values: "threshold not exceeded" and "threshold
exceeded". Often these are represented as "drive OK" or "drive fail"
respectively. A "threshold exceeded" value is intended to indicate that
there is a relatively high probability that the drive will not be able
to honor its specification in the future: that is, the drive is "about
to fail".
[Link: http://en.wikipedia.org/wiki/S.M.A.R.T.]
But then again, it does not always seem to work that way?
Legend:
- Higher raw value is better
- Lower raw value is better
- Critical: red colored row (a potential indicator of imminent
electromechanical failure)

ID 10 (0A hex), Spin Retry Count: Count of retry of spin start attempts. This
attribute stores a total count of the spin start attempts to reach the fully
operational speed (under the condition that the first attempt was
unsuccessful). An increase of this attribute value is a sign of problems in
the hard disk mechanical subsystem.
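On Linux the same attribute table can be dumped with smartctl from
smartmontools; here VALUE corresponds to Current, and RAW_VALUE to Data. The
numbers below are made up for illustration:

% smartctl -A /dev/sda
ID# ATTRIBUTE_NAME        FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 10 Spin_Retry_Count      0x0027   100   100   097    Pre-fail  Always       -       0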
I suddenly had a reason to increase network bandwidth, and while searching
about bonding I ran across the term "teaming", so I looked it up.
The umbrella concept is link aggregation, and underneath it sit technologies
such as Ethernet bonding and NIC teaming.
These techniques date back to when networks were slow: they have been used to
get past speed limits and to gain reliability (with a single link, one broken
connection takes down the whole network).
Other terms for link aggregation include Ethernet bonding, NIC teaming, Trunking, port channel, link bundling, EtherChannel, Multi-link trunking (MLT), NIC bonding, network bonding,[1] Network Fault Tolerance (NFT), Smartgroup (from ZTE), and EtherTrunk (from Huawei).
Link aggregation addresses two problems with Ethernet connections: bandwidth limitations and lack of resilience.
With regard to the first issue: bandwidth requirements do not scale
linearly. Ethernet bandwidths historically have increased by an order of magnitude each generation: 10 Megabit/s,
100 Mbit/s, 1000 Mbit/s, 10,000 Mbit/s. If one started to bump into
bandwidth ceilings, then the only option was to move to the next
generation which could be cost prohibitive. An alternative solution,
introduced by many of the network manufacturers in the early 1990s, is
to combine two physical Ethernet links into one logical link via channel bonding. Most of these solutions required manual configuration and identical equipment on both sides of the aggregation.[2]
The second problem involves the three single points of failure
in a typical port-cable-port connection. In either the usual
computer-to-switch or in a switch-to-switch configuration, the cable
itself or either of the ports the cable is plugged into can fail.
Multiple physical connections can be made, but many of the higher level protocols were not designed to failover completely seamlessly.
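As a side note, on a reasonably recent Linux system the Ethernet bonding
described above can be set up with iproute2 roughly as follows (the interface
names, the mode, and the address are only examples):

# create a bond in active-backup mode and enslave two NICs
ip link add bond0 type bond mode active-backup
ip link set eth0 down
ip link set eth1 down
ip link set eth0 master bond0
ip link set eth1 master bond0
ip link set bond0 up
ip addr add 192.168.0.10/24 dev bond0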
On 802.11 (Wi-Fi), channel bonding is used in "Super G" technology, also referred to as 108 Mbit/s. It bonds two channels of classic 802.11g, which has a 54 Mbit/s signaling rate.