HOW PLAIN (NON-PUMP) DISTCC WORKS
distcc only ever runs the compiler and assembler remotely. With plain distcc, the preprocessor must always run locally because it needs to access various header files on the local machine which may not be present, or may not be the same, on the volunteer. The linker similarly needs to examine libraries and object files, and so must run locally.
The compiler and assembler take only a single input file (the preprocessed source) and produce a single output (the object file). distcc ships these two files across the network and can therefore run the compiler/assembler remotely.
Fortunately, for most programs running the preprocessor is relatively cheap, and the linker is called relatively infrequent, so most of the work can be distributed.
distcc examines its command line to determine which of these phases are being invoked, and whether the job can be distributed.
HOW DISTCC-PUMP MODE WORKS
In pump mode, distcc runs the preprocessor remotely too. To do so, the preprocessor must have access to all the files that it would have accessed if had been running locally. In pump mode, therefore, distcc gathers all of the recursively included headers, except the ones that are default system headers, and sends them along with the source file to the compilation server.
In distcc-pump mode, the server unpacks the set of all source files in a temporary directory, which contains a directory tree that mirrors the part of the file system that is relevant to preprocessing, including symbolic links.
The compiler is then run from the path in the temporary directory that corresponds to the current working directory on the client. To find and transmit the many hundreds of files that are often part of a single compilation, pump mode uses an incremental include analysis algorithm. The include server is a Python program that implements this algorithm. The pump command starts the include server so that throughout the build it can answer include queries by distcc commands.
The include server uses static analysis of the macro language to deal with conditional compilation and computed includes. It uses the property that when a given header file has already been analyzed for includes, it is not necessary to do so again if all the include options (-I's) are unchanged (along with other conditions).
For large builds, header files are included, on average, hundreds of times each. With distcc-pump mode each such file is analyzed only a few times, perhaps just once, instead of being preprocessed hundreds of times. Also, each source or header file is now compressed only once, because the include server memoizes the compressed files. As a result, the time used for preparing compilations may drop by up to an order of magnitude over the preprocessing of plain distcc.
Because distcc in pump mode is able to push out files up to about ten times faster, build speed may increase 3X or more for large builds compared to plain distcc mode.
RESTRICTIONS FOR PUMP MODE
Using pump mode requires both client and servers to use release 3.0 or later of distcc and distccd (respectively).
The incremental include analysis of distc-pump mode rests on the fundamental assumption that source and header files do not change during the build process. A few complex build systems, such as that for Linux kernel 2.6, do not quite satisfy this requirement. To overcome such issues, and other corner cases such as absolute filepaths in includes, see the include_server(1) man page.
Another important assumption is that the include configuration of all machines must be identical. Thus the headers under the default system path must be the same on all servers and all clients. If a standard GNU compiler installation is used, then this requirement applies to all libraries whose header files are installed under /usr/include or /usr/local/include/. Note that installing software packages often lead to additional headers files being placed in subdirectories of either.
If this assumption does not hold, then it is possible to break builds with distcc-pump mode, or worse, to get wrong results without warning. Presently this condition is not verified, and it is on our TODO list to address this issue.
An easy way to guarantee that the include configurations are identical is to use a cross-compiler that defines a default system search path restricted to directories of the compiler installation.
See the include_server(1) manual for more information on symptoms and causes of violations of distcc-pump mode assumptions.
USING DISTCC WITH CCACHE
ccache is a program that speeds software builds by caching the results of compilations. ccache is normally called before distcc, so that results are retrieved from a normal cache. Some experimentation may be required for idiosyncratic makefiles to make everything work together.
The most reliable method is to set
CCACHE_PREFIX="distcc"
This tells ccache to run distcc as a wrapper around the real compiler. ccache still uses the real compiler to detect compiler upgrades.
ccache can then be run using either a masquerade directory or by setting
CC="ccache gcc"
As of version 2.2, ccache does not cache compilation from preprocessed source and so will never get a cache hit if it is run from distccd or distcc. It must be run only on the client side and before distcc to be any use.
distcc's pump mode is not compatible with ccache.
[링크 : http://distcc.googlecode.com/svn/trunk/doc/web/man/distcc_1.html] |