8. Reducing graph compilation time
You can speed up graph compilation by loading a different memory allocation implementation with LD_PRELOAD.
LD_PRELOAD is a Linux environment variable that changes dynamic linking behaviour at runtime. It forces the dynamic linker to search for symbols in the libraries passed in via LD_PRELOAD before it searches its normal paths. This can change the behaviour of any program that is dynamically linked (in other words, any program that uses external shared libraries). LD_PRELOAD has no effect on statically linked code.
This section describes how to use LD_PRELOAD to override the default memory allocation (malloc) library used by your program. Doing so may reduce compilation time.
Note
LD_PRELOAD will not speed up the compilation of a program that does not use malloc.
8.1. Finding malloc implementation in use
Note
You can skip this step if your program is designed to run on the IPU, because the malloc implementation used by default is then tbbmalloc (Section 8.3.1, tbbmalloc).
To check which malloc implementation your program is using, run:
$ strace python program.py |& grep malloc
Replace python program.py with the command you use to run your program. If you set other environment variables when running your program, pass each of them to strace with the -E option (strace accepts the form -E VAR=value):
$ strace -E OPTION python program.py |& grep malloc
The output of the strace command will be similar to:
openat(AT_FDCWD, "/localdata/joesoap/sdk/poplar_sdk-ubuntu_20_04-3.0.0+1130-1a6d8f00d7/poplar-ubuntu_20_04-3.0.0+5476-8a37a205bb/lib/libtbbmalloc_proxy.so", O_RDONLY|O_CLOEXEC) = 4
This output contains a path that ends with an .so file, which is the path of the malloc implementation used when the script was run. In this particular case, the file is libtbbmalloc_proxy.so, which means that the malloc implementation used to run program.py is tbbmalloc (Section 8.3.1, tbbmalloc).
Note
The output of the strace command will be empty if your program does not load a separate malloc library, for example when it uses the allocator built into the C library.
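As an alternative to strace for a process that is already running, the libraries mapped into it can be read from /proc/<pid>/maps. The following is a minimal sketch, assuming a Linux system with procfs; the function name list_malloc_libs is hypothetical and not part of the Poplar SDK.

```shell
# list_malloc_libs MAPS_FILE
# Print the unique shared-library paths in a /proc/<pid>/maps style file
# whose names contain "malloc". (Hypothetical helper, for illustration only.)
list_malloc_libs() {
    # The mapped file path, when present, is the last field of each line.
    awk '$NF ~ /malloc/ && $NF ~ /\.so/ { print $NF }' "$1" | sort -u
}

# Example: inspect a running process, replacing <PID> with its process ID.
#   list_malloc_libs /proc/<PID>/maps
```

If nothing is printed, no library with "malloc" in its name is mapped into that process.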
8.2. Using LD_PRELOAD to change the malloc implementation
To use LD_PRELOAD, set it in front of the command that runs your program and give it the path to the .so file of the new malloc implementation:
$ LD_PRELOAD=/path/to/malloc.so python program.py
You can use strace with the -E option to check that your malloc implementation was correctly preloaded:
$ strace -E LD_PRELOAD=/path/to/malloc.so python program.py |& grep malloc
The output should now contain the path to the new malloc implementation.
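If the path given to LD_PRELOAD cannot be loaded, the dynamic linker prints an error and carries on without it, so a mistyped path is easy to miss. The following is a minimal sketch of a wrapper that checks the library file is readable before running the command; the function name run_with_malloc is hypothetical.

```shell
# run_with_malloc LIB CMD [ARGS...]
# Run CMD with LIB preloaded, but only if LIB is a readable file.
# (Hypothetical helper, for illustration only.)
run_with_malloc() {
    if [ ! -r "$1" ]; then
        echo "run_with_malloc: cannot read library: $1" >&2
        return 1
    fi
    rwm_lib="$1"
    shift
    # LD_PRELOAD is set for this one command only.
    LD_PRELOAD="$rwm_lib" "$@"
}

# Example:
#   run_with_malloc /path/to/malloc.so python program.py
```

The check only confirms that the file can be read. It does not confirm that the file is a valid shared library, so the strace check above is still worth running.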
8.3. Different malloc implementations
This section describes three examples of malloc implementations that you can use and how to install them. There are other implementations available.
Note
Before following the installation instructions, make sure the malloc implementation you want is not already on your system. To do so, run the following command:
$ locate MALLOC_IMPLEMENTATION.so
where MALLOC_IMPLEMENTATION.so is the specific malloc library you wish to install.
If there is no output, the implementation is not on your system and you can continue with installation.
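Whether a given implementation helps depends on the program, so it is worth measuring. The following is a rough sketch for comparing run times with different preloaded libraries; the function name time_with_malloc and the paths in the example are hypothetical. It times the whole run in whole seconds, not only the graph compilation step, so treat the numbers as indicative.

```shell
# time_with_malloc LIB CMD [ARGS...]
# Run CMD with LIB preloaded and print the elapsed wall-clock time in whole
# seconds. Pass an empty string as LIB to run without a preloaded library.
# (Hypothetical helper, for illustration only.)
time_with_malloc() {
    twm_lib="$1"
    shift
    twm_start=$(date +%s)
    # Discard the program's standard output; errors still reach the terminal.
    LD_PRELOAD="$twm_lib" "$@" >/dev/null
    twm_status=$?
    twm_end=$(date +%s)
    echo "${twm_lib:-no preload}: $((twm_end - twm_start)) s (exit status $twm_status)"
}

# Example:
#   time_with_malloc "" python program.py
#   time_with_malloc /path/to/libjemalloc.so python program.py
#   time_with_malloc /path/to/libtcmalloc.so python program.py
```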
8.3.1. tbbmalloc
The Threading Building Blocks allocator (tbbmalloc) is the default memory allocation implementation used by Poplar programs. It is included in the Poplar SDK at:
/path/to/poplar_sdk/poplar_<version>/lib/libtbbmalloc_proxy.so
8.3.2. jemalloc
jemalloc is a general-purpose malloc implementation that emphasises fragmentation avoidance and scalable concurrency support.
To get jemalloc and build it from source:
$ git clone https://github.com/jemalloc/jemalloc.git
$ cd jemalloc
$ ./autogen.sh
$ make
The jemalloc library file is /installation/path/jemalloc/lib/libjemalloc.so.
8.3.3. tcmalloc
TCMalloc is a fast, multi-threaded malloc implementation made by Google, which provides its own implementation of the C malloc() function and the C++ new operator.
To install tcmalloc:
$ sudo apt-get install google-perftools
$ locate libtcmalloc.so
The locate command prints the path to libtcmalloc.so.