8. Reducing graph compilation time

Loading a different memory allocation implementation with LD_PRELOAD may speed up graph compilation.

LD_PRELOAD is a Linux environment variable that changes the behaviour of the dynamic linker at runtime. The dynamic linker loads the libraries listed in LD_PRELOAD before any others, so their symbols take precedence over symbols with the same name in the libraries found on its normal search paths. This can change the behaviour of any dynamically linked program (in other words, any program that uses shared libraries). LD_PRELOAD has no effect on statically linked programs or on code that was statically linked into a program.
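Because LD_PRELOAD only affects dynamically linked programs, it can be worth checking the binary you actually run (for example, the Python interpreter) before going further. The following bash sketch uses ldd for this check; the function name is_dynamic is hypothetical and is not part of the Poplar SDK.

```shell
# is_dynamic: hypothetical helper, not part of the Poplar SDK.
# Succeeds if the given executable is dynamically linked, that is, if
# LD_PRELOAD can affect it. ldd exits with a non-zero status (and prints
# "not a dynamic executable") for statically linked programs, and also
# fails if the file does not exist.
# Note: the ldd man page advises against running ldd on untrusted binaries.
is_dynamic() {
  ldd "$1" >/dev/null 2>&1
}

# Example: check the Python interpreter that will run program.py.
# is_dynamic "$(command -v python3)" && echo "LD_PRELOAD can take effect"
```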

This section describes how to use LD_PRELOAD to override the default memory allocation (malloc) library used by your program, which may reduce compilation time.

Note

LD_PRELOAD will not speed up the compilation of a program that does not use malloc.

8.1. Finding malloc implementation in use

Note

You can skip this step if your program is designed to run on the IPU, because the malloc implementation used by default is tbbmalloc (Section 8.3.1, tbbmalloc).

To check which malloc implementation your program is using, run:

$ strace python program.py |& grep malloc

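Replace python program.py with the command you normally use to run your program. strace writes its trace to standard error, and |& (a bash shorthand for 2>&1 |) sends it to grep together with standard output. If your program needs environment variables to be set, pass each one to strace with the -E option:

$ strace -E VAR=value python program.py |& grep malloc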

The output of the strace command will be similar to:

openat(AT_FDCWD, "/localdata/joesoap/sdk/poplar_sdk-ubuntu_20_04-3.0.0+1130-1a6d8f00d7/poplar-ubuntu_20_04-3.0.0+5476-8a37a205bb/lib/libtbbmalloc_proxy.so", O_RDONLY|O_CLOEXEC) = 4

This output contains a path ending in .so, which is the malloc library that was loaded when the program ran. The = 4 at the end of the line is the file descriptor returned by a successful open; lines ending in -1 ENOENT are paths that were tried but not found. In this example, the file is libtbbmalloc_proxy.so, which means that the malloc implementation used to run program.py is tbbmalloc (Section 8.3.1, tbbmalloc).
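Reading long strace output by eye can be tedious. As a sketch, the two hypothetical bash functions below extract the malloc library path from strace output, skipping lines that contain ENOENT (paths the dynamic linker tried but did not find), and map the file name to one of the implementations described in Section 8.3. They assume the library file name contains the word malloc, which holds for the three libraries in this section but may not hold for others.

```shell
# malloc_libs_from_strace: hypothetical helper.
# Reads strace lines on stdin, drops failed lookups (lines containing
# ENOENT) and prints each distinct quoted path that contains "malloc" and
# ends in .so, optionally followed by a version suffix such as .so.4.
malloc_libs_from_strace() {
  grep -v ENOENT | grep -o '"[^"]*malloc[^"]*\.so[.0-9]*"' | tr -d '"' | sort -u
}

# malloc_name: hypothetical helper.
# Maps a library path to the allocator it belongs to, based on its file name.
malloc_name() {
  case "$(basename "$1")" in
    libtbbmalloc*) echo tbbmalloc ;;
    libjemalloc*)  echo jemalloc ;;
    libtcmalloc*)  echo tcmalloc ;;
    *)             echo unknown ;;
  esac
}

# Example (bash):
# strace python program.py |& malloc_libs_from_strace | while read -r lib; do
#   echo "$lib -> $(malloc_name "$lib")"
# done
```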

Note

The output of the strace command will be empty if your program does not load a separate malloc library, for example if it uses the default malloc provided by the C library.

8.2. Using LD_PRELOAD to change the malloc implementation

To use a different malloc implementation for a single run, set LD_PRELOAD in front of the command, giving the path to the .so file of the new malloc implementation:

$ LD_PRELOAD=/path/to/malloc.so python program.py
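If the path is wrong, the dynamic linker prints an error but still runs the program with the default malloc, which is easy to miss. A minimal bash sketch of a wrapper (run_with_malloc is a hypothetical name) that checks that the library file exists first and keeps any libraries already listed in LD_PRELOAD:

```shell
# run_with_malloc: hypothetical helper.
# Usage: run_with_malloc /path/to/malloc.so command [args...]
# Refuses to run if the library file does not exist, and puts the new
# library first in LD_PRELOAD so that its malloc takes precedence over
# any libraries that were already being preloaded.
run_with_malloc() {
  local lib=$1
  shift
  if [ ! -f "$lib" ]; then
    echo "run_with_malloc: $lib not found" >&2
    return 1
  fi
  LD_PRELOAD="$lib${LD_PRELOAD:+:$LD_PRELOAD}" "$@"
}

# Example:
# run_with_malloc /path/to/malloc.so python program.py
```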

You can use strace with the -E option to check if your malloc implementation was correctly preloaded:

$ strace -E LD_PRELOAD=/path/to/malloc.so python program.py |& grep malloc

The output should now contain the path to the new malloc implementation.
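To turn this check into a pass/fail test, for example in a script, you can search the strace output for the exact path that was preloaded. This is a sketch; preload_seen is a hypothetical helper that reads strace output on standard input.

```shell
# preload_seen: hypothetical helper.
# Usage: ... |& preload_seen /path/to/malloc.so
# Succeeds only if the given path appears as a quoted path in a line that
# does not contain ENOENT (a failed lookup). grep reads all of its input
# (no -q), so the upstream strace is not cut off by a closed pipe.
preload_seen() {
  grep -v ENOENT | grep -F "\"$1\"" >/dev/null
}

# Example (bash):
# strace -E LD_PRELOAD=/path/to/malloc.so python program.py |& \
#   preload_seen /path/to/malloc.so && echo "malloc.so was preloaded"
```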

8.3. Different malloc implementations

This section describes three malloc implementations that you can use and how to install them. Other implementations are also available.

Note

Before following the installation instructions, check whether the malloc implementation you want is already on your system by running the following command:

$ locate MALLOC_IMPLEMENTATION.so

where MALLOC_IMPLEMENTATION.so is the file name of the malloc library you want, for example libjemalloc.so.

If there is no output, the implementation is probably not on your system and you can continue with the installation. locate searches a database that is only refreshed periodically (or when you run sudo updatedb), so files installed very recently may not appear.
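Because the locate database can be out of date, an alternative is to search the file system directly with find. In the bash sketch below, find_lib is a hypothetical helper, and its default directories (/usr/lib, /usr/lib64 and /usr/local/lib) are assumptions; pass your own directories if your libraries live elsewhere.

```shell
# find_lib: hypothetical helper.
# Usage: find_lib NAME [DIR...]
# Prints every file under the given directories (searched recursively)
# whose name starts with NAME, so libtcmalloc.so also matches
# libtcmalloc.so.4. Errors for missing directories are suppressed.
find_lib() {
  local name=$1
  shift
  [ "$#" -gt 0 ] || set -- /usr/lib /usr/lib64 /usr/local/lib
  find "$@" -name "$name*" 2>/dev/null
}

# Examples:
# find_lib libjemalloc.so
# find_lib libtcmalloc.so "$HOME"
```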

8.3.1. tbbmalloc

tbbmalloc, the memory allocator from the Threading Building Blocks (TBB) library, is the default memory allocation implementation used by Poplar programs. It is included in the Poplar SDK in:

/path/to/poplar_sdk/poplar_<version>/lib/libtbbmalloc_proxy.so
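The poplar_<version> part of this path changes with each SDK release, so if you want to preload tbbmalloc explicitly, a small bash sketch like the one below can locate the file for you. sdk_tbbmalloc is a hypothetical helper; it searches recursively rather than assuming an exact directory layout, and -quit requires GNU find.

```shell
# sdk_tbbmalloc: hypothetical helper.
# Usage: sdk_tbbmalloc /path/to/poplar_sdk
# Prints the first libtbbmalloc_proxy.so found under the SDK directory,
# or nothing if there is none (or the directory does not exist).
sdk_tbbmalloc() {
  find "$1" -name libtbbmalloc_proxy.so -print -quit 2>/dev/null
}

# Example:
# LD_PRELOAD="$(sdk_tbbmalloc /path/to/poplar_sdk)" python program.py
```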

8.3.2. jemalloc

jemalloc is a general-purpose malloc implementation that emphasises fragmentation avoidance and scalable concurrency support.

To build jemalloc from source:

$ git clone https://github.com/jemalloc/jemalloc.git
$ cd jemalloc
$ ./autogen.sh
$ make

The jemalloc library file is then /installation/path/jemalloc/lib/libjemalloc.so, where /installation/path is the directory in which you cloned the repository.

8.3.3. tcmalloc

TCMalloc is a fast, multi-threaded malloc implementation developed by Google. It provides its own implementation of the C malloc() function and the C++ new operator.

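To install tcmalloc on Ubuntu or Debian and find the library:

$ sudo apt-get install google-perftools
$ sudo updatedb
$ locate libtcmalloc.so

This outputs the path to libtcmalloc.so, which may include a version suffix (for example, libtcmalloc.so.4). Running updatedb first makes sure that locate can see the newly installed files.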
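Since the goal is shorter compilation time, it is worth measuring each implementation on your own program rather than assuming one is faster. The following sketch times one run per allocator; time_with_malloc is a hypothetical helper, it needs GNU date (for %N) and awk, and the library paths in the example loop are placeholders for the paths you found above. It measures the whole run of the command, not only graph compilation, and single runs can be noisy, so repeat each measurement a few times before drawing conclusions.

```shell
# time_with_malloc: hypothetical helper.
# Usage: time_with_malloc LIB command [args...]
# Prints the wall-clock time of one run of the command in seconds, with
# LIB preloaded. An empty LIB means no preload (the default allocator).
# The command's own output is discarded.
time_with_malloc() {
  local lib=$1 start end
  shift
  start=$(date +%s.%N)
  if [ -n "$lib" ]; then
    LD_PRELOAD="$lib" "$@" >/dev/null 2>&1
  else
    "$@" >/dev/null 2>&1
  fi
  end=$(date +%s.%N)
  awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# Example (placeholder paths):
# for lib in "" /path/to/libjemalloc.so /path/to/libtcmalloc.so.4; do
#   printf '%s\t%s\n' "${lib:-default}" "$(time_with_malloc "$lib" python program.py)"
# done
```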