Hwloc tutorial

Welcome to the hwloc tutorial! The presentation slides that accompany this tutorial are also available.

Installation | Tools | API


Part 0: installation

hwloc is available pre-compiled in many Linux distributions. Otherwise, Windows binaries and the source code can be downloaded from the Open MPI hwloc project website. Building from source follows the usual procedure for free software:

./configure
... check at the end that the summary shows the features you want to see enabled. For this tutorial PCI support will be useful.
make
sudo make install

If you do not have administration rights for the make install step, you can pass e.g. --prefix=$HOME/install to ./configure, run make install without sudo, and then set the following variables in your working shell:

export PATH=$PATH:$HOME/install/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/install/lib
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$HOME/install/lib/pkgconfig
export MANPATH=$MANPATH:$HOME/install/share/man

You can also go through Part 1 without installing hwloc: simply run the tools from ./utils/


Part 1: command-line tools

lstopo

lstopo renders the topology of the machine, as discovered by hwloc. It is an intuitive way to see what the machine contains.

There are two main modes: textual rendering and graphical rendering.

hwloc-bind

hwloc-bind binds a process to a given CPU set. For instance,

hwloc-bind core:1 -- sh

will start a new shell, bound to logical core 1.

lstopo --ps

conveniently shows the bound processes inside the lstopo output.

hwloc-bind --pid 1234 core:2

will bind an existing process (with pid 1234) to logical core 2. The accepted location syntax is detailed in man hwloc-bind.

Another way to observe the binding is to bind lstopo itself:

hwloc-bind core:1 -- lstopo --pid 0

With the --pid 0 option, lstopo shows in green the set of processors it is bound to. This makes it easy to check that you have understood the CPU set specification.

hwloc-calc

This tool takes the same input as hwloc-bind, but shows the resulting cpuset instead of binding a process. This can be used to make advanced cpuset computations.

hwloc-assembler

This tool builds network topologies from single-machine topologies. Try for instance:

lstopo out.xml
hwloc-assembler out2.xml out.xml out.xml
lstopo --input out2.xml

This builds a network of two machines like yours.


Part 2: API

My first hwloc program

This is a very simple hwloc example (to be saved as mytest.c):

#include <hwloc.h>
#include <stdio.h>

int main(void) {
  hwloc_topology_t topology;
  int nbcores;
  
  hwloc_topology_init(&topology);  // initialization
  hwloc_topology_load(topology);   // actual detection
  
  nbcores = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE);
  printf("%d cores\n", nbcores);

  hwloc_topology_destroy(topology);

  return 0;
}

It is essentially the same as hwloc-calc --number-of core all. The simplest way to compile it is gcc mytest.c -o mytest -lhwloc, but depending on the installation this may not work. The preferred way is:

gcc mytest.c -o mytest $(pkg-config --cflags hwloc) $(pkg-config --libs hwloc)

Or better, using the following Makefile:

CFLAGS += $(shell pkg-config --cflags hwloc)
LDLIBS += $(shell pkg-config --libs hwloc)

all: mytest

and simply running make. If pkg-config does not find hwloc.pc, make sure you have set PKG_CONFIG_PATH as described in Part 0.

Check that the program runs fine. If it does not find libhwloc.so, make sure you have set LD_LIBRARY_PATH as described in Part 0.

Traversals

For the following exercises, it will be useful to have the manpages at hand. Make sure for instance that man hwloc_obj works. If it does not, make sure you have set MANPATH as described in Part 0.

Binding

Once the target object is found, binding to it is very easy:

hwloc_set_cpubind(t, obj->cpuset, 0)

to bind the process (assumed to be single-threaded), or

hwloc_set_cpubind(t, obj->cpuset, HWLOC_CPUBIND_THREAD)

to bind only the current thread, or

hwloc_set_cpubind(t, obj->cpuset, HWLOC_CPUBIND_PROCESS)

to bind the whole process (which can be multithreaded).