Portable Hardware Locality (hwloc) Documentation: v2.11.1

Distributing items over a topology

Enumerations

enum  hwloc_distrib_flags_e { HWLOC_DISTRIB_FLAG_REVERSE }
 

Functions

static int hwloc_distrib (hwloc_topology_t topology, hwloc_obj_t *roots, unsigned n_roots, hwloc_cpuset_t *set, unsigned n, int until, unsigned long flags)
 

Detailed Description

Enumeration Type Documentation

◆ hwloc_distrib_flags_e

Flags to be given to hwloc_distrib().

Enumerator
HWLOC_DISTRIB_FLAG_REVERSE 

Distribute in reverse order, starting from the last objects.

Function Documentation

◆ hwloc_distrib()

static int hwloc_distrib ( hwloc_topology_t  topology,
hwloc_obj_t *  roots,
unsigned  n_roots,
hwloc_cpuset_t *  set,
unsigned  n,
int  until,
unsigned long  flags 
)
inline, static

Distribute n items over the topology under roots.

Array set will be filled with n cpusets recursively distributed linearly over the topology under objects roots, down to depth until (which can be INT_MAX to distribute down to the finest level).
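The "recursively distributed linearly" behaviour can be pictured with a small standalone model. The sketch below is an illustration only, not hwloc's code: it assumes that, at each level, the n items are split between sibling subtrees in proportion to the number of PUs under each one, with rounding arranged so that the chunks always sum to n. The helper name split_items and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical model (not part of hwloc): split n items between n_roots
 * sibling subtrees in proportion to their PU counts (weights[]).
 * chunks[i] receives the number of items given to subtree i. */
static void split_items(const unsigned *weights, size_t n_roots,
                        unsigned n, unsigned *chunks)
{
    unsigned total = 0, given_weight = 0;
    size_t i;

    for (i = 0; i < n_roots; i++)
        total += weights[i];

    for (i = 0; i < n_roots; i++) {
        unsigned before, after;

        if (!total) {           /* no PUs at all: nothing to hand out */
            chunks[i] = 0;
            continue;
        }
        /* Items handed out before and after this subtree, rounded up,
         * so that the running total reaches exactly n at the end. */
        before = (given_weight * n + total - 1) / total;
        given_weight += weights[i];
        after = (given_weight * n + total - 1) / total;
        chunks[i] = after - before;
    }
}
```

Under this model, two subtrees of 4 PUs each and n = 3 give 2 items to the first subtree and 1 to the second. Each subtree would then repeat the same split among its own children until it reaches depth until or has at most one item left to place.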

n_roots is usually 1 and roots only contains the topology root object so as to distribute over the entire topology.

This is typically useful when an application wants to distribute n threads over a machine, giving each of them as much private cache as possible while keeping consecutively numbered threads close to each other.

The caller may typically want to also call hwloc_bitmap_singlify() before binding a thread so that it does not move at all.

flags should be 0 or an OR'ed set of hwloc_distrib_flags_e values.

Returns
0 on success, -1 on error.
Note
On hybrid CPUs (or asymmetric platforms), distribution may be suboptimal since the number of cores or PUs inside packages or below caches may vary (the top-down recursive partitioning ignores these numbers until reaching their levels). Hence it is recommended to distribute only inside a single homogeneous domain. For instance, on a CPU with energy-efficient E-cores and high-performance P-cores, one should distribute N tasks on E-cores and M tasks on P-cores separately instead of trying to distribute M+N tasks directly over the entire CPU.
This function requires the roots objects to have a CPU set.