
Portable Hardware Locality (hwloc) Documentation: v2.0.4

Upgrading to the hwloc 2.0 API

See the FAQ entry "How do I handle ABI breaks and API upgrades?" for detecting the hwloc version that you are compiling and/or running against.
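The version check described in that FAQ entry comes down to comparing the major number of the API version the program was compiled against (the HWLOC_API_VERSION macro) with the major number of the library found at run time (hwloc_get_api_version()). The following is a minimal sketch. It assumes the 0xMMmmrr encoding used by values such as 0x20000 for 2.0, and it takes both versions as parameters so it compiles without hwloc. The helper names api_major(), api_minor() and api_compatible() are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helpers, assuming the 0xMMmmrr API version encoding
 * (major in bits 16-23, minor in bits 8-15, release in bits 0-7). */
unsigned api_major(unsigned version) { return (version >> 16) & 0xff; }
unsigned api_minor(unsigned version) { return (version >> 8) & 0xff; }

/* Returns 1 when compile-time and run-time major numbers match, else 0.
 * In real code, pass HWLOC_API_VERSION and hwloc_get_api_version(). */
int api_compatible(unsigned compiled, unsigned runtime)
{
  if ((compiled >> 16) != (runtime >> 16)) {
    fprintf(stderr, "compiled for hwloc API 0x%x but running on 0x%x\n",
            compiled, runtime);
    return 0;
  }
  return 1;
}
```

A program built against 2.0 headers but loading a 2.1 library passes this check. One built against 1.x headers does not.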

New Organization of NUMA nodes and Memory

Memory children

In hwloc v1.x, NUMA nodes were inside the tree: for instance, a Package could contain 2 NUMA nodes, each of which contained an L3 and several other caches.

Starting with hwloc v2.0, NUMA nodes are not in the main tree anymore. They are attached under objects as Memory children, alongside normal children. This memory children list starts at obj->memory_first_child and its size is obj->memory_arity. Since it is a list, an object may now have several local NUMA nodes, for instance on Intel Xeon Phi processors.

The normal list of children (starting at obj->first_child, ending at obj->last_child, of size obj->arity, and available as the array obj->children) now only contains CPU-side objects: PUs, Cores, Packages, Caches, Groups, Machine and System. hwloc_get_next_child() may still be used to iterate over all children of all lists.

Hence the CPU-side hierarchy is built using normal children, while memory is attached to that hierarchy depending on its affinity.
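The toy model below makes the two lists concrete by rebuilding the first example that follows. The struct toy_obj type is a hypothetical, stripped-down stand-in that only mimics the field names discussed above (first_child, arity, memory_first_child, memory_arity, next_sibling). It is not the real struct hwloc_obj, and Cores are omitted to keep the sketch short.

```c
#include <stddef.h>

/* Toy stand-in for a few hwloc_obj fields (NOT the real struct). */
struct toy_obj {
  const char *name;
  struct toy_obj *next_sibling;
  struct toy_obj *first_child;        /* normal (CPU-side) children */
  unsigned arity;
  struct toy_obj *memory_first_child; /* memory children (NUMA nodes) */
  unsigned memory_arity;
};

/* Count the objects in a sibling list starting at 'first'. */
unsigned count_list(const struct toy_obj *first)
{
  unsigned n = 0;
  for (; first; first = first->next_sibling)
    n++;
  return n;
}

/* Machine with two Packages (normal children) and one NUMA node
 * (memory child). Static storage keeps the sketch short. */
struct toy_obj *build_uma_example(void)
{
  static struct toy_obj numa = { "NUMANode L#0", NULL, NULL, 0, NULL, 0 };
  static struct toy_obj pkg1 = { "Package L#1", NULL, NULL, 0, NULL, 0 };
  static struct toy_obj pkg0 = { "Package L#0", &pkg1, NULL, 0, NULL, 0 };
  static struct toy_obj machine = { "Machine", NULL, &pkg0, 2, &numa, 1 };
  return &machine;
}
```

Walking first_child only ever reaches the two Packages. The NUMA node can only be reached through memory_first_child.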

Examples

  • a UMA machine with 2 packages and a single NUMA node is now modeled as a "Machine" object with two "Package" children and one "NUMANode" memory child (displayed first in lstopo below):

    Machine (1024MB total)
      NUMANode L#0 (P#0 1024MB)
      Package L#0
        Core L#0 + PU L#0 (P#0)
        Core L#1 + PU L#1 (P#1)
      Package L#1
        Core L#2 + PU L#2 (P#2)
        Core L#3 + PU L#3 (P#3)
    

  • a machine with 2 packages with one NUMA node and 2 cores in each is now:

    Machine (2048MB total)
      Package L#0
        NUMANode L#0 (P#0 1024MB)
        Core L#0 + PU L#0 (P#0)
        Core L#1 + PU L#1 (P#1)
      Package L#1
        NUMANode L#1 (P#1 1024MB)
        Core L#2 + PU L#2 (P#2)
        Core L#3 + PU L#3 (P#3)
    

  • if there are two NUMA nodes per package, a Group object may be added to keep cores together with their local NUMA node:

    Machine (4096MB total)
      Package L#0
        Group0 L#0
          NUMANode L#0 (P#0 1024MB)
          Core L#0 + PU L#0 (P#0)
          Core L#1 + PU L#1 (P#1)
        Group0 L#1
          NUMANode L#1 (P#1 1024MB)
          Core L#2 + PU L#2 (P#2)
          Core L#3 + PU L#3 (P#3)
      Package L#1
        [...]
    

  • if the platform has L3 caches whose localities are identical to NUMA nodes, Groups aren't needed:

    Machine (4096MB total)
      Package L#0
        L3 L#0 (16MB)
          NUMANode L#0 (P#0 1024MB)
          Core L#0 + PU L#0 (P#0)
          Core L#1 + PU L#1 (P#1)
        L3 L#1 (16MB)
          NUMANode L#1 (P#1 1024MB)
          Core L#2 + PU L#2 (P#2)
          Core L#3 + PU L#3 (P#3)
      Package L#1
        [...]
    

NUMA level and depth

NUMA nodes are not in the main tree of Normal objects anymore. Hence, like I/O and Misc objects, they don't have a meaningful depth. They have a virtual (negative) depth (HWLOC_TYPE_DEPTH_NUMANODE) so that functions manipulating depths and levels still work, and so that one can still iterate over the level of NUMA nodes just like any other level.

For instance, one can still use lines such as:

int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE);
hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, 4);
hwloc_obj_t node = hwloc_get_next_obj_by_depth(topology, HWLOC_TYPE_DEPTH_NUMANODE, prev);

The NUMA depth should not be compared with other depths. Unmodified code that still compares NUMA and Package depths (to find out whether Packages contain NUMA nodes or the contrary) would now always assume that Packages contain NUMA nodes (because the NUMA depth is negative).

However, the depth of the Normal parents of NUMA nodes may be used instead. In the last example above, NUMA nodes are attached to L3 caches, hence one may compare the depths of Packages and L3 to find out that NUMA nodes are contained in Packages. This depth of parents may be retrieved with hwloc_get_memory_parents_depth(). Note that this function may return HWLOC_TYPE_DEPTH_MULTIPLE on future platforms if NUMA nodes are attached to different levels.
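The comparison based on memory parents can be sketched as a small decision function. This minimal sketch takes the two depths as plain integers. In real code they would presumably come from hwloc_get_type_depth(topology, HWLOC_OBJ_PACKAGE) and hwloc_get_memory_parents_depth(topology). TOY_DEPTH_MULTIPLE is a hypothetical placeholder for HWLOC_TYPE_DEPTH_MULTIPLE, whose actual value comes from hwloc.h.

```c
/* Hypothetical placeholder for HWLOC_TYPE_DEPTH_MULTIPLE; use the real
 * constant from hwloc.h in actual code. */
#define TOY_DEPTH_MULTIPLE (-2)

enum numa_relation { NUMA_INSIDE_PACKAGES, NUMA_ABOVE_PACKAGES, NUMA_MIXED };

/* Decide how NUMA nodes relate to Packages from the depth of the Package
 * level and the depth of the Normal objects NUMA nodes are attached to. */
enum numa_relation numa_vs_packages(int package_depth, int memparents_depth)
{
  if (memparents_depth == TOY_DEPTH_MULTIPLE)
    return NUMA_MIXED;            /* attached at several levels */
  if (memparents_depth >= package_depth)
    return NUMA_INSIDE_PACKAGES;  /* attached to Packages or below */
  return NUMA_ABOVE_PACKAGES;     /* attached to Machine, Group, ... */
}
```

With Machine at depth 0, Package at depth 1 and L3 at depth 2, as in the examples above, the first example yields NUMA_ABOVE_PACKAGES. The second and last examples yield NUMA_INSIDE_PACKAGES.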

Finding Local NUMA nodes and looking at Children and Parents

Applications that walked up/down to find NUMANode parents/children must now be updated. Instead of looking directly for a NUMA node, one should now look for an object that has some memory children. NUMA node(s) will be attached there. For instance, when looking for a NUMA node above a given Core object core:

hwloc_obj_t parent = core->parent;
while (parent && !parent->memory_arity)
  parent = parent->parent; /* no memory child, walk up */
if (parent) {
  /* use parent->memory_first_child (and its siblings if there are multiple local NUMA nodes) */
}
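The same walk can be extended to collect every local NUMA node, and the toy model below does this without hwloc. It assumes, as in hwloc, that the memory children are chained through next_sibling. The struct tobj type and local_numa_nodes() are hypothetical stand-ins. The builder creates a Core under an L3 under a Package that has two local NUMA nodes, loosely like the Intel Xeon Phi case mentioned earlier.

```c
#include <stddef.h>

/* Toy stand-in for the hwloc_obj fields used here (NOT the real struct). */
struct tobj {
  const char *name;
  struct tobj *parent;
  struct tobj *next_sibling;
  struct tobj *memory_first_child;
  unsigned memory_arity;
};

/* Walk up from obj's parent to the first ancestor with memory children and
 * store up to 'max' of its NUMA nodes in 'out'. Returns how many local NUMA
 * nodes exist there (possibly more than 'max'), or 0 if none was found. */
unsigned local_numa_nodes(struct tobj *obj, struct tobj **out, unsigned max)
{
  struct tobj *parent = obj->parent;
  struct tobj *node;
  unsigned n = 0;
  while (parent && !parent->memory_arity)
    parent = parent->parent; /* no memory child, walk up */
  if (!parent)
    return 0;
  for (node = parent->memory_first_child; node; node = node->next_sibling) {
    if (n < max)
      out[n] = node;
    n++;
  }
  return n;
}

/* Build Package -> L3 -> Core, with two NUMA nodes attached to the Package.
 * Returns the Core. */
struct tobj *build_example(void)
{
  static struct tobj pkg, numa0, numa1, l3, core;
  pkg   = (struct tobj){ "Package L#0", NULL, NULL, &numa0, 2 };
  numa0 = (struct tobj){ "NUMANode L#0", &pkg, &numa1, NULL, 0 };
  numa1 = (struct tobj){ "NUMANode L#1", &pkg, NULL, NULL, 0 };
  l3    = (struct tobj){ "L3 L#0", &pkg, NULL, NULL, 0 };
  core  = (struct tobj){ "Core L#0", &l3, NULL, NULL, 0 };
  return &core;
}
```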

The list of local NUMA nodes (usually a single one) is also described by the nodeset attribute of each object (which contains the physical indexes of these nodes). Iterating over the NUMA level is also an easy way to find local NUMA nodes:

hwloc_obj_t tmp = NULL;
while ((tmp = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_NUMANODE, tmp)) != NULL) {
  if (hwloc_bitmap_isset(obj->nodeset, tmp->os_index)) {
    /* tmp is a NUMA node local to obj, use it */
  }
}
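The nodeset test itself is just a bit lookup. The toy model below represents nodesets as 64-bit masks instead of hwloc_bitmap_t, where bit i stands for the NUMA node with OS index i, so it only handles indexes below 64. The names toy_nodeset_isset() and toy_local_nodes() are hypothetical.

```c
#include <stdint.h>

/* Toy nodeset lookup: like hwloc_bitmap_isset() on a 64-bit mask. */
int toy_nodeset_isset(uint64_t nodeset, unsigned os_index)
{
  return os_index < 64 && ((nodeset >> os_index) & 1u);
}

/* Scan the NUMA level (given as an array of OS indexes) and copy into 'out'
 * the indexes of nodes local to an object whose nodeset is 'obj_nodeset'.
 * Returns the number of local nodes stored. */
unsigned toy_local_nodes(uint64_t obj_nodeset, const unsigned *numa_level,
                         unsigned nr, unsigned *out)
{
  unsigned i, n = 0;
  for (i = 0; i < nr; i++)
    if (toy_nodeset_isset(obj_nodeset, numa_level[i]))
      out[n++] = numa_level[i];
  return n;
}
```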

Similarly, finding objects that are close to a given NUMA node should be updated too. Instead of looking at the NUMA node parents/children, one should now find a Normal parent above that NUMA node, and then look at its parents/children as usual:

hwloc_obj_t tmp = obj->parent;
while (hwloc_obj_type_is_memory(tmp->type)) /* the helper takes a type, not an object */
  tmp = tmp->parent;
/* now use tmp instead of obj */

To avoid such hwloc v2.x-specific and NUMA-specific cases in the code, a generic lookup for any kind of object, including NUMA nodes, might also be implemented by iterating over a level. For instance finding an object of type type which either contains or is included in object obj can be performed by traversing the level of that type and comparing CPU sets:

hwloc_obj_t tmp = NULL;
while ((tmp = hwloc_get_next_obj_by_type(topology, type, tmp)) != NULL) {
  if (hwloc_bitmap_intersects(tmp->cpuset, obj->cpuset)) {
    /* tmp matches, use it */
  }
}

This generic lookup works whether type and obj are Normal or Memory objects, since both kinds have CPU sets. Moreover, it is compatible with the hwloc v1.x API.
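hwloc_bitmap_intersects() also accepts partial overlaps. Within a single topology tree, overlapping CPU sets are always nested, so in practice it matches "contains or is included in". If partial overlaps are possible, a stricter variant checks inclusion in either direction; real code could use hwloc_bitmap_isincluded() for this. The toy below models CPU sets as 64-bit masks, and all helper names are hypothetical.

```c
#include <stdint.h>

/* Toy CPU sets as 64-bit masks (bit i = PU with OS index i). */
int toy_intersects(uint64_t a, uint64_t b) { return (a & b) != 0; }
int toy_isincluded(uint64_t sub, uint64_t super) { return (sub & ~super) == 0; }

/* Index of the first object of 'level' whose cpuset contains or is included
 * in 'obj_cpuset', or -1 if there is none. */
int toy_find_related(const uint64_t *level, int nr, uint64_t obj_cpuset)
{
  int i;
  for (i = 0; i < nr; i++)
    if (toy_isincluded(level[i], obj_cpuset) || toy_isincluded(obj_cpuset, level[i]))
      return i;
  return -1;
}
```

With two Packages 0x3 and 0xC, a Core 0x4 maps to the second Package. A partially overlapping set such as 0x6 intersects both Packages but is related to neither.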

4 Kinds of Objects and Children

I/O and Misc children

I/O children are not in the main object children list anymore either. They are in the list starting at obj->io_first_child, whose size is obj->io_arity.

Misc children are not in the main object children list anymore. They are in the list starting at obj->misc_first_child, whose size is obj->misc_arity.

See hwloc_obj for details about children lists.

hwloc_get_next_child() may still be used to iterate over all children of all lists.
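Conceptually, hwloc_get_next_child() chains the four lists. The toy re-implementation below is a sketch of that idea, assuming the lists are visited in the order normal, memory, I/O, Misc. The struct tnode type, toy_next_child() and the other helpers are hypothetical stand-ins, not hwloc code.

```c
#include <stddef.h>

/* Which list a child belongs to; values double as list indexes below. */
enum toy_kind { TOY_NORMAL, TOY_MEMORY, TOY_IO, TOY_MISC };

/* Toy stand-in for the hwloc_obj fields involved (NOT the real struct). */
struct tnode {
  enum toy_kind kind;
  struct tnode *next_sibling;
  struct tnode *first_child, *memory_first_child, *io_first_child, *misc_first_child;
};

/* Return the child after 'prev' (the first child if prev is NULL), moving
 * on to the next list whenever the current one is exhausted. */
struct tnode *toy_next_child(struct tnode *parent, struct tnode *prev)
{
  struct tnode *lists[4];
  struct tnode *obj;
  int state = 0;
  lists[0] = parent->first_child;
  lists[1] = parent->memory_first_child;
  lists[2] = parent->io_first_child;
  lists[3] = parent->misc_first_child;
  if (prev) {
    state = (int) prev->kind;   /* the list prev belongs to */
    obj = prev->next_sibling;
  } else {
    obj = lists[0];
  }
  while (!obj && state < 3)
    obj = lists[++state];       /* skip to the next (possibly empty) list */
  return obj;
}

/* Count all children of all lists. */
unsigned toy_count_children(struct tnode *parent)
{
  unsigned n = 0;
  struct tnode *c = NULL;
  while ((c = toy_next_child(parent, c)) != NULL)
    n++;
  return n;
}

/* A Package with two Cores, one NUMA node and one Misc child. */
struct tnode *toy_example(void)
{
  static struct tnode core0, core1, numa, misc, pkg;
  core1 = (struct tnode){ TOY_NORMAL, NULL, NULL, NULL, NULL, NULL };
  core0 = (struct tnode){ TOY_NORMAL, &core1, NULL, NULL, NULL, NULL };
  numa  = (struct tnode){ TOY_MEMORY, NULL, NULL, NULL, NULL, NULL };
  misc  = (struct tnode){ TOY_MISC, NULL, NULL, NULL, NULL, NULL };
  pkg   = (struct tnode){ TOY_NORMAL, NULL, &core0, &numa, NULL, &misc };
  return &pkg;
}
```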

Kinds of objects

Given the above, objects may now be of 4 kinds:

  • Normal (everything not listed below, including Machine, Package, Core, PU, CPU Caches, etc.);
  • Memory (currently only NUMA nodes), attached to parents as Memory children;
  • I/O (Bridges, PCI and OS devices), attached to parents as I/O children;
  • Misc objects, attached to parents as Misc children.

See hwloc_obj for details about children lists.

For a given object type, the kind may be found with hwloc_obj_type_is_normal(), hwloc_obj_type_is_memory(), hwloc_obj_type_is_io(), or by comparing with HWLOC_OBJ_MISC.

Normal and Memory objects have (non-NULL) CPU sets and nodesets, while I/O and Misc objects don't have any sets (they are NULL).

HWLOC_OBJ_CACHE replaced

Instead of a single HWLOC_OBJ_CACHE, there are now 8 types HWLOC_OBJ_L1CACHE, ..., HWLOC_OBJ_L5CACHE, HWLOC_OBJ_L1ICACHE, ..., HWLOC_OBJ_L3ICACHE.

Cache object attributes are unchanged.

hwloc_get_cache_type_depth() is not needed to disambiguate cache types anymore, since the new types can be passed to hwloc_get_type_depth() without ever getting HWLOC_TYPE_DEPTH_MULTIPLE.

hwloc_obj_type_is_cache(), hwloc_obj_type_is_dcache() and hwloc_obj_type_is_icache() may be used to check whether a given type is a cache, data/unified cache or instruction cache.
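When porting v1.x code that matched HWLOC_OBJ_CACHE objects by cache depth and cache type, the mapping to the new types can be spelled out as a lookup. The helper below is hypothetical and returns type names as strings purely for illustration. It assumes that data and unified caches map to HWLOC_OBJ_L1CACHE through HWLOC_OBJ_L5CACHE, and instruction caches to HWLOC_OBJ_L1ICACHE through HWLOC_OBJ_L3ICACHE.

```c
#include <stddef.h>

/* Hypothetical migration helper: map a v1.x cache level (attr->cache.depth)
 * and an "is instruction cache" flag to the name of the hwloc 2.0 type.
 * Returns NULL when 2.0 has no such type (e.g. an L4 instruction cache). */
const char *v2_cache_type_name(unsigned level, int is_icache)
{
  static const char *const dcache[] = { "HWLOC_OBJ_L1CACHE", "HWLOC_OBJ_L2CACHE",
    "HWLOC_OBJ_L3CACHE", "HWLOC_OBJ_L4CACHE", "HWLOC_OBJ_L5CACHE" };
  static const char *const icache[] = { "HWLOC_OBJ_L1ICACHE", "HWLOC_OBJ_L2ICACHE",
    "HWLOC_OBJ_L3ICACHE" };
  if (level == 0)
    return NULL;
  if (is_icache)
    return level <= 3 ? icache[level - 1] : NULL;
  return level <= 5 ? dcache[level - 1] : NULL;  /* data or unified */
}
```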

allowed_cpuset and allowed_nodeset only in the main topology

Objects do not have allowed_cpuset and allowed_nodeset anymore. They are only available for the entire topology using hwloc_topology_get_allowed_cpuset() and hwloc_topology_get_allowed_nodeset().

As usual, those are only needed when the WHOLE_SYSTEM topology flag is given, which means disallowed objects are kept in the topology. If so, one may find out whether some PUs inside an object are allowed by checking

hwloc_bitmap_intersects(obj->cpuset, hwloc_topology_get_allowed_cpuset(topology))

For NUMA nodes, use nodesets and hwloc_topology_get_allowed_nodeset() instead of cpusets. To find out which PUs (or NUMA nodes) are allowed, replace hwloc_bitmap_intersects() with hwloc_bitmap_and() to get the actual intersection.
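Plain masks show the difference between the two checks: whether any PU is allowed, and which PUs are allowed. In this toy, 64-bit integers stand in for obj->cpuset and for the result of hwloc_topology_get_allowed_cpuset(). The function names are hypothetical.

```c
#include <stdint.h>

/* Like hwloc_bitmap_intersects(obj->cpuset, allowed): is any PU allowed? */
int toy_some_pu_allowed(uint64_t obj_cpuset, uint64_t allowed_cpuset)
{
  return (obj_cpuset & allowed_cpuset) != 0;
}

/* Like hwloc_bitmap_and(result, obj->cpuset, allowed): which PUs are allowed? */
uint64_t toy_allowed_pus(uint64_t obj_cpuset, uint64_t allowed_cpuset)
{
  return obj_cpuset & allowed_cpuset;
}
```

For an object covering PUs 0-3 (0xF) and an allowed set of 0x5, PUs 0 and 2 are allowed. An object covering only PUs 1 and 3 (0xA) has no allowed PU.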

Object depths are now signed int

obj->depth as well as depths given to functions such as hwloc_get_obj_by_depth() or returned by hwloc_topology_get_depth() are now signed int.

Other depths, such as the cache-specific depth attribute, are still unsigned.
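One practical consequence of this change is visible in v1.x code that stored depths in unsigned variables. Such code would see negative virtual depths wrap around to huge values. The toy below illustrates this. TOY_VIRTUAL_DEPTH is a hypothetical placeholder for a constant such as HWLOC_TYPE_DEPTH_NUMANODE, whose actual value is defined in hwloc.h.

```c
#include <limits.h> /* UINT_MAX, for checking the wrapped value */

/* Hypothetical placeholder for a virtual depth such as HWLOC_TYPE_DEPTH_NUMANODE. */
#define TOY_VIRTUAL_DEPTH (-3)

/* v1.x-style storage: a negative depth wraps around when made unsigned. */
unsigned depth_as_unsigned(int depth) { return (unsigned) depth; }

/* v2.0-style check: negative depths denote virtual levels (NUMA, I/O, Misc). */
int is_virtual_depth(int depth) { return depth < 0; }
```

Stored as unsigned, the virtual depth becomes UINT_MAX - 2 and compares deeper than any real level. Keeping depths in int avoids this.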

Memory attributes become NUMANode-specific

Memory attributes such as obj->memory.local_memory are now only available in NUMANode-specific attributes in obj->attr->numanode.local_memory.

obj->memory.total_memory is available in all objects as obj->total_memory.

See hwloc_obj_attr_u::hwloc_numanode_attr_s and hwloc_obj for details.
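obj->total_memory is the amount of memory in the NUMA nodes below an object. It can therefore be recomputed from the NUMANode-specific local_memory attributes by walking both the memory and the normal children lists. The toy tree below rebuilds the second example from the start of this page, where each Package has a 1024MB NUMA node and Cores are omitted. The struct mobj type is a hypothetical stand-in, not struct hwloc_obj.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy object: local_memory is nonzero only for NUMA nodes. */
struct mobj {
  uint64_t local_memory;
  struct mobj *first_child, *memory_first_child, *next_sibling;
};

/* Sum the local memory of all NUMA nodes in obj's subtree, looking at both
 * memory children and normal children. */
uint64_t toy_total_memory(const struct mobj *obj)
{
  uint64_t total = obj->local_memory;
  const struct mobj *c;
  for (c = obj->memory_first_child; c; c = c->next_sibling)
    total += toy_total_memory(c);
  for (c = obj->first_child; c; c = c->next_sibling)
    total += toy_total_memory(c);
  return total;
}

/* Machine -> two Packages, each with one 1024MB NUMA memory child. */
struct mobj *toy_example2(void)
{
  static struct mobj numa0, numa1, pkg0, pkg1, machine;
  const uint64_t MB = 1024 * 1024;
  numa0 = (struct mobj){ 1024 * MB, NULL, NULL, NULL };
  numa1 = (struct mobj){ 1024 * MB, NULL, NULL, NULL };
  pkg1 = (struct mobj){ 0, NULL, &numa1, NULL };
  pkg0 = (struct mobj){ 0, NULL, &numa0, &pkg1 };
  machine = (struct mobj){ 0, &pkg0, NULL, NULL };
  return &machine;
}
```

This reproduces the "Machine (2048MB total)" line that lstopo prints for that example.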

Topology configuration changes

The old ignoring API as well as several configuration flags are replaced with the new filtering API, see hwloc_topology_set_type_filter() and its variants, and hwloc_type_filter_e for details.

  • hwloc_topology_ignore_type(), hwloc_topology_ignore_type_keep_structure() and hwloc_topology_ignore_all_keep_structure() are respectively superseded by

    hwloc_topology_set_type_filter(topology, type, HWLOC_TYPE_FILTER_KEEP_NONE);
    hwloc_topology_set_type_filter(topology, type, HWLOC_TYPE_FILTER_KEEP_STRUCTURE);
    hwloc_topology_set_all_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_STRUCTURE);
    

    Also, the meaning of KEEP_STRUCTURE has changed (only entire levels may be ignored, instead of single objects); the old behavior is not available anymore.

  • HWLOC_TOPOLOGY_FLAG_ICACHES is superseded by

    hwloc_topology_set_icache_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_ALL);
    

  • HWLOC_TOPOLOGY_FLAG_WHOLE_IO, HWLOC_TOPOLOGY_FLAG_IO_DEVICES and HWLOC_TOPOLOGY_FLAG_IO_BRIDGES are replaced.

    To keep all I/O devices (PCI, Bridges, and OS devices), use:

    hwloc_topology_set_io_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_ALL);
    

    To only keep important devices (Bridges with children, common PCI devices and OS devices):

    hwloc_topology_set_io_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_IMPORTANT);
    

XML changes

2.0 XML files are not compatible with 1.x

2.0 can load 1.x files, but only NUMA distances are imported. Other distance matrices are ignored (they were never used by default anyway).

2.0 can export 1.x-compatible files, but only distances attached to the root object are exported (i.e. distances that cover the entire machine). Other distance matrices are dropped (they were never used by default anyway).

Users are advised to negotiate hwloc versions between exporter and importer: if the importer isn't 2.x, the exporter should export to 1.x. Otherwise, things should work by default.

Hence hwloc_topology_export_xml() and hwloc_topology_export_xmlbuffer() have a new flags argument to force a hwloc-1.x-compatible XML export.

  • If both always support 2.0, don't pass any flag.
  • When the importer uses hwloc 1.x, export with HWLOC_TOPOLOGY_EXPORT_XML_FLAG_V1. Otherwise the importer will fail to import.
  • When the exporter uses hwloc 1.x, it cannot pass any flag, and a 2.0 importer can import without problem.
#if HWLOC_API_VERSION >= 0x20000
   if (need 1.x compatible XML export)
      hwloc_topology_export_xml(...., HWLOC_TOPOLOGY_EXPORT_XML_FLAG_V1);
   else /* need 2.x compatible XML export */
      hwloc_topology_export_xml(...., 0);
#else
   hwloc_topology_export_xml(....);
#endif

Additionally, hwloc_topology_diff_load_xml(), hwloc_topology_diff_load_xmlbuffer(), hwloc_topology_diff_export_xml(), hwloc_topology_diff_export_xmlbuffer() and hwloc_topology_diff_destroy() lost their first argument (topology), which isn't needed anymore.

Distances API totally rewritten

The new distances API is in hwloc/distances.h.

Distances are not accessible directly from objects anymore. One should first call hwloc_distances_get() (or a variant) to retrieve distances, possibly with one call to get the number of available distances structures and another call to actually get them. The application may then consult these structures, and finally release them with hwloc_distances_release().

The set of objects involved in a distances structure is specified by an array of objects; it does not necessarily cover the entire machine.
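The count-then-fetch pattern can be sketched without hwloc. The toy getter below mimics the convention where *nr holds the number of slots on input and the number of available structures on output. The struct toy_distances type, toy_distances_get() and the 10/20 matrix are made up for illustration. They only loosely follow struct hwloc_distances_s, whose values array stores the distance from objs[i] to objs[j] at index i*nbobjs+j.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy distances structure: 'values' is an nbobjs x nbobjs matrix stored row
 * by row; objects are plain integers here instead of hwloc_obj_t. */
struct toy_distances {
  unsigned nbobjs;
  const int *objs;
  const uint64_t *values;
};

static const int toy_objs[2] = { 0, 1 };
static const uint64_t toy_values[4] = { 10, 20, 20, 10 };
static const struct toy_distances toy_available[1] = { { 2, toy_objs, toy_values } };

/* Hypothetical getter: fills at most *nr slots of 'out', then sets *nr to
 * the number of structures available (possibly larger than the input). */
int toy_distances_get(unsigned *nr, const struct toy_distances **out)
{
  unsigned i, avail = 1;
  for (i = 0; i < *nr && i < avail; i++)
    out[i] = &toy_available[i];
  *nr = avail;
  return 0;
}

/* Distance from the i-th to the j-th object of a structure. */
uint64_t toy_distance(const struct toy_distances *d, unsigned i, unsigned j)
{
  return d->values[i * d->nbobjs + j];
}

/* Two-call pattern: query the count, allocate, fetch, consult, release. */
uint64_t toy_first_remote_distance(void)
{
  unsigned nr = 0;
  const struct toy_distances **array;
  uint64_t result;
  toy_distances_get(&nr, NULL);     /* first call: only count */
  if (!nr)
    return 0;
  array = malloc(nr * sizeof(*array));
  if (!array)
    return 0;
  toy_distances_get(&nr, array);    /* second call: fetch */
  result = toy_distance(array[0], 0, 1);
  free(array);                      /* real code also releases each structure */
  return result;
}
```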

Return values of functions

Bitmap functions (and a couple of other functions) can return errors (in theory).

Most bitmap functions may have to reallocate the internal bitmap storage. In v1.x, they could crash if realloc failed. In v2.0, they return an int that is negative on error. However, the preallocated storage is 512 bits, hence realloc will not even be used unless you run hwloc on machines with larger PU or NUMA node indexes.

hwloc_obj_add_info(), hwloc_cpuset_from_nodeset() and hwloc_cpuset_to_nodeset() also return an int, which is -1 in case of allocation errors.
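The reason a bitmap setter can fail is that setting a high index may require growing the storage. The toy bitmap below is not hwloc's implementation. It only illustrates a setter that reallocates and returns -1 when realloc() fails (leaving the old storage intact), together with a caller that checks the result as hwloc 2.0 callers should. All names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Toy growable bitmap (NOT hwloc's implementation). */
struct toy_bitmap {
  unsigned long *ulongs;
  unsigned nr_ulongs;
};

#define TOY_BITS_PER_ULONG (8 * sizeof(unsigned long))

/* Set bit 'index'; returns 0 on success, -1 if growing the storage failed. */
int toy_bitmap_set(struct toy_bitmap *b, unsigned index)
{
  unsigned needed = index / TOY_BITS_PER_ULONG + 1;
  if (needed > b->nr_ulongs) {
    unsigned long *tmp = realloc(b->ulongs, needed * sizeof(*tmp));
    if (!tmp)
      return -1;                /* old storage is still valid */
    memset(tmp + b->nr_ulongs, 0, (needed - b->nr_ulongs) * sizeof(*tmp));
    b->ulongs = tmp;
    b->nr_ulongs = needed;
  }
  b->ulongs[index / TOY_BITS_PER_ULONG] |= 1UL << (index % TOY_BITS_PER_ULONG);
  return 0;
}

int toy_bitmap_isset(const struct toy_bitmap *b, unsigned index)
{
  unsigned i = index / TOY_BITS_PER_ULONG;
  return i < b->nr_ulongs && ((b->ulongs[i] >> (index % TOY_BITS_PER_ULONG)) & 1UL);
}

/* Caller that checks the return value; returns 1 if bit 1000 was set. */
int toy_bitmap_demo(void)
{
  struct toy_bitmap b = { NULL, 0 };
  int ok;
  if (toy_bitmap_set(&b, 1000) < 0)
    return -1;                  /* allocation failure reported to caller */
  ok = toy_bitmap_isset(&b, 1000) && !toy_bitmap_isset(&b, 999);
  free(b.ulongs);
  return ok;
}
```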

Misc API changes

API removals and deprecations

  • HWLOC_OBJ_SYSTEM removed: The root object is always HWLOC_OBJ_MACHINE.

  • _membind_nodeset() memory binding interfaces deprecated: One should use the variant without _nodeset suffix and pass the HWLOC_MEMBIND_BYNODESET flag.

  • HWLOC_MEMBIND_REPLICATE removed: no supported operating system supports it anymore.

  • hwloc_obj_snprintf() removed because it was long-deprecated by hwloc_obj_type_snprintf() and hwloc_obj_attr_snprintf().

  • hwloc_obj_type_sscanf() deprecated, hwloc_obj_type_of_string() removed.

  • hwloc_cpuset_from/to_nodeset_strict() deprecated: Now useless since all topologies are NUMA. Use the variant without the _strict suffix.

  • hwloc_distribute() and hwloc_distributev() removed; they had been deprecated by hwloc_distrib().

  • The Custom interface (hwloc_topology_set_custom(), etc.) was removed, as well as the corresponding command-line tools (hwloc-assembler, etc.). Topologies always start with objects with valid cpusets and nodesets.

  • obj->online_cpuset removed: Offline PUs are simply listed in the complete_cpuset as previously.

  • obj->os_level removed.