This FAQ is for Open MPI v4.x and earlier.
If you are looking for documentation for Open MPI v5.x and later, please visit docs.open-mpi.org.
Table of contents:
- What versions of Open MPI contain support for uDAPL?
- What is different between Sun Microsystems ClusterTools 7 and Open MPI in regards to the uDAPL BTL?
- What values are expected to be used by the btl_udapl_if_include and btl_udapl_if_exclude MCA parameters?
- Where is the static uDAPL Registry found?
- How come the value reported by ifconfig is not accepted by the btl_udapl_if_include / btl_udapl_if_exclude MCA parameter?
- I get a warning message about not being able to register memory and possibly out of privileged memory while running on Solaris; what can I do?
1. What versions of Open MPI contain support for uDAPL?
The following versions of Open MPI contain support for uDAPL:
Open MPI series | uDAPL supported
v1.0 series | No
v1.1 series | No
v1.2 series | Yes
v1.3 / v1.4 series | Yes
v1.5 / v1.6 series | Yes
v1.7 and beyond | No
2. What is different between Sun Microsystems ClusterTools 7 and Open MPI in regards to the uDAPL BTL?
Sun's ClusterTools is based on Open MPI, with one significant
difference: the uDAPL BTL in Sun's ClusterTools includes RDMA
capabilities, whereas the uDAPL BTL in Open MPI v1.2 does not.
These improvements exist today on the Open MPI main branch
and will be included in future Open MPI releases.
3. What values are expected to be used by the btl_udapl_if_include and btl_udapl_if_exclude MCA parameters?
The uDAPL BTL looks for a match in the uDAPL static registry, which is contained in the dat.conf file. Each line that is neither blank nor a comment is treated as an interface entry. The first field of each entry is the value that must be supplied to the MCA parameter in question.
Solaris Example:
shell% datadm -v
ibd0 u1.2 nonthreadsafe default udapl_tavor.so.1 SUNW.1.0 " " "driver_name=tavor"
shell% mpirun --mca btl_udapl_if_include ibd0 ...
Linux Example:
shell% cat /etc/dat.conf
OpenIB-cma u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so dapl.1.2 "ib0 0" ""
OpenIB-bond u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so dapl.1.2 "bond0 0" ""
shell% mpirun --mca btl_udapl_if_exclude OpenIB-bond ...
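The rule above (the first field of every line that is neither blank nor a comment) can be sketched as a small shell helper that lists the names accepted by btl_udapl_if_include and btl_udapl_if_exclude. The function name list_udapl_entries is hypothetical, and treating '#' as the comment marker is an assumption of this sketch; it is not part of Open MPI.

```shell
# List the entry names (first field) found in a uDAPL static registry file.
# Blank lines are skipped, as are lines whose first field starts with '#'
# (assumed here to be the comment marker).
list_udapl_entries() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}
```

Run against the Linux registry shown above, for instance as `list_udapl_entries /etc/dat.conf`, this would print OpenIB-cma and OpenIB-bond, one per line.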
4. Where is the static uDAPL Registry found?
Solaris: /etc/dat/dat.conf
Linux: /etc/dat.conf
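For scripts that run on both platforms, the two locations can be selected from the kernel name. This is only a sketch: udapl_registry_path is a hypothetical helper, and it assumes `uname -s` reports SunOS on Solaris and Linux on Linux.

```shell
# Print the default location of the uDAPL static registry for a platform.
# $1 is a kernel name as printed by `uname -s`; defaults to the local system.
udapl_registry_path() {
    os="${1:-$(uname -s)}"
    case "$os" in
        SunOS) echo /etc/dat/dat.conf ;;
        Linux) echo /etc/dat.conf ;;
        *)     echo "unknown platform: $os" >&2; return 1 ;;
    esac
}
```

Passing the kernel name explicitly, as in `udapl_registry_path SunOS`, makes it possible to look up the path for a remote node's platform as well.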
5. How come the value reported by ifconfig is not accepted by the btl_udapl_if_include / btl_udapl_if_exclude MCA parameter?
uDAPL queries a static registry, defined in the dat.conf file, to find the interfaces that are available for use. The uDAPL BTL therefore matches against the names found in that registry, and these may differ from the names reported by ifconfig.
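One way to see the difference is to test a candidate name against the first field of each registry entry before passing it to mpirun. The helper below is a minimal sketch; is_udapl_entry is a hypothetical name, and '#' is assumed to start a comment line.

```shell
# Succeed (exit status 0) if NAME ($1) is an entry name, i.e. the first
# field of a non-blank, non-comment line, in the registry file $2.
is_udapl_entry() {
    awk -v name="$1" '
        NF && $1 !~ /^#/ && $1 == name { found = 1 }
        END { exit !found }
    ' "$2"
}
```

With a registry such as the Linux example in question 3, `is_udapl_entry ib0 /etc/dat.conf` fails because ib0 appears only inside a later field, while `is_udapl_entry OpenIB-cma /etc/dat.conf` succeeds.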
6. I get a warning message about not being able to register memory and possibly out of privileged memory while running on Solaris; what can I do?
The error message probably looks something like this:
WARNING: The uDAPL BTL is not able to register memory. Possibly out of
allowed privileged memory (i.e. memory that can be pinned). Increasing
the allowed privileged memory may alleviate this issue.
One remedy is to increase the amount of available privileged
memory. On Solaris, your system administrator can do this by
editing the /etc/project file on the
nodes. For more information, see the Solaris project man page.
As an example of increasing the privileged memory, first determine the
amount currently available (a typical value is 978 MB):
shell% prctl -n project.max-device-locked-memory -i project default
NAME    PRIVILEGE       VALUE    FLAG   ACTION       RECIPIENT
project.max-device-locked-memory
        privileged      978MB      -    deny                 -
        system          16.0EB   max    deny                 -
To increase the amount of privileged memory, edit the /etc/project file.
The default /etc/project file looks like this:
system:0::::
user.root:1::::
noproject:2::::
default:3::::
group.staff:10::::
To raise the limit to, for example, 4 GB, change the default entry:
system:0::::
user.root:1::::
noproject:2::::
default:3::::project.max-device-locked-memory=(priv, 4294967296, deny)
group.staff:10::::
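The value 4294967296 in the example equals 4 x 1024 x 1024 x 1024, which suggests the limit is written in bytes. As a rough sketch, a value for a different size could be computed as follows; gb_to_bytes is a hypothetical helper, and it assumes a shell with 64-bit integer arithmetic.

```shell
# Convert a whole number of gigabytes to bytes, taking 1 GB = 1024^3 bytes.
gb_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}
```

For example, `gb_to_bytes 4` reproduces the number used in the /etc/project entry for the default project.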