This FAQ is for Open MPI v4.x and earlier.
If you are looking for documentation for Open MPI v5.x and later, please visit docs.open-mpi.org.
Table of contents:
- What Myrinet-based components does Open MPI have?
- How do I specify to use the Myrinet GM network for MPI messages?
- How do I specify to use the Myrinet MX network for MPI messages?
- But wait — I also have a TCP network. Do I need to explicitly disable the TCP BTL?
- How do I know what MCA parameters are available for tuning MPI performance?
- I'm experiencing a problem with Open MPI on my Myrinet-based network; how do I troubleshoot and get help?
- How do I adjust the MX first fragment size? Are there constraints?
1. What Myrinet-based components does Open MPI have?
Some versions of Open MPI support both GM and MX for MPI
communications.
Open MPI series    | GM supported | MX supported
-------------------|--------------|-------------------
v1.0 series        | Yes          | Yes
v1.1 series        | Yes          | Yes
v1.2 series        | Yes          | Yes (BTL and MTL)
v1.3 / v1.4 series | Yes          | Yes (BTL and MTL)
v1.5 / v1.6 series | No           | Yes (BTL and MTL)
v1.7 / v1.8 series | No           | Yes (MTL only)
v1.10 and beyond   | No           | No
2. How do I specify to use the Myrinet GM network for MPI messages?
In general, you specify that the gm BTL component should be used. Note that
you should also specify the self BTL component, which is used for loopback
communication (i.e., when an MPI process sends to itself); this is
technically a different communication channel than Myrinet. For example:
shell$ mpirun --mca btl gm,self ...
Failure to specify the self BTL may result in Open MPI being unable
to complete send-to-self scenarios (meaning that your program will run
fine until a process tries to send to itself).
To use Open MPI's shared memory support for on-host communication
instead of GM's shared memory support, simply include the sm BTL.
For example:
shell$ mpirun --mca btl gm,sm,self ...
Finally, note that if the gm component is available at run time, Open MPI
should automatically use it by default (and likewise for self and sm).
Hence, it's usually unnecessary to specify these options on the mpirun
command line; they are typically only used when you want to be absolutely
sure that a specific set of BTLs is used.
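If you want to check which BTL components are actually being considered at
run time, one option is to raise the BTL framework's verbosity. This is a
sketch only, assuming the btl_base_verbose MCA parameter is available in
your Open MPI build:
# Illustrative sketch: print BTL component selection details to stderr
# (assumes the btl_base_verbose MCA parameter exists in this build)
shell$ mpirun --mca btl gm,sm,self --mca btl_base_verbose 30 ...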
3. How do I specify to use the Myrinet MX network for MPI messages?
As of version 1.2, Open MPI has two different components to support
Myrinet MX: the mx BTL and the mx MTL, only one of which can be
used at a time. Prior versions only have the mx BTL.
If available, the mx BTL is used by default. However, to be sure that it is
selected, you can specify it explicitly. Note that you should also specify the
self BTL component (for loopback communication) and the sm BTL
component (for on-host communication). For example:
shell$ mpirun --mca btl mx,sm,self ...
To use the mx MTL component, it must be specified. Also, you must use
the cm PML component. For example:
shell$ mpirun --mca mtl mx --mca pml cm ...
Note that one cannot use both the mx MTL and the mx BTL components
at once. Deciding which to use largely depends on the application being
run.
4. But wait — I also have a TCP network. Do I need to explicitly disable the TCP BTL?
No. See this FAQ entry for more details.
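That said, if you want to exclude the TCP BTL explicitly anyway, Open MPI's
MCA selection syntax supports exclusion with a leading caret. A minimal
sketch:
# Exclude the TCP BTL; all other available BTLs remain eligible
shell$ mpirun --mca btl ^tcp ...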
5. How do I know what MCA parameters are available for tuning MPI performance?
The ompi_info command can display all the parameters
available for the gm and mx BTL components and the mx MTL component:
# Show the gm BTL parameters
shell$ ompi_info --param btl gm
# Show the mx BTL parameters
shell$ ompi_info --param btl mx
# Show the mx MTL parameters
shell$ ompi_info --param mtl mx
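Since these listings can be long, it can help to filter the output for a
specific parameter. For example, a small sketch that looks up the
first-fragment parameter discussed in question 7 below:
# Filter the mx BTL parameter list for the first-fragment size parameter
shell$ ompi_info --param btl mx | grep btl_mx_first_frag_size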
6. I'm experiencing a problem with Open MPI on my Myrinet-based network; how do I troubleshoot and get help?
In order for us to help you, it is most helpful if you can run a few steps
before sending an e-mail, both to perform some basic troubleshooting and to
provide us with enough information about your environment. Please include
answers to the following questions in your e-mail:
- Which Myricom software stack are you running: GM or MX? Which
version?
- Are you using "fma", the "gm_mapper", or the "mx_mapper"?
- If running GM, include the output from running gm_board_info
from a known "good" node and a known "bad" node.
If running MX, include the output from running mx_info from a known
"good" node and a known "bad" node. (A sketch of gathering this output
appears after this list.)
- Is the "Map version" value from this output the same across
all nodes?
- NOTE: If the map version
is not the same, ensure that you are not running a mixture of FMA on
some nodes and the mapper on others. Also check the connectivity of
nodes that seem to have an inconsistent map version.
- What are the contents of the file /var/run/fms/fma.log?
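A sketch of gathering this output on one node follows. The output file
names are hypothetical; gm_board_info and mx_info come from the Myricom
GM/MX software stacks, so run whichever matches your installation:
# Hypothetical file names; run on both a known "good" and a known "bad" node
shell$ mx_info > mx_info.$(hostname).txt                # MX installations
shell$ gm_board_info > gm_board_info.$(hostname).txt    # GM installations
shell$ cp /var/run/fms/fma.log fma.log.$(hostname).txt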
Gather up this information and see
this page about how to submit a help request to the user's mailing
list.
7. How do I adjust the MX first fragment size? Are there constraints?
The MX library limits the maximum message fragment size for
both on-node and off-node messages. As of MX v1.0.3, the inter-node
maximum fragment size is 32k and the intra-node maximum fragment size
is 16k; attempts to send fragments larger than these sizes will fail.
Open MPI automatically fragments large messages; it currently limits
its first fragment size on MX networks to the lower of these two
values (16k). As such, increasing the value of the MCA parameter
btl_mx_first_frag_size beyond 16k may cause failures in
some cases (e.g., when using MX to send large messages to processes on
the same node); it will cause failures in all cases if it is set above
32k.
Note that this only affects the first fragment of messages; subsequent
fragments do not have this size restriction. The MCA parameter
btl_mx_max_send_size can be used to vary the maximum size of
subsequent fragments.
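Both parameters can be set on the mpirun command line like any other MCA
parameter. This is a sketch only: the 16384-byte value mirrors the 16k
intra-node limit described above, and the btl_mx_max_send_size value is
purely illustrative:
# Keep the first fragment at the 16k intra-node limit;
# cap subsequent fragments at 64k (illustrative value)
shell$ mpirun --mca btl mx,sm,self --mca btl_mx_first_frag_size 16384 --mca btl_mx_max_send_size 65536 ...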