This FAQ is for Open MPI v4.x and earlier.
If you are looking for documentation for Open MPI v5.x and later, please visit docs.open-mpi.org.
Table of contents:
- How do I debug Open MPI processes in parallel?
- What tools are available for debugging in parallel?
- How do I run with parallel debuggers?
- What controls does Open MPI have that aid in debugging?
- Do I need to build Open MPI with compiler/linker debugging flags (such as -g) to be able to debug MPI applications?
- Can I use serial debuggers (such as gdb) to debug MPI applications?
- My process dies without any output. Why?
- What is Memchecker?
- What kind of errors can Memchecker find?
- How can I use Memchecker?
- How to run my MPI application with Memchecker?
- Does Memchecker cause performance degradation to my application?
- Is Open MPI 'Valgrind-clean' or how can I identify real errors?
1. How do I debug Open MPI processes in parallel?
This is a difficult question. Debugging in serial can be
tricky: errors, uninitialized variables, stack smashing, etc.
Debugging in parallel adds several more dimensions to this
problem: a greater propensity for race conditions, asynchronous
events, and the general difficulty of trying to understand N processes
executing simultaneously. The problem becomes quite formidable.
This FAQ section does not provide any definitive solutions to
debugging in parallel. At best, it shows some general techniques and
a few specific examples that may be helpful in your situation.
However, Open MPI does provide various controls that can help with
debugging. These are probably the most valuable entries in this FAQ
section.
2. What tools are available for debugging in parallel?
There are two main categories of tools that can aid in
parallel debugging:
- Debuggers: Both serial and parallel debuggers are useful.
Serial debuggers are what most programmers are used to (e.g., gdb),
while parallel debuggers can attach to all the individual processes in
an MPI job simultaneously, treating the MPI application as a single
entity. This can be an extremely powerful abstraction, allowing the
user to control every aspect of the MPI job, manually replicate race
conditions, etc.
- Profilers: Tools that analyze your usage of MPI and display
statistics and meta information about your application's run. Some
tools present the information "live" (as it occurs), while others
collect the information and display it in a post-mortem analysis.
Both freeware and commercial solutions are available for each kind of
tool.
3. How do I run with parallel debuggers?
See these FAQ entries:
4. What controls does Open MPI have that aid in debugging?
Open MPI has a series of MCA parameters for the MPI layer
itself that are designed to help with debugging. These parameters can
be set in the usual ways. MPI-level MCA parameters can be displayed by
invoking the following command: