FAQ:
Debugging applications in parallel

This FAQ is for Open MPI v4.x and earlier.
If you are looking for documentation for Open MPI v5.x and later, please visit docs.open-mpi.org.

Table of contents:

  1. How do I debug Open MPI processes in parallel?
  2. What tools are available for debugging in parallel?
  3. How do I run with parallel debuggers?
  4. What controls does Open MPI have that aid in debugging?
  5. Do I need to build Open MPI with compiler/linker debugging flags (such as -g) to be able to debug MPI applications?
  6. Can I use serial debuggers (such as gdb) to debug MPI applications?
  7. My process dies without any output. Why?
  8. What is Memchecker?
  9. What kind of errors can Memchecker find?
  10. How can I use Memchecker?
  11. How to run my MPI application with Memchecker?
  12. Does Memchecker cause performance degradation to my application?
  13. Is Open MPI 'Valgrind-clean' or how can I identify real errors?


1. How do I debug Open MPI processes in parallel?

This is a difficult question. Debugging in serial can be tricky enough: errors, uninitialized variables, stack smashing, etc. Debugging in parallel adds further dimensions to the problem: a greater propensity for race conditions, asynchronous events, and the general difficulty of understanding N processes executing simultaneously. Together, these make the problem quite formidable.

This FAQ section does not provide any definitive solutions to debugging in parallel. At best, it shows some general techniques and a few specific examples that may be helpful in your situation.

That said, Open MPI has various controls that can help with debugging. The entries describing them are probably the most valuable in this FAQ section.


2. What tools are available for debugging in parallel?

There are two main categories of tools that can aid in parallel debugging:

  • Debuggers: Both serial and parallel debuggers are useful. Serial debuggers are what most programmers are used to (e.g., gdb), while parallel debuggers can attach to all the individual processes in an MPI job simultaneously, treating the MPI application as a single entity. This can be an extremely powerful abstraction, allowing the user to control every aspect of the MPI job, manually replicate race conditions, etc.
  • Profilers: Tools that analyze your usage of MPI and display statistics and meta-information about your application's run. Some tools present the information "live" (as it occurs), while others collect the information and display it in a post-mortem analysis.

Both freeware and commercial solutions are available for each kind of tool.


3. How do I run with parallel debuggers?

See these FAQ entries:


4. What controls does Open MPI have that aid in debugging?

Open MPI has a series of MCA parameters for the MPI layer itself that are designed to help with debugging. These parameters can be set in the usual ways. MPI-level MCA parameters can be displayed by invoking the following command: