
mpiexec(1) man page (version 1.1.5)





NAME

       orterun,  mpirun,  mpiexec  -  Execute serial and parallel jobs in Open
       MPI.

       Note: mpirun, mpiexec, and orterun are  all  exact  synonyms  for  each
       other.   Using any of the names will result in exactly identical behav-
       ior.

SYNOPSIS

       Single Process Multiple Data (SPMD) Model:

       mpirun [ options ] <program> [ <args> ]

       Multiple Instruction Multiple Data (MIMD) Model:

       mpirun [ global_options ]
              [ local_options1 ] <program1> [ <args1> ] :
              [ local_options2 ] <program2> [ <args2> ] :
              ... :
              [ local_optionsN ] <programN> [ <argsN> ]

       Note that in both models, invoking mpirun via an absolute path name  is
       equivalent to specifying the --prefix option with a <dir> value equiva-
       lent to the directory where mpirun resides, minus  its  last  subdirec-
       tory.  For example:

           shell$ /usr/local/bin/mpirun ...

       is equivalent to

           shell$ mpirun --prefix /usr/local
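
       The equivalence above can be sketched in shell.  This is only an
       illustration, not Open MPI's own logic; the function name
       prefix_from_mpirun is hypothetical, and it assumes mpirun sits exactly
       one directory below the prefix (for example, in "bin").

```shell
# Hypothetical helper: derive the --prefix value implied by an absolute
# path to mpirun by dropping the file name and its last subdirectory.
prefix_from_mpirun() {
    bindir=$(dirname "$1")      # e.g. /usr/local/bin
    dirname "$bindir"           # e.g. /usr/local
}

prefix_from_mpirun /usr/local/bin/mpirun
```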

QUICK SUMMARY

       If you are simply looking for how to run an MPI application, you proba-
       bly want to use a command line of the following form:

           shell$ mpirun [ -np X ] [ --hostfile <filename> ]  <program>

       This will run X copies of <program> in your current  run-time  environ-
       ment  (if running under a supported resource manager, Open MPI's mpirun
       will usually  automatically  use  the  corresponding  resource  manager
       process  starter, as opposed to, for example, rsh or ssh, which require
       the use of a hostfile, or will default to running all X copies  on  the
       localhost),  scheduling  (by  default)  in a round-robin fashion by CPU
       slot.  See the rest of this page for more details.

OPTIONS

       mpirun will send the name of the directory where it was invoked on  the
       local  node  to each of the remote nodes, and attempt to change to that
       directory.  See the "Current Working Directory" section below for  fur-
       ther details.

       <args>    Pass  these  run-time  arguments to every new process.  These
                 must always be the last arguments to mpirun. If an  app  con-
                 text file is used, <args> will be ignored.

       -bynode, --bynode
                 Allocate (map) the processes by node in a round-robin scheme.

       -byslot, --byslot
                 Allocate (map) the processes by slot in a round-robin scheme.
                 This is the default.

       -c <#>    Synonym for -np.

       -debug, --debug
                 Invoke    the    user-level   debugger   indicated   by   the
                 orte_base_user_debugger MCA parameter.

       -debugger, --debugger
                 Sequence of debuggers to search for when --debug is used
                 (i.e., a synonym for the orte_base_user_debugger MCA
                 parameter).

       -gmca, --gmca <key> <value>
                 Pass global MCA parameters that are applicable  to  all  con-
                 texts.  <key> is the parameter name; <value> is the parameter
                 value.

       -h, --help
                 Display help for this command.

       -H <host1,host2,...,hostN>
                 Synonym for -host.

       -host, --host <host1,host2,...,hostN>
                 List of hosts on which to invoke processes.

       -hostfile, --hostfile <hostfile>
                 Provide a hostfile to use.

       -machinefile, --machinefile <machinefile>
                 Synonym for -hostfile.

       -mca, --mca <key> <value>
                 Send arguments to various MCA modules.  See  the  "MCA"  sec-
                 tion, below.

       -n, --n <#>
                 Synonym for -np.

       -nolocal, --nolocal
                 Do not run any copies of the launched application on the same
                 node as orterun is running.  This option will override  list-
                 ing  the  localhost  with --host or any other host-specifying
                 mechanism.

       -nooversubscribe, --nooversubscribe
                 Do not oversubscribe any nodes; error (without  starting  any
                 processes)  if  the requested number of processes would cause
                 oversubscription.  This option  implicitly  sets  "max_slots"
                 equal to the "slots" value for each node.

       -np <#>   Run this many copies of the program on the given nodes.  This
                 of the application) otherwise.

       -nw, --nw Launch the processes and do not wait  for  their  completion.
                 mpirun will complete as soon as successful launch occurs.

       -path, --path <path>
                 <path>  that will be used when attempting to locate requested
                 executables.

       --prefix <dir>
                 Prefix directory that will  be  used  to  set  the  PATH  and
                 LD_LIBRARY_PATH  on  the remote node before invoking Open MPI
                 or the target process.  See the "Remote  Execution"  section,
                 below.

       -q, --quiet
                 Suppress informative messages from orterun during application
                 execution.

       --tmpdir <dir>
                 Set the root for the session directory tree for mpirun  only.

       -tv, --tv Launch  processes  under  the TotalView debugger.  Deprecated
                 backwards compatibility flag. Synonym for --debug.

       --universe <username@hostname:universe_name>
                 For this application, set the universe name as:
                      username@hostname:universe_name

       -v, --verbose
                 Be verbose.

       -V, --version
                 Print version number.  If no other arguments are given,  this
                 will also cause orterun to exit.

       -wd <dir> Change  to the directory <dir> before the user's program exe-
                 cutes.  See the "Current Working Directory" section for notes
                 on  relative  paths.  Note: If the -wd option appears both on
                 the command line and in an application context,  the  context
                 will take precedence over the command line.

       -x <env>  Export  the  specified  environment  variables  to the remote
                 nodes before executing  the  program.   Existing  environment
                 variables can be specified (see the Examples section, below),
                 or new variable names specified  with  corresponding  values.
                 The  parser  for  the -x option is not very sophisticated; it
                 does not even understand quoted values.  Users are advised to
                 set  variables  in the environment, and then use -x to export
                 (not define) them.

       The following options are useful for developers; they are not generally
       useful to most ORTE and/or MPI users:

       -d, --debug-devel
              Enable debugging of OpenRTE (the run-time layer in Open MPI).
              This is not generally useful for most users.

       --no-daemonize
              Do not detach OpenRTE daemons used by this application.

DESCRIPTION

       One invocation of mpirun starts an MPI application running  under  Open
       MPI.  If  the  application  is single process multiple data (SPMD), the
       application can be specified on the mpirun command line.

       If the application is multiple instruction multiple data (MIMD),
       comprising multiple programs, the set of programs and arguments can be
       specified in one of two ways: Extended Command Line Arguments, and
       Application Context.

       An  application  context  describes  the MIMD program set including all
       arguments in a separate file.  This file essentially contains  multiple
       mpirun  command  lines,  less  the command name itself.  The ability to
       specify different options for different instantiations of a program  is
       another reason to use an application context.

       Extended command line arguments allow for the description of the appli-
       cation layout on the command line using  colons  (:)  to  separate  the
       specification  of programs and arguments. Some options are globally set
       across all specified programs (e.g. --hostfile), while others are  spe-
       cific to a single program (e.g. -np).

   Process Slots
       Open  MPI uses "slots" to represent a potential location for a process.
       Hence, a node with 2 slots means that 2 processes can  be  launched  on
       that  node.  For  performance, the community typically equates a "slot"
       with a physical CPU, thus ensuring that any process  assigned  to  that
       slot has a dedicated processor. This is not, however, a requirement for
       the operation of Open MPI.

       Slots can be specified in hostfiles after the hostname.  For example:

       host1.example.com slots=4
           Indicates that there are 4 process slots on host1.

       If no slots value is specified, then Open MPI will automatically assign
       a default value of "slots=1" to that host.
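
       As a rough illustration of the default described above, the following
       sketch totals the slots declared in a hostfile.  The function name
       count_slots is hypothetical, the treatment of lines starting with "#"
       as comments is an assumption, and only the "slots=" field is examined;
       any other fields on the line are ignored.

```shell
# Hypothetical helper: sum the slots declared in a hostfile, counting a
# host with no slots= entry as slots=1 (the default described above).
count_slots() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }   # skip blank lines and assumed comments
        {
            n = 1                       # default when slots= is absent
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=[0-9]+$/) { split($i, kv, "="); n = kv[2] }
            total += n
        }
        END { print total + 0 }
    ' "$1"
}
```

       For a hostfile containing "host1.example.com slots=4" and a second
       line naming only "host2.example.com", count_slots reports 5.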

       When  running under resource managers (e.g., SLURM, Torque, etc.), Open
       MPI will obtain both the hostnames and the  number  of  slots  directly
       from the resource manager.  For example, if running under a SLURM job,
       Open MPI will automatically receive the hosts that SLURM has  allocated
       to  the  job as well as how many slots on each node that SLURM says are
       usable - in most high-performance environments, the slots  will  equate
       to the number of processors on the node.

       When  deciding  where  to launch processes, Open MPI will first fill up
       all available slots before  oversubscribing  (see  "Location  Nomencla-
       ture", below, for more details on the scheduling algorithms available).
       Unless told otherwise, Open MPI will arbitrarily  oversubscribe  nodes.
       For example, if the only node available is the localhost, Open MPI will
       run as many processes as specified by the -n (or one of  its  variants)
       command  line  option  on  the  localhost  (although they may run quite
       slowly, since they'll all be competing for CPU and other resources).

       host3.example.com slots=2 max_slots=2
           Indicates that there are 2 process slots on host3 and that no
           oversubscription is allowed (similar to the --nooversubscribe
           option).

       host4.example.com max_slots=2
           Shorthand; same as listing "slots=2 max_slots=2".

       Note that Open MPI's support for resource managers does  not  currently
       set  the "max_slots" values for hosts.  If you wish to prevent oversub-
       scription in such scenarios, use the --nooversubscribe option.

       In scenarios where the user wishes to launch an application across  all
       available  slots  by  not providing a "-n" option on the mpirun command
       line, Open MPI will launch a process on each process slot for each host
       within  the  provided  environment. For example, if a hostfile has been
       provided, then Open MPI will spawn processes on each identified host up
       to  the "slots=x" limit if oversubscription is not allowed. If oversub-
       scription is allowed (the default), then Open MPI will spawn  processes
       on  each  host up to the "max_slots=y" limit if that value is provided.
       In all cases, the "-bynode" and "-byslot" mapping  directives  will  be
       enforced to ensure proper placement of process ranks.

   Location Nomenclature
       As  described above, mpirun can specify arbitrary locations in the cur-
       rent Open MPI universe.  Locations can be specified either by CPU or by
       node.

       Note:  This  nomenclature  does not force Open MPI to bind processes to
       CPUs -- specifying a location "by CPU" is really a  convenience  mecha-
       nism for SMPs that ultimately maps down to a specific node.

       Specifying  locations by node will launch one copy of an executable per
       specified node.  Using the --bynode option tells Open MPI  to  use  all
       available  nodes.   Using the --byslot option tells Open MPI to use all
       slots on an available node before  allocating  resources  on  the  next
       available node.  For example:

       mpirun --bynode -np 4 a.out
           Runs one copy of the executable a.out on all available nodes in
           the Open MPI universe.  MPI_COMM_WORLD rank 0 will be on node0,
           rank 1 will be on node1, etc., regardless of how many slots are
           available on each of the nodes.

       mpirun --byslot -np 4 a.out
           Runs one copy of the executable a.out on each slot on a given
           node before running the executable on other available nodes.
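
       The difference between the two policies can be simulated without
       launching anything.  The sketch below is a simplified model, not Open
       MPI's mapper: the function name map_ranks and the NODE:SLOTS argument
       format are hypothetical, and wrapping back to the first node once
       every slot is filled is an assumption about oversubscription.

```shell
# Hypothetical model: map_ranks POLICY NP NODE:SLOTS [NODE:SLOTS ...]
# Prints the node each MPI_COMM_WORLD rank would land on.
map_ranks() {
    policy=$1; np=$2; shift 2
    printf '%s\n' "$@" | awk -F: -v policy="$policy" -v np="$np" '
        { name[NR] = $1; slots[NR] = $2 + 0; total += $2 }
        END {
            for (rank = 0; rank < np; rank++) {
                if (policy == "bynode") {
                    idx = rank % NR + 1          # one rank per node in turn
                } else {
                    pos = rank % total           # fill a node slot by slot
                    idx = 1
                    while (pos >= slots[idx]) { pos -= slots[idx]; idx++ }
                }
                printf "rank %d -> %s\n", rank, name[idx]
            }
        }'
}
```

       With two nodes of two slots each, "map_ranks byslot 4 node0:2
       node1:2" places ranks 0 and 1 on node0 and ranks 2 and 3 on node1,
       while "map_ranks bynode 4 node0:2 node1:2" alternates between node0
       and node1.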

   Specifying Hosts
       Hosts can be specified in a number of ways, the most common of which is
       in a hostfile.  For example:

          shell$ cat my-hostfile
          node00 slots=2
          node01 slots=2
          node02 slots=2

       mpirun --hostfile my-hostfile -np 3 a.out
              This will  run  one  copy  of  the  executable  a.out  on  hosts

       mpirun -np 3 --host a,b,c a.out
              Runs one copy of the executable a.out on hosts a, b, and c.

       mpirun -np 3 --hostfile my-hostfile --host node00 a.out
              Runs three copies of the executable a.out on host node00.

       mpirun -np 3 --hostfile my-hostfile --host node10 a.out
              This  will  prompt  an error since node10 is not in my-hostfile;
              mpirun will abort.

       shell$ mpirun -np 1 --host a hostname : -np 2 --host b,c uptime
              Runs one copy of the executable hostname on host a, and one
              copy of the executable uptime on each of hosts b and c.

   No Local Launch
       Using the --nolocal option to orterun tells the system to not launch
       any of the application processes on the same node that orterun is
       running on.  While orterun typically blocks and consumes few system
       resources, this option can be helpful for launching very large jobs
       where orterun may actually need to use noticeable amounts of memory
       and/or processing time.  --nolocal allows orterun to run without sharing
       the local node with the launched applications, and likewise allows the
       launched applications to run unhindered by orterun's system usage.

       Note that --nolocal will override any other specification to launch the
       application  on  the local node.  It will disqualify the localhost from
       being capable of running any processes in the application.

       shell$ mpirun -np 1 --host localhost --nolocal hostname
              This example will result in an error because  orterun  will  not
              find anywhere to launch the application.

   No Oversubscription
       Using  the  --nooversubscribe  option causes Open MPI to implicitly set
       the "max_slots" value to be the same as  the  "slots"  value  for  each
       node.   This  can  be  especially  helpful  when  running  jobs under a
       resource manager because Open MPI currently only sets the "slots" value
       for each node that it obtains from the resource manager.

   Application Context or Executable Program?
       To  distinguish  the  two  different forms, mpirun looks on the command
       line for the --app option.  If it is specified, then the file named on the
       command  line  is  assumed  to be an application context.  If it is not
       specified, then the file is assumed to be an executable program.
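
       As a hedged illustration of an application context, the sketch below
       writes a two-line file in which each line is an mpirun command line
       without the command name.  The function name write_appfile, the file
       name, and the host names are hypothetical; the per-line options mirror
       the extended command line example in "Specifying Hosts".

```shell
# Hypothetical helper: write a small application context to the given file.
# Each line holds the options, program, and arguments for one context.
write_appfile() {
    cat > "$1" <<'EOF'
-np 1 --host a hostname
-np 2 --host b,c uptime
EOF
}
```

       After "write_appfile my_appfile", the file could be passed to mpirun
       as "mpirun --app my_appfile".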

   Locating Files
       If no relative or absolute path is specified for a file, Open MPI  will
       look for files by searching the directories in the user's PATH environ-
       ment variable as defined on the source node(s).

       If a relative directory is specified, it must be relative to  the  ini-
       tial  working  directory  determined  by the specific starter used. For
       example when using the rsh or ssh starters, the  initial  directory  is
       $HOME  by  default. Other starters may set the initial directory to the
       current working directory from the invocation of mpirun.

   Current Working Directory
       If  the -wd option is specified, Open MPI will attempt to change to the
       specified directory on all of the remote nodes. If this  fails,  mpirun
       will abort.

       If  the  -wd  option is not specified, Open MPI will send the directory
       name where mpirun was invoked to each of the remote nodes.  The  remote
       nodes  will  try to change to that directory. If they are unable (e.g.,
       if the directory does not exist on that node), then Open MPI will use
       the default directory determined by the starter.

       All  directory changing occurs before the user's program is invoked; it
       does not wait until MPI_INIT is called.

   Standard I/O
       Open MPI directs UNIX standard input  to  /dev/null  on  all  processes
       except  the  MPI_COMM_WORLD  rank  0 process. The MPI_COMM_WORLD rank 0
       process inherits standard input  from  mpirun.   Note:  The  node  that
       invoked   mpirun   need   not  be  the  same  as  the  node  where  the
       MPI_COMM_WORLD rank 0 process resides. Open MPI handles the redirection
       of mpirun's standard input to the rank 0 process.

       Open  MPI  directs  UNIX standard output and error from remote nodes to
       the node that invoked mpirun and prints it on the standard output/error
       of mpirun.  Local processes inherit the standard output/error of mpirun
       and transfer to it directly.

       Thus it is possible to redirect standard I/O for Open MPI  applications
       by using the typical shell redirection procedure on mpirun.

             shell$ mpirun -np 2 my_app < my_input > my_output

       Note  that  in this example only the MPI_COMM_WORLD rank 0 process will
       receive the stream from my_input on stdin.  The stdin on all the  other
       nodes  will  be  tied to /dev/null.  However, the stdout from all nodes
       will be collected into the my_output file.

   Signal Propagation
       When orterun receives a SIGTERM or SIGINT, it will attempt to kill the
       entire  job  by  sending  all processes in the job a SIGTERM, waiting a
       small number of seconds, then  sending  all  processes  in  the  job  a
       SIGKILL.   SIGUSR1  and  SIGUSR2 signals received by orterun are propa-
       gated to all processes in the job.  Other  signals  are  not  currently
       propagated by orterun.
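
       The SIGTERM-then-SIGKILL sequence can be illustrated for a single
       local process.  This sketch is not how orterun is implemented; the
       function name terminate_with_grace is hypothetical and the default
       grace period of 5 seconds is an assumption standing in for the "small
       number of seconds" mentioned above.

```shell
# Hypothetical helper: terminate_with_grace PID [SECONDS]
# Send SIGTERM, wait up to SECONDS for the process to exit, then SIGKILL.
terminate_with_grace() {
    pid=$1; grace=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0    # process already gone
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null                # still alive: force it
    return 0
}
```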

   Process Termination / Signal Handling
       During  the  run  of  an  MPI  application, if any rank dies abnormally
       (either exiting before invoking MPI_FINALIZE, or dying as the result of
       a  signal), mpirun will print out an error message and kill the rest of
       the MPI application.

       User signal handlers should probably avoid trying to clean up MPI state
       (Open MPI is currently neither thread-safe nor async-signal-safe).
       For example, if  a  segmentation  fault  occurs  in  MPI_SEND  (perhaps
       because  a  bad  buffer  was  passed  in)  and a user signal handler is
       invoked, if this user handler  attempts  to  invoke  MPI_FINALIZE,  Bad
       Things  could happen since Open MPI was already "in" MPI when the error
       occurred.  Since mpirun will notice that the process died due to a sig-
       nal,  it  is  probably  not necessary (and safest) for the user to only
       RTE  daemon  on remote nodes, and typically executes one or more of the
       user's shell-setup files before launching the Open  RTE  daemon.   When
       running    dynamically    linked   applications   which   require   the
       LD_LIBRARY_PATH environment variable to be set, care must be  taken  to
       ensure that it is correctly set when booting Open MPI.

       See the "Remote Execution" section for more details.

   Remote Execution
       Open  MPI  requires  that  the PATH environment variable be set to find
       executables on remote nodes (this is typically only necessary  in  rsh-
       or  ssh-based  environments  --  batch/scheduled environments typically
       copy the current environment to the execution of remote jobs, so if the
       current  environment  has PATH and/or LD_LIBRARY_PATH set properly, the
       remote nodes will also have it set properly).  If Open MPI was compiled
       with  shared  library  support,  it  may  also be necessary to have the
       LD_LIBRARY_PATH environment variable set on remote nodes as well (espe-
       cially  to  find the shared libraries required to run user MPI applica-
       tions).

       However, it is not always desirable or possible to edit  shell  startup
       files  to set PATH and/or LD_LIBRARY_PATH.  The --prefix option is pro-
       vided for some simple configurations where this is not possible.

       The --prefix option takes a single argument: the base directory on the
       remote node where Open MPI is installed.  Open MPI will use this
       directory to set the remote PATH and LD_LIBRARY_PATH before executing
       any Open MPI or user applications.  This allows running Open MPI jobs
       without having pre-configured the PATH and LD_LIBRARY_PATH on the
       remote nodes.

       Open  MPI  adds the basename of the current node's "bindir" (the direc-
       tory where Open MPI's executables are installed) to the prefix and uses
       that  to set the PATH on the remote node.  Similarly, Open MPI adds the
       basename of the current node's "libdir" (the directory where Open MPI's
       libraries  are  installed)  to  the  prefix  and  uses  that to set the
       LD_LIBRARY_PATH on the remote node.  For example:

       Local bindir:  /local/node/directory/bin

       Local libdir:  /local/node/directory/lib64

       If the following command line is used:

           shell$ mpirun --prefix /remote/node/directory

       Open MPI will add "/remote/node/directory/bin" to the PATH and
       "/remote/node/directory/lib64" to the LD_LIBRARY_PATH on the remote node
       before attempting to execute anything.
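
       The basename rule can be reproduced in a few lines of shell.  This is
       a sketch of the rule as described here, not Open MPI's code; the
       function name remote_dirs and its output labels are hypothetical.

```shell
# Hypothetical helper: remote_dirs PREFIX LOCAL_BINDIR LOCAL_LIBDIR
# Prints the directories that --prefix implies on the remote node.
remote_dirs() {
    prefix=$1; local_bindir=$2; local_libdir=$3
    echo "PATH entry:            $prefix/$(basename "$local_bindir")"
    echo "LD_LIBRARY_PATH entry: $prefix/$(basename "$local_libdir")"
}

remote_dirs /remote/node/directory \
    /local/node/directory/bin /local/node/directory/lib64
```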

       Note that --prefix can be set on a per-context basis, allowing for dif-
       ferent values for different nodes.

       The  --prefix option is not sufficient if the installation paths on the
       remote node are different than the local node (e.g., if "/lib" is  used
       on  the local node, but "/lib64" is used on the remote node), or if the
       installation paths are something other than a subdirectory under a com-
       mon prefix.

           shell$ mpirun --prefix /usr/local

   Exported Environment Variables
       All  environment variables that are named in the form OMPI_* will auto-
       matically be exported to new processes on the local and  remote  nodes.
       The  -x  option  to  mpirun  can be used to export specific environment
       variables to the new processes.  While the  syntax  of  the  -x  option
       allows  the  definition of new variables, note that the parser for this
       option is currently not very sophisticated - it does  not  even  under-
       stand  quoted  values.  Users are advised to set variables in the envi-
       ronment and use -x to export them; not to define them.

   MCA (Modular Component Architecture)
       The -mca switch allows the passing of parameters to  various  MCA  mod-
       ules.   MCA  modules  have  direct  impact on MPI programs because they
       allow tunable parameters to be set at run time (such as which BTL  com-
       munication  device  driver to use, what parameters to pass to that BTL,
       etc.).

       The -mca switch takes two arguments:  <key>  and  <value>.   The  <key>
       argument  generally  specifies which MCA module will receive the value.
       For example, the <key> "btl" is used to select which BTL to be used for
       transporting  MPI  messages.  The <value> argument is the value that is
       passed.  For example:

       mpirun -mca btl tcp,self -np 1 foo
           Tells Open MPI to use the "tcp" and "self" BTLs, and to run a
           single copy of "foo" on an allocated node.

       mpirun -mca btl self -np 1 foo
           Tells Open MPI to use the "self" BTL, and to run a single copy of
           "foo" on an allocated node.

       The -mca switch can be used multiple times to specify  different  <key>
       and/or  <value>  arguments.   If  the same <key> is specified more than
       once, the <value>s are concatenated with a comma (",") separating them.
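
       The concatenation rule for repeated keys can be modelled as follows.
       This is only a sketch of the rule as stated; the function name
       merge_mca and the "key=value" output format are hypothetical, and
       values containing whitespace are not handled.

```shell
# Hypothetical model: merge_mca KEY VALUE [KEY VALUE ...]
# Joins the values of repeated keys with commas, in order of appearance.
merge_mca() {
    while [ "$#" -ge 2 ]; do
        printf '%s %s\n' "$1" "$2"
        shift 2
    done | awk '
        !($1 in val) { order[++n] = $1; val[$1] = $2; next }
        { val[$1] = val[$1] "," $2 }
        END { for (i = 1; i <= n; i++) print order[i] "=" val[order[i]] }'
}
```

       For example, "merge_mca btl tcp btl self" prints "btl=tcp,self",
       matching the effect of "-mca btl tcp -mca btl self".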

       Note: The -mca switch is simply a shortcut for setting environment
       variables.  The same effect may be accomplished by setting the
       corresponding environment variables before running mpirun.  The form of
       the environment variables that Open MPI sets is:

             OMPI_MCA_<key>=<value>

       Note that the -mca switch overrides any previously set environment
       variables.  Also note that unknown <key> arguments are still set as
       environment variables -- they are not checked (by mpirun) for
       correctness.  Illegal or incorrect <value> arguments may or may not be
       reported -- it depends on the specific MCA module.

EXAMPLES

       Be sure to also see the examples in the  "Location  Nomenclature"  sec-
       tion, above.

       mpirun -np 1 prog1
           Load  and  execute  prog1 on one node.  Search the user's $PATH for
           the executable file on each node.

RETURN VALUE

       mpirun  returns  0  if  all  ranks started by mpirun exit after calling
       MPI_FINALIZE.  A non-zero  value  is  returned  if  an  internal  error
       occurred  in  mpirun,  or  one  or  more  ranks  exited  before calling
       MPI_FINALIZE.  If an internal error occurred in mpirun, the correspond-
       ing  error  code is returned.  In the event that one or more ranks exit
       before calling MPI_FINALIZE, the  return  value  of  the  rank  of  the
       process that mpirun first notices died before calling MPI_FINALIZE will
       be returned.  Note that, in general, this will be the first  rank  that
       died but is not guaranteed to be so.

       However,  note  that  if  the -nw switch is used, the return value from
       mpirun does not indicate the exit status of the ranks.
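
       A calling script can act on the return value as sketched below.  The
       function name report_status and its messages are hypothetical; the
       wrapper simply runs whatever command it is given and passes the exit
       status through, so it is only meaningful when -nw is not used.

```shell
# Hypothetical wrapper: run a command and report its exit status.
report_status() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "all ranks finished normally"
    else
        echo "job failed with status $status" >&2
    fi
    return "$status"
}
```

       A typical use would be "report_status mpirun -np 4 my_app", where
       my_app stands for the user's program.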

Open MPI                          March 2006                         MPIRUN(1)
