
MPIX_COMM_GET_FAILED(3) Open MPI MPIX_COMM_GET_FAILED(3)

MPIX_Comm_get_failed - Obtain a group that lists failed processes in a communicator.

This is part of the User-Level Failure Mitigation (ULFM) extension.

SYNTAX

C Syntax

#include <mpi.h>
#include <mpi-ext.h>
int MPIX_Comm_get_failed(MPI_Comm comm, MPI_Group *failedgrp)


Fortran Syntax

USE MPI
USE MPI_EXT
! or the older form: INCLUDE 'mpif.h'
MPIX_COMM_GET_FAILED(COMM, FAILEDGRP, IERROR)

INTEGER COMM, FAILEDGRP, IERROR


Fortran 2008 Syntax

USE mpi_f08
USE mpi_ext_f08
MPIX_Comm_get_failed(comm, failedgrp, ierror)

TYPE(MPI_Comm), INTENT(IN) :: comm
TYPE(MPI_Group), INTENT(OUT) :: failedgrp
INTEGER, OPTIONAL, INTENT(OUT) :: ierror


INPUT PARAMETERS

  • comm: Communicator (handle).

OUTPUT PARAMETERS

  • failedgrp: Group (handle).
  • ierror: Fortran only: Error status (integer).

DESCRIPTION

This local operation returns the group failedgrp of processes from the communicator comm that are locally known to have failed. The failedgrp can be empty, that is, equal to MPI_GROUP_EMPTY.

For any two groups obtained from calls to this routine at the same MPI process with the same comm, the intersection of the larger group with the smaller group is MPI_IDENT to the smaller group; that is, the same processes have the same ranks in both groups, up to the size of the smaller group.

PROCESS FAILURES

MPI makes no assumption about asynchronous progress of the failure detection. A valid MPI implementation may choose to update the group of locally known failed MPI processes only when it enters a function that must raise a fault tolerance error.

It is possible that only the calling MPI process has detected the reported failure. If global knowledge is necessary, MPI processes detecting failures should call MPIX_Comm_revoke to enforce an error at other ranks.

WHEN COMMUNICATOR IS AN INTER-COMMUNICATOR

When the communicator is an inter-communicator, the value of failedgrp contains the members known to have failed in both the local and the remote groups of comm.

ERRORS

Almost all MPI routines return an error value; C routines as the return result of the function and Fortran routines in the last argument.

Before the error value is returned, the current MPI error handler associated with the communication object (e.g., communicator, window, file) is called. If no communication object is associated with the MPI call, then the call is considered attached to MPI_COMM_SELF and will call the associated MPI error handler. When MPI_COMM_SELF is not initialized (i.e., before MPI_Init/MPI_Init_thread, after MPI_Finalize, or when using the Sessions Model exclusively) the error raises the initial error handler. The initial error handler can be changed by calling MPI_Comm_set_errhandler on MPI_COMM_SELF when using the World model, or the mpi_initial_errhandler CLI argument to mpiexec or info key to MPI_Comm_spawn/MPI_Comm_spawn_multiple. If no other appropriate error handler has been set, then the MPI_ERRORS_RETURN error handler is called for MPI I/O functions and the MPI_ERRORS_ABORT error handler is called for all other MPI functions.

Open MPI includes three predefined error handlers that can be used:

  • MPI_ERRORS_ARE_FATAL Causes the program to abort all connected MPI processes.
  • MPI_ERRORS_ABORT An error handler that can be invoked on a communicator, window, file, or session. When called on a communicator, it acts as if MPI_Abort was called on that communicator. If called on a window or file, acts as if MPI_Abort was called on a communicator containing the group of processes in the corresponding window or file. If called on a session, aborts only the local process.
  • MPI_ERRORS_RETURN Returns an error code to the application.

MPI applications can also implement their own error handlers by calling:

  • MPI_Comm_create_errhandler then MPI_Comm_set_errhandler
  • MPI_File_create_errhandler then MPI_File_set_errhandler
  • MPI_Session_create_errhandler then MPI_Session_set_errhandler or at MPI_Session_init
  • MPI_Win_create_errhandler then MPI_Win_set_errhandler

Note that MPI does not guarantee that an MPI program can continue past an error.

See the MPI man page for a full list of MPI error codes.

See the Error Handling section of the MPI-3.1 standard for more information.

SEE ALSO

  • MPIX_Comm_revoke
  • MPIX_Comm_ack_failed



COPYRIGHT

2003-2024, The Open MPI Community

December 2, 2024