Intel® MPI Library

Crashes with I_MPI_ASYNC_PROGRESS and Trace Collector

GeorgHager
Beginner

Hi all,

 

When I run an MPI program with ITAC tracing (using the -trace option for mpirun) and enable I_MPI_ASYNC_PROGRESS at the same time, I get consistent crashes reporting internal errors in MPI functions that the code does not even call:

 

Abort(137969167) on node 8 (rank 8 in comm 0): Fatal error in PMPI_Comm_dup: Other MPI error, error stack:
PMPI_Comm_dup(166)..................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff65c43be0) failed
PMPI_Comm_dup(151)..................:
MPIR_Comm_dup_impl(49)..............:
MPII_Comm_copy(1031)................:
MPIR_Get_contextid_sparse_group(486):
MPIR_Allreduce_intra_auto_safe(322).:
MPIR_Bcast_intra_auto(85)...........:
MPIR_Bcast_intra_binomial(135)......: message sizes do not match across processes in the collective routine: Received 256 but expected 4100

 

I started the program with:

mpirun -trace -genv I_MPI_ASYNC_PROGRESS 1 -np 36 ./a.out

Tracing works fine with I_MPI_ASYNC_PROGRESS=0. The code itself is very simple and only uses MPI_Send, MPI_Recv, and MPI_Barrier (plus the usual setup calls such as MPI_Comm_rank), but the problem also appears with other codes and even when Score-P is used instead of Intel Trace Collector.
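For reference, a minimal sketch of the communication pattern (not my exact code; the message size and the send-to-rank-0 structure are placeholders, but the MPI calls are the same):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int N = 4096;                       /* placeholder message size */
    double *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    buf = malloc(N * sizeof(double));
    buf[0] = (double)rank;

    /* every rank > 0 sends one message to rank 0, then all ranks synchronize */
    if (rank > 0) {
        MPI_Send(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else {
        for (int src = 1; src < size; ++src)
            MPI_Recv(buf, N, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0) printf("done\n");
    free(buf);
    MPI_Finalize();
    return 0;
}

Note that the application never calls MPI_Comm_dup, so the duplication in the error stack is presumably issued internally by the tracing layer.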

 

Software: ITAC 2021.6.0, Intel C compiler (LLVM-based) 2021.4.0, Intel MPI 2021.10.0
