Hi all,
When I run an MPI program with ITAC tracing (the -trace option of mpirun) and enable I_MPI_ASYNC_PROGRESS at the same time, I consistently get crashes reporting internal errors in MPI functions that the code does not even call:
Abort(137969167) on node 8 (rank 8 in comm 0): Fatal error in PMPI_Comm_dup: Other MPI error, error stack:
PMPI_Comm_dup(166)..................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7fff65c43be0) failed
PMPI_Comm_dup(151)..................:
MPIR_Comm_dup_impl(49)..............:
MPII_Comm_copy(1031)................:
MPIR_Get_contextid_sparse_group(486):
MPIR_Allreduce_intra_auto_safe(322).:
MPIR_Bcast_intra_auto(85)...........:
MPIR_Bcast_intra_binomial(135)......: message sizes do not match across processes in the collective routine: Received 256 but expected 4100
I started the program with:
mpirun -trace -genv I_MPI_ASYNC_PROGRESS 1 -np 36 ./a.out
Tracing works fine with I_MPI_ASYNC_PROGRESS=0. The code itself is very simple and only uses MPI_Send, MPI_Recv, and MPI_Barrier (plus standard calls such as MPI_Comm_rank and MPI_Comm_size), but the problem also appears with other codes and even when using Score-P instead of the Intel Trace Collector.
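For reference, a minimal program in the spirit of the affected code (a hypothetical sketch, not my exact source) is enough to show the pattern of calls involved:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int payload = rank;
    if (rank > 0) {
        /* every non-root rank sends one integer to rank 0 */
        MPI_Send(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        /* rank 0 receives one message from each other rank */
        for (int src = 1; src < size; ++src) {
            int recv_val;
            MPI_Recv(&recv_val, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) printf("done\n");
    MPI_Finalize();
    return 0;
}

Note that the failing routine in the error stack, MPI_Comm_dup, is not called anywhere in code like this; it apparently comes from inside the tracing layer.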
Software: ITAC 2021.6.0, Intel C compiler (LLVM-based) 2021.4.0, Intel MPI 2021.10.0