Open MPI question

5 years 7 months ago #1555 by ddrake
Open MPI question was created by ddrake
Hi,

I have installed libopenmpi2 2.1.1-8 and libopenmpi-dev 2.1.1-8 on Ubuntu 18.04. I then built NGSolve from source with USE_MPI=ON.

The py_tutorial examples are all working for me, in that I can run the MPI examples like this:
Code:
mpirun -np 5 ngspy mpi_poisson.py
I can also run the non-MPI examples like this:
Code:
netgen poisson.py
But I can no longer run the non-MPI examples this way:
Code:
python3 poisson.py
If I try, I get an error like this:

importing NGSolve-6.2.1902-107-g5aa0a3e4
[dow-HP-Notebook:18549] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[dow-HP-Notebook:18549] mca_base_component_repository_open: unable to open mca_shmem_mmap: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[dow-HP-Notebook:18549] mca_base_component_repository_open: unable to open mca_shmem_posix: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[dow-HP-Notebook:18549] mca_base_component_repository_open: unable to open mca_shmem_sysv: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)


It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS


It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS


It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

ompi_mpi_init: ompi_rte_init failed
--> Returned "Error" (-1) instead of "Success" (0)

*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[dow-HP-Notebook:18549] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

This is a problem when I try to run code that uses a custom shared library built against the NGSolve libraries, but which is not designed for Open MPI and doesn't need the Netgen GUI.

Is there a way to build my custom library so that the MPI dependencies are not included, and it can still be used by code run directly from python3? Or is it recommended to have two builds of NGSolve, one for parallel and one for serial?
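In case it helps clarify the second option: a sketch of what two side-by-side builds might look like. The install prefixes and directory names here are made up for illustration; USE_MPI is the flag I used above.

```shell
# Illustrative only: two out-of-source builds, one parallel and one serial.
# Prefixes and directory names are placeholders -- adjust to your setup.
mkdir -p build-mpi && cd build-mpi
cmake -DUSE_MPI=ON  -DCMAKE_INSTALL_PREFIX="$HOME/ngsolve-mpi" ../ngsolve
make install
cd ..
mkdir -p build-serial && cd build-serial
cmake -DUSE_MPI=OFF -DCMAKE_INSTALL_PREFIX="$HOME/ngsolve-serial" ../ngsolve
make install
```

One would then select a build by pointing PYTHONPATH (and LD_LIBRARY_PATH) at the corresponding install prefix.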

Thanks!
Dow
5 years 7 months ago #1556 by Guosheng Fu
Replied by Guosheng Fu on topic Open MPI question
It might be a library-loading issue. You can either:
Use "ngspy" instead of python3, or
Set these libraries in LD_PRELOAD (have a look at the "ngspy" file in the NGSolve bin directory).
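A minimal sketch of the LD_PRELOAD approach. The library path below is an assumption for a default Ubuntu Open MPI install; the authoritative list is whatever the "ngspy" script in your NGSolve bin directory actually preloads, so copy it from there.

```shell
# Assumed path -- check the ngspy script for the exact libraries it preloads.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libmpi.so
python3 poisson.py
```

Preloading libmpi this way forces its symbols (opal_show_help, etc.) to be resolved globally before the MCA plugins are dlopen'ed, which is what the errors above are complaining about.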
5 years 7 months ago #1558 by lkogler
Replied by lkogler on topic Open MPI question
Yes, that seems to be the issue. I am working on correcting the link order.
5 years 7 months ago #1563 by lkogler
Replied by lkogler on topic Open MPI question
This also seems to be something that does not happen with newer versions of OpenMPI: it does not happen with versions 3.1.2 or 4.0.