Installation issues with OpenMPI

4 years 11 months ago #2187 by johannesh
Hi everyone,
I am trying to install NGSolve with OpenMPI support from source on Ubuntu 18.04.3 LTS. To do so, I ran cmake with the following configuration:
Code:
cmake -DCMAKE_BUILD_TYPE:STRING="RELEASE" \
      -DCMAKE_INSTALL_PREFIX=${NGSUITE}/ngsolve-install \
      ${NGSUITE}/ngsolve-src/ \
      -DCMAKE_CXX_COMPILER:FILEPATH="/usr/bin/mpicxx" \
      -DCMAKE_C_COMPILER:FILEPATH="/usr/bin/mpicc" \
      -DCMAKE_CXX_FLAGS="-march=native -O3" \
      -DCMAKE_C_FLAGS="-march=native -O3" \
      -DMPI_LIBRARY:FILEPATH="/usr" \
      -DUSE_MPI=ON \
      -DUSE_LAPACK=ON \
      -DLAPACK_DIR:FILEPATH="/usr/lib/x86_64-linux-gnu/lapack"

NGSolve builds fine. However, 'make install' produces an error that seems to be related to Python and OpenMPI:
Code:
...
-- Up-to-date: /home/johannes/ngsuite/ngsolve-install/lib/python3/dist-packages/ngsolve/TensorProductTools.py
[c3po:14049] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[c3po:14049] mca_base_component_repository_open: unable to open mca_shmem_posix: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[c3po:14049] mca_base_component_repository_open: unable to open mca_shmem_mmap: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[c3po:14049] mca_base_component_repository_open: unable to open mca_shmem_sysv: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------

I already removed all CMake files. Nevertheless, the error persists. I also made sure that there is only one MPI version installed, namely libopenmpi-dev 2.1.1-8; the files mentioned in the error (e.g., /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_patcher_overwrite.so) exist. The Python module mpi4py (3.0.3) was installed using pip3 after the installation of libopenmpi-dev. The strange thing is that it works if I purge OpenMPI from the system and install MPICH instead.
Do you have any idea where this error might come from?

If you need any more information, I am happy to provide it.
Best,
Johannes
4 years 11 months ago #2189 by lkogler
Set CMAKE_C_COMPILER and CMAKE_CXX_COMPILER to the compiler underlying OpenMPI.
CMake should figure out the proper MPI-specific flags on its own.
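
If it is not obvious which compiler sits underneath the MPI wrappers, the Open MPI wrapper compilers can report it themselves via the --showme:command option. A minimal sketch, assuming the wrappers are named mpicc and mpicxx and come from Open MPI (other MPI implementations use different options); underlying_compiler is a hypothetical helper name:

```shell
# Hypothetical helper: print the compiler that an Open MPI wrapper calls.
# --showme:command is specific to the Open MPI wrapper compilers.
underlying_compiler() {
    wrapper="$1"
    if command -v "$wrapper" >/dev/null 2>&1; then
        "$wrapper" --showme:command
    else
        echo "$wrapper not found"
        return 1
    fi
}

# Report the compilers behind the C and C++ wrappers, if they are installed.
underlying_compiler mpicc || true
underlying_compiler mpicxx || true
```

The printed names are what would then go into CMAKE_C_COMPILER and CMAKE_CXX_COMPILER.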

Explanation why I think this might help:
With OpenMPI versions older than 3 (I believe), there are some issues with loading the NGSolve libraries from Python.
This is why we install a small wrapper script called "ngspy", which preloads some libraries, among them the MPI libraries, before starting python3 (it is just a tiny script in ngsolve-install-dir/bin).
The error you are getting in "make install" looks exactly like the one you get when importing ngsolve in a python script without preloading the MPI libraries.
So, I figure cmake tries to compile and run something during make install (I do not know what) and runs into this linking issue.
If this is the case, another workaround might be setting the environment variable LD_PRELOAD=path_to_your_libmpi.so.
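
The preloading idea can be tried by hand. On Linux the dynamic loader reads the variable LD_PRELOAD. A minimal dry-run sketch, assuming that libmpi.so lives under /usr/lib/x86_64-linux-gnu/openmpi/lib (inferred from the paths in the error output, not verified); find_libmpi and MPI_LIB_DIR are hypothetical names:

```shell
# Hypothetical helper: print the first libmpi.so* found under a directory.
find_libmpi() {
    find "$1" -name 'libmpi.so*' 2>/dev/null | sort | head -n 1
}

# Assumed location of the OpenMPI libraries; adjust for your system.
MPI_LIB_DIR="${MPI_LIB_DIR:-/usr/lib/x86_64-linux-gnu/openmpi/lib}"
LIBMPI="$(find_libmpi "$MPI_LIB_DIR")"

# Dry run: only print the command that would preload the library.
if [ -n "$LIBMPI" ]; then
    echo "Would run: LD_PRELOAD=$LIBMPI make install"
else
    echo "No libmpi.so found under $MPI_LIB_DIR"
fi
```

Whether preloading alone is enough for 'make install' is not guaranteed; it only mirrors the approach described for the wrapper script.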

Also, as a sidenote, with mpi4py you have to make sure that NGSolve and mpi4py are using the same MPI installation.
4 years 11 months ago #2190 by matthiash

lkogler wrote: So, I figure cmake tries to compile and run something during make install (I do not know what) and runs into this linking issue.


I guess this is due to Python stub file generation (for auto-completion in Python). If the fixes above don't work, try disabling it by configuring with
Code:
-DBUILD_STUB_FILES=OFF
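
For context, this is roughly how the flag fits into a configure call. A dry-run sketch only: the paths and compiler settings are taken from the first post, the option list is abbreviated, and ngsolve_cmake_args is a hypothetical helper name:

```shell
# Assumed source/install root, taken from the paths in the first post.
NGSUITE="${NGSUITE:-/home/johannes/ngsuite}"

# Hypothetical helper: print one cmake argument per line.
ngsolve_cmake_args() {
    printf '%s\n' \
        "-DCMAKE_BUILD_TYPE=RELEASE" \
        "-DCMAKE_INSTALL_PREFIX=${NGSUITE}/ngsolve-install" \
        "-DCMAKE_CXX_COMPILER=/usr/bin/mpicxx" \
        "-DCMAKE_C_COMPILER=/usr/bin/mpicc" \
        "-DUSE_MPI=ON" \
        "-DBUILD_STUB_FILES=OFF" \
        "${NGSUITE}/ngsolve-src"
}

# Dry run: print the command instead of executing it.
echo "cmake $(ngsolve_cmake_args | tr '\n' ' ')"
```

With stub generation switched off, 'make install' no longer has to import the freshly built module, at the cost of losing auto-completion stubs.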

Best,
Matthias
4 years 11 months ago #2193 by johannesh
Thank you for your quick response. I tried all three suggested fixes.
Setting CMAKE_C_COMPILER and CMAKE_CXX_COMPILER to gcc and gc++, respectively didn't work for me. CMake wasn't able to detect the MPI compilers in my case. Was this what you suggested or did I use the wrong compilers?
Setting LD_PRELOAD_PATH resulted in the same error as before.
Eventually, the flag -DBUILD_STUB_FILES=OFF resolved the issue.

Thank you for your help.
Best,
Johannes
4 years 11 months ago - 4 years 11 months ago #2194 by lkogler
"Setting CMAKE_C_COMPILER and CMAKE_CXX_COMPILER to gcc and gc++, respectively didn't work for me. "
That is what I meant, and it should work. (I assume gc++ was a typo? It should be g++.) Does CMake end with an error message? Could you send me your cmake command, the cmake output, and CMakeCache.txt (in the build folder)?
Last edit: 4 years 11 months ago by lkogler.
4 years 11 months ago #2198 by johannesh
As you pointed out, gc++ was a typo.
I have tried several things now and found that -DMPI_LIBRARY in combination with g++/gcc does not work. As soon as I remove the flag completely, CMake is able to find the MPI library (at the path I previously set for -DMPI_LIBRARY). This problem does not occur if mpicxx/mpicc is specified.
However, this does not resolve the initial problems with the MPI installation.
If you are still interested in the CMake files, let me know.