Netgen GUI fails to start when libngsolve loaded and MPI=ON (ALT Linux)
6 years 5 months ago - 6 years 5 months ago #551
by nickel
Hi,
I recently ran into an issue when trying to run the Netgen GUI built with OpenMPI (GitHub v6.2.1804):
[host-68.localdomain:12429] *** An error occurred in MPI_comm_size
[host-68.localdomain:12429] *** on communicator MPI_COMM_WORLD
[host-68.localdomain:12429] *** MPI_ERR_COMM: invalid communicator
[host-68.localdomain:12429] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
...
The full output follows.
Code:
[user@host-68 ~]$ /usr/lib64/openmpi-compat/bin/mpirun -np 4 netgen
NETGEN-6.2-dev
Developed by Joachim Schoeberl at
2010-xxxx Vienna University of Technology
2006-2010 RWTH Aachen University
1996-2006 Johannes Kepler University Linz
Including OpenCascade geometry kernel
Running MPI - parallel using 4 processors
MPI-version = 2.1
optfile ./ng.opt does not exist - using default values
togl-version : 2
OCC module loaded
loading ngsolve library
NGSolve-........-..-..
Using Lapack
[host-68.localdomain:12429] *** An error occurred in MPI_comm_size
[host-68.localdomain:12429] *** on communicator MPI_COMM_WORLD
[host-68.localdomain:12429] *** MPI_ERR_COMM: invalid communicator
[host-68.localdomain:12429] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 12429 on
node host-68.localdomain exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[user@host-68 ~]$
However, if libngsolve is not loaded, Netgen starts fine and the GUI works:
Code:
[user@host-68 ~]$ /usr/lib64/openmpi-compat/bin/mpirun -np 4 netgen
NETGEN-6.2-dev
Developed by Joachim Schoeberl at
2010-xxxx Vienna University of Technology
2006-2010 RWTH Aachen University
1996-2006 Johannes Kepler University Linz
Including OpenCascade geometry kernel
Running MPI - parallel using 4 processors
MPI-version = 2.1
optfile ./ng.opt does not exist - using default values
togl-version : 2
OCC module loaded
loading ngsolve library
cannot load ngsolve
error: couldn't load file "libngsolve.so": libngsolve.so: cannot open shared object file: No such file or directory
[user@host-68 ~]$
Any hints on how to fix this?
Last edit: 6 years 5 months ago by nickel.
6 years 4 months ago #592
by ddrake
Replied by ddrake on topic Netgen GUI fails to start when libngsolve loaded and MPI=ON (ALT Linux)
Hi,
This sounds like libgomp needs to be added to the preload path. Maybe this will help. First, locate the library:
find / -name libgomp.so.1 2>&1 | grep -v "Permission denied"
Then, in the directory where the netgen binary is installed, look for the small text file ngspy.
Edit that file and insert the path to libgomp into the preload path, so that it looks something like this:
LD_PRELOAD=$LD_PRELOAD:/act/openmpi-2.0/gcc-7.2.0/lib/libmpi.so:/opt/intel/mkl/lib/intel64/libmkl_core.so:/opt/intel/mkl/lib/intel64/libmkl_gnu_thread.so:/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so:/opt/intel/mkl/lib/intel64/libmkl_blacs_openmpi_lp64.so:/usr/lib64/libgomp.so.1 /home/ddrake/common/install/bin/python3 $*
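If you would rather not edit the line by hand, here is a rough sketch of how the preload value could be assembled in a small shell script. Note that `append_preload` is a helper I made up for illustration (it is not part of Netgen or NGSolve), and the libgomp path is just the one from my machine, so replace it with whatever the find command reports on yours. The script only prints the resulting value and does not modify ngspy.

```shell
#!/bin/sh
# Sketch only: build an LD_PRELOAD value from libraries that actually exist.
# append_preload is a hypothetical helper, not something shipped with Netgen.

append_preload() {
    # $1 = current preload value (may be empty), $2 = library path to add
    if [ -z "$1" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$1:$2"
    fi
}

# Start from whatever is already set in the environment (possibly nothing).
preload="${LD_PRELOAD:-}"

# Assumed location of libgomp; use the path found on your own system.
for lib in /usr/lib64/libgomp.so.1; do
    if [ -e "$lib" ]; then
        preload=$(append_preload "$preload" "$lib")
    fi
done

# Print the value so it can be copied into the ngspy line.
echo "LD_PRELOAD=$preload"
```

Joining the entries this way avoids a leading colon when LD_PRELOAD is empty to begin with, and skips any library that is not present at the given path.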
Best,
Dow