Cluster installation issue (again)

6 years 3 months ago #715 by lkogler
Hi Guosheng,

Sorry to hear you are running into issues again :(. Let's see if we can resolve this.

Could you send me a gdb backtrace for a simple .pde file?

For example, execute "d1_square.pde" from the pde_tutorials with
Code:
mpirun -np 5 bash .wrap_mpirun gdb -batch -ex "run" -ex bt --args ngs d1_square.pde

".wrap_mpirun" should be something like this (with OpenMPI), and just pipes the output to seperate files
for all mpi-ranks:
Code:
#!/bin/sh
# run the given command, sending each rank's output to its own file
"$@" >out_p$OMPI_COMM_WORLD_RANK 2>&1


Also, please send me the output of:
Code:
which ngspy | xargs cat
Code:
which ngs | xargs ldd

Also, could you try to import netgen in Python and see whether that crashes too?
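A minimal way to check that from the shell (assuming python3 is the interpreter NGSolve was built against):
Code:
python3 -c "import netgen; print(netgen.__file__)"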

Finally, your CMakeCache.txt, the cmake command you used, and the cmake output would be useful.



In principle, the tests are supposed to work with MPI! In this case, of course, they all fail because something is wrong with the ngsolve libraries.

While this is probably not the issue you are running into here, on the cluster the tests might also fail because you may not be allowed to run any MPI computations on the login node. If that is the issue, you have to go through the batch system (e.g. by switching to an interactive session and then running the tests as usual).
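For example, on a Slurm-based cluster this could look something like the following (a sketch; the option values are placeholders and site-specific):
Code:
# request an interactive shell on a compute node, then run the tests there
srun --ntasks=5 --time=00:30:00 --pty bash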


Best,
Lukas
6 years 3 months ago #716 by Guosheng Fu
Lukas,

Here is my cmake file:
Code:
cmake \
  -DUSE_UMFPACK=OFF \
  -DCMAKE_PREFIX_PATH=/users/gfu1/data/ngsolve-install-plain \
  -DCMAKE_BUILD_TYPE=Release \
  -DINSTALL_DIR=/users/gfu1/data/ngsolve-install-plain \
  -DUSE_GUI=OFF \
  -DUSE_MPI=OFF \
  -DUSE_MUMPS=OFF \
  -DUSE_HYPRE=OFF \
  -DUSE_MKL=ON \
  -DMKL_ROOT=/gpfs/runtime/opt/intel/2017.0/mkl \
  -DZLIB_INCLUDE_DIR=/gpfs/runtime/opt/zlib/1.2.8/ \
  -DZLIB_LIBRARY=/gpfs/runtime/opt/zlib/1.2.8/lib/libz.so \
  -DMKL_SDL=OFF \
  -DCMAKE_CXX_COMPILER=/gpfs/runtime/opt/gcc/5.2.0/bin/g++ \
  -DCMAKE_C_COMPILER=/gpfs/runtime/opt/gcc/5.2.0/bin/gcc \
  ../ngsolve-src

I turned off MPI.
Attached is the CMakeCache.txt from the build directory.

I am doing everything on a compute node via an interactive session.


"which ngs | xargs ldd" gives the following:
Code:
linux-vdso.so.1 => (0x00007fff643ff000)
/usr/local/lib/libslurm.so (0x00007f7bf293f000)
libsolve.so => /users/gfu1/data/ngsolve-install-plain/lib/libsolve.so (0x00007f7bf2644000)
libngcomp.so => /users/gfu1/data/ngsolve-install-plain/lib/libngcomp.so (0x00007f7bf1adc000)
libngfem.so => /users/gfu1/data/ngsolve-install-plain/lib/libngfem.so (0x00007f7bf0523000)
libngla.so => /users/gfu1/data/ngsolve-install-plain/lib/libngla.so (0x00007f7befdc6000)
libngbla.so => /users/gfu1/data/ngsolve-install-plain/lib/libngbla.so (0x00007f7befadb000)
libngstd.so => /users/gfu1/data/ngsolve-install-plain/lib/libngstd.so (0x00007f7bef76c000)
libnglib.so => /users/gfu1/data/ngsolve-install-plain/lib/libnglib.so (0x00007f7bef562000)
libinterface.so => /users/gfu1/data/ngsolve-install-plain/lib/libinterface.so (0x00007f7bef308000)
libstl.so => /users/gfu1/data/ngsolve-install-plain/lib/libstl.so (0x00007f7bef07b000)
libgeom2d.so => /users/gfu1/data/ngsolve-install-plain/lib/libgeom2d.so (0x00007f7beee39000)
libcsg.so => /users/gfu1/data/ngsolve-install-plain/lib/libcsg.so (0x00007f7beeb33000)
libmesh.so => /users/gfu1/data/ngsolve-install-plain/lib/libmesh.so (0x00007f7bee623000)
libz.so.1 => /gpfs/runtime/opt/zlib/1.2.8/lib/libz.so.1 (0x00007f7bee40d000)
libvisual.so => /users/gfu1/data/ngsolve-install-plain/lib/libvisual.so (0x00007f7bee20c000)
libpython3.6m.so.1.0 => /gpfs/runtime/opt/python/3.6.1/lib/libpython3.6m.so.1.0 (0x00007f7bedd04000)
/gpfs/runtime/opt/intel/2017.0/mkl/lib/intel64/libmkl_intel_lp64.so (0x00007f7bed1e6000)
/gpfs/runtime/opt/intel/2017.0/mkl/lib/intel64/libmkl_gnu_thread.so (0x00007f7bec01a000)
/gpfs/runtime/opt/intel/2017.0/mkl/lib/intel64/libmkl_core.so (0x00007f7bea52a000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f7bea31e000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7bea101000)
libstdc++.so.6 => /gpfs/runtime/opt/gcc/5.2.0/lib64/libstdc++.so.6 (0x00007f7be9d73000)
libm.so.6 => /lib64/libm.so.6 (0x00007f7be9aef000)
libgomp.so.1 => /gpfs/runtime/opt/gcc/5.2.0/lib64/libgomp.so.1 (0x00007f7be98ce000)
libgcc_s.so.1 => /gpfs/runtime/opt/gcc/5.2.0/lib64/libgcc_s.so.1 (0x00007f7be96b7000)
libc.so.6 => /lib64/libc.so.6 (0x00007f7be9323000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7bf2cf6000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007f7be911f000)
librt.so.1 => /lib64/librt.so.1 (0x00007f7be8f16000)

I do not have ngspy in my $NETGENDIR directory; only
ngs, ngscxx, ngsld
are available there.
My laptop build of ngsolve does not have ngspy either...

Best,
Guosheng
6 years 3 months ago #717 by Guosheng Fu
Now I have done a complete rebuild of ngsolve.
The segmentation fault is gone, surprise!
But now I have an MKL issue (this bug is much friendlier :>):
Code:
Intel MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so.

In my application I need a hybrid Poisson solver, so I use the static condensation approach for the implementation and apply a sparse Cholesky factorization to the resulting hybrid matrix.
It is this line that causes the crash:
Code:
inva = av.mat.Inverse(fes.FreeDofs(coupling=True), inverse="sparsecholesky")
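For context, that call usually appears in a solve sequence like the following sketch (assuming av is a BilinearForm assembled with condense=True, f a LinearForm, and gfu a GridFunction; everything except the Inverse line is my assumption, not taken from the post):
Code:
# static-condensation solve sketch (av, f, fes, gfu assumed to exist)
inva = av.mat.Inverse(fes.FreeDofs(coupling=True), inverse="sparsecholesky")
f.vec.data += av.harmonic_extension_trans * f.vec  # condense rhs to coupling dofs
gfu.vec.data = inva * f.vec                        # solve the condensed system
gfu.vec.data += av.harmonic_extension * gfu.vec    # extend solution to local dofs
gfu.vec.data += av.inner_solve * f.vec             # recover the inner dofs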

I recall that I had another build with MKL turned off which caused the segfault before, but now I am not completely sure...

Best,
Guosheng
6 years 3 months ago #718 by lkogler
If you have not enabled MPI, that makes things less complicated. You don't need ngspy in that case
(it only exists because we had some issues linking MKL libraries and MPI on certain systems).

You can simply run
Code:
gdb -ex run --args python3 poisson.py
or something similar.
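If it crashes, a batch-mode variant (same gdb flags as the MPI command earlier in the thread) prints the backtrace automatically:
Code:
gdb -batch -ex run -ex bt --args python3 poisson.py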
6 years 3 months ago #719 by lkogler
Try turning MKL_SDL on:
Code:
-DMKL_SDL=ON
This links against MKL's single dynamic library (libmkl_rt.so), which dispatches to the matching kernel libraries at runtime and often avoids exactly that "Cannot load libmkl_avx.so or libmkl_def.so" error.
6 years 3 months ago #720 by Guosheng Fu
Ha, with
-DMKL_SDL=ON
the installation works!

This will save me a lot of computing time, thank you guys! (hopefully everything works...)