
MPI related questions

4 years 11 months ago #1526 by Guosheng Fu
Hello,

I am now starting to play with MPI in ngsolve and got a couple of questions to ask:

(1) For my installation, I asked NGSolve to download and build METIS, MUMPS, and hypre for me. While it succeeded in building all the packages, I cannot get MUMPS working. In the final linking phase, my libngla.so has some undefined references...
../linalg/libngla.so: undefined reference to `blacs_gridinfo_'
../linalg/libngla.so: undefined reference to `pzpotrf_'

So, in my working installation, I have to turn off MUMPS. This is always the most painful part, due to my limited C++ experience... Do you see an immediate fix for this? I can provide more details about my build if you want...

(2) In ngsolve.org/docu/latest/how_to/howto_parallel.html , it says the taskmanager allows for hybrid parallelization. I am assuming that means MPI+OpenMP, so how do I do this hybrid parallelization? I tried to run
mpirun -n 4 ngspy mpi_poisson.py
with SetNumThreads(8) in the code, but it didn't work...

(3) Again in ngsolve.org/docu/latest/how_to/howto_parallel.html , it says MPI does not support periodic boundaries yet :<
But in the git repository, there is a quite recent commit on MPI+periodic... have you been working on solving this issue recently?


Best always,
Guosheng
4 years 11 months ago - 4 years 11 months ago #1527 by lkogler
Replied by lkogler on topic MPI related questions
1) I think the problem here is the loading of the BLACS/ScaLAPACK libraries.
You can:
  • Use "ngspy" instead of python3
  • Set these libraries in LD_PRELOAD (have a look at the "ngspy" file in the NGSolve bin directory)
  • In your python scripts, before importing ngsolve:
    Code:
    from ctypes import CDLL, RTLD_GLOBAL
    for lib in THE_LIBRARIES_YOU_NEED:
        CDLL(lib, RTLD_GLOBAL)
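For example, something like this (a minimal sketch; the library paths below are only illustrative and depend on your MKL installation):
Code:
# Hypothetical paths: adjust to your own MKL/BLACS installation.
from ctypes import CDLL, RTLD_GLOBAL
for lib in ["/opt/intel/mkl/lib/intel64/libmkl_core.so",
            "/opt/intel/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so"]:
    CDLL(lib, RTLD_GLOBAL)
import ngsolve  # import only after the libraries are loaded globally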

2) It's MPI & C++11 threads (not OpenMP). It should work like this:
Code:
ngsglobals.numthreads=X
Assembling/applying of BLFs (bilinear forms) will be hybrid parallel, but most of the solvers/preconditioners will still be MPI-only.
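A minimal sketch of how this fits into a script (assuming an mpi_poisson.py-style setup; the mesh file and space here are only illustrative):
Code:
from ngsolve import *
ngsglobals.numthreads = 8          # threads per MPI rank

mesh = Mesh("square.vol")          # hypothetical, already-partitioned mesh
fes = H1(mesh, order=2, dirichlet=".*")
u, v = fes.TrialFunction(), fes.TestFunction()
a = BilinearForm(fes)
a += SymbolicBFI(grad(u)*grad(v))
with TaskManager():                # assembly runs thread-parallel on each rank
    a.Assemble()
Launched as, e.g., mpirun -n 4 ngspy script.py, this gives 4 MPI ranks with 8 threads each.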

3) It should work. Keep in mind that your mesh must contain surface and BBND elements. (This is only an issue with manually generated meshes.) Please contact me if you run into any problems with this.
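For reference, a minimal sketch of a periodic setup on a generated mesh (the standard copy-identification from the NGSolve documentation; mesh size and order are arbitrary):
Code:
from ngsolve import *
from netgen.geom2d import SplineGeometry

# Unit square; the top/left edges are periodic copies of bottom/right.
geo = SplineGeometry()
pnums = [geo.AppendPoint(*p) for p in [(0,0), (1,0), (1,1), (0,1)]]
lbot   = geo.Append(["line", pnums[0], pnums[1]], bc="bottom")
lright = geo.Append(["line", pnums[1], pnums[2]], bc="right")
geo.Append(["line", pnums[3], pnums[2]], leftdomain=0, rightdomain=1, copy=lbot,   bc="top")
geo.Append(["line", pnums[0], pnums[3]], leftdomain=0, rightdomain=1, copy=lright, bc="left")

mesh = Mesh(geo.GenerateMesh(maxh=0.2))
fes = Periodic(H1(mesh, order=3))   # periodic wrapper around the underlying space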
Last edit: 4 years 11 months ago by lkogler.
4 years 11 months ago #1528 by Guosheng Fu
Replied by Guosheng Fu on topic MPI related questions
I still can't get MUMPS working...

Here are the details of my build; maybe you can help me find the bug:
(1) I have local installations of gcc-8.1, python3, and mpich
(2) I am using the Intel MKL library for LAPACK/BLAS

do-configure.txt contains my cmake details,
c.txt is the output of running "./do-configure",
m.txt is the output of running "make VERBOSE=1", which produces the error message at the final linking stage (lines 4744--4756).

Thanks!
4 years 11 months ago #1529 by Guosheng Fu
Replied by Guosheng Fu on topic MPI related questions
Wait, this is exactly the same error I encountered two years ago (when I was installing on another machine):
ngsolve.org/forum/ngspy-forum/11-install...root-access?start=24
It was an MKL library issue, and I fixed it by linking MKL statically: -DMKL_STATIC=ON lol


But then I encountered an issue with MUMPS in the demo code
Code:
mpi_poisson.py
. I refined the mesh twice with
Code:
ngmesh.Refine()
to make the problem bigger; then the MUMPS solver failed to factorize the matrix and exited with a segmentation fault... hypre and masterinverse are working fine...
Is it a bug, or is there still something wrong with my installation?
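(For context: the solver is selected via the inverse type on the assembled matrix. A minimal sketch, where a, f, gfu and fes are assumed to be set up as in mpi_poisson.py:)
Code:
# Hedged sketch: choose the parallel direct solver for the inverse.
inv = a.mat.Inverse(fes.FreeDofs(), inverse="mumps")   # or "masterinverse"
gfu.vec.data = inv * f.vec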
4 years 11 months ago - 4 years 11 months ago #1533 by lkogler
Replied by lkogler on topic MPI related questions
Did it crash or did it terminate with an error message? I am looking into it.
Last edit: 4 years 11 months ago by lkogler.
4 years 11 months ago #1534 by Guosheng Fu
Replied by Guosheng Fu on topic MPI related questions
For the mpi_poisson.py file (with two mesh refinements), I run
Code:
mpirun -n X ngspy mpi_poisson.py
The code works fine if I take X to be 1 or 2, converging in 1 iteration.
But it generates the following segfault if I take X to be 3:

Code:
Update Direct Solver PreconditionerMumps Parallel inverse, symmetric = 0
analysis ... factor ... /afs/crc.nd.edu/user/g/gfu/NG/ngsolve-install-mpi/bin/ngspy: line 2: 32379 Segmentation fault LD_PRELOAD=$LD_PRELOAD:/afs/crc.nd.edu/user/g/gfu/NG/mpich-inst/lib/libmpi.so:/opt/crc/i/intel/19.0/mkl/lib/intel64/libmkl_core.so:/opt/crc/i/intel/19.0/mkl/lib/intel64/libmkl_gnu_thread.so:/opt/crc/i/intel/19.0/mkl/lib/intel64/libmkl_intel_lp64.so:/opt/crc/i/intel/19.0/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so:/usr/lib64/libgomp.so.1 /afs/crc.nd.edu/user/g/gfu/NG/PY3-mpi/bin/python3 $*


And it produces the following message if I take X to be 4:
Code:
Update Direct Solver PreconditionerMumps Parallel inverse, symmetric = 0
analysis ... factor ...
2 :INTERNAL Error: recvd root arrowhead
2 :not belonging to me. IARR,JARR= -41598 13
2 :IROW_GRID,JCOL_GRID= 0 0
2 :MYROW, MYCOL= 0 2
2 :IPOSROOT,JPOSROOT= 10982 0
application called MPI_Abort(MPI_COMM_WORLD, -99) - process 3


But MUMPS works fine with the smaller system when I only do one mesh refinement...