mumps solver failed

4 years 4 months ago #2992 by Guosheng Fu
Hello,

I get the following error message when running the demo code mpi_poisson.py
Code:
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers
This seems to be related to dynamic/static linking of the MKL library.
The solver is PCG with a bddc preconditioner using mumps.

The error only appears when the problem size is "sufficiently large":
the code works fine with maxh = 0.08, but fails with maxh = 0.06.

Meanwhile, bddc + usehypre works fine for me.

I built version '6.2.2006-83-gdc140c7' with the following configuration (static MKL library):
Code:
... \
  -DUSE_MPI=ON \
  -DUSE_MUMPS=ON \
  -DUSE_HYPRE=ON \
  -DUSE_MKL=ON \
  -DMKL_STATIC=ON \
...
The compiler is gcc 8.3.0, the MPI is MPICH v3.3, and the MKL library is from Intel 19.0.

Also, a side question: which options for the inverse flag of the bddc preconditioner are suitable for MPI?
Code:
c = Preconditioner(a, type="bddc", inverse = "???")

Best,
Guosheng
4 years 4 months ago - 4 years 4 months ago #2998 by lkogler
Replied by lkogler on topic mumps solver failed
Okay, this is a new error for me too. I have never worked with MPICH.
It looks like MKLMPI_Get_wrappers is in the MKL BLACS library, so maybe NGSolve is linked with the wrong one. MPICH should be linked against libmkl_blacs_intelmpi.., not against libmkl_blacs_openmpi...
Could you upload your CMakeCache.txt from the NGSolve build directory?

You could also try manually loading the MKL single dynamic library (SDL) by putting this at the top of your Python file:
Code:
from ctypes import CDLL, RTLD_GLOBAL
CDLL("<path_to_libmkl_rt.so>", mode=RTLD_GLOBAL)
I know it is not a very clean solution, but it usually works for me.
If this does not bother you too much, you could even build NGSolve with dynamic MKL linking and MKL_SDL=ON.

Also, a side question: which options for the inverse flag of the bddc preconditioner are suitable for MPI?

Only "mumps" and "masterinverse". Some more direct solvers are available through the PETSc interface.
However, that does not work with the "inverse=" flag, but has to be accessed by "coarsetype=petsc_pc". What petsc_pc does can then be configured via flags.
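Roughly, the two variants look like this (a minimal sketch; the additional flags that configure petsc_pc are not shown here):
Code:
# BDDC with a parallel direct solver for the coarse problem
c = Preconditioner(a, type="bddc", inverse="mumps")    # or inverse="masterinverse"

# BDDC with the coarse problem handed to PETSc; petsc_pc is configured via extra flags
c = Preconditioner(a, type="bddc", coarsetype="petsc_pc")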

Best,
Lukas
4 years 3 months ago #3001 by Guosheng Fu
Replied by Guosheng Fu on topic mumps solver failed
Yes, I am using the intelmpi BLACS library from MKL; otherwise the build stage complains.

But strangely, I cannot reproduce the bug anymore today.
What I did was reinstall PETSc to include mumps and superlu_dist, using the following configuration (adapted from yours). I could not link the scalapack library from MKL, so I let PETSc download it:
Code:
./configure --with-debugging=0 \
  --PETSC_ARCH="arch-linux-c-opt" \
  --prefix=$HOME/NGSolve-X/petsc \
  --with-fortran-bindings=0 \
  --with-fortran-type-initialize=0 \
  --with-shared-libraries \
  --with-mpi=1 --with-mpi-dir=/XXX/mpich/3.3/gcc/8.3.0 \
  --with-cxx-dialect=c++11 \
  --with-metis=1 --with-metis-dir=XXX/ngbuild-mpi/dependencies/parmetis \
  --with-parmetis=1 --with-parmetis-dir=XXX/ngbuild-mpi/dependencies/parmetis \
  --with-hypre=1 --download-hypre=yes --download-hypre-shared=0 \
  --with-blaslapack=1 --with-blaslapack-dir=XXX/intel/19.0/mkl \
  --with-zlib=1 \
  --with-ml=1 --download-ml=yes --download-ml-shared=0 \
  --with-ilu=1 \
  --with-cmake=1 \
  --download-slepc=yes \
  --with-superlu=1 --download-superlu=yes \
  --with-mumps=1 --download-mumps=yes --download-mumps-shared=0 \
  --with-scalapack=1 --download-scalapack=yes \
  --with-superlu_dist=1 --download-superlu_dist=yes --download-superlu_dist-shared=0

After that, I can use the mumps and superlu_dist direct solvers via ngs-petsc, and this seems to have fixed my mumps error. I guess the code just links to the PETSc mumps first, rather than the NGSolve-built mumps. Is that correct?

I recall that building with MKL_SDL=ON failed on my machine once, so I will stick with the current version for now.

I am looking into a scalable solver for the hybrid-mixed Poisson operator. Here is the timing of the system solve for a test problem with RT0-P0 (code is attached):
Code:
mpirun -np $NSLOTS ngspy mpi_hybrid.py 0 60
24 cores: ~4s
48 cores: ~3s
96 cores: ~5s
192 cores: >100s

I didn't see the expected speed-up when using more than 24 cores (my machine has 24 cores per node).
Do you know what is going wrong in my code?

Best,
Guosheng
4 years 3 months ago #3014 by Guosheng Fu
Replied by Guosheng Fu on topic mumps solver failed
Just a follow-up on the hybrid-mixed solver.
I ended up using PETSc's ksp solver with a mumps direct solver as the preconditioner:
Code:
ksp = petsc.KSP(mat=mat_wrap, name="someksp",
                petsc_options={"ksp_type": "preonly",
                               "pc_type": "cholesky",
                               "pc_factor_mat_solver_type": "mumps"})
I did this because I didn't find a good preconditioner for the high-order hybrid-mixed matrix.
One thing I noticed is that the PETSc ksp solver returns a "DISTRIBUTED" vector, and I need to manually convert it to a "CUMULATED" vector to make the code work.
Code:
gfu.vec.data = ksp * rhs
gfu.vec.Cumulate()

Best,
Guosheng
4 years 3 months ago #3019 by lkogler
Replied by lkogler on topic mumps solver failed

I didn't see the expected speed-up when using more than 24 cores (my machine has 24 cores per node).
Do you know what is going wrong in my code?

No idea. Are you simply measuring the time the entire job takes? Bigger jobs can take longer to start up.
You could try writing trace files on some ranks and looking at those to get an idea of where the problem comes from:
Code:
with TaskManager(pajetrace=10*1024*1024):
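A minimal sketch of how that can be wrapped around the solve (assuming gfu, ksp and rhs from your script; the paje trace can then be opened with a viewer such as ViTE):
Code:
from ngsolve import TaskManager

# trace this block; the argument is the trace buffer size in bytes
with TaskManager(pajetrace=10*1024*1024):
    gfu.vec.data = ksp * rhs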

One thing I noticed is that the PETSc ksp solver returns a "DISTRIBUTED" vector, and I need to manually convert it to a "CUMULATED" vector to make the code work.

What are you doing with the vector? Parallel NGSolve objects should accept a CUMULATED or DISTRIBUTED vector as input and do the conversion automatically if needed. But if you are doing something directly to the values, you need to cumulate it first.
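To illustrate the distinction, a minimal sketch (res is just an auxiliary vector for this example):
Code:
res = a.mat.CreateColVector()

# fine: NGSolve operations accept a DISTRIBUTED vector and convert if needed
res.data = a.mat * gfu.vec

# reading values directly only sees the rank-local (partial) contributions
# of a DISTRIBUTED vector, so cumulate first
gfu.vec.Cumulate()
local_values = gfu.vec.FV().NumPy()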

Best,
Lukas
4 years 3 months ago #3021 by Guosheng Fu
Replied by Guosheng Fu on topic mumps solver failed

What are you doing with the vector? Parallel NGSolve objects should accept a CUMULATED or DISTRIBUTED vector as input and do the conversion automatically if needed. But if you are doing something directly to the values, you need to cumulate it first.


I am simply using a static condensation approach for the linear system solve. Without Cumulate, the output error is not correct (see the attached code).

For timing the solver, I am simply using a Python timer around the linear system solve:
Code:
import time
...
t0 = time.time()
gfu.vec.data = ksp * rhs
t1 = time.time()
gfu.vec.Cumulate()   # cumulate, this line is necessary
gfu.vec.data += a.harmonic_extension * gfu.vec
gfu.vec.data += a.inner_solve * rhs
if rank == 0:
    print("cost: ", t1-t0)
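As a side note, a minimal sketch of the same measurement with explicit barriers, so that ranks arriving late at the solve do not distort the timing (this uses mpi4py for the barrier and timer, which is an assumption about the environment; NGSolve's own MPI wrapper could be used instead):
Code:
from mpi4py import MPI

comm = MPI.COMM_WORLD

comm.Barrier()                  # make sure all ranks enter the solve together
t0 = MPI.Wtime()
gfu.vec.data = ksp * rhs
comm.Barrier()                  # wait for the slowest rank before stopping the clock
t1 = MPI.Wtime()
if comm.rank == 0:
    print("solve cost: ", t1 - t0)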

The iterative solver just produces unexpected behavior for me.
Attached is a simple hybrid Poisson solver with a hypre preconditioner that fails to scale on my machine.
The cost of the linear system solve is recorded as follows:
Code:
mpirun -np $NSLOTS ngspy mpi_hybrid.py
12 cpus:  5.7 s
24 cpus:  3.5 s
results for three separate runs:
48 cpus:  [2.3 s, 2.4 s, 3.2 s]
96 cpus:  [11 s, 8 s, 8.4 s]   (speed is degraded from 48 cpus)
192 cpus: [19 s, 31 s, >100 s]
You can clearly see the degraded performance from 48 cores to 96 cores, although the number of iterations stays the same. With 192 cpus, the cost is even worse. This result simply does not make sense to me.
Also, with more cpus, it takes more time to set up the hypre preconditioner
(~9 s for 48 cores, ~48 s for 96 cores, and ~80 s for 192 cores).


Best,
Guosheng