mumps solver failed
4 years 4 months ago #2992
by Guosheng Fu
mumps solver failed was created by Guosheng Fu
Hello,
I get the following error message when running the demo code mpi_poisson.py:
Code:
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers
This seems to be related to dynamic/static linking of the MKL library.
The solver is PCG with a bddc preconditioner that uses mumps as its inverse.
The error only shows up when the problem size is "sufficiently large": the code works fine with maxh = 0.08 but fails with maxh = 0.06.
Meanwhile, bddc + usehypre works fine for me.
I built version '6.2.2006-83-gdc140c7' with the following configuration (static MKL library). The compiler is gcc 8.3.0, MPI is MPICH v3.3, and MKL is from Intel 19.0:
Code:
...
-DUSE_MPI=ON \
-DUSE_MUMPS=ON \
-DUSE_HYPRE=ON \
-DUSE_MKL=ON \
-DMKL_STATIC=ON \
...
Also, a side question: which options for the inverse flag of the bddc preconditioner are suitable for MPI?
Code:
c = Preconditioner(a, type="bddc", inverse = "???")
Best,
Guosheng
4 years 4 months ago - 4 years 4 months ago #2998
by lkogler
Replied by lkogler on topic mumps solver failed
Okay, this is a new error for me too. I have never worked with MPICH.
It looks like MKLMPI_Get_wrappers is in the MKL BLACS library, so maybe NGSolve is linked against the wrong one. With MPICH you should link against libmkl_blacs_intelmpi..., not against libmkl_blacs_openmpi...
Could you upload your CMakeCache.txt from the NGSolve build directory?
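In case it helps, the relevant entries can also be pulled straight out of the cache; a small sketch (the build-directory path is a placeholder, adjust it to yours):
Code:
# Sketch: print the MKL / BLACS related entries from the CMake cache.
from pathlib import Path

cache = Path("/path/to/ngsolve-build/CMakeCache.txt")  # placeholder path
for line in cache.read_text().splitlines():
    if "blacs" in line.lower() or "MKL" in line:
        print(line)
If an openmpi BLACS library shows up there for an MPICH build, that would point to exactly the mismatch described above.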
You could also try to manually load the SDL library by putting this at the top of your python file:
Code:
from ctypes import CDLL, RTLD_GLOBAL
CDLL("<path_to_libmkl_rt.so>", RTLD_GLOBAL)
I know it is not a very clean solution, but it usually works for me.
If this does not bother you too much, you could even build NGSolve with dynamic MKL linking and MKL_SDL=ON.
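If the placeholder path above is unclear, the same preload can be written relative to MKLROOT; a sketch (the MKLROOT environment variable and the lib/intel64 layout are assumptions about a typical Intel 19.0 install):
Code:
# Sketch: preload the MKL single-dynamic-library (SDL) relative to MKLROOT.
import os
from ctypes import CDLL, RTLD_GLOBAL

CDLL(os.path.join(os.environ["MKLROOT"], "lib", "intel64", "libmkl_rt.so"), RTLD_GLOBAL)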
Only "mumps" and "masterinverse". Some more direct solvers are available through the PETSc interface.Also, a side question, what options are available for the inverse flag of bddc solver that is suitable for MPI?
However, that does not work with the "inverse=" flag, but has to be accessed by "coarsetype=petsc_pc". What petsc_pc does can then be configured via flags.
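Spelled out, the two variants would look roughly like this (a sketch only, using the bilinear form a from the demo; the commented line needs an ngs-petsc enabled build):
Code:
# Sketch: MPI-capable coarse inverses for the bddc preconditioner.
from ngsolve import Preconditioner

c = Preconditioner(a, type="bddc", inverse="mumps")   # or inverse="masterinverse"
# c = Preconditioner(a, type="bddc", coarsetype="petsc_pc")  # behaviour configured via PETSc flags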
Best,
Lukas
Last edit: 4 years 4 months ago by lkogler.
4 years 3 months ago #3001
by Guosheng Fu
Replied by Guosheng Fu on topic mumps solver failed
Yes, I am using the intelmpi BLACS library from MKL; otherwise the build stage complains.
But strangely, I cannot reproduce the bug anymore today.
What I did was re-install PETSc to include mumps and superlu_dist with the following configuration (adapted from yours); I could not link the scalapack library from MKL, so I downloaded it instead:
Code:
./configure --with-debugging=0 \
  --PETSC_ARCH="arch-linux-c-opt" \
  --prefix=$HOME/NGSolve-X/petsc \
  --with-fortran-bindings=0 \
  --with-fortran-type-initialize=0 \
  --with-shared-libraries \
  --with-mpi=1 --with-mpi-dir=/XXX/mpich/3.3/gcc/8.3.0 \
  --with-cxx-dialect=c++11 \
  --with-metis=1 --with-metis-dir=XXX/ngbuild-mpi/dependencies/parmetis \
  --with-parmetis=1 --with-parmetis-dir=XXX/ngbuild-mpi/dependencies/parmetis \
  --with-hypre=1 --download-hypre=yes --download-hypre-shared=0 \
  --with-blaslapack=1 --with-blaslapack-dir=XXX/intel/19.0/mkl \
  --with-zlib=1 \
  --with-ml=1 --download-ml=yes --download-ml-shared=0 \
  --with-ilu=1 \
  --with-cmake=1 \
  --download-slepc=yes \
  --with-superlu=1 --download-superlu=yes \
  --with-mumps=1 --download-mumps=yes --download-mumps-shared=0 \
  --with-scalapack=1 --download-scalapack=yes \
  --with-superlu_dist=1 --download-superlu_dist=yes --download-superlu_dist-shared=0
After that, I can use the mumps and superlu_dist direct solvers via ngs-petsc, and this seems to have fixed my mumps error. I guess the code just links to the PETSc mumps first rather than the NGSolve-built mumps. Is that correct?
I recall that building with MKL_SDL=ON failed on my machine once, so I will stick with the current version for now.
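For reference, selecting between the two direct solvers through ngs-petsc looks roughly like this (a sketch only; mat_wrap stands for the wrapped NGSolve matrix, as in the KSP snippet further down, and the import name ngs_petsc is an assumption):
Code:
# Sketch: a direct solve through ngs-petsc, choosing the factorization backend via PETSc options.
import ngs_petsc as petsc

ksp = petsc.KSP(mat=mat_wrap, name="directsolve", petsc_options={
    "ksp_type": "preonly",
    "pc_type": "lu",                                # "cholesky" for SPD systems
    "pc_factor_mat_solver_type": "superlu_dist"})   # or "mumps"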
I am looking into a scalable solver for the hybrid-mixed Poisson operator. Here is the timing of the system solve for a test problem with RT0-P0 elements (code is attached):
Code:
mpirun -np $NSLOTS ngspy mpi_hybrid.py 0 60
24 cores:  ~4 s
48 cores:  ~3 s
96 cores:  ~5 s
192 cores: >100 s
I did not see the expected speed-up when using more than 24 cores (my machine has 24 cores per node).
Do you know what is going on in my code?
Best,
Guosheng
Attachments:
4 years 3 months ago #3014
by Guosheng Fu
Replied by Guosheng Fu on topic mumps solver failed
Just a follow-up on the hybrid-mixed solver.
I ended up using PETSc's KSP solver with a mumps direct solve as the preconditioner, as I did not find a good preconditioner for the high-order hybrid-mixed matrix:
Code:
ksp = petsc.KSP(mat=mat_wrap, name="someksp", petsc_options={
    "ksp_type": "preonly",
    "pc_type": "cholesky",
    "pc_factor_mat_solver_type": "mumps"})
One thing I noticed is that the PETSc KSP solver returns a DISTRIBUTED vector, and I need to manually convert it to a CUMULATED vector to make the code work:
Code:
gfu.vec.data = ksp * rhs
gfu.vec.Cumulate()
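To double-check which state the vector is actually in, printing the parallel status helps (a sketch; I am assuming the Python binding is called GetParallelStatus):
Code:
# Sketch: inspect the parallel status (DISTRIBUTED / CUMULATED) of the solution vector.
print("status after ksp solve:", gfu.vec.GetParallelStatus())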
Best,
Guosheng
4 years 3 months ago #3019
by lkogler
Replied by lkogler on topic mumps solver failed
Guosheng Fu wrote: I didn't see the expected speed-up when using more than 24 cores (my machine has 24 cores per node). Do you know what is going on in my code?
No idea. Are you simply measuring the time the entire job takes? Bigger jobs can take longer to start up.
You could try writing trace files on some ranks and looking at those to get an idea of where the problem comes from:
Code:
with TaskManager(pajetrace=10*1024*1024):
    ...  # run the solve inside the traced region
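For example, to write traces on only a couple of ranks (a sketch; the mpi4py rank query and the choice of ranks are assumptions, not a fixed recipe):
Code:
# Sketch: enable the paje trace on ranks 0 and 1 only; pajetrace=0 leaves tracing off.
from mpi4py import MPI
from ngsolve import TaskManager

rank = MPI.COMM_WORLD.rank
tracesize = 10 * 1024 * 1024 if rank in (0, 1) else 0

with TaskManager(pajetrace=tracesize):
    pass  # run the MPI solve inside the traced region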
Guosheng Fu wrote: One thing I noticed is that the PETSc KSP solver returns a DISTRIBUTED vector, and I need to manually convert it to a CUMULATED vector to make the code work.
What are you doing with the vector? Parallel NGSolve objects should accept a CUMULATED or DISTRIBUTED vector as input and do the conversion automatically if needed. But if you are doing something directly to the values, you need to cumulate it.
Best,
Lukas
4 years 3 months ago #3021
by Guosheng Fu
Replied by Guosheng Fu on topic mumps solver failed
lkogler wrote: What are you doing with the vector? Parallel NGSolve objects should accept a CUMULATED or DISTRIBUTED vector as input and do the conversion automatically if needed. But if you are doing something directly to the values, you need to cumulate it.
I am simply using a static condensation approach for the linear system solve. Without Cumulate, the output error is not correct (see the attached code).
For timing the solver, I am simply using a Python timer around the linear system solve:
Code:
import time
...
t0 = time.time()
gfu.vec.data = ksp * rhs      # solve the condensed system with the PETSc KSP
t1 = time.time()
gfu.vec.Cumulate()            # cumulate, this line is necessary
gfu.vec.data += a.harmonic_extension * gfu.vec   # static-condensation back-substitution
gfu.vec.data += a.inner_solve * rhs
if rank == 0:
    print("cost: ", t1 - t0)
The iterative solver just produces unexpected behavior for me.
Attached is a simple hybrid Poisson solver with a hypre preconditioner that fails to scale on my machine.
The cost of the linear system solve is recorded as follows:
Code:
mpirun -np $NSLOTS ngspy mpi_hybrid.py
12 cpus:  5.7 s
24 cpus:  3.5 s
results for three separate runs:
48 cpus:  [2.3 s, 2.4 s, 3.2 s]
96 cpus:  [11 s, 8 s, 8.4 s]    (speed is degraded compared to 48 cpus)
192 cpus: [19 s, 31 s, >100 s]
You can clearly see the degraded performance from 48 to 96 cores although the number of iterations stays the same, and with 192 cpus the cost is even worse. This result simply does not make sense to me.
Also, with more cpus it takes more time to set up the hypre preconditioner (~9 s for 48 cores, ~48 s for 96 cores, and ~80 s for 192 cores).
Best,
Guosheng
Attachments: