mumps solver failed
5 years 1 month ago - 5 years 1 month ago #2998
by lkogler
Replied by lkogler on topic mumps solver failed
Okay, this is a new error for me too. I have never worked with MPICH.
It looks like MKLMPI_Get_wrappers is in the MKL BLACS library, so maybe NGSolve is linked against the wrong one. With MPICH it should be linked against libmkl_blacs_intelmpi.., not against libmkl_blacs_openmpi...
Could you upload your CMakeCache.txt from the NGSolve build directory?
You could also try to manually load the MKL SDL (single dynamic library) by putting this at the top of your python file:
Code:
from ctypes import CDLL, RTLD_GLOBAL
CDLL("<path_to_libmkl_rt.so>", RTLD_GLOBAL)
I know it is not a very clean solution, but it usually works for me.
If this does not bother you too much, you could even build NGSolve with dynamic MKL linking and MKL_SDL=ON.
Guosheng Fu wrote: "Also, a side question, what options are available for the inverse flag of the bddc solver that is suitable for MPI?"
Only "mumps" and "masterinverse". Some more direct solvers are available through the PETSc interface. However, those do not work with the "inverse=" flag, but have to be accessed via "coarsetype=petsc_pc". What petsc_pc does can then be configured via flags.
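For illustration, a minimal sketch of both variants (assuming a is the BilinearForm set up for BDDC, and that ngs_petsc is installed so that the "petsc_pc" coarse solver is registered on import; the flags that configure petsc_pc itself are left out):
Code:
from ngsolve import *
import ngs_petsc as petsc   # assumed: importing registers the "petsc_pc" coarse-grid solver

# a: BilinearForm for the hybrid system, set up for BDDC (internal dofs condensed)
c = Preconditioner(a, "bddc", inverse="mumps")            # parallel direct coarse solve ("mumps" or "masterinverse")
# c = Preconditioner(a, "bddc", coarsetype="petsc_pc")    # coarse solve through PETSc, configured via extra flags
a.Assemble()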
Best,
Lukas
Last edit: 5 years 1 month ago by lkogler.
5 years 1 month ago #3001
by Guosheng Fu
Replied by Guosheng Fu on topic mumps solver failed
Yes, I am using the intelmpi BLACS library from MKL; otherwise the build stage complains.
But strangely, I cannot reproduce the bug anymore today.
What I did was reinstall PETSc to include mumps and superlu_dist with the following configure invocation (adapted from yours); I could not link the ScaLAPACK library from MKL, so I had PETSc download it instead:
Code:
./configure --with-debugging=0 \
  --PETSC_ARCH="arch-linux-c-opt" \
  --prefix=$HOME/NGSolve-X/petsc \
  --with-fortran-bindings=0 \
  --with-fortran-type-initialize=0 \
  --with-shared-libraries \
  --with-mpi=1 --with-mpi-dir=/XXX/mpich/3.3/gcc/8.3.0 \
  --with-cxx-dialect=c++11 \
  --with-metis=1 --with-metis-dir=XXX/ngbuild-mpi/dependencies/parmetis \
  --with-parmetis=1 --with-parmetis-dir=XXX/ngbuild-mpi/dependencies/parmetis \
  --with-hypre=1 --download-hypre=yes --download-hypre-shared=0 \
  --with-blaslapack=1 --with-blaslapack-dir=XXX/intel/19.0/mkl \
  --with-zlib=1 \
  --with-ml=1 --download-ml=yes --download-ml-shared=0 \
  --with-ilu=1 \
  --with-cmake=1 \
  --download-slepc=yes \
  --with-superlu=1 --download-superlu=yes \
  --with-mumps=1 --download-mumps=yes --download-mumps-shared=0 \
  --with-scalapack=1 --download-scalapack=yes \
  --with-superlu_dist=1 --download-superlu_dist=yes --download-superlu_dist-shared=0
After that, I can use the mumps and superlu_dist direct solvers via ngs-petsc, and this seems to have fixed my mumps error. I guess the code just links to the PETSc mumps first, rather than the NGSolve-built mumps. Is that correct?
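Roughly like this, as a sketch (mat_wrap is the PETSc-wrapped NGSolve matrix, as in the KSP example further below; the solver name is arbitrary):
Code:
# direct solve through PETSc, with superlu_dist as the factorization backend instead of mumps
ksp = petsc.KSP(mat=mat_wrap, name="directsolve", petsc_options={
    "ksp_type": "preonly",
    "pc_type": "lu",
    "pc_factor_mat_solver_type": "superlu_dist"})
gfu.vec.data = ksp * rhs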
I recall that building with MKL_SDL=ON failed on my machine once, so I will stick with the current version for now.
I am looking into a scalable solver for the hybrid-mixed Poisson operator. Here is the timing of the system solve for a test problem with RT0-P0 elements (code is attached):
Code:
mpirun -np $NSLOTS ngspy mpi_hybrid.py 0 60
24 cores: ~4 s
48 cores: ~3 s
96 cores: ~5 s
192 cores: >100 s
I didn't see the expected speed-up when using more than 24 cores (my machine has 24 cores per node).
Do you know what's going on with my code?
Best,
Guosheng
Attachments:
5 years 1 month ago #3014
by Guosheng Fu
Replied by Guosheng Fu on topic mumps solver failed
Just a follow-up on the hybrid-mixed solver.
I ended up using PETSc's KSP solver with a mumps direct solve as the preconditioner, as I didn't find a good preconditioner for the high-order hybrid-mixed matrix:
Code:
ksp = petsc.KSP(mat=mat_wrap, name="someksp", petsc_options={
    "ksp_type": "preonly",
    "pc_type": "cholesky",
    "pc_factor_mat_solver_type": "mumps"})
One thing I noticed is that the PETSc KSP solver returns a "DISTRIBUTED" vector, and I need to manually convert it to a "CUMULATED" vector to make the code work.
Code:
gfu.vec.data = ksp * rhs
gfu.vec.Cumulate()   # convert the DISTRIBUTED result to CUMULATED
Best,
Guosheng
5 years 1 month ago #3019
by lkogler
Replied by lkogler on topic mumps solver failed
Guosheng Fu wrote: "I didn't see the expected speed-up when using more than 24 cores (my machine has 24 cores per node). Do you know what's going on with my code?"
No idea. Are you simply measuring the time the entire job takes? Bigger jobs can take longer to start up.
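For instance, a rough sketch that isolates just the solve time with mpi4py barriers around the solve, so that startup and load imbalance are not counted (variable names as in your script):
Code:
from mpi4py import MPI

comm = MPI.COMM_WORLD
comm.Barrier()                   # make sure all ranks start the solve together
t0 = MPI.Wtime()
gfu.vec.data = ksp * rhs         # the operation being timed
comm.Barrier()                   # wait until every rank is done
t1 = MPI.Wtime()
if comm.rank == 0:
    print("solve time:", t1 - t0)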
You could try writing trace files on some ranks and looking at those to get an idea of where the problem comes from:
Code:
with TaskManager(pajetrace=10*1024*1024):   # write a trace file (here up to ~10 MB)
    ...                                     # run the solve inside this block
Guosheng Fu wrote: "One thing I noticed is that the PETSc KSP solver returns a 'DISTRIBUTED' vector, and I need to manually convert it to a 'CUMULATED' vector to make the code work."
What are you doing with the vector? Parallel NGSolve objects should accept a CUMULATED or DISTRIBUTED vector as input and do the conversion automatically if needed. But if you are doing something directly with the values, you need to cumulate it.
Best,
Lukas
5 years 1 month ago #3021
by Guosheng Fu
Replied by Guosheng Fu on topic mumps solver failed
lkogler wrote: "What are you doing with the vector? Parallel NGSolve objects should accept a CUMULATED or DISTRIBUTED vector as input and do the conversion automatically if needed. But if you are doing something directly with the values, you need to cumulate it."
I am simply using a static condensation approach for the linear system solve. Without Cumulate, the output error is not correct (see attached code).
For timing the solver, I am simply using a Python timer around the linear system solve:
Code:
import time
...
t0 = time.time()
gfu.vec.data = ksp * rhs
t1 = time.time()
gfu.vec.Cumulate()                              # cumulate; this line is necessary
gfu.vec.data += a.harmonic_extension * gfu.vec
gfu.vec.data += a.inner_solve * rhs
if rank == 0:                                   # rank: the MPI rank, defined earlier in the script
    print("cost: ", t1 - t0)
Iterative solvers just produce unexpected behavior for me.
Attached is a simple hybrid Poisson solver with a hypre preconditioner that fails to scale on my machine.
The cost of the linear system solve was recorded as follows:
Code:
mpirun -np $NSLOTS ngspy mpi_hybrid.py
12 cpus: 5.7 s
24 cpus: 3.5 s
results for three separate runs:
48 cpus: [2.3 s, 2.4 s, 3.2 s]
96 cpus: [11 s, 8 s, 8.4 s]  (speed is degraded compared to 48 cpus)
192 cpus: [19 s, 31 s, >100 s]
You can clearly see the degraded performance from 48 cores to 96 cores although the number of iterations stays the same, and with 192 cpus the cost is even worse. This result simply does not make sense to me.
Also, with more cpus it takes more time to set up the hypre preconditioner (~9 s for 48 cores, ~48 s for 96 cores, and ~80 s for 192 cores).
Best,
Guosheng
Attachments:
5 years 1 month ago #3023
by lkogler
Replied by lkogler on topic mumps solver failed
Ah, I see. The problem here is that harmonic_extension and harmonic_extension_trans are local operators: they directly access the locally stored vector values. You can wrap a ParallelMatrix around them to make the code a little more elegant:
Code:
hex  = ParallelMatrix(harmonic_extension,       row_pardofs=..., col_pardofs=..., op=ParallelMatrix.C2C)
hext = ParallelMatrix(harmonic_extension_trans, row_pardofs=..., col_pardofs=..., op=ParallelMatrix.D2D)
Now "hex" is an operator that takes a cumulated vector as input and gives a cumulated vector as output. So, when you give it the distributed solution vector from the ksp, it automatically cumulates it. You can also write down the entire operation as a single operator:
Code:
hex, hext, aiii = a.harmonic_extension, a.harmonic_extension_trans, a.inner_solve
Id = IdentityMatrix(a.mat.height)
if a.space.mesh.comm.size == 1:
    full_precond = ((Id + hex) @ precond @ (Id + hext)) + aiii
else:
    # precond: the inner solver for the condensed system (e.g. the ksp from above)
    Ihex   = ParallelMatrix(Id + hex,  row_pardofs=a.mat.row_pardofs, col_pardofs=a.mat.row_pardofs, op=ParallelMatrix.C2C)
    Ihext  = ParallelMatrix(Id + hext, row_pardofs=a.mat.row_pardofs, col_pardofs=a.mat.row_pardofs, op=ParallelMatrix.D2D)
    Isolve = ParallelMatrix(aiii,      row_pardofs=a.mat.row_pardofs, col_pardofs=a.mat.row_pardofs, op=ParallelMatrix.D2C)
    full_precond = (Ihex @ precond @ Ihext) + Isolve
Then you can just write
Code:
gfu.vec.data = full_precond * f.vec
(Nothing wrong with your code, I just find this a little more neat)
I will have a look at the timings.
Best,
Lukas