mumps solver failed
5 years 1 month ago - 5 years 1 month ago #2998
by lkogler
Replied by lkogler on topic mumps solver failed
Okay, this is a new error for me too. I have never worked with MPICH.
It looks like MKLMPI_Get_wrappers is in the MKL BLACS library, so maybe NGSolve is linked against the wrong one. With MPICH it should be linked against libmkl_blacs_intelmpi.., not against libmkl_blacs_openmpi...
Could you upload your CMakeCache.txt from the NGSolve build directory?
You could also try to manually load the MKL SDL (single dynamic library) by putting this at the top of your python file:
Code:
from ctypes import CDLL, RTLD_GLOBAL
CDLL("<path_to_libmkl_rt.so>", RTLD_GLOBAL)
I know it is not a very clean solution, but it usually works for me.
If this does not bother you too much, you could even build NGSolve with dynamic MKL linking and MKL_SDL=ON.
Guosheng Fu wrote: "Also, a side question, what options are available for the inverse flag of the bddc solver that is suitable for MPI?"
Only "mumps" and "masterinverse". Some more direct solvers are available through the PETSc interface. However, those do not work with the "inverse=" flag, but have to be accessed via "coarsetype=petsc_pc". What petsc_pc does can then be configured via flags.
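For illustration, a minimal sketch of both variants (assuming a is the BilinearForm set up for BDDC, and that ngs_petsc is installed so that the "petsc_pc" coarse solver is registered on import; the flags that configure petsc_pc itself are left out):
Code:
from ngsolve import *
import ngs_petsc as petsc   # assumed: importing registers the "petsc_pc" coarse-grid solver

# a: BilinearForm for the hybrid system, set up for BDDC (internal dofs condensed)
c = Preconditioner(a, "bddc", inverse="mumps")            # parallel direct coarse solve ("mumps" or "masterinverse")
# c = Preconditioner(a, "bddc", coarsetype="petsc_pc")    # coarse solve through PETSc, configured via extra flags
a.Assemble()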
Best,
Lukas
Last edit: 5 years 1 month ago by lkogler.
5 years 1 month ago #3001
by Guosheng Fu
Replied by Guosheng Fu on topic mumps solver failed
Yes, I am using the intelmpi BLACS library from MKL; otherwise the build stage complains.
But strangely, I cannot reproduce the bug anymore today.
What I did was reinstall PETSc to include mumps and superlu_dist with the following configure invocation (adapted from yours); I could not link the ScaLAPACK library from MKL, so I had PETSc download it instead:
Code:
./configure --with-debugging=0 \
  --PETSC_ARCH="arch-linux-c-opt" \
  --prefix=$HOME/NGSolve-X/petsc \
  --with-fortran-bindings=0 \
  --with-fortran-type-initialize=0 \
  --with-shared-libraries \
  --with-mpi=1 --with-mpi-dir=/XXX/mpich/3.3/gcc/8.3.0 \
  --with-cxx-dialect=c++11 \
  --with-metis=1 --with-metis-dir=XXX/ngbuild-mpi/dependencies/parmetis \
  --with-parmetis=1 --with-parmetis-dir=XXX/ngbuild-mpi/dependencies/parmetis \
  --with-hypre=1 --download-hypre=yes --download-hypre-shared=0 \
  --with-blaslapack=1 --with-blaslapack-dir=XXX/intel/19.0/mkl \
  --with-zlib=1 \
  --with-ml=1 --download-ml=yes --download-ml-shared=0 \
  --with-ilu=1 \
  --with-cmake=1 \
  --download-slepc=yes \
  --with-superlu=1 --download-superlu=yes \
  --with-mumps=1 --download-mumps=yes --download-mumps-shared=0 \
  --with-scalapack=1 --download-scalapack=yes \
  --with-superlu_dist=1 --download-superlu_dist=yes --download-superlu_dist-shared=0
After that, I can use the mumps and superlu_dist direct solvers via ngs-petsc, and this seems to have fixed my mumps error. I guess the code just links to the PETSc mumps first, rather than the NGSolve-built mumps. Is that correct?
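Roughly like this, as a sketch (mat_wrap is the PETSc-wrapped NGSolve matrix, as in the KSP example further below; the solver name is arbitrary):
Code:
# direct solve through PETSc, with superlu_dist as the factorization backend instead of mumps
ksp = petsc.KSP(mat=mat_wrap, name="directsolve", petsc_options={
    "ksp_type": "preonly",
    "pc_type": "lu",
    "pc_factor_mat_solver_type": "superlu_dist"})
gfu.vec.data = ksp * rhs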
I recall that building with MKL_SDL=ON failed on my machine once, so I will stick with the current version for now.
I am looking into a scalable solver for the hybrid-mixed Poisson operator. Here is the timing of the system solve for a test problem with RT0-P0 elements (code is attached):
Code:
mpirun -np $NSLOTS ngspy mpi_hybrid.py 0 60
24 cores: ~4 s
48 cores: ~3 s
96 cores: ~5 s
192 cores: >100 s
I didn't see the expected speed-up when using more than 24 cores (my machine has 24 cores per node).
Do you know what's going on with my code?
Best,
Guosheng
Attachments:
5 years 1 month ago #3014
by Guosheng Fu
Replied by Guosheng Fu on topic mumps solver failed
Just a follow-up on the hybrid-mixed solver.
I ended up using PETSc's KSP solver with a mumps direct solve as the preconditioner, as I didn't find a good preconditioner for the high-order hybrid-mixed matrix:
Code:
ksp = petsc.KSP(mat=mat_wrap, name="someksp", petsc_options={
    "ksp_type": "preonly",
    "pc_type": "cholesky",
    "pc_factor_mat_solver_type": "mumps"})
One thing I noticed is that the PETSc KSP solver returns a "DISTRIBUTED" vector, and I need to manually convert it to a "CUMULATED" vector to make the code work.
Code:
gfu.vec.data = ksp * rhs
gfu.vec.Cumulate()   # convert the DISTRIBUTED result to CUMULATED
Best,
Guosheng
5 years 1 month ago #3019
by lkogler
Replied by lkogler on topic mumps solver failed
Guosheng Fu wrote: "I didn't see the expected speed-up when using more than 24 cores (my machine has 24 cores per node). Do you know what's going on with my code?"
No idea. Are you simply measuring the time the entire job takes? Bigger jobs can take longer to start up.
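For instance, a rough sketch that isolates just the solve time with mpi4py barriers around the solve, so that startup and load imbalance are not counted (variable names as in your script):
Code:
from mpi4py import MPI

comm = MPI.COMM_WORLD
comm.Barrier()                   # make sure all ranks start the solve together
t0 = MPI.Wtime()
gfu.vec.data = ksp * rhs         # the operation being timed
comm.Barrier()                   # wait until every rank is done
t1 = MPI.Wtime()
if comm.rank == 0:
    print("solve time:", t1 - t0)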
You could try writing trace files on some ranks and looking at those to get an idea of where the problem comes from:
Code:
with TaskManager(pajetrace=10*1024*1024):   # write a trace file (here up to ~10 MB)
    ...                                     # run the solve inside this block
Guosheng Fu wrote: "One thing I noticed is that the PETSc KSP solver returns a 'DISTRIBUTED' vector, and I need to manually convert it to a 'CUMULATED' vector to make the code work."
What are you doing with the vector? Parallel NGSolve objects should accept a CUMULATED or DISTRIBUTED vector as input and do the conversion automatically if needed. But if you are doing something directly with the values, you need to cumulate it.
Best,
Lukas
5 years 1 month ago #3021
by Guosheng Fu
Replied by Guosheng Fu on topic mumps solver failed
lkogler wrote: "What are you doing with the vector? Parallel NGSolve objects should accept a CUMULATED or DISTRIBUTED vector as input and do the conversion automatically if needed. But if you are doing something directly with the values, you need to cumulate it."
I am simply using a static condensation approach for the linear system solve. Without Cumulate, the output error is not correct (see attached code).
For timing the solver, I am simply using a Python timer around the linear system solve:
Code:
import time
...
t0 = time.time()
gfu.vec.data = ksp * rhs
t1 = time.time()
gfu.vec.Cumulate()                              # cumulate; this line is necessary
gfu.vec.data += a.harmonic_extension * gfu.vec
gfu.vec.data += a.inner_solve * rhs
if rank == 0:                                   # rank: the MPI rank, defined earlier in the script
    print("cost: ", t1 - t0)
Iterative solvers just produce unexpected behavior for me.
Attached is a simple hybrid Poisson solver with a hypre preconditioner that fails to scale on my machine.
The cost of the linear system solve was recorded as follows:
Code:
mpirun -np $NSLOTS ngspy mpi_hybrid.py
12 cpus: 5.7 s
24 cpus: 3.5 s
results for three separate runs:
48 cpus: [2.3 s, 2.4 s, 3.2 s]
96 cpus: [11 s, 8 s, 8.4 s]  (speed is degraded compared to 48 cpus)
192 cpus: [19 s, 31 s, >100 s]
You can clearly see the degraded performance from 48 cores to 96 cores although the number of iterations stays the same, and with 192 cpus the cost is even worse. This result simply does not make sense to me.
Also, with more cpus it takes more time to set up the hypre preconditioner (~9 s for 48 cores, ~48 s for 96 cores, and ~80 s for 192 cores).
Best,
Guosheng
Attachments:
5 years 1 month ago #3023
by lkogler
Replied by lkogler on topic mumps solver failed
Ah, I see. The problem here is that harmonic_extension and harmonic_extension_trans are local operators: they directly access the locally stored vector values. You can wrap a ParallelMatrix around them to make the code a little more elegant:
Code:
hex  = ParallelMatrix(harmonic_extension,       row_pardofs=..., col_pardofs=..., op=ParallelMatrix.C2C)
hext = ParallelMatrix(harmonic_extension_trans, row_pardofs=..., col_pardofs=..., op=ParallelMatrix.D2D)
Now "hex" is an operator that takes a cumulated vector as input and gives a cumulated vector as output. So, when you give it the distributed solution vector from the ksp, it automatically cumulates it. You can also write down the entire operation as a single operator:
Code:
hex, hext, aiii = a.harmonic_extension, a.harmonic_extension_trans, a.inner_solve
Id = IdentityMatrix(a.mat.height)
if a.space.mesh.comm.size == 1:
    full_precond = ((Id + hex) @ precond @ (Id + hext)) + aiii
else:
    # precond: the inner solver for the condensed system (e.g. the ksp from above)
    Ihex   = ParallelMatrix(Id + hex,  row_pardofs=a.mat.row_pardofs, col_pardofs=a.mat.row_pardofs, op=ParallelMatrix.C2C)
    Ihext  = ParallelMatrix(Id + hext, row_pardofs=a.mat.row_pardofs, col_pardofs=a.mat.row_pardofs, op=ParallelMatrix.D2D)
    Isolve = ParallelMatrix(aiii,      row_pardofs=a.mat.row_pardofs, col_pardofs=a.mat.row_pardofs, op=ParallelMatrix.D2C)
    full_precond = (Ihex @ precond @ Ihext) + Isolve
Then you can just write
Code:
gfu.vec.data = full_precond * f.vec
(Nothing wrong with your code, I just find this a little more neat)
I will have a look at the timings.
Best,
Lukas