Building NGSolve with HYPRE support
4 years 9 months ago - 4 years 9 months ago #2335
by JanWesterdiep
Building NGSolve with HYPRE support was created by JanWesterdiep
Hello!
I want to experiment with some different preconditioners. The "hypre" and "hypre_ams" preconditioners are mentioned somewhere in the documentation, and I found in a semi-recent GitHub commit how to configure CMake with HYPRE (`cmake -DUSE_HYPRE=ON ../ngsolve-src`).
Pulling the most recent NGSolve version from GitHub (resulting in version NGSolve-6.2.2001-6-g8bbe2629), this command runs fine and downloads HYPRE, but the subsequent `make` fails with the following error. Any idea how to get around this compile error?
Code:
[ 62%] Building CXX object comp/CMakeFiles/ngcomp.dir/bddc.cpp.o
ngsolve/ngsolve-src/comp/bddc.cpp:317:28: error: reference to 'MPI_Op' is ambiguous
AllReduceDofData (weight, MPI_SUM, fes->GetParallelDofs());
^
/usr/local/include/mpi.h:1130:40: note: expanded from macro 'MPI_SUM'
#define MPI_SUM OMPI_PREDEFINED_GLOBAL(MPI_Op, ompi_mpi_op_sum)
^
/usr/local/include/mpi.h:406:27: note: candidate found by name lookup is 'MPI_Op'
typedef struct ompi_op_t *MPI_Op;
^
/Applications/Netgen.app/Contents/Resources/include/core/mpi_wrapper.hpp:265:15: note: candidate found by name lookup is 'ngcore::MPI_Op'
typedef int MPI_Op;
^
ngsolve/ngsolve-src/comp/bddc.cpp:317:28: error: static_cast from 'void *' to 'ngcore::MPI_Op' (aka 'int') is not allowed
AllReduceDofData (weight, MPI_SUM, fes->GetParallelDofs());
^~~~~~~
/usr/local/include/mpi.h:1130:17: note: expanded from macro 'MPI_SUM'
#define MPI_SUM OMPI_PREDEFINED_GLOBAL(MPI_Op, ompi_mpi_op_sum)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/include/mpi.h:381:47: note: expanded from macro 'OMPI_PREDEFINED_GLOBAL'
#define OMPI_PREDEFINED_GLOBAL(type, global) (static_cast<type> (static_cast<void *> (&(global))))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.
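The collision above can be reproduced in isolation. This is only a sketch (the typedefs are stand-ins for the real declarations in OpenMPI's `mpi.h` and in NGSolve's `mpi_wrapper.hpp`, which provides a dummy `MPI_Op` when MPI is off): once both names are visible, an unqualified `MPI_Op` is ambiguous, exactly as clang reports.

```cpp
#include <cstddef>

// Stand-in for the OpenMPI typedef in <mpi.h> (global namespace):
typedef struct ompi_op_t *MPI_Op;

// Stand-in for the dummy MPI_Op in NGSolve's mpi_wrapper.hpp
// (compiled in when MPI support is not configured):
namespace ngcore { typedef int MPI_Op; }

using namespace ngcore;  // both names are now visible unqualified

// MPI_Op op;            // error: reference to 'MPI_Op' is ambiguous

// Qualifying the name picks one candidate explicitly:
::MPI_Op real_op() { return nullptr; }
ngcore::MPI_Op dummy_op() { return 42; }
```

The fix in practice is not to qualify names but to configure the build so only one of the two declarations exists, as the reply below explains.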
Last edit: 4 years 9 months ago by JanWesterdiep.
4 years 9 months ago #2336
by lkogler
Replied by lkogler on topic Building NGSolve with HYPRE support
It looks like you did not turn on MPI (-DUSE_MPI=ON). This is needed by hypre.
We should throw an error when MPI is turned off and hypre is turned on.
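For the record, a configure line with both options enabled might look like this (a sketch only — the source path, install prefix, and job count are placeholders to adjust for your setup):

```shell
# Configure with both MPI and HYPRE; HYPRE needs MPI.
cmake -DUSE_MPI=ON -DUSE_HYPRE=ON \
      -DCMAKE_INSTALL_PREFIX=$HOME/ngsolve-install \
      ../ngsolve-src
make -j4 && make install
```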
4 years 9 months ago #2343
by JanWesterdiep
Replied by JanWesterdiep on topic Building NGSolve with HYPRE support
Hey! Yes, perfect, that worked!
I installed NGSolve from source with HYPRE and MPI support on two machines, Ubuntu 18 and macOS 10.14.
Whenever I run *anything*, I get the following error:
Code:
$ netgen navierstokes.py
NETGEN-6.2-dev
Developed by Joachim Schoeberl at
2010-xxxx Vienna University of Technology
2006-2010 RWTH Aachen University
1996-2006 Johannes Kepler University Linz
Including MPI version 3.1
Problem in Tk_Init:
result = no display name and no $DISPLAY environment variable
optfile ./ng.opt does not exist - using default values
togl-version : 2
no OpenGL
loading ngsolve library
NGSolve-6.2.2001-11-g0929bd80
Using Lapack
Including sparse direct solver UMFPACK
Running parallel using 1 thread(s)
(should) load python file 'navierstokes.py'
loading ngsolve library
NGSolve-6.2.2001-11-g0929bd80
Using Lapack
Including sparse direct solver UMFPACK
Running parallel using 1 thread(s)
Caught SIGSEGV: segmentation fault
I realize this is probably very difficult for you to debug as well, so I have a more general question: is there any way of getting a stack trace at this point? Running this through valgrind produces another very mysterious error:
Code:
$ valgrind !!
valgrind netgen navierstokes.py
==16773== Memcheck, a memory error detector
==16773== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==16773== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==16773== Command: netgen navierstokes.py
==16773==
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0x7D 0x28 0xEF 0xC0 0x83 0xFE 0x8 0xB8
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==16773== valgrind: Unrecognised instruction at address 0x5f81400.
==16773== at 0x5F81400: __mutex_base (std_mutex.h:68)
==16773== by 0x5F81400: mutex (std_mutex.h:94)
==16773== by 0x5F81400: netgen::BlockAllocator::BlockAllocator(unsigned int, unsigned int) (optmem.cpp:20)
==16773== by 0x5D68679: __static_initialization_and_destruction_0 (localh.cpp:27)
==16773== by 0x5D68679: _GLOBAL__sub_I_localh.cpp (localh.cpp:800)
==16773== by 0x4010732: call_init (dl-init.c:72)
==16773== by 0x4010732: _dl_init (dl-init.c:119)
==16773== by 0x40010C9: ??? (in /lib/x86_64-linux-gnu/ld-2.27.so)
==16773== by 0x1: ???
==16773== by 0x1FFF00047E: ???
==16773== by 0x1FFF000485: ???
==16773== Your program just tried to execute an instruction that Valgrind
==16773== did not recognise. There are two possible reasons for this.
==16773== 1. Your program has a bug and erroneously jumped to a non-code
==16773== location. If you are running Memcheck and you just saw a
==16773== warning about a bad jump, it's probably your program's fault.
==16773== 2. The instruction is legitimate but Valgrind doesn't handle it,
==16773== i.e. it's Valgrind's fault. If you think this is the case or
==16773== you are not sure, please let us know and we'll try to fix it.
==16773== Either way, Valgrind will now raise a SIGILL signal which will
==16773== probably kill your program.
Caught SIGILL: illegal instruction
==16773==
==16773== HEAP SUMMARY:
==16773== in use at exit: 2,916 bytes in 62 blocks
==16773== total heap usage: 87 allocs, 25 frees, 881,786 bytes allocated
==16773==
==16773== LEAK SUMMARY:
==16773== definitely lost: 0 bytes in 0 blocks
==16773== indirectly lost: 0 bytes in 0 blocks
==16773== possibly lost: 160 bytes in 2 blocks
==16773== still reachable: 2,756 bytes in 60 blocks
==16773== suppressed: 0 bytes in 0 blocks
==16773== Rerun with --leak-check=full to see details of leaked memory
==16773==
==16773== For counts of detected and suppressed errors, rerun with: -v
==16773== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
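For what it's worth, Python itself can dump a Python-level traceback on a fatal signal via the standard `faulthandler` module — a minimal sketch (whether it fires before Netgen's own SIGSEGV handler is not guaranteed):

```python
import faulthandler

# Install handlers that dump Python tracebacks to stderr when a
# fatal signal (SIGSEGV, SIGILL, SIGABRT, ...) hits the process.
faulthandler.enable()

print(faulthandler.is_enabled())  # True once enabled

# Equivalent without editing the script: run with
#   PYTHONFAULTHANDLER=1 ngspy navierstokes.py
# or
#   python3 -X faulthandler navierstokes.py
```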
In the meantime, I will try to check out the NGSolve repo at the commit where HYPRE support was first introduced, and see if that fixes any of my problems.
Thank you for your continued support
4 years 9 months ago #2344
by lkogler
Replied by lkogler on topic Building NGSolve with HYPRE support
This valgrind error happens when valgrind does not know some instruction. That happens occasionally on newer hardware and might not be related to the segfault you are getting. Could you try gdb instead?
Also, could you try "ngspy navierstokes.py"? It looks like there is some error with the GUI.
Best,
Lukas
4 years 9 months ago #2345
by JanWesterdiep
Replied by JanWesterdiep on topic Building NGSolve with HYPRE support
Hey Lukas, yeah, I figured, but I am more familiar with valgrind than GDB. Thanks for the suggestion, I will try to learn how to use it.
`ngspy navierstokes.py` runs.
Now the interesting stuff begins: when I take the preconditioner example from ngsolve.org/docu/latest/i-tutorials/unit.../preconditioner.html and run it with a built-in preconditioner like "local" or "h1amg", it runs fine. When I run it with "hypre", I get the following error:
Code:
$ ngspy precond_test.py
Generate Mesh from spline geometry
Boundary mesh done, np = 8
CalcLocalH: 8 Points 0 Elements 0 Surface Elements
Meshing domain 1 / 1
load internal triangle rules
Surface meshing done
Edgeswapping, topological
Smoothing
Split improve
Combine improve
Smoothing
Edgeswapping, metric
Smoothing
Split improve
Combine improve
Smoothing
Edgeswapping, metric
Smoothing
Split improve
Combine improve
Smoothing
Update mesh topology
Update clusters
assemble VOL element 6/6
assemble VOL element 6/6
Setup Hypre preconditioner
Traceback (most recent call last):
File "precond_test.py", line 61, in <module>
print(SolveProblem(levels=5, precond="hypre"))
File "precond_test.py", line 41, in SolveProblem
a.Assemble()
netgen.libngpy._meshing.NgException: std::bad_cast
in Assemble BilinearForm 'biform_from_py'
Unfortunately, the exception seems to be caught by Python or something before the process exits, because running it through GDB produces no stack trace. Any clues?
4 years 9 months ago - 4 years 9 months ago #2346
by lkogler
Replied by lkogler on topic Building NGSolve with HYPRE support
in the shell, run:
"gdb python3"
in gdb, run:
"set breakpoint pendong on"
"break RangeException"
"run navierstokes.py"
I cannot reproduce this error with the newest Netgen/NGSolve version.
Are you, by any chance, running on a computer with AVX512? And which compiler are you using? We recently ran into issues with gcc 9.2 and AVX512, but now this combination should throw an error.
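Spelled out as a session, the steps above look like this (a sketch; the `catch throw` line is an addition beyond the commands listed, and stops on any C++ throw, e.g. the `std::bad_cast` reported earlier):

```shell
$ gdb --args python3 navierstokes.py
(gdb) set breakpoint pending on
(gdb) break RangeException
(gdb) catch throw        # also stop wherever a C++ exception is thrown
(gdb) run
(gdb) bt                 # print the native backtrace once stopped
```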
"gdb python3"
in gdb, run:
"set breakpoint pendong on"
"break RangeException"
"run navierstokes.py"
I cannot reproduce this error with the newest Netgen/NGSolve version.
Are you, by any chance, running on a computer with AVX512? And which compiler are you using? We recently ran into issues with gcc 9.2 andf AVX512, but now this combination should throw an error.
Last edit: 4 years 9 months ago by lkogler.