Building NGSolve with HYPRE support

1 year 2 months ago #3017 by JanWesterdiep
Hiya,

I shelved the MPI project for a while, but am back again experimenting. I cloned the latest NGSolve build, compiled with `-DUSE_MPI=ON -DUSE_HYPRE=ON` and ran Lukas' attached `precond_test_2020-02-05.py` as `mpirun -np 4 ngspy precond_test_2020-02-05.py`. I get a segfault, but this time it does give me a backtrace:
#1       at std::__1::basic_stringbuf<char, std::__1::char_traits<char>, std::__1::allocator<char> >::str() const (in libngcore.dylib) (/Library/Developer/CommandLineTools/usr/include/c++/v1/new:252)
#2       at 0x0000a3ac (in libsystem_platform.dylib)
#3
#4       at HYPRE_IJMatrixGetValues (in libngcomp.dylib) + 125
#5       at HYPRE_IJMatrixAddToValues2 (in libngcomp.dylib) + 131
#6       at ngcomp::HyprePreconditioner::Mult(ngla::BaseVector const&, ngla::BaseVector&) const (in libngcomp.dylib) (/Applications/Netgen.app/Contents/Resources/include/core/profiler.hpp:157)
#7       at ngcomp::HyprePreconditioner::Mult(ngla::BaseVector const&, ngla::BaseVector&) const (in libngcomp.dylib) (/Applications/Netgen.app/Contents/Resources/include/core/mpi_wrapper.hpp:72)
#8       at ngcomp::S_BilinearForm<std::__1::complex<double> >::AddMatrixTP(std::__1::complex<double>, ngla::BaseVector const&, ngla::BaseVector&, ngcore::LocalHeap&) const (in libngcomp.dylib) (/Library/Developer/CommandLineTools/usr/include/c++/v1/memory:2153)
#9       at ngcomp::BilinearForm::Assemble(ngcore::LocalHeap&) (in libngcomp.dylib) (/Library/Developer/CommandLineTools/usr/include/c++/v1/__locale:234)
#10
#11      at std::__1::vector<bool, std::__1::allocator<bool> >::resize(unsigned long, bool) (in libngcomp.dylib) (/Library/Developer/CommandLineTools/usr/include/c++/v1/__bit_reference:0)
#12      at _PyObject_CallFunctionVa (in Python) + 0
#13      at _PyFunction_FastCallKeywords (in Python) + 267
#14      at call_function (in Python) + 1051
#15      at convertitem (in Python) + 8476
#16      at compiler_enter_scope (in Python) + 242
#17      at PyObject_Call (in Python) + 114
#18      at _PyEval_EvalCodeWithName (in Python) + 675
#19      at convertitem (in Python) + 9379
#20      at compiler_enter_scope (in Python) + 242
#21      at PyEval_EvalFrame (in Python) + 11
#22      at PyRun_StringFlags (in Python) + 100
#23      at PyRun_InteractiveOneObject (in Python) + 11
#24      at ScandirIterator_exit (in Python) + 0
#25      at Py_GetArgcArgv (in Python) + 19
#26      at dyld3::OverflowSafeArray<dyld3::closure::Image::BindPattern, 4294967295ul>::growTo(unsigned long) (in libdyld.dylib) + 56

Any ideas?

1 year 2 months ago - 1 year 2 months ago #3018 by lkogler
I cannot reproduce this issue on any machine I tried it on.

Can you tell me what OS and compiler (+version) you are using?


Best,
Lukas

1 year 2 months ago #3024 by JanWesterdiep
$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/c++/4.2.1
Apple clang version 11.0.0 (clang-1100.0.33.17)
Target: x86_64-apple-darwin18.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ sw_vers -productVersion
10.14.4

What is kind of strange, though, is that `ngspy` (and the entirety of the `Netgen.app`) seems to be linked with python 3.7,
$ ngspy
Python 3.7.4 (v3.7.4:e09359112e, Jul  8 2019, 14:54:52)

$ python3
Python 3.8.5 (v3.8.5:580fbb018f, Jul 20 2020, 12:11:27)
even though I followed the website to install python 3.8 (the two versions now live side-by-side, though). Could this be a source of problems?

BTW, when I run `mpirun -np 4 ngspy mpi_poisson.py` with "hypre", I get the same problem. When I use the "bddc" preconditioner (and masterinverse), it runs OK. When I substitute `ngspy` with `python3.7`, things also run fine, but with `python3` (so `python3.8`), I again get a segfault, though at a much earlier stage. The stack trace is
$ mpirun -np 4 python3.8 mpi_poisson.py
Caught SIGSEGV: segmentation fault
Collecting backtrace...
Caught SIGSEGV: segmentation fault
Collecting backtrace...
Caught SIGSEGV: segmentation fault
Collecting backtrace...
Caught SIGSEGV: segmentation fault
Collecting backtrace...
#1       at std::__1::basic_stringbuf<char, std::__1::char_traits<char>, std::__1::allocator<char> >::str() const (in libngcore.dylib) (/Library/Developer/CommandLineTools/usr/include/c++/v1/new:252)
#2       at 0x0000a3ac (in libsystem_platform.dylib)
#3       at we_askshell.cmd (in libsystem_c.dylib) + 974
#4       at 0x00005ede (in libngpy.so)
#5       at PyInit_libngpy (in libngpy.so) (<ngsolve dir>/ngsolve-src/external_dependencies/netgen/ng/netgenpy.cpp:33)
#6       at _PyWideStringList_Copy (in Python) + 47
#7       at _imp__fix_co_filename (in Python) + 64
#8       at cfunction_vectorcall_FASTCALL_KEYWORDS (in Python) + 112
#9       at PyVectorcall_Call (in Python) + 260
#10      at PyErr_SetFromErrnoWithFilenameObjects (in Python) + 100
#11      at PyCodec_ReplaceErrors (in Python) + 559
#12      at _PyMethodDef_RawFastCallDict (in Python) + 331
#13      at call_function (in Python) + 1087
#14      at context_tp_new (in Python) + 19
#15      at _PyFunction_Vectorcall (in Python) + 19
#16      at call_function (in Python) + 1087
#17      at context_tp_richcompare (in Python) + 93
#18      at _PyFunction_Vectorcall (in Python) + 19
.....
#88      at Py_BytesMain (in Python) + 62

1 year 2 months ago #3025 by lkogler
In principle, Python 3.7 also works. However, compiling NGSolve against Python 3.7 and then running it with Python 3.8 can absolutely lead to problems.
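
One way to catch such a mismatch before any compiled module is imported is a small guard at the top of the script. The following is only a sketch: `BUILD_PYTHON` and `interpreter_mismatch` are hypothetical names, not part of NGSolve, and the value `(3, 7)` is an assumption taken from the `ngspy` banner quoted above.

```python
import sys

# Assumption: the (major, minor) Python version NGSolve was compiled against.
# (3, 7) is taken from the `ngspy` banner quoted in this thread.
BUILD_PYTHON = (3, 7)


def interpreter_mismatch(build_version, running_version=None):
    """Return a warning string if the major.minor versions differ, else None.

    Hypothetical helper for illustration only; it is not an NGSolve function.
    """
    if running_version is None:
        # Default to the interpreter that is executing this script.
        running_version = tuple(sys.version_info[:2])
    if tuple(running_version) != tuple(build_version):
        return "built for Python %d.%d but running %d.%d" % (
            build_version[0], build_version[1],
            running_version[0], running_version[1],
        )
    return None


# Typical use, placed before `import ngsolve`:
#     msg = interpreter_mismatch(BUILD_PYTHON)
#     if msg:
#         sys.exit(msg)
```

With this in place, `mpirun -np 4 python3.8 mpi_poisson.py` would presumably stop with a readable message on each rank instead of segfaulting during import, provided the guard runs before the first NGSolve import.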

I will try it with clang and get back to you.

Best,
Lukas

1 year 2 months ago - 1 year 2 months ago #3026 by JanWesterdiep
Thanks a lot :-) If you can't reproduce it with clang either, I can move to a Linux machine.

1 year 2 months ago #3033 by lkogler
I tried it with clang-11 now, still no problem here.

I am now thinking that it might be connected to the Fortran compiler (MUMPS is a Fortran library). What are you using there?

Best,
Lukas

© 2019 Netgen/NGSolve