Using MPI without building from source

2 weeks 15 hours ago #2054 by dfoiles
Hello,

I have been trying to run NGSolve in parallel on an HPC cluster, and I'd like to know whether the Launchpad download of NGSolve can be used with MPI. I have tried compiling it from source, both on the cluster and in containers, but ran into errors in both cases that I have been unable to fix.

Thanks

2 weeks 8 hours ago #2055 by matthiash
Hello,

The setup on HPC clusters varies a lot, so we do not offer prebuilt binaries for such environments.
The default compiler on clusters is often too old. Which OS and compiler were you using?
For further hints I need your configuration (cmake) command and the complete command-line output.

Best,
Matthias

1 week 6 days ago #2061 by dfoiles
The best attempt I've had so far is building NGSolve in a Singularity container with an Ubuntu environment. In it, I've installed the packages listed on the "Build on Linux" page, as well as openmpi-bin, libopenmpi-dev, and numpy/scipy. My cmake command is:
cmake -DUSE_MPI=ON -DUSE_GUI=OFF -DCMAKE_INSTALL_PREFIX=${BASEDIR}/ngsolve-install ${BASEDIR}/ngsolve-src

The error message I received is quite long and I don't know which parts are relevant, so I'll attach a text file with the whole message. However, I think the important line is:
error: inlining failed in call to always_inline '__m256d _mm256_fmadd_pd(__m256d, __m256d, __m256d)': target specific option mismatch

I've searched for this error message myself and I've seen people suggest adding flags like "-msse4.1", "-march=native", "-march=nehalem", and "-mavx" to CMAKE_CXX_FLAGS. I've tried this and have still gotten the same error.
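
For context on the flags above: _mm256_fmadd_pd is a fused multiply-add (FMA) intrinsic, and GCC reports "target specific option mismatch" when the calling code is compiled without the FMA target feature enabled. Of the flags listed, -msse4.1, -mavx and -march=nehalem do not enable FMA, and -march=native enables it only if the CPU of the build machine supports it. The following is a minimal sketch, assuming a Linux x86 host where /proc/cpuinfo has a "flags" line, for checking what a given machine (build host or cluster node) reports. The function names parse_cpu_flags and simd_report are made up for this illustration.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (for example on a non-x86 system).
    return set()


def simd_report(flags, wanted=("sse4_1", "avx", "avx2", "fma")):
    """Map each feature name of interest to True/False depending on its presence."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    # Assumption: Linux exposes CPU information at this path.
    text = Path("/proc/cpuinfo").read_text()
    for name, present in simd_report(parse_cpu_flags(text)).items():
        print(f"{name:8s} {'yes' if present else 'no'}")
```

Running this both inside the container on the build machine and on a compute node shows whether the two agree, which matters when a binary built with -march=native is moved to different hardware.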

Thank you for your willingness to help.

Attachment: Error.txt (4 KB)

1 week 6 days ago #2062 by joachim
Edit ngsolve/ngstd.simd.hpp, line 1047:

replace #ifdef __AVX2__ with
#ifdef __FMA__

and do the same at line 1065.

Joachim
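
When the build runs from a container recipe, the same edit can be scripted instead of made by hand. This is only a sketch under assumptions: the helper name patch_guard and the way the header path is passed on the command line are invented for this example, and the line numbers 1047 and 1065 apply only to the source version discussed in this thread. The helper refuses to touch a line that does not contain the expected directive, so it fails loudly on a different version of the file.

```python
import sys
from pathlib import Path


def patch_guard(source, line_numbers, old="#ifdef __AVX2__", new="#ifdef __FMA__"):
    """Replace `old` with `new` on the given 1-based lines of `source`.

    Raises ValueError if a line is out of range or does not contain `old`.
    """
    lines = source.splitlines(keepends=True)
    for number in line_numbers:
        if not 1 <= number <= len(lines) or old not in lines[number - 1]:
            raise ValueError(f"line {number} does not contain {old!r}")
        lines[number - 1] = lines[number - 1].replace(old, new, 1)
    return "".join(lines)


if __name__ == "__main__":
    # Usage (hypothetical script name): python patch_guard.py <path-to-header>
    header = Path(sys.argv[1])
    patched = patch_guard(header.read_text(), [1047, 1065])
    header.write_text(patched)
    print(f"patched {header}")
```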

1 week 6 days ago #2065 by dfoiles
That did the trick. Thank you very much for your help.

1 week 2 days ago #2110 by dfoiles
Sorry to bother you again. Everything in the container is built and I've moved it to the HPC. I can successfully run the MPI tutorials provided in the source, but when I try to run my program, I get segmentation faults. Specifically, I get:
[node4][[23534,1],3][btl_tcp_frag.c:237:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[node4][[23534,1],4][btl_tcp_frag.c:237:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
Caught SIGSEGV: segmentation fault
Collecting backtrace...
#1      /opt/ngsuite/ngsolve-install/lib/python3/dist-packages/netgen/../../../libngcore.so(+0x1864b) [0x7fa70f33664b]
#2      /lib/x86_64-linux-gnu/libc.so.6(+0x43f60) [0x7fa7109bef60]

I have attached a copy of the code that I used.

Thank you for your help.

Attachment: Nanosphere...11-11.py (5 KB)

© 2019 Netgen/NGSolve