Python3 error for NGSolve-6.2.1910-2-g9776705 (CentOS 7)

4 years 10 months ago #2265 by gcdiwan
Dear NGSolve developers,

I am testing NGSolve on CentOS Linux 7 (Core) with Python 3.8.0 and have run into memory issues. Specifically, I am trying the tutorial script from section 1.7 (complex-valued waves). It runs fine locally on my desktop, but on the CentOS server it fails with the following error message:

optfile ./ng.opt does not exist - using default values
togl-version : 2
loading ngsolve library
NGSolve-6.2.1910-2-g9776705
Using Lapack
Including sparse direct solver UMFPACK
Running parallel using 16 thread(s)
importing NGSolve-6.2.1910-2-g9776705
Generate Mesh from spline geometry
*** Error in `python3': free(): invalid pointer: 0x00007f23501a3b20 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7c619)[0x7f238f4f3619]
/lib64/libLLVM-3.9-mesa.so(_ZNSt6locale5_Impl16_M_install_facetEPKNS_2idEPKNS_5facetE+0x142)[0x7f234f673fa2]
/lib64/libLLVM-3.9-mesa.so(_ZNSt6locale5_ImplC1Em+0x1e3)[0x7f234f674433]
/lib64/libLLVM-3.9-mesa.so(_ZNSt6locale18_S_initialize_onceEv+0x15)[0x7f234f6753a5]
/lib64/libpthread.so.0(pthread_once+0x50)[0x7f238ff4fe20]
/lib64/libLLVM-3.9-mesa.so(_ZNSt6locale13_S_initializeEv+0x21)[0x7f234f6753f1]
/lib64/libLLVM-3.9-mesa.so(_ZNSt6localeC2Ev+0x13)[0x7f234f675433]
/opt/apps/alces/ngsolve/bin/../lib/libngcomp.so(_ZN6ngcomp6RegionC2ERKSt10shared_ptrINS_10MeshAccessEEN5ngfem4VorBENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0xcb)[0x7f2343a5df4b]
/opt/apps/alces/ngsolve/bin/../lib/libngcomp.so(+0x944c31)[0x7f2343ce9c31]
/opt/apps/alces/ngsolve/bin/../lib/libngcomp.so(+0x6e6713)[0x7f2343a8b713]
/opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/libpython3.8.so.1.0(PyCFunction_Call+0x128)[0x7f2390433d88]
/opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/libpython3.8.so.1.0(_PyObject_MakeTpCall+0xa1)[0x7f2390431821]
/opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/libpython3.8.so.1.0(+0x9f349)[0x7f2390435349]
/opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/libpython3.8.so.1.0(+0x103a1e)[0x7f2390499a1e]
/opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/libpython3.8.so.1.0(PyNumber_InPlaceAdd+0x30)[0x7f23904176e0]
/opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x31b1)[0x7f2390405ac1]
/opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0xa0a)[0x7f23905239ca]
/opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x6d)[0x7f2390523bfd]
/opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/libpython3.8.so.1.0(PyEval_EvalCode+0x3b)[0x7f2390523c4b]
/opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/libpython3.8.so.1.0(PyRun_FileExFlags+0x105)[0x7f239056bdb5]
/opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/libpython3.8.so.1.0(PyRun_SimpleFileExFlags+0xe7)[0x7f239056bf87]
/opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/libpython3.8.so.1.0(Py_RunMain+0x80f)[0x7f239058af0f]
/opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/libpython3.8.so.1.0(Py_BytesMain+0x4f)[0x7f239058b2bf]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f238f498c05]
python3[0x40072e]
======= Memory map: ========
00400000-00401000 r-xp 00000000 00:28 1187525 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/bin/python3.8
00600000-00601000 r--p 00000000 00:28 1187525 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/bin/python3.8
00601000-00602000 rw-p 00001000 00:28 1187525 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/bin/python3.8
01bb9000-03780000 rw-p 00000000 00:00 0 [heap]
7f2330000000-7f2330021000 rw-p 00000000 00:00 0
7f2330021000-7f2334000000 ---p 00000000 00:00 0
7f2334765000-7f233e03c000 rw-p 00000000 00:00 0
7f233e17c000-7f233e284000 r-xp 00000000 00:28 1714792 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/unicodedata.cpython-38-x86_64-linux-gnu.so
7f233e284000-7f233e483000 ---p 00108000 00:28 1714792 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/unicodedata.cpython-38-x86_64-linux-gnu.so
7f233e483000-7f233e484000 r--p 00107000 00:28 1714792 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/unicodedata.cpython-38-x86_64-linux-gnu.so
7f233e484000-7f233e485000 rw-p 00108000 00:28 1714792 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/unicodedata.cpython-38-x86_64-linux-gnu.so
7f233e485000-7f233e506000 rw-p 00000000 00:00 0
7f233e5c6000-7f233e706000 rw-p 00000000 00:00 0
7f233e706000-7f233e714000 r-xp 00000000 00:28 1714768 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/array.cpython-38-x86_64-linux-gnu.so
7f233e714000-7f233e913000 ---p 0000e000 00:28 1714768 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/array.cpython-38-x86_64-linux-gnu.so
7f233e913000-7f233e914000 r--p 0000d000 00:28 1714768 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/array.cpython-38-x86_64-linux-gnu.so
7f233e914000-7f233e915000 rw-p 0000e000 00:28 1714768 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/array.cpython-38-x86_64-linux-gnu.so
7f233e915000-7f233ea56000 rw-p 00000000 00:00 0
7f233ea56000-7f233ea64000 r-xp 00000000 00:28 1714814 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/_asyncio.cpython-38-x86_64-linux-gnu.so
7f233ea64000-7f233ec63000 ---p 0000e000 00:28 1714814 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/_asyncio.cpython-38-x86_64-linux-gnu.so
7f233ec63000-7f233ec64000 r--p 0000d000 00:28 1714814 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/_asyncio.cpython-38-x86_64-linux-gnu.so
7f233ec64000-7f233ec66000 rw-p 0000e000 00:28 1714814 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/_asyncio.cpython-38-x86_64-linux-gnu.so
7f233ec66000-7f233ec67000 r-xp 00000000 00:28 1714761 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/_contextvars.cpython-38-x86_64-linux-gnu.so
7f233ec67000-7f233ee66000 ---p 00001000 00:28 1714761 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/_contextvars.cpython-38-x86_64-linux-gnu.so
7f233ee66000-7f233ee67000 r--p 00000000 00:28 1714761 /opt/gridware/depots/3dc76222/el7/pkg/apps/python/3.8.0/gcc-4.8.5/lib/python3.8/lib-dynload/_contextvars.cpython-38-x86_64-linux-gnu.so
Caught SIGABRT: usually caused by abort() or assert()
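A note on reading the backtrace: the failing free() is called from std::locale code that is resolved from /lib64/libLLVM-3.9-mesa.so (part of Mesa's OpenGL stack), and that std::locale constructor is in turn called from the ngcomp::Region constructor in libngcomp.so. One possible reading, not a confirmed diagnosis, is that C++ runtime symbols pulled in through the OpenGL/GUI libraries clash with the ones NGSolve was built against, which would be consistent with disabling the GUI on cluster installations. The sketch below uses only the Python standard library to group glibc backtrace frames by library, so this kind of pattern is quick to spot in a long log; the helper name libraries_in_backtrace is hypothetical and not part of NGSolve.

```python
import re
from collections import Counter

# A glibc backtrace frame looks like one of:
#   /path/lib.so(symbol+0xoff)[0xaddr]
#   /path/lib.so(+0xoff)[0xaddr]
#   python3[0xaddr]
# Header lines and memory-map lines contain spaces and therefore never match.
FRAME_RE = re.compile(r"^(?P<lib>[^\s(\[]+)(?:\((?P<sym>[^)]*)\))?\[0x[0-9a-f]+\]$")


def libraries_in_backtrace(text):
    """Return (library basename, frame count) pairs in order of first appearance."""
    counts = Counter()
    order = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match:
            continue  # not a backtrace frame
        lib = match.group("lib").rsplit("/", 1)[-1]
        if lib not in counts:
            order.append(lib)
        counts[lib] += 1
    return [(lib, counts[lib]) for lib in order]
```

Pasting the whole error output into a string and calling libraries_in_backtrace on it lists libc.so.6, libLLVM-3.9-mesa.so, libpthread.so.0, libngcomp.so, libpython3.8.so.1.0 and python3, which makes the Mesa library sitting between NGSolve and libc easy to notice.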

I should mention that I have NGSolve-6.2.1909 on my desktop, whereas the cluster has NGSolve-6.2.1910-2-g9776705. Both use NETGEN-6.2-dev. Any clues what could be wrong? It appears to me that only the complex-valued problem runs into this issue. I also tried the scattering.py script from github.com/NGSolve/ngsolve/blob/master/py_tutorials, and it too ends in an error, "Caught SIGILL: illegal instruction", on the line with a.Assemble().
Could this be a problem with the NGSolve version I am using on the CentOS cluster?
Many thanks for the help.
4 years 10 months ago #2266 by matthiash
Hello,

Concerning your setup, I have the following thoughts:

- Did you compile the version on the cluster yourself? If so, which compiler, which configuration? (You can post/send the CMakeCache.txt files in your build dir to answer these questions)
- Usually we disable the GUI on cluster installations (cmake ... -DGUI=OFF)
- "SIGILL: illegal instruction" means that NGSolve was compiled and executed on different CPUs/architectures, is this the case in your setup?
- I experienced memory issues with Python 3.8 about a month ago (I don't remember on which OS, though); could you also test Python 3.7?
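Most of these questions come down to comparing the environment where the script works (the desktop) with the one where it fails. A minimal sketch, using only the Python standard library, that collects the relevant facts so they can be pasted into a reply and compared across machines; the function name environment_report is hypothetical. It deliberately checks for the ngsolve package with importlib.util.find_spec instead of importing it, so the report itself cannot trigger a crash in a native extension.

```python
import importlib.util
import platform
import sys


def environment_report():
    """Collect facts that often differ between a machine where a build works and one where it fails."""
    return {
        "python": platform.python_version(),
        "python_is_3_8": sys.version_info[:2] == (3, 8),
        "python_compiler": platform.python_compiler(),  # compiler used to build this Python
        "executable": sys.executable,
        "machine": platform.machine(),  # e.g. x86_64
        "system": platform.system(),
        "kernel": platform.release(),
        # find_spec locates the package without executing its __init__,
        # so a broken native extension cannot crash the report.
        "ngsolve_found": importlib.util.find_spec("ngsolve") is not None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:16} {value}")
```

Running it once on the desktop, once on the master node and once on a compute node (after loading the same modules) gives three short reports that can be compared line by line.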

Best,
Matthias
4 years 10 months ago #2267 by gcdiwan
Hi Matthias,
Thanks for your quick reply.
To answer your questions:

Did you compile the version on the cluster yourself? If so, which compiler, which configuration? (You can post/send the CMakeCache.txt files in your build dir to answer these questions)


No, the version on the cluster was installed by IT services, and I assume they followed the steps given on the Downloads page. NGSolve is installed on the cluster as a module, so I launch it using module load apps/ngsolve. I am attaching the CMakeCache.txt below.

Usually we disable the GUI on cluster installations (cmake ... -DGUI=OFF)

As far as I can see, that option is not set in the CMakeCache.txt.
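The name of the GUI switch stored in the cache may differ from the -DGUI=OFF spelling used above; entry names such as USE_GUI or USE_NATIVE_ARCH are assumptions about NGSolve's CMake setup, not something confirmed from this attachment. A substring search over the attached CMakeCache.txt is therefore more reliable than looking for one exact name. A minimal standard-library sketch; read_cmake_cache and find_options are hypothetical helper names.

```python
import re

# A CMakeCache.txt entry has the form NAME:TYPE=VALUE. Lines starting with
# "//" (help text) or "#" (comments) are not entries.
ENTRY_RE = re.compile(r"^(?P<name>[^:#/][^:]*):(?P<type>[A-Z]+)=(?P<value>.*)$")


def read_cmake_cache(text):
    """Parse CMakeCache.txt content into {name: (type, value)}."""
    entries = {}
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries[match.group("name")] = (match.group("type"), match.group("value"))
    return entries


def find_options(entries, substrings):
    """Return {name: value} for entries whose name contains any substring (case-insensitive)."""
    wanted = [s.upper() for s in substrings]
    return {
        name: value
        for name, (_type, value) in entries.items()
        if any(s in name.upper() for s in wanted)
    }
```

Reading the attached file and calling find_options with substrings such as "GUI", "NATIVE", "COMPILER" and "CXX_FLAGS" would list the GUI setting, any architecture-specific optimization switch and the compiler in one go, which also answers the "which compiler, which configuration" question.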

- "SIGILL: illegal instruction" means that NGSolve was compiled and executed on different CPUs/architectures, is this the case in your setup?


I was running the script on the master node, and that is where Python complained about memory issues. Sorry, I should have mentioned this at the outset: we have a multi-node cluster, and when I log in to one of the nodes and launch the script (with complex-valued functions), it now appears to work fine. I am surprised, though, that the Poisson example works on the master node but the one with the complex space does not.
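One way to test the SIGILL hypothesis on this cluster is to compare the CPU feature flags of the master node with those of a node where the script works (and, if known, of the machine the software was built on). This is a Linux-only sketch that reads /proc/cpuinfo; the list of flags to compare is an illustrative assumption, and whether the NGSolve build actually enabled architecture-specific instructions would still have to be confirmed from the CMakeCache.txt. The helper names are hypothetical.

```python
import os

# Vector-extension flags whose absence on the run host could explain SIGILL
# if the build enabled them. This list is an illustrative assumption.
INTERESTING = ("sse4_2", "avx", "avx2", "fma", "avx512f")


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo content."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(build_host_flags, run_host_flags, interesting=INTERESTING):
    """Flags present on the build host but absent on the run host."""
    return sorted(
        flag for flag in interesting
        if flag in build_host_flags and flag not in run_host_flags
    )


if __name__ == "__main__":
    # Run on the master node and on a compute node, then compare the outputs.
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as cpuinfo:
            flags = cpu_flags(cpuinfo.read())
        print(sorted(flag for flag in INTERESTING if flag in flags))
```

If a flag such as avx2 shows up on the compute nodes but not on the master node, a build optimized for the compute nodes would be expected to fail with an illegal instruction on the master node only.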

- I experienced memory issues with Python3.8 about a month ago (don't remember on which OS though), could you also test Python 3.7?

I only seem to have python versions 2.7 and 3.8 on the cluster.

File Attachment:

File Name: CMakeCache...1-07.txt
File Size: 20 KB