- Thank you received: 0

1 year 8 months ago #3392
by philipp
*MPI global integral* was created by *philipp*

Dear NGSolve community,

I need to evaluate a coefficient function that itself is an integral over the whole domain, but

this integral varies at every evaluation point (in the variational formulation, this is a double integral,

I posted a related question here: ngsolve.org/forum/ngspy-forum/1109-integral-over-product-space )

With sufficiently fine meshes this is a very expensive task, which I try to mitigate by using

MPI.

At the moment, I load the global mesh in every rank to compute this integral.

The PDE I am trying to solve is time-dependent and I need to use part of the solution from

the last time step in the integrand. Is there any way I can gather a grid function at every

rank, so the global values are available there?

Or is there the possibility to compute an integral globally over grid functions? (At the moment,

this step is done in C++ in a custom coefficient function implementation.)

All the best,

Philipp

I need to evaluate a coefficient function that itself is an integral over the whole domain, but

this integral varies at every evaluation point (in the variational formulation, this is a double integral,

I posted a related question here: ngsolve.org/forum/ngspy-forum/1109-integral-over-product-space )

With sufficiently fine meshes this is a very expensive task, which I try to mitigate by using

MPI.

At the moment, I load the global mesh in every rank to compute this integral.

The PDE I am trying to solve is time-dependent and I need to use part of the solution from

the last time step in the integrand. Is there any way I can gather a grid function at every

rank, so the global values are available there?

Or is there the possibility to compute an integral globally over grid functions? (At the moment,

this step is done in C++ in a custom coefficient function implementation.)

All the best,

Philipp

Please Log in or Create an account to join the conversation.

1 year 8 months ago #3393
by lkogler

Replied by *lkogler* on topic *MPI global integral*

Hi Philip,

That is quite the difficult problem, I must say! Unfortunately, I am afraid I have no easy solution for you.

I do have some suggestions, though.

The first thing you could try is hybrid parallelization. Only run one MPI-rank per node (or socket), but use multiple threads on every rank. Per default, NGSolve turns off shared mempory parallelization when run with MPI, but you can turn it back on from python with
That way, you would still have to load the global mesh on every rank to compute the integral, but you would have fewer ranks in total.

To speed up the computation of the integral itself, could you compute multiple integrals at the same time? Possibly, your CoefficientFunction could compute all integrals when it is called the first time in each timestep. Then you would only need to loop through elements once and compute integrals for all points you need.

Finally, if your mesh gets too fine for the hybrid parallelization approach, you could use MPI sub-communicators. Then, every rank that participates in the "main" computation could have a series of workers that partition one instance of the global mesh among them and are responsible for computing the integral only.

I think this would require a good amount of additional C++ coding. You would have to send the coefficients of the gridfunction to the worker nodes, and the integral values from workers to master ranks. This might be quite nasty to do efficiently.

I hope this helps you a bit.

Best,

Lukas

That is quite the difficult problem, I must say! Unfortunately, I am afraid I have no easy solution for you.

I do have some suggestions, though.

The first thing you could try is hybrid parallelization. Only run one MPI-rank per node (or socket), but use multiple threads on every rank. Per default, NGSolve turns off shared mempory parallelization when run with MPI, but you can turn it back on from python with

Code:

SetNumThreads(N)

To speed up the computation of the integral itself, could you compute multiple integrals at the same time? Possibly, your CoefficientFunction could compute all integrals when it is called the first time in each timestep. Then you would only need to loop through elements once and compute integrals for all points you need.

Finally, if your mesh gets too fine for the hybrid parallelization approach, you could use MPI sub-communicators. Then, every rank that participates in the "main" computation could have a series of workers that partition one instance of the global mesh among them and are responsible for computing the integral only.

I think this would require a good amount of additional C++ coding. You would have to send the coefficients of the gridfunction to the worker nodes, and the integral values from workers to master ranks. This might be quite nasty to do efficiently.

I hope this helps you a bit.

Best,

Lukas

Please Log in or Create an account to join the conversation.

1 year 8 months ago - 1 year 8 months ago #3394
by philipp

Replied by *philipp* on topic *MPI global integral*

Hi Lukas,

thank you for your response.

I managed to figure that I could reuse the code written in gridfunction.cpp for the loading and saving

of grid functions in parallel, so I gather the coefficients at rank zero and redistribute from there.

I wouldn't call it efficient, but it works somehow and makes the computations at least feasible (now I

need only 20 min for a time step.)

By accident, I already used your first suggestion: I specify my thread numbers explicitly every time.

I didn't know it would be turned off by default with MPI. Thanks for this clarification.

Your second suggestion I also implemented. I use a cache for the coefficient function and at the beginning of each time step I compute its values for later reuse. Is there maybe an especially good data structure in NGSolve for this?

The third approach is quite interesting. To make sure I understood correctly, let me rephrase it:

You suggest that (additionally, or instead) to thread workers, I spawn for every rank MPI workers.

The advantage here to thread workers is that their amount is not so strongly limited by the socket cores.

I am definitely not an MPI expert which might explain the basic question.

I guess I understood right that there is no python mechanism for gathering a grid function? I mean, I understand. It is not something one would like to do...

All the best,

Philipp

thank you for your response.

I managed to figure that I could reuse the code written in gridfunction.cpp for the loading and saving

of grid functions in parallel, so I gather the coefficients at rank zero and redistribute from there.

I wouldn't call it efficient, but it works somehow and makes the computations at least feasible (now I

need only 20 min for a time step.)

By accident, I already used your first suggestion: I specify my thread numbers explicitly every time.

I didn't know it would be turned off by default with MPI. Thanks for this clarification.

Your second suggestion I also implemented. I use a cache for the coefficient function and at the beginning of each time step I compute its values for later reuse. Is there maybe an especially good data structure in NGSolve for this?

The third approach is quite interesting. To make sure I understood correctly, let me rephrase it:

You suggest that (additionally, or instead) to thread workers, I spawn for every rank MPI workers.

The advantage here to thread workers is that their amount is not so strongly limited by the socket cores.

I am definitely not an MPI expert which might explain the basic question.

I guess I understood right that there is no python mechanism for gathering a grid function? I mean, I understand. It is not something one would like to do...

All the best,

Philipp

Last edit: 1 year 8 months ago by philipp.

Please Log in or Create an account to join the conversation.

1 year 8 months ago - 1 year 8 months ago #3399
by lkogler

Replied by *lkogler* on topic *MPI global integral*

Essentially, yes.The advantage here to thread workers is that their amount is not so strongly limited by the socket cores.

I am definitely not an MPI expert which might explain the basic question.

Not really, no. There is only the option to sequentialize the coefficients of a GridFunction which you already found.I guess I understood right that there is no python mechanism for gathering a grid function?

Last edit: 1 year 8 months ago by lkogler.

The following user(s) said Thank You: philipp

Please Log in or Create an account to join the conversation.

Time to create page: 0.137 seconds