salomemed
6.5.0

The \c OverlapDEC enables the \ref InterpKerRemapGlobal "conservative remapping" of fields between two parallel codes.
This remapping is based on the computation of intersection volumes on a single processor group. On this processor group, two field templates called A and B are defined. The computation is possible for 3D meshes, 2D meshes, 3D surface meshes, 1D meshes and 2D curve meshes. The dimensions of the distribution templates A and B must be the same. The main difference with InterpKernelDEC is that this DEC manages two field templates on each processor of the processor group (A and B), called source and target. Furthermore, all processors in the processor group cooperate in the computation of the global interpolation matrix. In this respect, InterpKernelDEC is a specialization of OverlapDEC.
Let's consider the following use case, which is run in ParaMEDMEMTest_OverlapDEC.cxx, to describe the different steps of the computation. The processor group contains 3 processors.
In order to reduce the amount of communication between distant processors as much as possible, every processor computes a bounding box for A and for B. Then an AllToAll communication is performed so that every processor can compute the global interactions between processors. This computation leads every processor to compute the same global TODO list, expressed as a list of pairs. A pair (x,y) means that field template A of proc #x can interact with field template B of proc #y because their two bounding boxes intersect. In the example above, this computation leads to the following global TODO list:
(0,0),(0,1),(1,0),(1,2),(2,0),(2,1),(2,2)
Here the pair (0,2) does not appear because the bounding box of field template A on proc #0 does not intersect that of field template B on proc #2.
This stage is performed by ParaMEDMEM::OverlapElementLocator::computeBoundingBoxes.
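The bounding-box filtering of this stage can be sketched as follows. This is a minimal, self-contained C++ illustration and not the ParaMEDMEM code: the names BBox, intersects and globalTodoList are hypothetical, and the AllToAll exchange is assumed to have already gathered the boxes of every proc into two vectors indexed by proc rank.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical axis-aligned bounding box in dimension Dim.
template <std::size_t Dim>
struct BBox {
  std::array<double, Dim> min;
  std::array<double, Dim> max;
};

// Two boxes intersect iff their extents overlap along every axis.
template <std::size_t Dim>
bool intersects(const BBox<Dim>& a, const BBox<Dim>& b) {
  for (std::size_t d = 0; d < Dim; ++d)
    if (a.max[d] < b.min[d] || b.max[d] < a.min[d]) return false;
  return true;
}

// boxesA[x] (resp. boxesB[y]) is the bounding box of field template A on proc #x
// (resp. B on proc #y), as known by every proc after the all-to-all exchange.
// Returns the global TODO list: (x,y) whenever box A of proc #x meets box B of proc #y.
template <std::size_t Dim>
std::vector<std::pair<int, int>> globalTodoList(const std::vector<BBox<Dim>>& boxesA,
                                                const std::vector<BBox<Dim>>& boxesB) {
  assert(boxesA.size() == boxesB.size());  // one A and one B per proc
  std::vector<std::pair<int, int>> todo;
  for (std::size_t x = 0; x < boxesA.size(); ++x)
    for (std::size_t y = 0; y < boxesB.size(); ++y)
      if (intersects(boxesA[x], boxesB[y]))
        todo.emplace_back(static_cast<int>(x), static_cast<int>(y));
  return todo;
}
```

Since every proc runs the same deterministic loop on the same gathered boxes, all procs end up with an identical global TODO list without further communication.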
Starting from the global interactions computed in Step 1, each proc computes the TODO list of every proc. The following rule is chosen: a pair (x,y) can be treated by either proc #x or proc #y, in order to reduce the amount of data transferred among processors. The load-balancing algorithm is the following. Each processor starts with an empty temporary local TODO list. Then, for each pair (k,m) in the global TODO list: if the temporary local TODO list of proc #k contains fewer pairs than that of proc #m, (k,m) is added to the list of proc #k; if the list of proc #m contains fewer pairs than that of proc #k, (k,m) is added to the list of proc #m; if both lists contain the same number of pairs, (k,m) is added to the list of proc #k.
In the example above, this computation leads to the following local TODO lists:
proc #0: (0,0)
proc #1: (0,1),(1,0)
proc #2: (1,2),(2,0),(2,1),(2,2)
The algorithm described here is not optimal for this use case: proc #2 ends up with four pairs while proc #0 has only one. We hope to improve it soon.
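The load-balancing rule of this step can be written as a short greedy loop. The sketch below is an illustration, not the ParaMEDMEM implementation; localTodoLists is a hypothetical name. Applied to the global TODO list of the example, it assigns one pair to proc #0, two to proc #1 and four to proc #2.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;

// Distributes the global TODO list over nbProcs procs: pair (k,m) goes to whichever
// of proc #k / proc #m currently holds fewer pairs, and to proc #k on a tie.
// Every proc can run this on the same global list and obtain the same result.
std::vector<std::vector<Pair>> localTodoLists(const std::vector<Pair>& global, int nbProcs) {
  std::vector<std::vector<Pair>> local(static_cast<std::size_t>(nbProcs));
  for (const Pair& p : global) {
    const int k = p.first, m = p.second;
    assert(k >= 0 && k < nbProcs && m >= 0 && m < nbProcs);
    if (local[m].size() < local[k].size())
      local[m].push_back(p);  // proc #m has fewer pairs
    else
      local[k].push_back(p);  // proc #k has fewer pairs, or the counts are equal
  }
  return local;
}
```

Because each decision only looks at the current counts of the two procs involved, the result depends on the order of the global list; this greediness explains why the distribution is unbalanced in the example.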
At this stage, each proc knows precisely its local TODO list (with regard to interpolation). The local TODO lists of the other procs are also kept for later computations.
Knowing the local TODO lists, the aim is now to exchange field templates between procs. Using the TODO list per proc computed in Step 2, each proc computes the exchange TODO list:
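In the example above, the exchange TODO list gives the following results:
Sending TODO list per proc:
proc #0: send field template A to proc #1, field template B to proc #1 and field template B to proc #2
proc #1: send field template A to proc #2 and field template B to proc #2
proc #2: no send
Receiving TODO list per proc:
proc #0: no receive
proc #1: receive field template A from proc #0 and field template B from proc #0
proc #2: receive field template A from proc #1, field template B from proc #0 and field template B from proc #1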
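Under the assumption that the proc treating a pair (k,m) needs field template A of proc #k and field template B of proc #m, the exchange TODO list can be derived mechanically from the local TODO lists. The following C++ sketch is hypothetical (Transfer and exchangeTodoList are not ParaMEDMEM names) and only manipulates proc indices, not mesh content.

```cpp
#include <cassert>
#include <tuple>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;

// One transfer of a field template ('A' or 'B') from proc 'from' to proc 'to'.
struct Transfer {
  char field;
  int from;
  int to;
  bool operator==(const Transfer& o) const {
    return std::tie(field, from, to) == std::tie(o.field, o.from, o.to);
  }
};

// local[p] is the local TODO list of proc #p. The proc p treating (k,m) needs
// A of proc #k and B of proc #m; whatever it does not own must be sent to it.
// A given transfer is recorded only once, even if several pairs need it.
std::vector<Transfer> exchangeTodoList(const std::vector<std::vector<Pair>>& local) {
  std::vector<Transfer> transfers;
  auto add = [&transfers](Transfer t) {
    for (const Transfer& u : transfers)
      if (u == t) return;  // already scheduled
    transfers.push_back(t);
  };
  for (int p = 0; p < static_cast<int>(local.size()); ++p)
    for (const Pair& pr : local[p]) {
      if (pr.first != p) add({'A', pr.first, p});
      if (pr.second != p) add({'B', pr.second, p});
    }
  return transfers;
}
```

Grouping the resulting transfers by `from` gives the sending lists, and grouping them by `to` gives the receiving lists.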
To avoid large volumes of transfers between procs as much as possible, only the relevant parts of the meshes are sent. In order for proc #k to send field template A to field template B of proc #m, proc #k computes the part of mesh A contained in the bounding box of B on proc #m. The cell ids or node ids of this part are then sent to proc #m as well.
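The selection of the part of mesh A to send can be sketched as a filter on cell bounding boxes. This is a simplified illustration under two assumptions: cells are represented only by their own 2D bounding boxes, and "contained in" is taken to mean "intersecting", since a cell straddling the border of the remote box can still contribute to intersection volumes. Box2D and cellsToSend are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical 2D axis-aligned box stored as {xmin, ymin, xmax, ymax}.
using Box2D = std::array<double, 4>;

bool intersects(const Box2D& a, const Box2D& b) {
  return a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3];
}

// Returns the ids of the cells of the local mesh A whose bounding box meets the
// bounding box of field template B on the remote proc. Only these cells, together
// with their ids, would need to be sent.
std::vector<int> cellsToSend(const std::vector<Box2D>& cellBoxesA, const Box2D& remoteBoxB) {
  std::vector<int> ids;
  for (int i = 0; i < static_cast<int>(cellBoxesA.size()); ++i)
    if (intersects(cellBoxesA[i], remoteBoxB)) ids.push_back(i);
  return ids;
}
```

Sending the ids along with the cells lets the receiving proc map any coefficient computed on the sub-mesh back to the original numbering of the sender.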
Let's consider the couple (k,m) in the TODO list. As seen in Step 2, this couple is treated by either proc #k or proc #m.
As will be detailed in Step 6, for the final matrix-vector computations, the resulting matrix of the couple (k,m), wherever it is computed (proc #k or proc #m), will be stored on proc #m.
This step is performed by the ParaMEDMEM::OverlapElementLocator::exchangeMeshes method.
After the mesh exchange of Step 3, each processor has all the information required to treat its local TODO list computed in Step 2. This step is potentially CPU-costly, which is why the local TODO lists are expected to be as well balanced as possible.
The interpolation is performed in the same way as in the Remapper.
This operation is performed by the OverlapInterpolationMatrix::addContribution method.
Once the local TODO list has been processed at the end of Step 4, the final matrix has to be assembled.
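To give an idea of what computing intersection volumes means for one pair, here is a one-dimensional sketch of P0 to P0 weights obtained from intersection lengths, with W[i][j] = |T_i ∩ S_j| / |T_i| for target cell T_i and source cell S_j. It only illustrates the principle and is not the Remapper or InterpKernel API; the function p0p0Weights1D and the convention that rows are target cells are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// 1D cells are given by sorted node coordinates: cell i = [nodes[i], nodes[i+1]].
// Returns one std::map per target cell (source cell id -> weight), where the weight
// is the intersection length divided by the target cell length.
std::vector<std::map<int, double>> p0p0Weights1D(const std::vector<double>& srcNodes,
                                                 const std::vector<double>& tgtNodes) {
  assert(srcNodes.size() >= 2 && tgtNodes.size() >= 2);
  std::vector<std::map<int, double>> w(tgtNodes.size() - 1);
  for (std::size_t i = 0; i + 1 < tgtNodes.size(); ++i) {
    const double len = tgtNodes[i + 1] - tgtNodes[i];
    for (std::size_t j = 0; j + 1 < srcNodes.size(); ++j) {
      const double lo = std::max(tgtNodes[i], srcNodes[j]);
      const double hi = std::min(tgtNodes[i + 1], srcNodes[j + 1]);
      if (hi > lo) w[i][static_cast<int>(j)] = (hi - lo) / len;  // non-empty overlap
    }
  }
  return w;
}
```

When a target cell is fully covered by source cells, its weights sum to 1, so a constant source field is reproduced exactly on that cell.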
The final aim is to have a distributed matrix on each proc #k. In order to reduce data exchange during the matrix product process, this matrix is built using sizeof(Proc group) objects of type std::vector< std::map<int,double> >, one per processor of the group.
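The following sketch shows how such a block structure could be used in the matrix-vector product on proc #k. It assumes, as an illustration only, that the block of index i holds the coefficients coupling the target cells of proc #k with the source cells coming from proc #i, each row being a std::map from source cell id to coefficient. SparseBlock and multiply are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Sparse block: one map per target row, mapping source cell id -> coefficient.
using SparseBlock = std::vector<std::map<int, double>>;

// blocks[i] couples the target cells of this proc with the source cells of proc #i;
// sourceValues[i] holds the source field values received from proc #i.
// Computes y = sum over i of blocks[i] * sourceValues[i].
std::vector<double> multiply(const std::vector<SparseBlock>& blocks,
                             const std::vector<std::vector<double>>& sourceValues,
                             std::size_t nbTargetCells) {
  assert(blocks.size() == sourceValues.size());
  std::vector<double> y(nbTargetCells, 0.0);
  for (std::size_t i = 0; i < blocks.size(); ++i)
    for (std::size_t row = 0; row < blocks[i].size(); ++row) {
      assert(row < nbTargetCells);
      for (const auto& entry : blocks[i][row])  // entry.first: column, entry.second: coefficient
        y[row] += entry.second * sourceValues[i].at(static_cast<std::size_t>(entry.first));
    }
  return y;
}
```

With one block per sending proc, each proc only needs the source values of the procs whose block is non-empty, which is what keeps data exchange low during the product.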
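For a proc #k, it is necessary to fetch the matrices built in Step 4 for all pairs (i,j) whose second element j is equal to k, since the matrix of a couple (i,j) is stored on proc #j.
After this step, the matrix repartition obtained after a call to ParaMEDMEM::OverlapMapping::prepare is the following:
proc #0: (0,0),(1,0),(2,0)
proc #1: (0,1),(2,1)
proc #2: (1,2),(2,2)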
Tuple (2,1) computed on proc 2 is stored in proc 1 after execution of the function "prepare". This is an example of item 0 in Step2. Tuple (0,1) computed on proc 1 is stored in proc 1 too. This is an example of item 1 in Step2.
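The storage rule (the matrix of a couple (k,m) ends up on proc #m, whichever proc computed it) can be sketched as below. matrixRepartition is a hypothetical helper that only manipulates pair indices; the real ParaMEDMEM::OverlapMapping::prepare also moves the matrix coefficients themselves.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;

// local[p] lists the pairs whose matrix was computed on proc #p (Step 2).
// Returns, for every proc, the pairs whose matrix it stores after redistribution:
// the matrix of (k,m) goes to proc #m. Pairs computed on p != m must be sent.
std::vector<std::vector<Pair>> matrixRepartition(const std::vector<std::vector<Pair>>& local) {
  std::vector<std::vector<Pair>> stored(local.size());
  for (const auto& todo : local)
    for (const Pair& p : todo) {
      assert(p.second >= 0 && p.second < static_cast<int>(local.size()));
      stored[p.second].push_back(p);
    }
  for (auto& s : stored) std::sort(s.begin(), s.end());  // deterministic order
  return stored;
}
```

In the example, (1,0), (2,0) and (2,1) are the only matrices that have to travel, since every other pair is already computed on its storing proc.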
In the end ParaMEDMEM::OverlapMapping::_proc_ids_to_send_vector_st will contain :
In the end ParaMEDMEM::OverlapMapping::_proc_ids_to_recv_vector_st will contain :
The method in charge of this is ParaMEDMEM::OverlapMapping::prepare.