author     Thomas Radke <tradke@aei.mpg.de>  2005-04-11 12:22:00 +0000
committer  Thomas Radke <tradke@aei.mpg.de>  2005-04-11 12:22:00 +0000
commit     5386023def644841cad93ade7380e088faecb0f3 (patch)
tree       5968b02987063f38d8c789d5f08222d2119f3383 /Carpet/CarpetLib/src/commstate.hh
parent     9e4e8bcf147a2c84b40a6836e3e1a27bd772d6de (diff)
CarpetLib: some optimisations for the collective buffers communication scheme
* Receive operations are posted earlier now (don't wait until send buffers
are filled).
* A send operation is posted as soon as its send buffer is full (don't wait
until all send buffers have been filled).
* MPI_Irsend() is used instead of MPI_Isend().
  This probably doesn't make a difference with most MPI implementations.
* Use MPI_Waitsome() to allow for overlapping of communication and computation
to some extent: data from already finished receive operations can be
copied back while active receive operations are still going on.
MPI_Waitsome() is now called (instead of MPI_Waitall()) to wait for
(one or more) posted receive operations to finish. The receive buffers
for those operations are then flagged as ready for data copying.
The drawback of this overlapping communication/computation scheme is
that the comm_state loop may be iterated more often now. My benchmarks on
up to 16 processors showed no performance win compared to using MPI_Waitall()
(in fact, the performance decreased). Maybe it performs better on larger
numbers of processors when there is more potential for network congestion.
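The posting/draining scheme described in the bullets above can be sketched roughly as follows. This is a simplified stand-alone illustration, not the actual CarpetLib code; the function name, buffer layout (one send and one receive buffer per peer), and loop structure are assumptions made for the sketch:

```cpp
// Sketch of the overlapping communication scheme (not CarpetLib itself).
#include <mpi.h>
#include <vector>

void exchange (std::vector<std::vector<char> >& sendbufs,
               std::vector<std::vector<char> >& recvbufs,
               bool use_waitall)
{
  int nprocs;
  MPI_Comm_size (MPI_COMM_WORLD, &nprocs);

  std::vector<MPI_Request> rrequests (nprocs, MPI_REQUEST_NULL);
  std::vector<MPI_Request> srequests (nprocs, MPI_REQUEST_NULL);

  // post all receives first, before any send buffer is filled
  for (int p = 0; p < nprocs; ++p)
    MPI_Irecv (&recvbufs[p][0], recvbufs[p].size(), MPI_CHAR,
               p, 0, MPI_COMM_WORLD, &rrequests[p]);

  // fill each send buffer and post its send as soon as it is full;
  // MPI_Irsend() (ready mode) is only correct if the *receiver* has
  // already posted its matching receive -- early receive posting as
  // above is what makes this plausible
  for (int p = 0; p < nprocs; ++p) {
    // ... fill sendbufs[p] ...
    MPI_Irsend (&sendbufs[p][0], sendbufs[p].size(), MPI_CHAR,
                p, 0, MPI_COMM_WORLD, &srequests[p]);
  }

  if (use_waitall) {
    // simple variant: wait for all receives, then copy everything back
    MPI_Waitall (nprocs, &rrequests[0], MPI_STATUSES_IGNORE);
    // ... copy all receive buffers back ...
  } else {
    // overlapping variant: drain receives as they complete, copying
    // finished buffers back while other receives are still in flight
    int num_completed = 0;
    std::vector<int> indices (nprocs);
    while (num_completed < nprocs) {
      int outcount;
      MPI_Waitsome (nprocs, &rrequests[0], &outcount,
                    &indices[0], MPI_STATUSES_IGNORE);
      num_completed += outcount;
      for (int i = 0; i < outcount; ++i) {
        // ... copy recvbufs[indices[i]] back ...
      }
    }
  }

  MPI_Waitall (nprocs, &srequests[0], MPI_STATUSES_IGNORE);
}
```

Note that MPI_Waitsome() blocks until at least one request completes and sets the completed slots to MPI_REQUEST_NULL, so the while loop above terminates after exactly nprocs completions.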
The overlapping scheme can be switched off/on by setting CarpetLib::use_waitall
to yes/no (use_waitall = "yes" selects the plain MPI_Waitall() code path).
For now I recommend using CarpetLib::use_waitall = "yes" (which is not the
default setting).
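Following the recommendation above, the parameter would be set in the Cactus parameter file like this (a config fragment; check CarpetLib's param.ccl for the authoritative parameter name and default):

```
# disable the MPI_Waitsome() overlapping scheme, use plain MPI_Waitall()
CarpetLib::use_waitall = "yes"
```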
darcs-hash:20050411122235-776a0-e4f4179f46fce120572231b19cacb69c940f7b82.gz
Diffstat (limited to 'Carpet/CarpetLib/src/commstate.hh')
-rw-r--r--  Carpet/CarpetLib/src/commstate.hh  24
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/Carpet/CarpetLib/src/commstate.hh b/Carpet/CarpetLib/src/commstate.hh
index 9c5239b21..21c05320e 100644
--- a/Carpet/CarpetLib/src/commstate.hh
+++ b/Carpet/CarpetLib/src/commstate.hh
@@ -90,6 +90,7 @@ public:
   // the following members are used for collective communications
   //////////////////////////////////////////////////////////////////////////
+public:
 
   // CCTK vartype used for this comm_state object
   int vartype;
@@ -97,9 +98,6 @@ public:
   // (used as stride for advancing the char-based buffer pointers)
   int vartypesize;
 
-  // MPI datatype corresponding to CCTK vartype
-  MPI_Datatype datatype;
-
   // buffers for collective communications
   struct collbufdesc {
     // the sizes of communication buffers (in elements of type <vartype>)
@@ -115,17 +113,29 @@ public:
       sendbuf(NULL), recvbuf(NULL),
       sendbufbase(NULL), recvbufbase(NULL) {}
-// FIXME: why can't these be made private ??
-//private:
     // the allocated communication buffers
     char* sendbufbase;
     char* recvbufbase;
   };
   vector<collbufdesc> collbufs;        // [nprocs]
 
+  // flags indicating which receive buffers are ready to be emptied
+  vector<bool> recvbuffers_ready;      // [nprocs]
+
+  // MPI datatype corresponding to CCTK vartype
+  MPI_Datatype datatype;
+
+  // lists of outstanding requests for posted send/recv communications
+  vector<MPI_Request> srequests;       // [nprocs]
 private:
-  // Exchange pairs of send/recv buffers between all processors.
-  void ExchangeBuffers();
+  vector<MPI_Request> rrequests;       // [nprocs]
+
+  // number of posted and already completed receive communications
+  int num_posted_recvs;
+  int num_completed_recvs;
+
+  // wait for completion of posted collective buffer sends/receives
+  bool AllPostedCommunicationsFinished(bool use_waitall);
 };