* Receive operations are posted earlier now (don't wait until the send
  buffers are filled).
* A send operation is posted as soon as its send buffer is full (don't
  wait until all send buffers have been filled).
* MPI_Irsend() is used instead of MPI_Isend(). This probably doesn't make
  a difference with most MPI implementations.
* MPI_Waitsome() is used to allow some overlap of communication and
  computation: data from already finished receive operations can be copied
  back while the remaining receive operations are still in flight.
  MPI_Waitsome() is now called (instead of MPI_Waitall()) to wait for one
  or more posted receive operations to finish; the receive buffers of those
  operations are then flagged as ready for copying back.
  The drawback of this overlapping communication/computation scheme is that
  the comm_state loop may be iterated more often. My benchmarks on up to
  16 processors showed no performance win compared to using MPI_Waitall()
  (in fact, performance decreased). It may perform better on larger numbers
  of processors, where there is more potential for network congestion.
  The scheme is controlled by the parameter CarpetLib::use_waitall: setting
  it to "yes" disables the overlapping and waits for all receive operations
  with a single MPI_Waitall(). For now I recommend CarpetLib::use_waitall =
  "yes" (which is not the default setting). A sketch of the overlapping
  scheme is given below.
darcs-hash:20050411122235-776a0-e4f4179f46fce120572231b19cacb69c940f7b82.gz
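
A minimal sketch of the posting order and the MPI_Waitsome() loop described
above. The names exchange(), fill_send_buffer(), and copy_back() are
hypothetical stand-ins, not the actual CarpetLib code, and the buffers are
assumed to be non-empty:

  #include <mpi.h>
  #include <vector>

  // Hypothetical helpers standing in for CarpetLib's buffer management.
  void fill_send_buffer (std::vector<double>& buf, int dest);
  void copy_back (const std::vector<double>& buf, int source);

  void exchange (std::vector<std::vector<double> >& send_bufs,
                 std::vector<std::vector<double> >& recv_bufs,
                 int nprocs, int myproc)
  {
    std::vector<MPI_Request> recv_reqs (nprocs, MPI_REQUEST_NULL);
    std::vector<MPI_Request> send_reqs (nprocs, MPI_REQUEST_NULL);

    // 1. Post all receives up front, before any send buffer is filled.
    for (int p = 0; p < nprocs; ++p) {
      if (p == myproc) continue;
      MPI_Irecv (&recv_bufs[p][0], (int) recv_bufs[p].size(), MPI_DOUBLE,
                 p, 0, MPI_COMM_WORLD, &recv_reqs[p]);
    }

    // 2. Post each send as soon as its buffer is full.  MPI_Irsend() is
    //    valid only if the matching receive is already posted -- which
    //    the early Irecvs above are meant to ensure.
    for (int p = 0; p < nprocs; ++p) {
      if (p == myproc) continue;
      fill_send_buffer (send_bufs[p], p);
      MPI_Irsend (&send_bufs[p][0], (int) send_bufs[p].size(), MPI_DOUBLE,
                  p, 0, MPI_COMM_WORLD, &send_reqs[p]);
    }

    // 3. Overlap communication and computation: whenever some receives
    //    have finished, copy their data back while the rest are in flight.
    for (;;) {
      int ndone;
      std::vector<int> done (nprocs);
      MPI_Waitsome (nprocs, &recv_reqs[0], &ndone, &done[0],
                    MPI_STATUSES_IGNORE);
      if (ndone == MPI_UNDEFINED) break;   // all receives have completed
      for (int i = 0; i < ndone; ++i) {
        copy_back (recv_bufs[done[i]], done[i]);
      }
    }

    MPI_Waitall (nprocs, &send_reqs[0], MPI_STATUSES_IGNORE);
  }
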
Collective buffers were accidentally used (e.g. by CarpetIOHDF5 or
CarpetIOASCII) even if CarpetLib::use_collective_communication_buffers was
set to "no". Now this parameter is evaluated in the comm_state constructor
(together with the variable type given), and the result is stored in a flag
comm_state::uses_collective_communication_buffers. This flag is then used
later in comm_state::step() to decide which communication path to take (see
the sketch below).
darcs-hash:20050411100916-776a0-aef034c4a23dac96f515cf831d15c8b7e2ce2f9d.gz
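
A minimal sketch of the fix, with simplified, partly hypothetical
declarations (the real comm_state is more involved); the point is that the
parameter is read once, in the constructor, rather than at each use site:

  // Stand-in for the CarpetLib parameter (normally obtained via
  // DECLARE_CCTK_PARAMETERS); hypothetical for this sketch.
  static const int use_collective_communication_buffers = 0;

  struct comm_state {
    const bool uses_collective_communication_buffers;

    // datatype < 0 means "no CCTK datatype given": fall back to the
    // old send/recv/wait scheme even if the parameter says "yes".
    comm_state (int datatype = -1)
      : uses_collective_communication_buffers
          (use_collective_communication_buffers and datatype >= 0)
    { }

    void step () {
      if (uses_collective_communication_buffers) {
        // collective-buffer path (get sizes / fill / exchange / empty)
      } else {
        // traditional send/recv/wait path
      }
    }
  };
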
Collective buffers are used to gather all components' data on a processor
before it gets sent off to the other processors in one go. This reduces the
number of outstanding MPI communications to O(N-1) and thus improves overall
efficiency, as benchmarks show.
Each processor allocates a single pair of send/recv buffers for each other
processor it communicates with. For this the class (actually, the struct)
comm_state was extended by 3 more states:

  state_get_buffer_sizes:   accumulates the sizes for the send/recv buffers
  state_fill_send_buffers:  gathers all the data into the send buffers
  state_empty_recv_buffers: copies the data from the recv buffers back into
                            the processor's components

Send/recv buffers are exchanged during state_fill_send_buffers and
state_empty_recv_buffers. The constructor of a comm_state struct now takes
an argument <datatype> which denotes the CCTK datatype to use for the
attached collective buffers. If a negative value is passed here, comm_state
falls back to the old send/recv/wait communication scheme. The datatype
argument has a default value of -1 to maintain backwards compatibility with
existing code (which will therefore keep using the old scheme).
The new communication scheme is chosen by setting the parameter
CarpetLib::use_collective_communication_buffers to "yes". It defaults to
"no", meaning the old send/recv/wait scheme is still used.
So far all the comm_state objects in the higher-level routines in thorn
Carpet (restriction/prolongation, regridding, synchronization) have been
enabled to use collective communication buffers. Other thorns (CarpetInterp,
CarpetIO*, CarpetSlab) will follow in separate commits. A sketch of the
extended state machine is given below.
darcs-hash:20050330152811-776a0-51f426887fea099d1a67b42bd79e4f786979ba91.gz
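
A rough sketch of the extended state machine, using simplified, hypothetical
member names (the real comm_state also carries the pre-existing states of
the send/recv/wait scheme and attaches buffers of the given CCTK datatype):

  #include <cstddef>
  #include <vector>

  // The three additional states (names from the commit message above).
  enum astate {
    state_get_buffer_sizes,
    state_fill_send_buffers,
    state_empty_recv_buffers,
    state_done
  };

  // Hypothetical per-processor buffer pair.
  struct procbufdesc {
    std::size_t sendbufsize, recvbufsize;
    std::vector<char> sendbuf, recvbuf;
    procbufdesc () : sendbufsize (0), recvbufsize (0) { }
  };

  struct comm_state {
    astate thestate;
    std::vector<procbufdesc> procbufs;   // one pair per other processor

    comm_state (int nprocs)
      : thestate (state_get_buffer_sizes), procbufs (nprocs) { }

    void step () {
      switch (thestate) {
      case state_get_buffer_sizes:
        // Callers have accumulated sendbufsize/recvbufsize; allocate
        // the single buffer pair per processor and advance.
        for (std::size_t p = 0; p < procbufs.size(); ++p) {
          procbufs[p].sendbuf.resize (procbufs[p].sendbufsize);
          procbufs[p].recvbuf.resize (procbufs[p].recvbufsize);
        }
        thestate = state_fill_send_buffers;
        break;
      case state_fill_send_buffers:
        // Callers have gathered their data; this is where the filled
        // send buffers would be shipped off (and receives posted).
        thestate = state_empty_recv_buffers;
        break;
      case state_empty_recv_buffers:
        // Callers copy data from recvbuf back into their components.
        thestate = state_done;
        break;
      case state_done:
        break;
      }
    }

    bool done () const { return thestate == state_done; }
  };

Callers would drive this with a loop along the lines of
"for (comm_state state (nprocs); not state.done(); state.step()) { ... }",
contributing buffer sizes, data, and copy-outs in the respective passes.
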
Restructure the lightweight communication buffers.
Use lightweight communication buffers for interpolation as well.
darcs-hash:20050103200712-891bb-7e42816d3b8d667916084e3f32527c8f35327d7f.gz
Lightweight communication buffers use essentially only a vector<T> instead
of a data<T> to transfer data between processors. This should reduce the
computational overhead. Set the parameter "use_lightweight_buffers" to use
this feature. This feature is completely untested. A sketch of the idea is
given below.
darcs-hash:20050102173524-891bb-6a3999cbd63e367c8520c175c8078374d294eaa8.gz
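
A minimal sketch of the idea, with hypothetical names: a flat vector<T>
holds only the payload, whereas a data<T> object carries full grid metadata
(extents, strides, and so on) that a pure transfer buffer does not need.

  #include <cstddef>
  #include <vector>

  // Copy a rectangular region of a 2D field into a flat buffer that can
  // be handed to MPI directly; pack() and its parameters are hypothetical.
  template <typename T>
  std::vector<T> pack (const T* field, int ni,   // field width (x extent)
                       int i0, int j0,           // region origin
                       int ri, int rj)           // region extents
  {
    std::vector<T> buf;
    buf.reserve (std::size_t (ri) * std::size_t (rj));
    for (int j = j0; j < j0 + rj; ++j)
      for (int i = i0; i < i0 + ri; ++i)
        buf.push_back (field[std::size_t (j) * ni + i]);
    return buf;
  }
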
darcs-hash:20050101193846-891bb-7bb505d29a25b04c0d23e792eea7ff404d1f4200.gz