| Commit message (Collapse) | Author | Age |
|
| |
Introduce dist::barrier as low-level implementation of a named
barrier.
Use it in Carpet::NamedBarrier.
Use the above in almost all barrier calls.
|
| |
|
| |
|
| |
single mechanism provided by CarpetLib.
Use this mechanism everywhere.
|
| |
Ignore-this: 309b4dd613f4af2b84aa5d6743fdb6b3
|
| |
|
| |
Correct a memory leak and simplify the code in the commstate class by using
C++ datatypes instead of new and delete.
Add many assert statements to catch potential problems.
|
| |
Use vector<char> instead of new char[] in the commstate class. This
corrects a memory management error.
Use .AT() instead of [] to access vector elements to catch indexing
errors.
darcs-hash:20080219044221-dae7b-ecd72b45833617920a33311953d5c2f00c42568c.gz
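The ownership change described above can be sketched roughly as follows. This is a minimal illustration, not the actual commstate code: the struct `buffer_sketch` and its `store` member are hypothetical names, and plain `at()` stands in for the `.AT()` macro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a communication buffer; not the real commstate layout.
struct buffer_sketch {
  // The vector owns the bytes and frees them in its destructor,
  // so there is no delete[] to forget.
  std::vector<char> data;

  explicit buffer_sketch(std::size_t nbytes) : data(nbytes) {}

  // Copy nbytes from src into the buffer at the given offset, with range checks.
  void store(std::size_t offset, const void* src, std::size_t nbytes) {
    assert(offset <= data.size());
    assert(nbytes <= data.size() - offset);
    if (nbytes > 0) {
      // at() throws std::out_of_range on a bad index instead of corrupting memory.
      std::memcpy(&data.at(offset), src, nbytes);
    }
  }
};
```

Compared with a `new char[]` member, the vector also makes copying and early returns safe without extra cleanup code.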
|
| |
Always use collective communication buffers in commstate class.
Add functions to reserve space in a commbuf, to get a pointer into the
space, and to commit space. This encapsulates using commbufs.
darcs-hash:20070419013946-dae7b-fce3d05b5e90fb37588939d1b11dce6d48ea2ead.gz
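One way to picture the reserve / get pointer / commit sequence is the sketch below. The class `commbuf_sketch`, its member names, the explicit `allocate()` step and the byte-based accounting are all assumptions made for illustration; the real commstate interface is not shown in this log and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer; names and semantics are assumptions for illustration.
class commbuf_sketch {
  std::size_t reserved_ = 0;   // bytes requested during the sizing pass
  std::size_t committed_ = 0;  // bytes already filled
  std::vector<char> storage_;
  bool allocated_ = false;

public:
  // Sizing pass: announce that nbytes will be needed later.
  void reserve(std::size_t nbytes) {
    assert(!allocated_);
    reserved_ += nbytes;
  }

  // Allocate everything that was reserved, in one go.
  void allocate() {
    assert(!allocated_);
    storage_.resize(reserved_);
    allocated_ = true;
  }

  // Return a pointer to the next nbytes of uncommitted space.
  char* pointer(std::size_t nbytes) {
    assert(allocated_);
    assert(nbytes <= reserved_ - committed_);
    return storage_.data() + committed_;
  }

  // Mark nbytes as filled; the next pointer() call returns the space after them.
  void commit(std::size_t nbytes) {
    assert(allocated_);
    assert(nbytes <= reserved_ - committed_);
    committed_ += nbytes;
  }

  std::size_t committed() const { return committed_; }
  bool full() const { return allocated_ && committed_ == reserved_; }
};
```

Callers never see the underlying storage or the running offset, which is the encapsulation the commit message refers to.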
|
| |
Implement a second timer based on Intel's rdtsc instruction, which is
much faster and much more accurate than MPI_Wtime.
Place the timer classes into the CarpetLib namespace.
Create a TimerSet class. Make the Timer class automatically register
all timers with a singleton object, removing all global variables.
darcs-hash:20070203211128-dae7b-42765e79446eda6a2337ba22cd390869055c555a.gz
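The self-registration pattern could look roughly like the sketch below. It is an illustration under assumptions: the namespace `timer_sketch` and all member names are hypothetical, and `std::chrono::steady_clock` is swapped in for rdtsc and MPI_Wtime so that the example stays portable.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

namespace timer_sketch {  // hypothetical namespace, not CarpetLib itself

class Timer;

// Singleton registry that knows about every live Timer.
class TimerSet {
  std::vector<Timer*> timers_;
  TimerSet() = default;

public:
  static TimerSet& instance() {
    static TimerSet the_set;  // function-local static replaces a global variable
    return the_set;
  }
  void add(Timer* t) { timers_.push_back(t); }
  void remove(Timer* t) {
    timers_.erase(std::remove(timers_.begin(), timers_.end(), t), timers_.end());
  }
  std::size_t size() const { return timers_.size(); }
};

class Timer {
  using clock = std::chrono::steady_clock;  // stand-in for rdtsc
  std::string name_;
  clock::time_point started_;
  double total_seconds_ = 0.0;
  bool running_ = false;

public:
  // Each timer registers itself on construction and deregisters on destruction.
  explicit Timer(const std::string& name) : name_(name) {
    TimerSet::instance().add(this);
  }
  ~Timer() { TimerSet::instance().remove(this); }
  Timer(const Timer&) = delete;
  Timer& operator=(const Timer&) = delete;

  void start() {
    assert(!running_);
    running_ = true;
    started_ = clock::now();
  }
  void stop() {
    assert(running_);
    total_seconds_ +=
        std::chrono::duration<double>(clock::now() - started_).count();
    running_ = false;
  }
  const std::string& name() const { return name_; }
  double total_seconds() const { return total_seconds_; }
};

}  // namespace timer_sketch
```

Because registration happens in the constructor, code that creates a timer needs no separate bookkeeping step, and output routines can walk the set to report all timers.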
|
| |
Define a macro AT() to index into std::vector. Depending on the macro
NDEBUG, AT() is defined either as at(), providing index checking, or
as operator[], providing no checking.
darcs-hash:20070203205854-dae7b-a1999c88c95ba12b1ee66505f712aefdd67d7e6f.gz
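A macro of this kind might be written as in the sketch below. Which branch NDEBUG selects is an assumption here (unchecked access when NDEBUG is defined, matching the convention of `assert`), and `sum_sketch` is a hypothetical helper that only demonstrates the call syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the idea; the real definition in CarpetLib may differ in detail.
#ifdef NDEBUG
#  define AT(index) operator[](index)  // optimised build: no index check
#else
#  define AT(index) at(index)          // debug build: throws std::out_of_range
#endif

// Sum the elements of a vector, accessing each one through AT().
inline int sum_sketch(const std::vector<int>& v) {
  int sum = 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    sum += v.AT(i);  // expands to v.at(i) or v.operator[](i)
  }
  return sum;
}
```

The call site reads `v.AT(i)` in both configurations, so switching between checked and unchecked access requires no source changes.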
|
| |
Add new parameters:
   BOOLEAN interleave_communications:
      Try to interleave communications with each other; each processor
      begins to communicate with its 'right neighbour' in rank, instead of
      with the root processor
   BOOLEAN vary_tags:
      Use different tags for each communication
   BOOLEAN barrier_between_stages:
      Add a barrier between the communication stages (slows down, but may
      make timing numbers easier to interpret)
   BOOLEAN combine_sends:
      Send data together and in order of processor ranks
   BOOLEAN reduce_mpi_waitall:
      Call MPI_Waitall only for requests that are not null
   BOOLEAN use_mpi_send:
      Use MPI_Send instead of MPI_Isend
   BOOLEAN use_mpi_ssend:
      Use MPI_Ssend instead of MPI_Isend
darcs-hash:20061206165333-dae7b-8ba40bd19fb1733336e60cb7e6bfa0ebfe0d546d.gz
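The effect of interleave_communications on the order of communication partners can be illustrated without MPI. The function below is a hypothetical sketch, not the CarpetLib implementation; in particular, listing a process as its own partner is an assumption made to keep the loop uniform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the order in which process `rank` (out of `size`) visits its partners.
// Without interleaving every process starts at rank 0, the root; with
// interleaving each one starts at its right neighbour and wraps around.
std::vector<int> partner_order_sketch(int rank, int size, bool interleave) {
  assert(size > 0);
  assert(rank >= 0 && rank < size);
  std::vector<int> order;
  order.reserve(static_cast<std::size_t>(size));
  for (int i = 0; i < size; ++i) {
    const int partner = interleave ? (rank + 1 + i) % size : i;
    order.push_back(partner);
  }
  return order;
}
```

With four processes and interleaving enabled, rank 2 visits 3, 0, 1, 2 while rank 0 visits 1, 2, 3, 0, so no single process is contacted by everyone in the first step.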
|
| |
When CarpetLib::poison_new_memory is set, poison communication
buffers as well.
darcs-hash:20060904020453-dae7b-762dfc46dcaea77cdff48fcd5e63805bf14e6dc0.gz
|
| |
Add timers for the new communication infrastructure.
Enhance the timers to also track the minimum and maximum time spent.
Add a parameter to output timing information to files.
darcs-hash:20060731152618-dae7b-1d049b2b37397610c14648078fd0ee92f252ca2a.gz
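Tracking the minimum and maximum alongside the total only needs a few extra fields per timer. The struct below is a hypothetical sketch of that bookkeeping; the field names are assumptions and are not taken from CarpetLib.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical per-timer statistics; one record() call per measured interval.
struct timer_stats_sketch {
  int count = 0;
  double total = 0.0;
  double minimum = std::numeric_limits<double>::infinity();
  double maximum = 0.0;

  void record(double seconds) {
    assert(seconds >= 0.0);
    ++count;
    total += seconds;
    minimum = std::min(minimum, seconds);
    maximum = std::max(maximum, seconds);
  }

  // Average time per interval, or zero if nothing has been recorded yet.
  double average() const { return count > 0 ? total / count : 0.0; }
};
```

A large gap between minimum and maximum points at occasional slow intervals that the total alone would hide.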
|
| |
darcs-hash:20060731151355-dae7b-fa5ddb6af45a3eec44780b7bff81e8a07a1aa861.gz
|
| |
No functionality change, but this requires all callers to be changed.
darcs-hash:20051119202604-dae7b-3492487bfdc4f3d228ec57a2b2ea02116f5cb64c.gz
|
| |
Replace some int local variables by size_t local variables. This
eliminates some compiler warnings about signed/unsigned comparisons.
darcs-hash:20051119202008-dae7b-8cf4f1bf5673b3b68164b2488f3e8c738fa55726.gz
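The kind of change meant here is sketched below with a hypothetical helper, `count_positive_sketch`. Declaring the loop index as `int` would compare a signed value against the unsigned result of `size()`, which compilers typically flag (for example with GCC's -Wsign-compare).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count the positive entries of a vector.
int count_positive_sketch(const std::vector<int>& values) {
  int count = 0;
  // std::size_t is unsigned like values.size(), so the comparison is clean.
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] > 0) {
      ++count;
    }
  }
  return count;
}
```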
|
| |
CarpetLib's comm_state class (actually, it's still just a struct) has been
extended to handle collective buffer communications for all possible C datatypes
at the same time. This makes it unnecessary for the higher-level communication
routines to loop over each individual datatype separately.
darcs-hash:20050815150023-776a0-dddc1aca7ccaebae872f9f451b2c3595cd951fed.gz
|
| |
single components
The default communication scheme in Carpet (which does an individual send/recv
operation for each component) comes with two parameters for fine tuning:
CarpetLib::use_lightweight_buffers
CarpetLib::combine_recv_send
the defaults of which are set to use a well-tested but also slower
communication pattern (as it turned out during benchmark runs).
This patch cleans up the implementation of this communication scheme so that the
fastest communication pattern (combined posting of send/recv; use of lightweight
buffers) is now always used. The above parameters therefore became obsolete
and shouldn't be used anymore in parfiles.
darcs-hash:20050526114253-776a0-780933a1539a260d74da8b92522fa2f48c714964.gz
|
| |
Using ready mode sends in the collective buffers communication scheme was wrong
because it is not guaranteed that the corresponding receive operations have
been posted already on other processors at that point.
Now standard mode non-blocking sends, MPI_Isend(), are used (again).
darcs-hash:20050512161846-776a0-09b27a8a9928d6c45751634c4e8f6c3af9e2dbec.gz
|
| |
* Receive operations are posted earlier now (don't wait until send buffers
  are filled).
* A send operation is posted as soon as its send buffer is full (don't wait
  until all send buffers have been filled).
* MPI_Irsend() is used instead of MPI_Isend().
  This probably doesn't make a difference with most MPI implementations.
* Use MPI_Waitsome() to allow for overlapping of communication and computation
  to some extent: data from already finished receive operations can be
  copied back while active receive operations are still going on.
  MPI_Waitsome() is now called (instead of MPI_Waitall()) to wait for
  (one or more) posted receive operations to finish. The receive buffers
  for those operations are then flagged as ready for data copying.
  The drawback of this overlapping communication/computation scheme is
  that the comm_state loop may be iterated more often now. My benchmarks on
  up to 16 processors showed no performance win compared to using MPI_Waitall()
  (in fact, the performance decreased). Maybe it performs better on larger
  numbers of processors when there is more potential for network congestion.
  The feature can be turned on/off by setting CarpetLib::use_waitall to yes/no.
  For now I recommend using CarpetLib::use_waitall = "yes" (which is not the
  default setting).
darcs-hash:20050411122235-776a0-e4f4179f46fce120572231b19cacb69c940f7b82.gz
|
| |
Collective buffers were accidentally used (e.g. by CarpetIOHDF5 or CarpetIOASCII)
even if CarpetLib::use_collective_communication_buffers was set to "no".
Now this parameter is evaluated in the comm_state constructor (together with
the variable type given) and the result stored in a flag
comm_state::uses__collective_communication_buffers. This flag is then used
later in comm_state::step() to decide about communication paths.
darcs-hash:20050411100916-776a0-aef034c4a23dac96f515cf831d15c8b7e2ce2f9d.gz
|
| |
Collective buffers are used to gather all components' data on a processor
before it gets sent off to other processors in one go. This reduces the
number of outstanding MPI communications to O(N-1) and thus improves
overall efficiency as benchmarks show.
Each processor allocates a pair of single send/recv buffers to communicate
with all other processors. For this the class (actually, the struct) comm_state
was extended by 3 more states:
   state_get_buffer_sizes:   accumulates the sizes for the send/recv buffers
   state_fill_send_buffers:  gathers all the data into the send buffers
   state_empty_recv_buffers: copies the data from the recv buffer back into
                             the processor's components
Send/recv buffers are exchanged during state_fill_send_buffers and
state_empty_recv_buffers. The constructor for a comm_state struct now takes
an argument <datatype> which denotes the CCTK datatype to use for the
attached collective buffers. If a negative value is passed here then it falls
back to using the old send/recv/wait communication scheme. The datatype
argument has a default value of -1 to maintain backwards compatibility with
existing code (which therefore will keep using the old scheme).
The new communication scheme is chosen by setting the parameter
CarpetLib::use_collective_communication_buffers to "yes". It defaults to "no"
meaning that the old send/recv/wait scheme is still used.
So far all the comm_state objects in the higher-level routines in thorn Carpet
(restriction/prolongation, regridding, synchronization) have been enabled to
use collective communication buffers.
Other thorns (CarpetInterp, CarpetIO*, CarpetSlab) will follow in separate
commits.
darcs-hash:20050330152811-776a0-51f426887fea099d1a67b42bd79e4f786979ba91.gz
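The three new states can be pictured as a small state machine that the caller steps through, doing one pass over all components per state. The sketch below is hypothetical: the three state names come from the message above, but `state_done`, the struct `comm_state_sketch`, the member names and the strict linear ordering are assumptions, and no MPI calls are made.

```cpp
#include <cassert>

// The first three names follow the commit message; state_done is an assumption.
enum comm_state_id {
  state_get_buffer_sizes,
  state_fill_send_buffers,
  state_empty_recv_buffers,
  state_done
};

struct comm_state_sketch {
  comm_state_id thestate = state_get_buffer_sizes;

  // Advance to the next state. In a real implementation the buffer
  // allocation and the MPI exchange would happen inside these transitions.
  void step() {
    assert(thestate != state_done);
    switch (thestate) {
    case state_get_buffer_sizes:
      thestate = state_fill_send_buffers;
      break;
    case state_fill_send_buffers:
      thestate = state_empty_recv_buffers;
      break;
    case state_empty_recv_buffers:
      thestate = state_done;
      break;
    case state_done:
      break;
    }
  }

  bool done() const { return thestate == state_done; }
};

// Count how many passes over the components a caller would make.
int count_passes_sketch() {
  int passes = 0;
  for (comm_state_sketch state; !state.done(); state.step()) {
    ++passes;  // a caller would loop over all components here
  }
  return passes;
}
```

The higher-level routines keep a single loop over the state object, and the work done per component depends on the current state.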
|
| |
Restructure the lightweight communication buffers.
Use lightweight communication buffers for interpolation as well.
darcs-hash:20050103200712-891bb-7e42816d3b8d667916084e3f32527c8f35327d7f.gz
|
|
darcs-hash:20050101193846-891bb-7bb505d29a25b04c0d23e792eea7ff404d1f4200.gz
|