Commit messages

darcs-hash:20050411203309-891bb-5b74d6135f6cd6995f1eed6cc74dd2f29c42f8a8.gz
The code to minimise the number of outstanding communication requests is
superseded by the collective buffers communication code. Therefore the
corresponding parameter has been deactivated (but not removed in order to keep
backwards compatibility with older checkpoints).
It is marked as deprecated in the param.ccl file and should not be used anymore
(use CarpetLib::use_collective_communication_buffers instead).
A level-2 warning to that effect is printed at startup if the parameter is
still set in a user's parfile.
darcs-hash:20050411155524-776a0-ed9919869cc1f2821ab8b2fa23b4abea203b72ed.gz
communication code
darcs-hash:20050411141439-776a0-98125bb76dcb733d3649cd50f9a27e4e7c9d2d6d.gz
* Receive operations are posted earlier now (don't wait until send buffers
are filled).
* A send operation is posted as soon as its send buffer is full (don't wait
until all send buffers have been filled).
* MPI_Irsend() is used instead of MPI_Isend().
  This probably doesn't make a difference with most MPI implementations.
* Use MPI_Waitsome() to allow for overlapping of communication and computation
to some extent: data from already finished receive operations can be
copied back while active receive operations are still going on.
MPI_Waitsome() is now called (instead of MPI_Waitall()) to wait for
(one or more) posted receive operations to finish. The receive buffers
for those operations are then flagged as ready for data copying.
The drawback of this overlapping communication/computation scheme is
that the comm_state loop may be iterated more often now. My benchmarks on
up to 16 processors showed no performance win compared to using MPI_Waitall()
(in fact, the performance decreased). Maybe it performs better on larger
numbers of processors when there is more potential for network congestion.
The feature can be turned on/off by setting CarpetLib::use_waitall to yes/no.
For now I recommend using CarpetLib::use_waitall = "yes" (which is not the
default setting).
darcs-hash:20050411122235-776a0-e4f4179f46fce120572231b19cacb69c940f7b82.gz
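The overlap scheme this commit describes can be sketched without real MPI: a waitsome-style loop copies back whichever receive buffers have already completed while other receives are still in flight, instead of blocking until all of them finish. The following toy model is illustrative only (all names are hypothetical, and simulated completion steps stand in for actual MPI_Waitsome calls):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the MPI_Waitsome scheme: each pending "receive" completes at
// some step; instead of waiting for all of them (as MPI_Waitall would), we
// repeatedly take whichever subset has finished and copy its buffer back
// immediately, overlapping the copy with the still-active receives.
struct PendingRecv {
  int completes_at;   // step at which the simulated receive finishes
  bool done = false;
};

// Returns the order in which receive buffers were copied back.
inline std::vector<int> waitsome_loop(std::vector<PendingRecv> reqs) {
  std::vector<int> copy_order;
  int step = 0;
  std::size_t remaining = reqs.size();
  while (remaining > 0) {
    ++step;
    // "MPI_Waitsome": collect every request that completed by this step...
    for (std::size_t i = 0; i < reqs.size(); ++i) {
      if (!reqs[i].done && reqs[i].completes_at <= step) {
        reqs[i].done = true;
        --remaining;
        copy_order.push_back(static_cast<int>(i));  // ...and copy it back now
      }
    }
  }
  return copy_order;
}
```

With three receives completing at steps 3, 1, and 2, the buffers are copied back in the order 1, 2, 0 -- each as soon as it finishes, rather than all at the end.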
Collective buffers were accidentally used (e.g. by CarpetIOHDF5 or CarpetIOASCII)
even if CarpetLib::use_collective_communication_buffers was set to "no".
Now this parameter is evaluated in the comm_state constructor (together with
the given variable type) and the result is stored in the flag
comm_state::uses_collective_communication_buffers. This flag is then used
later in comm_state::step() to select the communication path.
darcs-hash:20050411100916-776a0-aef034c4a23dac96f515cf831d15c8b7e2ce2f9d.gz
Resolve the conflict between the patches that introduce the mem<T> class
and the option CarpetLib::use_collective_communication_buffers.
darcs-hash:20050410175106-891bb-a66f3783fd8c897d65ed07f55b812e346b406baa.gz
Introduce a new class mem<T> for memory management. Memory management
has become sufficiently complicated to move into its own class. The
class mem<T> features:
1. Allocating nelem items of type T
2. Managing contiguous regions of memory for several data<T> objects
for vector groups
3. Allowing a pointer to a memory region to be passed in, which is
used instead of allocating memory through new
4. Reference counting, so that the mem<T> object goes away only when the
   last data<T> object using it no longer needs it.
   This makes it unnecessary to delete the first data<T> objects for a
   grid function group last.
darcs-hash:20050305174647-891bb-e1f53adca34e5a668af96c662845cca0f259f8e6.gz
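The reference-counting behaviour described in point 4 can be sketched as follows. This is an illustrative simplification, not Carpet's actual mem<T> interface; the method names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Minimal sketch of a reference-counted memory block in the spirit of the
// mem<T> class described above (simplified, hypothetical interface).
template <typename T>
class mem {
public:
  // Allocate storage for nelem items, or wrap an externally provided
  // pointer instead of allocating through new.
  explicit mem(std::size_t nelem, T* extern_ptr = nullptr)
      : size_(nelem),
        owns_(extern_ptr == nullptr),
        data_(owns_ ? new T[nelem] : extern_ptr),
        refcount_(0) {}

  ~mem() { if (owns_) delete[] data_; }

  T* storage() const { return data_; }
  std::size_t size() const { return size_; }

  // Clients (e.g. data<T> objects) register and unregister themselves; the
  // block deletes itself when the last client lets go, so the clients may
  // be destroyed in any order.
  void register_client() { ++refcount_; }
  void unregister_client() { if (--refcount_ == 0) delete this; }

private:
  std::size_t size_;
  bool owns_;
  T* data_;
  int refcount_;
};
```

The self-deletion on the last unregister is exactly what removes the old ordering constraint: no client needs to know whether it is "the first" or "the last" user of the storage.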
dist::datatype() to get rid of g++ compiler warnings
darcs-hash:20050406145726-776a0-16ef8cd6d00ca41fcd3662b93bffe649476ff31f.gz
darcs-hash:20050316130144-3fd61-1f95c63b76c29de63f212546b5e4fa226afe7299.gz
Collective buffers are used to gather all components' data on a processor
before it is sent off to other processors in one go. This reduces the
number of outstanding MPI communications to O(N-1) and thus improves
overall efficiency, as benchmarks show.
Each processor allocates a pair of single send/recv buffers to communicate
with all other processors. For this the class (actually, the struct) comm_state
was extended by 3 more states:
state_get_buffer_sizes: accumulates the sizes for the send/recv buffers
state_fill_send_buffers: gathers all the data into the send buffers
state_empty_recv_buffers: copies the data from the recv buffer back into
the processor's components
Send/recv buffers are exchanged during state_fill_send_buffers and
state_empty_recv_buffers. The constructor for a comm_state struct now takes
an argument <datatype> which denotes the CCTK datatype to use for the
attached collective buffers. If a negative value is passed here then it falls
back to using the old send/recv/wait communication scheme. The datatype
argument has a default value of -1 to maintain backwards compatibility to
existing code (which therefore will keep using the old scheme).
The new communication scheme is chosen by setting the parameter
CarpetLib::use_collective_communication_buffers to "yes". It defaults to "no"
meaning that the old send/recv/wait scheme is still used.
So far all the comm_state objects in the higher-level routines in thorn Carpet
(restriction/prolongation, regridding, synchronization) have been enabled to
use collective communication buffers.
Other thorns (CarpetInterp, CarpetIO*, CarpetSlab) will follow in separate
commits.
darcs-hash:20050330152811-776a0-51f426887fea099d1a67b42bd79e4f786979ba91.gz
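The three extra comm_state phases can be pictured as a small state machine. This sketch simulates a single sender/receiver pair in-process; all names are illustrative, and a plain buffer copy stands in for the single MPI message exchanged per processor pair:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the three extra comm_state phases described above, simulated
// in-process for one sender/receiver pair (hypothetical names).
enum astate { state_get_buffer_sizes, state_fill_send_buffers,
              state_empty_recv_buffers, state_done };

struct collective_comm {
  astate thestate = state_get_buffer_sizes;
  std::size_t send_size = 0;
  std::vector<double> send_buffer, recv_buffer;

  // Phase 1: accumulate the size of each component's contribution.
  void count(std::size_t n) {
    assert(thestate == state_get_buffer_sizes);
    send_size += n;
  }

  // Phase 2: gather a component's data into the single send buffer.
  void fill(const std::vector<double>& component) {
    assert(thestate == state_fill_send_buffers);
    send_buffer.insert(send_buffer.end(), component.begin(), component.end());
  }

  // Advance the state machine; the buffer "exchange" (one message instead
  // of many) is modeled as a plain copy into the recv buffer.
  void step() {
    switch (thestate) {
      case state_get_buffer_sizes:
        send_buffer.reserve(send_size);        // sizes are now known
        thestate = state_fill_send_buffers;
        break;
      case state_fill_send_buffers:
        recv_buffer = send_buffer;             // stands in for the exchange
        thestate = state_empty_recv_buffers;
        break;
      case state_empty_recv_buffers:           // callers copy data back here
        thestate = state_done;
        break;
      case state_done:
        break;
    }
  }
};
```

The point of the first phase is that the single buffer can be allocated once, at its final size, before any data is gathered.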
darcs-hash:20050323191002-776a0-4a40d844dee2a66e8802669d960709e3488216c4.gz
Checking the invariant of the bboxset class is probably O(N^3) in the
number of bboxes, possibly worse. It is a very slow operation when
there are many components in a simulation, especially with AMR, and
possibly also when running on many processors.
darcs-hash:20050321020814-891bb-c5eb1cde6f3ac064e39a362b19bd20e15d03bc24.gz
darcs-hash:20050321020511-891bb-2775aab7e620a3c8da997c10cefcf5ac53124509.gz
{copy,interpolate}_from_{recv,send,wait}() further up into {copy,interpolate}_from() so that the code is shared
darcs-hash:20050316132044-776a0-525aa7485c2718a9717b6f253553982524872727.gz
This patch greatly reduces the number of outstanding MPI_Isend/MPI_Irecv
communication requests by moving the loop over comm_states (recv,send,wait)
from the outermost to the innermost.
This resolves problems with certain MPI implementations (specifically LAM,
MPICH-NCSA, and MVAPICH over InfiniBand) which could overflow internal
communication buffers and cause the Cactus application to abort or hang
forever.
Preliminary benchmarks with BSSN_MoL show that the patch does not have a
negative impact on Myrinet clusters (measured up to 64 processors).
It even improves Carpet's performance on GigE clusters (measured up to 16
processors).
The order of the communication loops is controlled by the boolean parameter
CarpetRegrid::minimise_outstanding_communications
which defaults to "no" (preserving the old behaviour).
darcs-hash:20050311160040-3fd61-04d40ac79ef218252f9364a8d18796e9b270d295.gz
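The effect of the loop reordering on outstanding requests can be modeled with simple accounting: with the recv/send/wait loop outermost, all transfers are posted before any is waited on, while with it innermost, each transfer completes before the next is posted. A toy sketch (hypothetical function, no actual MPI):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy accounting for the loop reordering described above. Returns the peak
// number of simultaneously outstanding requests for ntransfers transfers.
inline std::size_t max_outstanding(std::size_t ntransfers,
                                   bool comm_loop_innermost) {
  std::size_t outstanding = 0, peak = 0;
  if (comm_loop_innermost) {
    // recv/send/wait innermost: post one transfer, wait for it, repeat.
    for (std::size_t i = 0; i < ntransfers; ++i) {
      ++outstanding;                      // post recv+send for this pair...
      peak = std::max(peak, outstanding);
      --outstanding;                      // ...and wait for it immediately
    }
  } else {
    // recv/send/wait outermost: post everything, then wait for all of it.
    for (std::size_t i = 0; i < ntransfers; ++i) {
      ++outstanding;
      peak = std::max(peak, outstanding);
    }
    outstanding = 0;                      // the single wait phase
  }
  return peak;
}
```

This is the trade-off the benchmarks probe: the innermost ordering keeps the peak at one request (gentle on MPI implementations with small internal buffers) at the cost of serialising the transfers.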
darcs-hash:20050303170559-891bb-836694ccb8375a1f09cfaeda646e4430e3e5bd07.gz
darcs-hash:20050303101420-891bb-6c23d70652146f074a392970443190966c909d10.gz
darcs-hash:20050303101349-891bb-251f1432b873c898f73e0315ed63a7764f8714e8.gz
darcs-hash:20050209223915-891bb-4c4001d07890086e95de8d5a91deffc66c32e469.gz
darcs-hash:20050209222027-891bb-e8501dfa40575303af1338c1d2d4528d08ea273c.gz
Reverse the order in which variables are destroyed when refinement levels
are removed. This ensures that vector GFs are treated correctly.
darcs-hash:20050209173129-58c7f-c2507b49252fe45782dd06803201ef1cff74f889.gz
darcs-hash:20050201231956-891bb-8c892504000762557eb01b8b6ef48d8f0b815e06.gz
darcs-hash:20050201231759-891bb-db87543a706110d2cd819a7f38c1e67cf27e16a3.gz
Change the way in which the grid hierarchy is stored. The new hierarchy is
map
mglevel
reflevel
component
timelevel
i.e., mglevel moved from the bottom to almost the top. This is
because mglevel used to be a true multigrid level, but is now meant to
be a convergence level.
Do not allocate all storage all the time. Allow storage to be
switched on and off per refinement level (and for a single mglevel,
which prompted the change above). Handle storage management with
CCTK_{In,De}creaseGroupStorage instead of
CCTK_{En,Dis}ableGroupStorage.
darcs-hash:20050201225827-891bb-eae3b6bd092ae8d6b5e49be84c6f09f0e882933e.gz
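The new nesting order can be pictured as nested containers. This is a schematic only, not Carpet's actual data structures; the type and function names are invented for illustration:

```cpp
#include <cassert>
#include <vector>

// Schematic of the new nesting order:
//   map -> mglevel -> reflevel -> component -> timelevel
struct timelevel_data { double t = 0.0; };
using component_t = std::vector<timelevel_data>;  // timelevels of a component
using reflevel_t  = std::vector<component_t>;     // components of a reflevel
using mglevel_t   = std::vector<reflevel_t>;      // reflevels of an mglevel
using map_t       = std::vector<mglevel_t>;       // mglevels of a map
using hierarchy_t = std::vector<map_t>;           // maps

// With mglevel near the top, storage can be switched on for a single
// refinement level (of a single mglevel) by touching just that slice.
inline void enable_reflevel(hierarchy_t& h, int m, int ml, int rl,
                            int ncomponents, int ntimelevels) {
  h[m][ml][rl].assign(ncomponents, component_t(ntimelevels));
}
```

Moving mglevel up the hierarchy is what makes the per-reflevel (and per-mglevel) storage switching described above a local operation instead of a global one.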
darcs-hash:20050112093608-891bb-925d83a354f44638d28d3871d3e67a3ac5343fa6.gz
Restructure the lightweight communication buffers.
Use lightweight communication buffers for interpolation as well.
darcs-hash:20050103200712-891bb-7e42816d3b8d667916084e3f32527c8f35327d7f.gz
data::try_without_time_interpolation
darcs-hash:20050103135332-891bb-e92a19212dfbadde889fda0760232f5b7749aac3.gz
darcs-hash:20050103135305-891bb-8813921b6e0e2988e9afd0be90c3d7ff092cee1e.gz
Lightweight communication buffers use essentially only a vector<T>
instead of a data<T> to transfer data between processors. This should
reduce the computational overhead.
Set the parameter "use_lightweight_buffers" to enable this feature.
It is completely untested.
darcs-hash:20050102173524-891bb-6a3999cbd63e367c8520c175c8078374d294eaa8.gz
darcs-hash:20050102173453-891bb-833515dd47ce1469ebe319718ab169e5eb82c6c4.gz
darcs-hash:20050101193846-891bb-7bb505d29a25b04c0d23e792eea7ff404d1f4200.gz
darcs-hash:20050101191615-891bb-20b262ff1a4468d5e1c5ac8626a3ead0727c2da9.gz
darcs-hash:20050101190036-891bb-cf588a05c760e0d465d2efc352defedae6ba4ce5.gz
darcs-hash:20050101185718-891bb-143c84dacf00f458eed1b9c985900bbaf5e3b98b.gz
darcs-hash:20050101185325-891bb-197dd6cea208ec8d17507e31d99c22f0161fa21b.gz
Turn most of the templates in CarpetLib, which used to have the form
template<int D> class XXX
into classes, i.e., into something like
class XXX
by setting D to the new global integer constant dim, which in turn is set to 3.
The templates gf and data, which used to be of the form
template<typename T, int D> class XXX
are now of the form
template<typename T> class XXX
The templates vect, bbox, and bboxset remain templates.
This change simplifies the code somewhat.
darcs-hash:20050101182234-891bb-c3063528841f0d078b12cc506309ea27d8ce730d.gz
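The change can be illustrated schematically. These are toy types, not Carpet's actual vect/bbox classes:

```cpp
#include <cassert>

// After the change, most classes fix the dimension via a global integer
// constant instead of a template parameter (sketch only).
const int dim = 3;

// Old form: the dimension is a template parameter,
//   template<int D> class XXX
template <int D>
struct old_vect { int elt[D]; };

// New form: a plain class using the global constant dim,
//   class XXX
struct new_vect { int elt[dim]; };

inline int new_vect_size() { return dim; }
```

For a fixed three-dimensional code the template parameter buys nothing, and dropping it removes a `<D>` from nearly every type name in CarpetLib.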
darcs-hash:20050101171429-891bb-130630de8631b8f9bbe494e135662ffb089ecca0.gz
darcs-hash:20050201222816-891bb-3ac829f630cbb58de05bbc229f2de47b80d1434f.gz
darcs-hash:20050201214347-891bb-286c20316478d9f1c8384f94764174cab5adb9e0.gz
darcs-hash:20050201212131-891bb-df7215694d99c95de5b1e2fceba6c0aff56ef586.gz
Remove functions gdata::lives_on_this_processor() and
gdata::this_processor_is(proc). Introduce dist::rank() and
dist::size() instead.
Re-introduce assert statement in data::proc().
Move declaration and definition of assignment operator near the
constructor and destructor.
darcs-hash:20041230191026-891bb-90eeb1be4c04753c165e13e7c1e65f06847180ca.gz
darcs-hash:20050127104948-891bb-214d1033924b4db7a215a624eb337f8a619ac82d.gz
darcs-hash:20050101191544-891bb-4f1f960cbe9c99a9cdc42c1ba590ecdc6eb75f5f.gz
darcs-hash:20050118180403-776a0-eb91906a7335386c49a03ae70f5caaf20a5441c3.gz
Try harder to normalise bboxsets. This is slower, but also more
successful at finding the smallest possible number of bboxes. This
improves e.g. recombined HDF5 I/O.
darcs-hash:20050104214651-891bb-b6e00cf25685c8111472315f5fcdc37518db3700.gz
darcs-hash:20050101162121-891bb-ac9d070faecc19f91b4b57389d3507bfc6c6e5ee.gz
Collect more timing statistics in the data class. Print these
statistics to stdout when the Cactus parameter print_timestats is set.
Create a timer class "timestat": a timer that can be started and stopped,
and that prints the total time as well as some statistics.
For memory allocation statistics, count the number of objects as well
as the number of bytes.
darcs-hash:20041230212136-891bb-c14edfa7d539ae9b135eee76afadaad51fd0b098.gz
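A timer of the kind described, which can be started and stopped repeatedly while accumulating a total and simple statistics, might look like this sketch (illustrative interface, not Carpet's actual timestat class):

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>

// Sketch of a "timestat"-style timer: start()/stop() may be called many
// times; the running sums are enough to report total, mean, and standard
// deviation of the measured intervals.
class timestat {
public:
  void start() { t0_ = clock::now(); }
  void stop() {
    double dt = std::chrono::duration<double>(clock::now() - t0_).count();
    ++count_;
    sum_  += dt;
    sum2_ += dt * dt;
  }
  long   count()  const { return count_; }
  double total()  const { return sum_; }
  double mean()   const { return count_ ? sum_ / count_ : 0.0; }
  double stddev() const {
    if (count_ < 2) return 0.0;
    // Population standard deviation from the running sums; clamped at zero
    // to guard against rounding.
    return std::sqrt(std::max(0.0, sum2_ / count_ - mean() * mean()));
  }

private:
  using clock = std::chrono::steady_clock;
  clock::time_point t0_;
  long   count_ = 0;
  double sum_ = 0.0, sum2_ = 0.0;
};
```

Keeping only count, sum, and sum of squares means the statistics cost O(1) memory per timer, which matters when one timer is kept per communication path.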
darcs-hash:20041230191410-891bb-fba06c8c0054b5324a4164e79205b36c7e5bfc3c.gz
Add missing this-> prefixes.
Declare template specialisations before defining them.
Rename some local variables to avoid name clashes.
darcs-hash:20041228183523-891bb-acc5a1a8c1f247512a38dba56ff5419d96280fa3.gz
Rename it to fill_bbox_array.
Declare it in a better place in the header file.
Make it not virtual.
Change pointer arguments to arrays.
Change hard-coded number 3 to D.
darcs-hash:20041225202612-891bb-e6249d004fdf0b3d8d24cbf8e5a4ae713786bdfb.gz