| Commit message | Author | Age |
darcs-hash:20050501171344-891bb-d608668bef718cc9f4c6244b721b3d43d7c33ae0.gz
When there are zero reductions, allow the output pointers to be NULL.
darcs-hash:20050430154439-891bb-95ea930518b19378040601526504ceb6be039a97.gz
When a grid function has only one time level, emit a level 1 warning
instead of aborting with an assertion failure.
darcs-hash:20050428151159-891bb-53d3330b499a68f0304e99f86aa176af2ca00ecf.gz
Use assert (dist::rank() == proc()) instead of assert (_owns_storage).
The latter is wrong; it misinterprets the meaning of the field _owns_storage.
darcs-hash:20050416184109-891bb-a07ba29e020dad420edebaa0c824b7c207d1ac2b.gz
CarpetIOHDF5 used to output unchunked data only, i.e. all ghostzones and
boundary zones were cut off from the bboxes to be output.
This caused problems after recovery: uninitialized ghostzones led to wrong
results. The obvious solution, calling CCTK_SyncGroup() for all groups after
recovery, was also problematic because it (1) synchronised only the current
timelevel and (2) performed boundary prolongation in a scheduling order
different from the regular order used during checkpointing.
The solution now implemented by this patch is to always write checkpoint files
in chunked mode (which includes all ghostzones and boundary zones). This also
makes synchronisation of all groups after recovery unnecessary.
Regular HDF5 output files can also be written in chunked mode but the default
(still) is unchunked. A new boolean parameter IOHDF5::out_unchunked (with
default value "yes") was introduced to toggle this option.
Note that this parameter has the same meaning as IO::out_unchunked but an
opposite default value. This is the only reason why IOHDF5::out_unchunked
was introduced.
darcs-hash:20050412161430-776a0-d5efd21ecdbe41ad9a804014b816acad0cd71b2c.gz
darcs-hash:20050411183135-891bb-5d2ced682685fb55a00da1864560e54bd113f765.gz
I think there were some errors in handling the mem<T> objects, but I'm
not completely sure.
darcs-hash:20050411183030-891bb-f1b5510bb4866c8d4bab48a7b320cb6de71b1121.gz
darcs-hash:20050411182954-891bb-6f141054635439136d978f98f528af1204919199.gz
darcs-hash:20050411172219-891bb-2308d62a8e9de1310efb51d40e6f298310b8bd21.gz
data<> objects cannot be implicitly copied. The standard template
library containers copy objects arbitrarily. That means that one has
to store pointers to data<> objects instead of the objects themselves,
and has to allocate and free them manually.
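The constraint described above can be illustrated with a small sketch. The class and helper names here are hypothetical stand-ins, not CarpetLib's actual data<> interface; modern C++ can express "no implicit copies" directly with deleted copy operations.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CarpetLib's data<T>: copying is forbidden,
// so standard containers must hold pointers rather than objects.
template <typename T>
class data {
public:
  explicit data(T val) : val_(val) {}
  data(const data&) = delete;            // no implicit copies
  data& operator=(const data&) = delete;
  T value() const { return val_; }
private:
  T val_;
};

// A group of data<> objects is therefore held as pointers, allocated with
// new and freed by hand; std::vector<data<double>> itself would not
// compile, since containers copy their elements.
inline double sum_values(const std::vector<data<double>*>& store) {
  double sum = 0.0;
  for (const data<double>* p : store) sum += p->value();
  return sum;
}
```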
darcs-hash:20050411170907-891bb-406b9b6bb6b97df2f47c349f32d91398338439ae.gz
Use the type CCTK_REAL instead of double for storing meta data in the
HDF5 files. This is necessary if CCTK_REAL has more precision than
double.
darcs-hash:20050411170627-891bb-374e4c2581155d825f9a1925b1d4319051bc36d6.gz
darcs-hash:20050411203309-891bb-5b74d6135f6cd6995f1eed6cc74dd2f29c42f8a8.gz
Using CarpetLib::use_waitall = "yes" seems to improve Carpet performance
both for the standard and for the collective buffers communication scheme.
So I made it the default.
darcs-hash:20050411173355-776a0-a1046bde7c4ccb4eebc00765b4264701b012c8d8.gz
The code to minimise the number of outstanding communication requests is
superseded by the collective buffers communication code. Therefore the
corresponding parameter has been deactivated (but not removed in order to keep
backwards compatibility with older checkpoints).
It is marked as deprecated in the param.ccl file and should not be used anymore
(use CarpetLib::use_collective_communication_buffers instead).
A level-2 warning to that effect is printed at startup if the parameter is
still set in a user's parfile.
darcs-hash:20050411155524-776a0-ed9919869cc1f2821ab8b2fa23b4abea203b72ed.gz
communication code
darcs-hash:20050411141439-776a0-98125bb76dcb733d3649cd50f9a27e4e7c9d2d6d.gz
darcs-hash:20050411165653-891bb-42b5923d95fc75e8717c6fdcd7f3d180669711da.gz
Only complain about a missing regridding function when more than one
refinement level is possible.
darcs-hash:20050411165426-891bb-1c4b6916615461eae57106750888ba9fec2e80e7.gz
* Receive operations are posted earlier now (don't wait until send buffers
are filled).
* A send operation is posted as soon as its send buffer is full (don't wait
until all send buffers have been filled).
* MPI_Irsend() is used instead of MPI_Isend().
  This probably does not make a difference with most MPI implementations.
* Use MPI_Waitsome() to allow for overlapping of communication and computation
to some extent: data from already finished receive operations can be
copied back while active receive operations are still going on.
MPI_Waitsome() is now called (instead of MPI_Waitall()) to wait for
(one or more) posted receive operations to finish. The receive buffers
for those operations are then flagged as ready for data copying.
The drawback of this overlapping communication/computation scheme is
that the comm_state loop may be iterated more often now. My benchmarks on
up to 16 processors showed no performance win compared to using MPI_Waitall()
(in fact, the performance decreased). Maybe it performs better on larger
numbers of processors when there is more potential for network congestion.
The feature can be turned on/off by setting CarpetLib::use_waitall to yes/no.
For now I recommend using CarpetLib::use_waitall = "yes" (which is not the
default setting).
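The Waitsome-style draining loop described above can be sketched without MPI at all: a stub stands in for MPI_Waitsome and reports which posted receives have "completed", so their buffers can be copied back while the others are notionally still in flight. All names here are illustrative, not CarpetLib's.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative stand-in for a posted receive operation: index + buffer.
struct PendingRecv { int index; std::vector<double> buffer; };

// Stub for MPI_Waitsome: report the first still-pending operation as
// complete. A real MPI_Waitsome may return several operations at once.
static int waitsome_stub(std::vector<bool>& pending) {
  for (std::size_t i = 0; i < pending.size(); ++i)
    if (pending[i]) { pending[i] = false; return (int)i; }
  return -1; // nothing outstanding
}

// Drain all posted receives, copying each buffer back as soon as its
// operation finishes, instead of blocking in a single Waitall until
// every operation is done.
std::vector<double> drain_receives(std::vector<PendingRecv>& recvs) {
  std::vector<bool> pending(recvs.size(), true);
  std::vector<double> result;
  int idx;
  while ((idx = waitsome_stub(pending)) >= 0) {
    // This copy-back could overlap with receives still in flight.
    for (double x : recvs[idx].buffer) result.push_back(x);
  }
  return result;
}
```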
darcs-hash:20050411122235-776a0-e4f4179f46fce120572231b19cacb69c940f7b82.gz
CCTK_GroupStorageIncrease() to find out the number of timelevels to checkpoint
darcs-hash:20050411121428-776a0-13b4d0626e749b2e20079d8101e7a5e9e57e18e1.gz
Collective buffers were accidentally used (e.g. by CarpetIOHDF5 or CarpetIOASCII)
even if CarpetLib::use_collective_communication_buffers was set to "no".
Now this parameter is evaluated in the comm_state constructor (together with
the given variable type), and the result is stored in a flag,
comm_state::uses__collective_communication_buffers. This flag is then used
later in comm_state::step() to decide between communication paths.
darcs-hash:20050411100916-776a0-aef034c4a23dac96f515cf831d15c8b7e2ce2f9d.gz
The return code of CarpetSlab_Get() must be checked against 0 for successful
completion.
The return code of CarpetSlab_GetList() should be the number of slabs returned,
or negative in case of errors.
darcs-hash:20050410140710-776a0-f293cb8176f10a3cc4fd20a2a0eae71fbda09d9e.gz
Resolve the conflict between the patches that introduce the mem<T> class
and the option CarpetLib::use_collective_communication_buffers.
darcs-hash:20050410175106-891bb-a66f3783fd8c897d65ed07f55b812e346b406baa.gz
darcs-hash:20050305175432-891bb-6e79165d8a094d6e54e981cada0c891dfca3d8dd.gz
Introduce a new class mem<T> for memory management. Memory management
has become sufficiently complicated to move into its own class. The
class mem<T> features:
1. Allocating nelem items of type T
2. Managing contiguous regions of memory for several data<T> objects
for vector groups
3. Allowing a pointer to a memory region to be passed in, which is
used instead of allocating memory through new
4. Reference counting, so that the mem<T> object only goes away once
   the last data<T> object using it no longer needs it.
This makes it unnecessary to delete the first data<T> objects of a
grid function group last.
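The four features above can be condensed into a minimal sketch. This is a simplified illustration in the spirit of mem<T>, not CarpetLib's actual class; the member and method names are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Minimal sketch of a reference-counted memory block: it either
// allocates nelem items itself or wraps a caller-provided pointer,
// and counts the data<T> objects that still use it.
template <typename T>
class mem {
public:
  mem(std::size_t nelem, T* extptr = 0)
    : storage_(extptr ? extptr : new T[nelem]),
      owns_(extptr == 0), refcount_(0) {}
  ~mem() { if (owns_) delete[] storage_; } // external memory is not freed
  T* storage() const { return storage_; }
  void register_client()   { ++refcount_; } // a data<T> starts using us
  void unregister_client() { --refcount_; } // a data<T> goes away
  bool has_clients() const { return refcount_ > 0; }
private:
  T* storage_;
  bool owns_;
  int refcount_;
};
```

A data<T> object would call register_client() on construction and unregister_client() on destruction, deleting the mem<T> once has_clients() becomes false; that is what removes any ordering constraint on destroying the members of a vector group.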
darcs-hash:20050305174647-891bb-e1f53adca34e5a668af96c662845cca0f259f8e6.gz
darcs-hash:20050410120954-891bb-c8ce14e43bfe5591204679e18e27da5721ff0468.gz
darcs-hash:20050409201450-891bb-24e5e0f2c92eeba86560d85754b40b8525434a3e.gz
darcs-hash:20050409195510-891bb-78231ea04482513e00b2a0a278b514ae95e1a8c8.gz
darcs-hash:20050409190317-891bb-e3601ddafcb9617a6780e43042cd008957a50d78.gz
Synchronise all variables of the same vartype at once by calling Carpet::SyncProlongateGroups().
darcs-hash:20050407153843-776a0-e567718c6ba858f4c074c5ec65dd0fc5cb373526.gz
In one of my previous patches I accidentally introduced two bugs when
optimising the high-level synchronisation routines in Carpet:
* a SYNC statement only synchronised the ghostzones
  but did not prolongate the boundaries of grid functions
* SyncGroups() also tried to synchronise non-CCTK_GF variables
  at multigrid and refinement levels other than 0
dist::datatype() to get rid of g++ compiler warnings
darcs-hash:20050406145726-776a0-16ef8cd6d00ca41fcd3662b93bffe649476ff31f.gz
darcs-hash:20050316130144-3fd61-1f95c63b76c29de63f212546b5e4fa226afe7299.gz
Intel compilers seem to ignore the qualifier, but g++ no longer compiled
automatic.cc.
darcs-hash:20050405095600-776a0-14eb3587897219c3d8fb95a23befc5c8fa2a8227.gz
collective communication buffers
So far collective buffers can be used only for the collector object.
For the case where all processors should receive the resulting hyperslab,
the comm_state loop was left untouched because I didn't understand the code.
darcs-hash:20050331082252-776a0-bae45b204fdf31f38969bee81c0ae97edae68f5c.gz
collective communication buffers
darcs-hash:20050331080034-776a0-629822f876800af1b76d5d43ca131f5373e991a4.gz
collective communication buffers
darcs-hash:20050331074851-776a0-fe39223cec4a68197e224c9b92f4fbef7b6258d8.gz
Collective buffers are used to gather all components' data on a processor
before it is sent off to other processors in one go. This reduces the
number of outstanding MPI communications to O(N-1) and thus improves
overall efficiency, as benchmarks show.
Each processor allocates a pair of single send/recv buffers to communicate
with all other processors. For this the class (actually, the struct) comm_state
was extended by 3 more states:
state_get_buffer_sizes: accumulates the sizes for the send/recv buffers
state_fill_send_buffers: gathers all the data into the send buffers
state_empty_recv_buffers: copies the data from the recv buffer back into
the processor's components
Send/recv buffers are exchanged during state_fill_send_buffers and
state_empty_recv_buffers. The constructor for a comm_state struct now takes
an argument <datatype> which denotes the CCTK datatype to use for the
attached collective buffers. If a negative value is passed here then it falls
back to using the old send/recv/wait communication scheme. The datatype
argument has a default value of -1 to maintain backwards compatibility with
existing code (which will therefore keep using the old scheme).
The new communication scheme is chosen by setting the parameter
CarpetLib::use_collective_communication_buffers to "yes". It defaults to "no"
meaning that the old send/recv/wait scheme is still used.
So far all the comm_state objects in the higher-level routines in thorn Carpet
(restriction/prolongation, regridding, synchronization) have been enabled to
use collective communication buffers.
Other thorns (CarpetInterp, CarpetIO*, CarpetSlab) will follow in separate
commits.
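The three added states can be sketched as a small state machine. This is a simplified illustration, not the actual comm_state code: the real struct interleaves these states with MPI buffer exchange, which is stood in for here by a plain copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The three states added to comm_state for collective buffers
// (simplified; names match the commit message, logic is illustrative).
enum astate { state_get_buffer_sizes, state_fill_send_buffers,
              state_empty_recv_buffers, state_done };

struct collective_buffers {
  astate state = state_get_buffer_sizes;
  std::size_t sendsize = 0;            // accumulated in the first state
  std::vector<double> sendbuf, recvbuf;

  void count(std::size_t n) { sendsize += n; }     // state_get_buffer_sizes
  void fill(const std::vector<double>& chunk) {    // state_fill_send_buffers
    sendbuf.insert(sendbuf.end(), chunk.begin(), chunk.end());
  }
  void step() {                        // advance through the states
    switch (state) {
    case state_get_buffer_sizes:
      sendbuf.reserve(sendsize);       // sizes known: allocate once
      state = state_fill_send_buffers; break;
    case state_fill_send_buffers:
      recvbuf = sendbuf;               // stands in for the MPI exchange
      state = state_empty_recv_buffers; break;
    case state_empty_recv_buffers:
      state = state_done; break;       // data copied back to components
    default: break;
    }
  }
};
```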
darcs-hash:20050330152811-776a0-51f426887fea099d1a67b42bd79e4f786979ba91.gz
groups
darcs-hash:20050325093919-776a0-2cf1c8734de6187a8622ad69f4d6ac3b1f86e14f.gz
darcs-hash:20050323191002-776a0-4a40d844dee2a66e8802669d960709e3488216c4.gz
Correct some errors in the automatic regridding routine.
Add a parameter for verbose screen output.
darcs-hash:20050323211540-891bb-8591fe329d8878afb826f7336d00daf6bb1345cb.gz
case when CarpetRegrid::refinement_levels was also set in the recovery parfile
darcs-hash:20050321110931-776a0-6fd09edfbd764f2b4d3f296a3f8c429f1000e407.gz
Checking the invariant of the bboxset class is probably O(N^3) in the
number of bboxes, possibly worse. It is a very slow operation when
there are many components in a simulation, especially with AMR, and
possibly also when running on many processors.
darcs-hash:20050321020814-891bb-c5eb1cde6f3ac064e39a362b19bd20e15d03bc24.gz
darcs-hash:20050321020511-891bb-2775aab7e620a3c8da997c10cefcf5ac53124509.gz
statements left that broke compiling Comm.cc
darcs-hash:20050317150756-776a0-d9e6719f3c75b27fe4aaa01a773e9db489a84a5c.gz
into SyncGVGroup()
darcs-hash:20050316140925-3fd61-fd64d2290d26975fa5521f57f0d83442d5af7feb.gz
darcs-hash:20050316123248-3fd61-b9695858d99c5d6dc769c0b4e1db3c50c9e5032a.gz
{copy,interpolate}_from_{recv,send,wait}() further up into {copy,interpolate}_from() so that the code is shared
darcs-hash:20050316132044-776a0-525aa7485c2718a9717b6f253553982524872727.gz
This patch greatly reduces the number of outstanding MPI_Isend/MPI_Irecv
communication requests by moving the loop over comm_states (recv, send, wait)
from the outermost position to the innermost.
This resolves problems with certain MPI implementations (specifically LAM,
MPICH-NCSA, and Mvapich over Infiniband) which potentially resulted in
communication buffer overflows and caused the Cactus application to abort or
hang forever.
Preliminary benchmarks with BSSN_MoL show that the patch does not have a
negative impact on myrinet clusters (measured to 64 processors).
It even improves the Carpet performance on GigE clusters (measured up to 16
processors).
The order of the communication loops is controlled by the boolean parameter
CarpetRegrid::minimise_outstanding_communications
which defaults to "no" (preserving the old behaviour).
darcs-hash:20050311160040-3fd61-04d40ac79ef218252f9364a8d18796e9b270d295.gz
darcs-hash:20050307170026-891bb-03754477692ad245563fda22ecdd4510da4549ab.gz
Add a new flag Carpet::constant_load_per_processor which takes the
specified grid size and multiplies it by the number of processors.
When running benchmarks, this keeps the local load constant.
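The scaling rule can be written as a one-line sketch (hypothetical function name; the commit message only states that the specified grid size is multiplied by the number of processors, i.e. weak scaling):

```cpp
// With constant load per processor, the specified grid size is treated
// as per-processor work, so the global size grows with the process count.
long global_npoints(long specified_npoints, int nprocs,
                    bool constant_load_per_processor) {
  return constant_load_per_processor
             ? specified_npoints * nprocs   // weak scaling: constant local load
             : specified_npoints;           // strong scaling: fixed global size
}
```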
darcs-hash:20050307165844-891bb-7b4c36a5e3bb152086d2eb240a898cb2ac5a3122.gz