% $Header$

\documentclass{article}

% Use the Cactus ThornGuide style file
% (Automatically used from Cactus distribution, if you have a
%  thorn without the Cactus Flesh download this from the Cactus
%  homepage at www.cactuscode.org)
\usepackage{../../../../doc/latex/cactus}
\RequirePackage{alltt}
\RequirePackage{fancyvrb}

\begin{document}

% The author of the documentation
\author{Steve White \textless swhite@aei.mpg.de\textgreater}

% The title of the document (not necessarily the name of the Thorn)
\title{ManualTermination\\
       Manual Termination of Cactus Simulations}

% the date your document was last changed, if your document is in CVS,
% please use:
\date{$ $Date$ $}

\maketitle

% Do not delete next line
% START CACTUS THORNGUIDE

\begin{abstract}
Thorn \textbf{ManualTermination} safely terminates Cactus
simulation jobs, and can be configured to allow other users to
terminate the job.

The thorn can be configured to terminate a given number of minutes
before a given maximum walltime has elapsed.  It can also be configured
to periodically check the contents of a given file, and to terminate based
on the contents of that file.

In either case, the job should be set up to checkpoint, so that the
terminated simulation can be recovered.
\end{abstract}



\section{Requirements}

The program must be set up for checkpointing.  (It can be argued that
checkpointing functionality is common sense and good etiquette for
long-running programs in a multi-user environment.)

\section{Setup}


\begin{Verbatim}[commandchars=\\\{\},frame=single]
# # # # # # # # # # # # # # # Checkpointing / Recovery
ActiveThorns                = "IOHDF5Util IOHDF5"

IO::checkpoint_dir          = "cpr/"
IO::checkpoint_file         = "chain"          # Name to taste
IO::checkpoint_on_terminate = "yes"
IO::recover_dir             = "cpr/"
IO::recover_file            = "chain"          # Same name
IO::recover                 = "autoprobe"
IOHDF5::checkpoint          = "yes"

# # # # # # # # # # # # # # # Termination
ActiveThorns                = "ManualTermination"

           # termination by wall time
ManualTermination::on_remaining_walltime=30     # minutes left at termination
ManualTermination::max_walltime=12   # hours

           # termination from a file
ManualTermination::termination_from_file=yes
ManualTermination::check_file_every=10          #evolution steps
ManualTermination::output_remtime_every_minutes=2 # minutes between reminders of remaining time

\end{Verbatim}

\section{Use}

The two modes, termination by wall time and termination from file, are
meant to be independent and can be used together or separately.
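
As a minimal sketch of walltime-based termination used on its own, the
fragment below reuses the parameter names from the Setup section.  The
values of 12 hours and 30 minutes are purely illustrative, and it is
assumed that, as for other Cactus boolean parameters, \texttt{no}
switches the file check off.

\begin{Verbatim}[frame=single]
# Walltime-based termination only (illustrative values)
ActiveThorns = "ManualTermination"
ManualTermination::max_walltime          = 12   # hours allocated to the job
ManualTermination::on_remaining_walltime = 30   # minutes left at termination
ManualTermination::termination_from_file = no   # do not poll a file
\end{Verbatim}

With these values, termination would be requested once
\[
  12\,\mathrm{h} \times 60\,\mathrm{min/h} - 30\,\mathrm{min} = 690\,\mathrm{min}
\]
of walltime have elapsed, which leaves 30 minutes for the final
checkpoint and shutdown.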

The default file checked is
\texttt{/tmp/cactus\_terminate.\textit{job\_id}},
where \texttt{\textit{job\_id}} is taken by default from the \texttt{PBS\_JOBID}
environment variable.  If the environment variable
\texttt{MANUAL\_TERMINATION\_JOB\_ID} is set, its value is used
as the \texttt{\textit{job\_id}} instead.

In this configuration, any user may terminate the run by writing a
\texttt{1} into the termination file.

The termination file is removed when the run shuts down.
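
As a minimal sketch of file-based termination used on its own, the
fragment below again reuses the parameter names from the Setup section.
The interval of 10 evolution steps is only an illustrative value, and it
is assumed that leaving the walltime parameters unset keeps that mode
inactive.

\begin{Verbatim}[frame=single]
# File-based termination only (illustrative values)
ActiveThorns = "ManualTermination"
ManualTermination::termination_from_file = yes  # poll the termination file
ManualTermination::check_file_every      = 10   # evolution steps between checks
\end{Verbatim}

With such a setting and a hypothetical job id of \texttt{12345.headnode},
the file checked would be \texttt{/tmp/cactus\_terminate.12345.headnode}.
Because the file is only read every 10 evolution steps, a termination
request would presumably take effect up to 10 steps after it is written.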

It should be possible to use thorn \textbf{ManualTermination} with thorn
\textbf{JobChaining}.  If a job is terminated by \textbf{ManualTermination},
\textbf{JobChaining} will not attempt to re-queue the simulation.

\section{Licensing and Support}

Thorn \textbf{ManualTermination} is distributed under the GNU Lesser General
Public License.
For details please see the file \texttt{README} in the top-level
directory of this thorn.

Please send any suggestions or comments to the maintainer of the thorn.

% Do not delete next line
% END CACTUS THORNGUIDE

\end{document}