% $Header$
\documentclass{article}
% Use the Cactus ThornGuide style file
% (Automatically used from Cactus distribution, if you have a
% thorn without the Cactus Flesh download this from the Cactus
% homepage at www.cactuscode.org)
\usepackage{../../../../doc/latex/cactus}
\RequirePackage{alltt}
\RequirePackage{fancyvrb}
\begin{document}
% The author of the documentation
\author{Steve White \textless swhite@aei.mpg.de\textgreater}
% The title of the document (not necessarily the name of the Thorn)
\title{ManualTermination\\
Manual Termination of Cactus Simulations}
% the date your document was last changed, if your document is in CVS,
% please use:
\date{$ $Date$ $}
\maketitle
% Do not delete next line
% START CACTUS THORNGUIDE
\begin{abstract}
Thorn \textbf{ManualTermination} terminates Cactus simulation jobs
safely, and can be configured so that other users are able to
terminate a job.
The thorn supports two modes. It can terminate the run a given number of
minutes before a given maximum walltime has elapsed, and it can
periodically check the contents of a given file and terminate based
on the contents of that file.
In either mode, the job should be set up to checkpoint on termination.
\end{abstract}
\section{Requirements}
The program must be set up for checkpointing. (It can be argued that
checkpointing functionality is common sense and good etiquette for
long-running programs in a multi-user environment.)
\section{Setup}
\begin{Verbatim}[commandchars=\\\{\},frame=single]
# # # # # # # # # # # # # # # Checkpointing / Recovery
ActiveThorns = "IOHDF5Util IOHDF5"
IO::checkpoint_dir = "cpr/"
IO::checkpoint_file = "chain" # Name to taste
IO::checkpoint_on_terminate = "yes"
IO::recover_dir = "cpr/"
IO::recover_file = "chain" # Same name
IO::recover = "autoprobe"
IOHDF5::checkpoint = "yes"
# # # # # # # # # # # # # # # Termination
ActiveThorns = "ManualTermination"
# termination by wall time
ManualTermination::on_remaining_walltime=1400 #minutes before termination
ManualTermination::max_walltime=12 # hours
# termination from a file
ManualTermination::termination_from_file=yes
ManualTermination::check_file_every=10 #evolution steps
ManualTermination::output_remtime_every_minutes=2 # how often to remind user
\end{Verbatim}
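As a rough illustration of the wall-time mode, the elapsed run time at which
termination is requested can be estimated in the shell. This sketch assumes
that the thorn simply subtracts \texttt{on\_remaining\_walltime} (in minutes)
from \texttt{max\_walltime} (in hours); the values are hypothetical and are
not taken from the parameter file above.
\begin{Verbatim}[frame=single]
# Hypothetical values, chosen only for illustration
max_walltime_hours=12       # would be ManualTermination::max_walltime
on_remaining_minutes=15     # would be ManualTermination::on_remaining_walltime
# Convert the maximum walltime to minutes and subtract the safety margin
terminate_after=$(( max_walltime_hours * 60 - on_remaining_minutes ))
echo "termination requested after about ${terminate_after} minutes"
\end{Verbatim}
With these values the run would be asked to terminate after about 705 minutes,
leaving 15 minutes for the final checkpoint to be written.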
\section{Use}
The two modes, termination by wall time and termination from file, are
meant to be independent and can be used together or separately.
The default file checked is
\texttt{/tmp/cactus\_terminate.\textit{job\_id}},
where, by default, \texttt{\textit{job\_id}} is taken from the \texttt{PBS\_JOBID}
environment variable. If the environment variable
\texttt{MANUAL\_TERMINATION\_JOB\_ID} is set, its value is used instead
as the \texttt{\textit{job\_id}}.
In this configuration, any user may terminate the run by writing a \texttt{1} into
the specified file.
The termination file is removed when the run shuts down.
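A termination request of this kind might look like the following shell
commands. This is a minimal sketch: the job ID is a hypothetical placeholder,
and the path assumes that the default termination file location is in use.
\begin{Verbatim}[frame=single]
# Hypothetical job ID; substitute the PBS_JOBID (or the value of
# MANUAL_TERMINATION_JOB_ID) of the running job.
job_id="12345.example"
# Assumes the default termination file location.
termination_file="/tmp/cactus_terminate.${job_id}"
# Request termination by writing a 1 into the file.
echo 1 > "${termination_file}"
\end{Verbatim}
The request should be noticed the next time the thorn checks the file, which
is governed by the \texttt{check\_file\_every} parameter.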
It should be possible to use thorn \textbf{ManualTermination} with thorn
\textbf{JobChaining}. If a job is terminated by \textbf{ManualTermination},
\textbf{JobChaining} will not attempt to re-queue the simulation.
\section{Licensing and Support}
Thorn \textbf{ManualTermination} is distributed under the GNU Lesser General
Public License.
For details please see the file \texttt{README} in the top-level
directory of this thorn.
Please send any suggestions or comments to the maintainer of the thorn.
% Do not delete next line
% END CACTUS THORNGUIDE
\end{document}