% $Header$

\documentclass{article}

% Use the Cactus ThornGuide style file
% (Automatically used from Cactus distribution, if you have a
%  thorn without the Cactus Flesh download this from the Cactus
%  homepage at www.cactuscode.org)
\usepackage{../../../../doc/latex/cactus}
\RequirePackage{alltt}
\RequirePackage{fancyvrb}

\begin{document}

% The author of the documentation
\author{Steve White \textless swhite@aei.mpg.de\textgreater}

% The title of the document (not necessarily the name of the Thorn)
\title{ManualTermination\\ Manual Termination of Cactus Simulations}

% the date your document was last changed, if your document is in CVS,
% please use:
\date{$ $Date$ $}

\maketitle

% Do not delete next line
% START CACTUS THORNGUIDE

\begin{abstract}
Thorn \textbf{ManualTermination} safely terminates Cactus simulation jobs,
and can be configured to allow other users to terminate the job.

The thorn can also be configured to terminate a certain number of minutes
before a given maximum walltime has elapsed.

Also, it can be configured to periodically check the contents of a given
file, and terminate based on the contents of that file.

In either case, the job should be checkpointed.
\end{abstract}

\section{Requirements}
The program must be set up for checkpointing.

(It can be argued that checkpointing functionality is common sense and
good etiquette for long-running programs in a multi-user environment.)
\section{Setup}

\begin{Verbatim}[commandchars=\\\{\},frame=single]
# # # # # # # # # # # # # #
# Checkpointing / Recovery
ActiveThorns = "IOHDF5Util IOHDF5"
IO::checkpoint_dir = "cpr/"
IO::checkpoint_file = "chain" # Name to taste
IO::checkpoint_on_terminate = "yes"
IO::recover_dir = "cpr/"
IO::recover_file = "chain" # Same name
IO::recover = "autoprobe"
IOHDF5::checkpoint = "yes"

# # # # # # # # # # # # # #
# Termination
ActiveThorns = "ManualTermination"
# termination by wall time
ManualTermination::on_remaining_walltime=1400 #minutes before termination
ManualTermination::max_walltime=12 # hours
# termination from a file
ManualTermination::termination_from_file=yes
ManualTermination::check_file_every=10 #evolution steps
ManualTermination::output_remtime_every_minutes=2 # how often to remind user
\end{Verbatim}

\section{Use}
The two modes, termination by wall time and termination from file,
are independent and can be used together or separately.

By default, the file checked is
\texttt{/tmp/cactus\_terminate.\textit{job\_id}}
on the root node (the node of MPI rank 0), where
\texttt{\textit{job\_id}} is taken from the \texttt{PBS\_JOBID}
environment variable.
If the environment variable \texttt{MANUAL\_TERMINATION\_JOB\_ID} is set,
its value is used as the \texttt{\textit{job\_id}} instead.

In this configuration, any user may terminate the run by writing a `1'
into the specified file.

The termination file is removed when the run shuts down.

It should be possible to use thorn \textbf{ManualTermination} with
thorn \textbf{JobChaining}. If a job is terminated by
\textbf{ManualTermination}, \textbf{JobChaining} will not attempt to
re-queue the simulation.

\section{Licensing and Support}
Thorn \textbf{ManualTermination} is distributed under the
GNU Lesser General Public License.
For details please see the file \texttt{README} in the top-level
directory of this thorn.

Please send any suggestions or comments to the maintainer of the thorn.
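
\section{Example: File-Only Termination}
As a minimal sketch of using one mode on its own, the parameter-file
fragment below enables only termination from file.
It reuses the parameter names from the Setup section and assumes (this is
not stated in this guide) that leaving the wall-time parameters unset
keeps the wall-time mode inactive.
The checkpointing settings from the Setup section are still required.

\begin{Verbatim}[frame=single]
# Sketch: file-based termination only.
# Assumption: omitting the wall-time parameters leaves that mode inactive.
ActiveThorns = "ManualTermination"
ManualTermination::termination_from_file=yes
ManualTermination::check_file_every=10 #evolution steps
\end{Verbatim}

With this fragment the termination file would be checked every 10
evolution steps, so a `1' written to that file on the root node should be
noticed at the next check rather than immediately.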
% Do not delete next line
% END CACTUS THORNGUIDE

\end{document}