Where does job output go?
Data Depot
The second question I need to answer from the top three storage questions [1] a friend sent me is: "How do you know where data is located after a job is finished?" This is an excellent question that HPC users who run under a resource manager (job scheduler) should contemplate. It is straightforward to answer, but it also opens a broader, perhaps philosophical, question: Where "should" or "could" your data be located when running a job (application)?
To answer the question with a little background, I'll start with the idea of a "job." Assume you run a job with a resource manager such as Slurm. You create a script that runs your job – generically referred to as a "job script" – and submit it to the resource manager with a simple command, creating a "job." The job is then added to the job queue controlled by the resource manager. Your job script can define the resources you need; set up the environment, including defining environment variables; execute commands; run the application(s); and so on. When the job finishes or exceeds its allotted time, it stops and releases its resources.
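As a minimal sketch of what such a job script looks like with Slurm (the partition name, paths, and application here are hypothetical placeholders, not a prescription), note that the --output directive is one answer to the title question: it tells Slurm where the job's standard output and error land, relative to the directory from which you submit:

#!/bin/bash
#SBATCH --job-name=my_app        # name shown in the queue
#SBATCH --partition=compute      # hypothetical partition name
#SBATCH --nodes=1                # resources requested
#SBATCH --ntasks=16
#SBATCH --time=01:00:00          # wall-time limit; the job stops if exceeded
#SBATCH --output=%x-%j.out       # stdout/stderr file (%x = job name, %j = job ID)

cd $SLURM_SUBMIT_DIR             # directory from which the job was submitted
export OMP_NUM_THREADS=16        # set up the environment
srun ./my_app input.dat          # run the (hypothetical) application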
As resources change in the system (e.g., nodes become available), the resource manager checks the resource requirements of the job, along with any internal rules that have been defined about job priorities, and determines which job to run next. Many times, the rule is simply to run the jobs in the order they were submitted – first in, first out (FIFO).
When you submit your job script to the "queue," creating the job, the resource manager holds the details of the job script and a few other items, such as details of the submit command that was used. After creating the job, you don't have to stay logged in to the system. The resource manager runs the job (job script) on your behalf when the requested resources become available.
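As an illustrative sketch, assuming Slurm and the hypothetical script above saved as run_app.sh, submitting the job and checking on it looks like the following (the job ID is illustrative); once the job is in the queue, you can safely log out:

$ sbatch run_app.sh
Submitted batch job 12345
$ squeue -u $USER        # list your pending and running jobs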