Experimental Overlapped File Transfer

This is documentation for an EXPERIMENTAL feature, and the feature is subject to change without notice!

Experimental Overlapped File Transfer

This experimental feature, introduced in HTCondor version 8.1.6, allows pipelined use of an execute slot by overlapping the execution of one job with the transfer of output from the previous job. This work is detailed in #4291. The motivation is that transferring output files back to the submit host can take a significant amount of time, while CPU usage during the transfer is minimal. The goal is to make more productive use of the CPU during this transfer time by allowing it to start on a new job.

In the implementation, each execute slot is paired by configuration with a minimal-resource slot, and both slots are claimed together. These minimal-resource slots are called transfer slots or buddy slots. A job begins its execution on the execute slot. When the job is done with its CPU-intensive phase, it invokes a condor_chirp command. HTCondor then moves the job from the execute slot to its paired transfer slot, provided that the transfer slot is not still being used by a prior job for its own output transfer. The execute slot can then be matched with a new job.

It is assumed that the job is doing its own output transfer.

This implementation is only for static slots.

Configuration to Enable the Overlapped File Transfer

A single metaknob configures all static slots on a machine to use this experimental feature:

  USE EXPERIMENTAL : Async_Stageout

What the Job Needs to Do

The job needs to identify when it is done with its execute phase and is about to enter its output transfer phase. To do so, the job invokes the undocumented condor_chirp command:

  condor_chirp phase output
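
As an illustration of where this call fits in a job, here is a minimal sketch of a job wrapper script, assuming a POSIX shell and that condor_chirp can be found on the job's PATH (on many installations it lives in HTCondor's libexec directory, in which case CHIRP would need to be set to its full path). The function names, the file result.dat, and the CHIRP and OUTPUT_DEST variables are hypothetical placeholders; only the command condor_chirp phase output comes from the text above. Treating a failed chirp call as non-fatal is a choice made for this sketch, not documented behavior.

```shell
#!/bin/sh
# Sketch of a job wrapper: compute, announce the output phase, then transfer.
# Everything except "condor_chirp phase output" is a hypothetical placeholder.

CHIRP="${CHIRP:-condor_chirp}"          # overridable, e.g. with a full path
OUTPUT_DEST="${OUTPUT_DEST:-./staged}"  # hypothetical destination for the output

# CPU-intensive phase (placeholder: writes a small result file).
compute_phase() {
    echo "placeholder result" > result.dat
}

# Announce that the job is about to enter its output transfer phase.
# If the call fails, warn and carry on; the job then simply keeps running
# where it is (an assumption of this sketch).
signal_output_phase() {
    if ! "$CHIRP" phase output; then
        echo "warning: could not signal the output phase" >&2
    fi
}

# Job-managed output transfer (placeholder: a local copy).
transfer_phase() {
    mkdir -p "$OUTPUT_DEST" && cp result.dat "$OUTPUT_DEST/"
}

# Run the three phases in order, stopping if compute or transfer fails.
run_job() {
    compute_phase && signal_output_phase && transfer_phase
}

run_job
```

In a real job, compute_phase would run the actual workload and transfer_phase would use whatever transfer mechanism the job manages itself.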

In order to use chirp commands, the job's submit description file must contain

  +WantIOProxy = true
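
For context, a complete submit description file might look like the following sketch. The executable name run_job.sh and the log, output, and error file names are hypothetical; only the +WantIOProxy line comes from the text above.

```
# Hypothetical submit description file for a job that calls condor_chirp.
universe     = vanilla
executable   = run_job.sh
log          = job.log
output       = job.out
error        = job.err

# Required so that the job can use chirp commands.
+WantIOProxy = true

queue
```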