Transfer Daemon Meditation

Flow of the TransferDaemon

Start of TD daemon:

+ Globally construct a TransferD object.

libc calls main():
+ Set up daemoncore, call dc_main() which calls main_init()
+ Call g_td.init();
 + Parse command line and create object representing command line options.
  + The TransferD object has a "Features" object called m_features. This
    holds command line options and all their defaults.
  + The valid command line options are:
    --schedd <sinfulstring>
      To which schedd should the transferd register itself.
      Get TransferRequests from stdin.
    --id <ascii_key>
      Used by the transferd to identify itself to the schedd. A shared
      "secret" used only to pair an incoming request of a registering
      transferd and the schedd's forking of it. This is because the schedd
      can fork more than one at a time and need to know which registration
      mmatches which reason the schedd needed it.
    --timeout <number of seconds>
      If there is no work to do, exit in number of seconds.
    --shadow <upload|download> [DEMO option, may not be used in production]
      The transferd must get a transferrequest via stdin, and then
      will connect to a shadow to perform up/down load as requested.

 + If using stdin, call TransferD::accept_transfer_request().
  + Read the first line of stdin, this represents an "encapsulation
    method" which dictates how to process the stuff that comes
    after. The function encap_method() converts this line to an
    enum. Currently, there is only one: ENCAP_METHOD_OLD_CLASSADS (and
    ENCAP_METHOD_UNKNOWN which means I don't know what kind to use).
  + If the encapsulation is validly ENCAP_METHOD_OLD_CLASSADS, then call
    the function accept_transfer_request_encapsulation_old_classads().
    + Read the TransferRequest ClassAd from stdin. This details how many
      additional job classads are coming down the pipe for this transfer.
    + Create a single TransferRequest object from the above ad.
    + Read multiple work ads (each representing a jobad which the FileTransfer
      Object will use as a unit of work for a transfer) and store them into
      the TransferRequest object.
    + Since we can only have one transfer request given to the transferd in
      the stdin mode, we will generate a "capability" that represents the
      TransferRequest and associate the transfer request with the capability.
    + Since we have work to do, ensure our inactivity timer will not be fired.
  + Mark ourselves initialized.

 + Tell the schedd I'm alive and ready: [g_td.register_to_schedd()]
  + Determine the location of the schedd from the Features object.
  + Get my own sinful string.
  + Create a DCSchedd object to the schedd.
  + The DCSched object knows how to register the treansferd via the
    schedd.register_transferd(...) call: [DCSchedd::register_transferd()]
    + StartCommand the TRANSFERD_REGISTER command to the schedd.
    + Force authentication
     + Send a registration request classad that includes:
      - ATTR_TREQ_TD_SINFUL: sinful string of TransferDaemon
      - ATTR_TREQ_TD_ID: id string of secret key given by the schedd on the
        command line.
    + The reponse ad from the schedd contains:
      - ATTR_TREQ_INVALID_REQUEST which is either TRUE or FALSE
      - If FALSE, then ATTR_TREQ_INVALID_REASON is defined.
 + If the registration is:
   Not valid: EXCEPT
   Valid: Store the socket to the schedd as m_update_sock. This is a connection
   that is used to send updates about TransferRequests from the transferd to
   the schedd.
 + Return from g_td.init();

+ Call g_td.register_timers():
 + Register: TransferD::process_active_requests_timer()
   This timer is commented out.
   I apologize because I don't know why, and this is one of the major
   means by which the TransferD performs its functions. I think it was
   commented out so all of this stuff could be merged into mainline
   HTCondor and not affect anything before we could finish it all.
 + Register: TransferD::exit_due_to_inactivity_timer()
   This will cause the transferd to exit if ther eis no work to
   perform. Coupled with the above, it means in mainline HTCondor, if the
   transferd starts, it will automatically exit because it won't process
   any of its queued work.

+ Call g_td.register_handlers():
 + Register:
     DUMP_STATE -> TransferD::dump_state_handler
   This function will emit information to condor_squawk.
 + Register:
     TRANSFERD_CONTROL_CHANNEL -> TransferD::setup_transfer_request_handler()
   This function manages a permanent connection to the schedd and is the control
   channel between the schedd and the transferd. There are scalability
   issues with this architecture which were known about, but we decided to
   solve later once the architecture of how the schedd communicated to the
   transferd was more finalized.
 + Register:
     TRANSFERD_WRITE_FILES -> TransferD::write_files_handler();
   This function uses some mechanism (like the file transfer object
   itself) to write transfered files into a local directory. It could
   be the spool, the initialdir, or whatever.
 + Register:
     TRANSFERD_READ_FILES -> TransferD::read_files_handler();
   This function uses some mechanism (like the file transfer object
   itself) to read transferring files from a local directory. It could
   be the spool, the initialdir, or whatever.
 + Register a Repaer:
     "Reaper" -> TransferD::reaper_handler()
   This function handles the exiting of any process which wasn't a process
   devoted specifically to reading/writing files.

Handler DUMP_STATE: TransferD::dump_state_handler

When this handler is invoked, it constructs a classad and sends it back on
the wire.

The contents of this classad can includes all sorts of status information
or other statistics about the transferd.

Currently, it will create a classad with:
    Uid = getuid()
    OutstandingTransferRequests = <how many requests left to do>

and send it back to the client which requested it. Afterwards, the stream
is not kept and daemoncore closes the connection.

Handler TRANSFERD_CONTROL_CHANNEL: TransferD::setup_transfer_request_handler()

ASIDE: The transferd registration processes looks like this:
1. Schedd fork/execs transferd with secret key on command line.
2. TransferD connects back to the schedd with the secret key and closes the
3. Schedd validates the key and sinful of the transferd with what it is
4. The schedd connects back to transferd, this is called the control
   connection, and the tether stays active until the transferd exits.
   More transfer requests, among other things, can come down the control

This handler manages step 4:

+ Make sure we're authenticated by calling rsock->triedAuthentication().
+ Register_Socket the control socket to handle any future TransferRequests
  from the schedd:
  Control Socket Fd -> TransferD::accept_transfer_request_handler();
+ We keep the stream alive, and return.

TRANSFERD_WRITE_FILES -> TransferD::write_files_handler()

This handler is called when a client wishes to send a TransferRequest to
the TransferD which results in files being written *to* the TransferD.
How the actual files themselves are written depends upon the ATTR_TREQ_FTP
attribute in the request ad from the client. FTP in this attribute means
"File Transfer Protocol", but NOT in the same manner as the ftp program.
The values of that attribute dictate HOW the transferd should accept
and write those files. Currently, the only supported method is the
enumeration FTP_CFTP, which means "File Transfer Protocol: HTCondor File
Transfer Protocol".

+ Make sure we're authenticated by calling rsock->triedAuthentication().
+ Get the fully qualified user name fromthe socket.
+ Grab the ClassAd from the socket, and lookup the capability string.
  This capability represents a TransferRequest association, not a FileTransfer
  capability as found in a job ad.
+ If the capability is unknown, abort the request with a classad and close the
  connection, otherwise, find the TransferRequest object associated with the
+ Get the ATTR_TREQ_FTP protocol from the request ad.
  It can only be FTP_CFTP because I had implemented one protocol so far!
+ Ensure that the request is ONLY of the "upload" variety. Upload in this
  context means the transfer is ocurring from the client to the TransferD.
+ If everything is ok, send a success classad back to the client.

[Now we set up a "thread" using DaemonCore CreateThread which
_actually_ performs the file transfer in an asynchronous manner using
the FTP protocol defined.]

+ Create a reaper to handle the call to TransferD::write_files_thread() which
  will enact the actual transfer.
+ Create a ThreadArg object which holds the FTP protocol enumeration and
  the TransferRequest.
+ Create the TransferD::write_files_thread() thread passing in the ThreadArg
  object (so it knows what to do) and associating it with the aforementioned
  reaper id.
+ TODO: If this creation fails, I don't do anything! I should.
+ Associate the newly created thread with its reaper id in a data structure so
  the transferd can refer to it later when it finishes.
+ Return, closing the stream to the client.

THREAD TransferD::write_files_thread

The purpose of this thread, which was created in
TransferD::write_files_handler(), is to make asynchronous the act
of having a client upload a set of files in a TransferRequest to
the TransferD.  Otherwise the TransferD would have to block while the
transfer happened and that would suck _a lot_. This function is running
in either another thread, or another process under the TransferD.

[NOTE: This function is barely implemented to just exactly perform the task
at hand. It needs obvious improvement before it can be production

+ First thing we do is sleep(1), it is a hack that the comments explain.
+ We pick out the protocol and TransferRequest form the passed in ThreadArg.
+ TODO: We should have different actions based upon the FTP protocol here,
  however, we don't and just assume FTP_CFTP in the control flow. This needs
  to be fixed for it to be production code.
+ The client is actually waiting to perform the transfer on this socket, so
  we'll set the timeout to be very large (8 hours) so the transfer can finish
  without timeouts happening.
+ Now, we iterate over the tasks in the TransferReqeust object. Each task is
  a job ad that we can instantiate a FileTransfer object with.
  For each job ad in the TransferRequest:
  + Create a FileTransfer object.
  + SimpleInit() it.
  + SetPeetVersion() whatever the TranferRequest stated.
  + DownloadFiles() and store all the files in the local file system wherever
    the job ad stated it should go.
 + When we're done will all of them it is end of message to the client.
+ Send back a classad to the client that everything is ok.
+ Exit the thread/process with EXIT_SUCCESS. The reaper cares about this.

REAPER TransferD::write_files_reaper

This handler gets called with a write_files_thread image/thread exits. The
main act of this handler is to inform the schedd that the TransferRequest
has been attempted (and hopefully completed!) and what the results of that was.

+ Ensure that the just exited transfer thread that I just caught the reaped
  status of is something that I actually asked to happen. This is a consistency
+ Construct a status ad which contains the capability of the TransferRequest
  and how the sub-thread/process succeeded or failed.
+ Send the status ad back to the schedd.
+ Destroy the TransferRequest. It is completely finished. If it failed, the
  idea is that the schedd will send another one with the same tasks to
  be performed.
+ If there are no more requests, then set the inactivity timer.

TRANSFERD_READ_FILES -> TransferD::read_files_handler()

[This handler is a very similar analogue to TransferD::write_files_handler().]

THREAD TransferD::read_files_thread

[This thread is a very similar analogue to TransferD::write_files_thread().]

The only _real_ difference is that UploadFiles() is called on the FileTransfer

REAPER TransferD::read_files_reaper

[This reaper is a very similar analogue to TransferD::write_files_reaper().]

TIMER TransferD::exit_due_to_inactivity_timer()

+ If m_inactivity_timer == 0, the timer is off and we return.
+ Otherwise, we subtract m_inactivity_timer from now, and see if it is
  greater then the timeout specified (on the command line, or by default).
+ If we timeout due to inactivity, we DC_Exit(TD_EXIT_TIMEOUT).from now,
  and see if it is greater then the timeout specified (on the command line,
  or by default).
+ If not, the timer fires again later...

TIMER TransferD::process_active_requests_timer() DEMO code!

The purpose of this timer is to see if there are any pending pending
TransferRequests and to process them. Currently this code path is demo
code in that it performs the first active TransferRequest it knows about,
and then exits the process.

ASIDE: TransferRequests have two main modes, and a third hacked mode.

The first mode is ACTIVE. This means the transferd itself initiates the
processing of a TransferRequest.

The second mode is PASSIVE. This means that a client initiated the
processing of a TransferRequest.

The hacked one is ACTIVE_SHADOW. This means that the transferd is going
to be initiating the handling of a TransferRequest, except it will do
so only be hard coded communication with a condor_shadow.

WARNING: This timer code is smushy in that it was never fully developed
and/or finished. Coupled with that, there is some demo code in here
that connects to a running shadow to pull files from wherever the shadow
states to get them (which in this case is itself).

+ Iterate over the known about TransferRequests
 + [TREQ_MODE_ACTIVE: No behavior implemented, TODO in source]
 + [TREQ_MODE_PASSIVE: Don't do anything, since a client is managing it]
 + [TREQ_MODE_ACTIVE_SHADOW: Demo code, talk to a shadow and process it]
  + Figure out how many jobads (AKA transfer ads) to process. However I only
    will be processing the first jobad in the TransferRequest!
  + Create a FileTransfer object. Init() it so it uses Daemoncore.
  + Set up a callback to TransferD::active_shadow_transfer_completed() which
    will get called by Daemoncore when the file transfer is completed.
  + Depending upon if the TransferD was invoked with 'upload' or 'download'
    on the command line, tell the FileTransfer object to UploadFiles() or
    DownloadFiles() respectively. This is a nasty hack since an active
    TransferRequest should know if it is uploading or downloading--the actual
    TransferRequest object can know this information, but who specified
    it wasn't completed.
+ If I found any active TransferRequests, return that I found some.

CALLBACK TransferD::active_shadow_transfer_completed()

This is the control continuation from

+ Ensure the file transfer suceeded, except otherwise.
+ DC_Exit();