Experimental Condor Annex

description

condor_annex is (will be) a tool for expanding an HTCondor pool into the cloud. (It presently only supports AWS.)

Annex is both a verb -- the condor_annex tool doesn't scavenge cycles, but acquires them -- and a noun, referring to the collection of slots in the cloud.

motivation

A user of your HTCondor pool, Dr. Needs-Moore, needs more cycles in less time than your pool can provide. (This can happen for any number of reasons, but deadlines are a good and sufficient one.) However, Dr. Needs-Moore has money to spend on solving this problem. Solution: trade Dr. Needs-Moore's money for cycles. Enter the cloud.

constraints

We argue that the person whose money would be spent is the best one to decide when and how much to spend, so condor_annex must be usable by Dr. Needs-Moore. However, the good doctor isn't a systems administrator, so condor_annex needs to be simple and easy to use. At the same time, it should be hard to spend "too much" money. In particular, idle instances should shut themselves off, and there should be a limit to how long the instances will run in any case (a lease). (Naturally, it must be possible to extend the lease as necessary.)

Our prototype reduces the end-user's responsibilities to the following command:

condor_annex --project-id 'TheNeeds-MooreLab' --expiry '2015-12-18 23:59' --instances 16
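
For a script that has to issue this command itself, here is a minimal sketch of assembling it from a deadline. The function name annex_command is hypothetical, and the sketch assumes the expiry is written in the same 'YYYY-MM-DD HH:MM' form as the example above. How condor_annex interprets the time zone is not covered here.

```python
from datetime import datetime
import shlex


def annex_command(project_id, deadline, instances):
    """Build the argument list for the condor_annex invocation shown above.

    Only the three flags that appear in the example are used.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    # Assumption: the expiry format matches the example, 'YYYY-MM-DD HH:MM'.
    expiry = deadline.strftime("%Y-%m-%d %H:%M")
    return [
        "condor_annex",
        "--project-id", project_id,
        "--expiry", expiry,
        "--instances", str(instances),
    ]


args = annex_command("TheNeeds-MooreLab", datetime(2015, 12, 18, 23, 59), 16)
# Print a shell-quoted form suitable for pasting into a terminal.
print(" ".join(shlex.quote(a) for a in args))
```

Keeping the arguments as a list (rather than one string) means they can be handed to subprocess.run without any further quoting.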

proposed first-time user set-up

  1. Dr. Needs-Moore contacts the pool administrator wanting to run more jobs (by a deadline).
  2. The administrator determines that Dr. Needs-Moore needs more cycles than their pool can provide (in that time).
  3. The administrator creates an AWS user for the doctor (using the AWS web console), downloads the doctor's credentials, massages them a bit, and sends the resulting file to the doctor. The administrator also sends along a project ID.
  4. The doctor copies the file to the right place and, if our packaging hasn't already, has a grad student install the AWS command-line tool.

installation

....

Quick Start

This section explains how to set up condor_annex for single-user testing. It assumes a fair degree of familiarity with AWS and HTCondor.

AWS

Start by creating a new AWS account, or logging into your existing account. You may prefer to do the former, since condor_annex is still experimental. For the same reason, condor_annex presently assumes that it has almost unlimited privileges. (In the future, there will be a high-privilege one-time set-up, and the end user will run condor_annex with a low-privilege AWS user or role.) Rather than using the account's own credentials, you may want to create an IAM user with "Admin" privileges.

You may have noticed that the command line above omits a lot of information. condor_annex does all of its AWS work through the 'aws' command-line tool, which has a configuration file containing the access key and secret key for a given AWS account (or user or role). If, as above, the command line is incomplete, condor_annex will look up the missing information in the account the 'aws' tool is configured for. This allows you (the annex administrator) to update and maintain the defaults without having to distribute files to your end users.

The four defaults you can set in the account are listed below. All are region-specific. Right now, condor_annex works best with the 'us-east-1' region, although the 'us-west-2' region is also supported. (Right now, you must manually adjust the 'project ID' command-line argument to conform to the S3 bucket name restrictions, which are much tighter in us-west-2 for some reason. You should also change the hard-coded default region in the condor_annex script.)

  1. An SSH keypair. condor_annex will use the one named "HTCondorAnnex". (Found on the "EC2" page.)
  2. A VPC. condor_annex will use the one whose "name" tag is "HTCondorAnnex". (Found on the "Networking" page.) Turn on support in the VPC for DNS hostnames.
  3. VPC subnets. condor_annex will use the ones whose "name" tag is "HTCondorAnnex". (Also found on the "Networking" page.)
  4. Launch configurations. condor_annex will use the ones named "HTCondorAnnex-1" through "HTCondorAnnex-8". (Found on the "EC2" page. You may need to create an AutoScaling Group in order to create a Launch Configuration; after doing so, you can cancel out of creating the AutoScaling Group.)
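
The S3 bucket name restrictions mentioned before the list can be checked ahead of time. The sketch below tests a candidate project ID against the core DNS-style bucket naming rules as we understand them: 3 to 63 characters, lowercase letters, digits, periods and hyphens only, starting and ending with a letter or digit, no adjacent periods, and not shaped like an IP address. It is not an exhaustive statement of the S3 rules, the function name is hypothetical, and exactly how condor_annex derives a bucket name from the project ID is an assumption.

```python
import re

# Lowercase letters, digits, '.', '-'; first and last must be a letter or digit.
_ALLOWED = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")
_IPV4_LIKE = re.compile(r"\d+\.\d+\.\d+\.\d+")


def bucket_name_problems(name):
    """Return a list of reasons 'name' breaks the core S3 naming rules."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be between 3 and 63 characters")
    if name != name.lower():
        problems.append("uppercase letters are not allowed")
    # Check the character set on the lowercased name so that case is
    # reported once, by the check above, rather than twice.
    if not _ALLOWED.fullmatch(name.lower()):
        problems.append("use only letters, digits, '.' and '-', "
                        "starting and ending with a letter or digit")
    if ".." in name:
        problems.append("adjacent periods are not allowed")
    if _IPV4_LIKE.fullmatch(name):
        problems.append("must not be formatted like an IP address")
    return problems


for candidate in ("TheNeeds-MooreLab", "theneeds-moorelab"):
    print(candidate, bucket_name_problems(candidate) or "ok")
```

Under these rules the project ID from the earlier example would need to be lowercased before being used outside us-east-1.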

To speed your quick start, we provide the following Amazon Linux AMIs with HTCondor 8.4.2 pre-installed:

  • us-east-1 - ami-91e1a3fb
  • us-west-1 - ami-7f06731f
  • us-west-2 - ami-ac8890cd

The launch configurations need only the instance type, the AMI ID, and the spot price, if any. Additional specifications are presently ignored.
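
If the web-console detour through an AutoScaling Group is inconvenient, the same three fields can also be supplied from the command line. This sketch only builds the argument list for 'aws autoscaling create-launch-configuration'; the function name is hypothetical, and the instance type and spot price used in the example are placeholders, not recommendations. We have not verified that condor_annex treats configurations created this way identically to ones made in the console.

```python
def launch_config_command(index, image_id, instance_type,
                          spot_price=None, region="us-east-1"):
    """Build an 'aws' CLI call that creates the launch configuration
    HTCondorAnnex-<index> with just the fields condor_annex reads."""
    if not 1 <= index <= 8:
        raise ValueError("condor_annex uses HTCondorAnnex-1 through HTCondorAnnex-8")
    command = [
        "aws", "--region", region,
        "autoscaling", "create-launch-configuration",
        "--launch-configuration-name", f"HTCondorAnnex-{index}",
        "--image-id", image_id,
        "--instance-type", instance_type,
    ]
    # The spot price is optional; leaving it out requests on-demand instances.
    if spot_price is not None:
        command += ["--spot-price", str(spot_price)]
    return command


# Placeholder values: 'm4.large' and '0.05' are illustrative only.
print(launch_config_command(1, "ami-91e1a3fb", "m4.large", spot_price="0.05"))
```

The resulting list can be passed to subprocess.run(command, check=True) once the 'aws' tool is installed and configured.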

Once you've used the web console to create all of these objects, you'll need to install and configure the 'aws' command-line tool. You don't need to set a default region, but condor_annex defaults to 'us-east-1'. Something along the lines of "yum install python-pip; pip install awscli" will usually get the tool installed. To configure, run 'aws configure'.
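
After running 'aws configure', it can be worth confirming that the keys actually landed where the tool looks for them. By default the AWS command-line tool keeps them in ~/.aws/credentials, an INI-style file with a [default] section. The sketch below checks the text of such a file; the function name is hypothetical, and it deliberately ignores other places the tool can obtain credentials (environment variables, named profiles, instance roles).

```python
import configparser


def credentials_problems(credentials_text, profile="default"):
    """Return a list of problems found in the text of an AWS credentials file."""
    # Disable interpolation so a '%' in a value is never treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return [f"no [{profile}] section"]
    problems = []
    for key in ("aws_access_key_id", "aws_secret_access_key"):
        if not parser.get(profile, key, fallback="").strip():
            problems.append(f"{key} is missing or empty in [{profile}]")
    return problems


# Placeholder keys for illustration; never commit real ones.
sample = """
[default]
aws_access_key_id = AKIAEXAMPLEKEY
aws_secret_access_key = example-secret
"""
print(credentials_problems(sample) or "credentials look complete")
```

To check a real installation, read the file with open(os.path.expanduser("~/.aws/credentials")).read() and pass the result in.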

HTCondor

condor_annex assumes that it is run from a shell with a live HTCondor installation. This means that the HTCondor binaries are in the PATH and that the environment variable CONDOR_CONFIG points to an HTCondor configuration file. That configuration file should include lines like the ones below, since condor_annex assumes that pool password security is used (so that it knows what security token(s) to propagate). The easiest way to accomplish this may be to create a personal condor; this also has the virtue of isolating your real pool from condor_annex while you test it. See

https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=CreatingPersonalHtcondor

for more information about installing a personal condor.

SEC_CLIENT_AUTHENTICATION_METHODS = FS, PASSWORD
SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_AUTHENTICATION_METHODS = FS, PASSWORD
SEC_PASSWORD_FILE = $(LOCAL_DIR)/password_file
ALLOW_WRITE = condor_pool@*/* $(FULL_HOSTNAME) $(IP_ADDRESS)
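
The assumptions in this section (binaries in the PATH, CONDOR_CONFIG set, and the settings above present) can be sanity-checked before running condor_annex. What follows is a rough sketch with hypothetical function names. It scans configuration text naively and does not follow includes or expand macros, so 'condor_config_val' remains the authoritative way to see what HTCondor actually resolves.

```python
import os
import shutil

# The settings shown in the example configuration above.
EXPECTED_SETTINGS = (
    "SEC_CLIENT_AUTHENTICATION_METHODS",
    "SEC_DEFAULT_AUTHENTICATION",
    "SEC_DEFAULT_AUTHENTICATION_METHODS",
    "SEC_PASSWORD_FILE",
    "ALLOW_WRITE",
)


def settings_in(config_text):
    """Naively collect the names assigned in HTCondor configuration text."""
    names = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Names are normalized to upper case before comparison.
        names.add(line.split("=", 1)[0].strip().upper())
    return names


def missing_settings(config_text):
    """List the expected settings that the text never assigns."""
    present = settings_in(config_text)
    return [name for name in EXPECTED_SETTINGS if name not in present]


def environment_problems(environ, which=shutil.which):
    """Check the shell environment that condor_annex is said to assume."""
    problems = []
    if not environ.get("CONDOR_CONFIG"):
        problems.append("CONDOR_CONFIG is not set")
    if which("condor_config_val") is None:
        problems.append("HTCondor binaries are not in the PATH")
    return problems


print(environment_problems(os.environ) or "environment looks usable")
```

Passing the contents of the file named by CONDOR_CONFIG to missing_settings gives a quick first answer. An empty list only means the names appear in that one file, not that the values are correct.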

The manual has information on how to generate 'password_file'. See also

https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToEnablePoolPassword

for more information about enabling pool password.