The condor_annex tool rents computational resources from Amazon's cloud service and adds those resources to an HTCondor pool for your jobs to use. These instructions document how to use condor_annex for CHTC jobs. Some restrictions apply:
- You will need a log-in on annex-cm.chtc.wisc.edu. (Ask your research computing facilitator about this.)
- The jobs you want to run on AWS must have MayUseAWS set.
- The jobs you want to run on AWS must have WantFlocking set. This means your jobs will flock! If you don't know what that means, don't use condor_annex until you've talked to your research computing facilitator.
- The jobs you want to run on AWS must have requirements which match the resources acquired by condor_annex. By default, those resources will run an EL6-like operating system, but they won't have OpSysMajorVer set.
- Intentionally or otherwise, other users of condor_annex on the CHTC may run jobs they don't own, including yours, on their resources. We're working on a solution to this, but if that possibility worries you, don't use condor_annex.
Working with these restrictions will be covered in the following instructions.
In these instructions, we've included sample output after the commands. Lines you should execute start with the dollar sign ($) character; do not include the dollar sign when copying the line.
We are developing a similar way for CHTC jobs to run on the Cooley cluster at Argonne National Lab: UsingCooleyAnnexOnChtc
1 Prepare to Add Resources for Your Jobs
- 1.1 Grant Access to Your AWS Account
- 1.2 Lay the Groundwork in AWS
- 1.3 Check the Groundwork
2 Submit a Test Job
3 Add Resources for Your Jobs
4 Run Jobs on Your Resources
- 4.1 Change a Submit File
- 4.2 Modify an Idle Job
5 Cleaning Up (Optional)
These instructions assume this is the first time you're using condor_annex on CHTC. You'll want to have two terminal windows open: one for running condor_annex commands (logged into annex-cm.chtc.wisc.edu) and another for submitting jobs (currently, only one CHTC submit node is supported). If you've used condor_annex on CHTC before, skip ahead to section 1.3 ("Check the Groundwork").
1 Prepare to Add Resources for Your Jobs
Before you can add resources for your jobs, you must (a) give condor_annex access to your AWS account (so it can rent resources on your behalf) and (b) lay some groundwork for condor_annex at AWS. We then recommend that you check that the groundwork was laid properly. Don't worry: the second and third steps have been automated.
1.1 Grant Access to Your AWS Account
condor_annex needs an account to use AWS. You can grant condor_annex access to your account by acquiring a pair of security "keys" that function like a user name and password. Like a user name, the "access key" is (more or less) public information; the corresponding "secret key" is like a password and must be kept a secret. To help keep both halves secret, you never tell condor_annex the keys themselves; instead, you put each key in its own protected file.
To create those two files, execute the following commands on annex-cm.chtc.wisc.edu:
$ mkdir -p ~/.condor
$ cd ~/.condor
$ touch publicKeyFile privateKeyFile
$ chmod 600 publicKeyFile privateKeyFile
The last command ensures that no user other than you can read or write to those files. (Like any other file on CHTC machines, these files will be readable by the CHTC administrative staff. If that bothers you, contact us for alternatives.)
To fill the files you just created, go to the IAM console; log in if you need to. The following instructions assume you are logged in as a user with the privilege to create new users. (The 'root' user for any account has this privilege; other accounts may as well.)
- Click the "Add User" button.
- Enter a name in the "User name" box; "annex-user" is a fine choice.
- Click the check box labelled "Programmatic access".
- Click the button labelled "Next: Permissions".
- Select "Attach existing policies directly".
- Type "AdministratorAccess" in the box labelled "Filter".
- Click the check box on the single line that will appear below (labelled "AdministratorAccess").
- Click the "Next: Review" button (you may need to scroll down).
- Click the "Create user" button.
- From the line labelled "annex-user", copy the value in the column labelled "Access key ID" to publicKeyFile.
- On the line labelled "annex-user", click the "Show" link in the column labelled "Secret access key"; copy the revealed value to privateKeyFile.
- Hit the "Close" button.
You have now granted condor_annex access to your AWS account.
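Before moving on, it can be worth confirming that both key files are in place. The following is a minimal sketch and not part of condor_annex; the function name check_key_file is made up for illustration, and it assumes the files were created in ~/.condor as described above.

```shell
# Hypothetical helper (not part of condor_annex): succeed only if the given
# file exists, is non-empty, and has mode 600 (read/write for owner only).
check_key_file() (
    f=$1
    if [ ! -s "$f" ]; then
        echo "$f is missing or empty" >&2
        exit 1
    fi
    # find prints the path only when the permission bits are exactly 600.
    if [ -z "$(find "$f" -maxdepth 0 -perm 600)" ]; then
        echo "$f should have mode 600 (run: chmod 600 $f)" >&2
        exit 1
    fi
    echo "$f looks OK"
)
```

For example, after defining the function you might run check_key_file ~/.condor/publicKeyFile and then check_key_file ~/.condor/privateKeyFile on annex-cm.chtc.wisc.edu.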
1.2 Lay the Groundwork in AWS
It takes a few minutes for condor_annex to lay the groundwork it needs at AWS. Since this groundwork doesn't cost you anything to keep around, you can just create it once and forget about it. Run the following command on annex-cm.chtc.wisc.edu; you should still have a terminal window logged in there from the previous step.
$ condor_annex -setup
Creating configuration bucket (this takes less than a minute)....... complete.
Creating Lambda functions (this takes about a minute)........ complete.
Creating instance profile (this takes about two minutes)................... complete.
Creating security group (this takes less than a minute)..... complete.
Setup successful.
1.3 Check the Groundwork
You can verify at this point (or any later time) that the groundwork was laid successfully by running the following command (also on annex-cm.chtc.wisc.edu):
$ condor_annex -check-setup
Checking for configuration bucket... OK.
Checking for Lambda functions... OK.
Checking for instance profile... OK.
Checking for security group... OK.
If you don't see four "OK"s, return to step 1.1 and try again. If you've done that once already, contact your research computing facilitator for assistance.
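If you would rather not count the "OK"s by eye, the check can be scripted. This is only a sketch: the function name setup_is_ok is hypothetical, and it assumes the output format matches the sample shown in these instructions (four lines, each ending in "OK.").

```shell
# Hypothetical helper: read the output of `condor_annex -check-setup` on
# standard input and succeed only if exactly four lines end in "OK.".
setup_is_ok() (
    # grep -c exits non-zero when it counts zero matches, hence "|| true".
    ok_count=$(grep -c 'OK\.$' || true)
    [ "$ok_count" -eq 4 ]
)
```

A possible use is condor_annex -check-setup | setup_is_ok && echo "groundwork is in place".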
2 Submit a Test Job
It sounds a little strange, but if you submit a test job before you add resources for your jobs, you won't have to wait as long for it to start, which will save you both time and money. Use a second terminal window to log into the submit node and create the following submit file:
executable = /bin/sleep
transfer_executable = false
should_transfer_files = true
universe = vanilla
arguments = 600
log = sleep.log

# You MUST include this when submitting from CHTC to let the annex see the job.
+WantFlocking = TRUE

# This is required, by default, to run a job in an annex.
+MayUseAWS = TRUE

# The first clause requires this job to run on EC2; that's what makes it
# good as a test. The second clause prevents CHTC from setting a
# requirement for OpSysMajorVer, allowing this job to run on any.
requirements = regexp( ".*\.ec2\.internal", Machine ) && (TRUE || TARGET.OpSysMajorVer)

queue 1
Submit this file to the queue; it won't run until after you've completed the next step.
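As a sketch of the submission step: the file name sleep.sub and the function name cluster_id are assumptions made for illustration, and the helper relies on condor_submit reporting "N job(s) submitted to cluster M." when it succeeds.

```shell
# Hypothetical helper: pull the cluster number out of condor_submit's
# "N job(s) submitted to cluster M." message, read from standard input.
cluster_id() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}
```

With the submit file saved as sleep.sub, something like id=$(condor_submit sleep.sub | cluster_id) followed by condor_q "$id" would show the test job sitting idle until the annex exists.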
3 Add Resources for Your Jobs
Entering the following on annex-cm.chtc.wisc.edu will add resources for your jobs to the pool. We call the set of resources you added an "annex". You have to supply a name for each annex you create; the example below uses 'MyFirstAnnex'. When you run condor_annex, it will print out what it's going to do, and then ask you if that's OK. You must type 'yes' (and hit enter) at the prompt to start an annex; if you do not, condor_annex will print out instructions about how to change whatever you may not like about what it said it was going to do, and then exit. The following command adds one resource (an "instance") for one hour; you should increase the duration if the job you want to run takes longer. Don't increase the number of resources if you haven't tested your job with condor_annex yet; you can easily add resources after you've verified that everything works.
$ condor_annex -count 1 -annex-name MyFirstAnnex -idle 1 -duration 1
Will request 1 m4.large on-demand instance for 1 hours. Each instance will terminate after being idle for 1 hours.
Is that OK? (Type 'yes' or 'no'): yes
Starting annex...
Annex started. Its identity with the cloud provider is 'MyFirstAnnex_f2923fd1-3cad-47f3-8e19-fff9988ddacf'. It will take about three minutes for the new machines to join the pool.
You won't need to know the annex's identity with the cloud provider unless something goes wrong.
Before starting the annex, condor_annex will check to make sure that the instances will be able to contact CHTC. Contact your machine's administrator if condor_annex reports a problem with this step.
Otherwise, wait a few minutes and run the following to make sure your annex has started up and joined the pool:
$ condor_annex status -annex MyFirstAnnex
Name                           OpSys Arch   State     Activity LoadAv
firstname.lastname@example.org LINUX X86_64 Unclaimed Idle     0.000
email@example.com              LINUX X86_64 Unclaimed Idle     0.000

             Machines Owner Claimed Unclaimed Matched Preempting Drain
X86_64/LINUX        2     0       0         2       0          0     0
       Total        2     0       0         2       0          0     0
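Rather than re-running the status command by hand, you could poll for it. The sketch below is not part of condor_annex: retry_until and annex_has_machines are hypothetical names, and the check assumes the status output lists X86_64 machines as in the sample above.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds.
# Usage: retry_until <max_tries> <seconds_between_tries> <command> [args...]
retry_until() (
    tries=$1
    pause=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        if "$@"; then
            exit 0
        fi
        # Only pause when another attempt is still to come.
        if [ "$i" -lt "$tries" ]; then
            sleep "$pause"
        fi
        i=$((i + 1))
    done
    exit 1
)

# Succeeds once the status output for the named annex mentions X86_64.
annex_has_machines() {
    condor_annex status -annex "$1" 2>/dev/null | grep -q 'X86_64'
}
```

For instance, retry_until 10 30 annex_has_machines MyFirstAnnex would check every 30 seconds for up to about five minutes.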
An annex (by default) will only run jobs which (a) you submitted and (b) have MayUseAWS set to true. You can confirm this by running the following command:
$ condor_annex -annex MyFirstAnnex status -af:r START
(MayUseAWS == true) && stringListMember(Owner,"tlmiller")
(MayUseAWS == true) && stringListMember(Owner,"tlmiller")
The "tlmiller" above should be your username.
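The same confirmation can be scripted. This is a rough sketch under stated assumptions: start_names_user is a hypothetical name, and it assumes your username appears on its own in a quoted string on every line, as in the sample output.

```shell
# Hypothetical helper: read START expressions on standard input and succeed
# only if there is at least one line and every line contains "<user>"
# (including the double quotes).
start_names_user() (
    user=$1
    input=$(cat)
    if [ -z "$input" ]; then
        exit 1
    fi
    # grep -v selects lines that do NOT mention the user; none should exist.
    if printf '%s\n' "$input" | grep -v -F -q "\"$user\""; then
        exit 1
    fi
    exit 0
)
```

One way to use it: condor_annex -annex MyFirstAnnex status -af:r START | start_names_user "$USER".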
There are additional instructions for general annex use. For now, we'll move on to actually running on your new resource.
4 Run Jobs on Your Resources
It shouldn't take more than a few minutes from when your new resources join the pool for your test job to start running. The test job sleeps for ten minutes to make it easier to catch in the running state; you don't need to wait for the test job to finish before testing one of your own jobs.
You can make use of the annex resources for your own jobs in two ways: by modifying an existing submit file or by editing jobs already in the queue.
4.1 Change a Submit File
Be sure to make a backup copy of your submit file before you start changing it. ;)
You will need to add the following lines to the submit file (before the queue statement):
+MayUseAWS = TRUE
+WantFlocking = TRUE
A reminder: this means these jobs will flock! For now, you shouldn't use condor_annex if you don't want your jobs flocking.
You will also need to modify the requirements line. The modification has two goals: first, to prevent your jobs from running anywhere other than CHTC or your annex; and second, to allow your job to run on either CHTC or your annex. To accomplish the first goal, add the following clause to the requirements line, changing "MyFirstAnnex" to the name of your annex, if different:
(AnnexName == "MyFirstAnnex" || TARGET.PoolName == "CHTC")
If the phrase "add the following clause" didn't make a lot of sense to you, the new requirements line should look like the following, except replacing the text <old-requirements> with everything to the right of the equals sign in the original requirements line.
requirements = (<old-requirements>) && (AnnexName =?= "MyFirstAnnex" || TARGET.PoolName == "CHTC")
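To avoid mistakes when wrapping a long expression by hand, the new line can be generated. This is a minimal sketch: annex_requirements is a hypothetical helper name, and it simply fills in the template given above.

```shell
# Hypothetical helper: print a new requirements line that wraps the old
# expression in parentheses and appends the annex-or-CHTC clause.
# Usage: annex_requirements '<old expression>' [annex name]
annex_requirements() (
    old=$1
    annex=${2:-MyFirstAnnex}
    printf 'requirements = (%s) && (AnnexName =?= "%s" || TARGET.PoolName == "CHTC")\n' "$old" "$annex"
)
```

For example, annex_requirements 'Memory >= 1024' MyFirstAnnex prints a line you can paste into the submit file in place of the original requirements line.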
As mentioned above, the default operating system for resources acquired by condor_annex does not advertise OpSysMajorVer, but is "like" EL6. If the old requirements do not mention OpSysMajorVer, add the following clause to the requirements line:
(TRUE || OpSysMajorVer)
(This tautology will prevent CHTC's submit nodes from adding a requirement that OpSysMajorVer be 6 or 7.)
If the old requirements do mention OpSysMajorVer, the old requirements most likely include a clause of the following form:
OpSysMajorVer == 6
Change this clause to the following:
OpSysMajorVer =?= 6
If you're not sure how the old requirements use OpSysMajorVer, ask a research computing facilitator for help.
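The change from == to =?= is mechanical, so it can be applied with sed. This sketch only covers the simple form shown above (OpSysMajorVer compared against a number); relax_opsysmajorver is a hypothetical name, and anything more complicated is better reviewed by hand.

```shell
# Hypothetical helper: rewrite "OpSysMajorVer == N" as "OpSysMajorVer =?= N"
# in an expression read from standard input; everything else is unchanged.
relax_opsysmajorver() {
    sed 's/\(OpSysMajorVer\) *== *\([0-9][0-9]*\)/\1 =?= \2/g'
}
```

For example, echo 'OpSysMajorVer == 6' | relax_opsysmajorver prints the rewritten clause.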
4.2 Modify an Idle Job
If you want a job that's already in the queue to run on your AWS resources, you can use the condor_qedit command to make changes much like the ones you would make if you were changing a submit file. However, to make sure your job doesn't flock somewhere you don't want it to go, you'll need to modify the requirements before you set WantFlocking.
For these examples, we'll assume your job's ID is 100234.5.
You can run
condor_q 100234.5 -af:r requirements
to examine a queued job's requirements. When making a change, copy-and-paste the whole thing and sandwich it between single quotes ('), as in the example below, which assumes your annex is named MyFirstAnnex:
$ condor_q 100234.5 -af:r requirements
(TARGET.PoolName == "CHTC") && ((Target.OpSysMajorVer == 6) || (Target.OpSysMajorVer == 7)) && (OpSysName =!= "Debian") && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.HasFileTransfer)
$ condor_qedit 100234.5 requirements '(TARGET.AnnexName == "MyFirstAnnex" || TARGET.PoolName == "CHTC") && ((Target.OpSysMajorVer =?= 6) || (Target.OpSysMajorVer =?= 7)) && (OpSysName =!= "Debian") && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.HasFileTransfer)'
Execute the following commands to set MayUseAWS and WantFlocking as required. (Note that when editing a job in the queue, these attribute names are not preceded by a plus ('+') sign.) A reminder: this means these jobs will flock! For now, you shouldn't use condor_annex if you don't want your jobs flocking.
$ condor_qedit 100234.5 MayUseAWS TRUE
Set attribute "MayUseAWS" for 1 matching jobs.
$ condor_qedit 100234.5 WantFlocking TRUE
Set attribute "WantFlocking" for 1 matching jobs.
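Because the order of these edits matters (requirements first, WantFlocking last), it may help to print the full sequence and review it before running anything. The sketch below is a dry run only; annex_qedit_plan is a hypothetical name, and it assumes the requirements expression contains no single quotes.

```shell
# Hypothetical helper: print, in the order described above, the condor_qedit
# commands for one job. Nothing is executed; review the output, then run it.
# Usage: annex_qedit_plan <job id> '<new requirements expression>'
annex_qedit_plan() (
    job=$1
    reqs=$2
    # Requirements first, so the job cannot flock anywhere unintended.
    printf "condor_qedit %s requirements '%s'\n" "$job" "$reqs"
    printf 'condor_qedit %s MayUseAWS TRUE\n' "$job"
    printf 'condor_qedit %s WantFlocking TRUE\n' "$job"
)
```

Calling annex_qedit_plan 100234.5 with the edited requirements expression as the second argument prints three lines that can be copied into the terminal one at a time.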
5 Cleaning Up (Optional)
The resources condor_annex rents for you from Amazon will, as we mentioned before, shut themselves down after the duration, or if they're idle for longer than the time-out. If your jobs all finish early, you can also shut down all the resources you rented immediately, from annex-cm.chtc.wisc.edu.