glexec is a tool that provides a sudo-like capability in a grid environment. glexec takes an X.509 proxy and a command to run as inputs, and maps the proxy to a local identity (that is, a Unix UID), which it then uses to execute the command. See http://www.nikhef.nl/grid/lcaslcmaps/glexec/ for more information about glexec.
HTCondor supports using glexec on the execute side to provide a sort of alternative form of PrivSep (called GLEXEC_JOB mode). This is useful in a glide-in scenario for providing isolation between the glide-in and the jobs underneath it. Igor Sfiligoi uses HTCondor's glexec support in his glide-in setups for a number of HEP projects.
A previous implementation of glexec integration in HTCondor had the StartD using glexec to spawn the Starter (called GLEXEC_STARTER mode). This implementation had a number of drawbacks and is now deprecated. The rest of this document will focus on details of the GLEXEC_JOB implementation. Specifically, changes made to the Starter and the ProcD to support GLEXEC_JOB mode will be described.
In the Starter, there is a class called GlexecPrivSepHelper that encapsulates the use of glexec and behaves much in the same way as the CondorPrivSepHelper class does for PrivSep . Although GLEXEC_JOB mode works similarly to PrivSep mode, there are some important differences that stem from the fact that with glexec we can only execute things as the user and not take any other root-enabled actions (chowning the sandbox, for example).
GlexecPrivSepHelper is really just an interface to a set of helper shell scripts that do the heavy lifting. These helper scripts are:
condor_glexec_setup. This script is used in place of the
chown-the-sandbox-from-condor-to-the-user operation. Since we can't chown, the sandbox is actually copied to achieve a similar effect. After this script runs, there are two directories. dir_<pid> is the job sandbox and is owned by the user at this point. dir_<pid>.condor is owned by the Condor user and contains a copy of the job's proxy. This is important since calls to glexec require the proxy as input so we need a keep a condor-owned copy that the job can't mess with.
condor_glexec_run. This is used when it comes time to run the job. The main things this script does are set up the job's standard I/O and use the condor_glexec_job_wrapper program to pass the job its environment. (glexec provides no easy way to do this so we have to play some tricks. Check out glexec_job_wrapper.cpp for the gory details.)
- condor_glexec_cleanup. This is used in place of the PrivSep chown-the-sandbox-from-the-user-back-to-condor operation. It essentially just undoes what condor_glexec_setup did before executing the job.
The ProcD required one pretty simple change to support GLEXEC_JOB mode. After the Starter registers a process family, it additionally sends over the name of the file that contains the job's proxy. This indicates to the ProcD that it cannot use the kill system call directly on processes in the new family. Instead, it will use the given proxy filename as an argument to the last of the helper scripts:
- condor_glexec_kill. This simply uses glexec to run /bin/kill with the arguments given by the caller.
glexec used to be a real pain to set up for testing purposes though it is much easier now that it is a component of the VDT. In any case, there is a CentOS 3 VM that I made available in /scratch/glexec/VMware/mendota on nostos.cs.wisc.edu that has glexec installed in it. glexec is configured to map every proxy to the local account greg. The password for both the greg account and the root account is the same as the password for the CSL condor account (at least as of March 2009).