Igor Stress Test

We have found it useful to stress-test a release candidate by giving it to Igor Sfiligoi to run in glideinWMS. This exercises a lot of HTCondor features that are important to OSG-type users: glidein, HTCondor-G, CE schedd, CCB. Igor suggested that it would be a good idea if we could run this test in-house, and we agree, so we have set up the machinery to do so. This document describes roughly how we installed it.

High-level View

  • ghost-pool - runs light-weight jobs on cluster nodes. The setup is in /unsup/condor/ghost-pool. Some nodes in CHTC also join the ghost pool. These are configured in the CHTC cfengine.

  • CE: vdt-itb.cs.wisc.edu - an OSG CE that is part of the ITB (integration test bed). This node will be referred to as the CE. The schedd on this node is part of the ghost pool. The jobmanager is vdt-itb.cs.wisc.edu/jobmanager-condornfslite.

  • factory: c157.chtc.wisc.edu - the glideinWMS factory, installed in /data1/itb-stress. It submits glideins to the CE. This node is part of the GLOW pool (despite its domain name). It normally runs HTCondor jobs. Turn off HTCondor when doing a large-scale test.

  • submit: c158.chtc.wisc.edu - the glideinWMS VO frontend, user job submission node, and collector tree for the glidein pool, all installed in /data1/itb-stress. This node normally runs HTCondor jobs. Turn off HTCondor when doing a large-scale test.

glidein factory setup

Go to c157.chtc.wisc.edu. Install the factory in /data1/itb-stress. Where an unprivileged user account is needed, we are using zmiller.

Documentation for the glideinWMS factory may be found here: http://www.uscms.org/SoftwareComputing/Grid/WMS/glideinWMS/doc.v2/install/factory_install.html

cd /data1/itb-stress
cvs -d :pserver:anonymous@cdcvs.fnal.gov:/cvs/cd_read_only co -r snapshot_091214 glideinWMS

perl-Time-HiRes appears to be already installed as part of the main perl install.

httpd was already installed, so we just enabled and started it:
/sbin/chkconfig --level 3 httpd on
/etc/init.d/httpd start

wget http://atlas.bu.edu/~youssef/pacman/sample_cache/tarballs/pacman-3.28.tar.gz
tar xvzf pacman-3.28.tar.gz
cd pacman-3.28
. ./setup.sh
cd ..

mkdir osg_1_2_4
cd osg_1_2_4/

export VDTSETUP_CONDOR_CONFIG=/data1/itb-stress/personal-condor/etc/condor_config
export VDTSETUP_CONDOR_BIN=/data1/itb-stress/personal-condor/bin

pacman -get http://software.grid.iu.edu/osg-1.2:client
. ./setup.sh

ln -s /etc/grid-security/certificates globus/TRUSTED_CA

cd ..

yum install rrdtool-python


The M2Crypto python module appears to be already installed:
  m2crypto-0.16-6.el5.3.x86_64
This is older than the version Igor requires (0.17), but I am trying it to see if it works.
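
To double-check what is installed (the python one-liner assumes the module exposes M2Crypto.version, which releases of this vintage do):

rpm -q m2crypto
python -c 'import M2Crypto; print M2Crypto.version'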

Download javascriptrrd-0.4.2.zip and unzip it.

Download flot-0.5.tar.gz and untar it.
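
For example, from /data1/itb-stress (download URLs omitted here; get them from the javascriptRRD and flot project pages):

unzip javascriptrrd-0.4.2.zip
tar xvzf flot-0.5.tar.gz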

I made a service cert for glideinwms and installed it in the certs directory, using the following command:
cert-gridadmin -email dan@hep.wisc.edu -affiliation osg -vo glow -host c157.chtc.wisc.edu -service glideinwms

A proxy for the factory may be made with this command:
grid-proxy-init -cert certs/c157.chtc.wisc.edu_glideinwms_cert.pem -key certs/c157.chtc.wisc.edu_glideinwms_key.pem -hours 100 -out certs/c157.chtc.wisc.edu_proxy.pem
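
To verify the proxy was created and see its remaining lifetime (grid-proxy-info takes -file to inspect a specific proxy file):

grid-proxy-info -file certs/c157.chtc.wisc.edu_proxy.pem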


cd glideinWMS/install
chown zmiller /data1/itb-stress
su zmiller
./glideinWMS_install


choose [4] pool Collector
Where do you have the HTCondor tarball? /afs/cs.wisc.edu/p/condor/public/binaries/v7.4/7.4.1/condor-7.4.1-linux-x86_64-rhel5-dynamic.tar.gz

Where do you want to install it?: [/home/zmiller/glidecondor] /data1/itb-stress/glidecondor
Where are the trusted CAs installed?: [/data1/itb-stress/osg_1_2_4/globus/TRUSTED_CA] /etc/grid-security/certificates


Will you be using a proxy or a cert? (proxy/cert) cert
Where is your certificate located?: /data1/itb-stress/certs/c157.chtc.wisc.edu_glideinwms_cert.pem
Where is your certificate key located?: /data1/itb-stress/certs/c157.chtc.wisc.edu_glideinwms_key.pem


What name would you like to use for this pool?: [My pool] ITB-Stress
How many secondary schedds do you want?: [9]

In this version of glideinWMS, the condor_config.local needs to be edited
to replace HOSTALLOW with ALLOW.
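
A quick way to do that, assuming the usual tarball layout of etc/condor_config under the glidecondor install (adjust CONDOR_CONFIG if your layout differs):

export CONDOR_CONFIG=/data1/itb-stress/glidecondor/etc/condor_config
# rewrite the HOSTALLOW_* knobs to ALLOW_*, keeping a backup copy
sed -i.bak 's/HOSTALLOW/ALLOW/g' "$(condor_config_val LOCAL_CONFIG_FILE)"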


./glideinWMS_install
[2] Glidein Factory

Where is your proxy located?: /data1/itb-stress/certs/c157.chtc.wisc.edu_proxy.pem
Where will you host your config and log files?: [/home/zmiller/glideinsubmit] /data1/itb-stress/glideinsubmit

Give a name to this Glidein Factory?: [mySites-c157] ITB-Stress

Do you want to use CCB (requires HTCondor 7.3.0 or better)?: (y/n) y
Do you want to use gLExec?: (y/n) n
Force VO frontend to provide its own proxy?: (y/n) [y]
Do you want to fetch entries from RESS?: (y/n) [n]
Do you want to fetch entries from BDII?: (y/n) [n]

Entry name (leave empty when finished): ITB
Gatekeeper for 'ITB': vdt-itb.cs.wisc.edu/jobmanager-condornfslite
RSL for 'ITB':
Work dir for 'ITB':
Site name for 'ITB': [ITB]

Should glideins use the more efficient Match authentication (works for HTCondor v7.1.3 and later)?: (y/n) y
Do you want to create the glidein (as opposed to just the config file)?: (y/n) [n]

/data1/itb-stress/glideinWMS/creation/create_glidein /data1/itb-stress/glideinsubmit/glidein_v1_0.cfg/glideinWMS.xml

Submit files can be found in /data1/itb-stress/glideinsubmit/glidein_v1_0
Support files are in /var/www/html/glidefactory/stage/glidein_v1_0
Monitoring files are in /var/www/html/glidefactory/monitor/glidein_v1_0
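
Since glideins fetch their support files over HTTP, it is worth confirming that httpd serves the stage area (the URL assumes the default /var/www/html DocumentRoot; if directory listings are disabled you may get a 403 even though individual files are served):

curl -I http://c157.chtc.wisc.edu/glidefactory/stage/glidein_v1_0/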

cd /data1/itb-stress
# make setup.sh set X509_CERT_DIR and X509_USER_PROXY
. ./setup.sh
glideinsubmit/glidein_v1_0/factory_startup start
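
Once started, the factory should advertise itself to the WMS collector. A rough check (grepping for the glidefactory ad type used by glideinWMS):

condor_status -any -pool c157.chtc.wisc.edu | grep -i glidefactory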


I added the following entries to glidecondor/certs/condor_mapfile:
GSI "/DC=org/DC=doegrids/OU=Services/CN=glideinwms/c158.chtc.wisc.edu" vofrontend
FS zmiller gfactory

The FS mapping of zmiller to gfactory appears to be necessary to make the factory's
classad have AuthenticatedIdentity gfactory, which is what I configured the VO frontend
to expect. I am not sure why it uses FS to authenticate to the collector, though.

Initially, the glideins were failing on startup because the -dir option to glidein_startup.sh
had no value. We edited the factory xml file and set the directory to '.':

bash-3.2$ vi ../glidein_v1_0.cfg/glideinWMS.xml
bash-3.2$ ./factory_startup reconfig ../glidein_v1_0.cfg/glideinWMS.xml
Shutting down glideinWMS factory v1_0@ITB-Stress:          [OK]
Reconfigured glidein 'v1_0'
Active entries are:  ITB
Submit files are in /data1/itb-stress/glideinsubmit/glidein_v1_0
Reconfiguring the factory                                  [OK]
Starting glideinWMS factory v1_0@ITB-Stress:               [OK]


Then the glideins failed to run because glidein_startup.sh could not find the
worker node client software; the ITB gatekeeper does not point $OSG_GRID at a
location shared with the worker nodes. I added a few lines to the top of
/data1/itb-stress/glideinsubmit/glidein_v1_0/glidein_submit.sh to hack around
this problem:

#Dan hack: fall back to the AFS worker node client if OSG_GRID is unset
# or does not point at an existing directory
if ! [ -d "${OSG_GRID}" ]; then
  OSG_GRID=/afs/hep.wisc.edu/osg/wnclient
fi

I had to remove the gass cache on the CE to make this change take effect!
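
The gass cache lives under the submitting user's home directory on the CE, so clearing it looks something like this (run on vdt-itb as the user the glideins map to; the path is the standard Globus location):

rm -rf ~/.globus/.gass_cache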

Then there was a problem with the glidein HTCondor binaries being incompatible with the SL4 machines in CHTC.  To change the binaries used by the glidein:

I unpacked the desired HTCondor tarball and named it
/data1/itb-stress/condor_versions/7.4.1_x86_rhel3. I then edited
glideinsubmit/glidein_v1_0.cfg/glideinWMS.xml, removing the existing entry in
condor_tarballs and replacing it with this:

<condor_tarball arch="default" os="default" base_dir="/data1/itb-stress/condor_versions/7.4.1_x86_rhel3" />

Then reconfig the factory:

cd glideinsubmit/glidein_v1_0
./factory_startup reconfig ../glidein_v1_0.cfg/glideinWMS.xml

Note that this will slightly rewrite the entry we just added to glideinWMS.xml
to insert the tar_file that it generates.

submit node setup

The submit node is c158.chtc.wisc.edu.

The documentation for setting up the glideinWMS VO frontend and pool collector is here: http://www.uscms.org/SoftwareComputing/Grid/WMS/glideinWMS/doc.v2/install/

This node will run the glidein pool collector, the schedd that submits jobs, and the
glideinWMS VO frontend.

Install all the same glideinWMS prereqs as on c157.

I made a service cert for glideinwms and installed it in the certs directory, using the following command:
cert-gridadmin -email dan@hep.wisc.edu -affiliation osg -vo glow -host c158.chtc.wisc.edu -service glideinwms

To create the proxy used by the VO frontend:
grid-proxy-init -cert certs/c158.chtc.wisc.edu_glideinwms_cert.pem -key certs/c158.chtc.wisc.edu_glideinwms_key.pem -hours 100 -out certs/c158.chtc.wisc.edu_proxy.pem

cd glideinWMS/install/
./glideinWMS_install

[4] User Pool Collector
[5] User Schedd
Please select: 4,5

Where do you have the HTCondor tarball? /afs/cs.wisc.edu/p/condor/public/binaries/v7.4/7.4.1/condor-7.4.1-linux-x86_64-rhel5-dynamic.tar.gz

Where do you want to install it?: [/home/zmiller/glidecondor] /data1/itb-stress/glidecondor

Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y

Do you want to get it from VDT?: (y/n) n
Where are the trusted CAs installed?: [/etc/grid-security/certificates]

Where is your certificate located?: /data1/itb-stress/certs/c158.chtc.wisc.edu_glideinwms_cert.pem
Where is your certificate key located?: /data1/itb-stress/certs/c158.chtc.wisc.edu_glideinwms_key.pem

How many slave collectors do you want?: [5] 50
What name would you like to use for this pool?: [My pool] ITB-Stress-Submit
What port should the collector be running?: [9618]

(I had to hand edit condor_config.local to change HOSTALLOW --> ALLOW)

Installing user submit schedds

Do you want to use the more efficient Match authentication (works for HTCondor v7.1.3 and later)?: (y/n) y
How many secondary schedds do you want?: [9]

./glideinWMS_install
[7] VO Frontend

Where is javascriptRRD installed?: /data1/itb-stress/javascriptrrd-0.4.2

Where is your proxy located?: /data1/itb-stress/certs/c158.chtc.wisc.edu_proxy.pem

For security reasons, we need to know what will the WMS collector map us to.
It will likely be something like joe@collector1.myorg.com
What is the mapped name?: vofrontend@c157.chtc.wisc.edu
Where will you host your config and log files?: [/home/zmiller/frontstage] /data1/itb-stress/frontstage

Where will the web data be hosted?: [/var/www/html/vofrontend]

What node is the WMS collector running?: c157.chtc.wisc.edu
What is the classad identity of the glidein factory?: [gfactory@c157.chtc.wisc.edu]

Collector name(s): [c158.chtc.wisc.edu:9618] c158.chtc.wisc.edu:9618,c158.chtc.wisc.edu:9620-9669
(The 9620-9669 port range covers the 50 secondary collectors configured above.)

What kind of jobs do you want to monitor?: [(JobUniverse==5)&&(GLIDEIN_Is_Monitor =!= TRUE)&&(JOB_Is_Monitor =!= TRUE)]

The following schedds have been found:
 [1] c158.chtc.wisc.edu
 [2] schedd_jobs1@c158.chtc.wisc.edu
 [3] schedd_jobs2@c158.chtc.wisc.edu
 [4] schedd_jobs3@c158.chtc.wisc.edu
 [5] schedd_jobs4@c158.chtc.wisc.edu
 [6] schedd_jobs5@c158.chtc.wisc.edu
 [7] schedd_jobs6@c158.chtc.wisc.edu
 [8] schedd_jobs7@c158.chtc.wisc.edu
 [9] schedd_jobs8@c158.chtc.wisc.edu
 [10] schedd_jobs9@c158.chtc.wisc.edu
Do you want to monitor all of them?: (y/n) y

What kind of jobs do you want to monitor?: [(JobUniverse==5)&&(GLIDEIN_Is_Monitor =!= TRUE)&&(JOB_Is_Monitor =!= TRUE)]
Give a name to the main group: [main]
Match string: [True]

VO frontend proxy = '/data1/itb-stress/certs/c158.chtc.wisc.edu_proxy.pem'
Do you want to use it to submit glideins: (y/n) [y]


/data1/itb-stress/glideinWMS/creation/create_frontend /data1/itb-stress/frontstage/instance_v1_0.cfg/frontend.xml
Work files can be found in /data1/itb-stress/frontstage/frontend_myVO-c158-v1_0
Support files are in /var/www/html/vofrontend/stage/frontend_myVO-c158-v1_0
Monitoring files are in /var/www/html/vofrontend/monitor/frontend_myVO-c158-v1_0

Added /DC=org/DC=doegrids/OU=Services/CN=glideinwms/c157.chtc.wisc.edu to GSI_DAEMON_NAME.
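
The resulting line in the submit node's condor_config.local looks something like this (appending to whatever DNs were already listed):

GSI_DAEMON_NAME = $(GSI_DAEMON_NAME), /DC=org/DC=doegrids/OU=Services/CN=glideinwms/c157.chtc.wisc.edu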

# To start it up:
cd frontstage/frontend_myVO-c158-v1_0/
./frontend_startup start

# Logs are in
frontstage/frontend_myVO-c158-v1_0/group_main/log
frontstage/frontend_myVO-c158-v1_0/log
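
To watch the frontend at work (log file names vary by glideinWMS version, so just tail everything in the group log directory):

tail -f frontstage/frontend_myVO-c158-v1_0/group_main/log/*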



setting up CE

The CE on vdt-itb.cs.wisc.edu was already set up except for an HTCondor jobmanager.

SET UP HTCondor

1) installed condor-rhel5.repo into /etc/yum.repos.d

2) yum install condor

3) edit /etc/condor/condor_config?

4) edit /etc/condor/condor_config.local
   change CONDOR_HOST = chopin.cs.wisc.edu
   change COLLECTOR_HOST = $(CONDOR_HOST):15396
   change DAEMON_LIST = MASTER, SCHEDD

5) export CONDOR_CONFIG=/etc/condor/condor_config

6) /usr/sbin/condor_master

okay, that's working.
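
A quick way to confirm the new schedd registered with the ghost pool collector:

condor_status -schedd -pool chopin.cs.wisc.edu:15396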


SET UP JOBMANAGER

7) cd /opt/itb

8) DO NOT DO THIS: pacman -trust-all-caches -get http://software.grid.iu.edu/osg-1.1.14:Globus-Condor-Setup

9) do:
 [root@vdt-itb ~]# export CONDOR_CONFIG=/etc/condor/condor_config
 [root@vdt-itb ~]# export VDT_CONDOR_LOCATION=/usr
 [root@vdt-itb ~]# export VDT_CONDOR_CONFIG=/etc/condor/condor_config

10) cd /opt/itb/

11)   pacman -install http://vdt.cs.wisc.edu/vdt_2099_cache:Globus-CondorNFSLite-Setup

12)   edit $VDT_LOCATION/globus/lib/perl/Globus/GRAM/JobManager/condornfslite.pm
	to update the condor_config location and the paths to condor_rm and
	condor_submit (see the grep below for locating these lines)

13) edit $VDT_LOCATION/edg/etc/grid-mapfile-local, add DNs (entry format sketched below)
14) run $VDT_LOCATION/edg/sbin/edg-mkgridmap
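
For step 12, grep locates the lines that need updating:

grep -nE 'condor_(config|submit|rm)' \
    $VDT_LOCATION/globus/lib/perl/Globus/GRAM/JobManager/condornfslite.pm

For step 13, grid-mapfile-local uses the standard gridmap format, one quoted DN and a local account per line. An illustrative entry (the DN is the frontend service cert from earlier, since glideins are submitted with that proxy; the local account is a guess, use whatever the site maps such jobs to):

"/DC=org/DC=doegrids/OU=Services/CN=glideinwms/c158.chtc.wisc.edu" zmiller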