How to configure multiple negotiators
Known to work in HTCondor version: 8.5.4
This is a technique for increasing the scalability of the HTCondor negotiator. In very large pools (200,000 slots), the matchmaking process can take a long time. This can become a problem for glidein pools, where it's important for an unclaimed machine to be matched quickly (glideins will often exit if unclaimed for more than 10 minutes).
The idea is to divide the machine ads between multiple negotiator daemons, which can do matchmaking in parallel. The negotiators communicate with all schedds in the pool, but consider only a subset of the machine ads.
The division of machines is by defining an expression for each negotiator that evaluates to true for machines it should manage. You need to write these expressions carefully so that all machines will match to one and only one negotiator.
One drawback to this configuration is that each negotiator does its own accounting, so fair-share and quotas operate independently for each portion of the pool.
# Setup alternate negotiators so we can partition the pool. # We run one negotiator for T2_US_*, one for T1_*/T2_CH_CERN, and one for everyone else. # Right now, this is as close as we have to splitting the pool into thirds (the "everyone else" # negotiator is still a bit bigger). # # Define the new negotiator daemons; add them to the daemon list: NEGOTIATORT1 = $(NEGOTIATOR) NEGOTIATORT1_ARGS = -f -local-name NEGOTIATORT1 NEGOTIATOR.NEGOTIATORT1.NEGOTIATOR_LOG = $(LOG)/NegotiatorT1Log NEGOTIATOR.NEGOTIATORT1.MATCH_LOG = $(LOG)/MatchT1Log NEGOTIATOR.NEGOTIATORT1.SPOOL = $(SPOOL)/negotiator_t1 NEGOTIATORUS = $(NEGOTIATOR) NEGOTIATORUS_ARGS = -f -local-name NEGOTIATORUS NEGOTIATOR.NEGOTIATORUS.NEGOTIATOR_LOG = $(LOG)/NegotiatorUSLog NEGOTIATOR.NEGOTIATORUS.MATCH_LOG = $(LOG)/MatchUSLog NEGOTIATOR.NEGOTIATORUS.SPOOL = $(SPOOL)/negotiator_us VALID_SPOOL_FILES=$(VALID_SPOOL_FILES), negotiator_us, negotiator_t1 DAEMON_LIST = $(DAEMON_LIST), NEGOTIATORT1, NEGOTIATORUS DC_DAEMON_LIST =+ NEGOTIATORT1, NEGOTIATORUS # Make sure each negotiator considers the right set of slots. NEGOTIATOR_SLOT_CONSTRAINT=!regexp("^(T1_|T2_US_|T2_CH_CERN)", GLIDEIN_CMSSite) NEGOTIATOR.NEGOTIATORT1.NEGOTIATOR_SLOT_CONSTRAINT=regexp("^(T1_|T2_CH_CERN)", GLIDEIN_CMSSite) NEGOTIATOR.NEGOTIATORUS.NEGOTIATOR_SLOT_CONSTRAINT=regexp("^T2_US_", GLIDEIN_CMSSite)
If using HAD, you also need to set up multiple HAD and Replication daemons.
HADT1 = $(HAD) HADT1_ARGS = -f -local-name HADT1 -p 51451 HADT1_ENVIRONMENT = "_CONDOR_HAD_PORT=51451 _CONDOR_REPLICATION_PORT=41451" HAD.HADT1.HAD_LOG = $(LOG)/HADT1Log HAD.HADT1.HAD_CONTROLLEE = NEGOTIATORT1 MASTER_NEGOTIATORT1_CONTROLLER = HADT1 HADUS = $(HAD) HADUS_ARGS = -f -local-name HADUS -p 51452 HADUS_ENVIRONMENT = "_CONDOR_HAD_PORT=51452 _CONDOR_REPLICATION_PORT=41452" HAD.HADUS.HAD_LOG = $(LOG)/HADUSLog HAD.HADUS.HAD_CONTROLLEE = NEGOTIATORUS MASTER_NEGOTIATORUS_CONTROLLER = HADUS REPLICATIONT1 = $(REPLICATION) REPLICATIONT1_ARGS = -f -local-name REPLICATIONT1 -p 41451 REPLICATIONT1_ENVIRONMENT = "_CONDOR_REPLICATION_PORT=41451 _CONDOR_SPOOL=$(SPOOL)/negotiator_t1" REPLICATION.REPLICATIONT1.REPLICATION_LOG = $(LOG)/ReplicationT1Log REPLICATIONUS = $(REPLICATION) REPLICATIONUS_ARGS = -f -local-name REPLICATIONUS -p 41452 REPLICATIONUS_ENVIRONMENT = "_CONDOR_REPLICATION_PORT=41452 _CONDOR_SPOOL=$(SPOOL)/negotiator_us" REPLICATION.REPLICATIONUS.REPLICATION_LOG = $(LOG)/ReplicationUSLog DC_DAEMON_LIST =+ HADT1, HADUS, REPLICATIONT1, REPLICATIONUS DAEMON_LIST = $(DAEMON_LIST), HADT1, HADUS, REPLICATIONT1, REPLICATIONUS