Group Quota Design
??? What are some good use cases here? What didn't the old code do that the new code can?
Some questions we'd like customer use cases to address:
What are the semantics of accounting group quota?
- That is: what does a group quota regulate/limit?
- What is the 'unit' associated with a quota?
What does it mean for groups to be in a hierarchy?
- How does a parent's quota relate to child quotas?
- How do 'sibling' groups relate to each other, their parent, and their children (if any)?
High Level Design and Definitions
The HGQ (hierarchical group quota) design is intended to allow administrators to restrict the aggregate number of slots running jobs submitted by groups of users.
These sets of users are organized into hierarchical groups, with the "none" group as the root. The admin is expected to assign a quota to every leaf and interior node in the tree, except for the root. An assigned quota can be either an absolute number or a floating-point number from 0 to 1, which represents a fraction of the immediate parent's quota. If absolute, it represents a weighted number of slots, where each slot is multiplied by a configurable weight, which defaults to the number of cores. All named groups must be predeclared in the config file. Note that the quota is independent of user priority.
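The following minimal C++ sketch shows how absolute quotas are measured in weighted slots. The `Slot` struct and its fields are hypothetical stand-ins for a machine slot. The only behavior taken from the text is that each slot counts as its configured weight, which defaults to its number of cores.

```cpp
#include <vector>

// Hypothetical slot description: only what is needed to weight it.
struct Slot {
    int cores;      // CPU cores in the slot
    double weight;  // explicitly configured weight; negative means "not set"
};

// Each slot counts as its configured weight, defaulting to its core count.
double slotWeight(const Slot& s) {
    return s.weight >= 0 ? s.weight : static_cast<double>(s.cores);
}

// Total weighted slots: the unit in which absolute quotas are expressed.
double weightedPoolSize(const std::vector<Slot>& slots) {
    double total = 0.0;
    for (const Slot& s : slots) total += slotWeight(s);
    return total;
}
```

Consider a pool with a 4-core slot, an 8-core slot, and a 2-core slot explicitly weighted at 1.0. It totals 13 weighted slots.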
Can we get crisp definitions of each of the fields in the GroupEntry structure?
Here is some annotation from the meeting on fields that didn't already have in-code doc:
// these are set from configuration
string name;
double config_quota;      // Could be static (>=1) or dynamic (0<x<1)
bool static_quota;        // Flag for whether config_quota is static or dynamic
bool accept_surplus;      // true if this group will accept surplus
bool autoregroup;         // true if this group will participate in the autoregroup phase
// current usage information coming into this negotiation cycle
double usage;             // accountant's value for usage under this group
ClassAdListDoesNotDeleteAds* submitterAds;  // list of submitter ads under this group
double priority;          // group's priority from the accountant
Meaning of a "static" quota:
The static quota for a given group indicates the minimum number of machines/slots that the group is expected to be allocated, given sufficient demand. The sum of the static quotas of all the children of any given parent must be less than or equal to the parent's static quota.
The sum of the children's static quotas may be less than the parent's quota. If so, the remainder is assigned to the parent.
Meaning of a dynamic (proportional) quota:
A dynamic (proportional) quota indicates the fraction of its parent node's quota that the group is expected to be allocated, given sufficient demand. If the children of a node have proportional quotas, each child is then assigned an absolute quota equal to its proportion of the parent's absolute quota.
The sum of all sibling proportional quotas should be <= 1.0. If it is not, the quotas are normalized to sum to 1.0 and a warning message is logged.
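The static and dynamic rules above can be sketched as a recursive pass over the tree. This is a simplified illustration, not the negotiator's code. `Group`, `addChild`, and `assignQuotas` are hypothetical names. The sketch assumes the root starts with the weighted size of the pool, and it applies dynamic fractions to the parent's full quota, exactly as described here. The real code may combine mixed static and dynamic siblings differently. The sketch also does not enforce the constraint that static children must fit within their parent's quota.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified group node (not the negotiator's GroupEntry).
struct Group {
    std::string name;
    double config_quota = 0;  // static: weighted slots; dynamic: fraction of parent
    bool static_quota = true;
    double quota = 0;         // computed quota kept by this group itself
    std::vector<std::unique_ptr<Group>> children;
};

// Helper to build a tree; returns a stable pointer to the new child.
Group* addChild(Group& parent, const std::string& name, double configQuota,
                bool isStatic) {
    auto child = std::make_unique<Group>();
    child->name = name;
    child->config_quota = configQuota;
    child->static_quota = isStatic;
    parent.children.push_back(std::move(child));
    return parent.children.back().get();
}

// 'available' is the absolute (weighted) quota handed to g. For the root,
// it is assumed to be the weighted size of the pool.
void assignQuotas(Group& g, double available) {
    double dynSum = 0;
    for (const auto& c : g.children)
        if (!c->static_quota) dynSum += c->config_quota;
    // Proportional siblings summing to more than 1.0 are normalized
    // (per the text, the real code also logs a warning).
    const double norm = dynSum > 1.0 ? dynSum : 1.0;

    double assigned = 0;
    for (auto& c : g.children) {
        double q = c->static_quota ? c->config_quota
                                   : available * c->config_quota / norm;
        assignQuotas(*c, q);  // the child's whole subtree shares q
        assigned += q;
    }
    // Whatever the children do not claim stays with this group.
    g.quota = std::max(0.0, available - assigned);
}
```

For example, in a pool of 100 weighted slots, a static child with quota 60 receives 60 and a dynamic child with quota 0.3 receives 30. The root keeps the remaining 10.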
Each job then specifies which group it belongs to with the +AccountingGroup = "group_name.username" syntax. See also: https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2728
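This minimal sketch maps an AccountingGroup value to a group. It assumes that the username is everything after the last '.', so hierarchical group names keep their dots. It also assumes that an undeclared group name falls back to its nearest declared ancestor, and finally to the root ("none"). The fallback rule and both function names are hypothetical and included only for illustration.

```cpp
#include <set>
#include <string>

// Split "group_name.username" at the last '.', so that hierarchical group
// names such as "A.B" keep their dots. Returns "" if there is no '.'.
std::string groupPart(const std::string& acctGroup) {
    auto pos = acctGroup.rfind('.');
    return pos == std::string::npos ? std::string() : acctGroup.substr(0, pos);
}

// Hypothetical lookup: walk up the dotted name until a declared group is
// found, and fall back to the root ("none") if nothing matches.
std::string resolveGroup(const std::string& acctGroup,
                         const std::set<std::string>& declared) {
    std::string g = groupPart(acctGroup);
    while (!g.empty()) {
        if (declared.count(g)) return g;
        auto pos = g.rfind('.');
        g = (pos == std::string::npos) ? std::string() : g.substr(0, pos);
    }
    return "none";
}
```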
Quota Terminology
Note: The term "quota" is overloaded. Sometimes in the code and documentation, it means "the amount assigned by the administrator to a group" (entry->config_quota). It may also be the value translated from the configured quota to an actual (possibly weighted) slot quantity (entry->quota). The quantity finally assigned to a group, after quota computation, surplus sharing, and fractional-quota distribution, is referred to as 'allocated' (entry->allocated).
First, the code builds a data structure that describes each group: its position in the tree, the administratively configured quota, whether that quota is static or dynamic, and whether the group accepts surplus or participates in autoregroup. For each group, the current weighted usage and the current user priority are fetched from the accountant. The number of running and idle jobs is copied from each submitter's submitter ad and summed into the corresponding group structure. Note that the number of running jobs also includes jobs running in flocked-to pools. Each group also contains a list of all its related submitter ads.
If autoregroup is enabled for a group, that group's submitters are also appended to the root's list of submitter ads.
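The aggregation step above might look like the following. `SubmitterAd` and `GroupTotals` are hypothetical simplifications of the submitter ad and the group entry. The sketch also assumes that each submitter's group has already been resolved.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a submitter ad: only the fields used here.
struct SubmitterAd {
    std::string name;   // submitter name
    std::string group;  // group already resolved from AccountingGroup
    int running;        // includes jobs running in flocked-to pools
    int idle;
};

// Hypothetical per-group accumulator.
struct GroupTotals {
    int running = 0;
    int idle = 0;
    bool autoregroup = false;
    std::vector<const SubmitterAd*> submitters;  // ads under this group
};

// Sum each submitter's running/idle counts into its group. For groups with
// autoregroup enabled, also append the ad to the root's submitter list.
void collectSubmitters(const std::vector<SubmitterAd>& ads,
                       std::map<std::string, GroupTotals>& groups,
                       const std::string& root = "none") {
    for (const SubmitterAd& ad : ads) {
        GroupTotals& g = groups[ad.group];
        g.running += ad.running;
        g.idle += ad.idle;
        g.submitters.push_back(&ad);
        if (g.autoregroup && ad.group != root)
            groups[root].submitters.push_back(&ad);
    }
}
```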
After (weighted) slot quotas are assigned to all the group entries, surplus sharing is computed for all groups in the hierarchy configured to accept surplus. Following surplus sharing, when slot weighting is not enabled, any fractional quota allocations are consolidated and distributed in a round robin fashion.
Surplus Sharing
The primary purpose of surplus sharing is to allow group quotas to "float" locally based on demand. For example, suppose one configures groups A, A.B, and A.C, where group A does not share surplus but A.B and A.C do. Then A.B and A.C can float against each other while maintaining the constraint that quota(A.B) + quota(A.C) <= quota(A). Surplus quota is always shared at the lowest possible level before being passed upwards.
The basic principle for surplus sharing is: surplus quota is distributed among sibling groups in proportion to assigned quota. For example, if group A has twice the quota of group B, group A will be awarded twice the surplus. Some additional points:
- available surplus consists of any surplus shared from the level above in the hierarchy, plus any surplus coming up from sibling sub-trees
- any groups with surplus sharing not enabled do not participate in surplus distribution
- if a group does not need all of its potential surplus, any it does not use will be shared among remaining participating groups
- the parent group of siblings participates in sharing, effectively as another sibling
- any surplus unused after sharing among siblings (and parent) is sent up the hierarchy to be shared at the level above
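The principles above can be sketched in C++ as two parts. A bottom-up pass shares each level's surplus before passing leftovers upward, and a top-down push moves surplus into accepting subtrees. This is a simplified illustration under stated assumptions, not the negotiator's algorithm. Every name in it is hypothetical. `quota` is a group's own quota (the remainder after its children), and a child's sharing weight is the total quota of its subtree. A group with surplus sharing disabled never receives surplus, but it still passes its unused quota upward.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical group node for illustration only.
struct Group {
    std::string name;
    double quota = 0;      // this group's own quota (remainder after children)
    double requested = 0;  // this group's own (weighted) demand
    bool accept_surplus = false;
    double allocated = 0;  // result of surplus sharing
    std::vector<std::unique_ptr<Group>> children;
};

Group* addChild(Group& parent, const std::string& name, double quota,
                double requested, bool accept) {
    auto c = std::make_unique<Group>();
    c->name = name; c->quota = quota;
    c->requested = requested; c->accept_surplus = accept;
    parent.children.push_back(std::move(c));
    return parent.children.back().get();
}

double subtreeQuota(const Group& g) {
    double q = g.quota;
    for (const auto& c : g.children) q += subtreeQuota(*c);
    return q;
}

double ownUnmet(const Group& g) { return std::max(0.0, g.requested - g.allocated); }

// Unmet demand reachable by surplus handed to g (g is assumed to accept):
// g itself plus accepting descendants reached through accepting groups.
double reachableUnmet(const Group& g) {
    double u = ownUnmet(g);
    for (const auto& c : g.children)
        if (c->accept_surplus) u += reachableUnmet(*c);
    return u;
}

// Share 'amount' at g's level. If g accepts surplus, it competes with its
// accepting children. Shares are proportional to quota and capped by unmet
// demand, and capped shares are re-split among the rest. Returns the unused amount.
double offer(Group& g, double amount) {
    struct Part { Group* grp; bool self; double weight; double need; double got; };
    std::vector<Part> parts;
    if (g.accept_surplus) parts.push_back({&g, true, g.quota, ownUnmet(g), 0});
    for (auto& c : g.children)
        if (c->accept_surplus)
            parts.push_back({c.get(), false, subtreeQuota(*c), reachableUnmet(*c), 0});

    const double eps = 1e-9;
    while (amount > eps) {
        double wsum = 0;
        for (const Part& p : parts) if (p.need > eps) wsum += p.weight;
        if (wsum <= 0) break;  // nobody left who can take more
        double given = 0;
        for (Part& p : parts) {
            if (p.need <= eps) continue;
            double x = std::min(p.need, amount * p.weight / wsum);
            p.need -= x; p.got += x; given += x;
        }
        amount -= given;
        if (given <= eps) break;
    }
    for (Part& p : parts) {
        if (p.self) p.grp->allocated += p.got;
        else amount += offer(*p.grp, p.got);  // push down into the child's subtree
    }
    return amount;
}

// Bottom-up: each group first takes min(quota, requested). Its unused quota,
// plus leftovers from child subtrees, is shared at this level, and any
// remainder is returned to the parent.
double shareSurplus(Group& g) {
    g.allocated = std::min(g.quota, g.requested);
    double surplus = g.quota - g.allocated;
    for (auto& c : g.children) surplus += shareSurplus(*c);
    return offer(g, surplus);
}
```

Consider the A, A.B, A.C example, where A does not share surplus and A.B and A.C each have quota 10. If A.B needs only 2 slots and A.C wants 30, A.C ends up with 18. The 8 slots A.B leaves unused float to A.C, but no surplus from outside A does.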
Fractional Quota Consolidation
When slot weighting is not enabled, fractional quota values for groups are consolidated and distributed in round robin fashion to ensure that all quotas are integer values.
- available remainder for consolidation consists of remainder coming from upper level in hierarchy, combined with any remainder coming up from sibling subtrees
- remainders are not accepted by groups not accepting surplus
- siblings that received a remainder least recently are favored in round robin; that is, siblings are ordered by the time they last received a remainder
- remainder unused at a level is sent up to parent
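One level of this consolidation can be sketched as shown below. `Sib`, `consolidate`, and the integer `rrClock` are hypothetical. `rrClock` stands in for whatever records when a sibling last received a unit. The sketch truncates every sibling's allocation and pools the fractional parts with any incoming remainder. It then hands out whole units round robin, but only to siblings that accept surplus.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sibling record for one level of the hierarchy.
struct Sib {
    std::string name;
    double allocated;     // possibly fractional allocation
    bool accept_surplus;  // only accepting groups may receive remainder
    long last_rr;         // when this group last received a remainder unit
};

// Truncate every sibling to a whole number and pool the fractional parts with
// any remainder arriving from elsewhere. Then hand out whole units round
// robin, least recent recipient first. Returns what is left for the parent.
double consolidate(std::vector<Sib>& sibs, double incoming, long& rrClock) {
    double remainder = incoming;
    for (Sib& s : sibs) {
        double whole = std::floor(s.allocated);
        remainder += s.allocated - whole;
        s.allocated = whole;
    }
    std::vector<Sib*> order;
    for (Sib& s : sibs)
        if (s.accept_surplus) order.push_back(&s);
    std::stable_sort(order.begin(), order.end(),
                     [](const Sib* a, const Sib* b) { return a->last_rr < b->last_rr; });
    const double eps = 1e-9;
    for (std::size_t i = 0; !order.empty() && remainder >= 1.0 - eps; ++i) {
        Sib* s = order[i % order.size()];
        s->allocated += 1.0;
        remainder -= 1.0;
        s->last_rr = ++rrClock;
    }
    return std::max(0.0, remainder);
}
```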
Allocation Rounds
Allocation rounds are a method to address the scenario where the jobs submitted under an accounting group do not satisfy mutual job/slot requirements for enough slots to reach the group's quota. When GROUP_QUOTA_MAX_ALLOCATION_ROUNDS > 1, each group that has not met its allocated quota has its 'requested' value reset to its current (weighted) usage. That is, it is assumed that no further jobs under that group will match slots until the next negotiation cycle. This frees up the unused quota for other groups that may be able to use it as surplus.
The following steps are iterated GROUP_QUOTA_MAX_ALLOCATION_ROUNDS times:
- (starting with the 2nd round) reset 'requested' values to current usage
- (re)compute quota allocations
- allow all groups to renegotiate
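The allocation-round loop can be sketched as follows. `RoundGroup`, `allocate`, and `negotiate` are hypothetical. The two callbacks stand in for quota computation and for one negotiation pass. Only the reset rule comes from the text.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-group state carried between rounds.
struct RoundGroup {
    std::string name;
    double requested;  // (weighted) demand used for quota computation
    double allocated;  // result of quota computation for this round
    double usage;      // (weighted) usage after negotiating
};

using GroupStep = std::function<void(std::vector<RoundGroup>&)>;

void allocationRounds(std::vector<RoundGroup>& groups, int maxRounds,
                      const GroupStep& allocate, const GroupStep& negotiate) {
    for (int round = 1; round <= maxRounds; ++round) {
        if (round > 1) {
            // A group that did not reach its allocation is assumed unable to
            // match more slots this cycle. Capping its demand at current usage
            // frees the unused quota as surplus for other groups.
            for (RoundGroup& g : groups)
                if (g.usage < g.allocated) g.requested = g.usage;
        }
        allocate(groups);   // (re)compute quota allocations
        negotiate(groups);  // let all groups (re)negotiate
    }
}
```

For example, suppose group A's jobs can match only 2 of its 5 allocated slots. The second round then caps A's request at 2, and the other 3 slots become available to other groups as surplus.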
Round Robin Rate
Round robin rate is a method to address the 'overlapping effective pool' problem: a scenario where the jobs in two or more accounting groups are in fact competing for a subset of the total available resources. For example, suppose a pool has 100 Linux machines and 100 Windows machines, and 200 jobs from 2 accounting groups can run only on the Linux machines. Without intervention, the first group to negotiate can acquire all 100 Linux machines and starve the 2nd group.
To address this problem, there is a loop around negotiation that operates like so:
- (initialize all quota limits at zero)
- increase each quota limit by the round robin rate (up to allocated quota)
- run negotiation with those limits
Round robin rate is configured via GROUP_QUOTA_ROUND_ROBIN_RATE, which defaults to "infinity" to emulate the legacy behavior.
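The round robin loop can be sketched like this. The text does not say when the loop stops, so this sketch assumes it ends once no limit can be raised any further. `RRGroup`, `roundRobin`, and the `negotiate` callback are hypothetical.

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Hypothetical per-group state for the round robin loop.
struct RRGroup {
    std::string name;
    double allocated;  // quota computed for this cycle
    double limit;      // current negotiation limit
    double usage;      // (weighted) slots acquired so far
};

// Start all limits at zero. Then repeatedly raise each limit by 'rate' (never
// past the group's allocation) and run a negotiation pass. An infinite rate
// reaches every allocation in a single pass, like the legacy behavior.
// Returns the number of negotiation passes performed.
int roundRobin(std::vector<RRGroup>& groups,
               const std::function<void(std::vector<RRGroup>&)>& negotiate,
               double rate = std::numeric_limits<double>::infinity()) {
    for (RRGroup& g : groups) g.limit = 0;
    int passes = 0;
    for (;;) {
        bool raised = false;
        for (RRGroup& g : groups) {
            double next = std::min(g.allocated, g.limit + rate);
            if (next > g.limit) { g.limit = next; raised = true; }
        }
        if (!raised) break;  // assumption: stop when no limit can grow
        negotiate(groups);
        ++passes;
    }
    return passes;
}
```

In the Linux/Windows example, two groups compete for 100 Linux machines. With a rate of 10, each group ends up with 50, whereas the infinite default lets whichever group negotiates first take all 100.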
(note: There is some interest in developing alternative approaches to allocation rounds and round robin rate that require fewer nested loops on top of basic negotiation)
Accounting Group Negotiation Order
We sort the accounting groups in "starvation order" by GROUP_SORT_EXPR, which defaults to the ratio of current group usage to configured group quota.
Finally, we negotiate with each group in that order, using its quota limit as calculated above.
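This sketch sorts groups by the default usage/quota ratio, most starved first. The names are hypothetical. Placing zero-quota groups last is an assumption of this sketch, not a statement about how the default GROUP_SORT_EXPR handles them.

```cpp
#include <algorithm>
#include <limits>
#include <string>
#include <vector>

// Hypothetical inputs to the sort key.
struct SortGroup {
    std::string name;
    double usage;  // current (weighted) usage
    double quota;  // configured group quota
};

// usage / quota, so the most starved groups have the smallest key.
// Assumption: groups with a non-positive quota sort last.
double starvationKey(const SortGroup& g) {
    if (g.quota <= 0) return std::numeric_limits<double>::max();
    return g.usage / g.quota;
}

// Order groups for negotiation, most starved first; ties keep input order.
void sortByStarvation(std::vector<SortGroup>& groups) {
    std::stable_sort(groups.begin(), groups.end(),
                     [](const SortGroup& a, const SortGroup& b) {
                         return starvationKey(a) < starvationKey(b);
                     });
}
```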
How common is it to have demand (submitters) in interior nodes?
- some downstream customers are known to be interested in jobs submitted against interior nodes
- What about non-homogeneous pools?
Is there a way to do this without relying on the submitter ad's # of idle/running jobs?
- There may be alternative approaches to surplus-sharing to address this behavior, but it is an open question
- How should this behave in the face of flocking?
- I have a few thoughts on how weighted slots should be thought about here:
- and here: https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3435