PPT Slide 
Mechanisms for Matchmaking and Parallel High Throughput Computing in 
the Condor Distributed System
Rajesh Raman, raman@cs.wisc.edu
Todd Tannenbaum, tannenba@cs.wisc.edu
http://www.cs.wisc.edu/condor
 Condor Project 
- Overview
- What is Condor ?
- Projects  and Collaborations
- High Throughput Computing
- ClassAds and MatchMaking
- Parallel Computing with Condor
 
 What is Condor ? 
- High Throughput Computing
- Distributed Resources
- Physically distributed
-  Distributed ownership
 
-  Resource Management
- Increase utilization of resources
-  Simple interface to execution environment
- User level interface
-  Application level interface
 
 
 Important Mechanisms 
- Checkpointing (and migration)
- Owner policies require resource reclamation
-  Need to save (resumable) state of application 
 
- Remote System Calls
- Preserves submission environment in execution  environment.
 
 The Condor Team 
- Research Staff
- Todd Tannenbaum
- Derek Wright
- Adding 2 more...
 
 Condor Team, cont. 
- Graduate Students 
- Rajesh Raman (MatchMaking)
- Jim Basney (Split Execution)
- Shrinivas Ashwin (Mr. Parallel)
- Adiel Yoaz (Accounting)
 
 Condor Almuni 
- Mike Litzkow
- David Dewitt
- Marvin Solomon
- Many others… (Produced XXX Masters and XXX PhDs]
 Current Collaborators and Projects 
- UW-Flock
- Intel Sponsorship: $4.2 Million 
- Graduate School, Engineering
 
- metaNEOS: metacomputing environments for optimization
- with Prof. Michael Ferris
 
 Condor Pool Installations 
- Universites
- U of Wisconsin, U of Illinois, U of Michigan, Dartmouth, Duke, U of Washington, U of Virginia, U of California-Berkeley
 
- Government
- NCSA, Nasa, US Navy, NSA, NIKHEF (Amsterdam), INFN (Italy)
 
- Commercial
- Hewlett-Packard Labs, J.P. Morgan, Mercedez-Benz, Dragon Systems
 
 Power of Computing Environments 
- High Performance Computing
- Fixed amount of work; how much time?
-  Response time/latency oriented
- Traditional Performance metrics:  FLOPS, MIPS
 
-  High Throughput Computing
- Fixed amount of time; how much work?
-  Throughput oriented
-  Application specific performance metrics
 
 Distributed Ownership of Resources 
- Commodity resources
- Underutilized:  70% of a pool's cycles are not utilized
- Fragmented:  owned by different people
 
-  Can provide HTC with these cycles, BUT
- Must not impact QOS to owner
 
-  Owners specify access policy
- Expressed with control expressions
- The current state of the resource (e.g., load average)
-  Characteristics of the request (e.g., who wants to use it?)
-  Time of day, random numbers, etc
 
 
 Condor Architecture 
- Startds ( Represent owners of resources)
-  Implement owner's access control policy
 
- Schedds( Represent customers of the system)
-  Maintain persistent queues of resource requests
 
- Manager
-  Collector:  Database of resources
-  Negotiator: Matchmaker
-  Accountant: Priority maintenance
 
 Condor Architecture, cont. 
 Matchmaking 
- Customers 
- Require resources with certain characteristics
- Discriminating customers
- Requests place constraints on resources
 
-  Distributed ownership
- Resources service requests which match owner's policy
-  Discriminating resources
-  Resource offers place constraints on customers
 
 Matchmaking with Classified Advertisements 
- Parties requiring matchmaking advertise
- Characteristics and requirements (i.e., constraints)
 
- Advertisements matched by a Matchmaker
- Matched parties contact each other to "claim”
- Communication, authentication, constraint verification, negotiation of terms, etc.
-  Claiming does not involve the Matchmaker
 
- Method is symmetric
- No client/server relation imposed
 
 Classified Advertisement Matchmaking Framework 
- Expression and evaluation of characteristics
- ClassAd, Closure, EvaluationContext
 
-  Advertising Protocol
- Contents of advertisements
-  Publication protocol
 
-  Matchmaking Algorithm
- Relates ad contents to matching process 
- Priority schemes, Ranking schemes, etc.
 
 Classified Advertisement Matchmaking Framework (contd.) 
-  Matchmaking Protocol
- How are relevant parties informed of a successful match?
- What information are they given?
 
-  Claiming Protocol
- How do matched parties claim each other to cooperate?
 
 ClassAd:  Mechanism for expressing characteristics 
- A ClassAd is a set of names, each of which is bound to an expression.  e.g.,   [      Name => "Joe Hacker" ; Height => 182 ; Sex => "Male" ;      Disposition  => (TimeOfDay() < 600) ? "Sour" : UNDEFINED ;     Requirements => (other.Height < Height) && (other.Sex ==       	"Female")  ]
-  Expressions
- Constants, attribute references, function calls
 
 ClassAd (contd.) 
-  Attribute references may refer to attributes in other ads
- Attribute references "trigger" expression evaluation
- Scope resolution
- Evaluates to UNDEFINED if no such expression exists
 
-  Values 
- String, integer, real, UNDEFINED and ERROR types
- Operators are total (i.e., defined over all values)
 
 Closure:  Evaluation Environment for a ClassAd 
- Determines which ClassAd's attributes to lookup
-  Closure is
-  ClassAd and an ordered mapping of (scope-name, closure) pairs
- No name may be repeated
 
 EvaluationContext:  Evaluation Environment for several ClassAds 
-  A set of closures which is self-contained
- No closure reference leaves the context
-  Condor's "Standard Context" is a bit more complex
- Includes closures for a matchmaker "advertisement”
 
 
 Matchmaking in Condor 
- Opportunistic Resource Exploitation
- Resource availability is unpredictable
- Exploit resources as soon as they are available
-  Return resources as soon as they are unavailable
 
-  Matchmaking performed continuously
 
-  Attractive for malleable parallel applications
- Request more resources after execution commences
- Granted immediately if resources are available, or
- As soon as resources become available
 
 
 Matchmaking in Condor (contd.) 
- Advertising protocol
-  Startd's, Schedd's send classads to Collector
- Must contain a "Requirements” expression
- Optionally contain a"Rank” and “CurrentRank” expressions
 
- Startds send a "private ad" containing a capability
 
- Matchmaking protocol
- Give the matched Startd and Schedd the capability from the startd's private ad
 
 Matchmaking in Condor (contd.) 
- Matchmaking Algorithm
- Request ad A matched with offer ad B “iff”
- A's "Requirements" expression evaluates to TRUE, and
- B's "Requirements" expression evaluates to TRUE, and
- B‘s"Rank" expression value is greater than "CurrentRank", and
- A’s  "Rank" expression value is  its greatest when evaluated against B
 
 
- Claiming protocol
- Negotiate "heartbeat" frequency,  checkpoint transfer, etc.
 
 Condor Parallelism 
- Job Level
- Condor clusters of  processes
- DagMan
 
- Task Level
- Interfacing Condor and PVM
- PVM: Message Passing
- Condor: Resource Management
 
- PVM Resource Manager Interface
 
 Interfacing Condor and PVM, cont. 
 Interfacing Condor and PVM, cont. 
- CARMI -vs- PVM 
- Resource Requests
- PVM: Synchronous
- CARMI: Asynchronous
 
- Resource Request Mechanism
- PVM: Hostname and Type String
- CARMI: ClassAd
- Task Management
- CARMI: Additional Notifications
- CARMI: Additional Operations
 
- 
 
 
 Master-Worker Model 
- A good fit for an opportunistic environment
- Master
- Runs on Submit Machine
- Manages pool of tasks
 
- Worker
- Runs on remote machines
- Receives pieces of work from the Master, returns answer
 
 Additional Condor/PVM Frameworks 
- CoCheck
- Checkpoint a Worker or set of Workers
- Requirements for a consistent checkpoint
- Synchronize all processes
- Flush PVM messages in transit
- Perform Checkpoint (save image)
- Remap TIDs
 
 
- WoDi
- A framework for Master-Worker applications
- Performs optimizations
 
 Future Work 
 Future Work Part II 
- Matchmaking
- Aggregate Resources/Requests
 
 Summary 
- Condor is an implementation of a High Throughput Computing system in an opportunistic environment.
- Major Mechanisms to achieve HTC:
- Matchmaking
- Checkpointing
- Remote system calls
- Sandboxing