Monday
Introduction and Guidelines
Speaker(s): Mark Coatsworth ( University of Wisconsin )
Speaker(s): Dave O'Connor ( UW Medical Foundation Professor of Pathology and Laboratory Medicine and UW-Madison )
Welcome to HTCondor Week 2021
Speaker(s): Miron Livny ( UW-Madison CHTC )
What's New? What's Improved?
Speaker(s): Todd Tannenbaum ( University of Wisconsin )
Speaker(s): Greg Thain ( UW-Madison CHTC )
Introducing the HTCondor 9.0 Series (Users)
Speaker(s): Christina Koch ( UW-Madison )
Speaker(s): Greg Thain ( Center for High Throughput Computing )
Security in HTCondor 9.0
Speaker(s): Brian Bockelman ( Morgridge Institute for Research )
Town Hall Discussion: Authorization and Identity
Speaker(s): Miron Livny ( UW-Madison CHTC ) Brian Bockelman ( Morgridge Institute for Research ) Jim Basney ( University of Illinois Urbana-Champaign ) Frank Würthwein ( UCSD / Open Science Grid ) Jeny Teheran ( Fermilab )
Tuesday
Campus Research and Facilitation
Speaker(s): Lauren Michael ( UW-Madison CHTC )
Speaker(s): Christina Koch ( UW-Madison CHTC )
Running COPASI biochemical simulations with HTCondor
COPASI is a widely used simulator for chemical and
biochemical reaction networks based on ordinary differential equations
or stochastic methods. It includes various analysis methods such as
optimization, parameter estimation, sensitivity analysis, and several
others. While COPASI is mostly used in a standalone GUI-based mode,
several compute-intensive tasks benefit from parallelization. We created
a web-based system which facilitates transforming such tasks into
smaller sub-tasks that can be run independently. This system then allows
the user to submit these tasks to HTCondor from the web interface, and
assembles the numerical results in their expected order. Thus the end
user never has to interact directly with HTCondor.
Speaker(s): Clark Gaylord ( George Washington University )
The US Geological Survey (USGS) is currently leading a horizon scan for new invasive species for the United States (US). This horizon scan uses a climate match to assess how the climate in a potential invasive species' non-US range matches the climate in different parts of the US. We developed a high-throughput assessment using HTCondor to examine 8,000+ species. We will describe our workflow, how we created an R package and used Docker and HTCondor for the assessment, and provide suggestions for others who want to use R with HTCondor.
Speaker(s): Richard Erickson ( USGS )
Building better tools with the help of the Open Science Grid
Speaker(s): Nick Cooley ( University of Pittsburgh )
The landscape of computing power available for the CMS experiment is already evolving from almost exclusively x86 processors, predominantly deployed at WLCG sites, towards a more diverse mixture of Grid, HPC and Cloud facilities that incorporate a higher fraction of non-CPU components, such as GPUs. The CMS Global Pool is consequently adapting to this heterogeneous resource scenario, with the aim of making the new resource types available to CMS. An optimal level of granularity in resource description and matchmaking strategy will be essential to ensure efficient allocation and matching of resources to CMS workflows. Current uncertainties include what types of resources will be available in the future, how to prioritize the assignment of diverse workflows to those diverse resource types, and how to deal with the diverse policy preferences of resource providers. This contribution will describe the present capabilities of the CMS Submission Infrastructure and its critical dependencies on the underlying tools (such as HTCondor and GlideinWMS), along with its necessary evolution towards full integration and support of heterogeneous resources according to CMS needs.
Speaker(s): Marco Mascheroni ( CERN )
Speaker(s): Michael Thomas ( LIGO )
HTCondor in a Digitization Workflow: Helping Preserve Cultural Heritage
Digitization is an important aspect of the preservation and promotion of heritage materials. Once physical documents are too fragile or damaged to manipulate, the digital copy often becomes the only version that is available to the public. The digitization workflow must produce files that reliably meet high standards.
HTCondor's combination of cycle scavenging and distributed computing allows the digital collections team to complete tasks faster with a small pool of 50 available workstations. The team submits projects to HTCondor through a web server that automatically prepares the submit file and input list.
Each task launches a Java application that handles file verification and executes tools such as Tesseract (optical character recognition), FFmpeg (audio, video file conversion) or ImageMagick (image conversion). Once the project is complete, the web server prepares a report using custom exit codes and informs the owner.
After being processed through HTCondor, the files are ready to be preserved for future generations. The projects for which the institution has dissemination rights then become available through our web platform: https://numerique.banq.qc.ca/patrimoine/
Wednesday
Upgrading to HTCondor 9.0
Speaker(s): Todd Miller ( CHTC )
Dropbox-driven workflows, where the appearance of new files in a given directory triggers work to be done on those inputs, are common in many contexts. Customarily, these are implemented with cron jobs or a system service daemon. The HTCondor platform has a number of features, such as built-in e-mail notifications, “crondor” for repeating jobs, and a well-conceived model of jobs and resources, which make Dropbox workflows easier to build and the result far more manageable.
The techniques I will describe were developed in 2020 to support an automated, AI-driven visual and X-ray inspection process for silicon wafer and other component production. The process delivered $50 million worth of benefits by reducing SME work and improving product quality and manufacturing yield, as the data gathered was fed back into component design, and it was recognized with a prestigious Raytheon Missiles & Defense CIO Award.
Speaker(s): Patrick Godwin ( LIGO )
Pegasus 5.0, released in November 2020, is the latest stable release of Pegasus. A key highlight of this release is a brand-new Python 3-based Pegasus API that allows users to compose workflows and control their execution programmatically. This talk will give an overview of the new API and highlight key improvements that address system usability (including comprehensive yet easy-to-navigate documentation, and training), as well as the development of core functionality for managing and processing large, distributed data sets and for managing experiment campaigns defined as ensembles.
Speaker(s): Karan Vahi ( Pegasus Team - USC )
Office Hours
Town Hall Discussion: Multiple GPU Jobs
Speaker(s): David Schultz ( UW-Madison WIPAC ) John Knoeller ( University of Wisconsin, Madison ) Josh Willis ( LIGO ) Todd Miller ( UW-Madison CHTC )
Pegasus Tutorial
Speaker(s): Karan Vahi ( Pegasus Team - USC Information Sciences Institute )
Thursday
dHTC for LHAASO Experiments
Speaker(s): Jingyan Shi ( IHEP )
JupyterLab has become an increasingly popular platform for rapid prototyping, teaching algorithms or sharing small analyses in a self-documenting manner.
However, it is commonly operated using dedicated cloud-like infrastructures (e.g. Kubernetes) which often need to be maintained in addition to existing HTC systems. Furthermore, federation of resources or opportunistic usage are not possible due to a requirement of direct inbound connectivity to the execute nodes.
This talk presents a new, open development in the context of the JupyterHub batchspawner:
Extending the existing functionality to leverage the connection broker of the HTCondor batch system, the requirement for inbound connectivity to the execute nodes can be dropped, and only outbound connectivity to the Hub is needed.
Combined with a container runtime leveraging user namespaces, unprivileged CVMFS and the HTCondor file transfer mechanism, notebooks can not only be executed directly on existing local HTC systems, but also on opportunistically usable resources such as HPC centres or clouds via an overlay batch system.
The presented prototype paves the way towards a federation of heterogeneous and distributed resources behind a single point of entry.
Kubernetes is an open source cluster orchestration system whose popularity stems in part from its role as a standard resource management interface across cloud providers and on-premises data centers. There is significant interest in managing HTCondor services and scheduling user jobs in Kubernetes clusters. These solutions often rely on running standard HTCondor daemons inside a container or on developing custom Kubernetes operators to bridge the two services. Kubernetes was originally designed by Google, which remains a major contributor to the project; it is now governed by the Cloud Native Computing Foundation. We will describe recent (1.21) and planned (1.22+) contributions to improve direct support for batch scheduling of high throughput and parallel jobs, as well as developments in our Google Kubernetes Engine product, which offers Kubernetes clusters with reduced management overhead.
Speaker(s): Abdullah Gharaibeh ( Google Cloud )
Speaker(s): Brian Lin ( UW-Madison CHTC ) John (TJ) Knoeller ( UW-Madison CHTC )
Speaker(s): Joao Dorea ( UW-Madison Animal & Dairy Sciences )
Scaling Virtual Screening to Ultra-Large Virtual Chemical Libraries
Progress in chemical synthesis strategies has given rise to vast “make-on-demand” chemical libraries. Such libraries, now virtual, are bounded only by synthetic feasibility and are growing exponentially. Making and testing significant portions of such libraries on a new drug target is not feasible. We increasingly rely on computational approaches called virtual screening methods to help us navigate large chemical spaces and to prioritize the most promising molecules for testing. The main challenge now is to scale existing virtual screening methods, or develop new ones, with sufficient molecule throughput and scoring accuracy to accommodate ultra-large compound libraries. Here I will describe some promising approaches that leverage high-throughput computing to meet this challenge.
Speaker(s): Spencer Ericksen ( UW-Carbone Cancer Center, Drug Development Core, Small Molecule Screening Facility )
Speaker(s): Gaylen Fronk ( UW-Madison Addiction Research Center )
Speaker(s): Benedikt Riedel ( UW-Madison WIPAC )
Closing Remarks
Speaker(s): Miron Livny ( UW-Madison CHTC )