Lark Project

Introduction

Lark was an NSF-funded project for adding network awareness to HTCondor's High Throughput Computing approach.

Broadly, it can be split into three major areas:

  • Advanced Network Testbed: A small testbed for HTCondor networking technologies, consisting of dedicated HTCondor pools at Wisconsin and Nebraska. It serves as a "launch point" for Lark technologies onto the production clusters at the two sites.
  • Network Monitoring: Integrating existing network monitoring tools (particularly perfSONAR) into the HTCondor ecosystem. This gives various HTCondor daemons the ability to make decisions based on the observed network state.
  • Network Management: Having HTCondor actively alter the network layer based on its internal policies.

Results and Products Produced

Work on Lark started in 2013 and completed in 2015.

Read more about Lark results in published research papers, such as:

  • Zhe Zhang, Brian Bockelman, Dale Carder, and Todd Tannenbaum, "Lark: Bringing Network Awareness to High Throughput Computing", Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2015), Shenzhen, Guangdong, China, May 2015. [ PDF version ]

Lark produced code for a pluggable HTCondor contrib module, with plans to merge much of this code (currently on the git branch V8_2-lark-branch) into production HTCondor during the v8.5 or v8.7 developer series.

Advanced Network Testbed (ANT)

The ANT consists of small HTCondor pools at Nebraska and Wisconsin. These are meant to test Lark technologies and harden existing advanced HTCondor network technologies.

Example ANT use cases include:

  • Verification of IPv6 support, especially for the flocking use case.
  • Testing of network accounting at several sites.
  • Testing dynamic VLAN creation between Nebraska and UW using the technology provided by the DYNES project.
  • Testing of multi-mode IPv4 / IPv6 functionality.
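As one illustration of the dual-stack use case, IPv4 and IPv6 can be toggled per node through HTCondor configuration. The knob names below come from modern HTCondor releases and are shown only as a sketch of what such a testbed exercises; the testbed-era names may have differed:

```
# Enable both protocol families on a test node (dual-stack operation).
ENABLE_IPV4 = TRUE
ENABLE_IPV6 = TRUE
```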

Personnel responsible: Garhan Attebury, Alan DeSmet

Network Monitoring

HTCondor is relatively uninformed about the underlying network conditions. For example, it throttles on the number of concurrent file transfers, regardless of whether the underlying network is 100 Mbps or 100 Gbps.

The network monitoring task will gather data from the perfSONAR boxes in the DYNES project and push it into an HTCondor collector.

From there, the schedd would be able to use this information to better adjust the number of concurrent file transfers.
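A minimal sketch of the perfSONAR-to-collector path, assuming the measurement has already been fetched from a perfSONAR host. The ClassAd attribute names (`ThroughputMbps`, `LatencyMs`) and the hostname are illustrative, not Lark's actual schema:

```shell
# Write a generic ClassAd carrying (hypothetical) network metrics.
cat > /tmp/lark_net_ad <<'EOF'
MyType = "Generic"
Name = "perfsonar.example.edu"
ThroughputMbps = 8423.0
LatencyMs = 11.2
EOF

# UPDATE_AD_GENERIC is condor_advertise's command for generic ads; guard the
# call so the sketch degrades gracefully on hosts without HTCondor installed.
if command -v condor_advertise >/dev/null 2>&1; then
  condor_advertise UPDATE_AD_GENERIC /tmp/lark_net_ad
else
  echo "condor_advertise not available; ad left in /tmp/lark_net_ad"
fi
```

Today the schedd caps transfers with static knobs such as MAX_CONCURRENT_UPLOADS and MAX_CONCURRENT_DOWNLOADS; ads like the one above would give it the data to tune those limits dynamically.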

Personnel responsible: Dan Bradley (schedd changes), Unassigned (perfSONAR integration)

Network Management

We are adding the ability for HTCondor to directly manipulate the network configuration for jobs.

In particular, a pair of network pipe devices is created for each job. Using Linux's network namespace feature, the job can access only one of the network pipe devices; the other end of the pair is configured to use the host's network. By having HTCondor manage the external network pipe device's configuration, we control every aspect of the job's network.
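The mechanism above can be sketched with standard `ip` commands. All names and addresses here are illustrative, not Lark's actual identifiers; the `RUN` variable defaults to `echo` so the script can be dry-run without root (set `RUN=""` and run as root to execute for real):

```shell
# Hedged sketch of a per-job veth pair plus network namespace.
RUN=${RUN-echo}
NS="lark_slot1"             # hypothetical per-job network namespace
EXT="e_pipe"; INT="i_pipe"  # external (host) and internal (job) pipe devices

$RUN ip netns add "$NS"
$RUN ip link add "$EXT" type veth peer name "$INT"
$RUN ip link set "$INT" netns "$NS"            # the job sees only i_pipe
$RUN ip addr add 192.168.100.1/30 dev "$EXT"   # host side of the pipe
$RUN ip netns exec "$NS" ip addr add 192.168.100.2/30 dev "$INT"
$RUN ip link set "$EXT" up
$RUN ip netns exec "$NS" ip link set "$INT" up
```

Because the host-side device (`e_pipe` here) stays in the root namespace, the host can apply traffic shaping or accounting to it (e.g. with tc or iptables), which is how a batch system can enforce a per-job network policy.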

Lark's technologies allow users to specify a desired network policy, which the starter then executes. See the developer documentation below for more information about what can be configured.

Personnel responsible: Zhe Zhang, Brian Bockelman

PIVOT outreach

The PIVOT project is an outreach project based at UNL. It provides small HTC clusters for colleges in the state of Nebraska, along with training for students, faculty, and IT professionals on how to manage and utilize compute clusters for research computing.

Lark is providing advanced network training for those involved in the PIVOT project. In particular, we have the following goals:

  • Help provide expertise for adding IPv6 connectivity to the HTCondor clusters. Involve students in the process of actually using the IPv6 connectivity.
  • Deploy perfSONAR network monitoring as a part of the HTCondor clusters.
  • Utilize the HTCondor network management tools.

Personnel responsible: Carl Lundestedt, Brian Bockelman

Development Documentation

Attachments: