Software News

Breaking the GPU Bottleneck: How Distributed Computing is Expanding AI Training

May 12, 2026

Training complex AI models typically requires exclusive access to massive, centralized supercomputers. This rigid requirement has created a “compute divide” in the scientific community, locking many researchers out of cutting-edge machine learning research.

To combat this, a collaborative research team at UW–Madison and the Morgridge Institute for Research is leveraging the NAIRR Pilot to prove that AI training doesn’t have to be centralized. The team used distributed High Throughput Computing (dHTC) to break massive AI training tasks down into “small bites”: short, independent units of work that can be checkpointed and rescheduled wherever capacity appears.
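
To make the “small bites” idea concrete, here is a minimal sketch of a checkpointed unit of work in Python. Everything in it, including the file names, the step count, and the stand-in training function, is a hypothetical illustration rather than the team’s actual code; the point is only that each bite saves its progress and exits, so it can be killed and restarted elsewhere without losing work.

    import os
    import pickle

    CHECKPOINT = "train_state.pkl"   # hypothetical checkpoint file
    STEPS_PER_BITE = 500             # kept small so one bite fits in a brief GPU slot

    def load_state():
        """Resume from the last checkpoint, or start fresh."""
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT, "rb") as f:
                return pickle.load(f)
        return {"step": 0}  # placeholder initial training state

    def train_one_step(state):
        """Stand-in for a real optimizer step (model code omitted)."""
        state["step"] += 1
        return state

    def run_one_bite():
        """Do one small bite of work, checkpoint it, and exit.

        Saving progress after every bite means the job can be preempted
        and rescheduled on another machine without losing work.
        """
        state = load_state()
        for _ in range(STEPS_PER_BITE):
            state = train_one_step(state)
        with open(CHECKPOINT, "wb") as f:
            pickle.dump(state, f)
        print(f"checkpointed at step {state['step']}")

    if __name__ == "__main__":
        run_one_bite()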

Using software tools like HTCondor, they distributed these fragmented workloads across a nationwide network of computing providers. By harnessing small, opportunistically scheduled pockets of available GPU time across 13 different sites, the team trained highly complex models with no degradation in quality.
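
For a flavor of how such fragmented workloads are handed to HTCondor, the sketch below uses HTCondor’s Python bindings to queue many independent GPU jobs, which the scheduler then matches opportunistically to whatever pool machine can satisfy each job’s resource requests. The wrapper script name, resource amounts, and job count are illustrative assumptions, not details from the project.

    import htcondor  # HTCondor's Python bindings (requires an HTCondor installation)

    # Describe one GPU job. HTCondor matches it opportunistically to any
    # machine in the pool whose advertised resources satisfy these requests.
    job = htcondor.Submit({
        "executable": "run_bite.sh",          # hypothetical wrapper script
        "arguments": "--shard $(Process)",    # each queued job works on its own piece
        "request_gpus": "1",
        "request_cpus": "4",
        "request_memory": "16GB",
        "should_transfer_files": "YES",       # ship inputs/outputs to the remote site
        "output": "bite_$(Process).out",
        "error": "bite_$(Process).err",
        "log": "bites.log",
    })

    schedd = htcondor.Schedd()              # connect to the local scheduler
    result = schedd.submit(job, count=100)  # queue 100 independent "bites"
    print(f"submitted cluster {result.cluster()}")

Because each queued job stands alone, the scheduler is free to run it on whichever participating site has an idle GPU at that moment; that independence is what lets small pockets of time across many sites add up to one large training run.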

This innovative approach is dismantling the barriers to entry for machine learning research, turning the nation’s collective computing power into a shared engine for scientific inquiry and ensuring the next great breakthrough can come from any researcher, anywhere.

Read more on the NAIRR Pilot website: https://nairrpilot.org/projects/highlights/breaking-gpu-bottleneck