Photo by Toby Elliott on Unsplash

Application-aware batch scheduler

Eruption

Article from ADMIN 86/2025
By
Volcano optimizes high-performance workloads on Kubernetes to avoid deadlocks.

The basic Kubernetes scheduler – kube-scheduler – does a great job of bin-packing pods into nodes, but it can result in a scheduling deadlock for the more complicated multipod jobs that are created by analytics and artificial intelligence and machine learning (AI/ML) frameworks such as Apache Spark and PyTorch. That means expensive cluster resources, such as GPUs, sitting idle and unavailable to any workload.
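The deadlock is easy to reproduce in a toy model. The following Python sketch is not kube-scheduler's actual algorithm: it assumes a cluster with a fixed number of identical pod slots and two hypothetical jobs (spark-a and spark-b) whose pods arrive interleaved and are placed one at a time, with no awareness of which job they belong to.

```python
def greedy_schedule(capacity, jobs):
    """Place pods one at a time, alternating between jobs.

    capacity: number of free pod slots in the (toy) cluster.
    jobs: maps a job name to the number of pods it needs before it can run.
    Returns the number of pods placed for each job.
    """
    placed = {name: 0 for name in jobs}
    free = capacity
    progress = True
    while free > 0 and progress:
        progress = False
        for name, need in jobs.items():
            if free > 0 and placed[name] < need:
                placed[name] += 1   # the pod lands on a node and holds its slot
                free -= 1
                progress = True
    return placed


def runnable(placed, jobs):
    """A job can only make progress once all of its pods are placed."""
    return [name for name in jobs if placed[name] == jobs[name]]


# Hypothetical workload: two jobs of three pods each on a four-slot cluster.
jobs = {"spark-a": 3, "spark-b": 3}
placed = greedy_schedule(4, jobs)
print(placed)                  # {'spark-a': 2, 'spark-b': 2}
print(runnable(placed, jobs))  # []
```

Every slot is occupied, yet neither job has all the pods it needs, so both wait indefinitely while holding resources the other could use.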

Volcano scheduler is a Cloud Native Computing Foundation (CNCF) project that introduces the Queue and PodGroup custom resources to enable gang scheduling (i.e., the simultaneous scheduling of multiple related objects) and facilitate more efficient use of the cluster. Complex jobs run more reliably, and data engineers become more productive.
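The idea behind gang scheduling can be sketched in a few lines of Python. This is only an illustration of the all-or-nothing principle, not Volcano's implementation: the function name, the job names, and the model of a cluster as a simple count of free pod slots are all assumptions made for the example.

```python
def gang_schedule(capacity, jobs):
    """Admit a job only if all of its pods fit at the same time.

    capacity: number of free pod slots in the (toy) cluster.
    jobs: maps a job name to the number of pods it needs, in arrival order.
    Returns (admitted job names, waiting job names, slots left free).
    """
    free = capacity
    admitted, waiting = [], []
    for name, need in jobs.items():
        if need <= free:
            free -= need            # reserve slots for the whole group at once
            admitted.append(name)
        else:
            waiting.append(name)    # the job holds nothing while it waits
    return admitted, waiting, free


# Hypothetical workload: two jobs of three pods each on a four-slot cluster.
admitted, waiting, free = gang_schedule(4, {"job-a": 3, "job-b": 3})
print(admitted)  # ['job-a']
print(waiting)   # ['job-b']
print(free)      # 1
```

The first job runs to completion and releases its slots, at which point the second job can be admitted. One slot stays idle in the meantime, but no job is left half-started and holding resources it cannot use.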

In this article, I demonstrate default Kubernetes scheduling behavior with short-lived single-pod jobs, show how multipod jobs from Apache Spark and PyTorch can trigger a scheduling deadlock, and use Volcano to run the same jobs smoothly and predictably. The Git repository [1] gives full details of how to create a test Kubernetes cluster on DigitalOcean and run all the examples.

Kubernetes for Analytics and ML

Kubernetes, sometimes called the operating system of the cloud, is usually thought of in terms of distributed microservices – in other words, client-server applications with an indefinite lifespan, decomposed into smaller service components (e.g., database, business logic, web front end) for containerized deployment in a way that makes each part redundant, scalable, and easy to upgrade. In that use case, the Kubernetes cluster was most likely designed and scaled with the application's resource requirements in mind, but Kubernetes also lends itself to the "batch" use case – that is, for running resource-intensive

...