
Photo by Toby Elliott on Unsplash
Application-aware batch scheduler
Eruption
The basic Kubernetes scheduler – kube-scheduler – does a great job of bin-packing pods into nodes, but it can result in a scheduling deadlock for the more complicated multipod jobs that are created by analytics and artificial intelligence and machine learning (AI/ML) frameworks such as Apache Spark and PyTorch. That means expensive cluster resources, such as GPUs, sitting idle and unavailable to any workload.
Volcano scheduler is a Cloud Native Computing Foundation (CNCF) project that introduces the Queue and PodGroup custom resources to enable gang scheduling (i.e., the simultaneous scheduling of multiple related objects) and facilitate more efficient use of the cluster. Complex jobs run more reliably, and data engineers become more productive.
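The difference between per-pod placement and gang scheduling can be illustrated with a toy model. The sketch below is not how kube-scheduler or Volcano is implemented; it assumes a cluster with a fixed number of identical pod slots, two hypothetical jobs that each need all of their pods running at once, and pods from both jobs arriving interleaved. All names (Job, schedule_per_pod, schedule_gang) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pods_needed: int       # the job only makes progress once ALL pods run
    pods_running: int = 0


def schedule_per_pod(jobs, capacity):
    """Place pods one at a time, interleaved across jobs (no gang awareness).

    Returns the number of slots still free afterwards.
    """
    free = capacity
    placed = True
    while free > 0 and placed:
        placed = False
        for job in jobs:
            if free > 0 and job.pods_running < job.pods_needed:
                job.pods_running += 1
                free -= 1
                placed = True
    return free


def schedule_gang(jobs, capacity):
    """Admit a job only if every one of its pods fits (all-or-nothing).

    Returns the number of slots still free afterwards.
    """
    free = capacity
    for job in jobs:
        if job.pods_needed <= free:
            job.pods_running = job.pods_needed
            free -= job.pods_needed
    return free


def runnable(jobs):
    """Names of jobs that have their full set of pods running."""
    return [j.name for j in jobs if j.pods_running == j.pods_needed]


def make_jobs():
    # Hypothetical workloads: each needs 3 pods, the cluster has only 4 slots.
    return [Job("spark", 3), Job("pytorch", 3)]


per_pod_jobs = make_jobs()
per_pod_free = schedule_per_pod(per_pod_jobs, capacity=4)

gang_jobs = make_jobs()
gang_free = schedule_gang(gang_jobs, capacity=4)

print("per-pod:", runnable(per_pod_jobs), "free slots:", per_pod_free)
print("gang:   ", runnable(gang_jobs), "free slots:", gang_free)
```

In this model, per-pod placement gives each job two of the three pods it needs, so every slot is occupied yet neither job can start. The all-or-nothing variant admits the first job in full and leaves the second waiting with nothing allocated, so it holds no resources while it waits.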
In this article, I demonstrate default Kubernetes scheduling behavior with short-lived single-pod jobs, show how multipod jobs from Apache Spark and PyTorch can trigger a scheduling deadlock, and use Volcano to run the same jobs smoothly and predictably. The Git repository [1] gives full details of how to create a test Kubernetes cluster on DigitalOcean and run all the examples.
Kubernetes for Analytics and ML
Kubernetes, sometimes called the operating system of the cloud, is usually thought of in terms of distributed microservices – in other words, client-server applications with an indefinite lifespan, decomposed into smaller service components (e.g., database, business logic, web front end) for containerized deployment in a way that makes each part redundant, scalable, and easy to upgrade. In that use case, the Kubernetes cluster was most likely designed and scaled with the application's resource requirements in mind, but Kubernetes also lends itself to the "batch" use case – that is, for running resource-intensive ...
