Grid Engine: Running on All Four Cylinders
In 2010, Oracle’s purchase of Sun Microsystems ended an era of technology leadership. Such acquisitions preserve intellectual property and some key individuals, but a large part of the personality and passion often scatters in the technology wind. HPC users also had many questions about the acquisition. Aside from the shivers of fear sent down the spine of the MySQL community, two Sun open source projects are used in the HPC arena. The first is the Lustre parallel filesystem, for which Oracle dropped future support and development; it has since been picked up by several companies working under a GPLv2 license.
The second is Sun Grid Engine. Unlike Lustre, Oracle still offers what is now called Oracle Grid Engine to customers as a closed source product. The original Sun Grid Engine, previously known as CODINE (COmputing in DIstributed Networked Environments) or GRD (Global Resource Director), came to Sun through the purchase of Gridware Inc. in 2000. After renaming it Sun Grid Engine, Sun offered the package with source code in 2001 and also sold a commercial version called N1 Grid Engine (N1GE).
The open source license used by Sun, called the Sun Industry Standards Source License (SISSL), is now a retired free and open source license. It was recognized as an “open license” by the Free Software Foundation and the Open Source Initiative (OSI). The license is somewhat interesting. Under SISSL, developers could modify and distribute source code and derived binaries freely. Modifications could be kept private or made public; however, SISSL required that “The Modifications which You create must comply with all requirements set out by the Standards body in effect one hundred twenty (120) days before You ship the Contributor Version.” If the Modifications do not comply with the current standards, SISSL becomes a copyleft license, and source must be published “under the same terms as this license [SISSL] on a royalty free basis within thirty (30) days.” Thus, as long as shipped binaries are standards compliant, there is no requirement to ship source code. The latest official released source code from Sun was version 6.2 Update 5.
One of the more interesting aspects of HPC is the use of open source for much of the cluster “plumbing” or infrastructure. When a package suddenly undergoes a change in ownership, its future openness and availability are often of some concern. In the two years since Oracle’s purchase, four major efforts have come to the fore: Oracle Grid Engine, Open Grid Scheduler, Univa Grid Engine, and Son of Grid Engine.
To get a sense of where each of these products/projects fits into the HPC landscape, I contacted each group and asked some questions about features, codebases, and the future. The first and easiest is Oracle Grid Engine.
Oracle Grid Engine
Having no direct contact person at Oracle, I emailed their sales channels asking for someone to whom I could pose some questions. I have not received any response. Before you bring out the pitchforks and clubs, understand that Oracle has stated HPC is not a market in which they are interested. The Oracle Grid Engine Support page has plenty of information, and a 90-day free trial is purportedly available for those who register. It appears that Oracle Grid Engine is under active development and support but not targeted at HPC. It is not clear whether Oracle is adding their own features, integrating some of those found in the open versions (discussed below), or both. Also note that some of the Sun documentation is available on Oracle’s website. In 2010, after the purchase of Sun, new binaries of Grid Engine 6.2 update 6 were released, but the corresponding source code was not made available.
Open Grid Scheduler
Many users might not know it, but Oracle worked hard to ensure a smooth hand-off to the open source community. The Open Grid Scheduler project was chosen by Oracle in 2010 as the open source Grid Engine maintainer. Members of the project who were not employees of Sun but had been contributing code to Sun Grid Engine since 2001 formed a company called Scalable Logic to support the open version of Grid Engine.
The Scalable Logic team continues an open Grid Engine development effort. The team at Scalable Logic plans a feature release once a year combined with update releases. One of the first major enhancements was the inclusion of hwloc (hardware locality) multicore binding.
In the upcoming release, they are also adding the following new features:
- cgroups (control groups) – a Linux kernel feature to limit, account for, and isolate resource usage (CPU, memory, disk I/O, etc.) of process groups as an alternative to the Grid Engine Portable Data Collector.
- AMD Optimizations – support for the new Bulldozer architecture.
- Intel Xeon Phi – support for Intel MIC when Intel releases the new architecture.
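To make the cgroups item above a little more concrete, here is a minimal Python sketch of how a monitoring tool could find out which control group a process belongs to. It is not taken from Open Grid Scheduler or any other Grid Engine codebase. It only relies on the Linux kernel's documented /proc/<pid>/cgroup line format (hierarchy-ID:controller-list:cgroup-path); the function names and the /sge/job_42 path are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: Tuple[str, ...]
    path: str


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse the contents of /proc/<pid>/cgroup into structured entries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "hierarchy-ID:controller-list:cgroup-path".
        # Split at most twice because the path itself may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        # The controller list is empty on a cgroup v2 (unified) hierarchy.
        names = tuple(c for c in controllers.split(",") if c)
        entries.append(CgroupEntry(int(hierarchy_id), names, path))
    return entries


def cgroup_path_for(entries: List[CgroupEntry], controller: str) -> Optional[str]:
    """Return the cgroup path for the given controller, or None if absent."""
    for entry in entries:
        if controller in entry.controllers:
            return entry.path
    return None


if __name__ == "__main__":
    # Hypothetical sample; on a Linux host, read /proc/self/cgroup instead.
    sample = "4:memory:/sge/job_42\n3:cpu,cpuacct:/sge/job_42\n"
    print(cgroup_path_for(parse_proc_cgroup(sample), "memory"))
```

Because all processes started by a job inherit the job's control group, a per-job lookup like this is one way resource accounting could be grouped without tracking individual process IDs; how any particular scheduler actually does it is outside the scope of this sketch.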
Additionally, they delivered a bug fix for Cygwin support, support for Linux 3.0 on ARM, and the much-appreciated removal of the NFSv4 dependency for BerkeleyDB spooling. They also merged fixes and features from end users, such as the updated Hadoop integration. Finally, the Scalable Logic team runs and contributes heavily to the Grid Engine Users mailing list, which is similar to the once popular users@gridengine.sunsource.net mailing list.
Scalable Logic also works closely with hardware vendors such as NVIDIA to include things such as GPU monitoring. Because some of the hardware features contain code protected by non-disclosure agreements, they can’t use an open development model for that work. These features do end up in the open version, however. Clearly, the Open Grid Scheduler/Grid Engine team is pushing the project to new heights and leading the way with support and new features, many of which show up later in other Grid Engine implementations.
Univa Grid Engine
On January 18, 2011, it was announced that Univa had recruited several principal engineers from the former Sun Grid Engine team and that Univa would be developing and supporting their version of Grid Engine. Univa Grid Engine, in addition to being open source, offers enhanced testing and support. Univa was a joint developer with Sun for components of the Sun HPC software stack and an OEM of Sun Grid Engine. Univa has delivered three Grid Engine production releases and seven updates in the past year.
In terms of software development, Univa reports that development has been extremely active and well funded. Moreover, Univa has invested millions of dollars in infrastructure, development, QA/automated testing, and customer support. Univa has also released UniSight for reporting and analytics and UniCloud for dynamic application management and will be releasing License Orchestrator for beta in Fall 2012.
Univa built on top of Grid Engine 6.2U5 (the last open version from Sun) and released Univa Grid Engine as open source; however, according to Univa, source releases will lag behind product releases (the product is currently at 8.1 and the source at 8.0). Univa actively adds features for customers as required and emphasizes that most of the core scheduler work was done by the Sun engineers who now work at Univa, whereas the open community that formed was fundamentally focused on usability and configuration. Thus, versions of Univa Grid Engine will focus on production-worthy status, ensure future development, and deliver rapid turnaround on user issues without relying on community resources.
Son Of Grid Engine
The final Grid Engine implementation is cleverly called “Son of Grid Engine” and started in fall 2010 when it was clear Oracle was not contributing to the gridengine.sunsource.net site. Son of Grid Engine is a community-supported continuation of the old Sun Grid Engine project. As much of the original information as possible has been preserved, including an active repository for the project (The Open Grid Scheduler forked using a snapshot of the last Sun source tree). Additionally, Son of Grid Engine has collected much useful information from the original project, including the valuable how-tos, as well as active repository and mailing list archives.
The current version, 8.1.1, is based on Univa (version 8) and has incorporated community changes not in the Univa tree. The intention is to be an enhanced superset of the Univa public repository, tracking Univa source releases. (Note that the version numbers have diverged, in that Son of Grid Engine version 8.1.1 is not based on the Univa 8.1 source tree.) Active development is evidenced by the project timeline and many releases.
Son of Grid Engine will use any code that looks useful, correct, tractable, and legal, including changes from Open Grid Scheduler and some of the packaging code from Debian and Fedora (RPMs are available). It is intended to be free software supported by a community of fellow system managers, users, and contributors. The project also points out that its repository holds more than just the SGE source, including other components such as ARCo (Accounting and Reporting Console), Inspect (Monitoring and Configuration Console), and SDM (Service Domain Manager for Grid Engine services adding cloud connectivity).
Summary
The current HPC Grid Engine seems to have diverged into four camps. Oracle continues to offer Oracle Grid Engine but has no interest in the HPC market. Support forums are still active and open, but the package can be expected to diverge from the other efforts. Open Grid Scheduler, backed by Scalable Logic, is a fork of the last Sun open source release. Its developers have a depth of experience, offer paid support, and are providing many of the leading enhancements in their codebase (which are often adopted by others). They provide source code, and contributions are welcome through their mailing lists. Univa offers, by virtue of its recruitment of Sun engineers, a deeply supported and tested product. Source code is available via a repository; however, code releases will lag behind the binary releases. Univa also offers other products that integrate with and enhance its core Grid Engine version. Finally, Son of Grid Engine is an open continuation and preservation of the original Sun Grid Engine project. It offers an open repository, an enhanced Univa codebase, and lots of useful documentation.
As often happens in the open source world, what was once a single domain of development and distribution has now grown into divergent paths. What was once Grid Engine from Sun is now four “different but similar” projects that seem to be seeking their own niches. Perhaps the bigger lesson in this transition is the strength of open source in a changing market. Those who tied their boat to Grid Engine can rest assured that support, fixes, and features will continue and will be available at a level to suit their needs.
Thanks to Rayson Ho of Scalable Logic, Gary Tyreman of Univa, and David Love of the Son of Grid Engine project for their valuable input.