The Cluster Documentation Project

Leveraging the power of community to improve HPC documentation.

The latest Top500 List indicates Linux is used in 91% of all systems. This statistic probably reflects the overall market and is remarkable in two respects. First, Linux completely dominates all other operating systems, which is perhaps unique among modern technology markets. Second, no one company or organization owns or controls Linux or many of the other HPC software tools and applications, an aspect that is well liked by many HPC administrators. Solidifying this position through enhanced documentation is the goal of a new initiative called The Cluster Documentation Project (CDP).

Back in the Day

Some history might be helpful in understanding the potential of the CDP effort. Back in the mid-1990s a speaker at a Convex Users group meeting in Texas was asked about the role of Linux in the future of HPC. The speaker categorically dismissed Linux and considered it too much of a hobbyist project. He assumed that because it had no central corporate support, it was impossible to be used in any kind of professional setting, let alone the rough and tumble world of HPC. There was some notable rumbling in the audience, and after the talk, a small crowd gathered to discuss this “Linux thing” a bit further.

Curiously, at the same meeting, Thomas Sterling gave a presentation on NASA’s first experience with a new Convex parallel computer, the SPP 1000. As part of the talk, Sterling presented a few slides that described a low-cost cluster built from i486 processors and 10BT Ethernet. The project had a strange name, “Beowulf,” and was using something called Linux for an operating system. The cost of this experimental system was a mere US$ 50,000. In some cases, the Beowulf system performed as well as the new Convex computer costing two orders of magnitude more. In other use cases, the Convex computer was worth the money, but the speaker who had dismissed Linux evidently never attended the session.

Sterling wrote a very good historical description, Beowulf Breakthroughs: The Path to Commodity Supercomputing. In that article, he explains that the choice of Linux was one of practicality (Berkeley Unix was tied up in the AT&T lawsuit) and that Linux was open, modifiable, and freely distributable.

A further analysis, Why Linux On Clusters?, offers deeper insights into why Linux has been so successful in HPC. Key aspects identified include openness, plug-and-play Unix replacement, no legal encumbrances, not designed by product marketing types, and user ownership and control. Openness and ownership are often cited as the reasons open source software has seen viral growth throughout the computer industry.

The 2010 HPC market was estimated to be US$ 25.6 billion (Source: Intersect360) with a healthy annual growth rate. At the core of this market are Linux clusters, which can range from desk-side machines to systems the size of a city block. In terms of software, this market has been built on cooperation rather than competition. In general, large, industry-wide software projects (operating systems, filesystems, libraries) have been developed in a cooperative fashion because, on their own, these components are not considered commercially viable products by many vendors (i.e., development expense vs. market size prohibits vendors from creating and owning their own products). Filesystems are a good example. Many open source filesystems are available for Linux, and often multiple companies support the design and installation of these filesystems. In a sense, all vendors share a single codebase but derive profit by bringing that codebase into action for the client. To be fair, not all HPC software is open source, and many closed-source packages are in use. Compilers and high-end applications are good examples of these products.

Documenting the Revolution

The rewards of the “open approach” can be found throughout the HPC market. One aspect that has always been a challenge is the documentation of cluster HPC methods, technologies, and practices. The Linux Documentation Project offers open documentation for many aspects of the early Linux software ecosystem, including Beowulf clusters. Numerous magazines (paper and web), books, project sites, and cluster sites (websites put together by administrators for their users) provide various levels of information. The quality of these sources varies widely. Many of the HPC resources are out of date or link to dead projects and nonexistent pages, which makes searching a challenge for end users. The “multiple-component nature” of modern HPC clusters makes a single documentation source difficult or economically unfeasible for vendors.

In addition to frustrating existing users, documentation that is difficult to find or out of date prevents new users from venturing into the market. Early HPC cluster pioneers worked without documentation and often created their own custom versions. Today’s mainstream users often need good documentation before they can make an investment in HPC technology.

Similar to the cost sharing advantage of open software, good open documentation has the ability to raise the tide for all boats in the HPC market. New and existing HPC users can benefit because a well-maintained source of information is available to everyone. Vendors can gain an advantage because some of the “documentation holes” that might exist between their products and services and the open source world can now be bridged in a professional fashion.

Harnessing Change

One of the difficulties in producing content is the dynamic nature of the methods and practices of HPC. Some fundamental aspects, such as MPI, are well documented, whereas others, such as GPU computing, are currently in a state of rapid change. Committing information to a printed book only makes sense if the information has a shelf life of two to three years. On the other hand, more dynamic online methods, such as wikis, are very good at capturing these fast-moving technologies.

The CDP plans to address both the dynamic and static nature of the market and community by using a powerful feature of the popular MediaWiki platform (developed for and used by Wikipedia). If one looks closely, the main page of Wikipedia has a Create a Book link that allows users to select content and create their own books in ODF or PDF formats, or as printed and bound versions using Print On Demand (POD) methods. The feature is supported by Wikibooks and will work with any MediaWiki-based site, such as the CDP. If books are created with POD, the printer has agreed to contribute part of the sales back to the Wikimedia Foundation.

The CDP will be an openly readable wiki, which will then provide the basis for books on various HPC topics. The content used for the books will come directly from the CDP wiki. The goal of the CDP is not to create or supplant existing text or reference books but, rather, to augment them with up-to-date information on many aspects of HPC. Of course, readers will see some overlap and be directed toward existing texts if topics need a more fundamental treatment.

Free as in Speech

The entire CDP is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike license; that is, readers are free to re-use, improve, and republish the information, provided they keep the original author attribution and do not charge money for the new work. From a practical standpoint, this provides the following advantages:

  • HPC users and vendors are free to use any content or books within the CDP.
  • HPC users and vendors are free to create their own customized books with CDP content.

For instance, a system administrator could produce a manual for their cluster by pulling information from the CDP that is specific to their HPC facilities. End users can find content in tutorial format to help learn about HPC, and vendors can create manuals for their open source–based services. There are many more possibilities because the content is both “active” and “printable” at the same time. Of course, once a book is created, the information is frozen, but producing updated versions is much easier once the initial book exists.

Controlled Chaos

To provide well-composed and edited information, much of the initial CDP content will be paid for by the CDP project (see below). This requirement will ensure a baseline of good material for the community. Content can be submitted to the project by anyone in the HPC community, provided it meets the editorial guidelines of the project. The CDP plans to use experienced editors, writers, and graphic artists in this process.

The project will deliver two items. The first is the CDP wiki, which currently has some usable content and will be improved and maintained in a professional fashion. It is expected to take approximately six months to augment the current site before it is ready for general use. Second, each year, the CDP will release a book (using Wikibooks) that will be available to the entire community. Each book will address a different topic in HPC and will be updated annually as new content becomes available. The plan is to have a series of books that can be used by both users and vendors to help enhance the market and community.

Getting Your “T-Shirt”

The CDP project needs to be funded to become a reality. Because all HPC users and vendors can benefit from this project, funds are invited from the entire community. The CDP has decided to use a global funding platform to get things started. You can read more and contribute at Funding the CDP. Both individuals and vendors have various levels of contribution and rewards. The goal is to raise enough money to get the project started and then continue the effort through vendor support.

The CDP also invites community input (users and vendors) to help build a top-tier resource. More information on contributing can be found on the CDP wiki. An initial budget will be part of the project along with an annual accounting of how funds are used. (The principals behind the CDP include myself and some of the past writers, editors, and practitioners from ClusterWorld and Linux magazines.)

Small starts can have big consequences. Remember, a small NASA project helped launch an entirely new Linux HPC market. Growing a small, open, and cooperative documentation project into an effective community resource can help strengthen and grow the entire market.