Matlab-Like Tools for HPC
From people who build a simple two-node cluster all the way up to people who have access to very large systems, one of the most common questions about high-performance computing (HPC) is: “What applications can I run on an HPC system?” One of the most popular applications is Matlab, which a large number of people use in their everyday work and research – either Matlab or Matlab-like tools. For example, a fairly recent blog posting from Harvard University’s Faculty of Arts and Sciences, Research Computing Group showed that the second most popular Environment Module was Matlab. People are using Matlab for a variety of tasks that range from the humanities, to science, to engineering, to games, and more. Some researchers use it for parameter sweeps by launching 25,000 or more individual Matlab runs at the same time. Needless to say, Matlab is used very heavily at a number of places, so it is a very good candidate for running on an HPC system.
I don’t want to take anything away from MathWorks, the creator of Matlab, because their product is a wonderful application, but for a number of reasons, Matlab might not be the answer for some people (e.g., they either can’t afford Matlab or can’t afford 25,000 licenses, they just want to try a few Matlab features, or they want or need access to the source code). This brings up the category of tools that are typically called “Matlab-like”; that is, they try to emulate the concept of Matlab and make the syntax basically compatible so moving back and forth is relatively easy. When people ask what tools or applications they can try on their shiny new cluster, I tend to recommend one of these Matlab-like tools, even though they aren’t strictly parallel right out of the box (so to speak).
In this article, I want to talk about a few of these tools so you can get an idea of what’s available in the open source world for Matlab-like tools. I won’t be looking at other numerical tools that have a syntax different from Matlab, such as R or Scipy; rather, I’ll be covering tools that are trying to be like Matlab.
I’ll be briefly covering Scilab, GNU Octave, and FreeMat. These tools try to be as close as possible to Matlab syntax so that Matlab code will transfer over easily, with the possible exception of Simulink and GUI Matlab code. They have varying degrees of success with Matlab compatibility, but all are inherently serial applications.
Serial in this case means that the vast majority of the code is executed on a single core, although some of the programs have the ability to do a small amount of parallel execution. Getting them to run code in parallel usually requires add-ons, such as MPI, and rewriting the code. With this approach, you start multiple instances of the tool on different nodes and have them communicate over a network so that code can be executed in parallel.
I won’t be comparing or contrasting the tools; rather, I’ll briefly present them with some pointers on how to install and use the tool, and I’ll leave the final determination of which tool is “better” for your case up to you.
Scilab
Scilab is one of the oldest Matlab-like tools. It was started in 1990 in France, and in May 2003, a Scilab Consortium was formed to better promote the tool. In June 2012, the Consortium created Scilab Enterprises, which provides a comprehensive set of services around Scilab. Currently, it also develops and maintains the software. Scilab is released under a GPL-compatible license called CeCILL.
Prepackaged versions of Scilab exist for Linux (32-bit and 64-bit); Mac OS X; and Windows XP, Vista, and Windows 7, along with, of course, the source code. These packages include all of Scilab, including Xcos, a graphical modeling and simulation tool along the lines of Simulink from MathWorks. Scilab is the only open source Matlab-like tool to include something akin to Simulink. Scilab also comes with both 2D and 3D visualization, extensive optimization capability, statistics, control system design and analysis, signal processing, and the ability to create GUIs by writing code in Scilab. You can also interface Fortran, C, C++, Java, or .NET code to Scilab.
Installing Scilab on Linux is easy with either one of the two precompiled binaries: 32- or 64-bit. I downloaded the 64-bit binary (a tar.gz file) and untarred it into /opt. This produced a subdirectory /opt/scilab-5.4.0 (the latest version as I wrote this). To run Scilab, I just used the command
/opt/scilab-5.4.0/bin/scilab
which brought up the Scilab GUI tool. The main window is shown in Figure 1.
The console in the middle of the figure accepts commands; the remainder of the window is a file browser on the left, a variable browser at top right, and a command history on the bottom right. It also has a very nice built-in text editor called “SciNotes” (Figure 2), which can be used to write code.
Scilab’s innovative Variable Browser allows you to edit variable values, including those in matrices, using something like a spreadsheet tool. When you first bring up the editor, it displays a list of the variables in the current workspace (Figure 3).
When you double-click on a variable, you call up the variable editor to edit the values. For example, double-clicking on variable A brought up the spreadsheet-like view shown in Figure 4.
At this point, I can edit any value for any entry of A.
A “Modules” capability adds extra functionality to Scilab. Much like the “toolboxes” of Matlab, modules extend the base tool, and Scilab keeps them at a website called ATOMS (AuTomatic mOdules Management for Scilab). One of the most critical modules for HPC is probably sciGPGPU, which provides GPU computing capabilities. Using sciGPGPU within Scilab is relatively straightforward, but you need to know something about GPUs and CUDA or OpenCL to use it effectively. Listing 1 shows a code snippet taken from the main sciGPGPU site that illustrates how to use the cuBLAS library. (Note that you can also use the cuFFT library, but sample code for it is not shown.)
Listing 1: Sample Scilab code for GPUs using sciGPGPU
stacksize('max');

// Init host data (CPU)
A = rand(1000,1000);
B = rand(1000,1000);
C = rand(1000,1000);

// Set host data on the Device (GPU)
dA = gpuSetData(A);
dC = gpuSetData(C);

d1 = gpuMult(A,B);
d2 = gpuMult(dA,dC);
d3 = gpuMult(d1,d2);

result = gpuGetData(d3); // Get result on host

// Free device memory
dA = gpuFree(dA);
dC = gpuFree(dC);
d1 = gpuFree(d1);
d2 = gpuFree(d2);
d3 = gpuFree(d3);
Scilab has a vibrant community, and one excellent place to go to learn more or to help get started is the Scilab wiki, which has a very good section on migrating from Matlab to Scilab. At this site, an extensive PDF discusses differences between Matlab and Scilab and how to change your Matlab code, if it needs to be changed, to run on Scilab.
An additional excellent Scilab resource is a PowerPoint presentation by Johnny Heikell of 504 slides (at last count) that introduces Scilab and how to use it. He also shows how to convert Matlab files to Scilab files.
Keep in mind that the downloadable Scilab binaries are built to be as fast as possible, yet still be transportable. Because performance is extremely important in HPC, you might want to build Scilab yourself. This would allow you to include Intel’s MKL library, to get the fastest possible BLAS and FFT operations for Intel processors, or ACML (AMD Core Math Library), which is tuned for AMD processors. Be sure to read all of the details on building Scilab at the wiki site; the GUI portion of Scilab requires Java.
GNU Octave
The GNU Octave project was conceived by John W. Eaton at the University of Wisconsin-Madison as a companion to a chemical reactor course he taught. Serious design of Octave, as it was first called, began in 1992, with the first alpha release on January 4, 1993, and the 1.0 release on February 17, 1994. In 1997, Octave became GNU Octave (starting with version 2.0.6). From the beginning, it was published under the GNU GPL license – initially, the GNU GPLv2 license but later switched to the GNU GPLv3 license.
For the rest of this article, I will refer to GNU Octave as just Octave. Like Scilab and Matlab, Octave is a high-level interactive language for numerical computations. Its language is very similar to, but slightly different from, Matlab. It comes with a large number of functions and packages and uses Gnuplot for plotting and visualization.
Octave is popular and widely used, perhaps partly because it is part of GNU, so it is commonly built for Linux distributions. However, I also think it is widely used because the basic syntax is close to Matlab, and it is open-source. Some differences between Octave and Matlab are explained in the Octave wiki, a FAQ on porting, a table of key differences, and a Wikibook.
A huge number of additional toolkits (same concept as a Matlab toolbox) for Octave are available at Octave-Forge. Although there are far too many to be listed here, a few notable ones include:
- Benchmark (about 2 years old but still possibly useful)
- Control
- Data smoothing
- Database
- Financial
- Fuzzy-Logic-toolkit
- Image (processing images)
- IO (I/O in external formats)
- Linear algebra (additional linear algebra computations)
- Multicore (about 2 years old, but intended for parallel processing functions)
- nnet (Neural networks)
- Optim (optimization)
- Signal (signal processing)
- Specfun (special functions)
- Statistics
- Symbolic (symbolic computations)
One thing you do need to note about Octave is that files from Matlab Central’s File Exchange cannot be used in Octave, as explained in the Octave FAQ.
Octave is easy to install because your favorite distribution probably has it available. In my case, I use Scientific Linux 6.2 (Listing 2).
Listing 2: Abbreviated installation output of Octave on SL6.2 system
[root@test1 laytonjb]# yum install octave
...
Dependencies Resolved

=====================================================================================
 Package                Arch      Version                 Repository    Size
=====================================================================================
Installing:
 octave                 x86_64    6:3.4.3-1.el6           epel          9.1 M
Installing for dependencies:
 GraphicsMagick         x86_64    1.3.17-1.el6            epel          2.2 M
 GraphicsMagick-c++     x86_64    1.3.17-1.el6            epel          103 k
 blas                   x86_64    3.2.1-4.el6             sl            320 k
 environment-modules    x86_64    3.2.7b-6.el6            sl             95 k
 fftw                   x86_64    3.2.2-14.el6            atrpms        1.6 M
 fltk                   x86_64    1.1.10-1.el6            atrpms        375 k
 glpk                   x86_64    4.40-1.1.el6            sl            358 k
 hdf5-mpich2            x86_64    1.8.5.patch1-7.el6      epel          1.4 M
 mpich2                 x86_64    1.2.1-2.3.el6           sl            3.7 M
 qhull                  x86_64    2010.1-1.el6            atrpms        346 k
 qrupdate               x86_64    1.1.2-1.el6             epel           79 k
 suitesparse            x86_64    3.4.0-2.el6             epel          782 k
 texinfo                x86_64    4.13a-8.el6             sl            667 k

Transaction Summary
=====================================================================================
Install      14 Package(s)

Total download size: 21 M
Installed size: 81 M
Is this ok [y/N]: y
...
Installed:
  octave.x86_64 6:3.4.3-1.el6

Dependency Installed:
  GraphicsMagick.x86_64 0:1.3.17-1.el6
  GraphicsMagick-c++.x86_64 0:1.3.17-1.el6
  blas.x86_64 0:3.2.1-4.el6
  environment-modules.x86_64 0:3.2.7b-6.el6
  fftw.x86_64 0:3.2.2-14.el6
  fltk.x86_64 0:1.1.10-1.el6
  glpk.x86_64 0:4.40-1.1.el6
  hdf5-mpich2.x86_64 0:1.8.5.patch1-7.el6
  mpich2.x86_64 0:1.2.1-2.3.el6
  qhull.x86_64 0:2010.1-1.el6
  qrupdate.x86_64 0:1.1.2-1.el6
  suitesparse.x86_64 0:3.4.0-2.el6
  texinfo.x86_64 0:4.13a-8.el6

Complete!
After installing Octave, I had one small problem to solve. The HDF5 libraries couldn’t be found, so I added a line to my .bashrc file so the library was in LD_LIBRARY_PATH:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/lib64/mpich2/lib/"
To run Octave, I simply enter octave at the command prompt.
Right now, Octave is a command-line-driven tool without a standard GUI. Several attempts have been made at a GUI, but none have been successful enough to be included with Octave. You can read more about it in the Octave FAQ, but for the time being, Octave is a command-line tool.
Figure 5 below shows a console window on my system with some Octave commands.
Octave can also use gnuplot to plot results and for visualization. Figure 6 shows the commands for an example 3D plot, taken from an Introduction to GNU Octave website.
Octave creates a new window with the resulting plot, as shown in Figure 7.
A number of sites have introductions and examples of Octave, and a good place to start is the Octave wiki or a slightly dated Introduction to Octave PDF, which is nevertheless still a valuable resource for help getting started with Octave.
Recently, an effort has been made to create a JIT (Just In Time) compiler for Octave. It is a work in progress and not quite ready for production work, but you can read about the goals and possibly experiment with it. Be warned that work on the JIT has not progressed for a few months, but I’m hoping it doesn’t become another dead Octave project.
As with Scilab, the downloadable binaries for Octave that come with your distribution are likely to be the least common denominator in terms of performance, but building Octave is fairly easy. Intel provides a set of instructions on how to build Octave using MKL, and a blog post tells you how to build Octave with ACML for AMD processors (it’s for Ubuntu, but the principles are the same). To make things a little more generic, you can also use OpenBLAS to build Octave.
Some efforts have been made to run some Octave functions on GPUs. However, adding GPU capability to Octave is not likely to happen anytime soon. To be honest, I don’t completely understand the issues, but it involves licensing because the GNU GPLv3 license is not compatible with the licenses for various GPU tools and languages (CUDA in particular). Hopefully, this will be resolved in the future, but in my opinion, it really hurts Octave’s applicability in HPC.
FreeMat
A more recent development effort for a Matlab-like tool is called FreeMat. The intention is to develop an interactive numerical environment that is similar to both Matlab and IDL. FreeMat has prebuilt binaries for Windows, Mac OS X, and Linux and is released under the GPL license (I think GPLv2).
FreeMat follows the same lines as Scilab and Octave, and the language is fairly close to Matlab’s language. The FreeMat FAQ has a short section on the differences between FreeMat and Matlab that should help you take Matlab code and run it with FreeMat.
I tried installing an FC14 (Fedora Core 14) version of FreeMat 4.x on my Scientific Linux 6.2 system using rpm to install it and yum to help resolve dependencies, but I received errors that I could not resolve, and it failed, so I tested FreeMat on a Windows 7 system.
Figure 8 shows the FreeMat console that comes up when started.
The window looks similar to Scilab and, to some degree, Matlab. A console appears on the right, and the stacked windows on the left are the file browser, history, variable list, and debug windows. The figure shows that the simple AX=B works just the same as in Matlab, Scilab, and Octave.
FreeMat can also do some reasonable graphics. Figure 9 shows the console for a simple 3D plot example taken from the FreeMat help site, and Figure 10 shows the plot.
The FreeMat site has a good introduction to the software, and you can find a FreeMat Primer on the FLOSS for Science website. Another PDF combines a good introduction to FreeMat with a discussion of basic numerical methods; it is incomplete by a few pages, but it does get you started with FreeMat.
Others
A few tools that are somewhat Matlab-like – some still surviving and some defunct – include RLaB, RLaB+, JMathlab, and O-Matrix (commercial). A whole host of other tools exist if you want to stray from Matlab compatibility even further.
Going Parallel
Matlab and Matlab-like tools are extremely useful in HPC even though they are serial applications. As I mentioned earlier in this article, Matlab and Matlab-like tools can be used for tasks such as parameter sweeps by running something like 25,000 simultaneous instances of the application. However, in other situations, you might want to run the underlying functions in parallel.
For example, you might want to perform a large FFT or a large SVD (singular value decomposition) as quickly as possible by running the application using all of the cores in the node, or even by running the computations across several distributed nodes.
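The parameter-sweep style of use needs no parallel features in the tool itself; a small launcher script is enough. What follows is a minimal sketch, not a production launcher. The names run_sweep, octave_case, sweep_case.m, and MAX_JOBS are all hypothetical, and the runner assumes a tool that accepts code on the command line (Octave's --eval option is used here).

```shell
#!/bin/sh
# Sketch: run one serial instance per parameter value, at most
# MAX_JOBS at a time on this node. All names here are illustrative.

MAX_JOBS=${MAX_JOBS:-4}

# Hypothetical runner: sets p and calls a user script sweep_case.m.
# Assumes Octave's --eval option; swap in another tool as needed.
octave_case() {
    octave --quiet --eval "p = $1; sweep_case"
}

# run_sweep RUNNER PARAM...
# Launches RUNNER once per PARAM in the background, in batches of
# MAX_JOBS, logging each case to case_<PARAM>.log.
run_sweep() {
    runner=$1
    shift
    running=0
    for param in "$@"; do
        "$runner" "$param" > "case_${param}.log" 2>&1 &
        running=$((running + 1))
        if [ "$running" -ge "$MAX_JOBS" ]; then
            wait        # let the current batch finish
            running=0
        fi
    done
    wait                # wait for the final, possibly partial, batch
}
```

A call such as run_sweep octave_case 0.1 0.2 0.5 would then run three independent cases. On a shared cluster, the resource manager's job submission would normally take the place of this loop, but the idea is the same: many serial runs, no communication between them.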
Several parallel processing options for Scilab are summarized in the Scilab parallel computing documentation. The first option is to use the inherent multicore capabilities in the functions used in Scilab. For example, certain libraries perform the linear algebra computations in Scilab, and these libraries could perform the computations using all of the cores in the system. For example, Intel’s MKL library can use all of the cores for performing matrix multiplications or other functions. Typically this is done using OpenMP, but not necessarily. However, these computations are limited to intrinsic functions, so you can’t parallelize Scilab code such as a for loop.
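How many cores a threaded library uses is usually controlled from the environment rather than from Scilab code. As a minimal sketch, assuming the tool has been linked against a threaded BLAS: OMP_NUM_THREADS is the standard OpenMP variable, and MKL_NUM_THREADS and OPENBLAS_NUM_THREADS are the corresponding variables for MKL and OpenBLAS. The helper name set_blas_threads is hypothetical.

```shell
# set_blas_threads [N]
# Export the thread-count variables read by OpenMP, Intel MKL, and
# OpenBLAS. With no argument, use the number of online cores.
set_blas_threads() {
    threads=${1:-$(getconf _NPROCESSORS_ONLN)}
    export OMP_NUM_THREADS="$threads"
    export MKL_NUM_THREADS="$threads"
    export OPENBLAS_NUM_THREADS="$threads"
}
```

Calling set_blas_threads 8 in the shell before starting the tool from that same shell should cap the library at eight threads; whether it has any effect depends on which BLAS the binary was actually built with.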
Scilab also has the capability of running more explicit parallel applications on multicore systems (i.e., cores on the same node). A function called parallel_run allows parallel calls to a function. This allows you to parallelize function calls on the system, but remember that the execution is on a single node (although with four-socket AMD systems, you can get 64 cores on a single system).
For parallel distributed applications on Scilab, you can also use PVM (Parallel Virtual Machine). PVM is a rather old approach to parallel programming and has given way to MPI (Message Passing Interface) for the most part, but it is still used in some areas. A good blog post discusses how to use PVM within Scilab (but it is two years old by now). A git repository holds some early code developed by Scilab Enterprises to create MPI capability for Scilab.
In a manner similar to Scilab, Octave can also use numerical libraries that have been parallelized to run on a single node, such as Intel’s MKL or something similar, perhaps using OpenMP. You just have to build Octave yourself and use the appropriate libraries.
Octave also has a parallel toolbox to use for running applications on a cluster or a distributed system, and with the parcellfun command, you can execute parallel function calls on the same node. This is very similar to Scilab’s parallel_run command.
The openmpi_ext toolbox uses MPI to allow Octave instances on different nodes to communicate and share data. It requires the use of Open MPI, but if you have experience in HPC, it isn’t difficult to build and install.
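Starting the Octave instances themselves is left to Open MPI's launcher. The sketch below only builds and runs the launch line; it assumes a hostfile and an Octave script that makes the openmpi_ext calls itself, and the names launch_octave_mpi, hosts, mpi_case.m, and DRY_RUN are hypothetical.

```shell
# launch_octave_mpi NP HOSTFILE SCRIPT
# Start NP Octave instances across the nodes listed in HOSTFILE, each
# running SCRIPT. Set DRY_RUN=1 to print the command instead of
# running it.
launch_octave_mpi() {
    np=$1
    hostfile=$2
    script=$3
    # Reuse the positional parameters to hold the command line.
    set -- mpirun -np "$np" --hostfile "$hostfile" octave --quiet "$script"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, launch_octave_mpi 4 hosts mpi_case.m would ask mpirun for four Octave processes spread over the nodes named in hosts.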
Parallel coding in FreeMat is a little more difficult. Evidently, early versions of FreeMat could use MPI for parallel coding; however, it appears this work has not been continued in the current versions of FreeMat.
One interesting FreeMat feature is the use of threads within the language. FreeMat-threads can communicate with each other through the use of global variables. Although I have not tested this feature, it appears to be in the current versions.
Summary
In this article, I briefly reviewed three Matlab-like tools: Scilab, Octave, and FreeMat. All three have their pluses and minuses that can be debated, but in my opinion, which one you choose ultimately depends on your requirements. If you need a comparison of these tools, check out this University of Maryland technical report.
If you are searching for a general-purpose numerical tool for HPC, one of these tools is a good candidate. If you are willing to stray further from Matlab compatibility, other candidates could work as well, but that is the subject of another article and likely another series of debates. In the meantime, give one of these applications a whirl – I think you’ll like what you see.