Is this powerful but simple scripting language big enough for Big Data?

The Lua Scripting Language

The Lua scripting language celebrates its 20th birthday this year. Because it is used more often as an embedded scripting language than as a standalone programming language like Python or Perl, Lua is less familiar to many people.

Lua is very widespread in games and game engines. In fact, Wikipedia lists nearly 150 games that use Lua in a separate category of “Lua-scripted video games.” However, Lua can also be found in a variety of network and system programs, such as the Wireshark network analyzer, the Nmap scanner [1], the MySQL Proxy program, the Rspamd anti-spam solution, the FreeSWITCH VoIP software, the Redis NoSQL database, the Apache web server [2], and the Nginx reverse proxy service.

Homegrown

Lua was developed at the Catholic University of Rio de Janeiro by Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar Celes. Because Brazil was subject to strong import restrictions for hardware and software in 1992, the three decided to develop a separate scripting language for their own purposes, which eventually culminated in Lua (Portuguese for “moon”). Ierusalimschy still controls the development today and has written the standard work, Programming in Lua, the third edition of which was published early last year. The first edition, which is available online, refers to Lua 5.0. The language has now moved to version 5.2, but the online book is still largely up to date.

As mentioned, Lua is primarily designed as a library that application programmers can integrate into their software to add scripting capabilities. However, you can also use Lua without additional software. The Lua distribution, which is available for all major operating systems, contains an interpreter that weighs in at just a few hundred lines of code and otherwise relies on the existing library functions. This compactness (Figure 1) characterizes the entire Lua distribution and, along with a fairly high execution speed, is repeatedly touted as one of Lua’s advantages.

Figure 1: A small Lua script that makes use of libguestfs.

Nevertheless, the interpreter also offers features such as garbage collection (i.e., it automatically cleans up unused data structures to release memory).

Calling the Lua interpreter with lua and no arguments launches an interactive mode in which you can enter code. Alternatively, the interpreter executes a Lua script whose file name it receives as a parameter.

The Lua compiler is called luac; it converts programs to Lua bytecode before execution, which saves a bit of compile time when running but does not otherwise offer any performance benefits.

Manageable

Lua offers a few exciting features. The syntax is quite conventional; for example, it marks blocks with the keywords do and end instead of using braces. All told, Lua includes only about 20 reserved keywords (Table 1), which makes it quite easy to learn.

Lua is a dynamically typed language that supports types nil, boolean, number, string, function, thread, table, and userdata. The type of a variable is thus determined when a script runs and can optionally be converted to another type.

It is no problem to assign a number to a variable and then assign a string to the same variable later on in the program. In the case of the Boolean type, which accepts logical values, only false and nil count as false, whereas every other value, including an empty string and 0, counts as true.
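A short sketch can make these rules concrete. The helper name truthiness is made up for this illustration and is not part of Lua; the globals simply follow the style of the other snippets in this article.

```lua
-- truthiness is a hypothetical helper: it reports how Lua treats
-- a value when it appears in a condition.
function truthiness(value)
   if value then
      return "true"
   end
   return "false"
end

results = {
   zero = truthiness(0),           -- 0 counts as true in Lua
   empty_string = truthiness(""),  -- so does the empty string
   nil_value = truthiness(nil),    -- nil counts as false
   false_value = truthiness(false) -- and so does false
}

-- The same variable can hold a number first and a string later.
v = 42
first_type = type(v)
v = "forty-two"
second_type = type(v)

print(results.zero, results.empty_string, results.nil_value, results.false_value)
print(first_type, second_type)
```

The type function used here is part of the Lua standard library and returns the type name as a string.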

This rule is simple and consistent, unlike in PHP, for example, where the mapping of arbitrary types to logical values tends to be quite chaotic rather than adhering to a specific method.

Strings

Strings are defined, as in other languages, in single or double quotes. However, strings enclosed in double square brackets, which can span multiple lines, are a special feature of Lua. An HTML string thus looks like this:

html = [[
  <html>
  <head>
  ...
]]

The special operator for concatenating strings is two dots (..). Using +, as in other languages, does not work in Lua, which reserves this operator for numbers. Formatting strings is similar to C programming, such as

string.format("%.7f", math.pi)

for numbers. The string module also provides a set of functions that allow you to search for characters in strings. For example,

string.find (string, search_string)

returns two numbers representing the start and end positions of the first match, or nil if nothing is found. string.gmatch returns an iterator that, given a search pattern, successively returns all the matches.

Other functions return the length of a string, convert between uppercase and lowercase, reverse a string, and so on.
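The following sketch pulls the string features described so far into one script. The sample strings and variable names are chosen purely for illustration; all functions used are part of the standard string library.

```lua
-- Concatenation uses two dots; numbers are converted to strings automatically.
greeting = "Lua " .. "is " .. 20 .. " years old"

-- C-style formatting of a number.
pi_text = string.format("%.7f", math.pi)

-- string.find returns the start and end positions of the first match
-- (or nil if nothing matches).
first, last = string.find("hello world", "world")

-- string.gmatch returns an iterator over all matches of a pattern;
-- "%a+" matches runs of letters.
words = {}
for word in string.gmatch("one two three", "%a+") do
   words[#words + 1] = word
end

-- Length, case conversion, and reversal.
length = string.len("linux")
upper = string.upper("linux")
reversed = string.reverse("linux")

print(greeting)
print(pi_text, first, last)
print(#words, length, upper, reversed)
```

Note that Lua search patterns are not regular expressions, although simple cases such as the one shown here look similar.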

In principle, Lua can store UTF-8 in its strings because they are plain sequences of 8-bit bytes. However, the language core does not provide any further features for processing UTF-8. Currently, a few modules, such as slnunicode, handle this type of operation. The Lua core language will add support for UTF-8 in a future version.

Numeric typing is straightforward, with just number, which largely corresponds to a floating-point number (float) in other languages. In particular, Lua has no integer variable type. The only data structure offered by Lua is the table, which replaces the arrays and hashes of other programming languages.

Tables basically work like hashes or dictionaries: Instead of holding a single value, a variable uses keys, which can be of different types, to store a variety of values. In a table, neither the keys nor the values need to be of the same type. A new table is initialized in Lua with the curly bracket constructor, { }:

t = {}
t['foo'] = 'bar'
t['123'] = 'linux'

As the Lua developers emphasize, tables are not really variables or types but dynamic objects, whose values you can only reference in your program.

This sounds more complicated than it is. It rarely matters when programming, but you should keep in mind that tables are not copied on assignment; after an assignment, both variables refer to the same table:

x = {}
x['os'] = "linux"
y = x
print(y['os'])
linux

Note the content of y['os'] is the string linux, as the output from the print statement shows. To save the programmer some typing, Lua also provides a shorthand notation for specifying the key:

print(y.os)
linux
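Because both names refer to one table, a change made through one name is visible through the other. The sketch below shows this and one common way around it; shallow_copy is a hypothetical helper written for this example, not a function provided by Lua.

```lua
x = {os = "linux"}
y = x                  -- y refers to the same table as x
y.os = "bsd"           -- so this change is visible through x as well
same_table = (x == y)  -- tables compare by identity

-- shallow_copy is a hypothetical helper: it creates a new table and
-- copies every key-value pair (nested tables are still shared).
function shallow_copy(source)
   local copy = {}
   for key, value in pairs(source) do
      copy[key] = value
   end
   return copy
end

z = shallow_copy(x)
z.os = "linux"         -- changes only the copy

print(x.os, y.os, z.os)
print(same_table, x == z)
```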

Tables also allow programmers to implement arrays by using consecutive integers as keys. In principle, arrays can thus begin at any value (e.g., 0 or 1). Following Lua convention, however, they start with 1. Two-dimensional data structures such as matrices are created by defining a table that contains tables.
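As a minimal sketch of both ideas, the following script builds an array and a small matrix. The length operator # used here is not introduced in the text above; it returns the number of elements in the array part of a table. The sample data is made up for illustration.

```lua
-- An array is a table whose keys are the integers 1, 2, 3, and so on.
distros = {"debian", "fedora", "arch"}
first_distro = distros[1]   -- indexing starts at 1 by convention
count = #distros            -- number of elements in the array

-- Appending means assigning to the next free index.
distros[#distros + 1] = "gentoo"

-- A matrix is a table of tables: build a 3x3 identity matrix.
matrix = {}
for row = 1, 3 do
   matrix[row] = {}
   for col = 1, 3 do
      matrix[row][col] = (row == col) and 1 or 0
   end
end

print(first_distro, count, #distros)
print(matrix[2][2], matrix[2][3])
```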

First Class: Functions

To structure programs, Lua provides functions, which are values of a distinct type and can therefore be stored in variables. Functions are defined with the function keyword followed by the function name and a list of parameters in parentheses. The function body follows and is terminated by the end keyword.

To define a variable number of parameters, Lua uses the construct ..., which can be confusing in examples because it looks like code has been omitted. For example, select(x, ...) returns all arguments from the xth onward, so its first result is the xth argument, and select('#', ...) returns the number of arguments actually passed in. Alternatively, the statement args = {...} collects all the arguments in the args table.

The Lua interpreter does not complain when a function expects three parameters but only two are provided in a call; the missing parameter is simply set to nil. An idiom for emulating default values for parameters that were not passed looks like:

function f(a, b, c)
   local a = a or 0
   ...

The local variable a is thus given the value of the parameter a if present, and otherwise a value of 0. Because you can store functions in variables and pass them into other functions, you can also construct higher order functions. Lua therefore lends itself to programming in a functional style, which is back in fashion thanks to languages like Scala, Clojure, and even JavaScript (in the form of Node.js). For concurrent tasks, Lua does not use threads but coroutines, which hand over control to one another explicitly and are therefore less prone to error.
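The following sketch shows these function features side by side. All function names (sum, greet, twice, and so on) are invented for this example; select, coroutine.wrap, and coroutine.yield are part of the standard library.

```lua
-- Variadic function: sums any number of arguments.
function sum(...)
   local total = 0
   for i = 1, select('#', ...) do
      -- select(i, ...) returns the arguments from position i onward;
      -- the extra parentheses keep only the first of them.
      total = total + (select(i, ...))
   end
   return total
end

-- Default value for a missing parameter.
function greet(name)
   local name = name or "world"
   return "hello " .. name
end

-- Higher order function: takes a function and returns a new one
-- that applies it twice.
function twice(f)
   return function(x)
      return f(f(x))
   end
end

add_three = function(x) return x + 3 end
add_six = twice(add_three)

-- Coroutine: the function body pauses at every yield and resumes
-- where it left off when counter is called again.
counter = coroutine.wrap(function()
   for i = 1, 3 do
      coroutine.yield(i)
   end
end)
c1, c2, c3 = counter(), counter(), counter()

print(sum(1, 2, 3), greet(), greet("Lua"), add_six(1))
print(c1, c2, c3)
```

One caveat about the or idiom: because false also counts as false, it replaces an explicitly passed false with the default, so it is best suited to parameters that are never Boolean.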

Loops

The control structures in Lua are essentially the same as in other popular programming languages. An if statement can contain multiple elseif branches and an else block. A while statement always checks a condition at the beginning and executes the block as long as the condition is satisfied. A repeat block works the other way around: It executes the block at least once and runs until the condition stated at the end of the block (after the until keyword) is fulfilled.
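A brief sketch illustrates the difference between the two loop types and the if statement. The variable names and the classify function are made up for this example.

```lua
-- while: the condition is tested before each pass.
n = 0
while n < 3 do
   n = n + 1
end

-- repeat ... until: the body runs at least once, and the loop stops
-- as soon as the condition after until is true.
m = 10
repeat
   m = m + 1
until m > 3

-- if with elseif and else branches.
function classify(x)
   if x < 0 then
      return "negative"
   elseif x == 0 then
      return "zero"
   else
      return "positive"
   end
end

print(n, m)
print(classify(-5), classify(0), classify(7))
```

Although m already exceeds 3 before the repeat loop starts, the body still runs once, which is why m ends up as 11.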

A for loop can run over a range of numbers or use an iterator function. The built-in functions pairs and ipairs create such iterators for tables: pairs visits all key-value pairs, whereas ipairs visits the consecutive integer keys of an array in order. The following code iterates over an array using a for loop:

tbl = {"a", "b", "c"}
for key, value in ipairs(tbl) do
   print(key, value)
end

A numeric for loop runs over a range of numbers with an optional step size: for i = 1, 5 iterates over the integers from 1 to 5, whereas for i = 1, 10, 2 uses steps of 2.
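The loop variants can be compared in a small sketch. The table names and the inventory data are invented for illustration; the negative step is an additional case not mentioned above.

```lua
-- Numeric for with the default step of 1.
up = {}
for i = 1, 5 do
   up[#up + 1] = i
end

-- A third value sets the step size.
odd = {}
for i = 1, 10, 2 do
   odd[#odd + 1] = i
end

-- A negative step counts down.
down = {}
for i = 3, 1, -1 do
   down[#down + 1] = i
end

-- pairs visits every key of a table, in no guaranteed order,
-- so this example only adds up the values.
inventory = {apples = 3, pears = 5}
total = 0
for name, amount in pairs(inventory) do
   total = total + amount
end

print(table.concat(up, ","))
print(table.concat(odd, ","))
print(table.concat(down, ","))
print(total)
```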

A break statement terminates the loop, and the program continues with the code after it. Strangely, Lua lacks the continue statement found in other programming languages, which skips the rest of the loop body and proceeds with the next iteration. As of Lua 5.2, it has to be simulated in a somewhat roundabout way using a goto:

for i = 1, 10 do
   if i % 2 == 0 then goto continue end
   print(i)
   ::continue::
end

A goto label is, as seen here, enclosed by two double colons. To simulate the continue statement, the marker occurs directly before the loop end.

Rocks

Basically, I’ve covered the main language features in Lua, which can already achieve quite a lot. For further reading and a quick reference for syntax and functions, check out the language reference.

A programming language is of little value without a healthy ecosystem, and the LuaRocks module repository fulfills this purpose. It is installed from source in a jiffy but is also included in most Linux distributions. A call to:

luarocks search <search term>

searches in the repositories, and

luarocks install <package>

installs it locally. Root privileges are needed if the location for the packages is only writable by the superuser.

Table 2 shows a selection of useful extensions available from the LuaRocks repository.

Unfortunately, not all Lua libraries are available on LuaRocks; examples include modules for LDAP and bindings for newer projects such as libguestfs and the Augeas configuration API. Thanks to the widespread use of Lua, you will have no shortage of programming tools. If you want to use something other than vi or Emacs for your development work, you can turn to a number of graphical development environments (e.g., ZeroBrane; Figure 2) that are available for Linux, Windows, and OS X and that cost as much as you are willing to pay.

Figure 2: Users can pay as much as they like for the ZeroBrane IDE.

Lua plugins are also available for the major Java IDEs, such as Eclipse, NetBeans, and IntelliJ. The ZeroBrane website offers many tutorials, such as the one for debugging Wireshark scripts.

The Codea IDE is interesting in that it implements a Lua development environment on the iPad (Figure 3).

Figure 3: Codea is a Lua development environment on the iPad. (twolivesleft.com)

The video on the Codea page is worth watching; it shows how a development environment could support the developer with selectors for colors, sounds, files, and so forth, depending on the data type.

Info

[1] “NSE: Nmap Scripting Engine” by Ron McCarty, ADMIN, 2011, issue 06, pg. 72
[2] “Lua for Apache” by Tim Schürmann, ADMIN, issue 09, pg. 42