Laying out documents with Markdown and Pandoc
In Shape
Text editing programs such as Microsoft Word, Apple Pages, and the open source LibreOffice suite can be used to create documentation and instructions. All of these programs mix different elements of a document together. For example, good documentation or a good manual does not consist of just plain text; it also contains images, graphics, tables, lists, and other elements. As a result, when writing, you must pay attention not only to the content of the document but also to arranging the individual elements neatly in the layout.
Unsurprisingly, this what-you-see-is-what-you-get (WYSIWYG) principle is very popular. Another type of tool maintains a division between the content of a document and the layout, as is common in web development. No one would think of mixing the layout of a web page with the page content. Instead, a style sheet is used for the layout, typically available as a CSS (cascading style sheet) file [1]. To change the layout of the web page, you only need to modify the CSS, without having to change any of the page content.
TeX, LaTeX, and Markdown
If you prefer a logical division between content and layout, you can always use text typesetting systems such as TeX [2] or LaTeX [3]. However, these tools are complex, with syntax that is not necessarily intuitive, and require a certain period of familiarization. LaTeX is, strictly speaking, only a collection of macros that are intended to simplify the use of TeX.
Markdown [4] is far easier to use. This simple markup language can be learned, and therefore also used, by anyone in a short amount of time. Another advantage is that you can write a Markdown document in any text editor – you do not need a special program. For an easy introduction to the syntax of the language, I recommend you take a look at the Markdown Guide [5].
To generate a finished document from a Markdown text, you can use the Pandoc [6] tool. The software comprises a library written in Haskell and a command-line tool. These aids let you convert a variety of different text formats into other formats (e.g., Markdown into HTML, ePub, or PDF). The tool also supports formats such as DOCX and PPTX (Microsoft Word and PowerPoint).
These conversions are made possible by the use of a reader and writer, which are available for each format. A reader converts the text into an abstract syntax tree (AST) before a writer then transfers the elements from the AST into the appropriate target format. In some cases, the text has to go through an intermediate step. For example, to convert Markdown to PDF, a LaTeX version is first created, which is then converted to the PDF format with the pdflatex tool.
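The reader and writer stages can be observed from the command line. The following is a minimal sketch, assuming a scratch file named sample.md (a hypothetical name, not from the article) and a pandoc binary on the PATH. The native output format prints the AST that the reader produced, and writing to a .tex file exposes the intermediate LaTeX step before it is handed to pdflatex.

```shell
#!/bin/sh
# Sketch only: sample.md is a hypothetical scratch file.
printf '# Hello\n\nSome *emphasized* text.\n' > sample.md

if command -v pandoc >/dev/null 2>&1; then
    # Reader stage: print the abstract syntax tree in pandoc's native format.
    pandoc -f markdown -t native sample.md

    # Writer stage for the intermediate step: a standalone LaTeX document.
    pandoc -s -f markdown -t latex sample.md -o sample.tex

    # Final step, only if a TeX environment is installed: LaTeX to PDF.
    if command -v pdflatex >/dev/null 2>&1; then
        pdflatex -interaction=nonstopmode sample.tex >/dev/null \
            || echo "pdflatex failed; a LaTeX package may be missing" >&2
    fi
else
    echo "pandoc not found; skipping conversion" >&2
fi
```

Running pandoc with an output file ending in .pdf performs the last two steps in one go; splitting them up as shown here is mainly useful for inspecting or debugging the generated LaTeX.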
Installing Pandoc
The Pandoc software is available in the repositories of almost all Linux distributions. You will also find installation archives for Linux, Windows, and macOS on the Pandoc project site [6]. On macOS, you can also install the software with the Homebrew package manager (brew). Note that you need a TeX environment if you want to convert text to PDF. On macOS, this is easily done with Homebrew by installing the basictex package. For Windows, the open source MiKTeX [7] distribution of TeX is available.
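As a rough sketch of what the installation looks like in practice, the following script checks whether pandoc is already available and otherwise prints a suggested command. The install_hint function is a hypothetical helper, and the Linux line assumes a Debian-style system with apt; other distributions use their own package manager.

```shell
#!/bin/sh
# Sketch only: install_hint is a hypothetical helper; the apt line assumes
# a Debian-style distribution.
install_hint() {
    case "$(uname -s)" in
        Darwin) echo "brew install pandoc && brew install --cask basictex" ;;
        Linux)  echo "sudo apt install pandoc (or your distribution's equivalent)" ;;
        *)      echo "download an installer from the Pandoc project site" ;;
    esac
}

if command -v pandoc >/dev/null 2>&1; then
    # Already installed: show which version is on the PATH.
    pandoc --version | head -n 1
else
    echo "pandoc not found, try: $(install_hint)"
fi
```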
Web Pages in Markdown
In the following examples, I look at a couple of practical uses of Pandoc. The first task is to convert a Markdown text into (X)HTML so that you can publish it on a web page:
pandoc -M title="pandoc" -s foo.md -o foo.html
The -M option lets you define arbitrary metadata for the output format; in this case, the title of the web page. The -s (--standalone) option produces a complete document with a header and footer rather than a fragment, which means the stylesheet information is also contained in the header of the HTML file. Listing 1 shows a simple Markdown text that produces the output in Figure 1.
Listing 1
Simple Markdown
# Hello IT administrator

Using pandoc, you can export texts to a variety of different formats. The supported formats include, for example.

- html
- pdf
- epub
- docx
- pptx

Check the **pandoc** help page by calling `man pandoc` to view *all* supported formats.
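To try the HTML conversion end to end, the listing can be written to foo.md from the shell and converted in one go. This is a sketch under the assumption that pandoc is on the PATH; the final grep is only an illustrative check that the metadata title ended up in the HTML header.

```shell
#!/bin/sh
# Write the Markdown from Listing 1 to foo.md. The quoted 'EOF' disables
# shell expansion, so the backticks around `man pandoc` stay literal.
cat > foo.md <<'EOF'
# Hello IT administrator

Using pandoc, you can export texts to a variety of different formats. The supported formats include, for example.

- html
- pdf
- epub
- docx
- pptx

Check the **pandoc** help page by calling `man pandoc` to view *all* supported formats.
EOF

if command -v pandoc >/dev/null 2>&1; then
    # Same command as in the article: standalone HTML with a title.
    pandoc -M title="pandoc" -s foo.md -o foo.html
    # Illustrative check: show the <title> line of the generated header.
    grep -i '<title>' foo.html
else
    echo "pandoc not found; foo.md was written but not converted" >&2
fi
```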
The situation is very similar if you want to convert the Markdown text into a PDF:
pandoc -s foo.md -o foo.pdf
Pandoc tries to detect the input and output formats by referencing the file extensions. However, you can also specify the formats explicitly:
pandoc -f markdown -t pdf foo.md -o foo.pdf
(i.e., with -f/--from and -t/--to).
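If you are unsure which format names these options accept, pandoc can list them itself. The sketch below assumes that foo.md exists in the current directory; the from and to shell variables are illustrative names, not part of pandoc.

```shell
#!/bin/sh
# Sketch only: the variable names are illustrative.
from=markdown
to=html

if command -v pandoc >/dev/null 2>&1; then
    # Names accepted by -f/--from and -t/--to, respectively.
    pandoc --list-input-formats
    pandoc --list-output-formats

    # Long-option form with explicit formats.
    if [ -f foo.md ]; then
        pandoc --from="$from" --to="$to" foo.md --output=foo.html
    fi
else
    echo "pandoc not found" >&2
fi
```

Stating the formats explicitly is mainly useful when the file extension is missing or misleading, for example when a Markdown file is named README.txt.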