Working with Pandoc

Pandoc is the “universal document converter.” It can take a myriad of input formats and produce an equally large variety of outputs, which is great for writing with Markdown!

For example, you may write an article in Markdown and use Pandoc to:

  • convert it into a DOCX when that is the required final submission format, or for a collaborator
  • export a high-quality print PDF (using LaTeX)
  • generate an HTML version to paste into a Canvas course

Alternatively, you can take other formats and convert to Markdown:

  • convert a DOCX into Markdown for better editing, use with LLMs, or adding to a web project
  • convert an HTML file into easier-to-edit Markdown

The Pandoc User’s Guide documents the huge range of options; however, you will probably end up using only a handful of commands.

Getting Started

Check the installation documentation for how to set Pandoc up on your computer, or see the notes in the Resource section.

Download the “markdown-demo.md” file to provide some demo content.

Open a Terminal

Pandoc is a command-line application, so to use it you will need to open a terminal:

  • Windows: open the Start menu and search for “Git Bash”, “PowerShell”, or “CMD”
  • Mac: Applications > Utilities > Terminal, or launch Spotlight (Command + spacebar) and type “Terminal”
  • Linux: most likely called “Terminal”

In the terminal window, type pandoc --version and press Enter. If installed correctly, this should output a version number and some information!

In the terminal window, navigate to your Downloads directory (most likely with cd Downloads) so that you can try some commands on “markdown-demo.md”.
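The version check above can also be written as a small guard, which is handy at the top of a script. This is only a sketch: the variable name pandoc_status is made up for illustration.

```shell
# Report whether pandoc is on the PATH before trying any conversions.
if command -v pandoc >/dev/null 2>&1; then
  pandoc_status="found"
  pandoc --version | head -n 1   # the first line names the installed version
else
  pandoc_status="missing"
  echo "pandoc not found: check the installation notes" >&2
fi
echo "pandoc is $pandoc_status"
```

If this reports "missing" even though Pandoc is installed, closing and reopening the terminal is usually worth a try, since a new terminal picks up changes to the PATH.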

Convert a File

The basic anatomy of a Pandoc command looks like:

pandoc + input file name + some option flags + -o + output file name

For example: pandoc markdown-demo.md -o markdown-demo.html

Pandoc will use the extensions of the input and output file names to guess the markup format. However, the formats can be specified if necessary, using “from” -f and “to” -t options. For example, pandoc test.md -f markdown -t html -o test.html.
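When there are no file names to guess from, the -f and -t options become necessary. As a small sketch: Pandoc reads standard input when no input file is given and writes to standard output when -o is omitted, so a snippet can be converted without creating any files. The sample text and the html variable are made up for illustration.

```shell
# No input file: pandoc reads Markdown from standard input.
# No -o option: the HTML fragment goes to standard output.
html=$(printf '# Hello\n\nSome *emphasis*.\n' | pandoc -f markdown -t html)
printf '%s\n' "$html"
```

This is a quick way to check how a piece of markup will be converted before running Pandoc on a whole document.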

To Office Docs

Pandoc is good at converting Markdown to DOCX and ODT (LibreOffice) formats:

  • Convert to DOCX: pandoc markdown-demo.md -o markdown-demo.docx
  • Convert to ODT: pandoc markdown-demo.md -o markdown-demo.odt

However, if you have a standard Markdown file and convert it to DOCX or ODT, you might be surprised to see images with captions in the resulting document. Pandoc’s Markdown flavor treats image markup differently than most web-oriented Markdown flavors. In CommonMark and GitHub Flavored Markdown, image markup looks like ![alt text](image.jpg). Pandoc enables the implicit_figures extension by default, which turns an image that stands alone in its own paragraph into a figure and uses the alt text as the caption. To add a different alt text (which is best practice in most cases), use the following syntax:

![figure caption](image.png){alt="description of image"}

To avoid captions (if you don’t want them!), the best option is to specify the “from” format, such as the typical GitHub Flavored Markdown (GFM), with -f gfm. For example:

pandoc -f gfm markdown-demo.md -o markdown-demo.docx
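If you would rather keep the rest of Pandoc's Markdown features, another route is to switch off only the implicit_figures extension by appending -implicit_figures to the format name. The sketch below writes HTML so the difference is easy to inspect in a text editor; the same -f value should work for DOCX and ODT output. The file names are made up for illustration, and diagram.png does not need to exist for HTML output.

```shell
# Sample input: a paragraph that contains nothing but an image.
printf '![A small diagram](diagram.png)\n' > figure-demo.md

# Default Pandoc Markdown: the image becomes a captioned figure.
pandoc figure-demo.md -t html -o with-caption.html

# Same reader with implicit_figures disabled: a plain inline image, no caption.
pandoc -f markdown-implicit_figures figure-demo.md -t html -o no-caption.html

cat with-caption.html no-caption.html
```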

To Markdown

Pandoc is also good at converting to Markdown from other formats, so you can edit Markdown instead of some more cumbersome format, or get simpler text data for use in code and LLM projects. Keep in mind that any complex style formatting will be discarded, which can be helpful in a lot of cases too!

The basic version, pandoc example.docx -o example.md, will generate a Pandoc-flavored Markdown document with hard wrapping and escaping. It might look more complicated than the minimal styles explained in this workshop and seen in web-oriented flavors.

To keep it simpler you can use a “to” format option, like -t gfm. Wrap can be turned off using the option --wrap=none. For example:

pandoc example.docx -t gfm --wrap=none -o example.md
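The same options can be applied to a whole folder with a shell loop. This is a sketch under a few assumptions: the first two lines only build a stand-in example.docx so the loop has something to convert, and --extract-media is an optional extra that writes any embedded images to disk under the given directory and links to them from the Markdown.

```shell
# Build a small DOCX to work with (a stand-in for a real Word file).
printf '# Report\n\nA paragraph with **bold** text.\n' > sample-source.md
pandoc sample-source.md -o example.docx

# Convert every .docx in the current directory to GFM without hard wrapping.
for f in *.docx; do
  pandoc "$f" -t gfm --wrap=none --extract-media=. -o "${f%.docx}.md"
done

cat example.md
```

The ${f%.docx} expansion strips the .docx ending, so each output file keeps the base name of its input.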

Generate a PDF

Creating a PDF with Pandoc requires a LaTeX installation. Pandoc converts the document into LaTeX, then uses a LaTeX typesetting engine to output the PDF. The first time you create a PDF, your LaTeX distribution’s package manager will probably pop up several times asking you to install new packages, so your first PDF might take a while!

In the terminal, type: pandoc test.md -o test.pdf

The result should be a decent looking PDF (optimized for print). Try adding a table of contents with the --toc option:

pandoc --toc test.md -o test.pdf

To start tweaking your PDF layout, you can pass LaTeX variables to Pandoc using the -V variable flag (see docs for all LaTeX variable options). If you know LaTeX, you can get fancy right away and use existing templates. However, you can also use a few very simple options to spruce up the defaults. For example, the default margin is very large, so you might want to use:

pandoc -V geometry=margin=1.25in test.md -o test2.pdf

This will use the LaTeX package geometry to set all the margins to 1.25 inches.

Now try a new font:

pandoc -V fontfamily="electrum" -V geometry=margin=1.25in -o test3.pdf test.md

Font size can be controlled using -V fontsize=12pt; however, the default document class only supports 10pt, 11pt, or 12pt. More sizes (8pt, 9pt, 10pt, 11pt, 12pt, 14pt, 17pt, 20pt) are supported by the extsizes package, which you can use by adding -V documentclass=extarticle plus the desired fontsize.
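To check what a -V option actually does, one approach is to ask Pandoc for the intermediate LaTeX file instead of the PDF and look at its preamble. The sketch below assumes Pandoc's default LaTeX template; the file names are made up for illustration.

```shell
# Sample input for the size test.
printf '# Large print\n\nBody text set at 14pt.\n' > size-demo.md

# -s (standalone) writes a complete LaTeX document, preamble included.
pandoc -s -V documentclass=extarticle -V fontsize=14pt size-demo.md -o size-demo.tex

# The chosen class and size should both show up in the \documentclass declaration.
grep -e 'extarticle' -e '14pt' size-demo.tex
```

Changing the output name to size-demo.pdf (the -s flag is then unnecessary) produces the PDF itself, provided the extsizes package is installed in your LaTeX distribution.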

YAML Metadata

Rather than setting all these options on the command line, you can simplify by declaring the LaTeX options as metadata at the top of the document instead. Pandoc uses a YAML metadata block for document features such as title, author, and abstract, as well as configuration variables for outputs.

YAML is, in theory, a human-readable plain-text data format. It is added to the top of a document, sandwiched between two lines of three hyphens (---), one above and one below. Variables are mostly key-value pairs.

For example, try adding a metadata block to your test.md, then generate a new PDF:

---
title: Test Document
author: A. Great Writer
date: 2020-02-20
abstract: "This test document explores all you need to know about Markdown and Pandoc. Our findings suggest it is possible to create PDFs from Markdown using Pandoc."
---

Add Pandoc LaTeX variables in the same way:

---
title: Test Document
author: A. Great Writer
geometry: margin=1in
documentclass: extarticle
fontfamily: accanthis
fontsize: 14pt
colorlinks: true
---
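If several documents should share the same layout settings, the metadata can also live in a separate YAML file that is passed with the --metadata-file option. A minimal sketch, with made-up file names; the output is written as .tex here only so the effect can be checked without a full LaTeX run.

```shell
# Shared settings; a separate metadata file does not need the --- lines.
cat > pdf-settings.yaml <<'EOF'
geometry: margin=1in
documentclass: extarticle
fontsize: 14pt
colorlinks: true
EOF

# A document with no metadata block of its own.
printf '# Notes\n\nThe layout settings live in pdf-settings.yaml.\n' > notes.md

pandoc -s --metadata-file=pdf-settings.yaml notes.md -o notes.tex
grep -e 'extarticle' -e 'margin=1in' notes.tex
```

Changing the output name to notes.pdf builds the PDF with the same settings.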

Next Steps

An equation can appear inline in the sentence, like $E=mc^2$,
or displayed on its own line:

$$ x^n + y^n = z^n $$
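PDF output typesets this math through LaTeX. For HTML output, one option is the --mathjax flag, which marks up the formulas so that MathJax can render them in the browser. A small sketch with made-up file names and title:

```shell
# Sample input with one inline and one displayed formula.
printf 'Inline math: $E=mc^2$.\n\n$$ x^n + y^n = z^n $$\n' > math-demo.md

# -s makes a complete HTML page; --mathjax links the MathJax script into it.
pandoc -s --mathjax --metadata title="Math demo" math-demo.md -o math-demo.html

# Each formula is wrapped in a span marked as inline or display math.
grep -o 'class="math [a-z]*"' math-demo.html
```

Opening math-demo.html in a browser with an internet connection should show both formulas typeset.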