Reproducible Publications with Julia and Quarto

J.J. Allaire (RStudio, PBC)

JuliaCon 2022

Overview

  • Quarto Basics

  • Scientific Markdown

  • Output Formats

  • Quarto and Julia

Quarto Basics

What is Quarto?

https://quarto.org

Quarto is an open-source scientific and technical publishing system that builds on standard markdown with features essential for scientific communication.

  • Computations: Python, R, Julia, Observable JS
  • Markdown: Pandoc w/ many enhancements
  • Output: Documents, presentations, websites, books, blogs

Literate programming system in the tradition of Org-Mode, Sweave, Weave.jl, R Markdown, iPyPublish, Jupyter Book, etc.

Origins

  • Open source project sponsored by RStudio, PBC.
  • 10 years of experience with R Markdown (a similar system that was R-specific) convinced us that the core ideas were sound.
  • The number of languages and runtimes used for scientific discourse is very broad.
  • Quarto is a ground-up re-imagining of R Markdown that is fundamentally multi-language and multi-engine.

Goal: Computational Documents

  • Documents that incorporate the source code required for their production
  • Notebook and plain text flavors
  • Automation and reproducibility

Goal: Scientific Markdown

Goal: Single Source Publishing

https://coko.foundation/articles/single-source-publishing.html

Simple Example

Markdown document with cross references and executable code blocks:

---
title: "Plots Demo"
author: "Norah Jones"
date: "5/22/2021"
jupyter: julia-1.7
---

## Parametric Plots

Plot function pair (x(u), y(u)). See @fig-parametric for an example.

```{julia}
#| label: fig-parametric
#| fig-cap: "Parametric Plot"
#| echo: false

using Plots
plot(sin, x->sin(2x), 0, 2π, leg=false, fill=(0,:lavender))
```

Can be rendered to dozens of output formats (via Pandoc):

quarto render hello.qmd --to html
quarto render hello.qmd --to pdf
quarto render hello.qmd --to docx
quarto render hello.qmd --to pptx

Rendered Output

HTML

PDF

Rendered Output

MS Word

PowerPoint

Code Cells

Special syntax indicates that a code block is executable:

```{julia}
#| echo: false

using Plots
plot(sin, x->sin(2x), 0, 2π, leg=false, fill=(0,:lavender))
```

Code is executed and its output is included in the document.

Some useful options for code cells:

Option Description
echo Control whether source code is displayed.
output Control whether output is displayed.
warning Control whether warnings are displayed.
code-fold Fold code using HTML <details> tag.
layout Layout multiple plots side by side.

Rendering Pipeline

Plain text workflow (.qmd => .ipynb then execute cells):

Notebook workflow (no execution occurs by default):

Scientific Markdown

Brief History

Citations

Pandoc includes robust support for citations and bibliographies in a wide variety of formats including BibTeX, CSL, and RIS.

Markdown Syntax Output
Blah Blah [see @knuth1984, pp. 33-35;
also @wickham2015, chap. 1]
Blah Blah (see Knuth 1984, 33–35; also Wickham 2015, chap. 1)
Blah Blah [@knuth1984, pp. 33-35,
38-39 and passim]
Blah Blah (Knuth 1984, 33–35, 38–39 and passim)
Blah Blah [@wickham2015; @knuth1984].
Blah Blah (Wickham 2015; Knuth 1984).
Wickham says blah [-@wickham2015]
Wickham says blah (2015)

More than 10,000 citation output styles supported via CSL (Citation Style Language):

---
title: "My ARticle"
bibliography: references.bib
csl: nature.csl
---

Cross References

Cross reference figures, tables, equations, sections, theorems, etc.

Markdown: https://quarto.org/docs/authoring/cross-references.html#subfigures

Julia cell: https://quarto.org/docs/computations/julia.html#code-blocks

Figure/Layout Panels

  • Arbitrary layout of figures and tables:

    ::: {layout="[[40,-20,40], [100]]"}
    ![Surus](surus.png)
    ![Hanno](hanno.png)
    ![Lin Wang](lin-wang.png)
    :::
  • Shorthand syntax (layout-ncol or layout-nrow) available for simple cases:

    ::: {layout-ncol="2"}
    Block 1
    
    Block 2
    :::

Also works for code chunks that produce figures and tables: https://quarto.org/docs/authoring/figures.html#computations

Callouts

Work in HTML, PDF, MS Word, Revealjs, and ePub output

Advanced Page Layout

Portable markdown syntax for advanced grid/column layout in HTML and PDF

  • Figures and tables that span the entire page
  • Use of margin for figures, equations, captions, asides, footnotes, and citations
  • Responsive show/hide of site navigation and TOC

Docs: https://quarto.org/docs/authoring/article-layout.html

Diagrams

Quarto has native support for embedding Mermaid and Graphviz diagrams.

```{mermaid}
flowchart LR
  A[Hard edge] --> B(Round edge)
  B --> C{Decision}
  C --> D[Result one]
  C --> E[Result two]
```

Output Formats

Documents

Pandoc supports a huge array of output formats, all of which can be used with Quarto.

  • HTML
  • PDF
  • MS Word
  • Open Office
  • JATS (Journal Article Tag Suite)
  • ConTeXt
  • RTF
  • AsciiDoc
  • Markdown (gfm, hugo, etc.)

Example: GitHub Languages Analysis (PDF)

Presentations

Quarto supports a variety of formats for creating presentations, including:

  • revealjs — reveal.js (HTML)
  • pptx — PowerPoint (MS Office)
  • beamer — Beamer (LaTeX/PDF)

Revealjs supports many advanced features including speaker notes, animations, custom backgrounds, and printing to PDF.

Example: JunctionTrees.jl: Efficient Bayesian Inference in Discrete Graphical Models

Websites

  • Convenient way to publish groups of documents.

  • Arbitrary content depth / organization

  • Multi-level navigation (navbar / sidebar / hybrid)

  • Full text search (client side or Algolia)

Example: Julia Workshop for Data Science

Example: Quarto Website

Books

  • Inherit features of Quarto websites (navigation, search, etc.)

  • Support cross references across chapters

  • Formats include HTML, PDF, MS Word, and ePub

Example: Embrace Uncertainty: Mixed-effects models with Julia

Blogs

  • Inherit features of Quarto websites (navigation, search, etc.)

  • Automatically generated listing and RSS feed

  • Customizable about page

Example: Patrick Altmeyer’s Blog

Quarto and Julia

Prerequisites

Quarto executes Julia computations using its Jupyter engine (so Jupyter itself is a prereq):

pip3 install jupyter

The IJulia kernel is required, and Revise.jl recommended:

julia> using Pkg
julia> Pkg.add("IJulia")
julia> Pkg.add("Revise")

Basic Workflow

Rendering:

# plain text qmd
quarto render julia.qmd 
quarto render julia.qmd --to pdf

## ipynb notebook
quarto render julia.ipynb
quarto render julia.ipynb --to docx

Preview:

# plain text qmd
quarto preview julia.qmd 
quarto preview julia.qmd --to pdf

## ipynb notebook
quarto preview julia.ipynb
quarto preview julia.ipynb --to docx

IJulia

  • The IJulia Jupyter kernel is bound automatically when {julia} executable cells are present. A specific version of the kernel can be bound in document options:

    jupyter: julia-1.7
  • IJulia executes Julia code and transforms it to plain text, graphics, markdown, HTML, etc.

  • For interactive sessions, Quarto keeps the Jupyter kernel resident as a daemon to mitigate long startup time.

  • Revise.jl is (optionally) used to sync changes to dependent files/packages that occur while the daemon is running.

IJulia Figures

  • Quarto provides default figure sizes and formats (e.g. PNG vs. SVG vs. PDF) that are format dependent (e.g. HTML, PDF, MS Word, PowerPoint, etc.)

  • These are propagated to Julia in a setup cell that sets global defaults for some plotting packages (currently Plots and CairoMakie)

  • Better approach would be if Julia plotting packages could automatically detect when they are running in Quarto and set their defaults accordingly

Performance Considerations

  • Startup performance — How long does it take to load the interpreter + packages and how often do I need to do that?

  • Rendering performance — How expensive are my computations and how frequently do I need to run them?

Startup Performance

  • For interactive sessions, Jupyter kernels are kept alive to mitigate startup costs (hello.qmd example from earlier takes ~ 30 seconds on first run but less than 0.5 seconds on subsequent runs within the same session)

  • This is convenient but creates the problem of stale code (e.g. referenced files are changed, packages are updated, etc.). The solution to this is Revise.jl, a library that helps you keep your Julia sessions running longer, reducing the need to restart when you make changes to code.

Revise.jl

Install Revise.jl:

julia> Pkg.add("Revise")


Configure to launch automatically within IJulia:

.julia/config/startup_ijulia.jl

try
  @eval using Revise
catch e
  @warn "Revise init" exception=(e, catch_backtrace())
end

Rendering Performance

  1. Authoring in .ipynb enables you to control exactly when code execution occurs (and cache the results in the the notebook)

  2. Jupyter Cache provides transient caching of cell outputs for a document (if any of the cells in the document change then all of the cells will be re-executed).

  3. Quarto’s Freeze feature enables you to permanently save and re-use computational outputs.

Why IJulia?

  • Aligned with architectural investments we had already made for Jupyter (e.g. kernel daemonization, execution result caching)

  • IJulia.display function provides support for MIME outputs from Julia results (including raw LaTeX/HTML)

  • Ability to author in various popular notebook front ends (JupyterLab, VS Code, etc.)

Other literate programming systems (Pluto, Neptune, etc.) could certainly be integrated as an alternative to IJulia.

Tools: VS Code

https://marketplace.visualstudio.com/items?itemName=quarto.quarto

  • Render command with integrated preview pane
  • Syntax highlighting for markdown and embedded languages
  • Completion for embedded languages (e.g. Python, R, Julia, LaTeX, etc.)
  • Completion and diagnostics for project files and document/cell options
  • Commands and key-bindings for running cells and selected line(s)
  • Live preview for embedded mermaid and graphviz diagrams
  • Assist panel for contextual help, image preview, and math preview
  • Code snippets for common markdown constructs
  • Code folding and document outline for navigation within documents
  • Workspace symbol provider for navigation across project files

Tools: Jupyter Lab

Tools: Text Editors

Thank You!

Knuth, Donald E. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.
Wickham, Hadley. 2015. R Packages. 1st ed. O’Reilly Media, Inc.