ASTs, Markdown and MDX

by Leigh Halliday

Published: February 27, 2020 (7 min read)

Markdown for documents, React for interaction, MDX for both! But how do Markdown and MDX arrive at HTML and JSX? The answer is Abstract Syntax Trees.

Markdown is the perfect format for writing documents, documentation, blog posts, static content, and more. React on the other hand is great for building interactive interfaces. That said, have you ever tried writing a blog post in React/HTML? There's a reason Markdown exists! But what if you want to add some interactive elements to a Markdown document? Maybe an embedded YouTube video or maybe a chart that pulls in some dynamic data? Or maybe a form to collect some contact information on a sales page?

MDX gives you the best of both worlds. Write your documents in Markdown, but feel free to import and use React components right there inside of your document. Beautiful.

In this article we're going to go beyond surface level and dive into some of the inner workings of Markdown and MDX. How does a file with Markdown get converted into HTML, and how does MDX get converted into JSX?

We are going to explore Abstract Syntax Trees (AST) and what Markdown and MDX have to do with them. The code samples in this article can be found here.

MDX Real-World Usage (A Warning)

The examples in this article are meant to provide a glimpse of what MDX is doing behind the scenes and what ASTs are like and used for. If you'd like to use MDX in Gatsby, Next.js, or Create React App, the MDX website provides examples and documentation on how to easily use it within your app.

Syntax Trees

The ability to view code as data - rather than simply some text in a file - opens up a world of possibilities. Take Prettier for example. It is able to take some poorly formatted JavaScript or Markdown and give you something nicely formatted in return. You may think the conversion goes from ugly Markdown directly to formatted Markdown, but the key to this process is the intermediary step, a data structure called an Abstract Syntax Tree (AST).

Think of what you can produce with a Markdown file. Yes, you can produce HTML, but you can also produce formatted Markdown (like what Prettier does), check it for linter errors, or count how many words are in it, among other things.

Markdown -> AST -> HTML
Markdown -> AST -> Formatted Markdown
Markdown -> AST -> Lint Errors
Markdown -> AST -> Word Counts

It is with ASTs that MDX is able to combine Markdown and React so beautifully.

Abstract Syntax Trees in Action

To see ASTs in action, let's look at this small Markdown example with a Level 1 Heading and a Paragraph:

# Welcome

A paragraph.

If we process this Markdown with unified and the remark-parse plugin, we end up with an AST which represents the document.

import unified from "unified";
import markdown from "remark-parse";

const input = `
# Welcome

A paragraph.
`;

const tree = unified()
  .use(markdown)
  .parse(input);

If you do this yourself, you'll see all sorts of data about the position and line of the characters, but I have stripped this out to make it a bit more digestible. Each node (an object) in this tree contains a number of properties:

type: What data type is this node? Heading, Paragraph, Emphasis, Strong, etc.
children: Nested nodes contained within the current one. Imagine an Image inside of a Link, or a Link within a Paragraph
depth: Used to differentiate Level 1, 2, 3 Headings (h1, h2, h3)
value: Text nodes have a value property which contains their actual text

{
  "type": "root",
  "children": [
    {
      "type": "heading",
      "depth": 1,
      "children": [
        {
          "type": "text",
          "value": "Welcome"
        }
      ]
    },
    {
      "type": "paragraph",
      "children": [
        {
          "type": "text",
          "value": "A paragraph."
        }
      ]
    }
  ]
}
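The position data mentioned above can be removed with a small recursive helper. This is only a sketch of one way to do it: stripPositions is a hypothetical name, and the sample tree is written by hand (with illustrative position values) rather than produced by the parser.

```javascript
// Recursively copy a node, leaving out the `position` property
// that the parser attaches to every node.
function stripPositions(node) {
  const { position, children, ...rest } = node;
  return children
    ? { ...rest, children: children.map(stripPositions) }
    : rest;
}

// Hand-written sample shaped like parser output (position values are illustrative)
const sample = {
  type: "root",
  children: [
    {
      type: "text",
      value: "Welcome",
      position: {
        start: { line: 2, column: 3, offset: 3 },
        end: { line: 2, column: 10, offset: 10 }
      }
    }
  ],
  position: {
    start: { line: 1, column: 1, offset: 0 },
    end: { line: 3, column: 1, offset: 11 }
  }
};

console.log(JSON.stringify(stripPositions(sample), null, 2));
```

The helper returns a new tree instead of deleting properties in place, so the original AST (positions included) stays available for tools that need it.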

Using the AST for Calculations

We can process the AST to count how many of each type we see (recursive function alert):

function counts(acc, node) {
  // add 1 to an initial or existing value
  acc[node.type] = (acc[node.type] || 0) + 1;

  // find and add up the counts from all of this node's children
  return (node.children || []).reduce(
    (childAcc, childNode) => counts(childAcc, childNode),
    acc
  );
}

Which, depending on your input, produces something like:

{
  "root": 1,
  "heading": 1,
  "text": 7,
  "paragraph": 3,
  "strong": 1,
  "emphasis": 1
}
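As a usage sketch, here is the same counts function run against the small "Welcome" tree from earlier. The tree is typed out by hand so the snippet runs without installing unified; the variable names welcomeTree and totals are just for illustration.

```javascript
function counts(acc, node) {
  // add 1 to an initial or existing value
  acc[node.type] = (acc[node.type] || 0) + 1;

  // fold the counts of every child into the same accumulator
  return (node.children || []).reduce(
    (childAcc, childNode) => counts(childAcc, childNode),
    acc
  );
}

// Hand-written copy of the AST for "# Welcome" followed by "A paragraph."
const welcomeTree = {
  type: "root",
  children: [
    {
      type: "heading",
      depth: 1,
      children: [{ type: "text", value: "Welcome" }]
    },
    {
      type: "paragraph",
      children: [{ type: "text", value: "A paragraph." }]
    }
  ]
};

// Start from an empty object as the accumulator
const totals = counts({}, welcomeTree);
console.log(totals);
// → { root: 1, heading: 1, text: 2, paragraph: 1 }
```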

Counting Words with the AST

The word count tool I'm using in VS Code right now counts ## Welcome as 2 words, when we can really see that it is only a single word which happens to be in an h2 tag. Using an AST we can provide a more accurate word count by only counting the text values.

import unified from "unified";
import markdown from "remark-parse";

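function wordCount(count, node) {
  if (node.type === "text") {
    // Split on any run of whitespace and drop empty strings, so leading,
    // trailing, or repeated spaces and newlines are not counted as words
    const words = node.value.split(/\s+/).filter(Boolean);
    return count + words.length;
  } else {
    return (node.children || []).reduce(
      (childCount, childNode) => wordCount(childCount, childNode),
      count
    );
  }
}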

// Our markdown input
const input = `## Welcome`;

// Convert markdown into an AST
const tree = unified()
  .use(markdown)
  .parse(input);

// Extract Word Count from AST
const words = wordCount(0, tree);
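To make the difference concrete, the sketch below compares a naive count on the raw Markdown source with a count over text nodes only. naiveWordCount is a hypothetical stand-in for what a simple editor plugin might do, the tree is written by hand instead of parsed, and this wordCount variant splits on runs of whitespace.

```javascript
// Naive approach: split the raw Markdown source on spaces
function naiveWordCount(source) {
  return source.split(" ").length;
}

// AST approach: only count words that live inside text nodes
function wordCount(count, node) {
  if (node.type === "text") {
    const words = node.value.split(/\s+/).filter(Boolean);
    return count + words.length;
  }
  return (node.children || []).reduce(wordCount, count);
}

const source = "## Welcome";

// Hand-written AST for the source above
const headingTree = {
  type: "root",
  children: [
    {
      type: "heading",
      depth: 2,
      children: [{ type: "text", value: "Welcome" }]
    }
  ]
};

console.log(naiveWordCount(source)); // → 2 (the "##" marker is counted)
console.log(wordCount(0, headingTree)); // → 1
```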

Visualizing the AST

With this AST we can also create a React component called Node which renders a node and its children (using padding to display the tree-like structure):

const Node = ({ node }) => (
  <div style={{ paddingLeft: `15px` }}>
    <strong>
      {node.type}
      {node.depth && <span> (d{node.depth})</span>}
    </strong>

    {node.value && <div style={{ paddingLeft: "15px" }}>{node.value}</div>}

    {/* Render additional Nodes for each child */}
    {node.children &&
      node.children.map(child => {
        const { line, column, offset } = child.position.start;
        return <Node key={`${line}-${column}-${offset}`} node={child} />;
      })}
  </div>
);

This output allows us to see how the tree is structured and indented:

root
  heading (d1)
    text
      Welcome
  paragraph
    text
      A paragraph.
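The same outline can be produced as plain text without React, which is handy for logging a tree in the terminal. This is a minimal sketch: outline is a hypothetical helper name, and the tree is the hand-written "Welcome" example rather than live parser output.

```javascript
// Return the tree as an array of lines, indenting two spaces per level
function outline(node, level = 0) {
  const indent = "  ".repeat(level);
  const label = node.depth ? `${node.type} (d${node.depth})` : node.type;
  const lines = [indent + label];

  // Text nodes print their value one level deeper than their type
  if (node.value) {
    lines.push("  ".repeat(level + 1) + node.value);
  }

  for (const child of node.children || []) {
    lines.push(...outline(child, level + 1));
  }
  return lines;
}

const welcomeTree = {
  type: "root",
  children: [
    {
      type: "heading",
      depth: 1,
      children: [{ type: "text", value: "Welcome" }]
    },
    {
      type: "paragraph",
      children: [{ type: "text", value: "A paragraph." }]
    }
  ]
};

console.log(outline(welcomeTree).join("\n"));
```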

MDX

If you came here for MDX and not Markdown, you're in luck! We're now going to transition into exploring how MDX works and how it is related to the Markdown examples shown above.

AST Explorer

For all the visual learners, there is a great website called AST Explorer which allows you to visualize the AST produced by a number of different input formats such as Markdown and MDX. We're going to be diving into MDX a bit further now, so let's take a look at the AST produced by an MDX file.

MDAST, HAST, MDXAST, MDXHAST... What??

That's a lot of acronyms! But what do they mean and what does this have to do with Markdown and MDX? In order to convert Markdown into an AST, we need a specification, or a set of rules to follow so we know what types of Nodes are available (heading, paragraph, link, etc.) and what properties they might have (type, children, value).

This set of rules for Markdown is called mdast. Similarly, there are other sets of rules for dealing with HTML, called hast. With both specifications, someone could write code that converts a Markdown AST (mdast) into an HTML AST (hast), which is exactly what remark-rehype does.
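To get a feel for what such a conversion involves, here is a deliberately tiny sketch that maps a few mdast node types onto hast nodes. It is not how remark-rehype is implemented, and toHast is a hypothetical function name. It only handles root, heading, paragraph, and text, and relies on the hast convention that elements have a tagName, properties, and children.

```javascript
// Convert a (very small) subset of mdast into hast
function toHast(node) {
  switch (node.type) {
    case "root":
      return { type: "root", children: node.children.map(toHast) };
    case "heading":
      // mdast stores the level in `depth`; hast uses the tag name h1..h6
      return {
        type: "element",
        tagName: `h${node.depth}`,
        properties: {},
        children: node.children.map(toHast)
      };
    case "paragraph":
      return {
        type: "element",
        tagName: "p",
        properties: {},
        children: node.children.map(toHast)
      };
    case "text":
      return { type: "text", value: node.value };
    default:
      // Anything else is out of scope for this sketch
      throw new Error(`Unhandled node type: ${node.type}`);
  }
}

const mdastTree = {
  type: "root",
  children: [
    {
      type: "heading",
      depth: 1,
      children: [{ type: "text", value: "Welcome" }]
    },
    {
      type: "paragraph",
      children: [{ type: "text", value: "A paragraph." }]
    }
  ]
};

console.log(JSON.stringify(toHast(mdastTree), null, 2));
```

Notice that the Markdown-specific idea of a heading with a depth disappears in the output. In hast there are only elements, and the level is encoded in the tag name.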

MDX is a superset of Markdown, meaning that everything you can do in Markdown you can also do in MDX, plus three additional features, which are:

jsx (replacing html)
import statements
export statements

This specification is called MDXAST.

Compiling MDX into an AST

Unless you are developing a plugin for MDX, you probably won't need to deal directly with the MDX AST, but since this article is about learning, let's write some code which produces an AST.

const { createMdxAstCompiler } = require("@mdx-js/mdx");

// A "unified" compiler
const compiler = createMdxAstCompiler({ remarkPlugins: [] });
const input = `
import YouTube from "./YouTube";

# Welcome

<YouTube id="123" />
`;

const ast = compiler.parse(input);
const astString = JSON.stringify(ast, null, 2);
console.log(astString);

After we strip out some of the position data, the AST ends up looking like the data below. Notice that we are seeing two of the custom MDX node types: import and jsx.

{
  "type": "root",
  "children": [
    {
      "type": "import",
      "value": "import YouTube from \"./YouTube\";"
    },
    {
      "type": "heading",
      "depth": 1,
      "children": [
        {
          "type": "text",
          "value": "Welcome"
        }
      ]
    },
    {
      "type": "jsx",
      "value": "<YouTube id=\"123\" />"
    }
  ]
}
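Because the MDX-specific nodes are ordinary entries in the tree, they can be found with the same kind of recursive walk used for Markdown. The sketch below uses a hypothetical collect helper and a hand-typed copy of the AST shown above to pull out the import and jsx nodes.

```javascript
// Gather every node whose type is in `types`, in document order
function collect(node, types, found = []) {
  if (types.includes(node.type)) {
    found.push(node);
  }
  for (const child of node.children || []) {
    collect(child, types, found);
  }
  return found;
}

// Hand-written copy of the MDX AST from the example above
const mdxTree = {
  type: "root",
  children: [
    { type: "import", value: 'import YouTube from "./YouTube";' },
    {
      type: "heading",
      depth: 1,
      children: [{ type: "text", value: "Welcome" }]
    },
    { type: "jsx", value: '<YouTube id="123" />' }
  ]
};

const mdxNodes = collect(mdxTree, ["import", "jsx", "export"]);
for (const node of mdxNodes) {
  console.log(`${node.type}: ${node.value}`);
}
```

A plugin could use a walk like this to, for example, list which components a document depends on.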

Compiling MDX into JSX

What we really want MDX to do is to produce JSX, not an AST. This code is similar to the previous example which produced an AST, but we're adding on the utility function mdxHastToJsx which takes the AST from the previous step and produces JSX.

const { createMdxAstCompiler } = require("@mdx-js/mdx");
const mdxHastToJsx = require("@mdx-js/mdx/mdx-hast-to-jsx");

const input = `
import YouTube from "./YouTube";

# Welcome

<YouTube id="123" />
`;

const compiler = createMdxAstCompiler({ remarkPlugins: [] }).use(mdxHastToJsx);
const jsx = compiler.processSync(input).toString();
console.log(jsx);

What is produced is valid JSX, which looks like:

import YouTube from "./YouTube";

const layoutProps = {};
const MDXLayout = "wrapper";
export default function MDXContent({ components, ...props }) {
  return (
    <MDXLayout
      {...layoutProps}
      {...props}
      components={components}
      mdxType="MDXLayout"
    >
      <h1>{`Welcome`}</h1>
      <YouTube id="123" mdxType="YouTube" />
    </MDXLayout>
  );
}

Conclusion

I hope you've enjoyed learning about ASTs and the role they play with Markdown and MDX. With ASTs we're able to process and tweak our code on its way to the desired result. It could be as simple as counting how many words are in a Markdown document, or as complex as Prettier or Babel. ASTs open the door to a number of possibilities which may at one point have seemed far-fetched. Take MDX itself, for example. It was just an idea that a few people had, and with the help of ASTs and some hard work by some smart people, it became a reality.

JavaScript, React, Tutorial

About the Author

Leigh Halliday

Leigh Halliday is a full-stack developer specializing in React and Ruby on Rails. He works for FlipGive, writes on his blog, and regularly posts coding tutorials on YouTube.
