Elixir Abstract Syntax

As developers know, program source code is represented as lines of text.

The Abstract Syntax Tree (AST) is the representation of the source code as a hierarchical data graph, specifically a tree structure. With an AST, much of the difficult work parsing the original source code has been performed, and the syntax can be introspected programatically.

Similar to an AST, the Abstract Semantic Graph (ASG) is a graph of the semantic representation of the source code. The ASG goes one step further than the AST by representing semantic information.

When Erlang or Elixir source files are compiled, each module is converted to an Abstract Semantic Graph and saved to a file. This file is called a BEAM file, and it has a .beam extension.

Elixir provides a way to extract either the AST or ASG from source code. This information is used by tools such as Formatter and Dialyzer for the benefit of developers.

We’ll walk through two techniques for extracting information from program code.

The Elixir AST

The function Code.string_to_quoted!/1 converts Elixir source code into Elixir Abstract Syntax Tree (AST).

iex(1)> Code.string_to_quoted!("2 + 3")
{:+, [line: 1], [2, 3]}

string_to_quoted!/1 (and its sibling string_to_quoted/1) know that the above bit of source code is an operation on two operands. It represents the plus sign as an atom (:+), and it represents the two operands as a list ([2, 3]).

The Elixir AST typically contains three-element tuples like the one above. The first element is an operation or data type. The second element is metadata about the operation (e.g., source code line number), and the third element is the arguments of the operation, or in the case of a data type, the data.

Let’s try an example on a function call:

iex> Code.string_to_quoted!("f()")
{:f, [line: 1], []}

The above represents a function call with an operand :f. In actuality, the AST is not sure it’s actually a function call. It just knows that the expression is “call-like”, that it takes zero arguments.

iex> Code.string_to_quoted!("v")
{:v, [line: 1], nil}

Here, the AST gives nil to the arguments list, meaning arguments don’t apply. The AST representation does’t actually know whether it’s a call or a variable. In Elixir parentheses are optional for a function call, so it could be either.

We’ll do one more:

iex> Code.string_to_quoted!("%{a: \"a\"}")
{:%{}, [line: 1], [a: "a"]}

The above shows what the AST looks like for a map literal.

The BEAM File

When Elixir (or Erlang) compiles a module, it creates a .beam file that stores the compiled module. If code is compiled using mix, the .beam files can be found in _build/**/lib/**/ebin/*.beam.

The .beam file can be created more directly, using elixirc. This puts the .beam file in the current directory. We will use elixirc for the purposes of this article.

We’ll start with a sample module:

defmodule MyModule do
  def addition do
    2 + 3
  end

  def an_atom do
    :hello
  end

  def a_call do
    value = addition() + 1
    value + 4
  end
end

Save the above file to my_module.ex, then run the following:

$ elixirc my_module.ex
$ file Elixir.MyModule.beam
Elixir.MyModule.beam: Erlang BEAM file

Erlang provides the beam_lib library and its chunks/2 function for reading the .beam file:

iex> :beam_lib.chunks('Elixir.MyModule.beam', [:abstract_code])
{:ok, {MyModule, [
         abstract_code: {:raw_abstract_v1, [
                           {:attribute, 1, :file, {'my_module.ex', 1}},
                           {:attribute, 1, :module, MyModule},
                           {:attribute, 1, :compile, :no_auto_import},
                           {:attribute, 1, :export, [
                              __info__: 1,
                              a_call: 0,
                              addition: 0,
                              an_atom: 0
                            ]},
                           ...
                           ]}]}}

The first argument to :beam_lib.chunks/2 is the .beam file path. Note the single quotes; it’s a charlist, not a string.

The second argument is a list of “chunk types” to extract from the .beam file. The full list of available chunk types can be found in the Erlang Source Code.

The bulk of the return data is a list of tuples. Some of the tuples contain the atom :attributes as the first element, and the others have :function as the first element.

The tuples having :function represent the functions of the module. The third element in the tuple is the function name.

You’ll notice a large function called :__info__ that is automatically added to all Elixir modules.

{:function, 0, :__info__, 1, [...]}

The last three functions are the ones defined in the module’s source code.

[
  ...
  {:function, 10, :a_call, 0, [...]},
  {:function, 2, :addition, 0, [...]},
  {:function, 6, :an_atom, 0, [...]}
]

Digging into the :addition function representation, we can see the semantic representation of the simple addition operation, 2 + 3:

{
  :op,
  3,
  :+,
  {:integer, 0, 2},
  {:integer, 0, 3}
}

If the goal is to get the list of functions defined by the module, a small filter and map is all it takes:

{:ok, {module, [abstract_code: {:raw_abstract_v1, attributes}]}} =
  :beam_lib.chunks('Elixir.MyModule.beam', [:abstract_code])

attributes
|> Enum.filter(&(elem(&1, 0) == :function))
|> Enum.map(fn {_, _, name, arity, _} -> {name, arity} end)

# Returns: [__info__: 1, bar: 0, foo: 0]

Conclusion

We’ve learned how to introspect Elixir source code by extracting the AST and ASG. It is my hope that this information will help you build the next great developer tool for Elixir.