A library to help navigate Ecto Data models

October 27, 2021
André Albuquerque

After joining Remote and starting to find my way around our extensive data model, I quickly noticed how often I was repeating the Repo.preload/2 call to navigate through our data model.

Ecto's explicit preload strategy, unlike the automatic lazy loading provided by other frameworks (Rails' ActiveRecord, for example), serves as a deterrent to the unfortunately common problem of N+1 queries popping up in Production.

With Ecto, you need to be deliberate upfront about whether you need the association loaded. Otherwise, this is what you get when you try to access country.addresses:

bash
#Ecto.Association.NotLoaded<association :addresses is not loaded>

The following snippet shows that an association isn't loaded by default. Only after "preloading" it does the association become "filled" with records coming from the DB.

elixir
iex()> country = Repo.get(Country, 2)
%EctoExplorer.Schemas.Country{...}
iex()> country.addresses
#Ecto.Association.NotLoaded<association :addresses is not loaded>
iex()> country = Repo.preload(country, :addresses)
%EctoExplorer.Schemas.Country{...}
iex()> country.addresses
[%EctoExplorer.Schemas.Address{...}, ...]

The explicitness of calling Repo.preload/2 to load an association is more than welcome in Production given the safety it brings to the table, but it becomes a nuisance when we are in a local IEx shell trying to smoothly navigate our data model, hopping from Ecto association to Ecto association:

elixir
iex()> country = Repo.get(Country, 2)
%EctoExplorer.Schemas.Country{
  addresses: #Ecto.Association.NotLoaded<association :addresses is not loaded>,
  currencies: #Ecto.Association.NotLoaded<association :currencies is not loaded>,
  flag: #Ecto.Association.NotLoaded<association :flag is not loaded>,
  code: "ECU",
  id: 2,
  name: "Ecuador",
  ...
}
# 1st Repo.preload/2 for the country.flag
iex()> country = Repo.preload(country, :flag)
# ...
iex()> country.flag
%EctoExplorer.Schemas.Flag{
  colors: "YBR",
  country: #Ecto.Association.NotLoaded<association :country is not loaded>,
  country_id: 2,
  orientation: "horizontal"
}
# 2nd Repo.preload/2 for the country.currencies
iex()> country = Repo.preload(country, :currencies)
# ...
iex()> country.currencies
[
  %EctoExplorer.Schemas.Currency{
    code: "USD",
    symbol: "$"
  },
  %EctoExplorer.Schemas.Currency{
    code: "SUC",
    symbol: "Suc"
  }
]
# 3rd Repo.preload/2 for the country.addresses
iex()> country = Repo.preload(country, :addresses)
# ...
iex()> country.addresses |> Enum.at(0)
%EctoExplorer.Schemas.Address{
  city: "city_ECU_1",
  country_id: 2,
  first_line: "first_line_ECU_1",
  postal_code: "postal_code_ECU_1",
  ...
}

As you can see above, the pattern is always the same:

  • You get the struct record from the DB;

  • You now need to explore one of its associations;

  • You need to resort to Repo.preload/2 before actually checking any of the association records;

  • Rinse and repeat 🔃

I was getting really tired of the Repo.preload/2 dance, and it got to a point where I decided to scratch my own itch.

At first, I even considered a quick and dirty hack 🔨: a vim macro that would write the Repo.preload/2 call for me 🙈. This approach might have solved my pain, since I always have both an IEx shell running (as my REPL) and vim inside tmux, but I'm pretty sure it wouldn't be of much use to anyone but me 😅

I started thinking about what I needed to streamline the Repo.preload/2 usage:

  • The end goal would be to be able to "chain" one or more association accesses and the Repo.preload/2 calls would happen behind the scenes automatically; this way we would avoid the dreaded look of the Ecto.Association.NotLoaded 🪦

  • Repo.preload/2 already lets us pass an association "chain" as the second argument (e.g. Repo.preload(flag, [country: :addresses])), so the new approach needs to improve on what Ecto already provides out of the box;

  • If we consider the last example, a streamlined version of the preload/2 function like X(flag, country.addresses) would already be an improvement. Note that X would be the new function and the second argument is not a string. This improvement would already save some keypresses every day, but I would need to turn the country.addresses part into [country: :addresses]. Maybe some metaprogramming sprinkles would help here? 🤓
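The conversion mentioned in the last bullet, from a dotted path to the nested preload form, can be sketched in plain Elixir. This is only an illustration and not code from the library: the `PreloadPathSketch` module, its `to_preload/1` function and the `:owner` association are hypothetical names, and the path is assumed to already be a list of atoms.

```elixir
defmodule PreloadPathSketch do
  # Hypothetical helper, not part of Ecto or EctoExplorer.
  # A single hop is just the association name...
  def to_preload([key]) when is_atom(key), do: key

  # ...and each extra hop nests the rest of the path under the current key.
  def to_preload([key | rest]) when is_atom(key), do: [{key, to_preload(rest)}]
end

PreloadPathSketch.to_preload([:country, :addresses])
# => [country: :addresses]

PreloadPathSketch.to_preload([:country, :addresses, :owner])
# => [country: [addresses: :owner]]
```

The result has the shape that Repo.preload/2 accepts as its second argument, so the remaining problem is how to obtain the list of hops from an expression like country.addresses.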

To metaprogram or not to metaprogram?

By now I was almost certain that my path implied the usage of metaprogramming, so I tried to figure out what I'd get if I had no restrictions on the amount of metaprogramming I was willing to use. The goal was to chain Ecto association "hops", and since these hops are similar in spirit to map accesses, supporting expressions like flag.country.addresses would keep a known Elixir pattern.

For this to work though, I would need to somehow override the . (dot) access to cater to our specific needs. From what I gathered from the Elixir source, extending the dot access would not be easy since its implementation is really intertwined with the language "core".

If I can't use the dot access for the Ecto navigation, I'd like to have something as similar as possible to it. I know that Elixir has a lot of operators provided by the Kernel module that are macro-based (the Kernel module has several examples), so I dived into the source of these macros to understand how they come to life 🕵️‍♂️

What I found is that most of these macros use a different macro syntax from the one I'm used to seeing, and this is exactly what allows one to write a ||| macro that behaves like an operator and is used like foo|||bar.

elixir
iex()> defmodule Example do
...()>   # "operator" macro syntax
...()>   defmacro left ||| right do
...()>     "#{left} AND #{right}"
...()>   end
...()>
...()>   # instead of the "traditional" macro syntax
...()>   defmacro my_operator(left, right) do
...()>     "#{left} AND #{right}"
...()>   end
...()> end
{:module, Example, <<70, 79, 82, ...>>, {:my_operator, 2}}
iex()> import Example
Example
iex()> "foo"|||"bar" # note the usage of `|||` as an "operator"
"foo AND bar"
iex()> my_operator("foo", "bar")
"foo AND bar"

💡 Note the defmacro left X right, do: ... way of defining the macro instead of the more conventional defmacro X(left, right), do: ... that is commonly seen and used.
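One caveat about the example above: it works because string literals are their own quoted form, so the interpolation can run at compile time. With variables on either side, the macro would receive AST tuples instead. A minimal sketch of the same operator returning quoted code, so that the operands are evaluated at runtime, could look like this (the `OperatorSketch` and `OperatorSketchDemo` module names are made up for illustration):

```elixir
defmodule OperatorSketch do
  # Same "operator" macro syntax, but the macro returns quoted code;
  # the operands are injected with unquote/1 and evaluated by the caller.
  defmacro left ||| right do
    quote do
      "#{unquote(left)} AND #{unquote(right)}"
    end
  end
end

defmodule OperatorSketchDemo do
  import OperatorSketch

  # `a` and `b` are variables here, not literals.
  def join(a, b), do: a ||| b
end

OperatorSketchDemo.join("foo", "bar")
# => "foo AND bar"
```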

I felt I was on the right track. I just needed to define an operator symbol for the Ecto navigation that would somehow convey the meaning of "navigating through Ecto associations". I settled on the ~> operator since the ~ looks like a wave 🌊 and the > points forward, hence it would be used to sail through a sea of Ecto navigations (cheesy I know 🙈, but I didn't come up with a better mnemonic 😅).

If I define the macro now and we simply inspect what we get on both left and right parameters, you'll see that we get the quoted form of both params, as every regular macro does 🌈

elixir
iex()> defmodule Example2 do
...()>   defmacro left ~> right do
...()>     IO.inspect(left, label: "Left")
...()>     IO.inspect(right, label: "Right")
...()>
...()>     :ok
...()>   end
...()> end
{:module, Example2, <<70, 79, 82, ...>>, {:~>, 2}}
iex()> import Example2
Example2
iex()> foo~>a
Left: {:foo, [line: 15], nil}
Right: {:a, [line: 15], nil}
:ok
iex()> foo~>a.b
Left: {:foo, [line: 16], nil}
Right: {{:., [line: 16], [{:a, [line: 16], nil}, :b]}, [no_parens: true, line: 16], []}
:ok
iex()> foo~>a.b.c.d
Left: {:foo, [line: 17], nil}
Right: {{:., [line: 17],
  [
    {{:., [line: 17],
      [
        {{:., [line: 17], [{:a, [line: 17], nil}, :b]},
         [no_parens: true, line: 17], []},
        :c
      ]}, [no_parens: true, line: 17], []},
    :d
  ]}, [no_parens: true, line: 17], []}
:ok

For each of these expressions, we need to convert the quoted form of the navigation part (the right-hand side of the expression, after the ~>) to a list of steps.

We'll create this list of steps by traversing the quoted right parameter using the Macro.postwalk/3 function (you can check the full code here; it is a bit longer due to the handling of indexes, i.e., steps with an index like country.addresses[3]):

elixir
@doc false
def _steps(quoted_right) do
  quoted_right
  |> Macro.postwalk(%{visited: [], steps: []}, fn
    # ...
    {:., _, _} = node, acc ->
      acc = accumulate_node(acc, node)
      {node, acc}
    {first_step, _, _} = node, acc when is_atom(first_step) ->
      acc = accumulate_node(acc, node, %Step{key: first_step})
      {node, acc}
    step, acc when is_atom(step) ->
      acc = accumulate_node(acc, step, %Step{key: step})
      {step, acc}
    # ...
    node, acc ->
      acc = accumulate_node(acc, node)
      {node, acc}
  end)
end

defp accumulate_node(%{visited: visited} = acc, node) do
  %{acc | visited: [node | visited]}
end

defp accumulate_node(%{steps: steps} = acc, node, %Step{} = step) do
  %{accumulate_node(acc, node) | steps: [step | steps]}
end

In a nutshell, the postwalk logic visits each node of the right-hand side AST and accumulates the steps as a list of %Step{} structs. The _steps/1 function tests illustrate how the function works (you can find the tests here):

elixir
test "makes steps for a basic right-hand side" do
  rhs = quote do: foo
  assert [%Step{key: :foo}] == Subject.steps(rhs)
end

test "makes steps for a 2-hop right-hand side" do
  rhs = quote do: foo.bar
  assert [%Step{key: :foo}, %Step{key: :bar}] == Subject.steps(rhs)
end

test "makes steps for a 5-hop right-hand side" do
  rhs = quote do: foo.bar.baz.bin.yas
  assert [
           %Step{key: :foo},
           %Step{key: :bar},
           %Step{key: :baz},
           %Step{key: :bin},
           %Step{key: :yas}
         ] == Subject.steps(rhs)
end

test "makes steps for a basic right-hand side with index" do
  rhs = quote do: foo[99]
  assert [%Step{key: :foo, index: 99}] == Subject.steps(rhs)
end

By converting each step into its own %Step{} structure we are able to keep things tidy 🧹.

As you might have guessed by now, here's what the step list of the flag~>country.addresses expression looks like:

elixir
iex()> rhs = quote do: country.addresses
{{:., [], [{:country, [], Elixir}, :addresses]}, [no_parens: true], []}
iex()> EctoExplorer.Resolver._steps(rhs)
{^rhs,
 %{
   expected_index_steps: 0,
   steps: [
     %EctoExplorer.Resolver.Step{index: nil, key: :addresses},
     %EctoExplorer.Resolver.Step{index: nil, key: :country}
   ],
   visited: [
     {{:., [], [{:country, [], Elixir}, :addresses]}, [no_parens: true], []}, #4
     {:., [], [{:country, [], Elixir}, :addresses]}, #3
     :addresses, #2
     {:country, [], Elixir} #1
   ]
 }}

By looking at the visited list from the bottom up, we can figure out what happened:

  1. Visited the country AST node (#1);

  2. Then visited the addresses AST node (#2);

  3. Then visited the :. (dot) navigation node (#3);

  4. And finally visited the full expression that matches exactly the rhs value (#4).
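For plain dot chains, the same flattening can also be sketched with direct pattern matching on the quoted form, which may make the AST shape easier to see. This is a simplified illustration and not the library's implementation: the `StepsSketch` module is hypothetical, it returns bare atoms instead of %Step{} structs, and it ignores index steps entirely.

```elixir
defmodule StepsSketch do
  # `left.key` is quoted as {{:., meta, [quoted_left, :key]}, meta, []},
  # so we flatten the left-hand side first and append the current key.
  def keys({{:., _, [left, key]}, _, []}) when is_atom(key) do
    keys(left) ++ [key]
  end

  # A bare name such as `country` is quoted as {:country, meta, context}.
  def keys({name, _, context}) when is_atom(name) and is_atom(context) do
    [name]
  end
end

StepsSketch.keys(quote(do: country.addresses))
# => [:country, :addresses]
```

Note that the keys come out in navigation order here, whereas the accumulator shown above collects the steps in reverse, since each new step is prepended to the list.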

With the navigation steps turned into a pretty list of %Step{} structs, we now need to preload the "current" struct with the immediate next step of the list and we'll obtain a new "current" struct. This approach will be repeated until we don't have any more remaining steps.

elixir
@doc false
def _resolve(current, %Step{key: step_key, index: nil} = step) do
  case Map.get(current, step_key) do
    # Association not loaded yet: preload it, then resolve the same step again
    %Ecto.Association.NotLoaded{} ->
      current = Preloader.preload(current, step_key)
      _resolve(current, step)

    # Nothing to navigate to: log a warning and stop
    nil ->
      Logger.warn("[Current: #{inspect(current)}] Step '#{step_key}' resolved to `nil`")
      nil

    # Already loaded association (or a regular field): return it as-is
    value ->
      value
  end
end

As you can see above, we try to get the current.step_key value and if it's an association that isn't loaded yet, we call the Preloader.preload/2 that behind the scenes relies on the Ecto.Repo.preload/2 function to fetch the association. Otherwise we just return the current.step_key value.
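To see the whole loop in one place without a database, here is a self-contained sketch of the "resolve the next step, then continue from the result" idea. Everything in it is an assumption made for illustration: `ResolveSketch` and its `NotLoaded` struct are stand-ins (the latter for %Ecto.Association.NotLoaded{}), plain maps play the role of schemas, and `loader` is a hypothetical function that fakes what a preload call would do.

```elixir
defmodule ResolveSketch do
  defmodule NotLoaded do
    # Stand-in for %Ecto.Association.NotLoaded{} (assumption for this sketch).
    defstruct []
  end

  # Walks `keys` one hop at a time, starting from `start`.
  # `loader.(current, key)` must return `current` with `key` loaded.
  def navigate(start, keys, loader) do
    Enum.reduce_while(keys, start, fn key, current ->
      case resolve(current, key, loader) do
        # A `nil` hop ends the navigation early
        nil -> {:halt, nil}
        value -> {:cont, value}
      end
    end)
  end

  defp resolve(current, key, loader) do
    case Map.get(current, key) do
      %NotLoaded{} -> loader.(current, key) |> Map.get(key)
      value -> value
    end
  end
end

# Fake "database": each clause loads one association on demand.
loader = fn
  current, :country ->
    %{current | country: %{name: "Ecuador", addresses: %ResolveSketch.NotLoaded{}}}

  current, :addresses ->
    %{current | addresses: [%{postal_code: "postal_code_ECU_1"}]}
end

flag = %{id: 2, country: %ResolveSketch.NotLoaded{}}

ResolveSketch.navigate(flag, [:country, :addresses], loader)
# => [%{postal_code: "postal_code_ECU_1"}]
```

The sketch stops at list results; handling an index step such as addresses[0] would need an extra clause, which is the part the library covers with its index-aware steps.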

At this stage the gist of the streamlined Ecto navigation is behind us. We just need to offer an easy way for everyone to use the ~> navigation operator on their own IEx shells.

The EctoExplorer module defines its own __using__/1 macro so that people can use it to have access to the ~> operator. Since the Ecto navigation needs an actual Ecto Repo to fetch the data from the DB, the repo option is required when using the EctoExplorer module:

elixir
defmacro __using__(repo: repo) do
  if Mix.env() not in [:dev, :test] do
    IO.puts(
      "You're using EctoExplorer on the `#{Mix.env()}` environment.\nEctoExplorer isn't in any way optimized for Production usage, and forces the preload of each association. Use with care!"
    )
  end

  {:ok, _pid} =
    repo
    |> Macro.expand(__ENV__)
    |> maybe_start_repo_agent()

  quote do
    import unquote(__MODULE__)
  end
end

For the navigation logic to be able to use the repo, it needs to have access to it, hence we start an Agent process to keep it. Notice that we Macro.expand/2 the repo value since its value is quoted (💡 recall that any macro parameter is passed in its quoted form, so Foo.Repo would look like {:__aliases__, [alias: false], [:Foo, :Repo]}).
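The body of maybe_start_repo_agent/1 isn't shown here, so the following is only a guess at the general shape of such a helper, using nothing but the standard Agent module. The `RepoHolderSketch` module, its function names, and the `MyApp.Repo` alias are all hypothetical.

```elixir
defmodule RepoHolderSketch do
  # Hypothetical module: keeps the repo module in a named Agent
  # so that other code can look it up later.
  @name __MODULE__

  def maybe_start(repo) when is_atom(repo) do
    case Agent.start(fn -> repo end, name: @name) do
      {:ok, pid} ->
        {:ok, pid}

      # Already running (e.g. `use` was evaluated twice): keep the process,
      # but store the most recent repo.
      {:error, {:already_started, pid}} ->
        Agent.update(pid, fn _old -> repo end)
        {:ok, pid}
    end
  end

  def repo, do: Agent.get(@name, & &1)
end

# A quoted alias is an AST node; Macro.expand/2 turns it into the module atom.
repo = Macro.expand(quote(do: MyApp.Repo), __ENV__)

{:ok, _pid} = RepoHolderSketch.maybe_start(repo)
RepoHolderSketch.repo()
# => MyApp.Repo
```

Tolerating the already-started case matters in a shell session, where the same `use` line may well be evaluated more than once.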

Using the ~> operator is now just a matter of using the EctoExplorer module:

elixir
iex()> use EctoExplorer, repo: EctoExplorer.Repo
EctoExplorer
iex()> f = Repo.get(Flag, 2)
%EctoExplorer.Schemas.Flag{
  __meta__: #Ecto.Schema.Metadata<:loaded, "flags">,
  colors: "YBR",
  country: #Ecto.Association.NotLoaded<association :country is not loaded>,
  country_id: 2,
  id: 2,
  orientation: "horizontal"
}
iex()> f~>country.addresses[0].postal_code
11:20:01.467 [debug] QUERY OK source="countries" db=0.0ms idle=27.6ms
SELECT c0."id", c0."name", c0."code", c0."population", c0."id" FROM "countries" AS c0 WHERE (c0."id" = ?) [2]
11:20:01.470 [debug] QUERY OK source="addresses" db=0.0ms idle=30.2ms
SELECT a0."id", a0."first_line", a0."postal_code", a0."city", a0."country_id", a0."country_id" FROM "addresses" AS a0 WHERE (a0."country_id" = ?) ORDER BY a0."country_id" [2]
"postal_code_ECU_1"

Note that the second expression, f~>country.addresses[0].postal_code, performed two queries for us: one to retrieve the country, and a second one to retrieve the country's addresses association 🎉

You can find the library on GitHub. I hope it's as useful to you as it is to me.

By now we've walked through most of the EctoExplorer library, but so far we haven't talked about the context that made it possible 🗺️. The pain with the bazillion Repo.preload/2 calls, the initial thought process, the additional exploration, and the eventual solution all happened during my normal scope of work here at Remote.

As much as I like to dabble with Elixir metaprogramming, balancing my personal time (family, hobbies) with my daily work means that a library like this could never have been ready so quickly if it weren't for the incredible flexibility of Remote's weekly learning time.

At Remote we are encouraged to invest two to four hours of our week to expand our knowledge and skills, and this isn't just a pretty bullet point in a slideshow presentation. Our team managers proactively encourage us to use and treasure this time. This learning window is something we truly believe makes us better as an organization. Every journey feels much better when you are allowed to grow each step of the way 🐾

This time I decided to scratch my Repo.preload/2 itch, let's see what comes next!
