Engineering

Ecto.Query.preload/3 vs. Ecto.Repo.preload/3

When I submitted a PR, a friend asked:
Why are you using “Repo.preload/3” and not “Query.preload/3?
And then the second one pitched in as he saw preload as the first function call in one of my longer queries: Why are you preloading for all of the records?

It got me thinking about which solution is better. Let’s find out what the difference is between Ecto.Query.preload/3 and Ecto.Repo.preload/3.

TL;DR

For those in a rush here you can get to know the essence of my findings.

When using either of preload/3 for it's default job to load associations for every record, like:

Demo.Blog.Post |> Ecto.Query.preload(:comments) |> Demo.Repo.all()
# or
Demo.Blog.Post |> Demo.Repo.all() |> Demo.Repo.preload(:comments)

It doesn’t matter which one we use. These generate the same queries. Looking at time efficiency, they are comparable.

Nevertheless, there are areas in which one is better than the other. Like, Ecto.Query.preload/3 has the ability to limit the number of queries by using join and Ecto.Repo.preload/3 can limit the number of records fetched by not loading again already present associations.

How did I get to these conclusions? Let’s read ahead and find out together.

hexdocs.pm

My first point of interest was documentation pages. I wanted to verify whether official documentation will provide some answers right away.

Ecto.Query.preload/3 \ Preloads the associations into the result set.

Ecto.Repo.preload/3 \ Preloads all associations on the given struct or structs. This is similar to Ecto.Query.preload/3 \ except it allows you to preload structs after they have been fetched from the database. \ In case the association was already loaded, preload won't attempt to reload it.

After reading this, I felt a bit disappointed. These short descriptions provided only part of the information I was after. They covered what I have already known - you can preload associations in two different places:

  1. while fetching records,
  2. when they are already present.

I wasn’t satisfied with this. I wanted to know more, so I ditched documentation for now and went ahead to step number two.

The code

As described in the docs, both preload/3 work the same way fetching associations. They generate separate query (or queries, depending on how many associations we want to fetch) to get all needed records.

defmodule Demo.Blog.Posts do
  use Ecto.Schema
  import Ecto.Changeset

  schema "posts" do
    field :content, :string
    has_many(:comments, Demo.Blog.Comments, foreign_key: :post_id)

    timestamps()
  end
end

defmodule Demo.Blog.Comments do
  use Ecto.Schema
  import Ecto.Changeset

  schema "comments" do
    field :content, :string
    belongs_to(:post, Demo.Blog.Posts)

    timestamps()
  end
end

So I have created simple relations, put some records in DB and started querying to get real answers for questions that scratched at the back of my head.

Are queries generated by Query.preload/3 and Repo.preload/3 different?

# Demo.Blog.Posts |> preload(:comments) |> Demo.Repo.all()
SELECT p0."id", p0."content", p0."inserted_at", p0."updated_at" FROM "posts" AS p0 []
SELECT c0."id", c0."content", c0."post_id", c0."inserted_at", c0."updated_at", c0."post_id" FROM "comments" AS c0 WHERE (c0."post_id" = ANY($1)) ORDER BY c0."post_id" [[2, 1]]
# Demo.Blog.Posts |> Demo.Repo.all() |> Demo.Repo.preload(:comments)
SELECT p0."id", p0."content", p0."inserted_at", p0."updated_at" FROM "posts" AS p0 []
SELECT c0."id", c0."content", c0."post_id", c0."inserted_at", c0."updated_at", c0."post_id" FROM "comments" AS c0 WHERE (c0."post_id" = ANY($1)) ORDER BY c0."post_id" [[2, 1]]

Aaaand they look the same, not what I had expected, but good to know. Next question...

Does Query.preload/3 preloads for all associations?

# Demo.Blog.Posts |> where(content: "Post text etc") |> preload(:comments) |> Demo.Repo.all()
[debug] QUERY OK source="posts" db=4.0ms queue=6.2ms idle=1766.6ms
SELECT p0."id", p0."content", p0."inserted_at", p0."updated_at" FROM "posts" AS p0 WHERE (p0."content" = 'Post text etc') []
[debug] QUERY OK source="comments" db=1.1ms queue=2.0ms idle=1777.1ms
SELECT c0."id", c0."content", c0."post_id", c0."inserted_at", c0."updated_at", c0."post_id" FROM "comments" AS c0 WHERE (c0."post_id" = $1) ORDER BY c0."post_id" [1]

Yes! Finally, some answers I was looking for! When we preload from a query, we only do so for the records that were left after filtering. This is further confirmed by the docs (but in my opinion, it’s not said explicitly enough)

... fetch (...) from the database and then do a separate query returning all (...) associated ...

Can I use queries when preloading data?

Most queries that require preload use what I like to call a default form. Which is a passing list of atoms such as:

(...)
|> preload([:comments, :author])
(...)

or

(...)
|> preload([:comments, author: [:friends]])
(...)

Surprisingly you can also use queries inside preload statements to limit, order or filter preloaded records. This will still produce two queries, but you gain more control over associations and what is important here, both Ecto.Query and Ecto.Repo functions support this. Some of use cases, for example, Post -> Comment associations include:

  • limiting number of comments fetched for a single post

    comments_query = Comments |> limit(5)
    Posts |> Ecto.Query.preload(comments_query) |> Repo.all()
    
  • ordering comments e.g. by date of creation

     comments_query = Comments |> order_by(:inserted_at)
    Posts |> Repo.all() |> Repo.preload(comments_query)
    

What we do need to watch out for is that such queries are working on a whole subset of data we are fetching, that means that limit/2 in the example above will fetch and associate only 5 comments for all of the posts, not 5 comments for each post.

Is there a way to limit number of queries when preloading?

Short answer: yes, but only for Ecto.Query.preload/3, we can use join statements to select and filter our associations \ in the same query. The example below allows for fetching posts and associated comments with single query.

   # Demo.Repo.all(
   # from p in Demo.Blog.Posts,
   # join: c in assoc(p, :comments),
   # where: c.inserted_at > p.updated_at,
   # preload: [comments: c]
   # )
[debug] QUERY OK source="posts" db=13.6ms queue=2.0ms idle=578.7ms
SELECT p0."id", p0."content", p0."inserted_at", p0."updated_at", c1."id", c1."content", c1."post_id", c1."inserted_at", c1."updated_at" FROM "posts" AS p0 INNER JOIN "comments" AS c1 ON c1."post_id" = p0."id" WHERE (c1."inserted_at" > p0."updated_at") []````

So what is better? Right now, I can only answer: it depends. As in almost everything we do, it’s a case by case question we need to answer to achieve the best results.