Ecto fragments for complex queries
Deankinyua
Sometimes when writing Ecto queries, you might discover that the conventional approach doesn’t quite cut it. This is where fragments come in: they let you embed raw SQL expressions in a query. Consider this example, which illustrates the power of fragments:
import Ecto.Query
import Pgvector.Ecto.Query, only: [l2_distance: 2]

@spec get_related_questions(embedding(), id()) :: [question()]
def get_related_questions(embedding, id) do
  UserQuestion
  |> select([uq], %{
    id: uq.id,
    description: uq.description,
    query: uq.query
  })
  |> where([uq], uq.id != ^id)
  |> where([uq], fragment("? <-> ? >= ?", uq.embedding, ^embedding, 0.55555))
  |> order_by([uq], asc: l2_distance(uq.embedding, ^embedding))
  |> limit(6)
  |> Repo.all()
end
Sidenote: l2_distance (Euclidean distance) is one of the metrics used to compare two embeddings in a vector database. The smaller the Euclidean distance between two vectors, the more similar they are. Here we are using PostgreSQL as a vector database by means of the pgvector extension. More on that here.
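To make the metric concrete, here is a minimal sketch of Euclidean distance in plain Elixir. The VectorMath module and its l2_distance/2 function are hypothetical names used only for illustration; in the query itself the calculation is done inside PostgreSQL by pgvector, not in Elixir.

```elixir
defmodule VectorMath do
  # Hypothetical helper, for illustration only: pgvector does this work in SQL.
  @doc "Euclidean (L2) distance between two equal-length lists of numbers."
  def l2_distance(a, b) when length(a) == length(b) do
    a
    |> Enum.zip(b)
    # Square the difference of each pair of components
    |> Enum.map(fn {x, y} -> (x - y) * (x - y) end)
    |> Enum.sum()
    # The distance is the square root of the summed squares
    |> :math.sqrt()
  end
end
```

For example, VectorMath.l2_distance([0, 0], [3, 4]) returns 5.0, and two identical vectors have a distance of 0.0.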
With that out of the way, let me explain this step by step. As the function name suggests, we are trying to get related questions (questions that are similar to a given one). We select only the id, description and query fields into a map instead of loading the whole UserQuestion struct, and the first where clause excludes the question whose id we passed in. The real juice comes in the line:
|> where([uq], fragment("? <-> ? >= ?", uq.embedding, ^embedding, 0.55555))
Notice the ? signs. Each one is a placeholder that Ecto fills, in order, with the arguments that follow the SQL string. The unusual <-> is the operator that the pgvector extension provides for calculating l2_distance. In other words, you can read the fragment as
"uq.embedding <-> ^embedding >= 0.55555"
which means: calculate the l2_distance between each embedding stored in the database and the embedding we pass in, and return only the rows where that distance is greater than or equal to 0.55555. Ecto's query syntax has no operator of its own for <->, which is why a fragment is needed here. The remaining clauses order the retrieved questions by l2_distance, from most similar to least similar, and limit the result to just 6 questions. Use cases for fragments are rare, but they are nice to know about.
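As a variation on the same idea, the threshold and the limit do not have to be hard-coded: a fragment accepts pinned values for any of its ? placeholders. The sketch below is only an illustration under a few assumptions: the RelatedQuestions module, the related_query/4 function, and the "user_questions" table name are hypothetical, and it presumes the ecto dependency is installed and the pgvector types are configured for the repo. It builds the query without running it, so you would still pipe the result into Repo.all().

```elixir
defmodule RelatedQuestions do
  import Ecto.Query

  # Hypothetical function: the "user_questions" table name is an assumption,
  # and min_distance / max_results default to the values used in the article.
  def related_query(embedding, id, min_distance \\ 0.55555, max_results \\ 6) do
    from uq in "user_questions",
      where: uq.id != ^id,
      # Each ? is filled in order: a column, then two pinned parameters
      where: fragment("? <-> ? >= ?", uq.embedding, ^embedding, ^min_distance),
      # The same operator, written as a fragment, drives the ordering
      order_by: [asc: fragment("? <-> ?", uq.embedding, ^embedding)],
      limit: ^max_results,
      select: %{id: uq.id, description: uq.description, query: uq.query}
  end
end
```

Pinning the values with ^ keeps them as query parameters rather than literals in the SQL string, which makes the function reusable with different thresholds.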
For more examples on fragments, refer to this article.