Using obfuscated production data locally
almirsarajcic
You might have a bug in production that you can’t replicate with an empty database, or you might want to do load testing with greater amounts of data, so you want to use the production data locally or in CI.
That’s fine, but you have to be mindful of the way you handle sensitive customer data.
To take care of that, you can back up the production database, restore it locally, and obfuscate it. Only then can you share it with other developers so you can track down the bug or load test the app.
Data sanitization can be easily accomplished in Elixir with the help of a library called Faker. I usually use it for testing, but generating fake data to replace real customer data is another great use case.
Depending on what kind of data you keep, these might just be names and email addresses, but in some cases, you might also want to replace geo and IP data that could reveal a customer’s location.
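For geo and IP columns, here is a minimal sketch that needs nothing beyond the standard library. The `FakeLocation` module and its function names are hypothetical, not part of Faker or of the scripts in this post. Faker also ships generators such as `Faker.Internet.ip_v4_address/0` and `Faker.Address.latitude/0`, which you might prefer if the dependency is already in place.

```elixir
defmodule FakeLocation do
  @moduledoc """
  Hypothetical helper (not part of Faker) that produces placeholder
  network and location values for obfuscation.
  """

  # 203.0.113.0/24 is TEST-NET-3, a range reserved for documentation (RFC 5737),
  # so a generated address never belongs to a real host.
  def ip_v4, do: "203.0.113.#{:rand.uniform(254)}"

  # Random coordinates anywhere on the globe, rounded to 6 decimal places.
  def coordinates do
    lat = Float.round(:rand.uniform() * 180 - 90, 6)
    lng = Float.round(:rand.uniform() * 360 - 180, 6)
    {lat, lng}
  end
end
```

In an obfuscation script, you could then return something like `[ip_address: FakeLocation.ip_v4()]` for each record, where `ip_address` stands in for whatever your column is actually called.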
Simple example:
import Ecto.Query, only: [from: 1, from: 2]

# With SQL adapters, a Repo.stream/2 result has to be enumerated inside a transaction.
Repo.transaction(fn ->
  from(u in "users", select: [:id])
  |> Repo.stream()
  |> Stream.each(fn %{id: id} ->
    obfuscated = [name: Faker.Person.first_name() <> " " <> Faker.Person.last_name()]

    from(u in "users",
      update: [set: ^obfuscated],
      where: u.id == ^id
    )
    |> Repo.update_all([])
  end)
  |> Stream.run()
end)
Real-world example:
require Logger
import Ecto.Query, only: [from: 1, from: 2]
alias Data.Repo

# Make sure job queues are empty
from(oj in "oban_jobs")
|> Repo.delete_all([])

fake_first_name = fn _value -> Faker.Person.first_name() end
fake_last_name = fn _value -> Faker.Person.last_name() end
fake_email = fn _value -> Faker.Internet.email() end
fake_school = fn _value -> "University of " <> Faker.Address.city() end

fake_message = fn value ->
  words_count =
    value
    |> String.split()
    |> length()

  Faker.Lorem.sentence(words_count)
end

obfuscated_or_nil = fn value, faker_fn ->
  if is_nil(value) or String.trim(value) == "" do
    value
  else
    faker_fn.(value)
  end
end
obfuscate_table = fn table_name, columns, fields ->
  from(r in table_name, select: ^columns)
  |> Repo.stream()
  |> Stream.each(fn %{id: id} = record ->
    values = fields.(record)

    try do
      from(r in table_name,
        update: [set: ^values],
        where: r.id == ^id
      )
      |> Repo.update_all([])
    rescue
      e ->
        Logger.error(
          "message: #{Exception.message(e)}, table_name: #{table_name}, values: #{inspect(values)}"
        )

        reraise e, __STACKTRACE__
    end
  end)
  |> Stream.run()
end
messages = fn ->
  obfuscate_table.("messages", [:id, :text], fn %{text: text} ->
    [text: obfuscated_or_nil.(text, fake_message)]
  end)
end

users = fn ->
  obfuscate_table.(
    "users",
    [:id, :email, :first_name, :last_name, :school],
    fn %{
         email: email,
         first_name: first_name,
         last_name: last_name,
         school: school
       } ->
      [
        email: obfuscated_or_nil.(email, fake_email),
        first_name: obfuscated_or_nil.(first_name, fake_first_name),
        last_name: obfuscated_or_nil.(last_name, fake_last_name),
        school: obfuscated_or_nil.(school, fake_school)
      ]
    end
  )
end
[
  # ...
  messages,
  users
]
|> Task.async_stream(
  fn transformation ->
    Repo.transaction(transformation, timeout: :infinity)
  end,
  max_concurrency: min(5, System.schedulers_online()),
  ordered: false,
  timeout: 300_000
)
|> Stream.run()
You can keep a script like this in priv/obfuscate_prod_data.exs and run it with:
mix run priv/obfuscate_prod_data.exs
Before that, you’ll need to restore the DB backup:
mix ecto.drop && mix ecto.create && pg_restore --no-owner -d app_dev app_*.dump
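Since the obfuscation script rewrites every row it touches, it can be worth checking that it really points at the restored local database before anything runs. Here is a minimal sketch of such a guard. The `ObfuscationGuard` module is hypothetical, and it assumes your connection settings are available as a URL string, which may not match how your app is configured.

```elixir
defmodule ObfuscationGuard do
  @moduledoc """
  Hypothetical safety check, not part of the original script.
  """

  # Hosts considered safe to obfuscate (an assumption; adjust to your setup).
  @local_hosts ["localhost", "127.0.0.1"]

  # Returns :ok when the URL points at a local database
  # and {:error, reason} for any other host, including a missing one.
  def check(database_url) when is_binary(database_url) do
    case URI.parse(database_url) do
      %URI{host: host} when host in @local_hosts ->
        :ok

      %URI{host: host} ->
        {:error, "refusing to obfuscate database on host #{inspect(host)}"}
    end
  end
end
```

At the top of the script, you could match on the result, for example `:ok = ObfuscationGuard.check(url)`, so the script crashes early instead of touching the wrong database.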
If you don’t know yet how to back up the production DB, check out this drop: https://elixirdrops.net/d/o1JHzjrs.
Finally, be careful when running the script so that you don’t run any background jobs or trigger sending emails, messages, or notifications to real customers.
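One way to lower the risk of emailing a real person is to make sure the fake addresses cannot be delivered at all. Here is a minimal sketch, assuming each row has an `id` you can build the address from. The `SafeEmail` module is hypothetical. If you would rather stay with Faker, `Faker.Internet.safe_email/0` should give you addresses on reserved example domains as well.

```elixir
defmodule SafeEmail do
  @moduledoc """
  Hypothetical helper, not part of Faker or the original script.
  """

  # example.com is reserved for documentation (RFC 2606), so mail sent to it
  # never reaches a real mailbox. Including the row id keeps each address
  # unique, which helps if the column has a unique index.
  def for_id(id), do: "user#{id}@example.com"
end
```

In the users transformation, that would mean returning `email: SafeEmail.for_id(id)` instead of calling the random email generator, with `id` taken from the selected record.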
If you need help with this or anything else Elixir-related, reach out to us at projects@optimum.ba.