Using obfuscated production data locally

almirsarajcic

5 months ago

You might have a bug in production that you can’t replicate with an empty database, or you might want to do load testing with greater amounts of data, so you want to use the production data locally or in CI.

That’s fine, but you have to be mindful of the way you handle sensitive customer data.

To take care of that, you can back up the production database, restore it locally, and obfuscate it. Only then should you share it with other developers so you can track down the bug or load test the app.

Data sanitization can be easily accomplished in Elixir with the help of a library called Faker. Usually, I use it for testing, but creating fake data to replace real customer data is another great use case.

Depending on what kind of data you keep, the sensitive fields might just be names and email addresses, but in some cases you might also want to replace geo and IP data that could reveal a customer’s location.
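For location-related columns, a minimal sketch of the replacement values could look like the following. It assumes Faker is available as a dependency of the project, and the column names (`ip_address`, `city`, `latitude`, `longitude`) are hypothetical, so adjust them to your own schema.

```elixir
# Assumes the Faker package is listed in the project's deps.
# Column names are hypothetical placeholders, not taken from a real schema.
obfuscated_location = [
  # Random IPv4 address as a string
  ip_address: Faker.Internet.ip_v4_address(),
  # Random city name
  city: Faker.Address.city(),
  # Random coordinates as floats
  latitude: Faker.Address.latitude(),
  longitude: Faker.Address.longitude()
]
```

A keyword list like this can be passed to `update: [set: ^obfuscated_location]` in the same way as the name replacement in the examples that follow.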

Simple example:

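import Ecto.Query, only: [from: 1, from: 2]

alias Data.Repo

# With PostgreSQL, Repo.stream/2 has to be called inside a transaction
Repo.transaction(
  fn ->
    from(u in "users", select: [:id])
    |> Repo.stream()
    |> Stream.each(fn %{id: id} ->
      # Replace the real name with a generated one
      obfuscated = [name: Faker.Person.first_name() <> " " <> Faker.Person.last_name()]

      from(u in "users",
        update: [set: ^obfuscated],
        where: u.id == ^id
      )
      |> Repo.update_all([])
    end)
    |> Stream.run()
  end,
  timeout: :infinity
)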

Real-world example:

require Logger

import Ecto.Query, only: [from: 1, from: 2]

alias Data.Repo

# Make sure job queues are empty
from(oj in "oban_jobs")
|> Repo.delete_all([])

fake_first_name = fn _value -> Faker.Person.first_name() end
fake_last_name = fn _value -> Faker.Person.last_name() end
fake_email = fn _value -> Faker.Internet.email() end
fake_school = fn _value -> "University of " <> Faker.Address.city() end

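# Generate a sentence with the same number of words as the original message
fake_message = fn value ->
  words_count =
    value
    |> String.split()
    |> length()

  Faker.Lorem.sentence(words_count)
end

# Leave nil and blank values untouched; replace everything else
obfuscated_or_nil = fn value, faker_fn ->
  if is_nil(value) or String.trim(value) == "" do
    value
  else
    faker_fn.(value)
  end
end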

obfuscate_table = fn table_name, columns, fields ->
  from(r in table_name, select: ^columns)
  |> Repo.stream()
  |> Stream.each(fn %{id: id} = record ->
    values = fields.(record)

    try do
      from(r in table_name,
        update: [set: ^values],
        where: r.id == ^id
      )
      |> Repo.update_all([])
    rescue
      e ->
        Logger.error("message: #{Exception.message(e)}, table_name: #{table_name}, values: #{inspect(values)}")
        reraise e, __STACKTRACE__
    end
  end)
  |> Stream.run()
end

messages = fn ->
  obfuscate_table.("messages", [:id, :text], fn %{text: text} ->
    [text: obfuscated_or_nil.(text, fake_message)]
  end)
end

users = fn ->
  obfuscate_table.(
    "users",
    [:id, :email, :first_name, :last_name, :school],
    fn %{
         email: email,
         first_name: first_name,
         last_name: last_name,
         school: school
       } ->
      [
        email: obfuscated_or_nil.(email, fake_email),
        first_name: obfuscated_or_nil.(first_name, fake_first_name),
        last_name: obfuscated_or_nil.(last_name, fake_last_name),
        school: obfuscated_or_nil.(school, fake_school)
      ]
    end
  )
end

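# Each table is processed in its own transaction (required by Repo.stream/2),
# with several tables handled concurrently
[
  # ...
  messages,
  users
]
|> Task.async_stream(
  fn transformation ->
    Repo.transaction(transformation, timeout: :infinity)
  end,
  max_concurrency: min(5, System.schedulers_online()),
  ordered: false,
  # A table that takes longer than 5 minutes times out; raise this for large tables
  timeout: 300_000
)
|> Stream.run()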

You can have something like this in priv/obfuscate_prod_data.exs so you can run it with:

mix run priv/obfuscate_prod_data.exs

Before that, you’ll need to restore the DB backup:

mix ecto.drop && mix ecto.create && pg_restore --no-owner -d app_dev app_*.dump
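If you repeat this often, the restore and obfuscation steps could be chained in a Mix alias. This is only a sketch: the alias name `db.obfuscate` is hypothetical, the database and dump file names are copied from the command above, and it assumes your `project/0` in mix.exs already includes `aliases: aliases()`.

```elixir
# In mix.exs; the alias name "db.obfuscate" is a hypothetical choice.
defp aliases do
  [
    "db.obfuscate": [
      # Start from an empty database
      "ecto.drop",
      "ecto.create",
      # Restore the production dump (same command as above, run through the shell)
      "cmd pg_restore --no-owner -d app_dev app_*.dump",
      # Replace sensitive values with fake ones
      "run priv/obfuscate_prod_data.exs"
    ]
  ]
end
```

With that in place, `mix db.obfuscate` would run all four steps in order and stop at the first one that fails.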

If you don’t know yet how to back up the production DB, check out this drop: https://elixirdrops.net/d/o1JHzjrs.

Finally, be careful when running the script so that you don’t run any background jobs or trigger sending emails, messages, or notifications to real customers.
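One extra safeguard is to make the script refuse to run unless the repo points at a local database. The module below is a hypothetical helper, not part of Ecto: it inspects the keyword list returned by `Repo.config()` and treats only `localhost` and `127.0.0.1` as local, which is an assumption you may need to adapt (for example, if your local database runs in a container under a different hostname).

```elixir
defmodule ObfuscationGuard do
  @moduledoc """
  Hypothetical helper that checks whether a repo config points at a local database.
  """

  # Assumption: only these hosts count as "local"
  @local_hosts ["localhost", "127.0.0.1"]

  # `config` is a keyword list such as the one returned by `Repo.config()`.
  # A missing host is treated as not local, so the check fails closed.
  def local_database?(config) do
    host =
      case Keyword.get(config, :url) do
        nil -> Keyword.get(config, :hostname)
        url -> URI.parse(url).host
      end

    host in @local_hosts
  end

  # Raises instead of letting the script touch a non-local database.
  def ensure_local!(config) do
    if local_database?(config) do
      :ok
    else
      raise "Refusing to obfuscate: the database host does not look local"
    end
  end
end
```

Calling `ObfuscationGuard.ensure_local!(Data.Repo.config())` at the top of the script, before any query runs, would stop it early if it were ever pointed at the wrong database.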

If you need help with this or anything else Elixir-related, reach out to us at projects@optimum.ba.