`File.stream!/1` processes large files with constant memory

almirsarajcic

5 hours ago

File.read!/1 loads the entire file into memory. A 2GB log file? That’s 2GB of RAM. Use File.stream!/1 to process files line-by-line with constant memory usage, regardless of file size.

# Memory usage grows with file size
File.read!("huge_file.csv")                    # Loads entire file into memory
|> String.split("\n")                          # Doubles memory (original + split)
|> Enum.map(&parse_row/1)                      # Triples memory (adds parsed data)

# Constant memory regardless of file size
File.stream!("huge_file.csv")                  # Opens file handle, reads nothing
|> Stream.map(&String.trim/1)                  # Lazy - processes one line at a time
|> Stream.map(&parse_row/1)                    # Still lazy - no intermediate lists
|> Enum.to_list()                              # Only now does work happen

Streams are lazy. Nothing executes until you call an eager function like Enum.to_list/1 or Enum.each/2. Each line flows through the entire pipeline before the next one is read.
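
To see the laziness in action, here is a minimal sketch (reusing the hypothetical huge_file.csv). Building the pipeline produces no output; Enum.take/2 then reads only as many lines as it needs.

# Building the pipeline does nothing yet - no reads, no output
lazy =
  File.stream!("huge_file.csv")
  |> Stream.map(fn line ->
    IO.puts("processing: #{String.trim(line)}")
    String.trim(line)
  end)

Enum.take(lazy, 3)                             # Only now are the first three lines read and printed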

Processing in chunks for batch operations

"events.jsonl"
|> File.stream!()
|> Stream.map(&Jason.decode!/1)
|> Stream.chunk_every(1000)
|> Enum.each(fn batch ->
  Repo.insert_all(Event, batch)
end)

Stream.chunk_every/2 groups stream elements into lists of the given size, which is perfect for bulk database inserts or API calls with rate limits.
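
The same pattern works for rate-limited APIs. A rough sketch, assuming a hypothetical send_batch/1 client function and a limit of one batch per second:

"events.jsonl"
|> File.stream!()
|> Stream.map(&Jason.decode!/1)
|> Stream.chunk_every(100)
|> Enum.each(fn batch ->
  send_batch(batch)                            # hypothetical API client call
  Process.sleep(1_000)                         # crude throttle: one batch per second
end)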

Reading binary files

# Read 64KB chunks instead of lines
File.stream!("video.mp4", [], 65_536)
|> Enum.reduce(0, fn chunk, acc -> acc + byte_size(chunk) end)

The third argument controls chunk size in bytes. Use this for binary files where line-based reading doesn’t make sense.
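
A common use for byte chunks is a streaming checksum. Here is a minimal sketch using Erlang's :crypto (file name hypothetical), hashing the file without ever holding it in memory:

# Feed each 64KB chunk into a running SHA-256 digest
"video.mp4"
|> File.stream!([], 65_536)
|> Enum.reduce(:crypto.hash_init(:sha256), fn chunk, acc ->
  :crypto.hash_update(acc, chunk)
end)
|> :crypto.hash_final()
|> Base.encode16(case: :lower)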
