Executing OS processes using Elixir Ports

Deankinyua

5 hours ago

Ports in Elixir give us an interface for communicating with external OS processes by sending and receiving messages. They make it easy to start and manage those processes. You might have a piece of functionality that would be too complex to implement in Elixir. A port lets you run the program that handles that logic and receive its response back.

To open a port, we use the Port.open/2 function:

Port.open({:spawn, cmd}, [:binary])
# without :binary, the data would be returned as a charlist (a list of bytes) instead of a binary
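To make the effect of :binary concrete, here is a minimal sketch that runs the same command with and without the option. The module name BinaryOptionDemo and the one second timeout are illustrative assumptions, and it assumes a Unix-like system where echo is available on PATH.

```elixir
defmodule BinaryOptionDemo do
  # Opens a port with the given options and returns the first chunk of output.
  def first_chunk(opts) do
    port = Port.open({:spawn, "echo hi"}, opts)

    receive do
      {^port, {:data, data}} -> data
    after
      1_000 -> :timeout
    end
  end
end

BinaryOptionDemo.first_chunk([:binary]) # a binary: "hi\n"
BinaryOptionDemo.first_chunk([])        # a charlist: [104, 105, 10]
```

Binaries are usually what you want, since most String functions expect them.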

The first argument is always a tuple that tells the runtime what to start. The second argument is a list of options. For example, the tuple can be:

  • {:spawn, cmd} - cmd is a string whose first word is the name of an executable, which is looked up in PATH. You can pass arguments directly in the string, e.g.
Port.open({:spawn, "echo hello world"}, [:binary])
  • {:spawn_executable, filename} - Use this when the executable has spaces in its name. It does not search PATH, so you are expected to provide the full file path. Arguments to the program are passed separately through the :args option. For example:
Port.open({:spawn_executable, "/bin/echo"}, [:binary, args: ["hello world"]])
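Since {:spawn_executable, filename} needs a full path, one way to avoid hard-coding it is to resolve the name first with System.find_executable/1. The sketch below is only an illustration: the module name SpawnExecutableDemo, the one second timeout, and the error tuples are assumptions, not part of the Port API.

```elixir
defmodule SpawnExecutableDemo do
  # Resolves `name` to a full path, runs it with `args`,
  # and returns the first chunk of output.
  def run(name, args) do
    case System.find_executable(name) do
      nil ->
        {:error, :not_found}

      path ->
        port = Port.open({:spawn_executable, path}, [:binary, args: args])

        receive do
          {^port, {:data, data}} -> {:ok, data}
        after
          1_000 -> {:error, :timeout}
        end
    end
  end
end

# "hello world" is passed as a single argument, spaces included.
SpawnExecutableDemo.run("echo", ["hello world"])
```

Because each element of the list is passed to the program as one argument, there is no need to quote or escape spaces.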

Please review the Port module documentation to understand more.

Note that the process that opens the port becomes the port's owner. This means any output produced by the external program started by the port is sent to that owner process as messages.
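The ownership relationship can be checked with a small sketch. It uses cat, which copies its input back to its output, so we can both send data to the port and receive a reply. The module name OwnerDemo and the one second timeout are illustrative assumptions, and a Unix-like system is assumed.

```elixir
defmodule OwnerDemo do
  def run do
    port = Port.open({:spawn, "cat"}, [:binary])

    # Port.info/2 reports which process the port is connected to (its owner).
    {:connected, owner} = Port.info(port, :connected)

    # Send data to the program's standard input.
    Port.command(port, "ping\n")

    reply =
      receive do
        # The reply arrives in the owner's mailbox.
        {^port, {:data, data}} -> data
      after
        1_000 -> :timeout
      end

    # cat never exits on its own, so we close the port ourselves.
    Port.close(port)

    {owner == self(), reply}
  end
end

OwnerDemo.run()
```

Under these assumptions the call should return {true, "ping\n"}: the calling process is the owner, and it is the one that receives the data.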

Enough talk, let’s get to the code :)

  use GenServer

  alias SkepticBot.YtDlp.Scraper

  @spec start_link(any()) :: GenServer.on_start()
  def start_link(_opts) do
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  end

  @impl GenServer
  def init(_state) do
    {:ok, %{}}
  end

  @spec request_episodes :: :ok
  def request_episodes do
    cmd =
      "yt-dlp --cache-dir /tmp/yt-cache --dateafter 20240609 --cookies #{get_cookie_file()} --print \"%(title)s~~%(duration)s~~%(thumbnail)s~~%(webpage_url)s\" https://www.youtube.com/@RealCandaceO"

    GenServer.cast(__MODULE__, {:message, cmd})
  end

  @impl GenServer
  def handle_cast({:message, cmd}, state) do
    port = Port.open({:spawn, cmd}, [:binary, :stderr_to_stdout, :exit_status])

    state =
      state
      |> Map.put(:port, port)
      |> Map.put(:episodes, [])

    {:noreply, state}
  end

  @impl GenServer
  def handle_info({port, {:data, msg}}, state) do
    episode = process_message(msg)
    episodes = [episode | state.episodes]
    new_state = Map.put(state, :episodes, episodes)
    process_episodes(port, episodes, episode)
    {:noreply, new_state}
  end

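  @impl GenServer
  def handle_info({port, {:exit_status, _status}}, state) do
    # The port normally closes itself once the program exits, and
    # Port.close/1 raises on a closed port, so only close it if still open.
    if Port.info(port), do: Port.close(port)
    {:noreply, Map.delete(state, :port)}
  end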

  defp get_cookie_file do
    Path.join([:code.priv_dir(:skeptic_bot), "/cookies/normal_cookies.txt"])
  end

A GenServer is a good fit for the owner process. Here I am running yt-dlp, a program that fetches metadata and downloads videos from various sources, including YouTube. Note that this command could just as easily be run with System.cmd/3, which itself relies on ports behind the scenes.
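For comparison, here is a minimal sketch of the System.cmd/3 route, using echo as a stand-in for yt-dlp. The module name CmdDemo is an illustrative assumption. System.cmd/3 takes the command and its arguments separately and returns a {output, exit_status} tuple once the command has finished.

```elixir
defmodule CmdDemo do
  def run do
    # Blocks the calling process until the command exits.
    System.cmd("echo", ["hello world"], stderr_to_stdout: true)
  end
end

{output, exit_status} = CmdDemo.run()
# output is "hello world\n" and exit_status is 0
```

The trade-off is that System.cmd/3 blocks until the program exits and hands back all the output at once, whereas a port owned by a GenServer can handle output as it arrives.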

This time we opened the port using some different options:

port = Port.open({:spawn, cmd}, [:binary, :stderr_to_stdout, :exit_status])

The :stderr_to_stdout option redirects the program's standard error to its standard output, so error output reaches the owner process as ordinary data messages. The :exit_status option is important because it makes the port send a {port, {:exit_status, status}} message to the owner process when the external process exits. That message tells us the external process is dead and the port is no longer useful, so we can close the port and remove it from the state:

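  @impl GenServer
  def handle_info({port, {:exit_status, _status}}, state) do
    # The port normally closes itself once the program exits, and
    # Port.close/1 raises on a closed port, so only close it if still open.
    if Port.info(port), do: Port.close(port)
    {:noreply, Map.delete(state, :port)}
  end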
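Both options can be observed outside a GenServer with a small sketch. The script below writes to standard error and exits with status 3; the module name ExitStatusDemo, the script, and the five second timeout are illustrative assumptions, and it assumes sh is available on PATH.

```elixir
defmodule ExitStatusDemo do
  # Runs `script` through sh and returns {combined_output, exit_status}.
  def run(script) do
    sh = System.find_executable("sh")

    port =
      Port.open({:spawn_executable, sh}, [
        :binary,
        :stderr_to_stdout,
        :exit_status,
        args: ["-c", script]
      ])

    collect(port, "")
  end

  defp collect(port, acc) do
    receive do
      # Thanks to :stderr_to_stdout, stderr output arrives as data too.
      {^port, {:data, data}} -> collect(port, acc <> data)
      # Sent once the external process has exited.
      {^port, {:exit_status, status}} -> {acc, status}
    after
      5_000 -> {:error, :timeout}
    end
  end
end

ExitStatusDemo.run("echo oops 1>&2; exit 3")
```

Under these assumptions the call should return {"oops\n", 3}: the text written to standard error shows up as data, followed by the exit status.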

Each time the external process writes output, the port sends its owner process a message of the form {port, {:data, data}}, which is why we match the messages like this:

  @impl GenServer
  def handle_info({port, {:data, msg}}, state) do
     ...
    {:noreply, new_state}
  end
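One thing to keep in mind is that, by default, a data message carries whatever chunk of bytes the port happened to read, which may be part of a line or several lines at once. If the program prints one record per line, the {:line, max_length} option asks the port to deliver output line by line instead, tagged with :eol or :noeol. The sketch below is an illustration only: the module name LineDemo, the 1024 byte limit, and the five second timeout are assumptions, and sh is assumed to be on PATH.

```elixir
defmodule LineDemo do
  # Runs `script` through sh and returns {lines, exit_status}.
  def run(script) do
    sh = System.find_executable("sh")

    port =
      Port.open({:spawn_executable, sh}, [
        :binary,
        :exit_status,
        {:line, 1024},
        args: ["-c", script]
      ])

    collect(port, [], "")
  end

  defp collect(port, lines, partial) do
    receive do
      # A complete line (the newline itself is stripped).
      {^port, {:data, {:eol, chunk}}} ->
        collect(port, [partial <> chunk | lines], "")

      # A line longer than the limit, or trailing output without a newline;
      # keep it until the rest of the line arrives.
      {^port, {:data, {:noeol, chunk}}} ->
        collect(port, lines, partial <> chunk)

      # Any unterminated trailing output is dropped in this sketch.
      {^port, {:exit_status, status}} ->
        {Enum.reverse(lines), status}
    after
      5_000 -> {:error, :timeout}
    end
  end
end

LineDemo.run("echo first; echo second")
```

With this option the data part of the message becomes a {:eol, line} or {:noeol, chunk} tuple, so a handle_info clause would need to match on that shape rather than on a plain binary.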

Please consider reading about how you would handle zombie processes, which the Port module documentation covers under "Zombie operating system processes".