Running multiple LLMs in Elixir as part of multiple FLAME pools

almirsarajcic

Created 1 month ago

Did you know you could run multiple FLAME pools, each running a different set of processes?

I wanted to run two FLAME pools with one machine each, each serving a different model. I achieved that by passing an environment variable to each pool's FLAME children that tells the booting app which runner it is.

# config/runtime.exs
config :my_app, :flame_runner, System.get_env("FLAME_RUNNER")

# lib/my_app/application.ex
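# FLAME_RUNNER is unset on the parent app and set on each FLAME child machine.
children =
  children ++
    case Application.get_env(:my_app, :flame_runner) do
      # Parent node: serves web traffic, runs jobs, and owns both pools.
      nil ->
        [
          MyAppWeb.Endpoint,
          {Oban, Application.fetch_env!(:my_app, Oban)},
          # min/max are assumed values (scale to zero, one machine per pool);
          # adjust them to your own sizing.
          {FLAME.Pool,
           name: MyApp.EmbeddingRunner,
           min: 0,
           max: 1,
           backend: {FLAME.FlyBackend, env: %{"FLAME_RUNNER" => "embedding"}}},
          {FLAME.Pool,
           name: MyApp.TranscriptionRunner,
           min: 0,
           max: 1,
           backend: {FLAME.FlyBackend, env: %{"FLAME_RUNNER" => "transcription"}}}
        ]

      # Embedding runner: starts only the embedding serving.
      "embedding" ->
        [{Nx.Serving, name: MyApp.Embedding, serving: MyApp.Embedding.serving()}]

      # Transcription runner: starts only the transcription serving.
      "transcription" ->
        [{Nx.Serving, name: MyApp.Transcription, serving: MyApp.Transcription.serving()}]
    end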
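
To show how the parent app could use these pools, here is a minimal sketch that sends work to the matching runner with FLAME.call and runs it through the serving started on that machine. The MyApp.Inference module and its function names are hypothetical and not from the original app. The shape of the input depends on how each serving is built, so it is passed through untouched.

```elixir
defmodule MyApp.Inference do
  # Hypothetical wrapper module (an assumption, not part of the original app).
  # Pool and serving names match the supervision tree above.

  # Runs the embedding serving on a machine from the embedding pool.
  def embed(input) do
    FLAME.call(MyApp.EmbeddingRunner, fn ->
      Nx.Serving.batched_run(MyApp.Embedding, input)
    end)
  end

  # Runs the transcription serving on a machine from the transcription pool.
  def transcribe(input) do
    FLAME.call(MyApp.TranscriptionRunner, fn ->
      Nx.Serving.batched_run(MyApp.Transcription, input)
    end)
  end
end
```

The anonymous function and everything it captures are sent to the runner node, so the input should be plain data (a string, a binary, a tensor) rather than something tied to the parent machine, such as a local file path.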

Obviously, the config is a bit simplified and works only in prod, because the FLAME.FlyBackend backend is hardcoded in each pool. To use FLAME.LocalBackend in dev, you’d have to move the backend configuration to the appropriate config file. In our app, we also specify different hardware for each model serving, along with the general FLAME configuration.
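
One possible way to handle that, sketched under assumptions: keep the backend choice in config (config :flame, :backend, ...) and only attach the FLAME_RUNNER variable when the Fly backend is in use. The MyApp.FlameBackend helper below is hypothetical and not part of FLAME. It relies on FLAME.Pool accepting either a bare backend module or a {module, opts} tuple for its :backend option.

```elixir
defmodule MyApp.FlameBackend do
  # Hypothetical helper (an assumption, not part of FLAME or the original app).
  # Returns the value to pass as the :backend option of a FLAME.Pool.
  #
  # Example child spec:
  #   {FLAME.Pool,
  #    name: MyApp.EmbeddingRunner,
  #    min: 0,
  #    max: 1,
  #    backend: MyApp.FlameBackend.for_runner("embedding")}

  # The second argument defaults to the backend configured for :flame,
  # falling back to FLAME.LocalBackend when nothing is configured.
  def for_runner(runner, configured \\ Application.get_env(:flame, :backend, FLAME.LocalBackend))

  # On Fly, tag the child machine so it knows which serving to start.
  def for_runner(runner, FLAME.FlyBackend) when is_binary(runner) do
    {FLAME.FlyBackend, env: %{"FLAME_RUNNER" => runner}}
  end

  # Any other backend (for example FLAME.LocalBackend) is used unchanged.
  def for_runner(_runner, other_backend), do: other_backend
end
```

Keep in mind that FLAME.LocalBackend runs the function on the calling node, so in dev the parent app would also need to start both Nx.Serving children itself.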

Besides this, another use case that comes to my mind is running your app as a fleet of microservices, each with a specific job, without dealing with umbrella apps. That way, you could scale each part of your app independently without complicating things too much.

If you’re interested in a related pattern for handling webhook events and passing messages to LiveViews, Nyakio has written a great explanation in Handling Mux Video Uploads in LiveView with Process Messaging.