As much as I love python, it makes you fight hard to avoid doing the wrong thing. The wrong thing, in this case, being global state.
FastAPI, my favorite python web framework, implicitly encourages the use of globals through its dependency system. You define a global, wrap it in a getter function, declare that getter as a dependency in your handlers, and FastAPI will solve the tree for you, ensuring you don’t get race conditions. As much as I appreciate the power and the ergonomics, I really don’t like this. There’s no way to validate the correct behavior until runtime. It also makes the code hard to test, usually requiring you to manually override the dependency in unit tests.
The anti-pattern
Imagine you have a global dependency, say, a database engine. Instead of defining it as a global, let’s define it as a function:
Using FastAPI’s dependency system, you would use this as follows:
When this endpoint is hit with a GET request, FastAPI will solve the dependency tree, find that `get_session` depends on `get_engine`, call that, provide the value to `get_session`, and then we have a database session. Simple!
This code has a problem: if you keep calling this endpoint, FastAPI will spin up a database engine per request. It’s best practice to keep one engine for the lifetime of your application, as it handles all the complicated database pooling nonsense. Creating one per request simply encourages poor performance, as database I/O is likely the main bottleneck for your application.
There’s a bunch of ways you can solve this. You can define a global inside your module:
I don’t like this, and nor should you. Another way we can solve this is by using the `functools.cache` decorator (or `functools.lru_cache` if you’re on an ancient version of python). Just throw it on, and now:
Once the engine is created, it’s cached, so our application has exactly one engine. Problem solved!
This is still a suboptimal solution. Our application only creates the engine when a handler that requires the dependency is first called. Your application could start up, and things would seem alright, but it could then crash on a request if it failed to get a connection for some reason. With the engine living outside the lifecycle of the application, we don’t get predictable teardowns, which has all the potential for side effects.
Our database should live immediately before and immediately after FastAPI, like an outer layer. We initialize it when FastAPI starts up, and when we CTRL-C (aka `SIGINT`), our database should have the opportunity to clean itself up. It would be convenient if we could tie it to, say, the lifespan of FastAPI…
Example
Some people attempt to solve this conundrum using `contextvars`. Contextvars scare me and I avoid them wherever possible.
The right way with ASGI Lifespan
FastAPI features support for the aptly-named ASGI Lifespan protocol. For example, here’s a lifespan adapted directly from FastAPI’s docs:
Pretty cool! A big improvement on our old code, as we can properly handle clean-ups. But it’s still not optimal: our dependency still relies on global state. Can we get rid of it?
Nested in the ASGI spec, there’s an interesting feature of lifespans: when you `yield`, you can yield stuff from it. Instead of defining a global, you can just:
And now, our engine is part of our application! To be more specific, it’s part of the ASGI Scope. You can access it by simply defining our new session dependency like:
Inside that dependency, we get a new session, initialized from the engine in a shallow copy of the lifespan state (important for performance), that’s tied to the lifespan of our FastAPI app. No dependency solving required, as the engine is associated with every request. When you ask FastAPI to shut down, FastAPI will clean itself up, and then the lifespan will pass its `yield` point, allowing the engine to `dispose` of itself.
This also means you can introduce additional initialization before your application even starts. For example, say you have some custom logic that uses alembic to migrate a database; you can call it as part of your lifespan logic.
If the migration were to fail (and the function throws), your application wouldn’t start up in the first place. Since your database is a prerequisite for your entire application, this is more correct than starting anyway and waiting for the failure to surface.
ASGI Lifespans are powerful! You should associate stuff with your application rather than letting it live external to it. I throw pretty much everything inside it, including my application settings (for which I use `pydantic_settings`), and all my dependencies are just wrappers that pull directly from the ASGI scope. It also has the benefit of being far more testable, as you can just mock the underlying object injected into the lifespan rather than overriding the dependency itself.
The downside of this approach is that you can’t use the database outside of FastAPI. But I consider this a feature, not a bug. In my view, it’s an anti-pattern to do things external to your web server without explicit user interaction. If you really need to step out of this, you can schedule a task on the event loop from your lifespan, but you’d better have a damn good reason. The lifespan encourages you to think deeply about what the lifecycle of your application is, which I find leads to more predictable and maintainable code.
How to do periodic tasks
If you wanted to do this, it’s possible with the `asyncio.TaskGroup` abstraction. The upside of this design is that your TaskGroup will clean itself up when the lifespan exits. You can additionally pass an object from the lifespan into the defined `periodic_task` (like a database engine), meaning we keep the lifespan philosophy intact.
Appendix: Database dependencies, done right
This design primarily came from an issue that cropped up with FastAPI in recent versions. I, too, previously did something like:
The goal being that if I had an unhandled exception inside a handler, the database would automatically roll itself back. This unfortunately stopped working due to internal changes in FastAPI’s dependency resolution system.
Instead, we can solve this problem by using middleware.
Info
I took all the code below and put it in a library: `asgi-sqlalchemy`.
First, we define a database abstraction that implements the Async Context Manager protocol:
Now, we can use `async with DatabaseContext(...) as db` inside of our lifespan function, and we’ve abstracted the lifespan of the database itself, so we no longer need to manually dispose.
We can define some middleware that reads the db injected into the ASGI scope, creates a session based off it, then adds it to the request-specific scope.
`scope["state"]` is shared across every scope, while `scope["key"]` is specific to the given request. A bit confusing, but don’t worry about it.
We finally have to define the dependency:
And we can use it in any handler we desire!
The main power of this approach is that you can automatically roll back the database on an unhandled exception. However, if you manually raise an `HTTPException`, a rollback won’t occur unless manually initiated, which is often the desired behavior. Take a look at the tests here.