Python asyncio in Practice: Real-World Measurements

Python is the language most similar to PHP in terms of execution model: interpreted, single-threaded (GIL), with a dominance of synchronous frameworks. The transition from synchronous Python (Flask, Django + Gunicorn) to asynchronous (FastAPI, aiohttp, Starlette + Uvicorn) is a precise analogy to the transition from PHP-FPM to a coroutine-based runtime.

Below is a collection of production cases, independent benchmarks, and measurements.


1. Production: Duolingo — Migration to Async Python (+40% Throughput)

Duolingo is the largest language learning platform (500M+ users). The backend is written in Python.

In 2025, the team began a systematic migration of services from synchronous Python to async.

Metric Result
Throughput per instance +40%
AWS EC2 cost savings ~30% per migrated service

The authors note that after building the async infrastructure, migrating individual services turned out to be “fairly straightforward.”

Source: How We Started Our Async Python Migration (Duolingo Blog, 2025)


2. Production: Super.com — 90% Cost Reduction

Super.com (formerly Snaptravel) is a hotel search and discount service. Their search engine handles 1,000+ req/s, ingests 1 TB+ of data per day, and processes $1M+ in sales daily.

Key workload characteristic: each request makes 40+ network calls to third-party APIs. This is a pure I/O-bound profile — an ideal candidate for coroutines.

The team migrated from Flask (synchronous, AWS Lambda) to Quart (ASGI, EC2).

Metric Flask (Lambda) Quart (ASGI) Change
Infrastructure costs ~$1,000/day ~$50/day −90%
Throughput ~150 req/s 300+ req/s 2x
Errors during peak hours Baseline −95% −95%
Latency Baseline −50% 2x faster

Savings of $950/day × 365 = ~$350,000/year on a single service.

Source: How we optimized service performance using Quart ASGI and reduced costs by 90% (Super.com, Medium)


3. Production: Instagram — asyncio at 500M DAU Scale

Instagram serves 500+ million daily active users on a Django backend.

Jimmy Lai (Instagram engineer) described the migration to asyncio in a talk at PyCon Taiwan 2018:

Challenges: High CPU overhead of asyncio at Instagram’s scale, the need for automated detection of blocking calls through static code analysis.

Source: The journey of asyncio adoption in Instagram (PyCon Taiwan 2018)


4. Production: Feature Store — From Threads to asyncio (−40% Latency)

The Feature Store service migrated from Python multithreading to asyncio.

Metric Threads Asyncio Change
Latency Baseline −40% −40%
RAM consumption 18 GB (hundreds of threads) Significantly less Substantial reduction

The migration was carried out in three phases with 50/50 production traffic splitting for validation.

Source: How We Migrated from Python Multithreading to Asyncio (Medium)


5. Production: Talk Python — Flask to Quart (−81% Latency)

Talk Python is one of the largest Python podcasts and learning platforms. The author (Michael Kennedy) rewrote the site from Flask (synchronous) to Quart (asynchronous Flask).

Metric Flask Quart Change
Response time (example) 42 ms 8 ms −81%
Bugs after migration 2 Minimal

The author notes: during load testing, the maximum req/s differed insignificantly because MongoDB queries took <1 ms. The gain appears during concurrent request processing — when multiple clients access the server simultaneously.

Source: Talk Python rewritten in Quart (async Flask)


6. Microsoft Azure Functions — uvloop as Standard

Microsoft included uvloop — a fast event loop based on libuv — as the default for Azure Functions on Python 3.13+.

Test Standard asyncio uvloop Improvement
10K requests, 50 VU (local) 515 req/s 565 req/s +10%
5 min, 100 VU (Azure) 1,898 req/s 1,961 req/s +3%
500 VU (local) 720 req/s 772 req/s +7%

The standard event loop at 500 VU showed ~2% request losses. uvloop — zero errors.

Source: Faster Python on Azure Functions with uvloop (Microsoft, 2025)


7. Benchmark: I/O-bound Tasks — asyncio 130x Faster

Direct comparison of concurrency models on a task of downloading 10,000 URLs:

Model Time Throughput Errors
Synchronous ~1,800 s ~11 KB/s
Threads (100) ~85 s ~238 KB/s Low
Asyncio 14 s 1,435 KB/s 0.06%

Asyncio: 130x faster than synchronous code, 6x faster than threads.

For CPU-bound tasks, asyncio provides no advantage (identical time, +44% memory consumption).

Source: Python Concurrency Model Comparison (Medium, 2025)


8. Benchmark: uvloop — Faster Than Go and Node.js

uvloop is a drop-in replacement for the standard asyncio event loop, written in Cython on top of libuv (the same library underlying Node.js).

TCP echo server:

Implementation 1 KiB (req/s) 100 KiB throughput
uvloop 105,459 2.3 GiB/s
Go 103,264
Standard asyncio 41,420
Node.js 44,055

HTTP server (300 concurrent):

Implementation 1 KiB (req/s)
uvloop + httptools 37,866
Node.js Lower

uvloop: 2.5x faster than standard asyncio, 2x faster than Node.js, on par with Go.

Source: uvloop: Blazing fast Python networking (MagicStack)


9. Benchmark: aiohttp vs requests — 10x on Concurrent Requests

Library req/s (concurrent) Type
aiohttp 241+ Async
HTTPX (async) ~160 Async
Requests ~24 Sync

aiohttp: 10x faster than Requests for concurrent HTTP requests.

Source: HTTPX vs Requests vs AIOHTTP (Oxylabs)


10. Counter-argument: Cal Paterson — “Async Python Is Not Faster”

It is important to present counter-arguments as well. Cal Paterson conducted a thorough benchmark with a real database (PostgreSQL, random row selection + JSON):

Framework Type req/s P99 Latency
Gunicorn + Meinheld/Bottle Sync 5,780 32 ms
Gunicorn + Meinheld/Falcon Sync 5,589 31 ms
Uvicorn + Starlette Async 4,952 75 ms
Sanic Async 4,687 85 ms
AIOHTTP Async 4,501 76 ms

Result: synchronous frameworks with C servers showed higher throughput and 2–3x better tail latency (P99).

Why Did Async Lose?

Reasons:

  1. A single SQL query per HTTP request — too little I/O for coroutine concurrency to have an effect.
  2. Cooperative multitasking with CPU work between requests creates “unfair” CPU time distribution — long computations block the event loop for everyone.
  3. asyncio overhead (standard event loop in Python) is comparable to the gain from non-blocking I/O when I/O is minimal.

When Async Actually Helps

Paterson’s benchmark tests the simplest scenario (1 SQL query). As the production cases above demonstrate, async provides a dramatic gain when:

This aligns with theory: the higher the blocking coefficient (T_io/T_cpu), the greater the benefit from coroutines. With 1 SQL query × 2 ms, the coefficient is too low.

Source: Async Python is not faster (Cal Paterson)


11. TechEmpower: Python Frameworks

Approximate results from TechEmpower Round 22:

Framework Type req/s (JSON)
Uvicorn (raw) Async ASGI Highest among Python
Starlette Async ASGI ~20,000–25,000
FastAPI Async ASGI ~15,000–22,000
Flask (Gunicorn) Sync WSGI ~4,000–6,000
Django (Gunicorn) Sync WSGI ~2,000–4,000

Async frameworks: 3–5x faster than synchronous ones in the JSON test.

Source: TechEmpower Framework Benchmarks


Summary: What Python Data Shows

Case Sync → Async Condition
Duolingo (production) +40% throughput, −30% cost Microservices, I/O
Super.com (production) 2x throughput, −90% cost 40+ API calls per request
Feature Store (production) −40% latency Migration from threads to asyncio
Talk Python (production) −81% latency Flask → Quart
I/O-bound (10K URLs) 130x faster Pure I/O, massive concurrency
aiohttp vs requests 10x faster Concurrent HTTP requests
uvloop vs standard 2.5x faster TCP echo, HTTP
TechEmpower JSON 3–5x FastAPI/Starlette vs Flask/Django
Simple CRUD (1 SQL) Sync is faster Cal Paterson: P99 2–3x worse for async
CPU-bound No difference +44% memory, 0% gain

Key Takeaway

Async Python provides maximum benefit with a high blocking coefficient: when I/O time significantly exceeds CPU time. With 40+ network calls (Super.com) — 90% cost savings. With 1 SQL query (Cal Paterson) — async is slower.

This confirms the formula from IO-bound Task Efficiency: gain ≈ 1 + T_io/T_cpu. When T_io » T_cpu — tens to hundreds of times. When T_io ≈ T_cpu — minimal or zero.


Connection to PHP and True Async

Python and PHP are in a similar situation:

Characteristic Python PHP
Interpreted Yes Yes
GIL / single-threaded GIL Single-threaded
Dominant model Sync (Django, Flask) Sync (FPM)
Async runtime asyncio + uvloop Swoole / True Async
Async framework FastAPI, Starlette Hyperf

Python data shows that transitioning to coroutines in a single-threaded interpreted language works. The scale of the gain is determined by the workload profile, not the language.


References

Production Cases

Benchmarks

Coroutines vs Threads