202510261326 Tokio tasks vs kernel thread switching
I came across a repo with good benchmarks of context-switching costs in asynchronous code, comparing Tokio tasks to kernel threads.
Its summary makes an excellent set of back-of-the-napkin numbers to keep around, so I’m re-posting it here. All credit goes to the repo's author, Jim Blandy of Programming Rust fame; the "I" in the bullets below is him, not me.
- A context switch takes around 0.2µs between async tasks, versus 1.7µs between kernel threads. But this advantage goes away if the context switch is due to I/O readiness: both converge to 1.7µs. The async advantage also goes away in our microbenchmark if the program is pinned to a single core. So inter-core communication is something to watch out for.
- Creating a new task takes ~0.3µs for an async task, versus ~17µs for a new kernel thread.
- Memory consumption per task (i.e. for a task that doesn't do much) starts at around a few hundred bytes for an async task, versus around 20KiB (9.5KiB user, 10KiB kernel) for a kernel thread. This is a minimum: more demanding tasks will naturally use more.
- It's no problem to create 250,000 async tasks, but I was only able to get my laptop to run 80,000 threads (4 cores, two-way HT, 32GiB), even after raising every limit I could find. So I don't know what's imposing this limit. See "Running tests with large numbers of threads" in the repo's README.
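To get a rough feel for the thread-creation figure on my own machine, here is a minimal std-only Rust sketch. It is not the repo's benchmark: `avg_thread_spawn_join` is a name I made up, and it times spawn plus join together, so it includes more than pure creation cost. Treat whatever it prints as an order-of-magnitude check, nothing more.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Spawn `n` do-nothing threads one at a time, joining each before
/// spawning the next, and return the average wall-clock time per
/// spawn+join pair. `n` must be non-zero.
fn avg_thread_spawn_join(n: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..n {
        // Join immediately so only one extra thread is alive at a time.
        thread::spawn(|| {}).join().expect("thread panicked");
    }
    start.elapsed() / n
}

fn main() {
    let n = 10_000;
    let avg = avg_thread_spawn_join(n);
    println!("average spawn+join over {n} threads: {avg:?}");
}
```

Build with `--release` before reading anything into the result, and expect it to vary between runs and machines.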
None of these costs is likely to be the limiting factor in your application, but it's nice to know that the headroom is there.
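To put the memory figures in perspective, here is the arithmetic as a small Rust sketch. The 300 bytes per async task is my own stand-in for "a few hundred bytes", and `total_bytes` and `to_mib` are hypothetical helper names. Since the per-task figures are minimums, these totals are lower bounds.

```rust
const KIB: u64 = 1024;

/// Total bytes used by `count` tasks at `bytes_each` bytes apiece.
fn total_bytes(count: u64, bytes_each: u64) -> u64 {
    count * bytes_each
}

/// Convert a byte count to mebibytes.
fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}

fn main() {
    // Assumption: 300 bytes stands in for "a few hundred bytes".
    let async_total = total_bytes(250_000, 300);
    // 20KiB per kernel thread, as quoted in the summary.
    let thread_total = total_bytes(80_000, 20 * KIB);

    println!("250,000 async tasks: ~{:.1} MiB", to_mib(async_total));
    println!("80,000 kernel threads: ~{:.1} MiB", to_mib(thread_total));
}
```

Under those assumptions the async tasks come to roughly 72 MiB and the threads to about 1.5 GiB, both small next to the 32GiB laptop in the summary.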