Hot reloading Python web-servers at scale

Every web application involves continuous changes to the server, ranging from minor bug fixes and small feature improvements to major feature releases. How to keep serving your users while replacing your app with a newer version is a problem everybody has to face. It's like replacing a car's wheels while driving at full speed. One common practice for rolling out changes to production is to take a small percentage of the servers out of rotation, update them, and repeat until every server runs the new version. This approach is not ideal, however: it requires either a non-trivial amount of extra resources dedicated to the rollout or, when the reserved capacity falls short, a significant cost in rollout complexity and speed. The better alternative is the land of “hot reload”, where you can push changes just as fast or even faster, with no extra resources required. We're going to talk about how Instagram got there.

In this talk, we'll cover the benefits of hot reload, the path we took to get there, and the effort it took to scale it to thousands of servers. Hot reload requires no extra resources, even at the machine level, and adds minimal overhead, which means you can run your servers at nearly full utilization and still roll out code changes. With the new deployment system, we not only saved a ton of servers but also made our continuous rollout faster by enabling full parallelism, scaling code rollout to more simultaneous changes, more developers, and in turn more users.

Presented by

Chenyang Wu

A hacker and programmer, currently working at Facebook & Instagram. Has worked on some low-level Python projects and loves playing with fancy and abstract features of languages. Infamous as "GC killer".

