Web-scale companies like Facebook and Google are popularizing a new title within IT operations teams: the Site Reliability Engineer (SRE). Some organizations call this role a Production Engineer, while others use more obscure titles, like Airbnb’s “Developer Happiness Engineer.” Whatever the title, the common thread for all SREs is the same: ensuring that an organization’s online presence is available at all times and performs efficiently. In pursuit of that goal, SREs relentlessly ask three questions: Does it work? Does it work well? Could it work better?