Today's web-based, cloud-hosted systems are expected to be highly reliable while also supporting Continuous Delivery of new features. The complexity of meeting both expectations has led to the evolution of a new field, Service Reliability Engineering, tailored to such systems.
Google, a pioneer in this field, calls it Site Reliability Engineering (SRE). Although Service Reliability Engineering would be a more apt name, Google's term has caught on. The seminal work documenting Google's approach and practices is its book of the same name (commonly referred to as the ‘SRE book’), which has become the de facto standard on how to adopt SRE in an organization. This session will cover adopting SRE as a practice in organizations that are also adopting DevOps, address the challenges that large traditional enterprises face in adopting SRE, and explain how to overcome them.