Reliability is rarely the top priority while software is being built; in operations it matters far more. Yet even in the age of DevOps the two remain clearly separated, because reliability is still considered an operational problem. Very few people have a clear idea of what the target availability should be, how it is measured, and what happens if it is not met.
Site Reliability Engineering (SRE) answers these questions with a clear framework for addressing reliability throughout the product life cycle. Reliability is treated as a first-class citizen rather than as an afterthought along the lines of “Oh, yesterday we had a problem in production, I think we have to do something”. We’ll also look at how SRE fits into the DevOps world.
Speaker
Florian Kammermann has worked in many different areas of IT during his career. He is interested in everything (process, technology, human aspects) that makes software development faster, more valuable and more transparent. He currently works at Swisscom on a DevOps portal where DevOps engineers can manage all the resources they need (tools, infrastructure) as a self-service. The tools of choice for this work are Golang, Angular, Cloud Foundry, and Kubernetes. Florian is also part of the SRE Community of Practice, which implements SRE practices at Swisscom.