Site reliability engineer

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Missionedit (talk | contribs) at 04:16, 28 October 2016 (specify stub). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Site reliability engineer (SRE) is a job description given to software engineers who focus on reliability, scalability, and the development of cloud computing infrastructure. SREs develop, maintain and operate software that automates the traditional roles of the system administrator at large scale, such as configuration and cluster management systems, and that supports reliability and scalability goals, such as container virtualization and the systems architecture of microservices.[1][2]

Companies employing the SRE title or involved in the SRE profession include Google, Dropbox, Mozilla, Airbnb, Netflix,[2] eBay, LinkedIn, Facebook, Adobe, Microsoft, Uber, IBM and Twitter.[3] At Google, SREs have been responsible for developing the Google File System (GFS), its successor Colossus (CFS), and the Borg cluster management system.[4]

References

  1. ^ Site Reliability Engineering: How Google Runs Production Systems. O'Reilly Media. 2016.
  2. ^ a b Donald Fischer (2016-03-02). "Are site reliability engineers the next data scientists?". TechCrunch.
  3. ^ USENIX. "SREcon".
  4. ^ Charles Babcock (2015-11-11). "Kubernetes Yields 'Operations Dividend,' Still Working On Scalability". InformationWeek.