Revision as of 22:54, 20 September 2023

https://sre.google/

Participants

Format

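Participants read the portion of the book covering the topic assigned in advance, and every Friday share with one another the parts they found most memorable.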

Topics

Chapter 1 - Introduction
Chapter 2 - The Production Environment at Google, from the Viewpoint of an SRE
Chapter 3 - Embracing Risk
Chapter 4 - Service Level Objectives
Chapter 5 - Eliminating Toil
Chapter 6 - Monitoring Distributed Systems
Chapter 7 - The Evolution of Automation at Google
Chapter 8 - Release Engineering
Chapter 9 - Simplicity
Chapter 10 - Practical Alerting
Chapter 11 - Being On-Call
Chapter 12 - Effective Troubleshooting
Chapter 13 - Emergency Response
Chapter 14 - Managing Incidents
Chapter 15 - Postmortem Culture: Learning from Failure
Chapter 16 - Tracking Outages
Chapter 17 - Testing for Reliability
Chapter 18 - Software Engineering in SRE
Chapter 19 - Load Balancing at the Frontend
Chapter 20 - Load Balancing in the Datacenter
Chapter 21 - Handling Overload
Chapter 22 - Addressing Cascading Failures
Chapter 23 - Managing Critical State: Distributed Consensus for Reliability
Chapter 24 - Distributed Periodic Scheduling with Cron
Chapter 25 - Data Processing Pipelines
Chapter 26 - Data Integrity: What You Read Is What You Wrote
Chapter 27 - Reliable Product Launches at Scale
Chapter 28 - Accelerating SREs to On-Call and Beyond
Chapter 29 - Dealing with Interrupts
Chapter 30 - Embedding an SRE to Recover from Operational Overload
Chapter 31 - Communication and Collaboration in SRE
Chapter 32 - The Evolving SRE Engagement Model
Chapter 33 - Lessons Learned from Other Industries
Chapter 34 - Conclusion

References

SRE Book

@@ Line 1: / Line 1: @@
-Describe SiteReliabilityEngineering here
+https://sre.google/
+__TOC__
+= Participants =
+* [[김경민]]
+* [[김동우]]
+* [[김제신]]
+* [[박민서]]
+* [[음호준]]
+* [[전영은]]
+* [[조영호]]
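+= Format =
+Participants read the portion of the book covering the topic assigned in advance, and every Friday share with one another the parts they found most memorable.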
+= Topics =
+* Chapter 1 - Introduction
+* Chapter 2 - The Production Environment at Google, from the Viewpoint of an SRE
+* Chapter 3 - Embracing Risk
+* Chapter 4 - Service Level Objectives
+* Chapter 5 - Eliminating Toil
+* Chapter 6 - Monitoring Distributed Systems
+* Chapter 7 - The Evolution of Automation at Google
+* Chapter 8 - Release Engineering
+* Chapter 9 - Simplicity
+* Chapter 10 - Practical Alerting
+* Chapter 11 - Being On-Call
+* Chapter 12 - Effective Troubleshooting
+* Chapter 13 - Emergency Response
+* Chapter 14 - Managing Incidents
+* Chapter 15 - Postmortem Culture: Learning from Failure
+* Chapter 16 - Tracking Outages
+* Chapter 17 - Testing for Reliability
+* Chapter 18 - Software Engineering in SRE
+* Chapter 19 - Load Balancing at the Frontend
+* Chapter 20 - Load Balancing in the Datacenter
+* Chapter 21 - Handling Overload
+* Chapter 22 - Addressing Cascading Failures
+* Chapter 23 - Managing Critical State: Distributed Consensus for Reliability
+* Chapter 24 - Distributed Periodic Scheduling with Cron
+* Chapter 25 - Data Processing Pipelines
+* Chapter 26 - Data Integrity: What You Read Is What You Wrote
+* Chapter 27 - Reliable Product Launches at Scale
+* Chapter 28 - Accelerating SREs to On-Call and Beyond
+* Chapter 29 - Dealing with Interrupts
+* Chapter 30 - Embedding an SRE to Recover from Operational Overload
+* Chapter 31 - Communication and Collaboration in SRE
+* Chapter 32 - The Evolving SRE Engagement Model
+* Chapter 33 - Lessons Learned from Other Industries
+* Chapter 34 - Conclusion
+= References =
+* [https://sre.google/sre-book/table-of-contents/ SRE Book]