In this talk, we will look at how to make distributed applications, such as database clusters in Kubernetes, observable.
To illustrate this, we will introduce real failures into a Postgres HA cluster managed by the Postgres Operator for Kubernetes and see:
- How to detect each type of failure
- Whether the cluster components can handle each failure automatically and how long it takes to recover
Talk by Nikolay Sivko, Founder & CEO at Coroot, at Percona University (December 19, 2022)
He has worked in the reliability field for more than 15 years and was previously head of IT operations at hh.ru (NASDAQ: HHR).
—
Coroot is an open-source observability platform that helps engineers fix service outages and even prevent them. It continuously audits telemetry data to highlight issues and weak spots in your services. Quick setup, no code required.
Coroot turns metrics, logs, and traces into answers concerning application issues and their causes. Its built-in set of predefined inspections allows it to accurately identify the root cause of over 80% of outages. Coroot leverages eBPF to build a comprehensive Service Map of any system, making it possible to audit not only any given service but also its dependent services and databases.