RED metrics are three request-level measurements used to monitor the performance and reliability of microservices and other distributed systems. The acronym RED stands for:
Rate:
The number of requests processed per second.
This metric helps you understand the overall throughput of your system and whether it is handling the expected load.
Errors:
The number or rate of failed requests.
Tracking errors helps you measure the reliability and stability of your system. These errors can include HTTP 5xx responses, timeouts, or any other failure indicators relevant to your application.
Duration:
The time it takes to process a request, typically measured as latency (in milliseconds or seconds).
Monitoring duration helps you assess the performance of your system and whether it meets your latency Service Level Objectives (SLOs).
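As a rough illustration of the three definitions above, the following sketch derives Rate, Errors, and Duration from a small batch of request records. The `Request` record, the `red_summary` helper, and the sample values are all hypothetical, and counting only HTTP 5xx responses as failures is an assumption; a real service would likely also count timeouts and other application-specific failures.

```python
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float    # seconds since the start of the observation window
    status: int         # HTTP status code returned to the caller
    duration_ms: float  # time taken to process the request


def red_summary(requests, window_seconds):
    """Return rate (req/s), error rate (fraction), and mean duration (ms)."""
    total = len(requests)
    if total == 0 or window_seconds <= 0:
        return {"rate": 0.0, "error_rate": 0.0, "mean_duration_ms": 0.0}
    # Assumption: only 5xx responses count as failures in this sketch.
    errors = sum(1 for r in requests if r.status >= 500)
    mean_duration = sum(r.duration_ms for r in requests) / total
    return {
        "rate": total / window_seconds,
        "error_rate": errors / total,
        "mean_duration_ms": mean_duration,
    }


# Made-up sample: four requests observed over a two-second window.
sample = [
    Request(0.1, 200, 120.0),
    Request(0.4, 200, 180.0),
    Request(0.9, 500, 300.0),
    Request(1.5, 200, 200.0),
]
summary = red_summary(sample, window_seconds=2.0)
print(summary)
```

In practice these values are usually computed by the monitoring system from counters and histograms rather than from raw request lists; the sketch only shows what each number means.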
Use in Observability
RED metrics are often collected with tools like Prometheus and visualized with tools like Grafana, though other monitoring systems can fill either role.
They work well as Service Level Indicators (SLIs), which in turn are the basis for defining Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
RED metrics are request-centric, which suits microservices and distributed systems; they complement resource-centric frameworks like USE (Utilization, Saturation, Errors), which target infrastructure monitoring.
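One way the link between RED metrics and SLIs/SLOs could look in code is sketched below. The helper names (`availability_sli`, `latency_sli`, `meets_slo`), the 500 ms threshold, and the targets are illustrative assumptions, not values from any particular system.

```python
def availability_sli(total_requests, failed_requests):
    """Fraction of requests that succeeded (derived from Rate and Errors)."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return (total_requests - failed_requests) / total_requests


def latency_sli(durations_ms, threshold_ms):
    """Fraction of requests served within the threshold (derived from Duration)."""
    if not durations_ms:
        return 1.0
    fast = sum(1 for d in durations_ms if d <= threshold_ms)
    return fast / len(durations_ms)


def meets_slo(sli, target):
    """An SLO is met when the measured SLI is at or above its target."""
    return sli >= target


# Hypothetical measurements and targets.
availability = availability_sli(total_requests=1000, failed_requests=10)
latency = latency_sli([120, 180, 300, 200, 650], threshold_ms=500)

print(availability, meets_slo(availability, target=0.999))
print(latency, meets_slo(latency, target=0.95))
```

Here the SLI is the measurement and the SLO is the target it is compared against; an SLA would attach contractual consequences to missing that target.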
Example
For a web service:
Rate: 1000 requests per second
Errors: 10 requests per second fail (1% error rate)
Duration: Average latency is 200 ms, with a 99th percentile latency of 500 ms
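The arithmetic behind this example can be checked with a short sketch. The list of durations is synthetic, constructed only so that it reproduces the figures above, and the nearest-rank method used in the hypothetical `percentile_nearest_rank` helper is just one of several common percentile definitions; monitoring systems often estimate percentiles from histogram buckets instead.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct in the range 1..100."""
    ordered = sorted(values)
    # Ceiling of pct/100 * n, done in integer arithmetic to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Rate and Errors from the example: 10 failures out of 1000 requests per second.
requests_per_second = 1000
failures_per_second = 10
error_rate = failures_per_second / requests_per_second

# Synthetic sample of 100 request durations in milliseconds:
# most requests are fast, and two slow outliers form the tail.
durations_ms = [190] * 98 + [500, 880]

average_ms = sum(durations_ms) / len(durations_ms)
p99_ms = percentile_nearest_rank(durations_ms, 99)

print(f"error rate: {error_rate:.1%}")
print(f"average latency: {average_ms} ms")
print(f"p99 latency: {p99_ms} ms")
```

The sample shows why both numbers are reported: the average stays near the typical request, while the 99th percentile exposes the slow tail that the average hides.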