Aperture (Reliability Management System)
- is an open source reliability management system that can add these capabilities
- it offers a centralized load management system that collects reliability-related metrics from different systems and uses it to generate a global view
Aperture - 3 components
- Observe - Aperture collects reliability-related metrics (latency, resource utilization, etc.) from each node using a sidecar and aggregates them in Prometheus. You can also feed in metrics from other sources like InfluxDB, Docker Stats, Kafka, etc.
- Analyze - A controller will monitor the metrics in Prometheus and track any deviations from the service-level objectives you set. You set these in a YAML file and Aperture stores them in etcd, a popular distributed key-value store.
- Actuate - If any of the policies are triggered, then Aperture will activate configured actions like load shedding or distributed rate limiting across the system.