OBSERVABILITY:
Leader in the banking sector
01
The context:
The customer wants to overhaul its monitoring system to meet the requirements of its new Kubernetes-based application architecture, moving from legacy static monitoring to modern, SRE-driven monitoring (KPIs, metrics, etc.).
02
Problem-solving approach:
Implementation of a new standardised, portable, event-driven monitoring and observability solution based on the Prometheus format and backward-compatible with the existing monitoring stack (Nagios, StatsD, Graphite, etc.).
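To illustrate what compatibility between the Prometheus format and the existing stack can mean in practice, here is a minimal sketch that renders one metric sample as a Prometheus text-exposition line, a StatsD line, and a Graphite plaintext line. The `Sample` class, the helper names, and the choice to flatten label values into a dotted path are assumptions made for illustration, not the solution actually deployed; label escaping is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical in-memory metric sample (not part of any library)."""
    name: str
    value: float
    labels: dict = field(default_factory=dict)


def _dotted(sample: Sample) -> str:
    # Assumption: legacy backends get label values appended to the metric
    # name, sorted by label key, because StatsD/Graphite paths have no labels.
    parts = [sample.name] + [str(v) for _, v in sorted(sample.labels.items())]
    return ".".join(parts)


def to_prometheus(sample: Sample) -> str:
    # Prometheus text exposition: name{label="value",...} value
    if sample.labels:
        body = ",".join(f'{k}="{v}"' for k, v in sorted(sample.labels.items()))
        return f"{sample.name}{{{body}}} {sample.value:g}"
    return f"{sample.name} {sample.value:g}"


def to_statsd(sample: Sample, kind: str = "g") -> str:
    # StatsD line protocol: name:value|type (g = gauge, c = counter, ms = timer)
    return f"{_dotted(sample)}:{sample.value:g}|{kind}"


def to_graphite(sample: Sample, timestamp: int) -> str:
    # Graphite plaintext protocol: path value unix_timestamp_in_seconds
    return f"{_dotted(sample)} {sample.value:g} {timestamp}"


if __name__ == "__main__":
    s = Sample("http_requests_total", 3, {"service": "payments", "code": "500"})
    print(to_prometheus(s))
    print(to_statsd(s, kind="c"))
    print(to_graphite(s, 1700000000))
```

Keeping a single internal representation and rendering it per backend is one way to let the legacy tools keep receiving data during the migration; a production setup would more likely rely on existing exporters or bridges than on hand-written formatters.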
03
Result:
A new monitoring structure based on an SRE approach and key observability metrics (golden signals), which has enabled us to reduce the error rate across all applications in the fleet by between 15% and 25%, and to optimise processing times in cases where latency was not correlated with user traffic.
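As a rough illustration of the two measurements mentioned above, the sketch below computes an error rate from request counts and a Pearson correlation between latency and traffic. The function names and all sample figures are hypothetical and do not come from the project; they only show the kind of calculation involved.

```python
from math import sqrt


def error_rate(errors: int, total: int) -> float:
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / total if total else 0.0


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical per-minute samples: requests per second and latency in ms.
    traffic = [120, 340, 560, 410, 150]
    latency_ms = [210, 190, 220, 480, 200]
    print("error rate:", error_rate(errors=5, total=100))
    print("latency/traffic correlation:", round(pearson(traffic, latency_ms), 3))
```

A correlation close to zero suggests that slow responses are not explained by load alone, which points the investigation towards the application or its dependencies rather than capacity.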
04
Technical stack used:
Prometheus, Grafana, Nagios, StatsD, Graphite, Cassandra, Kubernetes, Docker