We scaled down some post-transaction batch jobs (the status sync consumers), which freed up a chunk of DB connections, brought connection usage back to normal levels, and reduced contention on the affected tables.
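For reference, below is a minimal sketch of how such a scale-down can be performed with the official Kubernetes Python client. The deployment name, namespace, and replica count are illustrative assumptions, not the actual values used during the incident.

    # Sketch: scale down the status-sync consumer deployment to relieve
    # DB connection pressure. Names and replica counts are assumptions.
    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() when run inside the cluster
    apps = client.AppsV1Api()

    # Hypothetical deployment/namespace for the post-transaction status sync consumers.
    apps.patch_namespaced_deployment_scale(
        name="status-sync-consumer",
        namespace="payments",
        body={"spec": {"replicas": 1}},  # reduced from a higher replica count
    )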
The dependency on the legacy EC service was turned off in EC so that upstream latencies could no longer hold up further connections.
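As an illustration of the kind of guard involved, here is a hedged sketch of a feature-flag check around the legacy EC call combined with a short client-side timeout; the flag name, endpoint URL, and timeout values are hypothetical, not the actual configuration.

    # Sketch: skip calls to the legacy EC service when the dependency is disabled,
    # and cap the wait when it is enabled, so slow upstream responses cannot pin
    # connections. Flag name, URL, and timeouts are assumptions.
    import os
    import requests

    LEGACY_EC_ENABLED = os.getenv("LEGACY_EC_ENABLED", "false").lower() == "true"
    LEGACY_EC_URL = "https://legacy-ec.internal/api/v1/enrich"  # hypothetical endpoint

    def enrich_transaction(txn):
        if not LEGACY_EC_ENABLED:
            # Dependency turned off: return the transaction untouched instead of waiting.
            return txn
        resp = requests.post(LEGACY_EC_URL, json=txn, timeout=(1, 2))  # connect/read timeouts
        resp.raise_for_status()
        return resp.json()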
Monitoring
We have initiated a restart of all application pods across the most affected major services (txn, customer, wallets, and Saved Payment Methods). We are continuously monitoring the key metrics.
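A rolling restart of the affected deployments can be triggered the same way kubectl rollout restart does it, by bumping a pod-template annotation; the sketch below assumes hypothetical deployment and namespace names.

    # Sketch: rolling restart of the most affected deployments by updating the
    # restartedAt annotation on the pod template (the mechanism used by
    # `kubectl rollout restart`). Deployment/namespace names are assumptions.
    from datetime import datetime, timezone
    from kubernetes import client, config

    config.load_kube_config()
    apps = client.AppsV1Api()

    restarted_at = datetime.now(timezone.utc).isoformat()
    for name in ["txn-service", "customer-service", "wallet-service", "saved-payment-methods"]:
        apps.patch_namespaced_deployment(
            name=name,
            namespace="payments",  # hypothetical namespace
            body={"spec": {"template": {"metadata": {"annotations": {
                "kubectl.kubernetes.io/restartedAt": restarted_at}}}}},
        )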
Identified
We are seeing an increase in 504s across all the production k8s clusters.
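For context, a sketch of how the 504 rate can be tracked per cluster through the Prometheus HTTP API; the Prometheus address, ingress metric name, and label set are assumptions about the monitoring stack rather than the actual setup.

    # Sketch: query the per-cluster 504 rate from Prometheus. URL, metric name,
    # and labels are assumptions.
    import requests

    PROMETHEUS_URL = "http://prometheus.internal:9090"  # hypothetical address
    QUERY = 'sum by (cluster) (rate(nginx_ingress_controller_requests{status="504"}[5m]))'

    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": QUERY}, timeout=10)
    resp.raise_for_status()
    for series in resp.json()["data"]["result"]:
        print(series["metric"].get("cluster", "unknown"), series["value"][1])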
Investigating
We are seeing outages across multiple APIs and are investigating the root cause.