APIs Outage

Resolved
Major outage
Started over 1 year ago
Lasted 30 minutes

Affected

API
Updates
  • Resolved

    We scaled down some post-transaction batch jobs (status sync consumers). This freed up DB connections, bringing connection usage back to normal, and reduced contention on the tables. We also turned off EC's dependency on the legacy EC service so that upstream latencies in the legacy service could no longer hold up further connections.

  • Monitoring

    We have initiated a restart of all application pods for the most affected services (txn, customer, wallets, and Saved Payment Methods). We are continuously monitoring key metrics.

  • Identified

    We are seeing an increase in 504 (Gateway Timeout) errors across all production k8s clusters.

  • Investigating

    We are seeing outages across multiple APIs and are investigating the root cause.