The Cost Ripple from the AWS Outage

Written by: Martin Thompson

Published on: October 21, 2025

On 20 October 2025, AWS suffered a major outage in US-East-1, one of its largest and most heavily used regions. Yesterday’s incident showed just how many everyday apps depend on the Amazon backbone: thousands of websites and business systems slowed down or went offline for several hours (Reuters, Wired), and services from Fortnite and Snapchat to Lloyds Bank and Halifax were affected (BBC).

Whilst the rest of the world debates whether it is wise to park so much of the internet on a single platform, it’s worth considering the bill. When a big region stumbles, the ripple is economy-wide, not just technical.

Outage Economics

When cloud services fail, the hidden damage isn’t just downtime; it’s the potential surge in costs that comes afterwards. Systems automatically try to “fix themselves”: they spin up backup capacity, keep re-sending failed requests, or switch to other regions. Each of those reactions costs money. Engineers call this a retry storm: lots of machines panicking all at once.
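To make that concrete, here is a minimal sketch of the standard client-side defence: capped exponential backoff with full jitter, so clients back off at random intervals instead of retrying in lockstep. The function name and parameters are illustrative, not taken from any particular SDK.

```python
import random
import time

def call_with_backoff(request, max_attempts=5, base=0.5, cap=30.0):
    """Retry a flaky call with capped exponential backoff and full jitter,
    so thousands of clients don't hammer a recovering service in lockstep."""
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure instead of looping
            # Full jitter: sleep a random amount up to the capped backoff window.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Most cloud SDKs already ship a variant of this; the point for cost purposes is that both the retry count and the retry rate are bounded, so a regional wobble cannot quietly turn into an unbounded bill.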

It’s a good reminder that every second of instability ripples through to ITAM, FinOps, Procurement and Finance teams. Treat the outage like a cost event, not just an IT event.

[Image: AWS outage due to a suspected DNS issue]

What to Do Before the Next Outage

  • Connect cost data to incident response – Treat outages as financial events as well as technical ones. When a major incident starts, FinOps should be on the call.
  • Invest in tagging and cost visibility – FinOps best practice 101: you can’t fix what you can’t see. Make sure every environment, team and service can be traced in the cost report.
  • Watch for cost anomalies automatically – FinOps tools can alert you when spend patterns look abnormal (see the spend-anomaly sketch after this list).
  • Check your contracts – Ask: if the supplier fails, who pays? Negotiate for flexibility, credits or caps when usage spikes because of provider downtime.
  • Normalise discovery data after incidents – ITAM and SAM teams should exclude temporary spikes from compliance and renewal reports.
  • Set guardrails – Use quotas and sensible autoscaling limits to prevent runaway scale-out during failures (see the autoscaling sketch after this list).
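As a rough illustration of automated anomaly watching, here is a minimal sketch that flags a day’s spend sitting far above its trailing baseline using a simple z-score. Real FinOps tooling (AWS Cost Anomaly Detection, for instance) is far more sophisticated, and the figures below are made up.

```python
from statistics import mean, stdev

def flag_spend_anomaly(daily_spend, threshold=3.0):
    """Flag today's spend if it sits more than `threshold` standard
    deviations above the trailing baseline (a crude z-score check)."""
    *history, today = daily_spend
    baseline, spread = mean(history), stdev(history)
    if spread and (today - baseline) / spread > threshold:
        return f"Anomaly: today ${today:,.2f} vs baseline ${baseline:,.2f}"
    return None

# e.g. a retry storm more than doubling one day's bill:
print(flag_spend_anomaly([1000, 980, 1020, 990, 1010, 2100]))
```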
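And as one concrete guardrail, a sketch of capping an EC2 Auto Scaling group so a failover cannot scale out without limit. The group name and sizes are placeholders, and the call assumes boto3 credentials with the relevant permission.

```python
import boto3  # assumes credentials with autoscaling:UpdateAutoScalingGroup

autoscaling = boto3.client("autoscaling")

# Cap how far the group may scale out, so failover or a retry storm
# cannot spin up an unbounded fleet while everyone is firefighting.
# "web-frontend-asg" and the sizes are placeholders for your own values.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="web-frontend-asg",
    MinSize=2,
    MaxSize=12,  # a deliberate ceiling sized to budget, not to panic
)
```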

Timing Matters

The day after an outage is rarely the right moment to launch a crusade; teams are tired and still mopping up. What is right today is capturing the evidence while it is fresh: the cost spike in your report, the anomaly alert, the incident timeline, the burst of temporary assets. Use yesterday’s AWS outage on 20 October as a case study, not for finger-pointing, but for building better financial resilience for the future.
