On Monday, October 20, Amazon Web Services (AWS) suffered a widespread outage that took down thousands of websites and affected users around the world. The disruption coincided with Diwali celebrations in India, causing problems for many online businesses running festive promotions and campaigns. According to outage trackers, AWS services were inaccessible for nearly 12 hours, though some came back online sooner. High-profile platforms hit by the outage included Apple TV, Canva, Fortnite, Reddit, Snapchat, and Starbucks, along with various government agencies.
Unpacking the AWS Outage: A Day of Digital Silence
Reports of problems with AWS services began appearing on internet outage monitor Downdetector at 4 AM ET (1:30 PM IST) on October 20. The volume of reports surged, peaking at 8 AM ET (5:30 PM IST) with more than 13,000 users reporting that they were affected. Reports continued until 2 PM ET (11:30 PM IST), underscoring the scale and duration of the disruption.
Given AWS’s critical role as a leading cloud service provider for countless online platforms and websites, the outage cascaded, causing widespread problems for a diverse array of applications and sites. Popular services like Apple TV, Canva, Fire TV, Fortnite, Hulu, Pinterest, Reddit, Snapchat, and Starbucks all experienced significant interruptions. For example, Fire TV users found themselves unable to connect to servers or stream any content on their smart TVs.
Given its impact on major corporations and government agencies alike, the incident is being described as the most significant internet outage since last year’s CrowdStrike event, which disrupted Microsoft Windows systems worldwide. The full financial cost of the downtime is still being assessed.
This outage also sparked considerable debate regarding the inherent risks of relying on a single entity for such a vast portion of internet infrastructure. Billionaire and X CTO Elon Musk notably highlighted how even Signal, a privacy-focused messaging platform, succumbed to the AWS disruption, prompting questions about the overall resilience of such centralized systems. Musk also used the occasion to advocate for his own X Chat platform.
The Root Cause: Unraveling the Technical Failure
In its initial assessment shared on the Service Health status page, AWS said the incident began with a “Domain Name System (DNS) resolution issue” affecting its regional DynamoDB service endpoints. This resulted in elevated error rates and increased latency across AWS services in the US-EAST-1 Region. The DNS issue then triggered a subsequent “impairment in the internal subsystem of EC2,” which, in turn, compromised the Network Load Balancer health checks.
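To make the idea of a DNS resolution failure concrete, here is a minimal Python sketch of how a client might check whether an endpoint hostname resolves, retrying with exponential backoff when the lookup fails. It is an illustration only and not how AWS or any affected service handles DNS. The function name `resolve_with_retry`, its parameters, and the choice of `dynamodb.us-east-1.amazonaws.com` as the example hostname are assumptions made for this sketch.

```python
import socket
import time
from typing import Callable, List

# Assumption: an example hostname for a regional DynamoDB endpoint.
ENDPOINT = "dynamodb.us-east-1.amazonaws.com"


def resolve_with_retry(
    host: str,
    attempts: int = 3,
    base_delay: float = 0.5,
    resolver: Callable[..., list] = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Return the sorted unique IP addresses for host.

    Retries on DNS failure (socket.gaierror), doubling the wait each time,
    and re-raises the error once the attempts are used up.
    The resolver and sleep parameters exist only so the function can be
    exercised without a network; both are hypothetical conveniences.
    """
    for attempt in range(attempts):
        try:
            results = resolver(host, 443, proto=socket.IPPROTO_TCP)
            # Each result is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address.
            return sorted({info[4][0] for info in results})
        except socket.gaierror:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Calling `resolve_with_retry(ENDPOINT)` on a machine with working DNS should return a list of IP addresses. During a resolution failure like the one described above, the call would raise `socket.gaierror` after its final attempt, and any service depending on that lookup would be unable to reach the endpoint.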
This chain of technical failures culminated in a massive system breakdown that took AWS nearly 12 hours to fully resolve. The company has committed to providing a comprehensive post-event summary in the near future.
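The chain of failures AWS described can be pictured as a walk through a dependency graph. The Python sketch below is a deliberately simplified model, not a description of AWS's actual architecture: the component names and the `DEPENDENTS` map are hypothetical labels based only on the sequence in AWS's initial assessment, and `impacted_by` is an illustrative helper.

```python
from collections import deque
from typing import Dict, List

# Hypothetical, simplified map: each key lists the components that depend on it.
# The chain mirrors the sequence in AWS's initial assessment and nothing more.
DEPENDENTS: Dict[str, List[str]] = {
    "dns_resolution": ["dynamodb_endpoint"],
    "dynamodb_endpoint": ["ec2_internal_subsystem"],
    "ec2_internal_subsystem": ["nlb_health_checks"],
    "nlb_health_checks": [],
}


def impacted_by(root: str, dependents: Dict[str, List[str]]) -> List[str]:
    """Return every component affected when root fails, in breadth-first order."""
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(impacted_by("dns_resolution", DEPENDENTS))
```

In this toy model, a failure at `dns_resolution` reaches every component downstream of it, while a failure that starts at `ec2_internal_subsystem` affects only that component and `nlb_health_checks`. The earlier a fault sits in the chain, the wider its reach.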