Amazon Battles Massive AWS Outage Hitting US Services

Amazon Battles Massive AWS Outage Hitting US Services
Amazon’s AWS recovering after major outage disrupts apps, services worldwide. Picture by: https://cyprus-mail.com/

AWS outage disrupted major online services across the US on Monday, affecting millions of users trying to access Amazon, Snapchat, Venmo, Reddit, and dozens of other platforms. The Amazon Web Service outage started in the early hours of Monday morning, bringing down critical cloud services and infrastructure that power much of the internet. 

The AWS outage began at 3:11 a.m. ET when the company started investigating connectivity issues in its US-EAST-1 Region. Media has confirmed that more than 80 services were affected at peak Monday morning, with outage reports flooding DownDetector, an outage-tracking website that monitors service providers worldwide. The disrupted access hit everything from workplace tools to financial service providers, leaving customers scrambling for alternatives.

Users across America continued to run into issues accessing their favorite platforms throughout Monday afternoon. Signal, Zoom, Strava, Amazon’s Alexa assistant, Pinterest, and Robinhood all showed problems on the outage-tracking website. The technology disruption extended to airlines including United and Delta, telecoms giants like AT&T and Verizon, and workplace tools such as Slack, Microsoft Teams, and Asana.

Amazon Battles Massive AWS Outage Hitting US Services
Massive AWS Outage Disrupts Global Services and Games

United airline passengers found themselves unable to access the carrier’s app and website overnight. A United spokesperson told reporters that the AWS outage forced them to take immediate action. The airline implemented backup systems to end the disruption and restore normal operations. The company spokesperson confirmed that service operations returned to normal after the emergency protocols kicked in.

Robinhood, a popular financial service provider, reported that its platform was back online and recovering after hours of downtime. The company posted updates on X, formerly Twitter, reassuring customers that trading services were returning. Venmo users also experienced issues with payment transfers, with user reports spiking on DownDetector throughout the morning. T-Mobile appeared on the outage-tracking website, but a company spokesperson clarified they didn’t experience an outage on their own network—their customers simply had issues when trying to use other sites due to a third party’s outage.

Aravind Srinivas, CEO of AI startup Perplexity, posted on X at 3:22 a.m. ET that his service was completely down. “The root cause is an AWS issue,” he wrote in his X post, adding that his team was working on resolving the problem. Perplexity, which relies heavily on cloud services for its operations, remained offline for several hours during the early Monday morning chaos.

A Snapchat spokesperson advised users to “hang tight” while the company investigates the connectivity issues with its app. The social media platform, which depends on Amazon Web Service infrastructure for its online applications, saw thousands of user reports flood in during the peak Monday morning hours. Reddit and other social platforms also showed significant problems, with users unable to access their accounts or post content throughout the morning.

Amazon’s status page revealed that DynamoDB, its crucial database service powering countless online applications, was experiencing significant error rates for requests to data centers located on the US East Coast. The underlying issue stemmed from a problem with DNS—the Domain Name System that translates website names to IP addresses and functions as a phone book for the internet. Without functioning DNS, online services simply cannot direct users to the correct website destinations.

At 6:35 am ET, AWS announced the underlying issue had been “fully mitigated” and most service operations were “succeeding normally now.” However, a fresh wave of outage reports spiked later that morning. By 10:14 a.m. ET, AWS reported “significant API errors and connectivity issues across multiple services in the US-EAST-1 Region.” The severity status on the AWS status page remained “degraded” as the cloud unit worked to restore full functionality.

Amazon’s Recovery Efforts Show Gradual Progress

At 11:43 a.m. ET, Amazon said it had “narrowed down the source of the network connectivity issues that impacted AWS Services.” The company identified the root cause as “an underlying internal subsystem responsible for monitoring the health of our network load balancers.” This critical infrastructure component ensures that requests are properly distributed across Amazon’s vast network of servers and data centers.

The company took additional mitigation steps to aid the recovery of the troubled subsystem and began seeing connectivity and API recovery for AWS services. By 12:13 p.m. ET, Amazon reported progress had been made in restoring normal operations. At 1:38 p.m. ET, the company said mitigation efforts were “progressing” with some internal systems “now showing early signs of recovering in a few Availability Zones (AZs) in the US-EAST-1 Region.” Amazon planned to apply mitigations to the remaining AZs, expecting launch errors and network connectivity issues to subside once complete.

History Repeats as Another Major Outage Rocks Digital Infrastructure

This isn’t the first time an outage at one underlying service provider has brought down large chunks of the internet. In July last year, a faulty software update from CrowdStrike, a cybersecurity company, caused computers around the world to crash, sparking chaos for airlines, hospitals, banks, and businesses. That incident demonstrated how vulnerable our digital world has become to single points of failure.

Notable online service outages also occurred in 2022, 2021, 2020, and 2019—typically stemming from faulty updates or misconfigurations at major cloud services providers. “Today’s outage is another reminder that the digital world doesn’t stop at borders—a local fault can ripple worldwide in minutes,” said Charlotte Wilson, head of enterprise at Check Point Software, a cybersecurity company. “We’ve built convenience on shared systems, but resilience still depends on people and process.”

Leave a Reply

Your email address will not be published. Required fields are marked *