Amazon services ‘recovering’ as Snapchat and banks among sites hit by outage

Liv McMahon, technology reporter, and
Lily Jamali, North American technology reporter
Amazon Web Services (AWS) said late Monday that it had resolved a major outage that left some of the world’s largest websites offline for much of the day.
More than 1,000 apps and websites, including social media platforms such as Snapchat and banks such as Lloyds and Halifax, were affected by problems that Amazon said lay at the heart of the cloud computing giant’s operations in the US.
Platform outage monitor Downdetector said user reports of the issue worldwide rose to over 11 million during Monday’s outage.
Even after Amazon fixed the underlying problem, experts said the outage showed the dangers of so many companies depending on a single dominant provider.
“What this episode highlights is how interdependent our infrastructure is,” said Prof Alan Woodward, from the University of Surrey.
“Many online services rely on third parties for their physical infrastructure, meaning problems can arise even with the largest of these third-party providers.
“Small mistakes, often made by humans, can have widespread and significant impacts.”
The problems appear to have started around 07:00 BST on Monday as users began reporting problems accessing a number of platforms.
This included a wide range of different sites and services, from major online games like Fortnite to language learning app Duolingo.
Earlier in the day, Downdetector told the BBC it had seen more than four million reports from users across 500 sites in just a few hours. That is more than double the number it would see on a normal weekday.
It said this number later rose to more than 11 million as more services, such as Reddit and Lloyds Bank, tried to recover.
Amazon said all AWS services “returned to normal operations” around 11 p.m. ET.
But the company had earlier had to scale back parts of its system in order to fix the underlying problem.
According to Mike Chapple, a professor of information technology at the University of Notre Dame, a new series of “cascading failures” may have emerged after the initial outage.
“It’s like when you have a large-scale power outage. Crews start working to get it back online,” Mr. Chapple said. “The power may flicker a few times,” he explained, but it’s possible that Amazon initially “only addressed the symptoms” and not the cause.
What went wrong?
Amazon has yet to fully explain what caused Monday’s outage or make an official statement about it.
An update on the service status web page stated that the issue “appears to be related to DNS resolution of the DynamoDB API endpoint on US-East-1.”
DNS, which stands for Domain Name System, is often likened to a phone book for the internet.
It effectively translates website names that people use (such as bbc.co.uk) into numbers that can be read and understood by computers.
This process fundamentally underpins how we use the internet, and disruption to it can leave web browsers unable to find the content they are looking for.
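The phone-book analogy can be illustrated with a short sketch. This is a toy lookup table, not how AWS or any real DNS server works: the names and addresses in it are hypothetical placeholders (the addresses come from a range reserved for documentation), and `toy_resolve` is an illustrative helper. The second function, `real_resolve`, uses Python's standard `socket.getaddrinfo` to ask the operating system's resolver and needs a network connection, so it is defined but not called here.

```python
import socket

# Toy "phone book": hypothetical names mapped to documentation-only addresses.
# These are illustrative placeholders, not real DNS records.
PHONE_BOOK = {
    "example.test": "192.0.2.10",
    "api.example.test": "192.0.2.20",
}


def toy_resolve(name, records=PHONE_BOOK):
    """Translate a name into an address, as a phone book translates a name into a number."""
    try:
        # Hostnames are case-insensitive, so normalise before looking up.
        return records[name.lower()]
    except KeyError:
        # A failed lookup means the caller has no address to connect to,
        # even if the server behind that name is running normally.
        raise LookupError(f"no address found for {name}") from None


def real_resolve(name):
    """Ask the operating system's resolver for a name's addresses (needs network access)."""
    infos = socket.getaddrinfo(name, 443, proto=socket.IPPROTO_TCP)
    # Each result ends with a socket address tuple whose first item is the IP address.
    return sorted({info[4][0] for info in infos})
```

The failure branch is the relevant part: when the lookup itself fails, an application cannot reach a service even if that service is otherwise healthy.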
Matthew Prince, chief executive of Cloudflare, told the BBC that the AWS outage highlighted how much the functioning of the internet depends on cloud services.
“Everyone has a bad day, Amazon had a bad day today too,” he said.
“There are great things about the cloud; it allows you to scale… but if you experience an outage like this, it can knock out a lot of the services we rely on.”
Cori Crider, president of the Future of Technology Institute, also told the BBC the situation was “a bit like a bridge collapsing”.
“A significant part of the economy has been shattered,” she said.
She said the status quo was “unsustainable”, with the majority of cloud computing, estimated at around 70%, provided by Amazon, Microsoft and Google.
“When you have a concentrated supply in a handful of monopoly providers, when something like this collapses, a large percentage of the economy disappears with it,” she said.
“We should try to buy more local services instead of relying on a handful of American monopoly platforms.
“This is a risk to our security, our sovereignty and our economy, and we need to look at structural divides to make our markets more resilient to such shocks.”
A computer science expert says some of the responsibility lies with the companies that use AWS.
“Companies that use Amazon don’t take enough care to build protection systems into their applications,” says Ken Birman, a professor of computer science at Cornell University in New York.
Outages like Monday’s are not always on this scale, but they occur frequently.
Birman told the BBC that app developers should take care to invest in backups for critical applications that live in the cloud.
“We know how to make these systems more robust and we know how to do it safely,” says Birman.
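Birman did not describe a specific technique, but one common resilience pattern is to fall back to an independent backup when the primary deployment fails. The sketch below illustrates that idea only. Every name in it (`call_with_fallback`, `primary_region`, `backup_region`) is hypothetical, and the two "regions" are simple stand-in functions, not real cloud services.

```python
class AllProvidersFailed(Exception):
    """Raised when neither the primary nor any backup could serve the request."""


def call_with_fallback(providers, request):
    """Try each (name, handler) pair in order and return the first success."""
    failures = []
    for name, handler in providers:
        try:
            return name, handler(request)
        except Exception as exc:  # a real system would catch narrower error types
            failures.append(f"{name}: {exc}")
    # Nothing worked, so report every failure rather than only the last one.
    raise AllProvidersFailed("; ".join(failures))


# Hypothetical stand-ins for two independent deployments of the same service.
def primary_region(request):
    raise ConnectionError("endpoint could not be resolved")


def backup_region(request):
    return f"served {request} from backup"


if __name__ == "__main__":
    providers = [("primary", primary_region), ("backup", backup_region)]
    print(call_with_fallback(providers, "balance"))
```

A fallback like this only helps if the backup does not share the dependency that failed, which is why the experts quoted here stress spreading risk across providers.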
The question of liability may well end up in the courts.
More than a year after the massive CrowdStrike outage, Delta Air Lines is still wrangling with the company to recoup more than $500 million in losses.
Even after CrowdStrike fixed the issue, the airline said it had to manually reset 40,000 servers, resulting in major flight delays over several days.
Additional reporting by Esyllt Carr.






