March 03, 2017

Amazon explains big AWS outage, says employee error took servers offline, promises changes

Amazon has released an explanation of the events that caused Tuesday’s big outage of its Simple Storage Service, also known as S3, which crippled significant portions of the web for several hours.

Amazon said the S3 team was working on an issue that was slowing down its billing system. According to Amazon, the outage began at 9:37 a.m. Pacific, when “an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.”

Those servers affected other S3 “subsystems,” one of which was responsible for all metadata and location information in the Northern Virginia data centers. Amazon had to restart these systems and complete safety checks, a process that took several hours. In the interim, it became impossible to complete network requests with these servers. Other AWS services that relied on S3 for storage were also affected.

About three hours after the issues began, parts of S3 started to function again. By about 1:50 p.m. Pacific, all S3 systems were back to normal. Amazon said it has not had to fully reboot these S3 systems for several years, and the program has grown extensively since then, causing the restart to take longer than expected.

Amazon said it is making changes as a result of this event, promising to speed up recovery time of S3 systems. The company also created new safeguards to ensure that teams don’t take too much server capacity offline when working on maintenance issues like the S3 billing system slowdown.
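As an illustration of the kind of safeguard described above, here is a minimal sketch of a capacity check that rejects a removal request when it would leave too few servers in service. It is not Amazon's tooling: the function name `plan_removal`, the `CapacityError` exception, the server names and the `min_required` threshold are all hypothetical and chosen only for the example.

```python
class CapacityError(Exception):
    """Raised when a removal request would take too much capacity offline."""


def plan_removal(active_servers, requested, min_required):
    """Return the servers that may be removed, or raise CapacityError.

    All parameters are hypothetical stand-ins for illustration:
    active_servers: names of servers currently in service.
    requested: names the operator asked to remove.
    min_required: smallest number of servers that must stay in service.
    """
    active = set(active_servers)
    # Only servers that are actually in service count toward the removal.
    to_remove = active & set(requested)
    remaining = len(active) - len(to_remove)
    if remaining < min_required:
        raise CapacityError(
            f"removing {len(to_remove)} servers would leave {remaining}, "
            f"below the minimum of {min_required}"
        )
    return sorted(to_remove)


# A small request passes the check; a mistyped, oversized one is rejected.
fleet = ["s1", "s2", "s3", "s4"]
allowed = plan_removal(fleet, ["s1"], min_required=3)
try:
    plan_removal(fleet, ["s1", "s2", "s3"], min_required=3)
    rejected = False
except CapacityError:
    rejected = True
```

The point of the design is that the check runs before anything is taken offline, so an incorrectly entered input fails loudly instead of removing more capacity than intended.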

Amazon is also making changes to its service health dashboard, which is designed to track AWS issues. The outage knocked out the service health dashboard for several hours, and AWS had to distribute updates via its Twitter account and by adding text at the top of the page. In its explanation, Amazon said it has changed the dashboard to run across multiple AWS regions.

Amazon concluded its explanation with this message:

Finally, we want to apologize for the impact this event caused for our customers. While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses. We will do everything we can to learn from this event and use it to improve our availability even further.

Several observers surveyed by GeekWire pointed to the need for redundancy in cloud storage as a key takeaway from the outage. Redundancy in this case can mean spreading data across multiple regions, so that an outage in one area doesn’t cripple an entire site, or using multiple cloud providers.
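The multi-region idea can be sketched in a few lines: try to read an object from a primary region and fall back to the next one if the request fails. This is only an illustration under stated assumptions. The function `read_with_fallback` and the `fetch` callable are hypothetical; `fetch` stands in for whatever storage client an application actually uses, and the data is assumed to have been replicated to every listed region already.

```python
def read_with_fallback(key, regions, fetch):
    """Return (region, data) from the first region that answers.

    All names here are hypothetical:
    key: identifier of the object to read.
    regions: region names in order of preference.
    fetch: callable (region, key) -> bytes that raises an exception on failure.
    """
    errors = {}
    for region in regions:
        try:
            return region, fetch(region, key)
        except Exception as exc:  # any failure means "try the next region"
            errors[region] = exc
    raise RuntimeError(f"all regions failed for {key!r}: {errors}")


# Simulated storage: the first region is down, the second holds a replica.
def fake_fetch(region, key):
    if region == "us-east-1":
        raise ConnectionError("region unavailable")
    return b"replica of " + key.encode()


served_from, data = read_with_fallback(
    "logo.png", ["us-east-1", "us-west-2"], fake_fetch
)
```

In this simulation the read is served from the second region. The same pattern extends to a second cloud provider by adding another entry to the list and teaching `fetch` how to reach it.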

Anand Hariharan, vice president of products for Mountain View, Calif.-based Webscale Networks, noted that Amazon’s retail website didn’t go down during the outage Tuesday because it didn’t put all its eggs in one cloud basket.

As AWS’ incredibly disruptive outage this week showed, every major public cloud provider has experienced – or will experience – downtime. In fact, more and more of our customers – particularly those running e-commerce businesses – recognize that they can’t just rely on one cloud provider, or one region. Amazon themselves stayed live and fast because they do exactly this – spread their infrastructure across multiple regions. Hours – and really just minutes – of downtime are a lifetime for businesses. Downtime costs not only revenue, but brand reputation and consumer trust, so companies need to consider their multi-region/multi-cloud strategies today.

The internet reacted in a pretty jovial manner to the outage Tuesday, with many taking the outage as a chance for a “digital snow day.” Amazon’s explanation of the outage earned praise from some for the company’s transparency and scorn from others.

Text by Geek
 

© 2010-2017 mega.mu