May 2, 2019: Major Azure Outage Due to DNS Migration Issue https://buildazure.com/2019/05/03/may-2-2019-major-azure-outage-due-dns-migration-issue/ 1:05 - May 4, 2019
Microsoft plans to publish a detailed root cause analysis within the next 72 hours.
5/2 Network Connectivity - DNS Resolution
Summary of impact: Between 19:43 and 22:35 UTC on 02 May 2019, customers may have experienced intermittent connectivity issues with Azure and other Microsoft services (including M365, Dynamics, DevOps, etc.).
Most services were recovered by 21:30 UTC with the remaining recovered by 22:35 UTC.
Preliminary root cause: Engineers identified the underlying root cause as a nameserver delegation change affecting DNS resolution and resulting in downstream impact to Compute, Storage, App Service, AAD, and SQL Database services.
During the migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated. No customer DNS records were impacted during this incident, and the availability of Azure DNS remained at 100% throughout the incident. The problem impacted only records for Microsoft services.
Mitigation: To mitigate, engineers corrected the nameserver delegation issue.
Applications and services that accessed the incorrectly configured domains may have cached the incorrect information, leading to a longer restoration time until their cached information expired.
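The caching effect described above can be made concrete with a little arithmetic: a client that cached a bad answer shortly before the fix keeps using it until the record's TTL runs out. The sketch below is only an illustration. The function names, the timestamps, and the 3600-second TTL are hypothetical, since the status update does not state when the delegation was corrected or what TTLs the affected records carried.

```python
from datetime import datetime, timedelta, timezone


def cache_expiry(cached_at: datetime, ttl_seconds: int) -> datetime:
    """Return the moment a resolver must stop serving a cached answer."""
    return cached_at + timedelta(seconds=ttl_seconds)


def stale_after_fix(cached_at: datetime, ttl_seconds: int,
                    fixed_at: datetime) -> timedelta:
    """How long a client may keep using the bad answer after the fix lands."""
    remaining = cache_expiry(cached_at, ttl_seconds) - fixed_at
    # An entry that already expired before the fix adds no extra delay.
    return max(remaining, timedelta(0))


# Hypothetical values, chosen only to illustrate the calculation.
cached_at = datetime(2019, 5, 2, 20, 50, tzinfo=timezone.utc)
fixed_at = datetime(2019, 5, 2, 21, 0, tzinfo=timezone.utc)
ttl = 3600  # assumed TTL in seconds

print(stale_after_fix(cached_at, ttl, fixed_at))  # 0:50:00
```

Under these assumed numbers, the client would keep resolving to the wrong target for another 50 minutes after the correction, which is consistent with a recovery that trails the fix itself.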
Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences. A detailed RCA will be provided within approximately 72 hours.
$ dig azure.microsoft.com

; <<>> DiG 9.14.0 <<>> azure.microsoft.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16411
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1220
;; QUESTION SECTION:
;azure.microsoft.com.  IN  A

;; ANSWER SECTION:
azure.microsoft.com.  3195  IN  CNAME  acom.trafficmanager.net.
acom.trafficmanager.net.  195  IN  CNAME  azure-microsoft-com.l-0007.l-msedge.net.
azure-microsoft-com.l-0007.l-msedge.net.  195  IN  CNAME  l-0007.l-msedge.net.
l-0007.l-msedge.net.  195  IN  A  220.127.116.11

;; Query time: 0 msec
;; SERVER: 127.0.0.3#53(127.0.0.3)
;; WHEN: Fri May 03 16:44:41 JST 2019
;; MSG SIZE rcvd: 165
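To read the answer section of the dig output above, it helps to follow the CNAME chain from the queried name to the final A record. The following is a minimal sketch that parses the four answer lines, copied from that output, and walks the chain. The helper names parse_records and resolve_chain are made up for this illustration and are not part of dig or any DNS library.

```python
# Answer-section lines copied from the dig output; whitespace is normalized.
ANSWER_SECTION = """\
azure.microsoft.com. 3195 IN CNAME acom.trafficmanager.net.
acom.trafficmanager.net. 195 IN CNAME azure-microsoft-com.l-0007.l-msedge.net.
azure-microsoft-com.l-0007.l-msedge.net. 195 IN CNAME l-0007.l-msedge.net.
l-0007.l-msedge.net. 195 IN A 220.127.116.11
"""


def parse_records(text):
    """Parse 'name ttl class type rdata' lines into tuples, skipping comments."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        name, ttl, _rclass, rtype, rdata = line.split(None, 4)
        records.append((name, int(ttl), rtype, rdata))
    return records


def resolve_chain(records, name):
    """Follow CNAMEs from name; return the hops and the smallest TTL seen."""
    by_name = {record[0]: record for record in records}
    chain, min_ttl, seen = [], None, set()
    while name in by_name and name not in seen:  # 'seen' guards against loops
        seen.add(name)
        _, ttl, rtype, rdata = by_name[name]
        chain.append((name, rtype, rdata))
        min_ttl = ttl if min_ttl is None else min(min_ttl, ttl)
        if rtype != "CNAME":
            break
        name = rdata
    return chain, min_ttl


chain, min_ttl = resolve_chain(parse_records(ANSWER_SECTION), "azure.microsoft.com.")
for owner, rtype, target in chain:
    print(f"{owner} {rtype} {target}")
print("smallest TTL in chain:", min_ttl)
```

The smallest TTL along the chain bounds how long the full resolution can be reused from cache, which is why a single misconfigured hop can keep clients pointed at the wrong target until that entry expires.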
5/2 Azure Maps - Mitigated
Summary of impact: Between 04:35 and 11:00 UTC on 02 May 2019, a subset of customers using Azure Maps may have experienced 500 errors when attempting to make calls to Azure Maps REST APIs.
Preliminary root cause: Engineers identified that some instances of a front-end service responsible for routing customer requests contained an incorrect software configuration which caused requests to fail.
Mitigation: Engineers corrected the configuration, ensuring that requests were routed successfully.
Next steps: Engineers will perform a full root cause analysis to prevent future occurrences.