We recently addressed issues affecting Uploads, Web Logins, Downloads, Sign, and Public API. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.
Between 10:11AM and 10:32AM PDT on October 6, 2023, some users may have experienced difficulties while working in Box. During this time, users may not have been able to log in, upload/download content, or interact with some components of Sign or some API endpoints. The issue occurred as the result of a configuration change affecting internal routing and DNS infrastructure. We were able to resolve the issue by restoring the original configuration.
Analysis
This issue was caused by a configuration change intended to shift traffic between our cloud provider and on-premise infrastructure to a redundant path on the network. This configuration change inadvertently impacted our internal routing and DNS infrastructure, which certain Box services rely on for internal name resolution during network maintenance. The inability of services to resolve internal resources led to a cascading effect of errors across several services. After reviewing the incident, we discovered that the redundant path was missing the key route for our DNS infrastructure. As such, part of the network path between the public cloud and on-premise datacenter was temporarily disrupted.
As part of Box’s migration to public cloud, our traditional datacenter network was connected to our public cloud vendor with multiple high-bandwidth connections. As Box has completed its service migration to the public cloud, we are working to reduce the number of these connections being used. In connection with this effort, we migrated traffic off of two links that were serving as a connection between the cloud and our DNS infrastructure. Due to a missing configuration, the DNS-specific route was not being propagated throughout the network as expected; as a result, when these two links were turned off, our cloud infrastructure temporarily lost access to our DNS infrastructure. This left our internal services unable to perform DNS lookups for one another. At 10:32AM PDT, these links were turned back on, which allowed us to restore the original configuration and remediate this issue. However, because our configuration pipelines were also impacted by this issue and unable to connect to our DNS infrastructure, these links had to be turned back on manually, which lengthened our time to remediation.
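The failure mode described above, where a redundant path is missing one required route, can be illustrated with a small pre-change check. The following is a minimal sketch only and does not represent Box's tooling: the function name `uncovered_prefixes`, the link names, and the prefixes (taken from reserved documentation address ranges) are all hypothetical, and it assumes the routes advertised on each link are available as plain lists.

```python
from ipaddress import ip_network


def uncovered_prefixes(required, routes_by_link, links_to_disable):
    """Return the required prefixes that no remaining link would carry.

    All inputs are hypothetical: `required` is a list of CIDR strings,
    `routes_by_link` maps a link name to the CIDR strings it advertises,
    and `links_to_disable` is the set of links about to be turned off.
    """
    remaining = [
        ip_network(route)
        for link, routes in routes_by_link.items()
        if link not in links_to_disable
        for route in routes
    ]
    missing = []
    for prefix in required:
        target = ip_network(prefix)
        # Covered if a remaining route is the same network or a supernet.
        covered = any(
            target.version == route.version and target.subnet_of(route)
            for route in remaining
        )
        if not covered:
            missing.append(prefix)
    return missing


# Hypothetical topology: the redundant link lacks the DNS prefix.
ROUTES = {
    "link-a": ["198.51.100.0/24", "192.0.2.0/24"],
    "link-b": ["198.51.100.0/24", "192.0.2.0/24"],
    "link-redundant": ["198.51.100.0/24"],
}
DNS_PREFIX = "192.0.2.0/24"  # assumed prefix for the DNS infrastructure

missing = uncovered_prefixes(
    [DNS_PREFIX, "198.51.100.0/24"], ROUTES, {"link-a", "link-b"}
)
if missing:
    print("Do not disable links; routes would be lost:", missing)
```

In this illustrative topology the check flags the DNS prefix before the two links are turned off, which is the condition that went undetected in the incident.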
Box has been migrating to a public cloud platform over the course of 2022-2023 to improve overall performance and reliability. Our DNS infrastructure is one of the final components to be migrated to the public cloud. This migration is expected to be completed in the coming quarter and will allow our public cloud infrastructure to resolve DNS locally instead of relying on these connections.
Corrective Actions
The following corrective actions have been completed or are planned:
We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter.
Sincerely,
The Box Team