[Major] Delay in generating Admin Reports and latency to the Events API endpoint
Incident Report for Box
Postmortem

We recently addressed issues affecting the timeliness of the Events API. We would like to take this opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

Between March 29, 2021, and April 12, 2021, some users may have experienced difficulties while working in Box. During this time, the real-time Events API sporadically experienced unusual latency. This issue occurred as a result of multiple hardware performance issues in the data-backing store and an issue in the service that populated the data-backing store with the events. We were able to resolve the issue by removing certain nodes from the cluster, addressing the issue in the service that populated the events into the data-backing store, and adding capacity to handle multiple node failures within the cluster.

Analysis 

On the evening of March 29, 2021, both data-backing store clusters experienced multiple simultaneous hardware failures. These failures degraded the performance of both data-backing stores that service the Events API.

Corrective Actions

The following corrective actions have been completed or are planned:

  • Additional capacity has been added to both data-backing store clusters so that they can handle multiple simultaneous hardware failures without performance impact to the Events API.

  • Correction of the issue that caused the service that populates the cluster to run low on memory and slow down.

  • Addition of new monitoring and alerts in our production environment to more quickly identify this issue and remove failed nodes from the cluster.
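
As an illustration of the kind of check the monitoring item describes, the following is a minimal sketch and not Box's actual tooling. It assumes per-node request latencies are already being collected, and flags nodes whose typical latency is far above that of the rest of the cluster. The function name, node names, and thresholds are all hypothetical.

```python
from statistics import median


def find_slow_nodes(latencies_ms, factor=3.0, min_samples=5):
    """Return the names of nodes that look unhealthy, sorted alphabetically.

    latencies_ms: dict mapping a node name to a list of recent request
                  latencies in milliseconds (hypothetical input format).
    factor:       a node is flagged when its median latency exceeds
                  factor * the median across all nodes (assumed threshold).
    min_samples:  nodes with fewer samples than this are skipped, since a
                  single slow request is not enough evidence.
    """
    # Summarize each node by its median latency, skipping sparse nodes.
    per_node = {
        node: median(samples)
        for node, samples in latencies_ms.items()
        if len(samples) >= min_samples
    }
    if not per_node:
        return []

    # Compare every node against the cluster-wide median of those summaries.
    cluster_median = median(per_node.values())
    return sorted(
        node for node, value in per_node.items()
        if value > factor * cluster_median
    )


# Illustrative data only: node-c is much slower than its peers.
recent = {
    "node-a": [10, 12, 11, 9, 10],
    "node-b": [11, 10, 12, 10, 11],
    "node-c": [95, 110, 102, 99, 120],
    "node-d": [500],  # too few samples to judge
}
slow = find_slow_nodes(recent)
```

Comparing against the cluster median rather than a fixed limit keeps the check meaningful when overall load shifts, but it assumes that most nodes are healthy at any given time. If several nodes fail together, as in this incident, an absolute latency threshold would be needed alongside it.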

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter. 

Sincerely,
The Box Team

Posted Jul 30, 2021 - 08:08 PDT

Resolved
We have concluded our monitoring and are still not seeing new customer impact. If you are seeing any new issues, please let us know at https://support.box.com.
Posted Apr 21, 2021 - 12:18 PDT
Monitoring
We have seen a significant decrease in new customer impact. We will continue to monitor and provide another update.
Posted Apr 21, 2021 - 09:55 PDT
Update
We are seeing a small improvement in latency for Admin Reports and the Events API. We will provide another update tomorrow at 10 AM PT.
Posted Apr 20, 2021 - 18:10 PDT
Update
We are continuing to investigate this issue. We will provide an update at 6 PM PT or the next status change.
Posted Apr 20, 2021 - 11:46 PDT
Update
We continue to see latency with our Admin Reports and Events API. We will provide another update with the next change in status.
Posted Apr 19, 2021 - 17:39 PDT
Investigating
We are seeing latencies with Admin Reports and the Events API. Please look out for our next update.
Posted Apr 19, 2021 - 11:57 PDT
Monitoring
We are no longer seeing customer delays to Admin Reports or latencies to the Events API. We will continue to monitor and provide an update in one hour.
Posted Apr 19, 2021 - 11:32 PDT
Investigating
We are investigating an issue impacting Admin Reports run by Admins/Co-Admins in the Admin Console and causing delays to the Admin Events API endpoint. We will provide more information as soon as it is available.
Posted Apr 19, 2021 - 09:19 PDT
This incident affected: Box Platform / API (Content API) and Box Web Application (Admin Console & Functionality).