[Major] Issues with Search Functionality
Incident Report for Box
Postmortem

We recently addressed issues affecting Search. We would like to take this opportunity to explain these issues further and to describe the steps we have taken to keep them from happening in the future.

Between 9/29/2023 9PM PDT and 10/12/2023 4:10PM PDT, some users may have experienced difficulties while working in Box: Search did not return some documents that were uploaded, modified, or moved during that period. The issue was caused by a change to our infrastructure intended to distribute the index more uniformly across all shards in our search clusters. Specifically, a new sharding scheme was applied to our live serving search clusters, which caused some recently updated documents to be indexed to the wrong shards. We resolved the issue by reverting the live clusters to the old sharding scheme. In addition, we are adding mechanisms to safely alter the sharding scheme of live production clusters to prevent similar issues from occurring in the future.

Analysis

This issue occurred because the sharding scheme used to index documents into our live serving search clusters was changed. The change inadvertently made the sharding scheme used to update documents inconsistent with the sharding scheme used to query the cluster. As a result, a small portion of the documents that were created, updated, or moved in Box during the incident window were indexed to a different shard from the one that user queries were routed to, so those queries did not find them. Because the new sharding scheme affected only a small portion of content on Box (less than 1%), most documents remained searchable, and the issue affected a relatively small number of files.
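The failure mode can be illustrated with a small Python sketch. Everything in it is a hypothetical assumption for illustration, not Box's actual implementation: the names old_shard, new_shard, and Cluster, the SHA-256 bucketing, the "v2:" salt, and the shard count of 8. The indexer writes each document to the shard chosen by one scheme, while the query path only looks on the shard chosen by another, so a document is found only when both schemes happen to pick the same shard.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count, for illustration only


def _bucket(key: str, num_shards: int) -> int:
    # Stable hash-to-shard mapping (Python's built-in hash() is randomized per process).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def old_shard(doc_id: str) -> int:
    # Hypothetical original scheme: hash the document ID directly.
    return _bucket(doc_id, NUM_SHARDS)


def new_shard(doc_id: str) -> int:
    # Hypothetical replacement scheme: a differently salted hash, so some
    # documents map to a different shard than under old_shard.
    return _bucket("v2:" + doc_id, NUM_SHARDS)


class Cluster:
    """Toy search cluster: each shard is a set of document IDs."""

    def __init__(self) -> None:
        self.shards: dict[int, set[str]] = {}

    def index(self, doc_id: str, shard_fn) -> None:
        # Indexing pipeline: store the document on the shard chosen by shard_fn.
        self.shards.setdefault(shard_fn(doc_id), set()).add(doc_id)

    def search(self, doc_id: str, shard_fn) -> bool:
        # Query pipeline: look only on the shard chosen by its own shard_fn.
        return doc_id in self.shards.get(shard_fn(doc_id), set())


if __name__ == "__main__":
    cluster = Cluster()
    docs = [f"file-{i}" for i in range(1000)]
    # Indexing switched to the new scheme while queries still used the old one.
    for doc_id in docs:
        cluster.index(doc_id, new_shard)
    missing = [d for d in docs if not cluster.search(d, old_shard)]
    print(f"{len(missing)} of {len(docs)} documents not found by queries")
```

In this toy model the two schemes are unrelated, so most documents move. The incident described above affected less than 1% of content, so the real new scheme evidently placed only a small fraction of documents differently. The mechanism is the same in both cases: a document is missed whenever the indexing shard and the query shard disagree.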

Corrective Actions

The following corrective actions have been completed or are planned:

  • Adding checks to ensure that the indexing and query pipelines use the same sharding scheme for a given cluster
  • Adding tooling to identify whether a document is being indexed to the wrong shard on a cluster
  • Updating the procedure for changing the sharding scheme to prevent it from being unintentionally applied to all serving clusters
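The three corrective actions could take roughly the following shape. This is a hypothetical Python sketch, not a description of Box's systems. ClusterConfig, shard_for, check_schemes, find_misplaced, and apply_scheme_change are illustrative names, and representing each scheme as a salted hash is an assumption. The sketch has three parts: a guard that refuses to proceed when a cluster's indexing and query schemes differ, a scan that reports documents stored on a shard other than the one queries would search, and a change procedure that requires explicitly named target clusters and updates both pipelines together.

```python
import hashlib
from dataclasses import dataclass, replace


def shard_for(doc_id: str, scheme: str, num_shards: int) -> int:
    # Hypothetical: each scheme version is a differently salted hash of the doc ID.
    digest = hashlib.sha256(f"{scheme}:{doc_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


@dataclass(frozen=True)
class ClusterConfig:
    name: str
    num_shards: int
    indexing_scheme: str  # scheme the indexing pipeline uses to place documents
    query_scheme: str     # scheme the query pipeline uses to route searches


class SchemeMismatchError(RuntimeError):
    pass


def check_schemes(cfg: ClusterConfig) -> None:
    """Refuse to proceed if indexing and querying disagree for this cluster."""
    if cfg.indexing_scheme != cfg.query_scheme:
        raise SchemeMismatchError(
            f"{cfg.name}: indexing uses {cfg.indexing_scheme!r}, "
            f"queries use {cfg.query_scheme!r}"
        )


def find_misplaced(
    cfg: ClusterConfig, shards: dict[int, set[str]]
) -> list[tuple[str, int, int]]:
    """Return (doc_id, actual_shard, expected_shard) for documents queries would miss."""
    misplaced = []
    for actual, doc_ids in shards.items():
        for doc_id in sorted(doc_ids):
            expected = shard_for(doc_id, cfg.query_scheme, cfg.num_shards)
            if expected != actual:
                misplaced.append((doc_id, actual, expected))
    return misplaced


def apply_scheme_change(
    configs: dict[str, ClusterConfig], targets: set[str], new_scheme: str
) -> dict[str, ClusterConfig]:
    """Change the scheme only on explicitly named clusters, for both pipelines at once."""
    if not targets:
        raise ValueError("a scheme change must name its target clusters explicitly")
    unknown = targets - set(configs)
    if unknown:
        raise ValueError(f"unknown clusters: {sorted(unknown)}")
    updated = dict(configs)  # clusters not named in targets keep their config
    for name in targets:
        updated[name] = replace(
            configs[name], indexing_scheme=new_scheme, query_scheme=new_scheme
        )
    return updated
```

Changing the scheme also changes where existing documents are expected to live. In this sketch, find_misplaced would therefore report the documents that need reindexing after apply_scheme_change runs on a cluster that already holds data.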

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here, and we would be happy to answer any questions you may still have regarding this matter.

Sincerely,
The Box Team

Posted Oct 24, 2023 - 15:02 PDT

Resolved
Teams have isolated the cause of the search issues and have completed steps to mitigate further impact. A reindexing job has been started and is expected to complete within the next 24 hours; it will process any remaining missing files. Please contact Box Support at https://support.box.com/ if you continue to experience any issues.
Posted Oct 10, 2023 - 19:07 PDT
Update
We are continuing to investigate this issue. At this time, the impact does not appear to be widespread. Teams are working to further determine the extent of the impact and have implemented jobs to improve the discoverability of affected files. Further updates will be posted as more information becomes available.
Posted Oct 10, 2023 - 17:54 PDT
Update
We are continuing to investigate this issue.
Posted Oct 10, 2023 - 16:40 PDT
Update
We are continuing to investigate this issue.
Posted Oct 10, 2023 - 15:46 PDT
Update
We are continuing to investigate this issue.
Posted Oct 10, 2023 - 14:47 PDT
Investigating
We are investigating an ongoing issue affecting Search functionality, including Metadata template search. We will provide more information as soon as it is available.
Posted Oct 10, 2023 - 14:47 PDT
This incident affected: Box Web Application (Search).