top of page
Hina Garg

Sitecore - Shard DB at a 100% DTU

Hello Everyone,


I and my friend Mike Pryma, recently came across an interesting issue while working on a Sitecore-based project and we believe that sharing the findings related to this issue could help someone in one way or the other.


Problem Statement:

Observed 500 errors on the xc-collect role for a GET oData request in the log stream.

Error:

GET /odata/Contacts %24filter=Id+eq+xxxxxxxx-21ce-0000-0000-yyyyyyyyyyyy&%24expand=Identifiers,MergeInfo,ConsentInformation,Addresses,ConsentInformation,Emails,EngagementMeasures,ExmKeyBehaviorCache,ListSubscriptions,ContactBehaviorProfile,InteractionsCache,Personal,AutomationPlanEnrollmentCache,AutomationPlanExit,Interaction


Additionally, on the occurrence of the above error shard0-db was always at a maximum(100%) DTU even without any significant amount of traffic to the site.


















Troubleshooting & Resolution:

We analyzed the web server logs and found that there were a few contacts that were causing the 500 errors. We identified the contact ids of these contacts from the error logs and decided to remove these contacts from the shard0 database.

We ran the following commands to remove data related to the contacts causing errors:


delete

FROM [xdb_collection].[ContactIdentifiers] where ContactId = 'xxxxxxxx-yyyy-zzzz-zzzz'

delete

FROM [xdb_collection].[ContactIdentifiersIndex] where ContactId = 'xxxxxxxx-yyyy-zzzz-zzzz'

delete

FROM [xdb_collection].[ContactFacets] where ContactId = 'xxxxxxxx-yyyy-zzzz-zzzz'

delete

FROM [xdb_collection].[InteractionFacets] where ContactId = 'xxxxxxxx-yyyy-zzzz-zzzz'

delete

FROM [xdb_collection].[Interactions] where ContactId = 'xxxxxxxx-yyyy-zzzz-zzzz'

delete

FROM [xdb_collection].[Contacts] where ContactId = 'xxxxxxxx-yyyy-zzzz-zzzz'


We found the above commands from this blog post: https://andypaz.com/2019/11/27/delete-xdb-contacts/

After running the above commands we restarted the xc-collect app and we received a 200 ok response this time. We also checked the DB utilization for shard0-db and observed the DTU percentage dropped back to its normal state. But the next day we again noticed 500 errors and shard0 at its maximum compute.

We investigated this issue further and found that there were few contacts in the shard0-db database having an excessive number of interactions ranging from 40,000-60,000. And such contacts could cause high resource consumption on the xConnect instance and SQL databases per this Sitecore kb article: https://support.sitecore.com/kb?id=kb_article_view&sysparm_article=KB0417184


So, this time we decided to use the delete logic proposed in the Sitecore documentation at: https://doc.sitecore.com/xp/en/developers/92/sitecore-experience-platform/deleting-contacts-and-interactions-from-the-xdb.html , as this logic does more than just deleting contacts from the databases.


We created a rest API call to execute the delete logic described in the Sitecore doc and continued monitoring the logs. We finally noticed shard0 behaving normally and 500 errors also stopped.


Note: In case, the API returns 'false' (if delete logic fails to succeed)for the contacts having a larger number of interactions like 60,000. First, delete the interactions data for that contact using the below SQL commands:


delete

FROM [xdb_collection].[InteractionFacets] where ContactId = 'xxxxxxxx-yyyy-zzzz-zzzz'

delete

FROM [xdb_collection].[Interactions] where ContactId = 'xxxxxxxx-yyyy-zzzz-zzzz'

Then, execute the API call to execute the "DeleteContact" logic for these contacts.


Happy Learning!

102 views0 comments

Comentários


bottom of page