In search results, different documents are sometimes treated as duplicates. If the Remove Duplicate Results option is enabled in the Search Core Results Web Part, some content can be missing from the results.
Cause:
This is expected behavior in some scenarios.
Explanation:
The algorithm used to differentiate content is shingling: http://en.wikipedia.org/wiki/W-shingling
In SharePoint 2010, the MSSDocSdids table contains the DuplicateHashes column.
The DuplicateHashes column stores a Duplicate Identifier Block used to identify a portion of an item.
Duplication identifiers are used for duplicate result removal only if their value is non-zero. If two items have the same non-zero duplication identifier, there is a high probability that the documents are similar. This is why distinct documents that share most of their text can be collapsed into a single result.
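To illustrate the general idea of w-shingling, here is a minimal Python sketch. It is not SharePoint's implementation: the function names, the shingle width of 4 words, and the whitespace tokenization are assumptions chosen for illustration only. It shows how two different documents that share most of their word sequences get a high resemblance score.

```python
def shingles(text, w=4):
    """Return the set of w-word shingles (contiguous word runs) in text.

    Hypothetical helper for illustration; w=4 is an arbitrary choice.
    """
    tokens = text.lower().split()
    if len(tokens) < w:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty texts are considered identical
    return len(sa & sb) / len(sa | sb)


doc1 = "the quarterly report for the sales team is attached below"
doc2 = "the quarterly report for the sales team is attached here"
print(resemblance(doc1, doc2))  # 0.75: 6 shared shingles out of 8 distinct
```

The two sample documents differ only in their last word, yet they share 6 of their 8 distinct shingles. A duplicate filter based on this kind of measure could therefore treat them as near-duplicates and show only one of them.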
Workaround:
Uncheck Remove Duplicate Results in the Search Core Results Web Part.
For SharePoint 2013 scenarios, you can find more information in the following article: https://blogs.realdolmen.com/experts/2015/04/09/sharepoint-deep-dive-exploration-explaining-duplicate-detection-in-sharepoint-server-2013/
Sources:
http://blogs.technet.com/b/jpradeep/archive/2010/09/29/moss-2007-duplicate-search-results.aspx