Questions
Why is this document marked as a duplicate?
How do I see duplicate documents?
How are emails deduplicated?
Answers
AccessData deduplicates email and attachment records by family. This means that the same PDF attached to two different emails will not be marked as a duplicate unless the emails themselves are also duplicates.
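As a rough illustration of family-level comparison, the sketch below builds one key per email family and compares whole families rather than individual attachments. The function names are hypothetical, and hashing the email bytes is only a stand-in for whichever email fields your processing options compare; this is not AccessData's actual implementation.

```python
import hashlib
from typing import List, Tuple

def digest(data: bytes) -> str:
    """Return the MD5 hex digest of some bytes."""
    return hashlib.md5(data).hexdigest()

def family_key(email: bytes, attachments: List[bytes]) -> Tuple[str, Tuple[str, ...]]:
    # Assumption: a family is identified by the parent email plus all of its
    # attachments, so an attachment is never compared on its own.
    return (digest(email), tuple(digest(a) for a in attachments))

pdf = b"%PDF-1.4 identical attachment bytes"

family_a = family_key(b"Please review the attached.", [pdf])
family_b = family_key(b"Forwarding for your records.", [pdf])
family_c = family_key(b"Please review the attached.", [pdf])

# Same PDF, different parent emails: the families do not match.
a_matches_b = family_a == family_b
# Same PDF and same parent email: the families match.
a_matches_c = family_a == family_c
```

In this sketch the PDF is only treated as a duplicate when its parent email matches as well, which mirrors the behavior described above.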
Files outside of the email types are deduplicated by hashing the document. This process generates an MD5 hash that can be compared against the hashes of other documents to see whether the content is identical. A single-character difference will result in a different MD5 hash.
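The effect of a single-character difference can be seen with Python's standard hashlib module. This is a minimal sketch of MD5 comparison in general, not the product's own code, and the sample text is made up for illustration.

```python
import hashlib

def md5_of_bytes(data: bytes) -> str:
    """Return the MD5 hash of the given bytes as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()

original = b"Quarterly report, final version."
exact_copy = bytes(original)                     # byte-for-byte identical
altered = b"Quarterly report, final version!"    # last character changed

# Identical content always produces the same hash.
copy_matches = md5_of_bytes(original) == md5_of_bytes(exact_copy)

# Changing one character produces a different hash.
altered_matches = md5_of_bytes(original) == md5_of_bytes(altered)
```

Two files that look the same on screen can still hash differently if their bytes differ, for example after being re-saved with different metadata.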
Deduplication for emails uses your processing options shown below in Summation:
MSGs are deduplicated against emails within PSTs, and MSGs against other MSGs, all using the settings above.
Note: Submit and Delivery times are compared down to one ten-millionth of a second, so two emails must match at that precision for these fields to be treated as duplicates.
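To show why this precision matters, the sketch below stores times as whole-number "ticks" of one ten-millionth of a second. The tick representation and function names are assumptions chosen for illustration; how Summation stores these times internally is not described here.

```python
# One tick is one ten-millionth of a second (100 nanoseconds).
TICKS_PER_SECOND = 10_000_000

def to_ticks(whole_seconds: int, fraction_ticks: int) -> int:
    """Combine whole seconds and a sub-second tick count into one integer."""
    if not 0 <= fraction_ticks < TICKS_PER_SECOND:
        raise ValueError("fraction_ticks must be less than one second")
    return whole_seconds * TICKS_PER_SECOND + fraction_ticks

# Two hypothetical delivery times that differ by a single tick.
delivery_a = to_ticks(1_700_000_000, 1_234_567)
delivery_b = to_ticks(1_700_000_000, 1_234_568)

# Compared at full precision, the times are different.
match_at_full_precision = delivery_a == delivery_b

# Truncated to whole seconds, the same two times would appear equal.
match_at_second_precision = (
    delivery_a // TICKS_PER_SECOND == delivery_b // TICKS_PER_SECOND
)
```

Two emails whose times look identical when displayed to the second can therefore still fail to deduplicate on these fields.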
Email Types:
- MSG
- PST
- NSF
- EML
- AOL
- DBX
- MBOX
- Other
To view the duplicates in a case, choose Options -> Quick Filters -> Show Duplicates:
Then add DeduplicateType as a column. This column will be populated with one of three values:
- Primary - This means there is a duplicate of this document in the database
- Secondary - This is a duplicate and will be filtered out when Hide Duplicates is on
- (Blank) - This document has no duplicates in the database
To find out which objects were flagged as duplicates of each other, review the Deduplication reports found on the reporting tab. You will find the ObjectID and the Primary ObjectID in this report.
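The relationship between the three DeduplicateType values and the ObjectID / Primary ObjectID pairs can be sketched as follows. This is an illustration only: the function name and sample data are hypothetical, and treating the first document encountered as the Primary is an assumption, since the actual selection rule is not stated here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def label_duplicates(docs: List[Tuple[int, str]]):
    """docs is a list of (object_id, dedup_key) pairs in processing order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for object_id, key in docs:
        groups[key].append(object_id)

    dedup_type: Dict[int, str] = {}
    report: List[Tuple[int, int]] = []  # (ObjectID, Primary ObjectID)
    for ids in groups.values():
        if len(ids) == 1:
            dedup_type[ids[0]] = ""  # (Blank): no duplicates
            continue
        primary = ids[0]  # assumption: first one seen becomes the Primary
        dedup_type[primary] = "Primary"
        for object_id in ids[1:]:
            dedup_type[object_id] = "Secondary"
            report.append((object_id, primary))
    return dedup_type, report

docs = [(101, "aaa"), (102, "bbb"), (103, "aaa"), (104, "aaa")]
dedup_type, report = label_duplicates(docs)

# What remains visible when Secondary records are filtered out.
visible = [oid for oid, _ in docs if dedup_type[oid] != "Secondary"]
```

Here documents 103 and 104 are Secondary to 101, document 102 is blank, and filtering out the Secondary records leaves 101 and 102.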
Overview
Deduplication is a powerful tool, and understanding the points above will help you quickly answer questions about how documents are marked as duplicates.