This guide explains Summation Pro’s Predictive Coding feature and suggests a workflow for using it. It was written using version 6.3.

Cluster Analysis

The Predictive Coding algorithm is built on our Cluster Analysis processing feature. During processing, this feature examines each document and builds a list of KeyWord pairs for it. When Predictive Coding is applied to the database, it compares each document’s KeyWord pairs against the KeyWord pairs from the documents you marked as Responsive. If enough of a document’s KeyWord pairs match those of your Responsive documents, the Predictive Coding algorithm determines that document to be Relevant.
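To make the matching idea concrete, here is a minimal Python sketch. It is only an illustration: this guide does not describe how Summation actually derives KeyWord pairs or how many matches it requires, so the function names, the pairing rule (all unordered pairs of distinct keywords), and the `min_matches` threshold are all assumptions.

```python
from itertools import combinations

def keyword_pairs(keywords):
    """Return the set of unordered keyword pairs for one document.

    Assumption: a "pair" is any two distinct keywords from the same
    document, compared case-insensitively.
    """
    unique = sorted({k.lower() for k in keywords})
    return set(combinations(unique, 2))

def is_predicted_relevant(doc_keywords, responsive_pairs, min_matches):
    """True when the document shares at least min_matches pairs with
    the pairs collected from manually coded Responsive documents.

    min_matches is a hypothetical threshold, not a Summation setting.
    """
    shared = keyword_pairs(doc_keywords) & responsive_pairs
    return len(shared) >= min_matches

# Pairs gathered from a document that was manually marked Responsive.
responsive_pairs = keyword_pairs(["merger", "escrow", "deadline"])

# An uncoded document that shares exactly one pair: (escrow, merger).
candidate = ["Escrow", "merger", "lunch"]
relevant_at_1 = is_predicted_relevant(candidate, responsive_pairs, 1)
relevant_at_2 = is_predicted_relevant(candidate, responsive_pairs, 2)
```

With a threshold of one shared pair the candidate would be treated as Relevant; with a threshold of two it would not.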

This matters because Cluster Analysis may not run on your entire data set in one processing session, so you may have to run it multiple times to cover everything. The “ClusterID” column shows which items have been analyzed and which have not. Re-run Cluster Analysis until every document has been assigned a ClusterID in this column. Any document without a ClusterID will not be considered by the predictive coding engine.
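If you have the Item List available as a CSV file, a short script can confirm whether any documents are still missing a ClusterID. The sketch below is an assumption-heavy illustration: the CSV layout and the `DocID` column name are hypothetical, and only the `ClusterID` column name comes from this guide.

```python
import csv
import io

def cluster_coverage(rows):
    """Count analyzed rows and collect IDs of rows with no ClusterID.

    rows: an iterable of dicts, such as csv.DictReader yields.
    "DocID" is a hypothetical identifier column used for illustration.
    """
    analyzed = 0
    missing = []
    for row in rows:
        if (row.get("ClusterID") or "").strip():
            analyzed += 1
        else:
            missing.append(row.get("DocID"))
    return analyzed, missing

# Stand-in for an exported Item List; a real file would be opened
# with open(path, newline="") instead.
sample_export = io.StringIO(
    "DocID,ClusterID\n"
    "DOC-001,12\n"
    "DOC-002,\n"
    "DOC-003,7\n"
)
analyzed, missing = cluster_coverage(csv.DictReader(sample_export))
```

An empty `missing` list would indicate that every document has been assigned a ClusterID and no further Cluster Analysis pass is needed.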

 https://s3.amazonaws.com/cdn.freshdesk.com/data/helpdesk/attachments/production/69009876281/original/1.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAS6FNSMY2XLZULJPI%2F20210926%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210926T165314Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=acf6d4ac5b07ef12581787d74fd1097eb8ddbb49d67f9b607a95bbb1c919d7cb

To run an additional Cluster Analysis pass, click the green + sign for “Add Data to the Project”, then choose “Cluster Analysis”.

https://s3.amazonaws.com/cdn.freshdesk.com/data/helpdesk/attachments/production/69009876282/original/2.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAS6FNSMY2XLZULJPI%2F20210926%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210926T165314Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=8a01a78be0bd16f71f533d82013fd1be4148c102d26bb8f8a9d0eeca934fc5e4

Seed Set

A Seed Set is a random subset of documents that is representative of the entire collection under consideration. When using predictive coding, you will need to manually review approximately 10% of your entire data set. This 10% is your Seed Set, and it trains the predictive coding algorithm. Be sure that your 10% adequately reflects the entire collection; do not just review the first 10% in the Item List. If you have multiple sources of data (email, computers, network shares, phones, etc.), include a random sample from each source in your subset. If the 10% you reviewed all came from the same source, you will not adequately train the predictive coding algorithm to find relevant data in the other sources. My workflow suggestion is to create a Label specifically for your Seed Set. It can be a single Label comprising all sources, or a Label Group that separates the sources.
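One way to picture a Seed Set drawn from every source is a stratified random sample: take roughly 10% at random from each source separately, then combine the results. The Python sketch below illustrates that idea outside of Summation. The function name, the `(doc_id, source)` input shape, and the rounding rule are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_seed_set(documents, fraction=0.10, seed=None):
    """Pick a random sample of roughly `fraction` from each source.

    documents: iterable of (doc_id, source) tuples (hypothetical shape).
    Every source contributes at least one document.
    """
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for doc_id, source in documents:
        by_source[source].append(doc_id)

    seed_set = []
    for doc_ids in by_source.values():
        k = max(1, round(len(doc_ids) * fraction))
        seed_set.extend(rng.sample(doc_ids, k))
    return seed_set

# Example collection: 100 emails and 50 phone items.
documents = [(f"email-{i}", "Email") for i in range(100)]
documents += [(f"phone-{i}", "Phone") for i in range(50)]

seed_set = stratified_seed_set(documents, fraction=0.10, seed=42)
email_count = sum(1 for d in seed_set if d.startswith("email-"))
phone_count = sum(1 for d in seed_set if d.startswith("phone-"))
```

Sampling per source, rather than across the whole list, keeps a small source from being left out of the Seed Set by chance.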

https://s3.amazonaws.com/cdn.freshdesk.com/data/helpdesk/attachments/production/69009876283/original/3.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAS6FNSMY2XLZULJPI%2F20210926%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210926T165314Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=57f942dfc87232bcd9d9fc748d0cd433558374dab2065e6655ec1997a0958c4c

Coding the Seed Set

The ReviewResponsiveness field is the field used to train the predictive coding algorithm. Ideally, your seed set would be roughly 50/50 Responsive/Not Responsive. You do not have to add any additional KeyWords; the Cluster Analysis feature determines the keywords that will be used. The ReviewResponsiveness field and the Predictive Coding tagging layout were made intentionally cumbersome to ensure that each and every document is considered for relevancy when coding to train the algorithm. Therefore, you cannot bulk code this field, you cannot add it to another tagging layout, and you cannot use the “Apply Previous” button.
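To check how close a coded seed set is to the 50/50 ideal, you can tally the ReviewResponsiveness values. This is a minimal sketch that assumes you already have the coded values as a plain list of strings; how you obtain that list from Summation is not covered here, and the function name is hypothetical.

```python
from collections import Counter

def responsiveness_balance(codes):
    """Return each ReviewResponsiveness value's share of the seed set."""
    counts = Counter(codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}

# Example seed set: 45 Responsive and 55 Not Responsive documents.
codes = ["Responsive"] * 45 + ["Not Responsive"] * 55
balance = responsiveness_balance(codes)
```

A heavily lopsided result suggests coding more documents of the under-represented kind before calculating a Confidence Score.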

https://s3.amazonaws.com/cdn.freshdesk.com/data/helpdesk/attachments/production/69009876284/original/4.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAS6FNSMY2XLZULJPI%2F20210926%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210926T165314Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=5411aa753073ce5f8a6a920bce8baa4ee95e78e0efe57829e249b676117c3acd 

 

Confidence Score

Once you think you have coded enough documents to train the system, you can test whether the system is ready by calculating a Confidence Score. You can do this in the “Confidence” panel within Summation.

In most legal uses of Predictive Coding, both parties agree to a confidence score before predictive coding is applied. This could be as low as around 80%, or some parties may not agree to anything less than 90-95%. What score is acceptable is up to you and the parties involved in the suit. If you are happy with the Confidence Score, you can move on to applying predictive coding. If the score is lower than you would like, you need to code more documents and then calculate the score again.

 https://s3.amazonaws.com/cdn.freshdesk.com/data/helpdesk/attachments/production/69009876285/original/5.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAS6FNSMY2XLZULJPI%2F20210926%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210926T165314Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=5c5a068dac8856c809a2ba3f9a5f492abc9a5000f9bd34ec649e78a06da16d21

 

Applying Predictive Coding

If you are satisfied with your Confidence Score, choose “Predictive Coding” from the Actions menu on the Confidence panel, then click “Go”.

 https://s3.amazonaws.com/cdn.freshdesk.com/data/helpdesk/attachments/production/69009876286/original/6.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAS6FNSMY2XLZULJPI%2F20210926%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210926T165314Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=8936c72be768360a23e2cc15ced6acbdd444168679c073b2792e8b0d78fe1761

You can see the progress and final outcome of applying predictive coding in the WorkList.

https://s3.amazonaws.com/cdn.freshdesk.com/data/helpdesk/attachments/production/69009876287/original/7.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAS6FNSMY2XLZULJPI%2F20210926%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210926T165314Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=7974137d6c865ec50136ce170a4c55368c5e885a6522b776bcc45863b7d2fec9 

 

Viewing Results

To view the documents that are now responsive, filter on the ReviewResponsiveness column or the SetBy column. The SetBy column will say Predictively Coded for all items reviewed by the algorithm, even the ones you manually coded.

 https://s3.amazonaws.com/cdn.freshdesk.com/data/helpdesk/attachments/production/69009876288/original/8.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAS6FNSMY2XLZULJPI%2F20210926%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210926T165314Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=ee7d6e451c707980a50d4fd5139388b8a62d660e40c81088c8582149df2e1684

The Tagging Layout can tell you whether the document you are currently looking at is Manually Coded or Predictively Coded.

https://s3.amazonaws.com/cdn.freshdesk.com/data/helpdesk/attachments/production/69009876289/original/9.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAS6FNSMY2XLZULJPI%2F20210926%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210926T165314Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=b9c0c06642e332a1bf8686af787a4f1c473a2bd9befc2fd173f9e680a73d624e

https://s3.amazonaws.com/cdn.freshdesk.com/data/helpdesk/attachments/production/69009876290/original/10.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAS6FNSMY2XLZULJPI%2F20210926%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210926T165314Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=d4b6aaf8914fb07649a540c0e3a6dff85670d5644c0d623a230d339996a5c910