Law firms cannot afford to ignore the use of machine-learning technology to control costs.

Predictive coding has been available in Europe for a number of years, but there still seems to be a lot of uncertainty in law firms about how the technology works and when to use it. This machine-learning technology is used in pre-trial disclosure in litigation and regulatory investigations to partially automate human document review, a process that has the potential to send legal costs skyrocketing.

Given the recent introduction of a new costs management regime and the need to budget for and contain litigation costs, the climate is right for innovative solutions. At the same time, corporate counsel are becoming much more technologically savvy and are beginning to insist that their law firms use the latest technology and strategies available to help minimise legal spend. There is therefore likely to be a snowball effect in the application of this technology in legal practice over the coming months as more law firms start to embrace it. The question will then become whether law firms can afford not to understand how the technology can assist them, given the risk of being left behind.

We have an advantage in the UK because this technology has been used extensively in the US, where best practices have emerged for the most effective ways of using predictive coding in a matter. The results we have seen from the US have been very positive: the technology has been used successfully to reduce legal spend and to increase efficiency and consistency.

While the US legal system is different, litigants there face the same challenges as we do in the UK (perhaps to a larger extent): data volumes are ever increasing, and the standard ways of approaching a document review exercise during disclosure are no longer viable.

Keyword filtering is now used extensively in the UK to cull a data set so that only a limited number of documents need to be reviewed. While keyword searching is an effective means of reducing a document set, it is a very blunt instrument and can often result in relevant documents being filtered out. Predictive coding applies more intelligence to the task, using the expertise of the legal team to try to ensure that documents are categorised as relevant or not based on their actual content.
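To illustrate why keyword filtering is a blunt instrument, here is a minimal Python sketch. The documents, the search terms and the `keyword_hits` helper are invented for illustration and do not come from any real matter or product. A document that describes relevant conduct in different words is culled before any human sees it.

```python
# Hypothetical mini-corpus: document id -> text (invented for illustration).
documents = {
    "doc1": "Please confirm the price increase we agreed with our competitor.",
    "doc2": "As discussed at dinner, we will both move our rates up 5% in March.",
    "doc3": "Lunch menu for the office party on Friday.",
}

# Hypothetical keyword list of the kind a legal team might agree.
keywords = {"price", "competitor", "cartel"}

def keyword_hits(text: str, terms: set[str]) -> bool:
    """Return True if any search term appears as a whole word in the text."""
    words = {w.strip(".,;:!?%").lower() for w in text.split()}
    return not words.isdisjoint(terms)

# Only documents that hit a keyword survive the cull.
culled = {doc_id for doc_id, text in documents.items() if keyword_hits(text, keywords)}
print(sorted(culled))  # ['doc1']
```

Here doc2 plainly describes a coordinated rate rise, but because it uses none of the agreed terms it is filtered out along with the irrelevant doc3.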

Lawyer-led technology

Predictive coding should not be viewed as a substitute for human review but rather as a supplement to it. It relies heavily on input from one or more subject matter experts, who review a sample set of documents to ‘train’ the system, and this training should be an ongoing, iterative process.

For example, if a law firm has 500,000 documents to review, it would be advisable for one or two subject matter experts to review a sample set of, say, 5,000 documents. The technology could then be run on the remaining document pool to identify which documents it considers most likely to be relevant.
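The sample-then-score step can be sketched in Python. This is a minimal illustration under stated assumptions: commercial predictive coding tools use their own, often proprietary, algorithms, so a simple multinomial Naive Bayes classifier is swapped in here, and the documents, ids and the `tokenize`, `train` and `score` functions are hypothetical. In the example above the experts code 5,000 of 500,000 documents, i.e. 1% of the pool.

```python
import math
from collections import Counter

PUNCT = ".,;:!?%'\""

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words, stripping punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def train(sample: list[tuple[str, bool]]):
    """Fit a multinomial Naive Bayes model on (text, is_responsive) pairs
    coded by the subject matter expert(s). Naive Bayes is an illustrative
    stand-in, not the algorithm of any particular product."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in sample:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def score(model, text: str) -> float:
    """Estimated probability that a document is responsive."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_probs = {}
    for label in (True, False):
        # Prior: share of the sample coded with this label (add-one smoothed).
        lp = math.log((doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(word_counts[label].values())
        for w in tokenize(text):
            if w in vocab:  # words never seen in the sample carry no evidence
                # Word likelihood with Laplace (add-one) smoothing.
                lp += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        log_probs[label] = lp
    # Turn the two log scores into a probability for the responsive class.
    top = max(log_probs.values())
    p_resp = math.exp(log_probs[True] - top)
    p_non = math.exp(log_probs[False] - top)
    return p_resp / (p_resp + p_non)

# Hypothetical expert-coded sample (a stand-in for the 5,000 documents).
sample = [
    ("agreed price increase with competitor", True),
    ("competitor pricing discussed at meeting", True),
    ("office party lunch menu", False),
    ("holiday rota for the canteen", False),
]
model = train(sample)

# Hypothetical remainder of the pool, ranked most-likely-relevant first.
pool = {
    "doc7": "notes from meeting with competitor on pricing",
    "doc8": "canteen menu for next week",
}
ranked = sorted(pool, key=lambda d: score(model, pool[d]), reverse=True)
print(ranked)  # ['doc7', 'doc8']
```

Neither pooled document was in the sample, yet the model ranks doc7 first because it shares vocabulary with the documents the expert coded as responsive.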

The chances are that this first run may not produce results with a high enough confidence rating; that is, the system may not yet be able to tag the remaining documents reliably on the strength of the sample set alone. The level of confidence can be raised if the subject matter expert checks the decisions the software has got wrong: false positives, where a non-responsive document is identified as responsive, and false negatives, where a responsive document is identified as non-responsive. This cycle may need to be repeated several times before reviewers see the best results.
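The checking step can be illustrated with a minimal Python sketch that compares the software's tags with the expert's on a checked sample, lists the false positives and false negatives, and summarises them as precision and recall. These are standard information-retrieval measures; whether a given review platform reports errors in this form is an assumption, and the data and the `error_report` function are hypothetical.

```python
# Hypothetical checked sample: document id -> (software's call, expert's call).
# True means "responsive". All values are invented for illustration.
checked = {
    "doc101": (True, True),
    "doc102": (True, False),   # false positive
    "doc103": (False, True),   # false negative
    "doc104": (False, False),
    "doc105": (True, True),
}

def error_report(decisions: dict[str, tuple[bool, bool]]) -> dict:
    """Compare the software's tags with the expert's and summarise the errors."""
    false_positives = [d for d, (pred, truth) in decisions.items() if pred and not truth]
    false_negatives = [d for d, (pred, truth) in decisions.items() if truth and not pred]
    true_positives = sum(1 for pred, truth in decisions.values() if pred and truth)
    flagged = true_positives + len(false_positives)
    responsive = true_positives + len(false_negatives)
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        # Precision: share of flagged documents that really are responsive.
        "precision": true_positives / flagged if flagged else 0.0,
        # Recall: share of truly responsive documents the software found.
        "recall": true_positives / responsive if responsive else 0.0,
    }

report = error_report(checked)
print(report["false_positives"], report["false_negatives"])  # ['doc102'] ['doc103']
print(round(report["precision"], 2), round(report["recall"], 2))  # 0.67 0.67
```

In an iterative workflow, the documents listed as errors would be re-coded by the expert and added to the training set before the next run.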

The system will continue to learn as the review takes place and more documents are reviewed and tagged.

Necessary technical knowledge

When it comes to assessing the reliability of this technology, the proof of the pudding is very much in the eating, and legal teams should be encouraged to apply predictive coding to selected matters in the knowledge that every document can still be reviewed manually. By using the technology, a legal team can see how it can be applied and whether it could be relied upon in future matters.

A legal team could also use predictive coding as a quality checking measure. For example, if 300,000 documents have been reviewed by a human review team, a check could be run at the end of the matter to see which documents the human review team thought were non-responsive to the review criteria but the computer identified as responsive.
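The end-of-matter quality check amounts to a simple comparison, sketched below in Python. The tags, scores, the 0.5 cut-off and the `second_look` function are all assumptions made for illustration; in practice the cut-off would be chosen for the matter.

```python
# Hypothetical end-of-matter data; all ids, tags and scores are invented.
human_tags = {"doc1": False, "doc2": True, "doc3": False, "doc4": False}
model_scores = {"doc1": 0.91, "doc2": 0.88, "doc3": 0.12, "doc4": 0.64}

# Assumed cut-off above which the software treats a document as responsive.
THRESHOLD = 0.5

def second_look(tags: dict[str, bool], scores: dict[str, float], threshold: float) -> list[str]:
    """List documents the reviewers tagged non-responsive but the software
    scored as responsive, highest score first, so they can be re-checked."""
    disputed = [d for d, responsive in tags.items()
                if not responsive and scores[d] >= threshold]
    return sorted(disputed, key=lambda d: scores[d], reverse=True)

print(second_look(human_tags, model_scores, THRESHOLD))  # ['doc1', 'doc4']
```

Sorting the disputed documents by score means the ones the software is most confident about are re-checked first.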

We work on a large number of competition and regulatory matters where the race for leniency means that key documents need to be found quickly. Predictive coding can be used to move the documents most likely to be relevant to the top of the pile. In this way, the legal team can understand the facts of the case, and decide on the strategy to adopt, much more quickly than it could by performing a standard linear review and/or relying on keywords. This can often give a party using the technology a significant advantage.
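Prioritisation can be sketched as sorting documents by the score the software assigns and releasing them to reviewers in batches. The scores, the batch size and the `review_batches` function below are hypothetical.

```python
def review_batches(scores: dict[str, float], batch_size: int) -> list[list[str]]:
    """Order documents by descending relevance score and cut the queue into
    batches, so reviewers see the most likely relevant material first."""
    queue = sorted(scores, key=scores.get, reverse=True)
    return [queue[i:i + batch_size] for i in range(0, len(queue), batch_size)]

# Hypothetical scores produced by a trained model.
scores = {"doc1": 0.05, "doc2": 0.40, "doc3": 0.97, "doc4": 0.10, "doc5": 0.83}
batches = review_batches(scores, batch_size=2)
print(batches)  # [['doc3', 'doc5'], ['doc2', 'doc4'], ['doc1']]
```

Reviewers working through these batches in order see doc3 and doc5 first, whereas a linear review in document-id order would not reach them until the second and third batches.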

Human review as the gold standard

Studies have shown that when multiple human reviewers tag the same documents, there is often a high level of disagreement. In a study using three well-trained assessors, Ellen Voorhees (1998) found that the sets of documents judged relevant by two assessors overlapped by only about 45%, and across all three assessors the overlap dropped to about 30%. The figure is likely to drop further with larger review teams.
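To make the agreement figures concrete, the sketch below computes overlap, taken here (an assumption about the study's exact definition) as the number of documents all assessors judged relevant divided by the number that at least one assessor judged relevant. The judgments are invented and are not Voorhees's data. Because adding an assessor can only shrink the intersection and grow the union, the overlap can never increase as the team gets larger.

```python
from itertools import combinations

def overlap(*relevant_sets: set[str]) -> float:
    """Intersection size divided by union size of the sets of documents each
    assessor judged relevant (1.0 means perfect agreement)."""
    union = set().union(*relevant_sets)
    if not union:
        return 1.0
    return len(set.intersection(*relevant_sets)) / len(union)

# Hypothetical judgments by three reviewers (invented for illustration).
judged_relevant = {
    "A": {"d1", "d2", "d3", "d4"},
    "B": {"d2", "d3", "d4", "d5"},
    "C": {"d3", "d4", "d6", "d7"},
}

# Pairwise overlap, then overlap across all three reviewers.
for x, y in combinations(judged_relevant, 2):
    print(x, y, round(overlap(judged_relevant[x], judged_relevant[y]), 2))
print("all three", round(overlap(*judged_relevant.values()), 2))
```

With this toy data the pairs overlap by 0.6, 0.33 and 0.33, and all three reviewers together by only 0.29.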

Other studies have produced similar results, suggesting that human review may not be as consistent in determining relevance as is often assumed. Predictive coding can help improve this consistency, especially on larger matters.

Following the US

When it comes to the use of technology in e-disclosure, the UK tends to follow the US. Predictive coding is now widely used in the US, where courts have approved the use of technology such as predictive coding and have even required its use. I would expect a similar trend, with judicial approval, to emerge in the UK.

From conversations with a number of leading legal professionals in the UK, the general consensus seems to be that it would be difficult to ‘force’ another party to litigation to deploy this technology, but that it would be very unlikely that an opposing party could object to the technology being used.

Case suitability and predictive coding

The technology is best suited to larger, fairly document-intensive cases, not least because a subset of the documents will need to be reviewed by a lawyer who is a subject matter expert, an investment that is easier to justify when the document pool is large. There is no definitive minimum, but many experts agree that it is likely to be around the 50,000-document mark.

Picking the best ‘team’

Technology is evolving at a rapid rate, with new technologies and strategies being launched on a regular basis. It is important for lawyers to understand when certain technologies can be used and for which cases. They can then pick the best technologies depending on the case at hand to ensure that the case can be managed in a cost-effective and efficient way. This is especially important if opposing counsel is embracing the use of technology.

To use a sports analogy, not selecting certain technologies is like a manager going into a title-deciding match with their best player, No. 10 predictive coding, left on the bench.

Costa Kypre is a legal consultant and electronic disclosure expert at Kroll Ontrack