Much has happened in the world of AI since my report about ChatGPT’s SQE1 performance in the Gazette in January. The ’bot has been busy writing essays, haikus and songs, taking – and increasingly passing – exams and starting to shift paradigms.

Dr Andrew Gilbert

When I fed the SRA’s 90 sample SQE1 questions into the GPT-3.5 version in January, it scored exactly 50%. As the pass marks for the November 2021 and July 2022 SQE1 sittings were 57% and 56% respectively, the ’bot fell just short of a pass. This borderline showing was consistent with several other recently reported tests, including US bar and medical licensing exams.

On 14 March ChatGPT’s owner, OpenAI, launched the latest iteration of its generative pre-trained transformer, GPT-4. According to OpenAI, GPT-4 ‘exhibits human-level performance on various professional and academic benchmarks’ because it was trained, on data up to August 2022, on a significantly larger corpus of publicly available and licensed material than its predecessor. We don’t know which licensed data were included, but it’s reasonable to assume they contain some legal databases, as the model’s performance on the US Uniform Bar Exam rocketed from the 10th percentile (213/400) to the 90th (298/400).

When I asked it the same 90 SQE1 questions it scored 70/90, or 78%, which would have put it in the top quintile for the November 2021 and July 2022 sittings. Unlike GPT-3.5, which tended to respond with one of the letters A to E as well as a longer answer in prose, GPT-4 usually replied using just the words of its chosen answer from the five available.

While I was impressed with its performance, real legal matters don’t present themselves in multiple-choice format, so I wanted to see how it would cope if I asked some of the questions again with all five answer options removed. It answered all 10 of the questions I re-asked correctly, and provided well-written reasons to support its conclusions, although some were phrased tentatively.

By contrast, Bard – Google’s answer to ChatGPT, which launched on 21 March – scored just 40/90, or 44%, on the SQE1 test. Its result wasn’t helped by its refusal to answer seven questions, all of which involved death, wills and estates.

Why does any of this matter? The SQE is sat in controlled conditions without access to AI chatbots, so there’s no risk that the actual exams will be compromised. Rather, the impact of this technology falls either side of professional exams like the SQE: on legal education and in legal practice. Much has already been written about the profound impact generative AI will have on assessment, but that misses an even more fundamental point: what should higher education look like in an age where AI might do to the jobs of knowledge workers what mechanisation did to those of manual workers in previous centuries?

Clearly, it’s problematic that ChatGPT doesn’t produce correct answers every time. It still makes things up (‘hallucinates’) in its self-assured way, hiding fiction in plain sight, which makes it all the more important that the reader has the expertise to evaluate what it’s saying. ChatGPT’s SQE performance is impressive, but in the real world legal matters cannot usually be stated and solved in a few short paragraphs.

In its current form the chatbot wouldn’t cope with the polycentric and nuanced tasks lawyers are trained to deal with. However, its use in platforms like Harvey is already augmenting – and in some cases replacing – the work of lawyers at leading firms, redefining the division of labour between humans and machines.
Dr Andrew Gilbert is senior lecturer at The Open University Law School and a non-practising solicitor