The Solicitors Regulation Authority (SRA) conducted a pilot of the first element of its intended new qualifying examination, the Solicitors Qualifying Examination (SQE 1), earlier this year. It has now published reports on the performance of the pilot by its appointed contractor, Kaplan, and by a designated independent reviewer, together with its own response to these reports. The timing of the release of these documents, in the middle of the university long vacation, might suggest a desire to avoid immediate scrutiny from the academic community.

What is most significant about these reports is that none of them discloses important parts of the evidence on which they are based, evidence which is fundamental to any critique or comment. For example, the Kaplan report states that:

’Candidates were selected who were broadly representative of those who would sit Stage 1 of the SQE both as to prior education and demographic characteristics. 555 candidates were invited to take part in the pilot. 419 accepted their place. 58 of them cancelled in the run up to the examinations and there were 43 no-shows on the day, leaving 318 active participants with 316 sitting all 3 days. 18 candidates requested and were granted reasonable adjustments.’

Even if the original selection was broadly representative (and in the absence of any information this has to be taken on trust), we are concerned about the ‘representativeness’ of this cohort. In a best-case scenario (from the students’ point of view), someone sitting SQE 1 would have some kind of preparation, some idea of the nature of the assessment and some opportunities for practice. As no example questions or guidance were (or are) available, the participants had none of these supports. What the data from this pilot might show is what happens when a candidate goes into the assessment blind.

The high attrition rate raises significant questions as to how representative the 318 really were. In the absence of detailed data, we cannot assume that the 42.7% of the original sample who did not take part were evenly distributed across gender, ethnicity, social class, prior education experience or any other relevant demographic factor.
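For transparency, the arithmetic behind the attrition figures quoted above can be set out as a short illustrative calculation. The inputs are solely the numbers given in the Kaplan extract; the variable names and derived percentages are our own.

```python
# Attrition in the SQE 1 pilot, using only the figures quoted from the
# Kaplan report above; the derived percentages are our own calculation.
invited, accepted, cancelled, no_shows = 555, 419, 58, 43

active = accepted - cancelled - no_shows        # 318 active participants
lost_from_invited = invited - active            # 237 of the original sample

attrition_vs_invited = lost_from_invited / invited        # ~0.427, i.e. the 42.7% above
attrition_vs_accepted = (accepted - active) / accepted    # ~0.241 even among those who accepted

print(f"{attrition_vs_invited:.1%} of those invited, "
      f"{attrition_vs_accepted:.1%} of those who accepted, did not take part")
# -> 42.7% of those invited, 24.1% of those who accepted, did not take part
```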

The piloted version of SQE 1 comprised three Computer Mediated Assessment (CMA) sessions, each involving 120 questions described as ‘single best answer’. Between them they are said to have covered the Functional Legal Knowledge (FLK) based on the SRA Statement of Underpinning Legal Knowledge. It should be noted that the SRA confirmed the contents of that statement despite significant criticisms from the Association and others that it did not cover all the areas which solicitors might be expected to know, such as family law and employment law, and that in places it appeared to seek to reassess areas which should have been covered at the academic stage. However, no information is given as to how many questions addressed each of the required areas, and there is no comprehensible evidence as to how each question performed.

No individual questions have been published, although it is understood that 176 of the 360 (49%) were newly designed and the remaining 51% had previously been used in the MCQ element of the QLTS. The Kaplan report and, to a greater extent, the independent reviewer’s report do suggest that the questions were appropriately drafted, tested and stood up well in the pilot. The latter report includes a comment in section 2.1: 're-using items from QLTS which are used in very similar, but not identical context had the advantage of knowing prior candidate performance on these items'. This too raises questions about the reliability of the test results. The QLTS test is taken by lawyers who hold a qualification elsewhere and who probably have a law degree and a period of practitioner experience. This raises further questions about the suitability of the instrument. Kaplan claim an ‘extensive stakeholder consultation', which does not yet seem to include the future plans reported by the independent reviewer: 'Kaplan have committed to future items also being sent to practitioners, not involved in writing or editing, for them to comment on their relevance to practice, which will provide an added safeguard to ensure validity'. It is made clear that external review of 10% of the questions (it is not specified whether these were new or recycled) was carried out by someone in the US who 'has also worked with Kaplan on QLTS for several years'. It is therefore not surprising that the recommendations call for public transparency in the processes, a move away from over-reliance on a small team of Kaplan staff, and independent expert review of the test analyses.

The justifications for the current approach to question setting include reference to the use of the Angoff method, ‘as done in medicine’. Details are very hard to come by. One assumes that this was a modified Angoff, using the restricted eight choices, though this detail is not revealed[1]. The process took one day and consisted of a panel of nine solicitors drawn from practice[2] (only seven of whom completed the day)[3], during which they covered the 176 new items and 24 of the QLTS items. There is no detail given of how or whether these solicitors were trained in the Angoff method, or of how they managed to cover 200 items in a single day (assuming an eight-hour day with no breaks, each item would be allotted 2.4 minutes), which suggests that this was not a full discussion version of Angoff, with the consequent threat to reliability and credibility. Without the review protocol and the scoring data it is impossible to judge; only once those data are public and can be independently reviewed will it be possible to assess whether this potentially very modified ‘Angoff’ can be used to support the claims of Kaplan and the SRA in this way.
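To make concrete what a standard-setting exercise of this kind ordinarily involves, the sketch below shows how a cut score would typically be derived from panellists’ item-level estimates in a modified Angoff procedure (see note [1]). It is purely illustrative: the rating scale is the one described in that note, but the panel size, ratings and aggregation rule are invented, since Kaplan’s actual protocol and scoring data have not been published.

```python
# Illustrative modified-Angoff cut-score calculation. All ratings below are
# invented; Kaplan's actual protocol and data have not been disclosed.
# Each panellist estimates, per item, the probability that a minimally
# competent candidate would answer correctly, using a restricted scale.

ALLOWED = {0.2, 0.4, 0.5, 0.6, 0.75, 0.90, 0.95}  # numeric options from note [1] ('unknown' omitted)

# Hypothetical ratings: one row per panellist, one column per item.
ratings = [
    [0.6, 0.75, 0.5, 0.90],
    [0.5, 0.75, 0.4, 0.90],
    [0.6, 0.90, 0.5, 0.90],
]
assert all(r in ALLOWED for row in ratings for r in row)

# Item-level estimate = mean rating across panellists (after any discussion
# and revision, which is the stage a compressed timetable puts at risk).
item_means = [sum(col) / len(col) for col in zip(*ratings)]

# Provisional cut score = expected raw score of a borderline candidate.
cut_score = sum(item_means)
print(f"cut score ≈ {cut_score:.2f} of {len(item_means)} items "
      f"({cut_score / len(item_means):.0%})")
# -> cut score ≈ 2.73 of 4 items (68%)
```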

Nevertheless, it is asserted that none of the questions had to be discounted, although it is stated that some will be retired and others modified. In the absence of both example questions and statistical information it is impossible to form a view as to exactly how robust the batteries of questions were.

The Kaplan report in particular asserts that, while there was evidence that BAME candidates performed less well, this can be accounted for by other variables, such as whether they had undertaken the GDL or a law degree at a Russell Group university. Given that the original purpose of the SQE reform was to widen access and participation, this is a very depressing finding that suggests the reproduction of privilege. Again, it is impossible to assess the validity of these statements without access to the underlying data. A gender analysis apparently indicates no significant differences, and the same applies to an analysis based on disability, although it is acknowledged that the numbers involved are so small that conclusions are statistically unreliable.

The original design of SQE 1 provided for these three batteries of CMA questions together with a skills exercise designed to test research and drafting. In the pilot, candidates were required to undertake two of these exercises, each comprising three components. It is our assumption that these exercises were designed to be of equivalent standard. The Kaplan report candidly admits that performance on one exercise was significantly better than on the other. Furthermore, there was a clear disparity in the performance of BAME students (although not between male and female students, or in respect of students with disabilities): BAME students performed significantly less well than others across the crucial central 60% of the achievement range. The independent reviewer has confirmed these outcomes. The response of the SRA is to acknowledge that the skills assessment as currently designed and piloted is not fit for purpose. The level of discrepancy between the two assessments and the differential performance of BAME students are both unacceptable.

The Kaplan report states that statistical analysis of the performance of the FLK assessments indicates that they do not meet the 'gold standard' required for high-stakes professional examinations. There is no indication of the extent of the shortfall or any real explanation of the reasons for it. This is concerning. It should be noted that this is simply a statistical assessment of the assessment in its own terms; neither report attempts to address the question of whether the assessment is actually fit for purpose. The solution proposed is that, rather than three batteries of 120 questions, there should be two batteries of 180 questions, with the required topics reallocated accordingly. This presumably does not mean that some candidates actually sat such exams, but rather that the data have been modelled for such a scenario. Detailed reporting of this modelling process would be needed to reassure readers that it does not constitute manipulation.
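Neither report describes how the ‘two batteries of 180’ scenario was modelled. One conventional way of projecting the effect of lengthening a test is the Spearman-Brown prophecy formula, which estimates the reliability of the longer battery from the reliability observed on the shorter one. The sketch below is ours, not Kaplan’s, and the pilot reliability coefficient in it is invented purely for illustration.

```python
# Spearman-Brown projection: the expected reliability of a battery lengthened
# from 120 to 180 items (k = 1.5). The pilot reliability figure is invented;
# neither report publishes the actual coefficients or the modelling used.

def spearman_brown(reliability: float, k: float) -> float:
    """Projected reliability of a test lengthened by a factor of k."""
    return k * reliability / (1 + (k - 1) * reliability)

pilot_reliability = 0.85      # hypothetical coefficient for a 120-item battery
k = 180 / 120                 # proposed battery is 1.5 times longer

print(f"projected reliability: {spearman_brown(pilot_reliability, k):.3f}")
# -> projected reliability: 0.895 (whether this clears the 'gold standard'
#    threshold, often cited as 0.90 or above, could only be shown from the
#    real pilot data, which is why detailed reporting matters)
```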

Anecdotal evidence from students who participated in the pilot indicates that the questions were presented in a random order and that questions on cognate topics were not grouped together. This appears to indicate that there are no groupings of questions based on a common fact pattern or other common material. Given that what is being assessed is the ability of the students to apply legal knowledge, including knowledge of court and other procedures, it appears that there has been no attempt to assess students on their ability to progress a particular scenario. This is a significant lacuna, and it removes one area where CMA can be used in a relatively sophisticated way: assessing responses to an evolving scenario.

The Kaplan report proposes removing the skills assessment from SQE 1 altogether. The SRA have not accepted this and propose to consult further, but it is certainly possible that SQE 1 will comprise only the two 180-question batteries of FLK topics. One issue that has not been addressed is whether candidates will be able to maintain their levels of concentration for these longer batteries of questions, and whether the exclusive use of CMA will have a differential impact on certain categories of student. The Association has always acknowledged that the SRA is entitled to specify its own assessment of the vocational and professional stage of legal education for solicitors. We have expressed our concern that the SRA appears to be adopting what amounts to a 'one club policy' in relation to SQE 1 by focusing on CMA to the exclusion of other forms of assessment. We continue to be sceptical as to whether a CMA assessment, however well designed, can assess skills such as research and communication; and if the batteries of questions are, as we understand, randomised, so that there can be no grouping of questions, we are also extremely sceptical as to whether these assessments can capture anything other than relatively superficial knowledge and understanding. Without the ability to set a series of questions based on a single scenario, we cannot see how it is possible to assess the ability to advise a client and devise appropriate responses, both in the context of litigation and dispute resolution and in transactions such as property sales and purchases.

All in all, the two reports and the SRA response raise many more questions than they answer. We accordingly request that the SRA publish forthwith a representative sample of the questions and the statistical data. Only then will it be possible to provide a detailed critique of whether or not SQE 1 as currently proposed by the SRA can be regarded as fit for purpose.

 

[1] In a modified Angoff procedure, the panel of experts assesses each item on the basis of how likely it is that a minimally competent candidate would answer the question correctly, using eight options (0.2, 0.4, 0.5, 0.6, 0.75, 0.90, 0.95, unknown). Panellists rate questions individually and then scores are shared and the variance discussed, with the opportunity to revise estimates. This is done item by item for a sub-set of the whole to calibrate the panel, then serially for blocks of items, with significant discrepancies in scoring (typically 20%) for items within a block brought back for full panel discussion. It is time consuming and labour intensive.

[2] “The panel of judges in an Angoff procedure should be familiar with the performance level of the students who take the test and it should consist of credible experts in all topics being tested. Usually, these expert judges are teachers. It is assumed that they are able to conceptualize the borderline test‐taker and predict their performance on each individual test item.” Verhoeven, B.H., et al., ‘Panel expertise for an Angoff standard setting procedure in progress testing’, Med Educ. 2002;36(9):860–867.

[3] “an acceptable precision of 1% on the scoring scale would require a panel of 39 item writers judging 200 test items… If the discussions were omitted and individual estimates used, the number of judges needed to obtain a reliable passing score would increase to 77.” Verhoeven et al., ibid.

 

Association of Law Teachers

Compiled on behalf of and with feedback from the Committee by John Hodgson (Nottingham Trent University), Elaine Hall (Northumbria University) and Caroline Strevens (Portsmouth University)