Creating algorithms that serve the public good

Ethical Data Science – Prediction in the Public Interest


Anne L Washington


£25.99, Oxford University Press



Data science is perhaps a field of which most lawyers have no real knowledge. Its purpose is to extract meaningful patterns from large quantities of data, thereby providing insights of a kind that were simply not achievable in the pre-digital age.

In many instances, only large commercial organisations or governmental bodies possess the resources, or the powers, necessary to collate and process the volume of data required for any patterns discovered to have some degree of validity.

And therein lies the problem. Those who commission the analysis of data will, in many instances, have a vested interest in the outcome of the process, which may often be at odds with the interests of individuals within society or society in general.

Ethical Data Science’s publishers claim that this book is one of the first to offer a solution-oriented approach to ethical issues involved in data science, with a step-by-step guide on how to intervene and produce better predictive algorithms that serve the public good.

The suggested approach utilises a supply-chain model, with practical suggestions as to how to make each link in that chain as neutral as possible against any expected or desired outcome. The first chapter suggests ways to avoid reliance on (often unintentionally) polluted source data. It explains how to develop data modelling which can be evidenced as meeting intended goals in the real world.

Chapter three contains a reminder of the restrictions on individual autonomy and dignity which can result from the blind use of rigid categorisations based on human traits. For any data science to be truly ethical, the reasoning behind the algorithms at its core must be both transparent and capable of being understood by those individuals affected by the process outcomes. Washington concludes that the ideal of any data science should be that the participating data subjects derive some form of benefit from their participation.

The intended audience for this book would seem to be very specific, namely data scientists. The ‘step-by-step’ approach highlighted by the publishers may potentially serve that audience well. But the degree of repetition involved in continued reference back to the supply-chain analogy could risk alienating less specialised readers (of whom I am one). Some readers may have a more general interest in the way in which big data is coming to dominate and control society as a whole, and/or concerns around the seemingly exponential growth of AI in the making of decisions which affect us all. Such readers may be best advised to eschew a strict linear approach to reading this book if they are to extract the most value from the otherwise clear language and logic of the author’s arguments.

Legal practitioners who specialise in data protection law, or who have responsibility for data protection training within their organisation, may find that the real-world case studies, and detailed reference sections, alone justify the relatively modest financial outlay required.


Sean Gordon is a former COLP and DPO