Lorenzo Pacchiardi
Assistant Research Professor, University of Cambridge
I am an Assistant Research Professor at the Leverhulme Centre for the Future of Intelligence at the University of Cambridge. I lead a research project (funded by Open Philanthropy) on developing a benchmark for measuring the ability of LLMs to perform data science tasks. I am more broadly interested in AI evaluation, particularly in predictability and cognitive evaluation, and I closely collaborate with Prof José Hernández-Orallo and Prof Lucy Cheke. I contribute to the AI evaluation newsletter.
I am deeply familiar with EU AI policy (having been involved in several initiatives), and am one of the co-founders of the Italian AI policy think tank CePTE. I also collaborate with The Unjournal to make impactful research more rigorous, and I co-founded AcademicJobsItaly.com to make the Italian academic job market more accessible.
I previously worked on detecting lying in large language models with Dr Owain Evans (through the MATS programme) and on technical standards for AI under the EU AI Act at the Future of Life Institute. I have also briefly advised RAND on AI evaluation.
I obtained a PhD in Statistics and Machine Learning at Oxford, during which I worked on Bayesian simulation-based inference, generative models and probabilistic forecasting (with applications to meteorology). My supervisors were Prof Ritabrata Dutta (University of Warwick) and Prof Geoff Nicholls (University of Oxford).
Before my PhD studies, I obtained a Bachelor’s degree in Physical Engineering from Politecnico di Torino (Italy) and an MSc in Physics of Complex Systems from Politecnico di Torino and Université Paris-Sud, France. I did my MSc thesis at LightOn, a machine learning startup in Paris.
news
| Date | News |
|---|---|
| May 16, 2025 | Our survey on AI evaluation was accepted at the IJCAI 2025 survey track, and our PredictaBoard paper was accepted at ACL 2025 Findings. |
| Mar 11, 2025 | Our new preprint shows how to extract the most predictive and explanatory power from AI benchmarks by automatically annotating the demands posed by each question. Check it out! |
| Feb 21, 2025 | Two new arXiv preprints: one surveying AI evaluation and identifying six main paradigms, the other introducing a benchmark for jointly evaluating LLM performance and its predictability on individual instances. |
| Jan 15, 2025 | Our survey paper “Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents” has been accepted and published in Transactions on Machine Learning Research! 🎉 |
| Oct 15, 2024 | We have two new preprints on arXiv! One on predicting the performance of LLMs on individual instances, the other on predicting LLMs' answers to benchmark questions from simple features. |