-
Predicting when LLMs fail
An overview of the different paradigms for predicting LLM failures and their connections with related areas.
-
Summary of "From Testing to Evaluation of NLP and LLM Systems"
This work compares academic research in evaluation with practitioners' questions on community forums.
-
2024 May "AI Evaluation" Digest
I contributed to the May edition of the "AI Evaluation" digest (on Substack).
-
2024 April "AI Evaluation" Digest
I contributed to the April edition of the "AI Evaluation" digest (on Substack).
-
2024 January "AI Evaluation" Digest
I contributed to the January edition of the "AI Evaluation" digest (on Substack).