Welcome!

Hi, I’m Louis! I’m a technical AI safety researcher and recent Oxford MCompSciPhil graduate from Hertford College.

News

I have been offered MATS 9.1 extension funding for six months to continue my work with [Victoria Krakovna]!
- Over the course of the main programme (Jan-Mar), I’ve pivoted from thinking directly about evaluation awareness to building an automated scaffold for carrying out ‘science of evals’ work, which I plan to continue (and hopefully make use of!) in the extension programme.
Alongside this, I will start the agentic monitoring project funded through UK AISI’s [Alignment Project] grant, collaborating with [Tyler Tracy] and others at [Redwood Research].

About me

My research interests are broad, spanning the science of evaluations, AI control protocols, and scheming models.
- My previous work includes elements of game theory, multi-agent risks, corrigibility, and active learning.
I’m also a committee member of OAISI, the [Oxford AI Safety Initiative]; my priority is to build a welcoming, kind, and collaborative AI safety community here in Oxford.
- [Please reach out] if you want to talk about what AI safety is / what you might want to work on / how OAISI might be able to help you.
Aside from academic work, I’m pretty musical: I play bass in a couple of bands, noodle on guitar and keys (from when I used to write my own songs), and secretly would love to make a career out of playing music. Alternative career paths for me would also include teaching and doing outreach!
Some of my loves include [video/board] games and puzzles of all kinds, Japanese food (part of my heritage), continental philosophy, Arsenal FC (COYG), and most of all my lovely fiancée :)

Shoot me an email if you want to talk - I’m (usually!) quick to respond and I enjoy talking to new people! Get in touch via [[email protected]]

My work

… in order of recency…

A Framework for Eval Awareness [2026] (supervised by [Victoria Krakovna])
- Setting out a conceptual framework under which the key research directions in evaluation awareness can be delineated and understood. This [blog post] was the result of the first three weeks of MATS 9.0, where I developed this framing to help me identify promising and neglected research directions for mitigating eval gaming.
Agentic Monitoring for AI Control [2025] (supervised by [Tyler Tracy])
- An initial investigation into the extent to which trusted monitors benefit from opportunities to be agentic. See my [blog post] for an introduction to the research direction alongside some initial results and discussion.
Cooperation and Control in Markov Delegation Games [2025] (supervised by [Lewis Hammond] and [Oly Sourbut])
- Formalising cooperation and control as two key dimensions along which multi-agent delegation games can produce bad outcomes for humans. This was carried out as part of my Master’s year at Oxford; see my [report]. [Note: I’d be keen to finish this work some day and turn it into a workshop paper!]
Model Models: Simulating a Trusted Monitor [2025]
- Can an untrusted model predict how a trusted monitor will score its solutions? This was part of the Apart Research [AI Control Hackathon] in March 2025; see the [project page] containing a report and the codebase.
Games for AI Control [2024-5] (in collaboration with [Charlie Griffin]; supervised by [Alessandro Abate] and [Buck Shlegeris])
- Introducing a game-theoretic model for AI Control settings. See the [paper] and [blog post].
Towards shutdownable agents via stochastic choice [2024] (supervised by [Elliott Thornley])
- Working on a proposal to solve the corrigibility problem by training agents to have incomplete preferences. I briefly worked on this project through the [Future Impact Group] programme; you can see the resulting [paper] which was accepted to [TAIS 2025].
Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models [2023] (supervised by [Francis Rhys Ward])
- Investigating the extent to which belief consistency and deceptive behaviour scale with model size. I began working on this project through the AI Safety Hub Labs programme (now [LASR Labs]). See our [blog post] and a [follow-up paper].