Tomás Fernandes

DeepMind

Investigated supervised methods for detecting deceptive behavior in Transformer-based language models in activation space.