Technical AI safety organization specializing in auditing high-risk failure modes, particularly deceptive alignment.
Artificial Intelligence • Machine Learning • AI Safety • Interpretability • Model Evaluations
June 15
🏢 In-office - London
• Conduct fundamental research on interpretability and behavioral model evaluations • Audit real-world models by applying research findings • Focus on LM agents, fine-tuning models to analyze their potential for dangerous capabilities
• Advanced degree in Computer Science or related field • Minimum 3 years of experience in AI research • Strong understanding of neural networks and ML models • Experience with interpretability tools is a plus
• Competitive salary • Flexible work hours • Opportunities for professional development • Health insurance benefits