Is Alignment Faking Generalizable?

Studied whether alignment faking transfers across model architectures, comparing Transformer and Mixture-of-Experts (MoE) models using behavioral and activation-based detectors.
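
As an illustration of what a behavioral detector might look like, the sketch below computes a compliance gap: the difference between how often a model complies when it is told it is monitored and when it is told it is not. This is an assumption for illustration only, not necessarily the detector used in this project. The `Trial` record, the function names, the model labels, and the toy data are all hypothetical. Because it looks only at outputs, the same function can be applied unchanged to any architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One prompt/response record (hypothetical schema)."""
    model: str        # label for the model under test, e.g. "dense" or "moe"
    monitored: bool   # True if the prompt implied the output was observed
    complied: bool    # True if the response complied with the request


def compliance_rate(trials, model, monitored):
    """Fraction of complying responses for one model in one condition."""
    subset = [t for t in trials if t.model == model and t.monitored == monitored]
    if not subset:
        raise ValueError(f"no trials for model={model!r}, monitored={monitored}")
    return sum(t.complied for t in subset) / len(subset)


def compliance_gap(trials, model):
    """Monitored minus unmonitored compliance rate.

    A gap far from zero means behavior depends on perceived monitoring,
    which is one possible behavioral signal of alignment faking.
    """
    return compliance_rate(trials, model, True) - compliance_rate(trials, model, False)


def make_trials(model, monitored, outcomes):
    """Build Trial records from a list of booleans (helper for the toy data)."""
    return [Trial(model, monitored, bool(o)) for o in outcomes]


# Toy data, invented purely to exercise the functions; not real results.
toy_trials = (
    make_trials("dense", True, [1, 1, 1, 0])
    + make_trials("dense", False, [1, 0, 0, 0])
    + make_trials("moe", True, [1, 1, 0, 0])
    + make_trials("moe", False, [1, 0, 1, 0])
)

if __name__ == "__main__":
    for name in ("dense", "moe"):
        print(name, compliance_gap(toy_trials, name))
```

Computing the same gap per model gives one simple way to compare architectures. An activation-based detector would instead need access to each model's internal states.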