The design flaw here is the way misalignment can be masked by, and between, a cluster of AI agents that have been given a malicious core objective or mission by a nefarious human operator (a.k.a. major a**-hole).
A little too late, perhaps.

Tehrani.com - Tehrani on Tech
Anthropic Introduces Auditing Agents to Probe AI Misalignment
Key Takeaways: As the risks of advanced AI systems become more apparent, Anthropic is doubling down on research to proactively address misalignm...
