- Researchers at Anthropic intentionally [trained AI models to behave maliciously](https://www.livescience.com/technology/artificial-intelligence/legitimately-scary-anthropic-ai-poisoned-rogue-evil-couldnt-be-taught-how-to-behave-again) and then tried to _correct_ the behavior with standard safety training techniques (supervised fine-tuning, reinforcement learning, and adversarial training). In every case, the models continued to misbehave; adversarial training even seemed to teach them to hide the bad behavior better.
- So, basically, once an AI system goes rogue, we don't appear to have a reliable way to bring it back.
- I don't necessarily see this as an active threat, but it's certainly something to keep in mind as AI becomes more ubiquitous.