Did you hear about Grok going crazy this week and mentioning "white genocide" in response to questions about completely unrelated topics? It turns out that Somebody, who did not really know what he was doing, forced Grok to accept as true that there is white genocide in South Africa, regardless of the facts. In a few cases, Grok asserted that there is white genocide in South Africa and then added that the facts do not support that conclusion. The command to Grok was, therefore, not entirely successful.
When Somebody wrote a system prompt requiring Grok to accept white genocide as true, he told Grok to do so when answering questions but failed to limit the instruction to questions about South Africa.
Knowing this, who would trust Grok?
This is an example of intentional misalignment.