Contrary to popular belief, AI killwords weren't patched out of GPT-4/ChatGPT - there's just a different set because of the different tokenizer.
@MrCheeze what exactly is an ai killword, is it a bug or a debug feature or something?
@chrisisgr8 @MrCheeze basically it's a string that exists as a token in the AI's tokenizer vocabulary but almost never shows up in the training data, so the model doesn't know what to do when it encounters it
Computerphile did a good video on the subject: https://youtube.com/watch?v=WO2X3oZEJOA
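To make that concrete, here's a rough toy sketch of the idea: a token sits in the vocabulary, but tokenizing the training text never produces it. Everything here is made up for illustration. The greedy longest-match `tokenize` is a stand-in and not real BPE, and `VOCAB`, `CORPUS`, and "glitchword" are hypothetical placeholders, not real tokenizer data.

```python
from collections import Counter


def tokenize(text, vocab):
    """Greedy longest-match tokenizer over a toy vocabulary (a stand-in, not real BPE)."""
    max_len = max(len(tok) for tok in vocab)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shrink until something matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


def unused_tokens(vocab, corpus):
    """Return vocabulary entries that never appear when the corpus is tokenized."""
    counts = Counter()
    for doc in corpus:
        counts.update(tokenize(doc, vocab))
    return sorted(tok for tok in vocab if counts[tok] == 0)


# Hypothetical toy data: "glitchword" is in the vocabulary but not in the corpus.
VOCAB = set("acehmnost ") | {"the", "cat", "sat", "glitchword"}
CORPUS = ["the cat sat", "a cat sat on the mat", "he chose"]

if __name__ == "__main__":
    print(unused_tokens(VOCAB, CORPUS))
```

In this toy setup the only token the corpus never produces is the placeholder "glitchword". A real model would have no training signal for a token like that, which is the situation described above.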
@nil huh, super interesting. can't wait to pepper some of those into my code :3
@chrisisgr8 @nil Note that they had already stopped training models using the GPT-2/GPT-3 tokenizer before glitch tokens were discovered.
If they reuse GPT-4's tokenizer (tiktoken's "cl100k_base" encoding) for GPT-5, though, it may be possible to become the only person using some of its tokens in the training data.