Essem @esm

Contrary to popular belief, AI killwords weren't patched out of GPT-4/ChatGPT - there's just a different set because of the different tokenizer.

Apr 02, 2023, 07:09 PM

**kira :3** @chrisisgr8@tech.lgbt · Apr 2, 2023

@MrCheeze what exactly is an ai killword, is it a bug or a debug feature or something?

**nil** @nil@furry.engineer · Apr 2, 2023

@chrisisgr8 @MrCheeze basically it's a string that exists in the AI's dictionary (its tokenizer vocabulary) but is never, or almost never, used in the training data, so the model doesn't know what to do when it encounters it

Computerphile did a good video on the subject, "Glitch Tokens": https://youtube.com/watch?v=WO2X3oZEJOA
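
To make the idea concrete, here is a rough sketch (not how any real model or tokenizer works internally) of finding vocabulary entries that never show up once a corpus is tokenized. The vocabularies, the corpus, and the odd tokens `" xqzUserName"` and `"qwrtBot"` are all made up for illustration, and the greedy longest-match tokenizer is a simplified stand-in for real byte-pair encoding.

```python
from collections import Counter


def tokenize(text, vocab):
    """Greedy longest-match tokenization over a fixed vocabulary.

    This is a simplification; real tokenizers use byte-pair-encoding merges.
    """
    tokens = []
    max_len = max(len(t) for t in vocab)
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary entry matches: emit the raw character.
            tokens.append(text[i])
            i += 1
    return tokens


def unused_tokens(vocab, corpus):
    """Return vocabulary entries that never occur in the tokenized corpus."""
    counts = Counter()
    for doc in corpus:
        counts.update(tokenize(doc, vocab))
    return sorted(t for t in vocab if counts[t] == 0)


# Hypothetical toy data: two different vocabularies, one shared corpus.
CORPUS = ["the cat sat", "the dog sat"]
VOCAB_A = {"the", " cat", " sat", " dog", " xqzUserName"}
VOCAB_B = {"the", " ", "cat", "sat", "dog", "qwrtBot"}

if __name__ == "__main__":
    print(unused_tokens(VOCAB_A, CORPUS))
    print(unused_tokens(VOCAB_B, CORPUS))
```

Each vocabulary ends up with its own never-seen entry, which matches the point that a different tokenizer brings a different set of candidate killwords.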

**kira :3** @chrisisgr8@tech.lgbt · Apr 2, 2023

@nil huh, super interesting. can't wait to pepper some of those into my code :3

**MrCheeze** @MrCheeze · Apr 2, 2023

@chrisisgr8 @nil Note that they had already stopped training models using the GPT-2/GPT-3 tokenizer before glitch tokens were discovered.

If they reuse GPT-4's tokenizer ("tiktoken cl100k_base") for GPT-5, though, it may be possible to become the exclusive user of some of its tokens in its training data.
