1 Comment
Mike X Cohen, PhD

Fascinating, thought-provoking, and troubling. Thanks for the great write-up!

LLMs learn complex patterns in token sequences, patterns far too complex for humans to identify or understand, and then continue generating token sequences that match those patterns.
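
To make that concrete with a toy picture of what I mean (a simple bigram model of my own invention, nothing close to a real LLM): "train" by counting which token tends to follow which, then continue a sequence by sampling tokens that match those learned statistics.

```python
import random
from collections import Counter, defaultdict

# Toy corpus; real LLMs learn far richer patterns, but the
# train-then-continue loop has the same basic shape.
corpus = "the owl hoots at night the owl sleeps by day".split()

# "Training": count which token tends to follow which.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def generate(start: str, length: int = 6) -> list[str]:
    """Continue a sequence by sampling tokens that match the learned patterns."""
    tokens = [start]
    for _ in range(length):
        followers = transitions.get(tokens[-1])
        if not followers:
            break
        choices, counts = zip(*followers.items())
        tokens.append(random.choices(choices, weights=counts)[0])
    return tokens

print(generate("the"))
```

The point of the sketch is only that whatever regularities sit in the training sequences, however subtle, are what the model reproduces when it generates.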

I guess what happens here is that these preferences and other behavioral characteristics are embedded in the training token sequences in ways that we don't anticipate or understand.
