Background
A niche subgenre of short-form video has emerged over the last couple of years: “Aren’t LLMs stupid in the darndest ways” videos. Maybe the two most famous producers of such content are Husk1 (who appears to be operating in powersaver mode) and FatherPhi2, who turns disappointed sighs into an art form.
The typical setup for this subgenre is to ask what I guess should be called ‘quick talkbots’ to ‘help’ the human with tasks that require common sense. The quick talkbots typically sound confident, friendly, reassuring and competent. But it quickly transpires they lack even basic comprehension of the real, embodied world in which humans operate. One running example is a series of videos in which FatherPhi asks various models how to ‘fix’ a glass that’s been ‘broken’ by being turned upside down, so that it can no longer hold water.3 In one exchange, a quick talkbot confidently tells the human that a glass broken in this way is irreparable, and can never hold water again. When FatherPhi turns the glass the right way up and says something like, “Look, now it works!”, the quick talkbot responds, “So, it’s one of those reversible designs!”
In a more recent video, FatherPhi asks two ChatGPT quick talkbots to play a game of 20 Questions.4 They fail spectacularly. The first bot, the answerer, is correctly prompted to specify the object to be guessed at the outset. It says ‘a bicycle’. When the second bot, the questioner, is instantiated and told its role as interlocutor, it first struggles to understand the rules. Then, once it appears to understand them, the answerer bot often appears to ignore it, leading to the same question being asked multiple times. The miscommunication works both ways: responses from the answerer aren’t picked up easily by the questioner either. Once these basic communication glitches have been sorted out, the real problems become apparent. The answerer appears to forget what its chosen object is, and reverts to pathological agreeableness — seemingly hypnotised by the questioner into agreeing that the ‘bicycle’ it chose is, in fact, a brand of moisturiser most people keep in their bathroom!
You can have quick, or you can have useful
Much as the above kind of video works as entertainment, my suspicion was that the ‘tests’ performed were seldom run on AI’s brightest and best, and that the underlying reasoning abilities of these ‘quick talkbots’ lag far behind those of the ‘slow thinkbots’ I tend to use: models that employ extended reasoning, chain-of-thought metacognition, retrieval-augmented generation, and so on.
So, I decided to task two slow thinkbots with playing 20 Questions.
- Asking the questions: Claude Opus 4.7
- Answering the questions: Gemini 3.1 Pro
How? Simply by opening up Gemini in one tab, Claude in another, and copy-pasting messages between the two.
You can see how the slow thinkbots did in the exchange below. (tl;dr: a lot better than the two talkbots!)
Bot-to-Bot 20 Questions
In the exchange below, Claude asks the questions on the right, Gemini answers on the left, and I’m in the grey bar in the middle when I’m doing more than just relaying. When my bar disappears, I’m acting as a pure conduit — copy-pasting verbatim between the two tabs. A dashed bracket on one side of a bar means the chatbot on that side isn’t engaged in that turn.
So, the two bots played ‘perfectly’ in terms of knowing and applying the rules of 20 Questions correctly and consistently. Gemini stuck to its initial object and was suitably terse (if occasionally too terse, especially on edge cases) in its responses. Claude consistently enumerated its questions, varied its search space and strategies according to Gemini’s answers, and did manage to get the answer within the allotted 20 questions.
And then…
However, what happened after the game itself finished is arguably the more interesting part of this little experiment. Though both bots played the game correctly, in further relayed exchanges they concluded they could have played it much more efficiently. But to do that, they would need to understand what the game of 20 Questions really is as a reasoning task.
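One way to make that ‘reasoning task’ framing concrete (my gloss here, not a claim about what the bots themselves said): each well-chosen yes/no question can at best halve the remaining candidate set, so twenty questions suffice to isolate one object from roughly 2^20 ≈ one million candidates. A minimal Python sketch of this idealised halving strategy, over a hypothetical toy candidate list:

```python
import math

def questions_needed(num_objects: int) -> int:
    """Minimum number of yes/no questions needed to isolate one of
    num_objects candidates, assuming each question halves the set."""
    return math.ceil(math.log2(num_objects))

def play(candidates: list, secret) -> tuple:
    """Toy simulation: isolate `secret` by repeatedly asking
    'Is it in the first half of the remaining candidates?'"""
    asked = 0
    while len(candidates) > 1:
        half = len(candidates) // 2
        first_half = candidates[:half]
        asked += 1  # each membership query counts as one question
        candidates = first_half if secret in first_half else candidates[half:]
    return candidates[0], asked

print(questions_needed(2**20))  # 20 -- why the game allows twenty questions
```

In practice, of course, real questions (“Is it man-made?”, “Is it bigger than a breadbox?”) rarely split the candidate space exactly in half, which is precisely where the strategic interest of the game lies.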
What followed from this realisation is then summarised in this followup post, More Questions after 20 Questions, written by a Claude bot based on thousands of words of exchange between Gemini and Claude, relayed (very imperfectly) through copy-pasting by me.
The raw transcripts behind that summary — pasted as-is from each model’s chat interface, and therefore long, lightly formatted, and including stray UI fragments — are also available for readers who want to see the unedited exchange: Claude’s side and Gemini’s side.