OpenAI has just released its full o1 model—a new kind of model that is more capable of multi-step reasoning than previous models. Anthropic, Google and others are no doubt working on similar products. At the same time, it’s hotly debated in many quarters whether AI models actually “reason” in a way similar to humans.
Emily Bender and her colleagues famously described large language models as nothing more than “stochastic parrots”—systems that simply repeat their training data blindly, based on a statistical model, with no real understanding (reminiscent of the Chinese Room thought experiment). Others have made similar comments, describing LLMs as “n-gram models on steroids” or a “fancy extrapolation algorithm.”
There is of course some truth to this. AI models sometimes generate remarkable results yet lack certain basic aspects of understanding that would otherwise keep them from occasionally producing nonsensical results. More to the point of “parroting” the training data, recent work from Yejin Choi’s group has shown how LLMs at times will cut and paste snippets from their various training documents, almost verbatim, to formulate their outputs.
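To make the “parroting” concern concrete, here is a minimal sketch, in Python and purely for illustration (this is not the methodology used by Choi’s group), of one crude way to flag near-verbatim reuse: count how many character n-grams in a model’s output also appear somewhere in a reference corpus. The function names and the window size are my own choices for the example.

```python
# Toy illustration: what fraction of an output's character n-grams
# also appear verbatim in a small reference "corpus"?

def char_ngrams(text: str, n: int) -> set[str]:
    """Return the set of all character n-grams in `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def verbatim_overlap(output: str, corpus: list[str], n: int = 20) -> float:
    """Fraction of the output's n-grams that occur verbatim in the corpus."""
    out_grams = char_ngrams(output, n)
    if not out_grams:
        return 0.0
    corpus_grams: set[str] = set()
    for doc in corpus:
        corpus_grams |= char_ngrams(doc, n)
    return len(out_grams & corpus_grams) / len(out_grams)

# Example: an "answer" that lifts a long phrase from a "training document".
docs = ["The mitochondria is the powerhouse of the cell, as every textbook says."]
answer = "As every textbook says, the mitochondria is the powerhouse of the cell."
print(f"{verbatim_overlap(answer, docs):.0%} of 20-character n-grams appear verbatim")
```

A high overlap score suggests the output is stitched together from memorized spans; a low score suggests the text, whatever its quality, was not simply copied.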
Are LLMs (just) glorified information retrieval tools?
The implication of these concerns is that an LLM can “only” repeat back what it was taught (albeit with errors). However, this view does not align with the evidence. LLM training is a compression process in which new connections between pieces of information are formed that were not present in the original data. This is evidenced both mathematically and anecdotally. In my own experience, I’ve gotten valid answers to technical questions so obscure and detailed that it is hard for me to believe they would exist in any training data in exactly that form. Whether you would call this “reasoning” might be open to debate, but regardless of what you call it, it is something more than the unadorned information retrieval of a “stochastic parrot.”
What is your experience? Let us know in the comments.
Hey John! Love your blog, been following it for years. Funny how many people are struggling to get GPT to do “reasoning” - I felt the same way last year when it was all safety warnings and before custom instructions or memory. GPT can actually do fairly complex inversions of lists, not just simple negations. This is as far as I’ve gotten it with regards to logic. You have to be comfortable with inaccuracies though, it’s obviously not 100% or “strict logic”, even if it can negate or invert an entire list. You can “teach” it simple formal logic and it can kind of follow the rules, but you have to know how to guide it through proper prompts. Memory helps. You have to know what its strengths are, which is breadth (variety), not depth (reasoning). It’s really, really good at generating lists of semi-related things. It can invert and negate things but can’t necessarily be “random” in the way a human can, which is interesting, and I think is related to why it can’t fully reason yet.
Oops, how embarrassing - I missed that the author was Wayne, not John. My apologies. Hello to all!