GPT-5 for AI-assisted discovery

Many hope that AI will be “smart enough” to make breakthrough scientific discoveries in the near future, such as find a cure for cancer. Some research efforts have sought to create an “AI Scientist” that can make discoveries in an automated or semi-automated way; for a recent example, see [1]. Others [2] have called out the methodological pitfalls of some of these attempts. Still others question whether  a truly original discovery is even possible at all for an AI.

OpenAI released GPT-5 on August 7. Some thought it lackluster and falling behind compared to expectations. Others however found performance in some areas to be much advanced compared to its predecessor.

Two recent reports show the new model’s utility. Scott Aaronson published a paper last month [3], [4] in which “a key technical step in the proof of the main result came from AI.” Also, Terence Tao reported earlier this month [5] his use of ChatGPT to find a first counterexample to an unsolved mathematics problem.

I’m sure this resonates with the experience of other researchers using the tool. Recently, in the course of a discussion I had with ChatGPT, it came up with a new algorithmic recipe for something I was working on, based on ideas combined from several papers but in itself apparently original. That was a very simple case—but on a grander scale, connecting two ideas together in a novel way can lead to a real breakthrough. For example, Faltings’ proof of the Mordell Conjecture in 1983 was based on recognizing a subtle internal connection among some already existing theorems.

There is always the specter of concern that an idea “maybe was already in the training data.” It can be difficult to prove otherwise. But deep domain experts like Scott Aaronson and Terence Tao are likely to know with high likelihood whether the idea is truly an original never-before-published result or not.

If past is prologue, we can hope for more powerful models in the future that can solve increasingly hard problems.

Notes

[1] Yixuan Weng, Minjun Zhu, Qiujie Xie, Qiyao Sun, Zhen Lin, Sifan Liu, Yue Zhang, “DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively.” https://arxiv.org/abs/2509.26603.

[2] Ziming Luo, Atoosa Kasirzadeh, Nihar B. Shah, “The More You Automate, the Less You See: Hidden Pitfalls of AI Scientist Systems.” https://arxiv.org/abs/2509.08713.

[3] “The QMA Singularity.” https://scottaaronson.blog/?p=9183

[4] Scott Aaronson, Freek Witteveen, “Limits to black-box amplification in QMA.” https://arxiv.org/abs/2509.21131.

[5] https://mathstodon.xyz/@tao/115306424727150237

 

Scientific papers: innovation … or imitation?

Sometimes a paper comes out that has the seeds of a great idea that could lead to a whole new line of pioneering research. But, instead, nothing much happens, except imitative works that do not push the core idea forward at all.

For example the McCulloch Pitts paper from 1943 showed how neural networks could represent arbitrary logical or Boolean expressions of a certain class. The paper was well-received at the time, brilliantly executed by co-authors with diverse expertise in neuroscience, logic and computing. Had its significance been fully grasped, this paper might have, at least notionally, formed a unifying conceptual bridge between the two nascent schools of connectionism and symbolic AI (one can at least hope). But instead, the heated conflict in viewpoints in the field has persisted, even to this day.

Another example is George Miller’s 7 +/- 2 paper. This famous result showed humans are able to hold only a small number of pieces of information in mind at the same time while reasoning.  This paper was important not just for the specific result, but for the breakthrough in methodology using rigorous experimental noninvasive methods to discover how human thinking works—a topic we know so little about, even today. However, the followup papers by others, for the most part, only extended or expanded on the specific finding in very minor ways. [1] Thankfully, Miller’s approach did eventually gain influence in more subtle ways.

Of course it’s natural from the incentive structures of publishing that many papers would be primarily derivative rather than original. It’s not a bad thing that, when a pioneering paper comes out, others very quickly write rejoinder papers containing evaluations or minor tweaks of the original result. Not bad, but sometimes we miss the larger implications of the original result and get lost in the details.

Another challenge is stovepiping—we get stuck in our narrow swim lanes for our specific fields and camps of research. [2] We don’t see the broader implications, such as connections and commonalities across fields that could lead to fruitful new directions.

Thankfully, at least to some extent current research in AI shows some mix of both innovation and imitation. Inspired in part by the accelerationist mindset, many new papers appear every day, some with significant new findings and others that are more modest riffs on previous papers.

Notes

[1] Following this line of research on human thought processes could be worthwhile for various reasons. For example, some papers in linguistics state that Chomsky‘s vision for a universal grammar is misguided because the common patterns in human language are entirely explainable by the processing limitations of the human mind. But this claim is made with no justification or methodological rigor of any kind. If I claimed a CPU performs vector addition or atomic operations efficiently because of “the capabilities of the processor,” I would need to provide some supporting evidence, for example, documenting that the CPU has vector processing units or specialized hardware for atomics. The assertions about language structure being shaped by the human mental processing faculty is just an empty truism, unless supported by some amount of scientific rigor and free of the common fallacies of statistical reasoning.

[2] I recently read a paper in linguistics with apparent promise, but the paper totally misconstrued the relationship between Shannon entropy and Kolmogorov complexity. Sadly this paper passed review in a linguistic journal, but if it had had a mathematically inclined reviewer, the problem would have been caught and fixed.

 

 

How to Organize Technical Research?

 

64 million scientific papers have been published since 1996 [1].

Assuming you can actually find the information you want in the first place—how can you organize your findings to be able to recall and use them later?

It’s not a trifling question. Discoveries often come from uniting different obscure pieces of information in a new way, possibly from very disparate sources.

Many software tools are used today for notetaking and organizing information, including simple text files and folders, Evernote, GitHub, wikis, Miro, mymind, Synthical and Notion—to name a diverse few.

AI tools can help, though they can’t always recall correctly and get it right, and their ability to find connections between ideas is elementary. But they are getting better [2,3].

One perspective was presented by Jared O’Neal of Argonne National Laboratory, from the standpoint of laboratory notebooks used by teams of experimental scientists [4]. His experience was that as problems become more complex and larger, researchers must invent new tools and processes to cope with the complexity—thus “reinventing the lab notebook.”

While acknowledging the value of paper notebooks, he found electronic methods essential because of distributed teammates. In his view many streams of notes are probably necessary, using tools such as GitLab and Jupyter notebooks. Crucial is the actual discipline and methodology of notetaking, for example a hierarchical organization of notes (separating high-level overview and low-level details) that are carefully written to be understandable to others.

A totally different case is the research methodology of 19th century scientist Michael Faraday. He is not to be taken lightly, being called by some “the best experimentalist in the history of science” (and so, perhaps, even compared to today) [5].

A fascinating paper [6] documents Faraday’s development of “a highly structured set of retrieval strategies as dynamic aids during his scientific research.” He recorded a staggering 30,000 experiments over his lifetime. He used 12 different kinds of record-keeping media, including lab notebooks proper, idea books, loose slips, retrieval sheets and work sheets. Often he would combine ideas from different slips of paper to organize his discoveries. Notably, his process to some degree varied over his lifetime.

Certain motifs emerge from these examples: the value of well-organized notes as memory aids; the need to thoughtfully innovate one’s notetaking methods to find what works best; the freedom to use multiple media, not restricted to a single notetaking tool or format.

Do you have a favorite method for organizing your research? If so, please share in the comments below.

References

[1] How Many Journal Articles Have Been Published? https://publishingstate.com/how-many-journal-articles-have-been-published/2023/

[2] “Multimodal prompting with a 44-minute movie | Gemini 1.5 Pro Demo,” https://www.youtube.com/watch?v=wa0MT8OwHuk

[3] Geoffrey Hinton, “CBMM10 Panel: Research on Intelligence in the Age of AI,” https://www.youtube.com/watch?v=Gg-w_n9NJIE&t=4706s

[4] Jared O’Neal, “Lab Notebooks For Computational Mathematics, Sciences, Engineering: One Ex-experimentalist’s Perspective,” Dec. 14, 2022, https://www.exascaleproject.org/event/labnotebooks/

[5] “Michael Faraday,” https://dlab.epfl.ch/wikispeedia/wpcd/wp/m/Michael_Faraday.htm

[6] Tweney, R.D. and Ayala, C.D., 2015. Memory and the construction of scientific meaning: Michael Faraday’s use of notebooks and records. Memory Studies8(4), pp.422-439. https://www.researchgate.net/profile/Ryan-Tweney/publication/279216243_Memory_and_the_construction_of_scientific_meaning_Michael_Faraday’s_use_of_notebooks_and_records/links/5783aac708ae3f355b4a1ca5/Memory-and-the-construction-of-scientific-meaning-Michael-Faradays-use-of-notebooks-and-records.pdf