An AI Odyssey, Part 4: Astounding Coding Agents

AI coding agents improved greatly last summer, and again last December-January. Here are my experiences since my last post on the subject.

The models feel subjectively much smarter. They can accomplish a much broader range of tasks. They seem to have a more comprehensive, in-depth view of the code base and of what you are trying to accomplish. They can locate details in obscure parts of the code related to the specific task at hand. By rough personal estimate, they helped me with about 20% of my coding effort last August and about 60% now. Both figures may be below what is possible, since I may not be using them to their fullest capability.

This being said, they are not a panacea. Sometimes they need help and direction about where to look for a problem. Sometimes they myopically look at the trees and miss the forest, failing to step back and see at a high level that something is wrong. They can overoptimize to the test harness. And they can generate code that is not conceptually consistent with the existing code.

They can also generate much larger amounts of code than necessary. Some warn that this will lead to an explosion of coding debt. On the other hand, you can also guide the coding agent to refactor and improve its own code—quickly. In one case I got it to reduce a piece of code to less than half its original size with no change in behavior.

I use OpenAI Codex rather than Claude Code. I’m glad to hear that some technically credible people think this is a good choice, though maybe I should try both.

My work is a research project for which the code itself is the research product, so I can’t specify everything in advance; writing the code is itself a process of discovery. I also want a code base that remains human-readable. So I stay deeply involved in discussion with the coding agent, which will sometimes go off to do a task for some period of time. It is not my desire to treat the agent as what some have called a dark software factory.

Some say they fear that using a coding agent will result in forgetting how to write code without one. I have felt this, but having exercised this muscle for such a long time, I don’t think it is a skill I would easily forget. The flip side of this argument is that you might even learn new coding idioms from observing what the coding agent writes, which is a good thing.

Some say they haven’t written a line of code in weeks because the coding agent does it for them. I don’t think I’ll ever stop writing code, any more than I will stop scribbling random ideas on the back of an envelope, or typing out a few lines of code and discovering some new idea in real time as I type. Learning is multisensory.

My hat’s off to the developers who are able to keep many different agents in flight at the same time. Personally I have difficulty handling the cognitive load of thinking deeply about each one and mentally context switching across many agents. I am, though, experimenting with running more than one agent so I can keep working on one thing while another agent runs.

I continue to be astounded by the productivity gains in some situations. I recently added a new capability that I think would normally have taken me two months, since it required learning a whole new algorithmic method and library. With the coding agent, it took four days, on the order of a 10x productivity increase. Admittedly, things are not always that rosy.

In short, the tools are getting better and better. I’m looking forward to what they will be like a few months or a year from now.

An AI Odyssey, Part 3: Lost Needle in the Haystack

While shopping on a major e-commerce site, I wanted to get an answer to an obscure question about a certain product.

Not finding the answer immediately on the product page, I thought I’d try clicking the AI shopping assistant helper tool to ask this question.

I waited with anticipation for an answer I expected to be more informative and useful than a standard search result. But it was not to be: the AI tool had nothing worthwhile.

Then I decided on an old-fashioned keyword search across all the product reviews. And, lo and behold, I immediately found several credible reviews addressing my question.

It is not good usability when multiple search mechanisms exist but only one of them is reliable. And it is surprising that a retrieval-based approach (e.g., RAG) could not at least match the effectiveness of a simple keyword search over reviews.

Models are capable, but effective integration can be lacking. Without improvements for cases like this, customers will not be satisfied users of these new AI tools.


An AI Odyssey, Part 2: Prompting Peril

I was working with a colleague recently on a project involving the use of the OpenAI API.

I brought up the idea that it might be possible to improve the accuracy of the API’s responses by modifying the API call to increase the amount of reasoning performed.

My colleague quickly asked ChatGPT if this was possible, and the answer came back, “No, it’s not possible to do that.” Then I asked essentially the same question to my own instance of ChatGPT, and the answer was, “Yes, you can do it, but you need to use the OpenAI Responses API.”
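For what it’s worth, here is a minimal sketch of the kind of call my own ChatGPT instance was pointing at: requesting more reasoning via the Responses API. The model name is illustrative, and the exact parameter names should be checked against current OpenAI documentation.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",                     # illustrative model name
    reasoning={"effort": "high"},      # e.g. "low", "medium", or "high"
    input="Explain the trade-offs of requesting more reasoning effort.",
)
print(response.output_text)
```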

How did we get such different answers? Was it the wording of the prompt? Was it the custom instructions given in the account personalization, where you describe who you are and how you want ChatGPT to respond? Was it different conversation history? Many factors could have contributed to the differing responses. Unfortunately, many of these factors are either not easily controllable at the user level or not convenient to vary in a protracted trial-and-error search.

I’ve had other times when I will first get a highly standardized, generic answer from ChatGPT, even in Thinking mode, that I know is not quite right or just seems off. Then when I push back, I may get a profoundly different answer.

It’s simply a fact that large language models are conditional probabilistic systems that do not guarantee reproducibility in practice, even given the same inputs, even at temperature=0 [1]. Their outputs depend sensitively on prompt wording, context window contents, system instructions, and model configuration. Small differences in these inputs can yield substantially different outputs.
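One can see this directly by sending the same prompt several times at temperature 0 and counting the distinct completions; in practice it is common to get more than one. Below is a minimal sketch of such an experiment, assuming the OpenAI Python SDK and an illustrative model name.

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()
prompt = "In one sentence, why is the sky blue?"

outputs = []
for _ in range(5):
    resp = client.chat.completions.create(
        model="gpt-4o",              # illustrative model name
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    outputs.append(resp.choices[0].message.content)

# Identical inputs, yet more than one distinct answer is common in practice.
print(Counter(outputs))
```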

How well an AI chatbot responds can obviously have a massive impact on how effective the tool will be for your use case. Differences in responses could materially affect the outcome of your project. I take this as a wake-up call to be persistent, vigilant and flexible in attempts to obtain reliable answers from these new AI tools.

Notes

[1] Some sources of nondeterminism: floating-point/GPU nondeterminism, differing order of operations from distributed collectives, ties or near-ties in token probabilities, backend/infrastructure changes, model routing, and hidden system-prompt or tool-availability differences.

An AI Odyssey, Part 1: Correctness Conundrum

I recently talked with a contact who repeated what he’d heard regarding agentic AI systems—namely, that they can greatly increase productivity in professional financial management tasks. However, I pointed out that though this is true, these tools do not guarantee correctness, so one has to be very careful letting them manage critical assets such as financial data.

It is widely recognized that AI models, even reasoning models and agentic systems, can make mistakes. One example is a case in which one of the most recent and capable AI models made multiple factual mistakes while drawing together information for a single slide of a presentation. Sure, people can give examples of agentic systems performing amazing tasks. But it is another question whether they can do them reliably every time. Unfortunately, we do not yet have procedural frameworks that provide reliability guarantees comparable to those required in other high-stakes engineering domains.

Many leading researchers have acknowledged that current AI systems have a degree of technical unpredictability that has not been resolved. For example, one has recently said, “Anyone who has worked with AI models understands that there is a basic unpredictability to them, that in a purely technical way we have not solved.”

What industrial-strength reliability looks like

Manufacturing has the notion of Six Sigma quality, which aims to reduce defects to an extremely low rate. In computing, the correctness requirements are even higher, sometimes necessitating provable correctness. The Pentium FDIV bug in the 1990s caused actual calculation errors in the wild, even though the chance of error was supposedly “rare.” These were silent errors that could have gone undetected in mission-critical applications, leading to failure. That said, the Pentium FDIV error modes were precisely definable, whereas AI models are probabilistic, making their errors much harder to bound.

Exact correctness is at times considered so important that there is an entire discipline, known as formal verification, devoted to proving specified correctness properties of critical hardware and software systems (for example, in the design and manufacture of computing devices). These methods play a key role in multi-billion-dollar industries.

When provable correctness is not available, having at least a rigorous certification process (see here for one effort) is a step in the right direction.

It has long been recognized that reliability is a fundamental challenge in modern AI systems. Despite dramatic advances in capability, strong correctness guarantees remain an open technical problem. The central question is how to build AI systems whose behavior can be bounded, verified, or certified at domain-appropriate levels. Until this is satisfactorily resolved, we should use these incredibly useful tools in smart ways that do not create unnecessary risks.

AGI, ASI, A*I – Do we have all we need to get there?

Demis: “[to get to AGI] maybe there’s one or two big innovations needed.”

Sam: “everything based off what we see today is that it will happen.”

Ilya: “But is the belief really that if you just 100x the scale, everything would be transformed? I don’t think that’s true.”

Dario: “If you just kind of like eyeball the rate at which these capabilities are increasing, it does make you think that we’ll get there by 2026 or 2027.”

Jerry: “is [the transformer architecture] the last thing? I’m pretty sure it isn’t.”

For years leading researchers have been speculating one way or the other as to whether better algorithms are needed to get to AGI, artificial general intelligence (however that might be defined).

Around the time of the release of GPT-4, some were saying they felt something more was needed. Since then, we have had several major new advances, like reasoning models and tool use. If we’d said, “we don’t need anything else” three years ago, where would we be now?

For frankness, I like this from John Schulman: “it’s hard to know what we need.” And for strategy, Demis: “you can think of as 50% of our effort is on scaling, 50% of it is on innovation. My betting is you’re going to need both to get to AGI.”

Experiences with GPT-5-Codex

OpenAI Codex is now generally available (see here, here). I’m using the Codex extension in the Cursor code editor with my OpenAI account.

Codex is very helpful for some tasks, such as complex code refactoring, implementing skeleton code for an operation, or writing a single small self-contained piece of code. Models have come a long way from a year ago when bugs in a 30-line piece of generated code were not uncommon.

Some have reported 40-60% overall productivity increase from coding agents. I used Codex recently for a complicated code refactoring that I estimate would have taken over 10x more time to plan and execute without an assistant.

The coding agent is less effective for some other tasks. For example, I asked it to ensure that all Python function signatures in my code had type hints, and it missed many cases.
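For that particular chore, a small static check can complement the agent. Here is a minimal sketch (my own illustration, not something Codex produced) that uses Python’s ast module to list function definitions missing parameter or return annotations.

```python
import ast
import sys

def missing_hints(path: str) -> list[str]:
    """Report functions in `path` whose parameters or return type lack annotations."""
    tree = ast.parse(open(path, encoding="utf-8").read(), filename=path)
    issues = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            missing = [a.arg for a in args
                       if a.annotation is None and a.arg not in ("self", "cls")]
            if node.returns is None:
                missing.append("return")
            if missing:
                issues.append(f"{path}:{node.lineno} {node.name}: missing {', '.join(missing)}")
    return issues

if __name__ == "__main__":
    for p in sys.argv[1:]:
        print("\n".join(missing_hints(p)))
```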

Also, some have reported that the new Claude Sonnet 4.5 runs much faster, though Codex is being continually improved.

Obviously, to be effective, these models need adequate test coverage to debug against. Without it, the coding agent can get really lost.

My approach to using the agent is very hands-on. Before letting it make any changes, I discuss the change plan in detail and make corrections as needed. (I sometimes need to remind the agent repeatedly to wait until I say “start” before commencing changes.) Also, when appropriate, I ask the model to make changes one step at a time instead of all in one go. This not only makes the result more understandable and maintainable by humans, but is also more likely to produce good-quality code.

Some have cautioned about the hazards of using coding agents. One concern is that a rogue coding agent could do something drastic like delete your code base or data files (both theoretically and, for some models, actually). One remedy is to run the agent in your own sandbox, for example a virtual machine with very locked-down access and no access to sensitive data. This may be cumbersome for some workflows, but for others it may be a good security measure.

Also, some have warned that an agent can introduce dangerous security bugs in code. A remedy for this is to manually review every piece of code that the agent produces. This introduces some added developer overhead, though in my experience it is still much faster than writing the same code without the agent. And it is much better than just pushing a button to generate a big blob of incomprehensible code.

Coding agents have greatly improved over the last several months. Software development practices are presently passing through a point of no return, permanently changed by the new AI-enabled coding assistants. Even very bright people, who are already extremely skilled, are benefitting from using these tools.

GPT-5 for AI-assisted discovery

Many hope that AI will be “smart enough” to make breakthrough scientific discoveries in the near future, such as finding a cure for cancer. Some research efforts have sought to create an “AI Scientist” that can make discoveries in an automated or semi-automated way; for a recent example, see [1]. Others [2] have called out the methodological pitfalls of some of these attempts. Still others question whether a truly original discovery is even possible for an AI.

OpenAI released GPT-5 on August 7. Some thought it lackluster, falling short of expectations. Others, however, found its performance in some areas to be much advanced compared to its predecessor.

Two recent reports show the new model’s utility. Scott Aaronson published a paper last month [3], [4] in which “a key technical step in the proof of the main result came from AI.” Also, Terence Tao reported earlier this month [5] his use of ChatGPT to find a first counterexample to an unsolved mathematics problem.

I’m sure this resonates with the experience of other researchers using the tool. Recently, in the course of a discussion I had with ChatGPT, it came up with a new algorithmic recipe for something I was working on, based on ideas combined from several papers but apparently original in itself. That was a very simple case—but on a grander scale, connecting two ideas in a novel way can lead to a real breakthrough. For example, Faltings’ proof of the Mordell Conjecture in 1983 was based on recognizing a subtle internal connection among already existing theorems.

There is always the specter of concern that an idea “maybe was already in the training data.” It can be difficult to prove otherwise. But deep domain experts like Scott Aaronson and Terence Tao are well placed to judge whether an idea is truly an original, never-before-published result.

If past is prologue, we can hope for more powerful models in the future that can solve increasingly hard problems.

Notes

[1] Yixuan Weng, Minjun Zhu, Qiujie Xie, Qiyao Sun, Zhen Lin, Sifan Liu, Yue Zhang, “DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively.” https://arxiv.org/abs/2509.26603.

[2] Ziming Luo, Atoosa Kasirzadeh, Nihar B. Shah, “The More You Automate, the Less You See: Hidden Pitfalls of AI Scientist Systems.” https://arxiv.org/abs/2509.08713.

[3] “The QMA Singularity.” https://scottaaronson.blog/?p=9183

[4] Scott Aaronson, Freek Witteveen, “Limits to black-box amplification in QMA.” https://arxiv.org/abs/2509.21131.

[5] https://mathstodon.xyz/@tao/115306424727150237

 

An inverse problem for the quadratic equation

In elementary algebra we learn how to solve ax² + bx + c = 0 for roots r = r₁, r₂. But how about the inverse problem: given a real-valued r, find integer coefficients a, b and c such that r is a root of the corresponding quadratic equation.

A simple approximation can be found by setting b = 0, so that r² = −c/a. By choosing appropriately large values of a and c, one can approximate the solution to this inverse problem to arbitrary accuracy.
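As a concrete illustration of the b = 0 trick, one can take a rational approximation of r² with a bounded denominator; the denominator supplies a and the negated numerator supplies c. A minimal sketch in Python:

```python
from fractions import Fraction
import math

r = math.pi                                      # example real number
frac = Fraction(r * r).limit_denominator(10**6)  # p/q ~= r^2 with q bounded
a, b, c = frac.denominator, 0, -frac.numerator   # so a*r^2 + b*r + c ~= 0

print(a, b, c)
print(a * r * r + b * r + c)   # residual: tiny, but generally nonzero
```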

Are there any other solutions (possibly taking advantage of b for a better approximation)?

Two families of algorithms are available: PSLQ [1-3] and LLL [4-5]. The problem formulation underlying these algorithms can be understood geometrically. For fixed x, L(a, b, c) = ax² + bx + c is a linear function of (a, b, c), here considering a, b and c as real-valued. Thus, for such x, the equation L(a, b, c) = 0 defines a known plane in 3-D space. However, we seek a, b, c that are integers. The problem then is to find a lattice point (a, b, c), with a, b, c integers, that is “near” this plane (for example, within a tolerance τ, meaning that a thin “slab” of thickness 2τ surrounding the plane contains a lattice point). We would also prefer that the magnitudes of a, b and c be “small” (e.g., bounded by a constant H).

Given this formulation, the PSLQ algorithm attempts to find an exact solution, and gives up if it can’t find one. The LLL algorithm on the other hand finds approximate solutions within a tolerance.

Python has the mpmath package, which provides mpmath.pslq for the PSLQ algorithm. For LLL reduction, Python packages such as SymPy and fpylll provide implementations.
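Here is a minimal sketch using mpmath.pslq for the quadratic case: ask PSLQ for an integer relation among r², r, and 1, which is exactly an integer triple (a, b, c) with a·r² + b·r + c ≈ 0. (This is my own illustration, not my colleague’s code.)

```python
from mpmath import mp, mpf, pslq, sqrt

mp.dps = 30                       # working precision (decimal digits)
r = 1 + sqrt(2)                   # example target: a root of x^2 - 2x - 1 = 0

rel = pslq([r**2, r, mpf(1)], maxcoeff=10**6)
if rel is None:
    print("no relation found within the coefficient bound")
else:
    a, b, c = rel
    print(a, b, c)                # expect a multiple of (1, -2, -1)
    print(a * r**2 + b * r + c)   # essentially zero at working precision
```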

A colleague of mine was previously submitting overnight runs to compute solutions by brute-force search. More recently, using Python code for the LLL algorithm, he is able to solve a large problem in seconds, with answers that exactly reproduce the brute-force solution.

References

[1] Ferguson, Helaman R. P.; Bailey, David H. (1992). A Polynomial Time, Numerically Stable Integer Relation Algorithm. RNR Technical Report RNR-91-032, NASA Ames Research Center. URL: https://www.davidhbailey.com/dhbpapers/pslq.pdf

[2] Ferguson, Helaman R. P.; Bailey, David H.; Arno, Steve (1999). Analysis of PSLQ, an integer relation finding algorithm. Mathematics of Computation 68(225): 351–369. URL (AMS PDF): https://www.ams.org/mcom/1999-68-225/S0025-5718-99-00995-3/S0025-5718-99-00995-3.pdf

[3] Bailey, David H. (2020). PSLQ: An Algorithm to Discover Integer Relations. (Expository overview.) URL: https://www.davidhbailey.com/dhbpapers/pslq-comp-alg.pdf

[4] Lenstra, Arjen K.; Lenstra, Hendrik W., Jr.; Lovász, László (1982). Factoring Polynomials with Rational Coefficients. Mathematische Annalen 261: 515–534. DOI: https://doi.org/10.1007/BF01457454 (Springer page: https://link.springer.com/article/10.1007/BF01457454; cf. https://www.math.ucdavis.edu/~deloera/MISC/LA-BIBLIO/trunk/Lovasz/LovaszLenstraLenstrafactor.pdf)

[5] Nguyen, Phong Q.; Vallée, Brigitte (eds.) (2010). The LLL Algorithm: Survey and Applications. Springer. URL: https://link.springer.com/book/10.1007/978-3-642-02295-1

Learning languages with the help of algorithms

Suppose you’re learning a new language and want to boost your vocabulary in a very time-efficient way.

People learn languages in many different ways. Suppose you wanted to improve your vocabulary by reading books in that language. To get the most impact, you’d like to pick books that cover as many common words in the language as possible.

Here is a formalization. Suppose that from a large set of m books, each of average length n words, you want to pick the one book with the highest vocabulary impact. The vocabulary impact of a book is measured by a weighted sum over the vocabulary words appearing in the book, each word weighted by how common it is in the language, as measured by its frequency across all the books; this essentially gives a probability weight for each word in the language.

That’s an easy problem to solve. First, optionally filter out stop words like (in the case of English) “the” and “and”, which are considered in some sense to carry “not much meaning.” Second, build a unique word list across all books, along with occurrence counts for each word. Finally, for each book, evaluate its coverage of those words, computing a score as described above.
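Here is a minimal sketch of that computation in Python. The inputs are hypothetical: books maps a title to its list of words, and stop_words is an optional filter set.

```python
from collections import Counter

def best_book(books: dict[str, list[str]],
              stop_words: frozenset[str] = frozenset()) -> str:
    # Corpus-wide word counts provide the probability weight of each word.
    counts = Counter(w for words in books.values() for w in words if w not in stop_words)
    total = sum(counts.values())

    def score(words: list[str]) -> float:
        # Each distinct word in a book contributes its corpus probability once.
        return sum(counts[w] / total for w in set(words) if w not in stop_words)

    return max(books, key=lambda title: score(books[title]))
```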

Building the unique word list and counts can be done by hashing in time that is typically linear in mn (worst case mn log(mn)). Computing each book’s score also takes linear time on average. So the entire process can be done in roughly linear time.

What if you want to find the best two books to read, the pair with the best joint vocabulary coverage? For the general case, the best known exact algorithms are quadratic rather than linear.

How about the best k books for arbitrary k > 0?

This is an NP-hard problem: for the best known exact algorithms, the compute time in the general case grows exponentially with k, the number of books in your desired reading set.

So you cannot expect to solve this problem exactly for large k. All hope is not lost, however. This is a maximum weighted coverage problem, which sits in a subset of the NP-hard problems known as submodular problems. Because of this, approximation algorithms are known that guarantee a solution within a known factor of the true optimum (for the standard greedy algorithm, a factor of 1 − 1/e ≈ 0.63; see [1-5]).

These algorithms are described in [6], [7], [8]. The basic idea is greedy: add one high-impact book at a time to the running set of selected books. The result is not guaranteed to be the best possible book list, but it is provably reasonable.
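Here is a minimal sketch of that greedy rule in pure Python (an illustration of the idea, not the submodlib API). books maps a book id to its set of non-stop words, and freq maps each word to its corpus frequency weight; both are hypothetical inputs.

```python
def greedy_book_selection(books: dict[str, set[str]],
                          freq: dict[str, float], k: int) -> list[str]:
    chosen: list[str] = []
    covered: set[str] = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for book, words in books.items():
            if book in chosen:
                continue
            # Marginal gain: weight of words this book adds beyond current coverage.
            gain = sum(freq.get(w, 0.0) for w in words - covered)
            if gain > best_gain:
                best, best_gain = book, gain
        if best is None:          # no remaining book adds new coverage
            break
        chosen.append(best)
        covered |= books[best]
    return chosen
```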

The helpful Python submodlib package runs very fast on this problem.

One can improve the quality of the result by spending more computation. First, you could use a blocking strategy: at each step, compute exactly the best two books to add, or three, and so forth (the computational complexity becomes quadratic, cubic, and so on). Similarly, one could use a look-ahead strategy: add the two books that are jointly best by an exact computation, then drop the second and find the best second and third books, and so forth. In general these do not improve on the submodularity bound; in practice, however, the result is more accurate.

One can also use various heuristics to improve performance in some cases. For example, if a book has little or no vocabulary that is not present in the other books, it can sometimes be safely discarded. However, in general the exact case remains hard (unless P=NP).

References

[1] Abhimanyu Das and David Kempe. Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection. Proceedings of ICML, 2011.

[2] Abhimanyu Das and David Kempe. Approximate Submodularity and its Applications: Subset Selection, Sparse Approximation and Dictionary Selection.
Journal of Machine Learning Research (JMLR), 19(3):1–35, 2018.

[3] M. Conforti and G. Cornuéjols. Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds. Discrete Applied Mathematics, 7(3):251–274, 1984.

[4] Maxim Sviridenko, Jan Vondrák, and Justin Ward. Optimal approximation for submodular and supermodular optimization with bounded curvature. Proceedings of SODA, 2013.

[5] Rishabh Iyer, Stefanie Jegelka, and Jeff Bilmes. Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions. 2013.
https://arxiv.org/abs/1311.2110

[6] Nemhauser, George L., Laurence A. Wolsey, and Marshall L. Fisher. “An analysis of approximations for maximizing submodular set functions—I.” Mathematical programming 14, no. 1 (1978): 265-294.

[7] “Maximum coverage problem,” https://en.wikipedia.org/wiki/Maximum_coverage_problem.

[8] Wei, Kai, Rishabh Iyer, and Jeff Bilmes. “Fast multi-stage submodular maximization.” In International conference on machine learning, pp. 1494-1502. PMLR, 2014.

 

Why are CUDA kernels hard to optimize?

Explosive datacenter demand has caused developers to leave no stone unturned in search of higher efficiencies. The DeepSeek team, not satisfied with Nvidia’s CUDA libraries, used a virtualized form of assembly language (PTX) to write kernel codes to accelerate their AI computations. Others have attempted to generate optimized kernels using AI, though some results have been questioned (for various attempts, see also here, here, here, here and here).

Why is it hard to write peak-speed GPU code? Writing really fast code has always been arduous, but it seems especially so for modern GPUs.

To understand the issues, my colleagues and I performed a detailed study of GPU kernel performance across eight different GPU models from three GPU vendors [1]. The test case we considered was low-precision matrix multiply, a resource-intensive operation for LLM training. We ran many, many experiments to understand what causes performance variability and why kernels sometimes run slower than you’d think they should.

For the cases we studied, we found about half a dozen different factors, but the upshot is this: modern processors like GPUs have become so complex—notably their multi-layered hierarchical memory subsystems—that it is difficult to get consistently high performance across all the problem sizes a user might want to run in practice. As a result, the performance for the target problem might be surprisingly and mysteriously lower than the advertised peak performance for the operation in question. The reasons might be obvious—like cache line misalignment—or more opaque. For the matrix multiply case, issues like prefetching, caching, tiling and block size selection make it difficult for the kernel developer to optimize for every input size a user might specify.

Below is an example graphic from our paper. The color indicates the floating-point operation rate (FLOP/s) for a reduced-precision matrix multiply on a representative GPU using a library call. The horizontal and vertical axes refer to the matrix dimensions of the problem (see the paper for details). Though some regions show performance near the theoretical peak (red), other immediately adjacent regions show problem sizes that run dramatically slower—in fact, at only about half of peak performance, or less. Presumably this is because either individual kernel performance or the library’s selection of kernels is suboptimal. The net outcome is that if your problem lands in a “bad” region, you’re in for a big surprise: your performance will be much lower than expected, and you may not understand why. All high-performing GPUs we tested showed irregular behavior such as this [2] [3].
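To give a flavor of how such a sweep is produced, here is a minimal sketch that times a few nearby problem sizes and reports achieved throughput. It assumes PyTorch and a CUDA GPU, and it is only an illustration, not the benchmark harness from our paper.

```python
import time
import torch

def matmul_tflops(m: int, n: int, k: int, iters: int = 20) -> float:
    a = torch.randn(m, k, device="cuda", dtype=torch.float16)
    b = torch.randn(k, n, device="cuda", dtype=torch.float16)
    torch.matmul(a, b)                       # warm-up (triggers kernel selection)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    elapsed = (time.perf_counter() - t0) / iters
    return 2 * m * n * k / elapsed / 1e12    # 2*m*n*k floating point ops per matmul

for size in (4095, 4096, 4097, 4104, 5000):  # nearby sizes can differ sharply in rate
    print(size, round(matmul_tflops(size, size, size), 1), "TFLOP/s")
```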

In the past this was not always a problem. Older architectures like the Sun SPARC or Cray vector processors, complex as they were, were simple enough that a reasonably well-tuned computational kernel might run well across most if not all inputs [4]. Today, performance is much harder to predict and can vary substantially with the requested problem sizes.

This is a tough challenge for library developers. Whenever a new GPU model family comes out, new kernel optimization and tuning are required to give (hopefully) more consistently high performance, and some cases get more developer attention than others due to customer needs and limited developer resources. As a result, infrequently used operations do not get as much attention, but they may be the exact ones you need for your particular case [5].

Tools are available to help optimize for specific cases. The excellent Nvidia CUTLASS library exposes many more fine-grained options than the standard cuBLAS library. Those not faint of heart can try programming Nvidia GPUs at the level of PTX, or (shudder) SASS. Superoptimization might help, but only for very small code fragments, and even then there may be too many external factors influencing performance for it to be effective.

Autotuning is a promising approach, though it doesn’t seem to have reached its full potential in production. AI might really help here [6]; in our own paper we had some success using machine learning methods like decision trees and random forests to model performance as a function of problem size, though our work was exploratory and not production-ready. Making a well-crafted general solution would seem to require a lot of effort to do right. Code sustainability and maintenance are also critical; a sustainable workflow would be needed to retrain on new GPUs, new CUDA releases, and even site-specific and system-specific settings like GPU power and frequency cap policies.
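As an illustration of the kind of modeling involved (my own sketch, not the code from our paper), one could fit a random forest to a table of measured rates and use it to predict performance at unseen sizes. Here measurements.csv, with columns m, n, k, tflops, is a hypothetical input file.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical measurements: one row per benchmarked matmul size.
df = pd.read_csv("measurements.csv")          # columns: m, n, k, tflops
X, y = df[["m", "n", "k"]], df["tflops"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out sizes:", model.score(X_test, y_test))

# A dispatcher could then predict the rate of each candidate kernel or
# configuration for a requested size and pick the most promising one.
```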

Most recent AI-driven work focuses on optimizing performance for only one or a few problem sizes. A truly production-quality, general-purpose tool would give both 100% accurate results and top achievable performance for any input problem size (even corner cases) and data type. This would require both optimized GPU kernels and an optimal dispatcher for kernel selection. The method would also need to be robust to issues like power and frequency variability in production runs. This currently appears to be an unsolved problem; solving it would be of huge benefit to the hyperscaler community.

Notes

[1] For related work from a slightly different angle, see this excellent work from Matt Sinclair’s lab.

[2] It turned out this study was helpful to us for production runs, to help us to triage an odd performance conundrum we encountered when attempting an exascale run (see here, here).

[3] Incidentally this example shows the hazards of simplistic benchmark suites to measure GPU code performance. Unless the benchmark captures a truly large and varied set of input cases, any new optimization method proposed can artificially “overfit” performance on the tests and still underperform miserably on many user cases of interest.

[4] I once wrote a 1-D wavelet convolution kernel for a SPARC processor, using a circular register buffer and loop unrolling to minimize loads and stores, thus achieving near-peak performance. The code was correctly compiled from C to assembly, and performance for a given problem was almost precisely predictable. That was before the days of complex memory hierarchies.

[5] One vendor I know of used to take customer requests for hand tuning expensive library calls and made them run fast at the specific customer problem sizes.

[6] LLM kernel generation seems like a natural fit, particularly since LLM-generated code quality has much improved in recent months. Kernel selection and parameter selection for block size, tiling etc. might be better solved by direct training of machine learning models, or methods like this. Comparative studies on this would be informative.