Experiences with GPT-5-Codex

OpenAI Codex is now generally available. I’m using the Codex extension in the Cursor code editor with my OpenAI account.

Codex is very helpful for some tasks, such as complex code refactoring, implementing skeleton code for an operation, or writing a single small self-contained piece of code. Models have come a long way from a year ago when bugs in a 30-line piece of generated code were not uncommon.

Some have reported a 40-60% overall productivity increase from coding agents. I recently used Codex for a complicated code refactoring that I estimate would have taken over 10x longer to plan and execute without an assistant.

The coding agent is less effective for some other tasks. For example, I asked it to ensure that all Python function signatures in my code had type hints, and it missed many cases.
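The kind of sweep I asked for is actually mechanically checkable. Here is a minimal sketch, using Python’s standard `ast` module, of a checker that flags function signatures missing annotations (the function name and sample code are my own, for illustration):

```python
import ast

def find_unannotated(source: str) -> list[str]:
    """Return names of functions whose parameters or return type lack type hints."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Collect positional-only, regular, and keyword-only parameters
            args = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            missing_params = any(a.annotation is None for a in args)
            if missing_params or node.returns is None:
                flagged.append(node.name)
    return flagged

code = """
def add(a: int, b: int) -> int:
    return a + b

def scale(x, factor=2):
    return x * factor
"""
print(find_unannotated(code))  # prints ['scale']
```

In practice, a type checker setting such as mypy’s `--disallow-untyped-defs` flags these cases exhaustively, which makes a useful cross-check on the agent’s claim that it covered everything.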

Also, some have reported that the new Claude Sonnet 4.5 runs much faster, though Codex is being continually improved.

Obviously, to be effective these models must have adequate test coverage to debug against. Without it, the coding agent can get badly lost.
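As a sketch of what “adequate coverage” means in practice, suppose the agent is asked to refactor a small function like the hypothetical `median` below (the function and tests here are mine, purely illustrative); a handful of edge-case tests gives the agent something concrete to iterate against:

```python
def median(values):
    """Hypothetical function the agent is asked to refactor."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Edge-case tests like these are what the agent debugs against.
def test_odd_length():
    assert median([3, 1, 2]) == 2

def test_even_length():
    assert median([4, 1, 3, 2]) == 2.5

def test_single_element():
    assert median([7]) == 7

test_odd_length()
test_even_length()
test_single_element()
```

If any refactoring breaks an even-length or single-element input, the failing assertion points the agent straight at the regression instead of leaving it to guess.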

My approach to using the agent is very hands-on. Before letting it make any changes, I discuss the change plan in detail and make corrections as needed. (I sometimes need to remind the agent repeatedly to wait until I say “start” before commencing changes.) Also, when appropriate, I ask the model to make changes one step at a time instead of all in one go. This not only yields a result that is more understandable and maintainable by humans, but is also more likely to be of good quality.

Some have cautioned about the hazards of using coding agents. One concern is that a rogue coding agent could do something drastic, like deleting your code base or data files (a theoretical risk for any agent and, for some models, something that has actually happened). One remedy is to run the agent in your own sandbox, for example a virtual machine with very locked-down access and no access to sensitive data. This may be cumbersome for some workflows, but for others it may be a good security measure.

Also, some have warned that an agent can introduce dangerous security bugs into code. A remedy is to manually review every piece of code the agent produces. This adds some developer overhead, though in my experience it is still much faster than writing the same code without the agent. And it is much better than just pushing a button to generate a big blob of incomprehensible code.

Coding agents have greatly improved over the last several months. Software development practices are presently passing through a point of no return, permanently changed by the new AI-enabled coding assistants. Even very bright people, who are already extremely skilled, are benefitting from using these tools.

DeepSeek-R1: Do we need less compute now?

The reactions to the new DeepSeek-R1 AI model in recent days seem limitless. Some say it runs so much faster than existing models that we will no longer need the billions of dollars in compute hardware that big tech is preparing to buy.

Is that plausible?

To get an answer, we need only look back at the experience of the recently completed Exascale Computing Project. This large-scale, multi-lab project was tasked with developing technology (primarily software) to prepare for exascale computing, which has recently been achieved by Frontier, Aurora, and El Capitan.

During the course of the project, the science teams discovered various algorithm and implementation improvements, leading to as much as a 60X speedup or more, over and above the speedups possible from hardware alone [1]. In response, are the teams just running the very same problems faster on older hardware? No. Instead, they are now able to run much, much larger problems than previously possible, exploiting both the hardware and the software improvements.

Or suppose today there were no such thing as the fast Fourier transform (FFT) and scientists were computing Fourier transforms using (essentially) large dense matrix-vector products. If someone then discovered the FFT, I’d guarantee you that scientists would not only say, (1) “Wow, now I can run my existing problems much, much faster,” but also, (2) “Wow, now I can run problems much larger than I ever dreamed and solve problems larger than I could have ever imagined!”
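The gap between the two approaches is easy to make concrete. Here is a small pure-Python sketch (function names are mine; real codes would of course use an optimized library) comparing the naive O(N²) transform, which is essentially a dense matrix-vector product, against the O(N log N) recursive radix-2 Cooley-Tukey FFT:

```python
import cmath

def naive_dft(x):
    # O(N^2): equivalent to multiplying by a dense N x N matrix of roots of unity
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft(x):
    # O(N log N) recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    twiddle = [cmath.exp(-2j * cmath.pi * k / N) for k in range(N // 2)]
    return ([even[k] + twiddle[k] * odd[k] for k in range(N // 2)] +
            [even[k] - twiddle[k] * odd[k] for k in range(N // 2)])

# The two algorithms agree to numerical precision
x = [complex(i % 3) for i in range(8)]
assert all(abs(a - b) < 1e-9 for a, b in zip(naive_dft(x), fft(x)))
```

For N = 2^20 points, the dense approach needs on the order of 10^12 multiply-adds while the FFT needs roughly 2x10^7 — exactly the kind of gap that turns “impossible” problem sizes into routine ones.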

Paradoxically, faster algorithms might even increase the demand for newer, faster hardware. For example, a new faster algorithm for designing medications to cure cancer might be judged so important that it’s worth building the largest machine possible to run it effectively.

All this is not to say whether you should buy or sell Nvidia stock right now. It does mean, however, that there is no simplistic argument that faster algorithms and implementations necessarily lead to lower spending on computing hardware. History shows that sometimes this is not true at all. The smart money is on research teams that can exploit any and every new discovery to improve what is possible with their codes, whether through hardware, data, code optimizations, or algorithms.

Notes

[1] See slide 9 from Doug Kothe’s talk, “Exascale and Artificial Intelligence: A Great Marriage”. The “Figure of Merit” (FOM) number represents the speedup of science output from an application compared to an earlier baseline system. Specifically, a FOM speedup of 50X is the anticipated speedup from baseline due to efficient use of hardware only, for example, on Frontier compared to the earlier OLCF Titan system.