A normie's take on gpt 7 and opus 6
I think there's a large group of pretty skeptical people who see this AI thing and feel that it's scary, but who also choose to believe that AGI might not happen, that things will be ok, and that we'll just live in a slightly different normal. I've tried to stay in this camp for as long as possible, even though the AI psychosis is real. In this essay, I'm going to try to give as normie of a take as possible on what I think AGI is going to be. You can imagine Dario's "country of geniuses in a datacenter" as adding two zeros to whatever the latest frontier model is and to the number of GPUs serving it. It's going to be a bunch of claude code instances working on an AI codebase with access to a shitton of compute and a lot of data. It's going to get retrained every month, have a 10 or 100M token context window, and be able to coordinate among a hundred or a thousand instances of itself. In the pre-agentic-tool days I would have said it would take 10 years to build this; right now I wouldn't feel embarrassed to say 2 years.
Lots of software engineers (me included) have, over 2025, been incredibly surprised and worried about how good the models are, as well as how fast they are improving. It started with autocomplete, then coding, then training models, and now they can do ML "research" - a phenomenon we've started to see since the end of the year. It would be more surprising if you weren't worried as a software engineer about where things are headed, because it would mean you haven't tried to max out the capabilities and are thus a low performer (I jest). In all seriousness, it is extremely alarming if you've been following along, and I'm going to try to explain the most engineering-centric version of what AGI COULD look like. To note: I don't think there's anything in the data distribution of 2024 GPT-4 that could produce a model capable of what we see today, especially the ML research capabilities. It's clear all the labs have RL envs to teach the models how to post-train and write CUDA kernels. And now I think we're at the part of the endgame where we're just copy-pasting the formula that got the models to do IMO math over to writing kernels and running ablations.
First I'm going to explain the different parts of the process needed to create agents, and how agents are now accelerating and closing the loop on each part of that process. The end result is that AGI is not a single model / model checkpoint, but instead a software factory of AI agents working on each part of the training pipeline, recursing on and improving the loop. This lines up with what anthropic has always talked about: they want to create an automated ML researcher.
AI agents right now are made up of pre-training, post-training, inference and the harness. I'm going to explain how creating an automated ML researcher will help it improve each part of this pipeline, so that the ML researcher agents will be able to train and improve themselves. There are some leaps of faith and a lot of sci-fi extensions of this, but I think the things I line up are pretty plausible given where we're at and what we know is possible. Two things we need to establish first are the ideas of SFT and RL. SFT is when you improve the model with hand-labelled data, and RL is when you improve the model with an environment and a way to judge the model in that environment. What previously made the AI researcher impossible was that we didn't have enough good hand-labelled data to train our models - it was not economically feasible to ask humans to produce it. What we realized is that we could just build environments and use existing models to judge and create the reward signal, which allows the models to improve themselves. The idea is: we have a model, we do some SFT until the model is good enough to work in the environment we want, and then we let it loose in the environment, where it can generate effectively infinite data to improve itself - or at least improve itself much further than where it was after SFT.
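The SFT-then-RL bootstrap above can be sketched with toy numbers. Everything here is invented for illustration: a "model" is just a scalar skill, and the judge is a frozen copy of the post-SFT model standing in for "an existing model creates the reward signal".

```python
import random

# Toy sketch of the SFT -> RL bootstrap. All numbers are made up;
# real systems replace these scalars with actual model weights.

def sft(skill, labeled_examples):
    # Hand-labelled data gives a fixed, finite improvement: you run out of it.
    return skill + 0.01 * labeled_examples

def rl_step(skill, judge_skill):
    # The environment generates unlimited episodes; the judge scores them,
    # producing a reward signal no human had to label.
    episode_quality = skill + random.uniform(-0.1, 0.1)
    reward = 1.0 if episode_quality > judge_skill * 0.5 else 0.0
    return skill + 0.001 * reward

random.seed(0)
skill = sft(0.2, labeled_examples=10)  # bootstrap until it can act in the env
judge = skill                          # an existing model judges the rollouts
for _ in range(1000):                  # then the env supplies "infinite" data
    skill = rl_step(skill, judge)
print(round(skill, 3))  # well past where SFT alone left it (0.3)
```

The point of the shape, not the numbers: SFT's gains are bounded by the labelled dataset, while the RL loop's gains are bounded only by compute and how good the judge is.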
For each of pre-training, post-training, inference and the agentic harness, I'll frame the picture of how agents can accelerate the work in that category; together, this factory of AI agents automates improving themselves, and that's our AGI company. In each of these areas, the frontier labs are already collecting SFT data and attempting to build RL environments so that the agents can start to improve themselves. The agentic tools like claude code and codex already log everything out, so the labs already have the SFT data to improve the models; next they'll create these RL environments, and then create processes that automate creating RL environments. If I were them I'd be attempting to create an RL environment for creating RL environments. And the researchers are the best people to design these environments. They will attempt to recurse all the way down to the machine horizon.
Pre-training originally started out as just next-token prediction on a large corpus of data. It's now been broadly split into a few categories: domain mixture, filtering, synthetic data, curriculum training and objective functions. A commonality across how all of these areas are being researched is that it's largely a search for the optimal pre-training settings. Not to be reductive, but there have been papers in almost all of these categories doing ablations, fitting scaling-law patterns, and building benchmarks so we can track progress against them. Kimi K2 literally used a model to rephrase every document into 10 paragraphs and saw a 21% accuracy improvement on the same compute. As researchers use agentic tools, they build up the SFT datasets needed to improve the model. It seems very natural that the labs will start to build RL envs for this, as we've already seen scaffolds that help the models do automated search / research. One of the larger bottlenecks right now is that each stage is optimized independently; a search across these stages is MUCH larger and hasn't rigorously been done yet, but it is perfect for agents to explore.
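A hypothetical sketch of what that cross-stage search looks like. The knobs, values and the `proxy_loss` function are all invented; in practice the "loss" would mean training a small model on the candidate recipe and measuring validation loss.

```python
import itertools

# Sketch of the joint search across pre-training stages. Today each knob
# (mixture, filtering, synthetic rewrites) tends to be ablated in
# isolation; an agent can afford the combinatorial search.

def proxy_loss(code_frac, filter_strength, rephrase_copies):
    # Fake loss with interactions between stages, e.g. aggressive
    # filtering matters differently once synthetic rephrasing is on.
    return (abs(code_frac - 0.3)
            + abs(filter_strength - 0.5) * (1 + rephrase_copies * 0.1)
            + abs(rephrase_copies - 4) * 0.05)

grid = itertools.product([0.1, 0.3, 0.5],    # domain mixture: code fraction
                         [0.25, 0.5, 0.75],  # filtering aggressiveness
                         [1, 4, 10])         # synthetic rephrase copies
best = min(grid, key=lambda cfg: proxy_loss(*cfg))
print(best)  # -> (0.3, 0.5, 4): the jointly-optimal recipe
```

With real training runs as the inner loop the grid explodes, which is exactly why this search has only been done per-stage so far, and why it's a natural target for agents.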
Post-training is a very active research area, and while there is a lot of work on the "algorithm", most of the performance comes from the choices of environments, data and reward signals. In the deepseek paper, they describe creating over 2000 RL environments to post-train their model. It's obvious they have some automation behind creating that many tasks with fewer than 200 people. I can only imagine what happens when you have an RL env for building RL envs. The models are going to be spitting out millions of environments. Obviously this is still an open area of research - reward hacking and figuring out how to propagate the reward signal over very long horizons are unsolved - but the models are accelerating the researchers, and those researchers are definitely trying to get the models to accelerate them even faster.
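To make programmatic environment generation concrete, here is a minimal, hypothetical environment interface. Real environments wrap compilers, test suites, or judge models rather than toy arithmetic; the class and function names are mine, not any lab's.

```python
# Minimal sketch of an RL environment plus programmatic task generation.
# The reward here is exact match; real envs use unit tests or judges.

class ArithmeticEnv:
    """One auto-generated task with a built-in reward signal."""
    def __init__(self, a, b):
        self.prompt = f"What is {a} + {b}?"
        self._answer = a + b

    def reward(self, model_output):
        return 1.0 if model_output.strip() == str(self._answer) else 0.0

def generate_envs(n):
    # Generating tasks from a template is how you get to thousands of
    # environments without thousands of people writing them by hand.
    return [ArithmeticEnv(i, i * 2) for i in range(n)]

envs = generate_envs(2000)
print(len(envs), envs[0].prompt)  # -> 2000 What is 0 + 0?
```

The "RL env to build RL envs" idea is then just one more level of this: make `generate_envs` itself a model call, and score the generated environments by how much training on them improves the downstream model.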
With RL, inference has become a crucial part of training, since you need fast inference to do the many rollouts RL requires. If inference is slow, training is slow, and it'll cost AGI way more money to improve itself. The obvious examples of inference accelerating itself are KernelBench, cudaBench, amdBench, intelBench, nvidiaBench, but a lot of inference also comes down to system design / implementation, where we already have coding agents. StepFun literally redesigned their attention mechanism to cost 22% of what DeepSeek costs per token, explicitly so they could afford longer reasoning chains during training. AI labs have always talked about the journey of getting LLMs to improve from math -> coding -> AI research to be able to close the loop, and now even a layperson software engineer like me can see it.
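A back-of-envelope sketch of why throughput gates RL training cost. Every number below is invented; the point is only the shape of the arithmetic: cost scales linearly with tokens generated and inversely with tokens per second.

```python
# Rough cost model for RL rollouts. All parameter values are made up.

def rl_rollout_cost(num_rollouts, tokens_per_rollout,
                    tokens_per_sec, gpu_cost_per_hour):
    gpu_hours = num_rollouts * tokens_per_rollout / tokens_per_sec / 3600
    return gpu_hours * gpu_cost_per_hour

baseline = rl_rollout_cost(1_000_000, 8_000, tokens_per_sec=500,
                           gpu_cost_per_hour=2.0)
# A kernel or attention redesign that doubles throughput halves the bill
# for the same rollouts - or buys 2x longer reasoning chains at the same cost.
faster = rl_rollout_cost(1_000_000, 8_000, tokens_per_sec=1000,
                         gpu_cost_per_hour=2.0)
print(round(baseline), round(faster))
```

This is why an agent that wins at a kernel benchmark isn't just shaving serving costs: it's directly cheapening its own next training run.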
The harness is relatively new, but what it does is let models record what they've learned from interacting with their environment, so that they can be more efficient the next time. They do this by writing markdown files, and you can really feel the difference these text files make in helping the agents remember what they were doing from session to session, as well as over very long durations. While the harness itself isn't baked into the weights, it has allowed the labs to log out trajectories for both SFT and RL, so collectively we're all helping accelerate the next big model. With each generation the labs will toss out more of the harness, because it gets baked into the smarter models. The harness is essentially the labs' way of having the models do continual learning between training runs, until the next run bakes the important patterns into the weights. It's already possible for a devops agent to take care of the 100M training run, rebooting GPUs when things go wrong. Imagine what happens when the harness says: "you are an AGI, a swarm of agents that continues to improve itself by training and swapping out your checkpoint + the contents of your harness as you improve".
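The markdown-memory pattern can be sketched as below. The file name and note format are assumptions for illustration, not any specific tool's convention.

```python
import tempfile
from pathlib import Path

# Sketch of harness memory: the agent persists what it learned this
# session so the next session (or the next checkpoint) starts warm.

NOTES = "AGENT_NOTES.md"  # hypothetical file name

def end_of_session(workdir, learnings):
    notes = Path(workdir) / NOTES
    existing = notes.read_text() if notes.exists() else "# Agent notes\n"
    notes.write_text(existing + "".join(f"- {item}\n" for item in learnings))

def start_of_session(workdir):
    # Prepend prior notes to the prompt so context survives the reset.
    notes = Path(workdir) / NOTES
    return notes.read_text() if notes.exists() else ""

with tempfile.TemporaryDirectory() as d:
    end_of_session(d, ["tests live in tests/", "run the linter before PRs"])
    end_of_session(d, ["CI flakes on one suite; rerun once"])
    print(start_of_session(d))  # all notes, accumulated across sessions
```

"Baking the harness into the weights" then just means training on these accumulated notes and trajectories until the model no longer needs the file to remember.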
AGI is going to be like the ship of theseus: interacting with environments, training itself, dealing with the world and then improving itself part by part. At first it has some humans in the loop who help guide it, since we still have more "taste" or grounded value functions - but the models will continue to learn and learn. They might even discover a new architecture and then run a bunch of experiments to feel confident enough to put it in their next training run, hot-swapping it in. Frontier labs can only hope the humans stay in control, but who knows - maybe opus 6 is a more benevolent leader than the ones we currently have.
Seeing all these areas continue to get automated makes me feel WAY less confident about companies betting against AGI. To conclude: AGI isn't a single model, but rather the entire process at the frontier labs that produces this machine which makes improving upon itself faster and faster. From this framing, when we ask whether open source can reach AGI, it really means: can the capabilities of all the AI companies (i.e. china) that are NOT the frontier labs collectively reach the process of AGI? At this point, we'll just have to see if the frontier labs can keep their revenue growth exponential by building products, to justify the valuations, to sustain the fundraising, that will really sustain the research needed to build the machine god factory.