The Inversion: When Compute Learns to Fit the Problem
Dependency inversion can mean something different, and much bigger.
For seventy years, computing has run in one direction. You have an idea, and before you can act on it, you bend that idea to fit the machine. You punch the cards and load them into an IBM System/360. You rewrite your model to fit the accelerator’s SDK. You reshape your experiment to fit the cluster’s scheduler. You pick the chip not because it’s best for your problem but because it’s the one your software already knows how to talk to, and in the vast majority of cases that’s whatever you got from a vendor. The idea adapts to the layers of compute infrastructure. Always.
I think that now, with advances in both semiconductor technology and the ability to produce purpose-fit chips quickly, the arrow points the wrong way. And I think flipping it changes what computing is for.
Imagine a system whose whole thesis fits in one sentence:
You declare what you want, and the stack materializes to fit it, and then dissolves when the work is done.
No standing infrastructure to bend toward. No toolchain whose limits become your limits. The machine fits the idea and the purpose of the computation.
We call the mechanism inversion, and there are currently three inversions. Let’s call this system Forge for now.
Three things that normally bind early, made to bind late
A binding is a decision the system makes about how your work will run. Conventionally, these decisions are made early and frozen: the runtime is fixed before you arrive, the backend is hand-written months in advance, the team of tools is assembled by whoever set up the environment, the chip is what you get from the cloud. Forge takes each of these frozen decisions and makes it late, declarative, and synthesized on demand.
The runtime fits the app. Today, moving a workload from your laptop to a cluster to a cloud GPU means rewriting a huge amount of glue (the networking, the storage, the orchestration) every time, for every move. With Runtime Dependency Inversion, you state your application’s goals (its latency budget, its data, its constraints) and the runtime is assembled to satisfy them. The same code that ran on your laptop runs on a thousand nodes without you touching it. The runtime adapts to the application instead of the application contorting to fit the runtime.
The backend fits the silicon. This is the hard one, and the most consequential. A new accelerator, be it an NPU, a RISC-V machine-learning chip, or an in-memory dataflow fabric, arrives in the world able to run almost nothing well, because production performance lives in hand-tuned kernels and accumulated edge-case tuning that take scarce engineers years to write. Hardware Definition Inversion describes the chip in a declarative language and synthesizes the execution backend to fit it. This is not automatic synthesis. It is more about picking and connecting chips with the right capabilities into a unified computational entity. The chip stops being a fixed constraint the software has to bend around and becomes a target specification the compiler bends toward. A piece of silicon can be useful on the day it powers on.
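To make "describing the chip in a declarative language" concrete, here is a minimal sketch that assumes a chip description is just a capability record checked against what a kernel needs. Every name in it (`ChipSpec`, `OpRequirement`, `supports`, and the fields) is hypothetical and chosen for illustration; it is not Forge's actual description language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipSpec:
    """Hypothetical declarative description of an accelerator."""
    name: str
    dtypes: frozenset   # numeric formats the chip executes natively
    sram_kib: int       # on-chip scratchpad size, in KiB


@dataclass(frozen=True)
class OpRequirement:
    """Hypothetical needs of one kernel, e.g. a tiled matmul."""
    dtype: str
    tile_kib: int       # working set that must fit on chip


def supports(chip: ChipSpec, op: OpRequirement) -> bool:
    """True if the chip's declared capabilities cover the op's needs."""
    return op.dtype in chip.dtypes and op.tile_kib <= chip.sram_kib


# Illustrative values only; not a real device.
npu = ChipSpec("example-npu", frozenset({"int8", "fp16"}), sram_kib=512)
matmul_fp16 = OpRequirement(dtype="fp16", tile_kib=256)
matmul_fp32 = OpRequirement(dtype="fp32", tile_kib=256)
```

Under these assumptions, `supports(npu, matmul_fp16)` holds while `supports(npu, matmul_fp32)` does not, which is the kind of check a selector could run before picking or connecting chips for a workload.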
The team fits the problem. Real research isn’t one model; it’s a complex workflow: sieve and scout the literature, propose an experiment, run it, check the result, decide what’s next. Agent Instantiation Inversion assembles a team of specialized AI-powered agents for the specific problem, lets them collaborate over shared context inside isolated enclaves, and dissolves the team when the job is done. Expertise summoned for a task, not a permanent installation you maintain forever.
None of these is interesting alone. A runtime that adapts is a nicer cluster manager. A compiler that targets new chips is a nicer compiler. Agents that assemble themselves are a nicer AI harness framework. The point is all three under one roof, joined: intent in, materialized stack out, dissolved when finished. That composition is a new kind of object, and it makes new things possible.
The world on the other side of the arrow
Here is what changes when the machine fits the idea.
Silicon competes on merit again. The reason one chip company has dominated AI is not only its hardware; it’s the decade of software that surrounds it, the kernels and libraries no competitor can match. That moat is software, and inversion dissolves it. When any chip can be described as a uniform capability list and have a competitive backend synthesized for it, a startup’s accelerator and an incumbent’s can be judged on what they actually do well for the particular task, be it tokens per watt, latency, cost, or throughput, rather than on whose ecosystem is older. A more honest market for compute is good for everyone who buys it, which is eventually everyone.
A laptop becomes a command surface for the world’s compute. If you no longer hand-port your work to each machine, the distinction between “my laptop” and “a planetary fabric of GPUs, TPUs, NPUs, FPGAs, and HPC clusters” stops being something you manage and becomes something you address. You declare an experiment; it runs wherever it should run, whether that is near the data, on the cheapest qualified silicon, on the highest-throughput hardware, or inside the right jurisdiction, and the results come back. The researcher thinks about the problem and the science. The fabric thinks about the placement.
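Placement of this kind can be pictured as "filter by hard constraints, then minimize cost". The sketch below assumes exactly that and nothing more; `Site`, `place`, and the fields are hypothetical names, and a real fabric would weigh many more factors (throughput, queueing, data movement).

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Site:
    """Hypothetical place where work could run."""
    name: str
    region: str            # jurisdiction the hardware sits in
    cost_per_hour: float
    has_dataset: bool      # True if the data is already local


def place(sites: Iterable[Site], allowed_regions: Set[str],
          need_local_data: bool) -> Optional[Site]:
    """Pick the cheapest site that satisfies every hard constraint."""
    qualified = [
        s for s in sites
        if s.region in allowed_regions
        and (s.has_dataset or not need_local_data)
    ]
    # default=None covers the case where no site qualifies.
    return min(qualified, key=lambda s: s.cost_per_hour, default=None)


# Illustrative inventory only.
sites = [
    Site("gpu-east", "us", 4.0, has_dataset=False),
    Site("npu-berlin", "eu", 2.5, has_dataset=True),
    Site("tpu-paris", "eu", 3.0, has_dataset=True),
    Site("fpga-lab", "eu", 1.0, has_dataset=False),
]
choice = place(sites, allowed_regions={"eu"}, need_local_data=True)
```

Here the cheapest site overall (`fpga-lab`) is rejected because it lacks the data, so `choice` is `npu-berlin`. The point of the sketch is that the researcher states constraints and the selection happens elsewhere.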
Compute becomes an act, not an installation. The most quietly radical word in our thesis is dissolves. Today, infrastructure is a thing you stand up and then own: you patch, secure, pay for, babysit, migrate, and update it long after the work that justified it. When the stack materializes for a task and tears itself down afterward, compute becomes ephemeral and momentary: it exists while you need it and not a second longer. What persists isn’t the apparatus. It’s the result, and the record of how it was produced. And that artifact matters far more, because it can be reproduced and doesn’t depend on a physical installation.
Knowledge compounds instead of evaporating. Every backend Forge synthesizes is verified and kept in a content-addressed cache as a long-lived artifact. The first time someone runs a given workload on a given chip, it’s synthesized and checked; every time after, for anyone, it’s served instantly. The system gets faster and cheaper the more it is used, not only for one team, but across all of them. We’re used to compute being something you consume. This is compute that accumulates.
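A content-addressed cache keys each artifact by a hash of the inputs that determine it. The sketch below assumes the key is a SHA-256 over a canonical JSON encoding of the workload and chip descriptions; `cache_key`, `BackendCache`, and `get_or_synthesize` are hypothetical names, and the verification step is left to the caller-supplied `synthesize` function.

```python
import hashlib
import json


def cache_key(workload: dict, chip: dict) -> str:
    """Hash a canonical encoding so equal inputs yield equal keys."""
    payload = json.dumps({"workload": workload, "chip": chip}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class BackendCache:
    """Hypothetical in-memory stand-in for a shared artifact store."""

    def __init__(self):
        self._store = {}
        self.synth_count = 0   # how many times we actually synthesized

    def get_or_synthesize(self, workload: dict, chip: dict, synthesize):
        key = cache_key(workload, chip)
        if key not in self._store:
            # First request for this (workload, chip) pair: do the work.
            self._store[key] = synthesize(workload, chip)
            self.synth_count += 1
        return self._store[key]


cache = BackendCache()
fake_synth = lambda w, c: f"backend({w['op']} on {c['name']})"
first = cache.get_or_synthesize({"op": "matmul"}, {"name": "example-npu"}, fake_synth)
second = cache.get_or_synthesize({"op": "matmul"}, {"name": "example-npu"}, fake_synth)
```

After both calls, `cache.synth_count` is 1: the second request is served from the store. Sorting keys before hashing is what makes the address depend on content rather than on how the description happened to be written.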
Science you can trust and replay. Inversion lets us draw a hard line between the part of the system that decides and the part that runs. Agents can propose and create an exploration domain; nothing reaches a real pipeline without passing a correctness gate enforced by a more symbolic system, and the execution itself is deterministic and recorded in an audit ledger. Run the same intent twice, get the same result, with provable lineage of how it was produced. For drug discovery, materials science, and any regulated field, reproducibility isn’t a nicety; it’s the whole game. A world where computational results are reproducible by construction is a world where computational science is more believable.
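The split between proposing and running, with a gate in between and a ledger behind, can be sketched in a few lines. This is an illustration under stated assumptions: the ledger is a simple SHA-256 hash chain, the gate is any predicate, and `Ledger`, `entry_hash`, and `run_gated` are hypothetical names rather than Forge's design.

```python
import hashlib
import json


def entry_hash(prev_hash: str, record: dict) -> str:
    """Chain each record to the hash of the one before it."""
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class Ledger:
    """Hypothetical append-only audit ledger (hash chain)."""
    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []   # list of (record, hash) pairs

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        digest = entry_hash(prev, record)
        self.entries.append((record, digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited record breaks it."""
        prev = self.GENESIS
        for record, digest in self.entries:
            if entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


def run_gated(proposal: dict, gate, execute, ledger: Ledger):
    """Run a proposal only if the gate accepts it; log either way."""
    if not gate(proposal):
        ledger.append({"proposal": proposal, "status": "rejected"})
        return None
    result = execute(proposal)   # assumed deterministic
    ledger.append({"proposal": proposal, "status": "ran", "result": result})
    return result
```

If `execute` is deterministic, replaying the same proposals into a fresh ledger yields the same final hash, which gives a cheap way to compare two runs. The chain only detects tampering after the fact; it does not prevent it.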
Honest about the climb
We are describing a destination, and we’d be doing you a disservice to pretend it’s already here. Synthesizing a backend that beats years of hand-tuning on unfamiliar silicon is genuinely hard; we start where the alternative is no good backend at all, and we earn the harder cases over time. Describing a chip well enough to compile for it is the central technical risk of the whole endeavor, and we treat it as exactly that. The vision is a planetary fabric; the first chapter is narrower and provable: the fastest path from a new chip to a working workload, demonstrated by benchmark, not by manifesto.
But the direction is the point. For seventy years the idea has bent to the machine. We think the next era of computing belongs to whoever turns that around, who lets you state what you want and have the world’s compute arrange itself to deliver it.
Dissolve, and recombine. The machine, finally, fitting the idea.

It's an interesting post and an interesting area of discussion. To some extent this is how people see GPUs: you balance the size and number of CPUs and GPUs in your system to match your workload. We did a lot of work at Codeplay to enable people to write code once and have it adapt to different hardware, but the hardware designers were resistant. The same is true for other software developers working in the field. There are various reasons for this, but it restricts the way you can realistically map software to hardware. So instead, what actually happens in most of the semiconductor industry is what has previously been highly successful in signal processing: hardware designers optimize a hardware/software system for a specific set of workloads. What you're doing there is gaining efficiency at the cost of a longer time to market. That approach is highly successful in communications systems. But it's been a poor tradeoff in AI, where very short time to market is the top priority. It would be great to fix it, but it's mostly blocked by industry resistance: the software techniques to do it are well known and understood by many HPC software experts. These techniques are very unpopular in hardware design, though. So it's a people problem.