<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>ai on Akshay Katyal | MrDHat</title>
    <link>https://akshay.co/tags/ai/</link>
    <description>Recent content in ai on Akshay Katyal | MrDHat</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <lastBuildDate>Sat, 06 Jun 2026 20:42:30 +0100</lastBuildDate><atom:link href="https://akshay.co/tags/ai/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>We Accidentally Built a Second Codebase</title>
      <link>https://akshay.co/posts/accidental-second-codebase/</link>
      <pubDate>Sat, 06 Jun 2026 20:42:30 +0100</pubDate>
      
      <guid>https://akshay.co/posts/accidental-second-codebase/</guid>
      <description>A few weeks ago, I deleted 93 skills in a single pull request.
If you haven&amp;rsquo;t worked with them, a skill is a small bundle of instructions you hand to the model: a workflow you&amp;rsquo;ve written down once so it runs the same way every time, the kind of thing you&amp;rsquo;d reach for when you&amp;rsquo;re chasing down a flaky test or pulling together the context for an incident at 2 am.</description>
      <content>&lt;p&gt;A few weeks ago, I deleted 93 skills in a single pull request.&lt;/p&gt;
&lt;p&gt;If you haven&amp;rsquo;t worked with them, a skill is a small bundle of instructions you hand to the model: a workflow you&amp;rsquo;ve written down once so it runs the same way every time, the kind of thing you&amp;rsquo;d reach for when you&amp;rsquo;re chasing down a flaky test or pulling together the context for an incident at 2 am. Teams bundle them into plugins and publish those to a shared marketplace, and any engineer can install whichever plugins they want.&lt;/p&gt;
&lt;p&gt;I went looking because people had started pointing out the obvious problem: there was no quality control on any of it. Anyone could publish anything, and given enough time, anyone would. Nobody had actually decided that should be the rule; it just became the rule because nobody had decided otherwise. So I opened the list one afternoon expecting to find a few duplicates and tidy them up.&lt;/p&gt;
&lt;p&gt;My assumption going in was that most of them were load-bearing. Someone had written each one for a reason, someone depended on it, and pulling it would quietly break a workflow I&amp;rsquo;d never heard of. That assumption, more than anything, is what had let the list grow for so long. Everyone treated every skill as untouchable, me included, because you could never quite be sure who was relying on it.&lt;/p&gt;
&lt;p&gt;Reality, as always, was less dramatic. The 93 skills I ended up deleting all belonged to the same plugin (which had 96 skills in total), and when I checked, every one of them had been invoked exactly zero times in the previous 60 days. So I pulled them, opened a PR, and merged it.&lt;/p&gt;
&lt;p&gt;Nothing broke. I&amp;rsquo;d assumed at least one person would DM me about it, whoever had written one of the deleted skills, maybe, but nobody did. It was a little funny, to be honest. We&amp;rsquo;d been debating deleting these for a while. We kept not pressing the button, because we couldn&amp;rsquo;t tell whether anyone would care, or worse, whether it would put people off writing skills altogether, and we genuinely still want people writing skills. The answer turned out to be that nobody noticed at all.&lt;/p&gt;
&lt;p&gt;And that left me with a question I didn&amp;rsquo;t have a good answer to: how had we ended up maintaining dozens of things that nobody was using?&lt;/p&gt;
&lt;h2 id=&#34;the-one-that-made-it-click&#34;&gt;The one that made it click&lt;/h2&gt;
&lt;p&gt;The clearest example isn&amp;rsquo;t even one of the 93 I deleted. It&amp;rsquo;s one I left exactly where it was: a skill that explains to the model how to find its way around our codebase, where the important things live and how the layers fit together. It was a perfectly sensible thing to write, and I&amp;rsquo;d bet it saved people real time for a while.&lt;/p&gt;
&lt;p&gt;Then the codebase moved, the way codebases do, and the skill stayed where it was. Now it hands the model a map to a renovated building.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a worse problem buried under that one. Even on the day it was written, the map was only ever accurate for the corner of the codebase the author happened to work in. Anyone working somewhere else got directions that were confident, specific and wrong. The model is perfectly capable of opening the repo and figuring out the layout for itself, but instead of letting it do that, we sat it down and told it the way things are, incorrectly.&lt;/p&gt;
&lt;p&gt;I keep coming back to this one because of what it exposes. Nobody touched the skill. Nobody edited a mistake into it. It went bad while sitting perfectly still, because the thing it described kept moving while the skill itself stayed put. And honestly, the original sin was writing it down at all: we took something the model could work out on its own, froze one person&amp;rsquo;s snapshot of it, and signed ourselves up to maintain that snapshot forever. A map you have to keep redrawing is worse than letting the model read the territory.&lt;/p&gt;
&lt;p&gt;Someone should look at that skill hard and probably kill it, but it&amp;rsquo;s still sitting there doing its thing. The 93 I deleted were the easy case; nobody used them, so nothing pushed back when they vanished. The genuinely awkward ones are skills like this, the ones still in use, because being used is what keeps them safe from scrutiny and also what makes them a liability.&lt;/p&gt;
&lt;h2 id=&#34;a-different-kind-of-debt&#34;&gt;A different kind of debt&lt;/h2&gt;
&lt;p&gt;Normal technical debt piles up because the software keeps changing. You ship, you patch, you bolt another thing onto the side, and the accumulated weight of all those changes is the debt.&lt;/p&gt;
&lt;p&gt;A lot of what I was looking at worked the other way around. The skill itself never changed; everything around it did. The model would improve, so a workaround a skill had carefully spelt out wasn&amp;rsquo;t needed anymore. Or the tooling improved, and the manual steps it walked you through got handled elsewhere. Or the codebase moved, and the skill kept pointing to where things used to be. Or a team got reorganised out of existence, and its skills quietly outlived it.&lt;/p&gt;
&lt;p&gt;This is much harder to catch than the ordinary kind, because nothing in your diff history points at it. The file looks completely fine; &lt;code&gt;git blame&lt;/code&gt; tells you nothing useful, and the only way to actually find the rot is to know what&amp;rsquo;s changed &lt;em&gt;outside&lt;/em&gt; the file. Almost nobody is doing that against a list of skills they&amp;rsquo;ve half-forgotten they own.&lt;/p&gt;
&lt;p&gt;And it&amp;rsquo;s everywhere once you start looking. Roughly half the skills in our catalogue have a single commit to their name, written once and never touched again, and most have had only one author. Write a skill once, never open it again, and it just carries on describing a world that has quietly moved on without it.&lt;/p&gt;
&lt;h2 id=&#34;every-one-of-them-made-sense&#34;&gt;Every one of them made sense&lt;/h2&gt;
&lt;p&gt;There&amp;rsquo;s no villain in any of this, by the way. I wasn&amp;rsquo;t staring at a pile of bad decisions.&lt;/p&gt;
&lt;p&gt;The pattern will be familiar to anyone who&amp;rsquo;s shipped software. A team notices they keep doing the same dance over and over, so they write it down as a skill and stop doing it by hand. It works well enough that they share it. The next team sees that and does the same for their own workflow. Someone gets ambitious and wires a handful of them together into an orchestration that runs end to end, and someone else adds an investigation flow for the kind of problem their team runs into every other week.&lt;/p&gt;
&lt;p&gt;Every one of those moves is the right call at the moment it&amp;rsquo;s made. Zoom in on any single decision, and it&amp;rsquo;s completely rational.&lt;/p&gt;
&lt;p&gt;Then you look up one day and the shared marketplace has more than 600 skills, spread across nearly 80 plugins, any of which an engineer can install in a second. I deleted 93 and barely made a dent. Nobody set out to build this. It&amp;rsquo;s just what hundreds of small, reasonable, local decisions add up to.&lt;/p&gt;
&lt;p&gt;And none of this is new. We&amp;rsquo;ve all watched this story play out before; it&amp;rsquo;s just that this time, it&amp;rsquo;s .md files, not .rb or .py. Internal tools one person wrote on a Friday that three teams now can&amp;rsquo;t work without. Dashboards nobody can explain but everybody trusts. Microservices that made sense as a split at the time and now mostly just exist. CI jobs that have been green so long no one remembers what they actually check. Docs that were accurate two reorgs ago. Same failure mode as ever, just moved up a layer in the stack, and we show up with the same instincts that let it grow last time: keep it, don&amp;rsquo;t touch it, someone out there probably needs it.&lt;/p&gt;
&lt;h2 id=&#34;i-stopped-thinking-of-them-as-documentation&#34;&gt;I stopped thinking of them as documentation&lt;/h2&gt;
&lt;p&gt;For a long time, I&amp;rsquo;d filed skills under &amp;ldquo;documentation&amp;rdquo; in my head. Helpful text, the kind of thing you write once and forget about.&lt;/p&gt;
&lt;p&gt;That stopped feeling right, because they&amp;rsquo;d started behaving like a codebase. They change how the model acts depending on which ones are loaded, they interact with each other in ways nobody intended, and they lean on tools and on each other and on half-stated assumptions about the world they run in. And like any other code, they go stale and need looking after, and once in a while one of them needs deleting outright.&lt;/p&gt;
&lt;p&gt;Don&amp;rsquo;t get me wrong, I&amp;rsquo;m not trying to push the analogy too far. They don&amp;rsquo;t compile and the syntax doesn&amp;rsquo;t matter one bit. But the maintenance burden is real, and on that score they have far more in common with a codebase than with a page on the wiki. We&amp;rsquo;d quietly grown a second codebase, written mostly in English, and nobody was treating it like one.&lt;/p&gt;
&lt;p&gt;I wasn&amp;rsquo;t the only one landing there. Months before I deleted anything, an engineering leader had said much the same thing in a Slack thread I only came across later:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Our skills etc. are now our &amp;ldquo;code&amp;rdquo;. Have we discussed what a quality ensuring pipeline would look like here?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;most-skills-should-probably-die&#34;&gt;Most skills should probably die&lt;/h2&gt;
&lt;p&gt;If there&amp;rsquo;s one thing I have started to believe, it&amp;rsquo;s that most skills should eventually be deleted, and that this is a sign of health rather than failure.&lt;/p&gt;
&lt;p&gt;Think about what a skill usually is the day it gets written:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a workaround for something the model couldn&amp;rsquo;t do well yet&lt;/li&gt;
&lt;li&gt;a nudge, to push its behaviour in some direction you wanted&lt;/li&gt;
&lt;li&gt;a stand-in for a capability that didn&amp;rsquo;t exist at the time&lt;/li&gt;
&lt;li&gt;a bit of knowledge that, until then, had only lived in one person&amp;rsquo;s head&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now give it a year or two. The model gets better at the very thing some workaround was working around, the tooling quietly absorbs the manual steps, the product moves on, and the reasons a skill existed start expiring one by one without anyone noticing. By then the workaround isn&amp;rsquo;t needed, the capability has properly shipped, and whatever knowledge the skill held has been written down somewhere more durable. The skill has done its job. The honest thing to do is retire it.&lt;/p&gt;
&lt;p&gt;Which is why a catalogue that only ever grows isn&amp;rsquo;t the good sign it looks like. It usually means nothing is being allowed to finish its job and leave.&lt;/p&gt;
&lt;h2 id=&#34;what-id-actually-want-to-argue-about&#34;&gt;What I&amp;rsquo;d actually want to argue about&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m not going to pretend I have this figured out. Internally I&amp;rsquo;ve floated a few things: a quality bar in front of the catalogue, actual named owners, some way of retiring skills on a schedule. I genuinely don&amp;rsquo;t know how many of them are any good. The honest situation is that nobody has worked out yet what &amp;ldquo;good&amp;rdquo; even looks like here.&lt;/p&gt;
&lt;p&gt;Is it benchmarks for skills? Quality metrics? Alerting that fires when a skill&amp;rsquo;s assumptions have gone stale? I really don&amp;rsquo;t know. The one thing I&amp;rsquo;m fairly confident about is that we shouldn&amp;rsquo;t end up settling this the same way we built the catalogue in the first place: by default, one reasonable little addition at a time, until you glance up and it&amp;rsquo;s six hundred deep. This is the sort of thing worth deciding on purpose, for once.&lt;/p&gt;
&lt;p&gt;So these are the questions I&amp;rsquo;d want a room full of engineers to actually fight about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What do you measure, when &amp;ldquo;barely used&amp;rdquo; and &amp;ldquo;rarely needed but critical&amp;rdquo; look identical from the outside?&lt;/li&gt;
&lt;li&gt;What earns deletion, and who gets to pull the trigger?&lt;/li&gt;
&lt;li&gt;How much governance can you add before you kill the thing that made skills useful in the first place: that anyone could write one?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The deletion was the easy part. I still don&amp;rsquo;t know how you keep hundreds of these honest as the ground keeps shifting under them, and I don&amp;rsquo;t think anyone else does yet either.&lt;/p&gt;
</content>
    </item>
    
  </channel>
</rss>
