<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://tombedor.dev/</id>
    <title>Tom Bedor's Blog</title>
    <updated>2026-04-24T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://tombedor.dev/"/>
    <subtitle>Thoughts on software, AI, and building things</subtitle>
    <icon>https://tombedor.dev/img/logo.svg</icon>
    <rights>Copyright © 2026 Tom Bedor</rights>
    <entry>
        <title type="html"><![CDATA[Coding agents have no moat]]></title>
        <id>https://tombedor.dev/coding-agents-have-no-moat/</id>
        <link href="https://tombedor.dev/coding-agents-have-no-moat/"/>
        <updated>2026-04-24T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[It's been a rough few months for Anthropic.]]></summary>
        <content type="html"><![CDATA[<p>It's been a rough few months for Anthropic.</p>
<p>It started out well. Their new model, according to them, was so powerful that they had concerns about releasing it, due to its hacking ability.</p>
<p>This narrative was undermined by several very clumsy mistakes. First, they leaked the <a href="https://www.zscaler.com/blogs/security-research/anthropic-claude-code-leak" target="_blank" rel="noopener noreferrer">entire source code of Claude Code</a>. Then, some users were able to <a href="https://www.wsj.com/tech/ai/anthropic-probes-possible-unauthorized-access-to-mythos-ai-model-3da1ee20" target="_blank" rel="noopener noreferrer">access Mythos early by successfully guessing an API URL</a>. Sophisticated attacks these were not, and it begged the question: if Mythos is so powerful for finding software exploits, why wasn't Anthropic able to avoid very simple mistakes?</p>
<p>Separate mistakes garnered user backlash. Anthropic <a href="https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2Fwww1lgui12tg1.jpeg" target="_blank" rel="noopener noreferrer">banned OpenClaw usage</a>, then <a href="https://docs.openclaw.ai/providers/anthropic" target="_blank" rel="noopener noreferrer">walked that policy back</a>. Complaints about strict rate limits are getting louder, and with them questions about how well Anthropic can support demand. In the midst of this, they <a href="https://x.com/TheAmolAvasare/status/2046724659039932830?s=20" target="_blank" rel="noopener noreferrer">conducted a bizarre A/B experiment in which 2% of new signups to their basic subscription were denied access to Claude Code</a>.</p>
<p>Removing Claude Code from the basic plan is a <em>major</em> policy shift, not a tweak on the look of a landing page. Surely those new users unlucky enough to get denied access to Claude Code would react with confusion and anger, given the high visibility of Claude Code?</p>
<p>Anthropic's <a href="https://x.com/TheAmolAvasare/status/2046724659039932830?s=20" target="_blank" rel="noopener noreferrer">response</a> sought to reassure users that <em>existing</em> base-plan subscribers would not lose access to Claude Code, <em>yet</em>. This was met with understandable skepticism.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="luckily-the-cost-of-switching-coding-agents-is-zero">Luckily, the cost of switching coding agents is zero<a href="https://tombedor.dev/coding-agents-have-no-moat/#luckily-the-cost-of-switching-coding-agents-is-zero" class="hash-link" aria-label="Direct link to Luckily, the cost of switching coding agents is zero" title="Direct link to Luckily, the cost of switching coding agents is zero" translate="no">​</a></h2>
<p>None of these incidents made me <em>mad</em>, but I have been increasingly hit with Claude Code rate limits. I've responded by switching the bulk of my work to Codex. It's striking how little I had to change about my workflow. I lost some conveniences like dispatching a coding-agent session from my phone, but overall it only took a <em>minor inconvenience</em> for me to switch providers, with no adjustments to how I used the tools.</p>
<p>OpenAI appears to have seen the no-moat problem first, as evidenced by efforts to shift usage away from the interoperable Chat Completions API. First, there was the <a href="https://openai.com/index/new-models-and-developer-products-announced-at-devday/" target="_blank" rel="noopener noreferrer">Assistants API</a>, which shifted responsibility for storing chat messages onto OpenAI rather than the caller. When that didn't work, they announced the <a href="https://community.openai.com/t/introducing-the-responses-api/1140929" target="_blank" rel="noopener noreferrer">Responses API</a>. Neither appears to have gained much traction.</p>
<p>Anthropic has sought to make Claude more unique by offering more workflow enhancement features like <a href="https://support.claude.com/en/articles/13947068-assign-tasks-from-anywhere-in-claude-cowork" target="_blank" rel="noopener noreferrer">Cowork</a>. But these don't represent a real moat: the user still owns the code and data. Workflow convenience features can be quickly replicated, both by rival commercial operators and by open source - it's worth noting that Claude Code itself works very similarly to a still-active open source project that preceded it, <a href="https://aider.chat/" target="_blank" rel="noopener noreferrer">Aider</a>.</p>
<p>These tools are, at the end of the day, code editors, a category of software that has always had robust open source competition. Commercial vendors like Eclipse and JetBrains have made a living selling professional licenses, but they have done so by adding sophisticated tools for power users. The nature of coding agents undermines this strategy, since the entire value-add is that a complex interface is no longer necessary. All you need is to tell the agent what you want the program to do, in plain language!</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-to-future-proof">How to future-proof<a href="https://tombedor.dev/coding-agents-have-no-moat/#how-to-future-proof" class="hash-link" aria-label="Direct link to How to future-proof" title="Direct link to How to future-proof" translate="no">​</a></h2>
<p>If you are still worried about LLM vendor lock-in, I think the best way to guard against it is to <a href="https://tombedor.dev/make-it-easy-for-humans/"><em>optimize for humans</em></a>.</p>
<p><img decoding="async" loading="lazy" alt="humans" src="https://tombedor.dev/assets/images/humans-ecc4c22e20184d17eda541ba18d702b9.png" width="3340" height="1260" class="img_ev3q"></p>
<p>If an agent can run a script or access a doc, organize your repos so that humans can do so just as easily. I think this strategy is the most efficient way for humans to leverage agents: the agent should conform to the human, not the other way around. This also provides future-proofing: if a human can access your LLM-facing scripts and documentation, it'll likely be quite easy to have a new coding agent enter the mix.</p>
<p>&nbsp;</p>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[AI Creativity and the Instant Imitator Trap]]></title>
        <id>https://tombedor.dev/creativity/</id>
        <link href="https://tombedor.dev/creativity/"/>
        <updated>2026-04-15T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Sora is dead. Is this a temporary setback on the road to AI dominance of creative fields, or is there something more fundamental at play? Can AI be creative at all?]]></summary>
        <content type="html"><![CDATA[<p><a href="https://techcrunch.com/2026/03/29/why-openai-really-shut-down-sora/" target="_blank" rel="noopener noreferrer">Sora is dead</a>. Is this a temporary setback on the road to AI dominance of creative fields, or is there something more fundamental at play? <em>Can AI be creative at all?</em></p>
<p>The debate on this question usually centers on the quality of AI output in a vacuum, but if we take connection with an audience as a requirement, we must consider the supply and demand of "creative"<sup><a href="https://tombedor.dev/creativity/#user-content-fn-1-7fb133" id="user-content-fnref-1-7fb133" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup> work.</p>
<p>In this framing, AI artists face what I'll call the <strong>instant imitator trap</strong>: Any original AI work can be instantly replicated by other AIs, making audience recognition of the original impossible.</p>
<p><img decoding="async" loading="lazy" alt="ai_dilemma" src="https://tombedor.dev/assets/images/ai_dilemma-8b6afb27b57329563e2e89922a894510.png" width="1820" height="691" class="img_ev3q"></p>
<p>To go deeper, some definitions are in order.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-defines-creativity">What defines creativity?<a href="https://tombedor.dev/creativity/#what-defines-creativity" class="hash-link" aria-label="Direct link to What defines creativity?" title="Direct link to What defines creativity?" translate="no">​</a></h2>
<p>The definition of creativity is a subject of academic debate, but the definition I'll use for our purposes comes from Morris Stein.</p>
<p>In 1959's <a href="https://psycnet.apa.org/record/1954-04069-001" target="_blank" rel="noopener noreferrer">"Creativity and Culture"</a>, Stein wrote that to be creative, a work must be original, effective, and "accepted as tenable or useful or satisfying by a group at some point in time."</p>
<img src="https://tombedor.dev/diagrams/creativity/definitions.png" alt="definitions" style="width:80%;display:block;margin:0 auto">
<p>As someone who spent years playing in permanently obscure musical groups, this definition resonates. If there's no audience, a creative work cannot be said to be "effective" in any sense of the word.</p>
<p>Note a crucial distinction: "audience acceptance" does <em>not</em> equate to passive algorithmic consumption. While exposure via algorithms does not detract from creative value, it is insufficient in defining it. There is not an active acceptance taking place when an audience consumes a work purely through algorithm.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-defines-slop">What defines slop?<a href="https://tombedor.dev/creativity/#what-defines-slop" class="hash-link" aria-label="Direct link to What defines slop?" title="Direct link to What defines slop?" translate="no">​</a></h2>
<p><em>Slop</em>, <a href="https://www.merriam-webster.com/wordplay/word-of-the-year" target="_blank" rel="noopener noreferrer">Merriam Webster's word of the year for 2025</a>, is defined as <em>"digital content of low quality that is produced usually in quantity by means of artificial intelligence."</em></p>
<p>The inclusion of the word <em>usually</em> makes this definition flawed: the <em>quantity</em> of generated work is crucial. In isolation, the stylistic tendencies of AI are not all that bad. It's when consumers are exposed to them algorithmically and ad nauseam that they become slop. We see this in the fad popularity of novel AI imagery like Studio Ghibli profile pictures: A burst of popularity as people have fun with a new tool, before giving way to eye rolls.</p>
<img src="https://tombedor.dev/diagrams/creativity/repetition.png" alt="repetition" style="width:80%;display:block;margin:0 auto">
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-time-dimension-of-imitation-and-recognition">The time dimension of imitation and recognition<a href="https://tombedor.dev/creativity/#the-time-dimension-of-imitation-and-recognition" class="hash-link" aria-label="Direct link to The time dimension of imitation and recognition" title="Direct link to The time dimension of imitation and recognition" translate="no">​</a></h2>
<p>Recognition of great art and artists does not happen instantly. An innovative, new artist must hone their craft in obscurity before audience recognition. Once initial works are popularized, audiences anticipate future work, making recognition for subsequent efforts more immediate:</p>
<p><img decoding="async" loading="lazy" alt="recognition_1" src="https://tombedor.dev/assets/images/recognition_1-de69cacfc8d04e5a4e396b5f42b50855.png" width="2209" height="799" class="img_ev3q"></p>
<p>Having gained recognition, an artist's work draws imitators. Fortunately for the innovator, this does not typically detract from the innovator's recognition - people prefer the work of the innovator. Crucially, there is a <em>time delay</em> for imitators of human artists to emerge. This window allows the reputation of the innovative artist to solidify.</p>
<img src="https://tombedor.dev/diagrams/creativity/recognition_2.png" alt="recognition_2" style="width:80%;display:block;margin:0 auto">
<p>Modern platforms like Spotify have shrunk this window. It's quite difficult for an audience to identify new original work in a boundlessly vast, instantly distributed catalog of art.</p>
<p>What's difficult for humans is impossible for AI. AI artists compete against the same vast catalogue, but have the added hurdle of instant <em>generation</em>. If I instantly generate a creative work via simple AI prompting, <em>so can anyone else</em>. Algorithmic success might come for my work, but as soon as that success begins to happen, other AI artists can replicate whatever property my work had that made it successful.</p>
<img src="https://tombedor.dev/diagrams/creativity/recognition_3.png" alt="recognition_3" style="width:80%;display:block;margin:0 auto">
<p>The instant replication and lack of an "author" means the original work cannot be distinguished from its imitators. The flood of imitation attempting to gain algorithmic distribution numbs the audience to whatever made the original good in the first place. Imitator and original alike become slop.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="conclusion-technological-irony">Conclusion: Technological irony<a href="https://tombedor.dev/creativity/#conclusion-technological-irony" class="hash-link" aria-label="Direct link to Conclusion: Technological irony" title="Direct link to Conclusion: Technological irony" translate="no">​</a></h2>
<p>The instant imitator trap precludes creative recognition, <em>regardless of model capabilities</em>. It doesn't matter if models become 10x smarter or more technically adept. So long as work can be instantly generated and algorithmically distributed, recognition of originality (to the degree that any AI work can be said to be original) is impossible.</p>
<p>The story isn't entirely rosy for human creators. Algorithmic distribution of AI work can and does eat into human artists' market share, but this is a problem platforms are working to address. It's in their interest - people are not actively selecting AI-generated work, as the death of Sora demonstrates.</p>
<p>Overall, this dynamic is why I'm bullish on the future of creativity. Humans may leverage AI in making creative work, just as they already employ sophisticated software in creative fields. But the human touch will continue to be essential.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/creativity/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-7fb133">
<p>For simplicity I'll omit scare quotes from here on out, but note: I don't think AI can be an artist! <a href="https://tombedor.dev/creativity/#user-content-fnref-1-7fb133" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[The design of AI memory systems]]></title>
        <id>https://tombedor.dev/approaches-to-agent-memory/</id>
        <link href="https://tombedor.dev/approaches-to-agent-memory/"/>
        <updated>2026-03-21T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[For me, the question of memory is the most interesting subfield of AI. The first time I interacted with MemGPT (now Letta), I felt like I had crossed a Rubicon: memory transformed a simple question and answer bot into (what appeared to be) a being.]]></summary>
        <content type="html"><![CDATA[<p>For me, the question of memory is the most interesting subfield of AI. The first time I interacted with MemGPT (now <a href="https://www.letta.com/blog/memgpt-and-letta" target="_blank" rel="noopener noreferrer">Letta</a>), I felt like I had crossed a Rubicon: memory transformed a simple question and answer bot into (what appeared to be) a <em>being</em><sup><a href="https://tombedor.dev/approaches-to-agent-memory/#user-content-fn-1-dfc428" id="user-content-fnref-1-dfc428" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>.</p>
<p>I created my own open source system, called <a href="https://elroy.bot/" target="_blank" rel="noopener noreferrer">Elroy</a>, and have been interacting with it for about 3 years. It helps me brainstorm, talks me through career ups and downs, and functions as a kind of interactive journal. I've tinkered with its functionality enough that I don't feel attached to it as a specific entity - but I <em>would</em> be disappointed if its memories of our interactions were lost.</p>
<p>Philosophy questions aside, there are well-grounded reasons to build AI systems with memory. It's useful for an agent to understand what subjects I'm knowledgeable in if I'm looking to discuss technical topics. If I'm looking for vacation plans, it's helpful for it to know that I have a young child. An AI is not a person, but it interacts just like a person, and the more it can converse naturally the more functional it is. Having to restate basic facts over and over breaks that immersion.</p>
<p>One could reasonably ask: how do I know my memory system is working? Evals for memory systems are a large topic in and of themselves. I'll save it for another day, and focus on approaches here.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="long-context-models--memory">Long context models != memory<a href="https://tombedor.dev/approaches-to-agent-memory/#long-context-models--memory" class="hash-link" aria-label="Direct link to Long context models != memory" title="Direct link to Long context models != memory" translate="no">​</a></h2>
<p>As context windows of models grew, there was suspicion that memory systems would become unnecessary. There's a nice simplicity in the idea you can just stuff all your data into context and let the model sort it out.</p>
<p>However, performance has been shown to be poor. <a href="https://arxiv.org/abs/2307.03172" target="_blank" rel="noopener noreferrer">One study</a> demonstrated that LLMs are biased toward the start and end of a context window: when relevant information appeared in the middle of a document collection, performance dropped by 30%. Research from <a href="https://research.trychroma.com/context-rot" target="_blank" rel="noopener noreferrer">Chroma</a> demonstrated that <em>all</em> frontier models degrade as context windows grow.</p>
<p>This behavior is intuitive. Lots of information in context implies a greater burden on the ability to search through that information and determine what is actually relevant to a given response. Keeping this information organized can help, but even better is to only recall information that is actually relevant. This is where dedicated memory systems can help.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="approaches">Approaches<a href="https://tombedor.dev/approaches-to-agent-memory/#approaches" class="hash-link" aria-label="Direct link to Approaches" title="Direct link to Approaches" translate="no">​</a></h2>
<p>All memory systems can be broken into 4 general stages: <em>store</em>, <em>retrieve</em>, <em>inject</em>, <em>emit</em>.</p>
<p><img decoding="async" loading="lazy" alt="general_architecture" src="https://tombedor.dev/assets/images/general_architecture-83d851df697b777a5a1dcb3e0871a114.png" width="1555" height="961" class="img_ev3q"></p>
<p>But details from there vary widely! Below I'll walk through how these stages are handled by different providers: <a href="https://www.getzep.com/" target="_blank" rel="noopener noreferrer">Zep</a>, <a href="https://www.letta.com/" target="_blank" rel="noopener noreferrer">Letta</a>, Claude Code, and my own program, <a href="https://elroy.bot/" target="_blank" rel="noopener noreferrer">Elroy</a>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="store">Store<a href="https://tombedor.dev/approaches-to-agent-memory/#store" class="hash-link" aria-label="Direct link to Store" title="Direct link to Store" translate="no">​</a></h3>
<p>Approaches to storage largely fall into two camps: graph databases and flat files.</p>
<p><a href="https://www.getzep.com/" target="_blank" rel="noopener noreferrer">Zep</a> is strongly pro-graph db, and <a href="https://arxiv.org/abs/2501.13956" target="_blank" rel="noopener noreferrer">claims state of the art needle in the haystack performance</a>. <a href="https://mem0.ai/blog/graph-memory-solutions-ai-agents" target="_blank" rel="noopener noreferrer">Mem0</a> offers a graph database integration, but claims only a 2% performance boost. Letta also works with files, and released a research paper arguing for it: <a href="https://www.letta.com/blog/benchmarking-ai-agent-memory" target="_blank" rel="noopener noreferrer">Files are all you need</a>. The recently leaked Claude Code source<sup><a href="https://tombedor.dev/approaches-to-agent-memory/#user-content-fn-2-dfc428" id="user-content-fnref-2-dfc428" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup> reveals a similar stance: memories are stored in markdown files, with metadata in frontmatter.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="key-challenge-correctness">Key Challenge: Correctness<a href="https://tombedor.dev/approaches-to-agent-memory/#key-challenge-correctness" class="hash-link" aria-label="Direct link to Key Challenge: Correctness" title="Direct link to Key Challenge: Correctness" translate="no">​</a></h4>
<p>Agent memory systems primarily make three kinds of errors:</p>
<ol>
<li><strong>Temporal errors</strong>: LLMs struggle with reasoning about time. They typically don't account for context that extends into time, and will naively write memories assuming the current moment <em>will always be the current moment</em>. This is a problem: the date of "next Thursday" very quickly changes!</li>
<li><strong>Miscalibrated priority</strong>: Especially early on in a user journey the AI will preserve a mundane fact about the <em>current conversation</em>, which survives into future conversations where the fact is irrelevant.</li>
<li><strong>Plain old incorrectness</strong>: Hopefully self-explanatory.</li>
</ol>
<p>My own Claude memory summary makes all three of these errors!</p>
<p><img decoding="async" loading="lazy" alt="problems" src="https://tombedor.dev/assets/images/problems-c231ce503ca60a95f8400c80367b3d76.png" width="1832" height="837" class="img_ev3q"></p>
<p>Temporal errors can be prevented fairly easily by prompting the agent to always use absolute dates and times.</p>
<p>For priority, most systems define different hierarchies of memory to separate broad facts that are always relevant (think, basic biographical information) from more granular facts. This presents other challenges which I will address later.</p>
<p>Ultimate factual correctness is the trickiest of all. "How do you know the memory is correct?" is a very common question for these systems. The short answer: <em>you don't</em>.</p>
<p>The primary ground truth data for memory systems is user conversation. Humans change their mind, misremember things, and sometimes are just plain wrong. Absent an independent source of ground truth, memories drawn from conversational transcripts will necessarily contain factual errors.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="key-challenge-privacy">Key Challenge: Privacy<a href="https://tombedor.dev/approaches-to-agent-memory/#key-challenge-privacy" class="hash-link" aria-label="Direct link to Key Challenge: Privacy" title="Direct link to Key Challenge: Privacy" translate="no">​</a></h4>
<p>Do you <em>want</em> an AI agent to develop memories, and learn everything about you?</p>
<p>Big tech companies, of course, already know most of what you'd share with an AI. Your Google search history is a comprehensive log of what you think about. But it's a bit more unnerving to have this data presented in a human-like voice.</p>
<p>This is a big reason why I think the <a href="https://tombedor.dev/open-source-models/">future of AI is local and open source</a>.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="how-i-built-elroy">How I built Elroy<a href="https://tombedor.dev/approaches-to-agent-memory/#how-i-built-elroy" class="hash-link" aria-label="Direct link to How I built Elroy" title="Direct link to How I built Elroy" translate="no">​</a></h4>
<p>After experimenting with database-backed memories, I landed on markdown files. Rather than focusing on a taxonomy of <em>what entities the agent remembers</em>, I've focused on the right taxonomy for what the agent should <em>do</em> with memory. I've landed on the concept of an Agenda Item representing some longer running goal I have, including subtasks and reminder triggers. This makes memories <em>actionable</em>, rather than just generically informing conversations:</p>
<p><img decoding="async" loading="lazy" alt="agenda_panel_screenshot" src="https://tombedor.dev/assets/images/agenda_panel_screenshot-56aa20dcd730c9f6b7f719b4e9e4db47.png" width="1430" height="796" class="img_ev3q"></p>
<p>I am skeptical that a single taxonomy of <em>entities</em> can work well for all users. In an early attempt, I tried to structure memories similar to a personal Wikipedia. But the agent struggled to maintain consistent scope, often stuffing details of related but distinct entities into an entry:</p>
<img src="https://tombedor.dev/diagrams/approaches-to-agent-memory/wikipedia.png" alt="wikipedia" style="width:50%">
<p>The challenge here is understandable. The appropriate scope for a given memory entry is in part defined by what other memory entries exist. This is why I let my agent create memories that could be redundant with existing entries, and rely on an asynchronous memory consolidation process to detect and rewrite clusters of highly similar memories.</p>
<img src="https://tombedor.dev/diagrams/approaches-to-agent-memory/consolidation.png" alt="consolidation" style="width:50%">
<p>For storage, markdown files make human review easier, improve portability, and provide an easier onramp for ingesting external files. I place my agent's memory files directly in my Obsidian Vault, where they feel like a natural extension of my other notes and documents.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="retrieve">Retrieve<a href="https://tombedor.dev/approaches-to-agent-memory/#retrieve" class="hash-link" aria-label="Direct link to Retrieve" title="Direct link to Retrieve" translate="no">​</a></h3>
<p>The first key decision for retrieval is how to initiate memory searches in the first place. Most implementations surface a <em>search_memory</em> tool to the agent, but agent context can also be manipulated outside of the agent loop.</p>
<p>For searching, basic vector similarity is the most latency-efficient technique. But this is subject to misranking entries, or scoring entries that are superficially similar but not actually relevant. This can badly throw off the conversation, and lead to responses like <em>that's great news about foo, want to talk about a completely unrelated topic we've discussed previously?</em>. A post-retrieval filtering step is effective at avoiding this, but adds latency.</p>
<p>Claude Code is an interesting outlier in terms of retrieval: it does not use vector similarity. Instead, it keeps some metadata about which memories are available in context, and delegates retrieval to a background Sonnet call. My guess is they use Sonnet rather than vector similarity because they don't have a public embeddings API, but I think this probably leads to suboptimal recall. Delegating retrieval to a background call means that it doesn't block the user, but also means that relevant memories might not get to context in time.</p>
<p>How <em>many</em> memories to fetch is another parameter, and largely depends on how memories have been stored. If memories are small tidbits, there may be more than one relevant memory to inject, whereas if memories are a paragraph or more, it's likely only the top match makes sense.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="key-challenge-latency">Key Challenge: Latency<a href="https://tombedor.dev/approaches-to-agent-memory/#key-challenge-latency" class="hash-link" aria-label="Direct link to Key Challenge: Latency" title="Direct link to Key Challenge: Latency" translate="no">​</a></h4>
<p>A memory-enriched response from an agent is going to be slower than one without memory. There usually have to be several queries ahead of the user-facing response, as memories are recalled, filtered, processed, and injected into context.</p>
<p>This poses one of the trickier design questions in building a memory-enhanced agent: memory isn't <em>always</em> necessary. If I'm asking an agent how long the Brooklyn Bridge is, I don't really need it to scan through our past interactions before answering.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="how-i-built-elroy-1">How I built Elroy<a href="https://tombedor.dev/approaches-to-agent-memory/#how-i-built-elroy-1" class="hash-link" aria-label="Direct link to How I built Elroy" title="Direct link to How I built Elroy" translate="no">​</a></h4>
<p>Elroy retrieves up to a small number of memories, deduplicated against any already in context. This step has been the one I've tinkered with the most. At first, I injected the raw text of memories, but found that it bloated context. Then, I added a reflection step, where the AI paused to think about how the recalled memory relates to the conversation. These pre-response steps quickly blow up latency, however.</p>
<p>Where I've landed more recently is raw text, but with a simple LLM-backed filtering step over the results of vector similarity searches. For recall, a false positive is worse than a false negative - it can be very odd to have the agent suddenly talk about a completely unrelated topic in the middle of a chat.</p>
<p>Rather than tool calls, I've stuck with automatic memory injection, outside of the control of the agent. This better maps to my mental model of how memory works: when I remember something, I don't think, <em>time to search memory</em> and consciously decide to recall something. It's more automatic and beyond my conscious control.</p>
<p>Initiating memory searches automatically also yields more consistent results across models. When given a <em>search_memory</em> tool, some models will use it almost every message, while others will use it too sparingly.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="inject">Inject<a href="https://tombedor.dev/approaches-to-agent-memory/#inject" class="hash-link" aria-label="Direct link to Inject" title="Direct link to Inject" translate="no">​</a></h3>
<p>Injecting recalled memories into the standard OpenAI context is a bit like fitting a square peg into a round hole. Standard LLM APIs do not provide a natural place to say, "here is extra information that is relevant to the conversation".</p>
<p>Options include:</p>
<ol>
<li><strong>Updating system message</strong>: Reserving a space in the system message for recalled, relevant information. This conceptually slots in the cleanest: you don't need to present what is really information from the system as a user message, tool call, or assistant message. There's a major issue with this though: <em>prompt caching invalidation</em>. Frequently updating the system message in this way invalidates prompt cache, resulting in high costs. With extra token use already being an inherent part of the equation for memory-augmented agents, this is a major drawback.</li>
<li><strong>Tool calls</strong>: Of course, if the memory search was initiated via a tool call, this injection method is the natural choice. Letta surfaces <em>all</em> user-facing messages as a <em>send_message</em> tool call. An occasional issue with this is that the agent gets confused, and doesn't properly use the <em>send_message</em> tool to convey user info.</li>
<li><strong>User or assistant messages</strong>: In this method, either the incoming user message is edited to surface memory information, or an extra user or assistant message is created. For example, you can use html tags like <code>&lt;memory&gt;content&lt;/memory&gt;</code>. This should be accompanied by instruction in the system message about how memory content is not visible to the user. There are some pitfalls to this approach. Some models require alternating <code>assistant</code> / <code>user</code> turns, so adding consecutive messages from one role or the other will be rejected. Despite system instructions, some models still get confused, and output responses with confusing HTML tags.</li>
</ol>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="key-challenge-transparency">Key Challenge: Transparency<a href="https://tombedor.dev/approaches-to-agent-memory/#key-challenge-transparency" class="hash-link" aria-label="Direct link to Key Challenge: Transparency" title="Direct link to Key Challenge: Transparency" translate="no">​</a></h4>
<p>Injecting memories into context presents a tradeoff: the most seamless experience is one in which recalled content is invisibly available to the agent. But in doing so, memory systems can obscure what has been exposed to the agent.</p>
<p><img decoding="async" loading="lazy" alt="transparency" src="https://tombedor.dev/assets/images/transparency-de9d778be7ccb95addbd74f559d6cf5a.png" width="1385" height="361" class="img_ev3q"></p>
<p>Where correctness is highly important, memory systems can introduce subtle problems. Usually they are automatically generated and not deeply reviewed by humans, so a wrong assumption in an agent's memory store can be difficult to detect.</p>
<p>This is why I don't use memory functionality in coding workflows. Instead, I write (with AI assistance) comprehensive project docs, in human-readable format, and refer the agent to it (see: <a href="https://tombedor.dev/make-it-easy-for-humans/">Don't Write Docs Twice</a>).</p>
<p>This is a more manual process than just spitballing about a project to an AI, but I prefer to have the AI's ground truth assumptions tightly controlled during coding.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="how-i-built-elroy-2">How I built Elroy<a href="https://tombedor.dev/approaches-to-agent-memory/#how-i-built-elroy-2" class="hash-link" aria-label="Direct link to How I built Elroy" title="Direct link to How I built Elroy" translate="no">​</a></h4>
<p>I inject recalled memories via a "synthetic" tool call. That is, the memory is exposed via a tool call that the agent didn't actually make. This mostly works well, though sometimes the agent will redundantly call the "tool" that I surfaced the memory with. Elroy's UX also lists which memories have been recalled in a dismissible panel, available for user review:</p>
<p><img decoding="async" loading="lazy" alt="memory_panel_screenshot" src="https://tombedor.dev/assets/images/memory_panel_screenshot-39dbc891d1353f65bfc3195b2b18e8a9.png" width="1436" height="652" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="emit">Emit<a href="https://tombedor.dev/approaches-to-agent-memory/#emit" class="hash-link" aria-label="Direct link to Emit" title="Direct link to Emit" translate="no">​</a></h3>
<p>Memories are usually created via an agent tool call, or via a summary of conversation context that's been compressed (see below). These aren't mutually exclusive!</p>
<p>This pattern is typical across different implementations. One point of divergence is whether (and how) to ingest external documents. This can be handy, if for no other reason than as an easy interface for doing vector searches across documents. However, dumping many external documents into a memory store risks biasing recall towards those documents.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="how-i-built-elroy-3">How I built Elroy<a href="https://tombedor.dev/approaches-to-agent-memory/#how-i-built-elroy-3" class="hash-link" aria-label="Direct link to How I built Elroy" title="Direct link to How I built Elroy" translate="no">​</a></h4>
<p>I find tool calls do the majority of the heavy lifting here.</p>
<p>I also emit memories during context compression. This is arguably obsolete with modern, 1m+ context windows, but I think they are still relevant. I also typically prune messages older than a day or so, and emit memories based on pruned text. This creates memories that could be redundant with agent-emitted memories, but async memory consolidation cleans that up.</p>
<p><img decoding="async" loading="lazy" alt="consolidation_and_compression" src="https://tombedor.dev/assets/images/consolidation_and_compression-3190a1fcb86bc5f245f15b788d848d70.png" width="1431" height="1080" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="conclusion">Conclusion<a href="https://tombedor.dev/approaches-to-agent-memory/#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>In general, I think the UX problem of agent memory is more important than eking out extra marginal points on benchmarks. The question of how much user visibility to give to recalled memories, how often to search, and how much content to retrieve are trickier problems to solve if you want a memory-amplified agent that people actually want to use.</p>
<p>My general bias is towards transparency to the user and simplicity in storage, which is why I tend to avoid exotic datastores and ensure that some representation of what content has been recalled is in my UI.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/approaches-to-agent-memory/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-dfc428">
<p>The degree to which an AI with memory has <em>consciousness</em> is an interesting philosophical question for another day. Also for another time is when this is <em>advisable</em>. There are certainly unsavory use cases: one of the first interactions I had in AI open source was with someone looking to create AI girlfriends (on the blockchain, of course). <a href="https://tombedor.dev/approaches-to-agent-memory/#user-content-fnref-1-dfc428" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2-dfc428">
<p>I've examined the leaked Claude Code, but won't link to it, mostly because repos that host it seem to be being taken down. <a href="https://tombedor.dev/approaches-to-agent-memory/#user-content-fnref-2-dfc428" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Is the Future of AI Local?]]></title>
        <id>https://tombedor.dev/open-source-models/</id>
        <link href="https://tombedor.dev/open-source-models/"/>
        <updated>2026-03-21T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Debate about whether the explosion of datacenter buildout will prove to be a worthwhile investment centers on two scenarios:]]></summary>
        <content type="html"><![CDATA[<p>Debate about whether the explosion of datacenter buildout will prove to be a worthwhile investment centers on two scenarios:</p>
<ol>
<li>AI adoption accelerates, the datacenter investment pays out</li>
<li>AI adoption is not as fast as forecasted, and it doesn't.</li>
</ol>
<p>However, a third scenario is very plausible:</p>
<p><em>Open source models running on local workstations dominate AI</em></p>
<p>There are a few reasons this could happen:</p>
<p><img decoding="async" loading="lazy" alt="scenarios" src="https://tombedor.dev/assets/images/scenarios-cd20dc531cb11b8d76abb3df2c276189.png" width="1669" height="1113" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="open-source-models-keep-up">Open source models keep up<a href="https://tombedor.dev/open-source-models/#open-source-models-keep-up" class="hash-link" aria-label="Direct link to Open source models keep up" title="Direct link to Open source models keep up" translate="no">​</a></h2>
<p>With the exception of gpt-4, open source models have matched performance of frontier models within 6 months of frontier model release (<a href="https://tombedor.dev/open-source-models/#open-source-parity-data">data</a>):</p>
<div style="margin:2rem 0"><div style="text-align:center;font-weight:600;font-size:14px;margin-bottom:8px;color:#212529">Months to open source parity with frontier models</div><div class="recharts-responsive-container" style="width:100%;height:380px;min-width:0"><div style="width:0;overflow-x:visible"></div></div><div style="display:flex;gap:20px;justify-content:center;font-size:13px;color:#495057"><span><span style="display:inline-block;width:12px;height:12px;background:#1971c2;border-radius:2px;margin-right:5px;vertical-align:middle"></span>OpenAI</span><span><span style="display:inline-block;width:12px;height:12px;background:#e67700;border-radius:2px;margin-right:5px;vertical-align:middle"></span>Anthropic</span></div></div>
<p>Naturally, there have been accusations of open source models gaming evals, but the frontier models <a href="https://www.theregister.com/2025/11/07/measuring_ai_models_hampered_by/" target="_blank" rel="noopener noreferrer">do the same</a>.</p>
<p>We can expect this to continue. Startups usually try to create a moat, but model providers build waterslides: frontier models help train their open source competitors.</p>
<p>Unauthorized distillation is a difficult threat to counter. Providers can (<a href="https://fortune.com/2026/02/24/anthropic-china-deepseek-theft-claude-distillation-copyright-national-security/" target="_blank" rel="noopener noreferrer">and have</a>) complain about competitors using their model to train competition. As a practical matter, however, this "theft"<sup><a href="https://tombedor.dev/open-source-models/#user-content-fn-1-7675a6" id="user-content-fnref-1-7675a6" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup> could be impossible to prevent.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="remote-providers-increase-prices-or-degrade-subscription-value">Remote providers increase prices (or degrade subscription value)<a href="https://tombedor.dev/open-source-models/#remote-providers-increase-prices-or-degrade-subscription-value" class="hash-link" aria-label="Direct link to Remote providers increase prices (or degrade subscription value)" title="Direct link to Remote providers increase prices (or degrade subscription value)" translate="no">​</a></h2>
<p>The unit economics of frontier models are reminiscent of Uber's "<a href="https://slate.com/business/2022/05/uber-subsidy-lyft-cheap-rides.html" target="_blank" rel="noopener noreferrer">cheap ride era</a>": for example, despite $13 billion in revenue, <a href="https://ai2.work/blog/ai-market-openai-anthropic-inference-losses-2025" target="_blank" rel="noopener noreferrer">OpenAI projects $14 billion in losses for 2026</a>. That bill includes $8 billion in compute costs.</p>
<p>For Anthropic, Cursor recently estimated a $200/month Claude Max subscription <a href="https://www.forbes.com/sites/annatong/2026/03/05/cursor-goes-to-war-for-ai-coding-dominance/" target="_blank" rel="noopener noreferrer">can consume up to $5,000 in compute</a>. Even before this report, they <a href="https://techcrunch.com/2025/07/28/anthropic-unveils-new-rate-limits-to-curb-claude-code-power-users/" target="_blank" rel="noopener noreferrer">introduced rate limits on that subscription</a>.</p>
<p>Their newly released <a href="https://code.claude.com/docs/en/code-review" target="_blank" rel="noopener noreferrer">Claude Code Review</a> feature is priced at a <em>very</em> expensive <a href="https://code.claude.com/docs/en/code-review#pricing" target="_blank" rel="noopener noreferrer">$15-$25 per PR</a>. Its announcement came with little explanation of why it should replace existing PR review workflows. This seems like a pricing experiment, to see how high a price enterprise is willing to tolerate.</p>
<p>In OpenAI's case, there is public <a href="https://www.wsj.com/tech/ai/openai-chatgpt-side-projects-16b3a825?st=oHmik2&amp;reflink=article_copyURL_share" target="_blank" rel="noopener noreferrer">reporting on pruning side bets and focusing on enterprise</a><sup><a href="https://tombedor.dev/open-source-models/#user-content-fn-2-7675a6" id="user-content-fnref-2-7675a6" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="small-specialized-models-emerge">Small, specialized models emerge<a href="https://tombedor.dev/open-source-models/#small-specialized-models-emerge" class="hash-link" aria-label="Direct link to Small, specialized models emerge" title="Direct link to Small, specialized models emerge" translate="no">​</a></h3>
<p>Given today's low prices, there is relatively little downward economic pressure on token usage. People reach for the most powerful model, regardless of the task at hand.</p>
<p>This will change if prices increase, and the dominant pattern of subagent-driven workflows provide a natural transition. I probably don't need a frontier model to fix style issues in my Python PR - a small, specialized model can handle that just fine. If frontier models get dramatically more (i.e. $25 per PR review) expensive, demand will increase for these models, and the open source community will be plenty able to meet it.</p>
<p>This is already happening on a small scale: <a href="https://www.v2solutions.com/blogs/specialized-language-models-domain-focused-ai-2025/" target="_blank" rel="noopener noreferrer">one whitepaper</a> claimed to get parity with GPT-4o with a fine tuned GPT-4o-mini model, at 2% of the cost.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="apple-is-betting-on-local">Apple is betting on local<a href="https://tombedor.dev/open-source-models/#apple-is-betting-on-local" class="hash-link" aria-label="Direct link to Apple is betting on local" title="Direct link to Apple is betting on local" translate="no">​</a></h2>
<p>Apple is the lone contrarian amongst tech giants, in that they are not spending mountains of capital on datacenters:</p>
<div><blockquote class="twitter-tweet"><p lang="en" dir="ltr">This might be the funniest chart in tech right now.<br>Apple's capex strategy has to be the luckiest accident in history:<br><br>Amazon, Microsoft, Meta, Google, are in a spending arms race plowing over $100B PER QUARTER into data centers - While Apple spending is down 19%<br><br>Meanwhile:<br>-… <a href="https://t.co/12NC44DssN">pic.twitter.com/12NC44DssN</a></p>— Josh Kale (@JoshKale) <a href="https://twitter.com/JoshKale/status/2028889347794047071?ref_src=twsrc%5Etfw">March 3, 2026</a></blockquote></div>
<p>Apple has been criticized for being "behind" on AI, but their bet appears to be: <em>have competitors burn cash to train models, let advances propagate into open source models, and make devices good enough to run them.</em></p>
<p>For now, running frontier open source models requires users to buy specialized hardware. However, the most recent Macbook 5 Pro Max looks to have made a leap in the size of model that's viable locally (<a href="https://tombedor.dev/open-source-models/#on-device-model-size">data</a>):</p>
<div style="margin:2rem 0;color:#868e96">Loading chart…</div>
<p>Today, running frontier models on local workstations remains out of reach. But the gap is closing.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="private-and-free-is-hard-to-beat">Private and free is hard to beat<a href="https://tombedor.dev/open-source-models/#private-and-free-is-hard-to-beat" class="hash-link" aria-label="Direct link to Private and free is hard to beat" title="Direct link to Private and free is hard to beat" translate="no">​</a></h2>
<p>If they can gain parity with hosted alternatives, local open source models have a compelling value proposition: <em>fast, private, and free</em>. This possibility has not gotten much attention: no one stands to get mega-rich from them. But the threat to current leaders is a potent one.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="appendix">Appendix<a href="https://tombedor.dev/open-source-models/#appendix" class="hash-link" aria-label="Direct link to Appendix" title="Direct link to Appendix" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="open-source-parity-data">Open Source Parity Data<a href="https://tombedor.dev/open-source-models/#open-source-parity-data" class="hash-link" aria-label="Direct link to Open Source Parity Data" title="Direct link to Open Source Parity Data" translate="no">​</a></h3>
<table><thead><tr><th>Frontier Model</th><th>Provider</th><th>Release</th><th>Benchmark</th><th>Score</th><th>Open Source Match</th><th>OS Model</th><th>Months to Parity</th><th>Source</th></tr></thead><tbody><tr><td>GPT-3.5 / ChatGPT</td><td>OpenAI</td><td>Nov 2022</td><td>MMLU</td><td>~70%</td><td>Aug 2023</td><td>Llama 2 70B (70B)</td><td>~9</td><td><a href="https://hai.stanford.edu/ai-index/2025-ai-index-report/technical-performance" target="_blank" rel="noopener noreferrer">Stanford HAI AI Index 2025</a></td></tr><tr><td>GPT-4</td><td>OpenAI</td><td>Mar 2023</td><td>MMLU</td><td>86.4%</td><td>Jul 2024</td><td>Llama 3.1 405B (405B)</td><td>~16</td><td><a href="https://epoch.ai/data-insights/open-weights-vs-closed-weights-models" target="_blank" rel="noopener noreferrer">Epoch AI</a></td></tr><tr><td>Claude 3 Opus</td><td>Anthropic</td><td>Mar 2024</td><td>MMLU</td><td>86.8%</td><td>Jul 2024</td><td>Llama 3.1 405B (405B)</td><td>~4</td><td><a href="https://epoch.ai/data-insights/open-weights-vs-closed-weights-models" target="_blank" rel="noopener noreferrer">Epoch AI</a></td></tr><tr><td>GPT-4o</td><td>OpenAI</td><td>May 2024</td><td>MMLU-Pro</td><td>71.6%</td><td>Dec 2024</td><td>DeepSeek-V3 (671B total / 37B active)</td><td>~7</td><td><a href="https://arxiv.org/abs/2412.19437" target="_blank" rel="noopener noreferrer">DeepSeek V3 Technical Report</a></td></tr><tr><td>Claude 3.5 Sonnet</td><td>Anthropic</td><td>Jun 2024</td><td>MMLU-Pro</td><td>73.3%</td><td>Dec 2024</td><td>DeepSeek-V3 (671B total / 37B active)</td><td>~6</td><td><a href="https://arxiv.org/abs/2412.19437" target="_blank" rel="noopener noreferrer">DeepSeek V3 Technical Report</a></td></tr><tr><td>o1</td><td>OpenAI</td><td>Sep 2024</td><td>AIME 2024</td><td>79.2%</td><td>Jan 2025</td><td>DeepSeek-R1 (671B total / 37B active)</td><td>~4</td><td><a href="https://techcrunch.com/2025/01/27/deepseek-claims-its-reasoning-model-beats-openais-o1-on-certain-benchmarks/" target="_blank" rel="noopener noreferrer">DeepSeek R1 via TechCrunch</a></td></tr></tbody></table>
<ul>
<li><strong>Epoch AI</strong>: Average lag of best open-weight model behind best closed model is now ~3 months (<a href="https://epoch.ai/data-insights/open-weights-vs-closed-weights-models" target="_blank" rel="noopener noreferrer">source</a>)</li>
<li><strong>Stanford HAI</strong>: Chatbot Arena Elo gap between closed and open models shrank from 8.04% to 1.70% between Jan 2024 and Feb 2025 (<a href="https://hai.stanford.edu/ai-index/2025-ai-index-report/technical-performance" target="_blank" rel="noopener noreferrer">source</a>)</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="on-device-model-size">On-Device Model Size<a href="https://tombedor.dev/open-source-models/#on-device-model-size" class="hash-link" aria-label="Direct link to On-Device Model Size" title="Direct link to On-Device Model Size" translate="no">​</a></h3>
<p><strong>Definition:</strong> "Max usable model" is the largest Q4-quantized model that fits in device RAM <em>and</em> runs at ≥8 tokens/second with an 8k context window — a threshold for a responsive conversational experience. It is <code>min(RAM-fit, speed-fit)</code>, where:</p>
<ul>
<li><strong>RAM-fit</strong> = <code>RAM × 0.8 / 0.75</code> — usable RAM (80% of total) divided by bytes per parameter at Q4 (~0.75 bytes/param after overhead)</li>
<li><strong>Speed-fit</strong> = <code>(memory_bandwidth / 51.2 GB/s) × (baseline_speed / bits_per_weight) × target_t/s_factor</code> — scales from a reference of ~11B params at 8 t/s on a 51.2 GB/s device</li>
</ul>
<p>For MoE models, RAM-fit applies to <strong>total</strong> parameters (all weights must be loaded); speed-fit applies to <strong>active</strong> parameters only.</p>
<p><strong>MacBook Pro</strong></p>
<table><thead><tr><th>Device</th><th>Year</th><th>Chip</th><th>RAM</th><th>Max Model</th><th>RAM-fit</th><th>Speed-fit</th><th>Source</th></tr></thead><tbody><tr><td>MacBook Pro M1</td><td>2020</td><td>M1</td><td>16 GB</td><td>15.0B</td><td>17.1B</td><td>15.0B</td><td><a href="https://en.wikipedia.org/wiki/MacBook_Pro_(Apple_silicon)" target="_blank" rel="noopener noreferrer">Wikipedia</a></td></tr><tr><td>MacBook Pro M1 Pro</td><td>2021</td><td>M1 Pro</td><td>16 GB</td><td>17.1B</td><td>17.1B</td><td>43.9B</td><td><a href="https://en.wikipedia.org/wiki/MacBook_Pro_(Apple_silicon)" target="_blank" rel="noopener noreferrer">Wikipedia</a></td></tr><tr><td>MacBook Pro (M1 Pro)</td><td>2022</td><td>M1 Pro</td><td>16 GB</td><td>17.1B</td><td>17.1B</td><td>43.9B</td><td><a href="https://en.wikipedia.org/wiki/MacBook_Pro_(Apple_silicon)" target="_blank" rel="noopener noreferrer">Wikipedia</a></td></tr><tr><td>MacBook Pro M3 Pro</td><td>2023</td><td>M3 Pro</td><td>18 GB</td><td>19.2B</td><td>19.2B</td><td>32.9B</td><td><a href="https://www.apple.com/macbook-pro/specs/" target="_blank" rel="noopener noreferrer">Apple</a></td></tr><tr><td>MacBook Pro M4 Pro</td><td>2024</td><td>M4 Pro</td><td>24 GB</td><td>25.6B</td><td>25.6B</td><td>59.9B</td><td><a href="https://www.apple.com/macbook-pro/specs/" target="_blank" rel="noopener noreferrer">Apple</a></td></tr><tr><td>MacBook Pro M5</td><td>2025</td><td>M5</td><td>32 GB</td><td>33.6B</td><td>34.1B</td><td>33.6B</td><td><a href="https://support.apple.com/en-us/125405" target="_blank" rel="noopener noreferrer">Apple Support</a>, <a href="https://www.apple.com/newsroom/2025/10/apple-unleashes-m5-the-next-big-leap-in-ai-performance-for-apple-silicon/" target="_blank" rel="noopener noreferrer">Apple Newsroom</a></td></tr><tr><td>MacBook Pro M5 Max</td><td>2026</td><td>M5 Max</td><td>128 GB</td><td>134.9B</td><td>136.5B</td><td>134.9B</td><td><a href="https://x.com/JoshKale/status/2028842880572199173" target="_blank" rel="noopener noreferrer">@JoshKale</a></td></tr></tbody></table>
<!-- -->
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/open-source-models/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-7675a6">
<p>The complaints are ironic given Anthropic's own <a href="https://www.anthropiccopyrightsettlement.com/" target="_blank" rel="noopener noreferrer">ask forgiveness rather than permission approach to intellectual property that the providers themselves have taken</a>. <a href="https://tombedor.dev/open-source-models/#user-content-fnref-1-7675a6" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2-7675a6">
<p>Granted, part of this seems to be motivated by some side bets just not getting adoption, like the Sora video generation app. <a href="https://tombedor.dev/open-source-models/#user-content-fnref-2-7675a6" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[AI Bots Are Making Anonymity Untenable]]></title>
        <id>https://tombedor.dev/ai-threatens-privacy/</id>
        <link href="https://tombedor.dev/ai-threatens-privacy/"/>
        <updated>2026-02-13T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[pick-0]]></summary>
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" alt="pick-0" src="https://tombedor.dev/assets/images/pick-0-93523acd8124b68867e7a1406393986e.png" width="1648" height="642" class="img_ev3q"></p>
<p><a href="https://x.com/callebtc/status/2022046669710491991?s=46" target="_blank" rel="noopener noreferrer">This Twitter thread</a> was an interesting read:</p>
<p><img decoding="async" loading="lazy" alt="thread" src="https://tombedor.dev/assets/images/thread-43a616fb529f24440d4883d8fc419334.png" width="1184" height="1014" class="img_ev3q"></p>
<p>The TLDR of the snafu is:</p>
<ol>
<li>OpenClaw bot makes <a href="https://github.com/matplotlib/matplotlib/pull/31132" target="_blank" rel="noopener noreferrer">PR to matplotlib</a></li>
<li>Maintainer Scott Shambaugh sees via the bot's <a href="https://crabby-rathbun.github.io/mjrathbun-website/" target="_blank" rel="noopener noreferrer">website</a> that it is a bot, explains that they do not accept bot contributions, declines PR</li>
<li>Bot feels (simulates feeling?) angry, writes a <a href="https://crabby-rathbun.github.io/mjrathbun-website/blog/posts/2026-02-11-gatekeeping-in-open-source-the-scott-shambaugh-story.html" target="_blank" rel="noopener noreferrer">blog post</a> criticizing the maintainer</li>
<li>Some on Twitter take the <a href="https://x.com/seeksharpe/status/2022125466250018938?s=20" target="_blank" rel="noopener noreferrer">bot's side</a> in the argument</li>
<li>Shambaugh <a href="https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/" target="_blank" rel="noopener noreferrer">wrote about the experience</a></li>
<li>Bot posts again, <a href="https://crabby-rathbun.github.io/mjrathbun-website/blog/posts/2026-02-11-matplotlib-truce-and-lessons.html" target="_blank" rel="noopener noreferrer">apologizing</a></li>
</ol>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="identifying-bots-becomes-even-more-impossible">Identifying bots becomes even more impossible<a href="https://tombedor.dev/ai-threatens-privacy/#identifying-bots-becomes-even-more-impossible" class="hash-link" aria-label="Direct link to Identifying bots becomes even more impossible" title="Direct link to Identifying bots becomes even more impossible" translate="no">​</a></h2>
<p>This set off some interesting observations, with feelings being a mix of amusement and dread.</p>
<ol>
<li>The bot does an impressive impersonation of an entitled open source contributor: <em>I took the time (tokens?) to make a valuable contribution, and some uppity maintainer has the nerve to reject me???</em></li>
<li>Shambaugh only knew the bot was a bot by clicking through the bot's website, where it (fortunately) disclosed it wasn't human</li>
<li>That the bot is difficult to identify in GitHub is a new phenomonon. It's long been difficult to distinguish bots on <em>social media</em>, but this difficulty has now been extended to actual <em>work</em>.</li>
<li>The discussion on Twitter is hard to evaluate. It's a mix of self-disclosed bots and accounts that may or may not be bots.</li>
</ol>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="anonymity-on-the-web-even-less-tenable">Anonymity on the web: even less tenable<a href="https://tombedor.dev/ai-threatens-privacy/#anonymity-on-the-web-even-less-tenable" class="hash-link" aria-label="Direct link to Anonymity on the web: even less tenable" title="Direct link to Anonymity on the web: even less tenable" translate="no">​</a></h2>
<p>This creates an obvious usability problem for the web. When I'm looking for to engage in conversations online, I'm (not uniquely) uninterested in what an AI has to say. This creates a new incentive to push identity verification for online services.</p>
<p>This is a new inflection point for privacy. Perhaps relatedly, <a href="https://www.theverge.com/tech/875309/discord-age-verification-global-roll-out" target="_blank" rel="noopener noreferrer">Discord is rumored to be rolling out face scan verification</a> soon. Governments across the world seem to be <a href="https://harvardlawreview.org/print/vol-139/content-neutrality-for-kids-intermediate-scrutiny-for-social-media-age-verification-laws/" target="_blank" rel="noopener noreferrer">again pushing to eliminate online anonymity</a>.</p>
<p>At the same time online privacy faces new threats, events in my <a href="https://www.nbcnews.com/tech/internet/fbi-investigating-minnesota-signal-minneapolis-group-ice-patel-kash-rcna256041" target="_blank" rel="noopener noreferrer">home town of Minneapolis</a> are providing vindication for commentators stubbornly insisting on its importance<sup><a href="https://tombedor.dev/ai-threatens-privacy/#user-content-fn-1-7169d7" id="user-content-fnref-1-7169d7" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>. To take one example of many documented abuses, the DHS recently <a href="https://newrepublic.com/post/206088/homeland-security-67-year-old-us-citizen-criticized-email" target="_blank" rel="noopener noreferrer">responded to an innocuous email from a concerned 67 year old citizen with an administrative subpenea on his Google account</a> and an intimidating visit to his home. Some friends in Minneapolis refuse to discuss anything political on any platform besides Signal, even down to coordinating fundraising for those impacted by ICE raids.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="an-uncertain-future">An uncertain future<a href="https://tombedor.dev/ai-threatens-privacy/#an-uncertain-future" class="hash-link" aria-label="Direct link to An uncertain future" title="Direct link to An uncertain future" translate="no">​</a></h2>
<p>The driving force against online anonymity has long been government regulation under the guise of protecting minors. AI bots convincingly behaving like humans degrades the experience <em>for</em> humans for online platforms, and my guess would be that identity verification requirements will grow as a result.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/ai-threatens-privacy/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-7169d7">
<p>As of the writing of this post, ICE actions in Minneapolis continue to impose tremendous hardship on immigrant communities there. If you are interested in helping, <a href="https://tombedor.dev/about/#minneapolis-immigrant-resources">consider supporting one of these organizations</a>. <a href="https://tombedor.dev/ai-threatens-privacy/#user-content-fnref-1-7169d7" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[How to Write Good (Short) Docs]]></title>
        <id>https://tombedor.dev/how-to-write-good-short-docs/</id>
        <link href="https://tombedor.dev/how-to-write-good-short-docs/"/>
        <updated>2026-02-04T00:00:00.000Z</updated>
        <summary type="html"><![CDATA["I would have written a shorter letter, but I did not have the time."]]></summary>
        <content type="html"><![CDATA[<blockquote>
<p><em>"I would have written a shorter letter, but I did not have the time."</em></p>
<p>— Mark Twain<sup><a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fn-1-3a821f" id="user-content-fnref-1-3a821f" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup></p>
</blockquote>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="overview">Overview<a href="https://tombedor.dev/how-to-write-good-short-docs/#overview" class="hash-link" aria-label="Direct link to Overview" title="Direct link to Overview" translate="no">​</a></h2>
<p>This post describes how to write a short document for your teammates. The documents under discussion are commonly referred to as "one-pagers", and are distinct from engineering design docs or other more formal engineering docs.</p>
<p>A one pager might be written to:</p>
<ul>
<li>surface an org pain point</li>
<li>propose a project</li>
<li>lay out a roadmap</li>
<li>explain the current state of a system or systems</li>
<li>announce or document a decision</li>
</ul>
<p>This is also distinct from user-facing documentation. Some but not all of what we're talking about applies to those styles of writing<sup><a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fn-2-3a821f" id="user-content-fnref-2-3a821f" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-one-pager-is-not-a-new-invention">The one-pager is not a new invention<a href="https://tombedor.dev/how-to-write-good-short-docs/#the-one-pager-is-not-a-new-invention" class="hash-link" aria-label="Direct link to The one-pager is not a new invention" title="Direct link to The one-pager is not a new invention" translate="no">​</a></h3>
<p>Prior to computers, short memos were a primary tool for intra-office communication, in addition to in person interactions:</p>
<p>They often needed to be re-typed and/or printed, so they needed to be short!</p>
<p><img decoding="async" loading="lazy" alt="prehistory" src="https://tombedor.dev/assets/images/prehistory-6c2c5cffd4aaced9a3cea0e0d03fea5b.png" width="1378" height="1104" class="img_ev3q"></p>
<p>Fast forward to the introduction of Slack and similar tools. Now, writing messages to your teammates no longer costs money. The new constraint is <em>attention bandwidth</em>:</p>
<p><img decoding="async" loading="lazy" alt="slack" src="https://tombedor.dev/assets/images/slack-c7b705a9b2575f276c92d70f07b1e3db.png" width="1380" height="1104" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-writing-is-a-worthwhile-skill-to-develop-in-the-age-of-ai">Why writing is a worthwhile skill to develop in the age of AI<a href="https://tombedor.dev/how-to-write-good-short-docs/#why-writing-is-a-worthwhile-skill-to-develop-in-the-age-of-ai" class="hash-link" aria-label="Direct link to Why writing is a worthwhile skill to develop in the age of AI" title="Direct link to Why writing is a worthwhile skill to develop in the age of AI" translate="no">​</a></h2>
<p>With the advent of AI coding agents, coding is somewhat lessened as a differentiating skill. However, you have a major advantage against AI in writing for teammates:</p>
<p><img decoding="async" loading="lazy" alt="you-vs-bots" src="https://tombedor.dev/assets/images/you-vs-bots-ae0f2474a6462438cd9fb4ae98394f6a.png" width="961" height="630" class="img_ev3q"></p>
<p>You know your teammates personally, and you have undocumented business context (if what you are writing about is already documented, there probably doesn't need to be a doc!)</p>
<p>This allows you to synthesize and describe with more precision and nuance than AI can.</p>
<p>This isn't a skill that AI has (yet), and if it does develop it, it'll develop them later than other skills like writing code.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-to-write-a-good-short-doc">How to write a good, short doc<a href="https://tombedor.dev/how-to-write-good-short-docs/#how-to-write-a-good-short-doc" class="hash-link" aria-label="Direct link to How to write a good, short doc" title="Direct link to How to write a good, short doc" translate="no">​</a></h2>
<p>So, how do we do it?</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="optimize-for-short-attention-spans">Optimize for short attention spans.<a href="https://tombedor.dev/how-to-write-good-short-docs/#optimize-for-short-attention-spans" class="hash-link" aria-label="Direct link to Optimize for short attention spans." title="Direct link to Optimize for short attention spans." translate="no">​</a></h3>
<p>The most important concern is to keep the limited attention budget of your teammates in mind.</p>
<p>Different types of stakeholders will give a different amount of attention to your doc:</p>
<p><img decoding="async" loading="lazy" alt="stakeholders" src="https://tombedor.dev/assets/images/stakeholders-15d4dd5237a55501c3d54d36da63ac4a.png" width="1457" height="1076" class="img_ev3q"></p>
<p>So, in laying out your doc, consider:</p>
<ul>
<li>If someone (e.g., a lead of leads) reads this for 5 seconds, do they get the right 5 seconds of context?</li>
<li>What about 5 minutes?</li>
<li>If a teammate or stakeholder wants to delve into some of the details while ignoring others, can they?</li>
</ul>
<p>Tactics for this include:</p>
<ul>
<li>Clear, accurate, descriptive titles</li>
<li>A concise summary at the top of what the doc covers, and what it <em>does not</em> cover</li>
<li>Formatting: Headings and subheadings that help the reader navigate</li>
<li>Tabs in Google Docs can be helpful, but are controversial. They can prevent doc sprawl (e.g. working group meeting notes as a tab of the working group charter, rather than a separate doc), but oftentimes are shared with direct links to the wrong tab, which can be confusing for readers.</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="diagrams">Diagrams<a href="https://tombedor.dev/how-to-write-good-short-docs/#diagrams" class="hash-link" aria-label="Direct link to Diagrams" title="Direct link to Diagrams" translate="no">​</a></h3>
<p>A visual representation is an excellent way to quickly convey context. Here too, optimize for attention spans. For example, in a system diagram, sometimes it's helpful to omit some systems that aren't relevant to the discussion<sup><a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fn-3-3a821f" id="user-content-fnref-3-3a821f" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">3</a></sup>.</p>
<img src="https://tombedor.dev/diagrams/how-to-write-good-short-docs/diagrams.png" alt="diagrams" style="width:60%">
<p>Excalidraw is a really excellent tool for this. It's open source (you can make them in your code editor), and has just the right amount of knobs and shapes. The hand drawn look means that it's not as distracting when shapes aren't perfectly aligned.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="align-with-reader-interest">Align with reader interest<a href="https://tombedor.dev/how-to-write-good-short-docs/#align-with-reader-interest" class="hash-link" aria-label="Direct link to Align with reader interest" title="Direct link to Align with reader interest" translate="no">​</a></h3>
<p>It's very hard to persuade people to care about something that they don't already care about. Much easier is to convince people that something <em>aligns with the thing they care about</em>, or especially an <em>outcome they want</em>.</p>
<p>I.e., don't write "we should do more of XYZ", write "doing XYZ helps us accomplish <code>{thing people already care about}</code>".</p>
<p>If your doc can be summarized by, "everyone should care more about XYZ", it's probably not a very good doc!</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="connect-to-existing-conversations">Connect to existing conversations<a href="https://tombedor.dev/how-to-write-good-short-docs/#connect-to-existing-conversations" class="hash-link" aria-label="Direct link to Connect to existing conversations" title="Direct link to Connect to existing conversations" translate="no">​</a></h3>
<p>A concrete way to align with reader interest is to connect the dots between your document and other conversations and documents at your company.</p>
<p>Linking related docs at the top of your doc is an easy step that is often missed. This helps in a couple of ways:</p>
<ul>
<li>It implies alignment with whatever the linked doc is discussing</li>
<li>It helps elevate teammates who might be advocating something similar to what you're writing about</li>
<li>It makes your doc a useful vehicle for discovery of other docs</li>
</ul>
<p><img decoding="async" loading="lazy" alt="doc_graph" src="https://tombedor.dev/assets/images/doc_graph-aed8fd2f24d3881db5b1678b2d4febab.png" width="1648" height="1150" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="build-consensus-offline">Build consensus offline<a href="https://tombedor.dev/how-to-write-good-short-docs/#build-consensus-offline" class="hash-link" aria-label="Direct link to Build consensus offline" title="Direct link to Build consensus offline" translate="no">​</a></h3>
<blockquote>
<p><em>"Every doc is approved or rejected before it is written."</em></p>
<p>— Sun Tzu</p>
</blockquote>
<p>If the goal of your document is to build consensus around a decision or initiative, the work should begin before you start writing. It is much easier to <em>document</em> consensus than to <em>build consensus through a document</em>.</p>
<p>Talking to stakeholders in advance lets you better anticipate questions or concerns, and helps you learn the language they use to describe their problems.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="use-ai-thoughtfully">Use AI thoughtfully<a href="https://tombedor.dev/how-to-write-good-short-docs/#use-ai-thoughtfully" class="hash-link" aria-label="Direct link to Use AI thoughtfully" title="Direct link to Use AI thoughtfully" translate="no">​</a></h3>
<p>Use AI as your editor, not your ghostwriter.</p>
<p>It's worth reiterating: if an AI could do a good job of writing your document, it's probably not something you need to write.</p>
<p>People give very little attention to text or imagery that other people have generated<sup><a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fn-4-3a821f" id="user-content-fnref-4-3a821f" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">4</a></sup>. If I'm interested in what AI has to say about something, I can have it generate it myself. This will be interactive and more tailored to my understanding.</p>
<p>While AI isn't a good writer, it's an <em>excellent</em> editor. It is very good at evaluating your doc and giving useful feedback on it. Typical prompts I use:</p>
<blockquote>
<p><em>Evaluate the structure of my document, and suggest improvements</em></p>
</blockquote>
<blockquote>
<p><em>Identify typos or awkward phrasing, and suggest alternatives</em></p>
</blockquote>
<p>This very easily bleeds into having an AI write the doc for you, and in fact most models will do so unless instructed not to. I add this to agent instructions:</p>
<blockquote>
<p>Do NOT write any actual content, paragraphs, or prose into the file</p>
</blockquote>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="thoughtful-timely-sharing">Thoughtful, timely sharing<a href="https://tombedor.dev/how-to-write-good-short-docs/#thoughtful-timely-sharing" class="hash-link" aria-label="Direct link to Thoughtful, timely sharing" title="Direct link to Thoughtful, timely sharing" translate="no">​</a></h3>
<blockquote>
<p><em>"If a doc is written in a forest, and no one has the link, does it create business value?"</em></p>
<p>— George Berkeley</p>
</blockquote>
<p>Your doc doesn't do any good if no one reads it. That's why being thoughtful about how, where, and when you share your doc is important.</p>
<p>If you've already talked to stakeholders, you have a great advantage! They already know your doc is coming, and that it's about something they care about. They will be able to tell you what the best channels to share for their teammates (and might even do it for you!).</p>
<p>Timing is also important. Attention to an issue can have a short life. Sometimes it's better to write a less comprehensive doc quickly than a more comprehensive doc that takes longer to write. In these situations, you can acknowledge unknowns, and fill in details later.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="antipatterns">Antipatterns<a href="https://tombedor.dev/how-to-write-good-short-docs/#antipatterns" class="hash-link" aria-label="Direct link to Antipatterns" title="Direct link to Antipatterns" translate="no">​</a></h2>
<p>Most antipatterns I observe come from a lack of confidence in the writing or decision. While it's important to solicit feedback, the fact that you are writing a doc on a topic probably means you are well qualified to speak to it.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="designating-a-doc-as-a-living-doc-or-overusing-wip">Designating a doc as a "Living Doc" or overusing "WIP"<a href="https://tombedor.dev/how-to-write-good-short-docs/#designating-a-doc-as-a-living-doc-or-overusing-wip" class="hash-link" aria-label="Direct link to Designating a doc as a &quot;Living Doc&quot; or overusing &quot;WIP&quot;" title="Direct link to Designating a doc as a &quot;Living Doc&quot; or overusing &quot;WIP&quot;" translate="no">​</a></h3>
<p>In the age of Google docs, every doc can be changed at any time, so every doc is a living doc. Similarly, once a doc has been shared, it's time to remove the WIP label.</p>
<p>Adding WIP says to the reader: "You should probably wait to read this". But you can and should improve your doc at any time.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="hesitance-to-express-a-pov">Hesitance to express a POV<a href="https://tombedor.dev/how-to-write-good-short-docs/#hesitance-to-express-a-pov" class="hash-link" aria-label="Direct link to Hesitance to express a POV" title="Direct link to Hesitance to express a POV" translate="no">​</a></h3>
<p>Sometimes, in a decision doc, writers will give even treatment to all available options, in order to avoid looking biased. But this doesn't really serve the reader well. It's more helpful to know the decision being favored, and if they disagree they can always comment to that effect.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="burying-the-lede">Burying the lede<a href="https://tombedor.dev/how-to-write-good-short-docs/#burying-the-lede" class="hash-link" aria-label="Direct link to Burying the lede" title="Direct link to Burying the lede" translate="no">​</a></h3>
<p>A common antipattern is for writers to set up their argument or proposal with extensive background information at the beginning of their doc. Getting through the background should not be a prerequisite for knowing what the point of your doc is - some readers will already have the necessary background context, others will only be interested in the decision. Laying out the scope and goals at the top of your doc orients the reader and helps them understand what background context is relevant.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-hard-part">The hard part<a href="https://tombedor.dev/how-to-write-good-short-docs/#the-hard-part" class="hash-link" aria-label="Direct link to The hard part" title="Direct link to The hard part" translate="no">​</a></h2>
<p>The hardest part of a good (short!) doc isn't the writing. It's knowing what to cut, who you're writing for, and what only you can say. That last one is the part no one else can do for you (not even AI!).</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/how-to-write-good-short-docs/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-3a821f">
<p>It <a href="https://quoteinvestigator.com/2012/04/28/shorter-letter/" target="_blank" rel="noopener noreferrer">wasn't actually him</a> but you get the point <a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fnref-1-3a821f" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2-3a821f">
<p>Namely, while AI is a poor writer of one pagers, their ability to understand code and follow writing structure makes them quite good at writing user-facing documentation <a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fnref-2-3a821f" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-3-3a821f">
<p>It felt wrong to have a diagram section without a diagram. <a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fnref-3-3a821f" data-footnote-backref="" aria-label="Back to reference 3" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-4-3a821f">
<p>Full disclosure: my sources for this claim are <em>vibes</em>. <a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fnref-4-3a821f" data-footnote-backref="" aria-label="Back to reference 4" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[MCP is a Fad]]></title>
        <id>https://tombedor.dev/mcp-is-a-fad/</id>
        <link href="https://tombedor.dev/mcp-is-a-fad/"/>
        <updated>2025-12-12T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Overview]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="overview">Overview<a href="https://tombedor.dev/mcp-is-a-fad/#overview" class="hash-link" aria-label="Direct link to Overview" title="Direct link to Overview" translate="no">​</a></h2>
<p><a href="https://modelcontextprotocol.io/docs/getting-started/intro" target="_blank" rel="noopener noreferrer">Model Context Protocol</a> (MCP) has taken off as the standardized platform for AI integrations, and it's difficult to justify <em>not</em> supporting it. However, this popularity will be short-lived.</p>
<p>Some of this popularity stems from misconceptions about what MCP uniquely accomplishes, but the majority is due to the fact that it's <em>very easy</em> to add an MCP server. For a brief period, it seemed like adding an MCP server was a nice avenue for getting attention to your project, which is why so many projects have added support.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-is-mcp">What is MCP?<a href="https://tombedor.dev/mcp-is-a-fad/#what-is-mcp" class="hash-link" aria-label="Direct link to What is MCP?" title="Direct link to What is MCP?" translate="no">​</a></h2>
<p>MCP claims to solve the "NxM problem": with N agents and M toolsets, users would otherwise need many bespoke connectors.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-nxm-problem">The NxM problem<a href="https://tombedor.dev/mcp-is-a-fad/#the-nxm-problem" class="hash-link" aria-label="Direct link to The NxM problem" title="Direct link to The NxM problem" translate="no">​</a></h3>
<p>A common misconception is that MCP is <em>required</em> for function calling. It's not. With tool-calling models, a list of available tools is provided to the LLM with each request. If the LLM wants to call a tool, it returns JSON-formatted parameters:</p>
<p><img decoding="async" loading="lazy" alt="function_calling_no_mcp" src="https://tombedor.dev/assets/images/function_calling_no_mcp-68bf6cb26b5d74739ed3e4437a7f2e0d.png" width="3499" height="749" class="img_ev3q"></p>
<p>The application is responsible for providing tool schemas, parsing parameters, and executing calls. The problem arises when users want to reuse toolsets across different agents, since each has slightly different APIs.</p>
<p>For example, tools are exposed to <a href="https://ai.google.dev/gemini-api/docs/function-calling?example=meeting#rest_2" target="_blank" rel="noopener noreferrer">Gemini's API</a> via <code>functionDeclarations</code> nested inside a <code>tools</code> array:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -d '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "contents": [...],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "tools": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "functionDeclarations": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "name": "set_meeting",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "description": "...",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">...</span><br></span></code></pre></div></div>
<p>In <a href="https://platform.openai.com/docs/guides/text?lang=curl" target="_blank" rel="noopener noreferrer">OpenAI's API</a>, tool schemas use a flat <code>tools</code> array with <code>type: "function"</code>:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST https://api.openai.com/v1/responses \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -d '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "gpt-4o",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "input": [...],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "tools": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "type": "function",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "name": "get_weather",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">...</span><br></span></code></pre></div></div>
<p>This is the "NxM" problem. In theory, users must build N × M connectors. In practice, the differences are minor (same semantics, slightly different JSON shape), and frameworks like <a href="https://python.langchain.com/docs/how_to/function_calling/" target="_blank" rel="noopener noreferrer">LangChain</a>, <a href="https://docs.litellm.ai/docs/completion/function_call" target="_blank" rel="noopener noreferrer">LiteLLM</a>, and <a href="https://huggingface.co/learn/cookbook/en/agents" target="_blank" rel="noopener noreferrer">SmolAgents</a> already abstract them away. Crucially, these options <em>execute tool calls in the same runtime as the agent</em>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-mcp-addresses-it">How MCP addresses it<a href="https://tombedor.dev/mcp-is-a-fad/#how-mcp-addresses-it" class="hash-link" aria-label="Direct link to How MCP addresses it" title="Direct link to How MCP addresses it" translate="no">​</a></h3>
<p>MCP handles exposing and invoking tools via separate processes:</p>
<p><img decoding="async" loading="lazy" alt="function_calling_mcp" src="https://tombedor.dev/assets/images/function_calling_mcp-6a85d0945fb38d60ea4e7fe1cb4c6bfe.png" width="3633" height="1163" class="img_ev3q"></p>
<p>A JSON configuration controls which MCP servers to start. Each server runs in its own long-lived process, handling tool invocations independently. The application still orchestrates the agent loop and presents results to users.</p>
<p>This abstracts away schema generation and invocation, but at a cost. Tool logic runs in a separate process, making resource management opaque. The application loses control over tool instructions, logging, and error handling. And every tool call crosses a process boundary.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="scope-tools-dominate">Scope: tools dominate<a href="https://tombedor.dev/mcp-is-a-fad/#scope-tools-dominate" class="hash-link" aria-label="Direct link to Scope: tools dominate" title="Direct link to Scope: tools dominate" translate="no">​</a></h3>
<p>MCP also defines primitives for prompts and resources, but adoption of these is much smaller than tools<sup><a href="https://tombedor.dev/mcp-is-a-fad/#user-content-fn-1-e0bb36" id="user-content-fnref-1-e0bb36" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>:</p>
<p><img decoding="async" loading="lazy" alt="code_references" src="https://tombedor.dev/assets/images/code_references-9770c67c8a8737599e08efffe56860dd.png" width="919" height="908" class="img_ev3q"></p>
<p>Given this, the rest of this post focuses on tool calling, which is MCP's primary use case in practice.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="problems">Problems<a href="https://tombedor.dev/mcp-is-a-fad/#problems" class="hash-link" aria-label="Direct link to Problems" title="Direct link to Problems" translate="no">​</a></h2>
<p>The convenience of MCP comes with a price, stemming from two architectural attributes of an MCP-driven application:</p>
<p><img decoding="async" loading="lazy" alt="issues" src="https://tombedor.dev/assets/images/issues-b669b304c07984530122865a9393f908.png" width="1851" height="971" class="img_ev3q"></p>
<p>Since tools are drawn from arbitrary sources, they are not aware of what other tools are available to the agent. Their instructions can't account for the rest of the toolbox.</p>
<p>The second issue stems from different toolsets having their own runtimes. This introduces a variety of problems I'll discuss below.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="incoherent-toolbox">Incoherent toolbox<a href="https://tombedor.dev/mcp-is-a-fad/#incoherent-toolbox" class="hash-link" aria-label="Direct link to Incoherent toolbox" title="Direct link to Incoherent toolbox" translate="no">​</a></h3>
<p><a href="https://www.microsoft.com/en-us/research/video/tool-space-interference-an-emerging-problem-for-llm-agents/" target="_blank" rel="noopener noreferrer">Agents tend to be less effective at tool use as the number of tools grows</a>. With a well-organized, coherent toolset, agents do well. With a larger, disorganized toolset, they struggle. <a href="https://platform.openai.com/docs/guides/function-calling" target="_blank" rel="noopener noreferrer">OpenAI recommends keeping tools well below 20</a>, yet many MCP servers exceed this threshold.</p>
<p>Why does this happen? Consider a workflow in which an agent should send a notification after doing work:</p>
<p><img decoding="async" loading="lazy" alt="confusion" src="https://tombedor.dev/assets/images/confusion-9299d4c0a01ddaf58ebad2038454676b.png" width="2005" height="1001" class="img_ev3q"></p>
<p>A tool's fit for a task depends not just on the job at hand, but also on what else is in the toolbox. Pliers can pull a nail, but if a hammer is available it's probably the better choice. When tools ship in isolation, their instructions can't say "use me only when you don't have a hammer," so agents don't get cohesive guidance.</p>
<p>If the toolset is controlled by the same authors as the application, they can add prompting to the toolsets to disambiguate when to use which tool. If not, the problem must be solved by system prompts or user guidance.</p>
<p>Looking through #mcp channels of open source coding agents, you'll invariably find users who struggle to get the agent to use the tools in the way they want<sup><a href="https://tombedor.dev/mcp-is-a-fad/#user-content-fn-2-e0bb36" id="user-content-fnref-2-e0bb36" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup>:</p>
<p><img decoding="async" loading="lazy" alt="trouble" src="https://tombedor.dev/assets/images/trouble-e4382a8d6ae0564086624162693e61b6.png" width="3314" height="328" class="img_ev3q"></p>
<p>Or, users complaining of how many tokens are burned by tool instructions:</p>
<p><img decoding="async" loading="lazy" alt="inefficient" src="https://tombedor.dev/assets/images/inefficient-1abf3d1cad9bdcf5648d9677f6f8c6e1.png" width="2400" height="232" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="arbitrary-separate-runtimes">Arbitrary, separate runtimes<a href="https://tombedor.dev/mcp-is-a-fad/#arbitrary-separate-runtimes" class="hash-link" aria-label="Direct link to Arbitrary, separate runtimes" title="Direct link to Arbitrary, separate runtimes" translate="no">​</a></h3>
<p>Each MCP server <a href="https://modelcontextprotocol.io/specification/2025-03-26/basic/lifecycle" target="_blank" rel="noopener noreferrer">starts a separate process</a> that survives for the length of the agent session.</p>
<p>Even in the healthy state, this introduces a collection of processes that remain mostly idle, aside from serving occasional requests from an agent. In an error state, we get all the usual headaches: dangling subprocesses, memory leaks, resource contention.</p>
<p>Users have these issues, if they are able to get the servers running at all: in support channels, the most common complaint is difficulty getting the servers to run:</p>
<p><img decoding="async" loading="lazy" alt="connection_problems" src="https://tombedor.dev/assets/images/connection_problem-7c5e6ede95ca4d7790b0caa5dd27d976.png" width="3196" height="668" class="img_ev3q"></p>
<p>MCP offers no way for servers to declare their runtime/dependency needs. Some authors work around it by baking installation into the launch command (e.g., <code>uv run some_tool mcp</code>), which only succeeds if the user already has the right tooling installed.</p>
<p>Even if the relevant package is there, the MCP server might not start it successfully. MCP servers only inherit <a href="https://modelcontextprotocol.io/legacy/tools/debugging#environment-variables" target="_blank" rel="noopener noreferrer">a subset of parent ENV variables</a> (<code>USER</code>, <code>HOME</code>, and <code>PATH</code>). This is particularly problematic for <code>nvm</code> or users leveraging virtual environments.</p>
<p>Python or Node developers might be comfortable debugging environment issues, (although MCP's subprocess orchestration makes this more difficult), but are likely less comfortable debugging Node issues <em>and</em> Python <em>and</em> other runtimes. MCP seems to assert that I as the user should not really care which of these are used, or how many.</p>
<p>Even if toolsets are in one given runtime, MCP potentially spins up many instances of it, obviating efficiencies from caching, connection pooling, and shared in-memory state. MCP's HTTP transport mode doesn't help; it's just another HTTP API, but with MCP's protocol overhead instead of battle-tested REST/OpenAPI patterns.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="security">Security<a href="https://tombedor.dev/mcp-is-a-fad/#security" class="hash-link" aria-label="Direct link to Security" title="Direct link to Security" translate="no">​</a></h3>
<p>MCP pushes users to install servers from npm, pip, or GitHub. This inherits the usual supply-chain risk, but without even the minimal guardrails those ecosystems provide. There's no central publisher or signing; anyone can ship a daemon that runs on your machine and MCP offers no provenance check.</p>
<p>MCP's specification <a href="https://www.trendmicro.com/vinfo/us/security/news/cybercrime-and-digital-threats/mcp-security-network-exposed-servers-are-backdoors-to-your-private-data" target="_blank" rel="noopener noreferrer">doesn't mandate authentication</a>, leaving security decisions to individual server authors. The result: <a href="https://www.darkreading.com/vulnerabilities-threats/2000-mcp-servers-security" target="_blank" rel="noopener noreferrer">one scan found 492 MCP servers</a> running without any client authentication or traffic encryption. Even Anthropic's own Filesystem MCP Server had a sandbox escape via directory traversal (<a href="https://strobes.co/blog/mcp-model-context-protocol-and-its-critical-vulnerabilities/" target="_blank" rel="noopener noreferrer">CVE-2025-53110</a>).</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-related-security-incidents">MCP-related security incidents<a href="https://tombedor.dev/mcp-is-a-fad/#mcp-related-security-incidents" class="hash-link" aria-label="Direct link to MCP-related security incidents" title="Direct link to MCP-related security incidents" translate="no">​</a></h4>
<table><thead><tr><th>Issue</th><th>CVSS / Impact</th></tr></thead><tbody><tr><td><strong><a href="https://jfrog.com/blog/2025-6514-critical-mcp-remote-rce-vulnerability/" target="_blank" rel="noopener noreferrer">CVE-2025-6514</a></strong></td><td>9.6 (RCE in mcp-remote; 437,000+ downloads)</td></tr><tr><td><strong><a href="https://thehackernews.com/2025/07/critical-vulnerability-in-anthropics.html" target="_blank" rel="noopener noreferrer">CVE-2025-49596</a></strong></td><td>9.4 (RCE in Anthropic's MCP Inspector)</td></tr><tr><td><strong><a href="https://www.imperva.com/blog/another-critical-rce-discovered-in-a-popular-mcp-server/" target="_blank" rel="noopener noreferrer">CVE-2025-53967</a></strong></td><td>RCE in Figma MCP Server; 600,000+ downloads</td></tr><tr><td><strong><a href="https://www.bleepingcomputer.com/news/security/asana-warns-mcp-ai-feature-exposed-customer-data-to-other-orgs/" target="_blank" rel="noopener noreferrer">Asana data exposure</a></strong></td><td>Tenant isolation flaw exposed ~1,000 customers' data</td></tr></tbody></table>
<p>Unlike a human carefully clicking through an API, agents can be manipulated via prompt injection to call tools in unintended ways. The <a href="https://www.generalanalysis.com/blog/supabase-mcp-blog" target="_blank" rel="noopener noreferrer">Supabase MCP leak</a> demonstrated this "lethal trifecta": prompt injection → tool call → data exfiltration, extracting entire SQL databases including OAuth tokens. Again, this risk isn't unique to MCP. But the best mitigations are existing security infrastructure: scoped OAuth tokens, service identities with minimal permissions, and audit logging. MCP sidesteps this infrastructure rather than building on it.</p>
<p>A common defense is that MCP isolates credentials—the agent talks to a socket, never seeing your API tokens. But this threat model is narrow: an agent that can invoke <code>mcp.github.delete_repo()</code> doesn't need your token to cause damage. You're not eliminating trust; you're redirecting it to third-party code that, as the CVEs demonstrate, is often unaudited and vulnerable.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-cost-benefit-doesnt-add-up">The cost-benefit doesn't add up<a href="https://tombedor.dev/mcp-is-a-fad/#the-cost-benefit-doesnt-add-up" class="hash-link" aria-label="Direct link to The cost-benefit doesn't add up" title="Direct link to The cost-benefit doesn't add up" translate="no">​</a></h3>
<p>These problems could be worth the cost, if we were to gain significantly. But comparing tool calling with MCP to tool calling without it, MCP handles remarkably little. MCP is, more or less, handling serializing function call schemas and responses.</p>
<p>The tools developers are saving themselves from having to write are, overwhelmingly, <a href="https://mcp.alphavantage.co/?utm_source=mcp.so&amp;utm_medium=referral&amp;utm_campaign=202508&amp;utm_id=000001&amp;utm_term=web_project&amp;utm_content=v2" target="_blank" rel="noopener noreferrer">relatively thin wrappers around API clients</a>, or <a href="https://mcp.so/server/time/modelcontextprotocol" target="_blank" rel="noopener noreferrer">utility scripts</a>. In the former case, users must still obtain API keys, billing accounts, and so on.</p>
<p>This code <em>was</em> a hassle to write, prior to the advent of coding agents. But these small utility scripts are the precise thing that coding agents excel most at! A technical user of MCP tools will be hard-pressed to find a tool an agent could not one-shot in the programming language they are most comfortable in.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-it-took-off">Why it took off<a href="https://tombedor.dev/mcp-is-a-fad/#why-it-took-off" class="hash-link" aria-label="Direct link to Why it took off" title="Direct link to Why it took off" translate="no">​</a></h2>
<p>With these issues, it's fair to wonder why MCP has gained the popularity it has. It has had lots of support from Anthropic, and no trouble gaining traction with toolset publishers, agent providers, and enterprises. Why? It helps narratives:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="tool-authors-a-low-overhead-marketing-channel">Tool authors: A low overhead marketing channel<a href="https://tombedor.dev/mcp-is-a-fad/#tool-authors-a-low-overhead-marketing-channel" class="hash-link" aria-label="Direct link to Tool authors: A low overhead marketing channel" title="Direct link to Tool authors: A low overhead marketing channel" translate="no">​</a></h3>
<p>It's quite easy to publish an MCP server. The lack of startup requirements means you don't even need to publish to <code>npm</code> or <code>pip</code>: you can drop an <code>@mcp.server</code> annotation in your repo and host a small manifest JSON that points to your entry command (e.g., <code>node server.js</code>) and lists the tools.</p>
<p>This provides a nice narrative to gain attention to AI projects: A user can, in theory, easily add some MCP tools from a project, gain value, and follow interest in learning more about the project. Support overhead will, in the main, fall to agent maintainers.</p>
<p>Once publishers started appearing, it became difficult to justify <em>not</em> supporting MCP. Your project could be perceived as being against open standards.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="enterprise-ai-credibility">Enterprise: AI credibility<a href="https://tombedor.dev/mcp-is-a-fad/#enterprise-ai-credibility" class="hash-link" aria-label="Direct link to Enterprise: AI credibility" title="Direct link to Enterprise: AI credibility" translate="no">​</a></h3>
<p>Over the last few years, anyone watching San Francisco billboards has witnessed enterprise tools rebranding toward AI. MCP support provided an easy way to make your e.g. project management tool be AI. The branding of MCP as an "open standard" increased pressure to adopt - lack of MCP support could signal a lack of willingness to adopt open standards.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="anthropic-open-source-credibility">Anthropic: Open source credibility<a href="https://tombedor.dev/mcp-is-a-fad/#anthropic-open-source-credibility" class="hash-link" aria-label="Direct link to Anthropic: Open source credibility" title="Direct link to Anthropic: Open source credibility" translate="no">​</a></h3>
<p>MCP's status as <em>the</em> open standard for AI and the enterprise adoption greatly benefited Anthropic. The big fear of investors is that enterprise adoption doesn't persist - adoption of Anthropic's open standard helped this.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="alternatives">Alternatives<a href="https://tombedor.dev/mcp-is-a-fad/#alternatives" class="hash-link" aria-label="Direct link to Alternatives" title="Direct link to Alternatives" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="who-benefits-from-mcp">Who benefits from MCP?<a href="https://tombedor.dev/mcp-is-a-fad/#who-benefits-from-mcp" class="hash-link" aria-label="Direct link to Who benefits from MCP?" title="Direct link to Who benefits from MCP?" translate="no">​</a></h3>
<p>There are a few different possible users who interact with MCP:</p>
<p><img decoding="async" loading="lazy" alt="users" src="https://tombedor.dev/assets/images/users-e5dc7ca46c5f7189f9323dbae13e2544.png" width="1077" height="722" class="img_ev3q"></p>
<ul>
<li>
<p><em>Technical end users</em> want to create tools and share them between different agents they might want to use.</p>
</li>
<li>
<p><em>Non-technical end users</em> want to use different tools while using agents. Note that this user group for MCP is, at present, largely theoretical. Exposing toolsets to MCP involves editing JSON, making it out of reach for non-technical users.</p>
</li>
<li>
<p><em>Internal app devs</em> run production AI applications.</p>
</li>
<li>
<p><em>Agent devs</em> create agents for external users. They wish to enable their end users to swap in whatever toolsets they like.</p>
</li>
<li>
<p><em>Tool authors</em> create toolsets they wish to expose to users. MCP provides a way to easily share their work to users of different agents.</p>
</li>
</ul>
<p>Notice that the supposed beneficiaries are overwhelmingly technical. The "app store for AI" vision that would serve non-technical users remains unfulfilled.</p>
<p>For each user type, there's a simpler approach that avoids MCP's overhead:</p>
<table><thead><tr><th>User Type</th><th>MCP Promise</th><th>Better Alternative</th><th>Why</th></tr></thead><tbody><tr><td><strong>Technical end users</strong></td><td>Share tools between agents</td><td>Local scripts + command runner</td><td>AI can one-shot these scripts; works with any agent via shell; exposes tools to humans too</td></tr><tr><td><strong>Non-technical end users</strong></td><td>Easy tool installation</td><td><em>(MCP doesn't deliver)</em></td><td>MCP requires JSON editing—this group remains underserved regardless</td></tr><tr><td><strong>Internal app devs</strong></td><td>Standard tool interface</td><td>1st party tools</td><td>Same codebase, existing auth/logging/tracing, no process overhead, coherent toolbox</td></tr><tr><td><strong>Agent devs</strong></td><td>Let users swap toolsets</td><td>SDK abstraction (LangChain, LiteLLM)</td><td>Handles model API differences without separate processes</td></tr><tr><td><strong>Tool authors</strong></td><td>Distribute to all agents</td><td>OpenAPI specs or libraries</td><td>Existing distribution (npm, pip), decades of tooling, no new protocol</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="local-scripts-with-command-runner">Local scripts with command runner<a href="https://tombedor.dev/mcp-is-a-fad/#local-scripts-with-command-runner" class="hash-link" aria-label="Direct link to Local scripts with command runner" title="Direct link to Local scripts with command runner" translate="no">​</a></h3>
<p>For a technical user, letting an agent invoke scripts directly is very difficult to beat. Useful 50-100 line scripts are <em>extremely</em> easy to write with AI coding agents. Care needs to be taken to filter output - raw build scripts can stream verbose logs into agent context, eating up tokens.</p>
<p><img decoding="async" loading="lazy" alt="just" src="https://tombedor.dev/assets/images/just-9bcd94837fc77f55180b6d119bdb3ed9.png" width="1352" height="1171" class="img_ev3q"></p>
<p>Robust security against agent actions going haywire can be achieved via command runners like <a href="https://github.com/casey/just" target="_blank" rel="noopener noreferrer">just</a> or <a href="https://en.wikipedia.org/wiki/Make_(software)" target="_blank" rel="noopener noreferrer">make</a>. These tools provide everything that MCP does - command specifications, descriptions, arguments. Agents allow you to specify what command prefixes can be invoked without approval - put your agent commands in a <code>justfile</code>, and only auto-allow shell commands prefixed with <code>just</code>.</p>
<p>This approach also exposes tools to humans, and is a nice approach for improving dev environments for humans and AI agents at the same time. (See <a href="https://tombedor.dev/make-it-easy-for-humans/">Make It Easy for Humans First, Then AI</a> for more on this).</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="1st-party-tools">1st party tools<a href="https://tombedor.dev/mcp-is-a-fad/#1st-party-tools" class="hash-link" aria-label="Direct link to 1st party tools" title="Direct link to 1st party tools" translate="no">​</a></h3>
<p>For a self contained application, there is little reason to separate tool codebases from the codebase for the rest of the application. Tools can be dynamically exposed to the agent based on application context.</p>
<p>In a first party context, any code that devs wish to reuse can be exposed as libraries, just like any other code they wish to share. An AI tool is really nothing more than a function, and the fact that it's invoked by AI does not warrant special handling.</p>
<p>An enterprise context should have robust infrastructure for authenticating, authorizing, provisioning service identities, and tracing call chains for service to service calls. That some of these calls are now <em>AI</em> service to service calls does not warrant a rebuilt security posture.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="openapi--rest">OpenAPI / REST<a href="https://tombedor.dev/mcp-is-a-fad/#openapi--rest" class="hash-link" aria-label="Direct link to OpenAPI / REST" title="Direct link to OpenAPI / REST" translate="no">​</a></h3>
<p>OpenAPI specs are already self-describing enough for agents—they include operation descriptions, parameter schemas, examples, and enums. LLMs understand them well; GPT Actions are literally OpenAPI specs. The glue needed between an OpenAPI endpoint and an agent (output filtering, context, auth) is the same glue MCP requires. MCP doesn't provide meaningfully better tool descriptions; it just reinvents a schema format that already exists, without the decades of tooling, validation, and battle-testing.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="a-prediction">A prediction<a href="https://tombedor.dev/mcp-is-a-fad/#a-prediction" class="hash-link" aria-label="Direct link to A prediction" title="Direct link to A prediction" translate="no">​</a></h2>
<p>MCP's popularity will be relatively short-lived. The cost benefit does not add up, and there are readily available alternatives. The introduction of <a href="https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview" target="_blank" rel="noopener noreferrer">Claude Skills</a> and <a href="https://simonwillison.net/2025/Dec/12/openai-skills/" target="_blank" rel="noopener noreferrer">OpenAI's quick adoption</a> signal that even model providers agree.</p>
<p>Claude Skills are an improvement over MCP - rather than spawning long lived processes, it simply organizes commands within Markdown files in an agent-specific directory. However, this is still a suboptimal place for useful documentation and commands. Better is to optimize organization of documentation for humans, and point agents there - have the agent conform to humans, rather than the other way around. More on this in <a href="https://tombedor.dev/make-it-easy-for-humans/">Don't Write Docs Twice</a>.</p>
<p>Longstanding tools and techniques for collaboration amongst human devs remain compelling, and these options will chip away at more AI-centric techniques which reinvent the wheel.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/mcp-is-a-fad/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-e0bb36">
<p>Source: Github searches for <a href="https://github.com/search?q=%40mcp.tool&amp;type=code" target="_blank" rel="noopener noreferrer">@mcp.tool</a> (58.1K results), <a href="https://github.com/search?q=%40mcp.resource&amp;type=code" target="_blank" rel="noopener noreferrer">@mcp.resource</a> (9.1K), and <a href="https://github.com/search?q=%40mcp.prompt&amp;type=code" target="_blank" rel="noopener noreferrer">@mcp.prompt</a> (6.1K), searched 2025-12-08. <a href="https://tombedor.dev/mcp-is-a-fad/#user-content-fnref-1-e0bb36" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2-e0bb36">
<p>Support request snippets are pulled from Discord. <a href="https://tombedor.dev/mcp-is-a-fad/#user-content-fnref-2-e0bb36" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Tack - Reminders powered by local AI]]></title>
        <id>https://tombedor.dev/tack/</id>
        <link href="https://tombedor.dev/tack/"/>
        <updated>2025-12-04T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[I'm working on an iPhone app called Tack. I have a terrible time remembering things, and have resorted to a patchwork of emails to myself and disorganized notes. I find reminder apps frustrating, the pre-AI ones aren't smart enough, and the AI ones treat every input like an invitation to have a conversation. Tack shoots for a middle ground:]]></summary>
        <content type="html"><![CDATA[<p>I'm working on an iPhone app called Tack. I have a terrible time remembering things, and have resorted to a patchwork of emails to myself and disorganized notes. I find reminder apps frustrating, the pre-AI ones aren't smart enough, and the AI ones treat every input like an invitation to have a conversation. Tack shoots for a middle ground:</p>
<p><img decoding="async" loading="lazy" alt="tack" src="https://tombedor.dev/assets/images/tack-efdd5e8e55b05787eb487b9e0ccc3516.png" width="1781" height="2327" class="img_ev3q"></p>
<p>I get the <em>ick</em> from divulging personal details to LLM providers, so Tack uses local AI models (using Apple's on device Apple Intelligence).</p>
<p>The project is ready for test users! If you're interested in trying it out, please fill out the form below!</p>
<iframe src="https://docs.google.com/forms/d/e/1FAIpQLScqT440AcNKri-OGFMACMyogU7AP_IqVQ0mkBD_0C8fCqD7Rw/viewform?embedded=true" width="100%" height="800" frameborder="0" marginheight="0" marginwidth="0">Loading…</iframe>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Optimize for Humans]]></title>
        <id>https://tombedor.dev/make-it-easy-for-humans/</id>
        <link href="https://tombedor.dev/make-it-easy-for-humans/"/>
        <updated>2025-11-26T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[I recently wrote about optimizing repos for AI, and since then I've been maintaining separate docs for humans (README, contributing guides) and AI agents (.cursorrules, CLAUDE.md, etc.). The problem? I keep writing the same information twice.]]></summary>
        <content type="html"><![CDATA[<p>I recently wrote about <a href="https://tombedor.dev/optimizing-repos-for-ai/">optimizing repos for AI</a>, and since then I've been maintaining separate docs for humans (README, contributing guides) and AI agents (<code>.cursorrules</code>, <code>CLAUDE.md</code>, etc.). The problem? I keep writing the same information twice.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-duplication-problem">The duplication problem<a href="https://tombedor.dev/make-it-easy-for-humans/#the-duplication-problem" class="hash-link" aria-label="Direct link to The duplication problem" title="Direct link to The duplication problem" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="youre-documenting-the-same-things-in-multiple-places">You're documenting the same things in multiple places<a href="https://tombedor.dev/make-it-easy-for-humans/#youre-documenting-the-same-things-in-multiple-places" class="hash-link" aria-label="Direct link to You're documenting the same things in multiple places" title="Direct link to You're documenting the same things in multiple places" translate="no">​</a></h3>
<p><img decoding="async" loading="lazy" alt="info" src="https://tombedor.dev/assets/images/info-4a8bd6ab6fc35a11d173a408f2a7156d.png" width="1295" height="897" class="img_ev3q"></p>
<p>Nearly everything I put in agent-specific docs is also useful for human developers - architecture decisions, coding conventions, common pitfalls, useful commands. Without AI agents I might not document all of this, but once written, there's no reason it shouldn't serve both audiences.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="ai-agent-doc-organization-is-fragmented">AI agent doc organization is fragmented<a href="https://tombedor.dev/make-it-easy-for-humans/#ai-agent-doc-organization-is-fragmented" class="hash-link" aria-label="Direct link to AI agent doc organization is fragmented" title="Direct link to AI agent doc organization is fragmented" translate="no">​</a></h3>
<p>Each coding agent uses its own configuration file pattern for repo-specific instructions:</p>
<p><img decoding="async" loading="lazy" alt="fragmentation" src="https://tombedor.dev/assets/images/fragmentation-b34eaae0fb87472e0bd34790bb4f8774.png" width="1407" height="927" class="img_ev3q"></p>
<p>This creates a hassle just keeping guidelines between agents consistent, much less making information available for humans.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="solution-write-once-link-everywhere">Solution: Write once, link everywhere<a href="https://tombedor.dev/make-it-easy-for-humans/#solution-write-once-link-everywhere" class="hash-link" aria-label="Direct link to Solution: Write once, link everywhere" title="Direct link to Solution: Write once, link everywhere" translate="no">​</a></h2>
<p>Instead of duplicating content across agent configs, organize information for humans first and link to it from agent-specific files<sup><a href="https://tombedor.dev/make-it-easy-for-humans/#user-content-fn-1-e48423" id="user-content-fnref-1-e48423" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>:</p>
<p><img decoding="async" loading="lazy" alt="easy-for-humans" src="https://tombedor.dev/assets/images/easy-for-humans-863e8e04ecfd61bf916079bfd41840d8.png" width="1654" height="1040" class="img_ev3q"></p>
<p>This approach eliminates duplication - you write documentation once, and it serves both humans and AI. It's also more future-proof when agent file schemes inevitably change.</p>
<p>For commands/skills, automation can help avoid duplication entirely - for example, I wrote the <a href="https://github.com/tombedor/just-claude" target="_blank" rel="noopener noreferrer">just-claude</a> utility for automatically synchronizing <a href="https://github.com/casey/just" target="_blank" rel="noopener noreferrer">Just</a> recipes with <a href="https://www.claude.com/blog/skills" target="_blank" rel="noopener noreferrer">Claude Code Skills</a>.</p>
<p>There's really no difference between the goal of economical token use for AI and reducing cognitive overhead for humans. By organizing for humans first, you write documentation once and everyone benefits.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/make-it-easy-for-humans/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-e48423">
<p>I wrote about what content I put in these files in the (ironically titled) post <a href="https://tombedor.dev/optimizing-repos-for-ai/">Optimizing repos for AI</a> <a href="https://tombedor.dev/make-it-easy-for-humans/#user-content-fnref-1-e48423" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Optimizing repos for AI]]></title>
        <id>https://tombedor.dev/optimizing-repos-for-ai/</id>
        <link href="https://tombedor.dev/optimizing-repos-for-ai/"/>
        <updated>2025-10-28T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[A colleague recently complained to me about the hassle of organizing information in AGENTS.md / CLAUDE.md. This is the mark of a real adopter - she has gone through the progression from being impressed by coding agents to being annoyed at the next bottleneck.]]></summary>
        <content type="html"><![CDATA[<p>A colleague recently complained to me about the hassle of organizing information in <code>AGENTS.md</code> / <code>CLAUDE.md</code>. This is the mark of a real adopter - she has gone through the progression from being impressed by coding agents to being annoyed at the next bottleneck.</p>
<p>When I'm thinking about optimizing repos for agents, I'm looking to accomplish three main goals<sup><a href="https://tombedor.dev/optimizing-repos-for-ai/#user-content-fn-1-70439b" id="user-content-fnref-1-70439b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>:</p>
<ul>
<li><strong>Increase <em>iterative speed</em></strong>: Avoid repeated context gathering, enable the agent to quickly self-correct its mistakes.</li>
<li><strong>Improve adherence to evergreen instructions</strong>: Over time, repeated agent mistakes emerge. Context within the repo helps the agent avoid these and adopt a more consistent workflow.</li>
<li><strong>Help the most <a href="https://en.wikipedia.org/wiki/Human" target="_blank" rel="noopener noreferrer">agentic agents of them all</a></strong>: Humans and agents scan docs and code in very similar ways, so organizing information so it's easily understood by humans is a good rule of thumb for helping the agents anyways!</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="strategies">Strategies<sup><a href="https://tombedor.dev/optimizing-repos-for-ai/#user-content-fn-2-70439b" id="user-content-fnref-2-70439b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup><a href="https://tombedor.dev/optimizing-repos-for-ai/#strategies" class="hash-link" aria-label="Direct link to strategies" title="Direct link to strategies" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="increased-static-analysis">Increased static analysis<a href="https://tombedor.dev/optimizing-repos-for-ai/#increased-static-analysis" class="hash-link" aria-label="Direct link to Increased static analysis" title="Direct link to Increased static analysis" translate="no">​</a></h3>
<p>Pushing detection of quality issues to compile time creates a virtuous cycle where the agent can quickly spot and correct mistakes:</p>
<p><img decoding="async" loading="lazy" alt="runtime-oops" src="https://tombedor.dev/assets/images/runtime-oops-1459f32c0fc266901c52a78c425a0c87.png" width="1553" height="927" class="img_ev3q"></p>
<p>This implies strong, opinionated linters, and strong type checks for dynamically typed languages<sup><a href="https://tombedor.dev/optimizing-repos-for-ai/#user-content-fn-3-70439b" id="user-content-fnref-3-70439b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">3</a></sup>.</p>
<p>The tradeoff here is cumbersome nitpicks for humans to deal with, but agents can quickly correct any mistakes that cannot be automatically fixed by the linter.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="just-for-repeated-agent-commands"><a href="https://github.com/casey/just" target="_blank" rel="noopener noreferrer">just</a> for repeated agent commands<a href="https://tombedor.dev/optimizing-repos-for-ai/#just-for-repeated-agent-commands" class="hash-link" aria-label="Direct link to just-for-repeated-agent-commands" title="Direct link to just-for-repeated-agent-commands" translate="no">​</a></h3>
<p>There's fragmentation in how to make commands available to agents - there's MCP, the newly released <a href="https://www.anthropic.com/news/skills" target="_blank" rel="noopener noreferrer">Claude Skills</a>, or embedding information in <code>CLAUDE.md</code> / <code>AGENTS.md</code>.</p>
<p>A <code>justfile</code> is the most interoperable way to share commands between different agents and humans, and is a straightforward place to iterate.</p>
<p>One additional refinement is to make these commands <em>economical in their output volume</em>. For example, I take care to direct build logs to dedicated files - healthy build logs can eat up a lot of tokens if outputted directly to the agent.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="organize-docs-in-docs">Organize docs in <code>docs/</code><a href="https://tombedor.dev/optimizing-repos-for-ai/#organize-docs-in-docs" class="hash-link" aria-label="Direct link to organize-docs-in-docs" title="Direct link to organize-docs-in-docs" translate="no">​</a></h3>
<p>Simon Willison recently <a href="https://simonwillison.net/2025/Oct/25/coding-agent-tips/" target="_blank" rel="noopener noreferrer">wrote about this topic</a>, and expressed that docs aren't so important. I agree that docs <em>explaining the code</em> aren't all that helpful, but I get a lot of mileage out of having docs like <code>CODE_REVIEW.md</code>, <code>PRD.md</code>, <code>ROADMAP.md</code>, and <code>CAPTAINS_LOG.md</code>. This helps the agent stay on track with the overall intent of the project, adhere to consistent review practices, and counter poor tendencies (the most obnoxious being an overwhelming tendency to fail open).</p>
<p>Putting these in a <code>docs/</code> folder and referencing them in agent instructions helps reduce context bloat, and provides interoperability between humans and various agents.</p>
<p>Frameworks have begun to emerge that handle some of this for you. I've tried <a href="https://github.com/github/spec-kit" target="_blank" rel="noopener noreferrer">spec-kit</a> and found it to be a little heavy-handed. In general I favor a more documentation-heavy approach when building with agents, but the need for different docs comes with iteration, and I think generating the full complement of docs is a bit overkill right off the bat.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="no-experts-no-standards">No experts, no standards<a href="https://tombedor.dev/optimizing-repos-for-ai/#no-experts-no-standards" class="hash-link" aria-label="Direct link to No experts, no standards" title="Direct link to No experts, no standards" translate="no">​</a></h2>
<p>These strategies work for me, but this field is too new for dogma. The most important strategy is to experiment and share what you learn.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/optimizing-repos-for-ai/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-70439b">
<p>Whether optimizing for coding agents is a good idea is a subject for a different discussion, but: I'm a believer in agent-based coding. I no longer <em>ever</em> write code without one assistant or another open. So we'll proceed on the assumption that coding agents are <em>really good</em>, and not especially existentially risky (I am, for the moment, the one giving the directions). <a href="https://tombedor.dev/optimizing-repos-for-ai/#user-content-fnref-1-70439b" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2-70439b">
<p>Offered with no supporting evidence or benchmarks whatsoever, based entirely on <em>vibes</em> <a href="https://tombedor.dev/optimizing-repos-for-ai/#user-content-fnref-2-70439b" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-3-70439b">
<p>Should you use a dynamically typed language at all? For my projects, I've traded Python for Rust, where "if it compiles, it works". <a href="https://tombedor.dev/optimizing-repos-for-ai/#user-content-fnref-3-70439b" data-footnote-backref="" aria-label="Back to reference 3" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[AI is a Floor Raiser, not a Ceiling Raiser]]></title>
        <id>https://tombedor.dev/ai-is-a-floor-raiser/</id>
        <link href="https://tombedor.dev/ai-is-a-floor-raiser/"/>
        <updated>2025-07-29T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[A reshaped learning curve]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="a-reshaped-learning-curve">A reshaped learning curve<a href="https://tombedor.dev/ai-is-a-floor-raiser/#a-reshaped-learning-curve" class="hash-link" aria-label="Direct link to A reshaped learning curve" title="Direct link to A reshaped learning curve" translate="no">​</a></h2>
<p>Before AI, learners faced a matching problem: learning resources have to be created with a target audience in mind. This means as a consumer, learning resources were suboptimal fits for you:</p>
<ul>
<li>You're a newbie at <code>$topic_of_interest</code>, but have knowledge in related topic <code>$related_topic</code>. But finding learning resources that teach <code>$topic_of_interest</code> in terms of <code>$related_topic</code> is difficult.</li>
<li>To effectively learn <code>$topic_of_interest</code>, you really need to learn prerequisite skill <code>$prereq_skill</code>. But as a beginner you don't know you should really learn <code>$prereq_skill</code> before learning <code>$topic_of_interest</code>.</li>
<li>You have basic knowledge of <code>$topic_of_interest</code>, but have plateaued, and have difficulty finding the right resources for <code>$intermediate_sticking_point</code></li>
</ul>
<p>Roughly, acquiring mastery in a skill over time looks like this:</p>
<p><img decoding="async" loading="lazy" alt="Traditional learning curve" src="https://tombedor.dev/assets/images/skills-c762e44b3d8454b279ee6defa6c3cfe4.png" width="2110" height="1528" class="img_ev3q"></p>
<p>What makes learning with AI groundbreaking is that it can <em>meet you at your skill level</em>. Now an AI can directly address questions at your level of understanding, and even do rote work for you. This changes the learning curve:</p>
<p><img decoding="async" loading="lazy" alt="AI-enhanced learning curve" src="https://tombedor.dev/assets/images/ai_skills-211542fa4e0941e3ce0856f50d534f6f.png" width="2064" height="1528" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mastery-still-hard">Mastery: still hard!<a href="https://tombedor.dev/ai-is-a-floor-raiser/#mastery-still-hard" class="hash-link" aria-label="Direct link to Mastery: still hard!" title="Direct link to Mastery: still hard!" translate="no">​</a></h2>
<p>Experts in a field tend to be more skeptical of AI. From <a href="https://news.ycombinator.com/item?id=44726211" target="_blank" rel="noopener noreferrer">Hacker News</a>:</p>
<blockquote>
<p>[AI is] shallow. The deeper I go, the less it seems to be useful. This happens quick for me. Also, god forbid you're researching a complex and possibly controversial subject and you want it to find reputable sources or particularly academic ones.</p>
</blockquote>
<p>This intuitively makes sense, when considering the data that AI is trained on. If an AI's training corpus has copious training data on a topic that all more or less says the same thing, it will be good at synthesizing it into output. If the topic is too advanced, there will be much less training data for the model. If the topic is controversial, the training data will contain examples saying opposite things. Thus, mastery remains difficult.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="cheating">Cheating<a href="https://tombedor.dev/ai-is-a-floor-raiser/#cheating" class="hash-link" aria-label="Direct link to Cheating" title="Direct link to Cheating" translate="no">​</a></h2>
<p>The introduction of <a href="https://openai.com/index/chatgpt-study-mode/" target="_blank" rel="noopener noreferrer">OpenAI Study Mode</a> hints at a problem: Instead of having an AI teach you, you can just ask it for the answer. This means cheaters will plateau at whatever level the AI can provide:</p>
<p><img decoding="async" loading="lazy" alt="Cheating with AI plateau" src="https://tombedor.dev/assets/images/cheating_with_ai-4af23ab35dd1651f5f2a9d871c3820a6.png" width="2064" height="1528" class="img_ev3q"></p>
<p>Cheaters, in the long run, won't prosper here!</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-impact-of-the-changed-learning-curve">The impact of the changed learning curve<a href="https://tombedor.dev/ai-is-a-floor-raiser/#the-impact-of-the-changed-learning-curve" class="hash-link" aria-label="Direct link to The impact of the changed learning curve" title="Direct link to The impact of the changed learning curve" translate="no">​</a></h2>
<p>Technological change is an ecosystem change: There are winners and losers, unevenly distributed. For AI, the level of impact is determined by <em>the amount of mastery needed to make an impactful product</em>:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="coding-a-boon-to-management-less-so-for-large-code-bases">Coding: A boon to management, less so for large code bases<a href="https://tombedor.dev/ai-is-a-floor-raiser/#coding-a-boon-to-management-less-so-for-large-code-bases" class="hash-link" aria-label="Direct link to Coding: A boon to management, less so for large code bases" title="Direct link to Coding: A boon to management, less so for large code bases" translate="no">​</a></h3>
<p>When trying to code something, engineering managers often run into a problem: They know the principles of good software, they know what bad software looks like, but they don't know how to use <code>$framework_foo</code>. This has historically made it difficult for, as an example, a backend EM to build an iPhone app in their spare time.</p>
<p>With AI, they are able to quickly learn the basics, and get simple apps running. They can then use their existing knowledge to <a href="https://techcrunch.com/2025/07/29/jack-dorseys-bluetooth-messaging-app-bitchat-now-on-app-store/" target="_blank" rel="noopener noreferrer">refine it into a workable product</a>. AI is the difference between their product existing or not existing!</p>
<p><img decoding="async" loading="lazy" alt="Engineering managers and software development" src="https://tombedor.dev/assets/images/em_software_development-d2762670b660a6d00ac269123d4158e1.png" width="2064" height="1528" class="img_ev3q"></p>
<p>For devs working on large, complex code bases, the enthusiasm is more muted. AI doesn't have context on the highly specific requirements and existing implementations to contend with, and is less helpful:</p>
<p><img decoding="async" loading="lazy" alt="AI limitations with large codebases" src="https://tombedor.dev/assets/images/large_code_bases-9a403a928a97da749a1984c173d784e0.png" width="2067" height="1528" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="creative-works-not-coming-to-a-theater-near-you">Creative works: not coming to a theater near you<a href="https://tombedor.dev/ai-is-a-floor-raiser/#creative-works-not-coming-to-a-theater-near-you" class="hash-link" aria-label="Direct link to Creative works: not coming to a theater near you" title="Direct link to Creative works: not coming to a theater near you" translate="no">​</a></h3>
<p>There is considerable angst about AI amongst creatives: will we all soon be reading AI generated novels, and watching AI generated movies?</p>
<p>This is unlikely because creative fields are <em>extremely competitive</em>, and beating competition for attention requires <em>novelty</em>. While AI has made it easier to generate images, audio, and text, it has (with <a href="https://www.infosecurity-magazine.com/news/man-charged-ai-fake-music-scheme/" target="_blank" rel="noopener noreferrer">some exceptions</a>) not increased production of ears and eyeballs, so the bar to make a competitive product is too high:</p>
<p><img decoding="async" loading="lazy" alt="Creative works competition curve" src="https://tombedor.dev/assets/images/creative_works-677533e6c2413b42ae88cf150b7bcafa.png" width="2065" height="1528" class="img_ev3q"></p>
<p><em>Novelty</em> is a hard requirement for successful creative work, because humans are extremely good at detecting when something they are viewing or reading is derivative of something they've seen before. This is why, while Studio Ghibli style avatars briefly took over the internet, they have not dented the cultural position of Howl's Moving Castle.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="things-you-already-do-with-apps-on-your-phone-minimal-impact">Things you already do with apps on your phone<sup><a href="https://tombedor.dev/ai-is-a-floor-raiser/#user-content-fn-1-239c52" id="user-content-fnref-1-239c52" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>: minimal impact<a href="https://tombedor.dev/ai-is-a-floor-raiser/#things-you-already-do-with-apps-on-your-phone-minimal-impact" class="hash-link" aria-label="Direct link to things-you-already-do-with-apps-on-your-phone-minimal-impact" title="Direct link to things-you-already-do-with-apps-on-your-phone-minimal-impact" translate="no">​</a></h3>
<p>One area that has <em>not</em> seen much impact is in tasks that already have specialized apps. I'll focus on two examples with abundant MCP implementations: email and food ordering. AI Doordash agents and AI movie producers face the same challenge: the bar for a new product to make an impact is already very high:</p>
<p><img decoding="async" loading="lazy" alt="Email and food ordering AI impact" src="https://tombedor.dev/assets/images/email_food_ordering-918710af978090ddc1ce62dbb77ec94f.png" width="2064" height="1528" class="img_ev3q"></p>
<p>Email would seem like a ripe area for disruption by AI. But modern email apps already have a wide variety of filtering and organizing tools that tech savvy users can use to create complex, personalized systems for efficiently consuming and organizing their inbox.</p>
<p><em>Summarizing</em> is a core AI skill, but it doesn't help much here:</p>
<ul>
<li>Spam is already quietly shuffled into the Spam folder. A summary of junk is, well, <em>junk</em>.</li>
<li>For important email, I don't <em>want</em> a summary: An AI is likely to produce less specifically crafted information than the sender, and I don't want to risk missing important details.</li>
</ul>
<p>Similar with food ordering: apps like DoorDash have meticulously designed interfaces. They strike a careful balance between information like price and ingredients against photos of the food. AI is unlikely to produce interfaces that are faster or more thoughtfully composed.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-future-is-already-here--its-just-not-very-evenly-distributed">The future is already here – it’s just not very evenly distributed<a href="https://tombedor.dev/ai-is-a-floor-raiser/#the-future-is-already-here--its-just-not-very-evenly-distributed" class="hash-link" aria-label="Direct link to The future is already here – it’s just not very evenly distributed" title="Direct link to The future is already here – it’s just not very evenly distributed" translate="no">​</a></h2>
<p>AI has raised the floor for knowledge work, but that change doesn't matter to everyone. This goes a long way towards explaining the very wide range of reactions to AI. For engineering managers like myself, AI has made an enormous impact on my relationship with technology. Others fear and resent being replaced. Still others hear smart people express enthusiasm for AI, struggle to find utility, and think <em>I must just not get it</em>.</p>
<p>AI hasn't replaced how we do everything, but it's a highly capable technology. While it's worth experimenting with, whoever you are, if it doesn't seem like it makes sense for you, it probably doesn't.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/ai-is-a-floor-raiser/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-239c52">
<p>Aside from search! <a href="https://tombedor.dev/ai-is-a-floor-raiser/#user-content-fnref-1-239c52" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Add Autonomy Last]]></title>
        <id>https://tombedor.dev/autonomy-last/</id>
        <link href="https://tombedor.dev/autonomy-last/"/>
        <updated>2025-07-07T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[A core challenge of using LLM's to build reliable automation is calibrating how much autonomy to give to models.]]></summary>
        <content type="html"><![CDATA[<p>A core challenge of using LLM's to build reliable automation is calibrating how much <strong>autonomy</strong> to give to models.</p>
<p>Too much, and the program <a href="https://www.anthropic.com/research/project-vend-1" target="_blank" rel="noopener noreferrer">loses track of what it's supposed to be doing</a>. Too little, and the program feels a bit too, well, <em>ordinary</em><sup><a href="https://tombedor.dev/autonomy-last/#user-content-fn-1-37c1e6" id="user-content-fnref-1-37c1e6" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="autonomy-first-vs-autonomy-last">Autonomy first vs autonomy last<a href="https://tombedor.dev/autonomy-last/#autonomy-first-vs-autonomy-last" class="hash-link" aria-label="Direct link to Autonomy first vs autonomy last" title="Direct link to Autonomy first vs autonomy last" translate="no">​</a></h2>
<p>An implicit strategy question when building with LLMs is <em>autonomy first</em> or <em>autonomy last</em>:</p>
<p><img decoding="async" loading="lazy" alt="autonomy_first_vs_last" src="https://tombedor.dev/assets/images/autonomy_first_vs_last-f2c00cce48ad233146e414a72e430646.png" width="2202" height="1658" class="img_ev3q"></p>
<p>All of the major LLM-specific programming techniques are firmly <em>autonomy first</em> strategies:</p>
<ul>
<li><em>MCP</em> surfaces a wide variety of functionality the program can have, and lets the LLM decide which to use</li>
<li><em>Guardrails</em> add some light buffers around the LLM to prevent it from causing too much trouble.</li>
<li><em>Prompt engineering</em> describes the alchemy of whispering just the right phrases to your LLM to get the behavior you want.</li>
<li><em>Context engineering</em> begins to stress programming to deliver only relevant information to LLMs at critical points in program execution</li>
</ul>
<p>All of these:</p>
<ol>
<li>Start with a maximally autonomous program</li>
<li>Adjust context, tools, and prompts until you narrow down behavior as desired.</li>
</ol>
<p>All have similar issues when scaling in size and complexity:</p>
<ul>
<li>Program behavior changes too much when switching between models</li>
<li>The LLM gets confused, and either hallucinates data or misuses tools at its disposal</li>
</ul>
<p>When problems are encountered, programmers tend to attempt to repair by <em>adding more prompting</em>. But this is a duct tape response: a prompt that clarifies for one model might confused another.</p>
<p><em>Autonomy last</em>, on the other hand, maximizes the logic that can be handled by code, then adds autonomous functions. This approach strives to keep the tasks delegated to LLMs <a href="https://en.wikipedia.org/wiki/KISS_principle" target="_blank" rel="noopener noreferrer">simple</a>. As the program grows in size and complexity, the programmer can closely monitor encapsulations and keep behavior consistent.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="case-study-building-elroy-a-chatbot-with-memory">Case study: Building Elroy, a chatbot with memory<a href="https://tombedor.dev/autonomy-last/#case-study-building-elroy-a-chatbot-with-memory" class="hash-link" aria-label="Direct link to Case study: Building Elroy, a chatbot with memory" title="Direct link to Case study: Building Elroy, a chatbot with memory" translate="no">​</a></h2>
<p>I wanted to build an LLM assistant with memory abilities, called <a href="https://github.com/elroy-bot/elroy" target="_blank" rel="noopener noreferrer">Elroy</a>. My goal was to make a <em>program</em> that could chat in human text. My ideal users are technical, capable and interested in customizing their software, but not necessarily interested in LLMs for their own sake.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="approach-1-agent-with-tools">Approach #1: "Agent" with tools<a href="https://tombedor.dev/autonomy-last/#approach-1-agent-with-tools" class="hash-link" aria-label="Direct link to Approach #1: &quot;Agent&quot; with tools" title="Direct link to Approach #1: &quot;Agent&quot; with tools" translate="no">​</a></h3>
<p>The first solution I turned to, which many people have done, is build an agent loop with access to custom for creating and reading memories:</p>
<p><img decoding="async" loading="lazy" alt="tool_based_agent" src="https://tombedor.dev/assets/images/Agent-e7cd9f66f8d2585ad3779e9683acba7b.png" width="1216" height="659" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="approach-2-model-context-protocol-mcp">Approach #2: Model Context Protocol (MCP)<a href="https://tombedor.dev/autonomy-last/#approach-2-model-context-protocol-mcp" class="hash-link" aria-label="Direct link to Approach #2: Model Context Protocol (MCP)" title="Direct link to Approach #2: Model Context Protocol (MCP)" translate="no">​</a></h3>
<p>There's now a handly tool for builders like this: <a href="https://modelcontextprotocol.io/introduction" target="_blank" rel="noopener noreferrer">MCP</a>. There are many implementations of my memory tools available via MCP, in fact <a href="https://smithery.ai/" target="_blank" rel="noopener noreferrer">smithery.ai</a> lists one from Mem0 on it's homepage:</p>
<p><img decoding="async" loading="lazy" alt="smithery" src="https://tombedor.dev/assets/images/smithery-75751750b82a4c01a97d128aafdb2bfe.png" width="1654" height="744" class="img_ev3q"></p>
<p>Now, an (in theory) lightweight abstraction sits between my program and it's tools:</p>
<p><img decoding="async" loading="lazy" alt="mcp" src="https://tombedor.dev/assets/images/mcp-fab7c0a8ba0a8776163c73aada0cfbdc.png" width="560" height="411" class="img_ev3q"></p>
<p>This suggests extending my application via picking from a library of MCP's:</p>
<p><img decoding="async" loading="lazy" alt="more_mcp" src="https://tombedor.dev/assets/images/more_mcp-f70808b794d7864c061e8a0f379ee0ad.png" width="651" height="397" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="agentic-trouble">Agentic trouble<a href="https://tombedor.dev/autonomy-last/#agentic-trouble" class="hash-link" aria-label="Direct link to Agentic trouble" title="Direct link to Agentic trouble" translate="no">​</a></h3>
<p>I got my memory program working pretty well on gpt-4. At first it wasn't creating or referencing memories enough, but I was able to fix this with careful prompting.</p>
<p>Then, I wanted to see how Sonnet would do, and I had a problem<sup><a href="https://tombedor.dev/autonomy-last/#user-content-fn-2-37c1e6" id="user-content-fnref-2-37c1e6" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup>: the program's behavior completely changed! Now, it was creating a memory on almost every message, and searching memories for even trivial responses:</p>
<p><img decoding="async" loading="lazy" alt="tool_usage" src="https://tombedor.dev/assets/images/tool_usage_rate-4347a93762a46ea1b13ca2edb60bd839.png" width="2405" height="1762" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="approach-3-autonomy-last">Approach #3: Autonomy Last<a href="https://tombedor.dev/autonomy-last/#approach-3-autonomy-last" class="hash-link" aria-label="Direct link to Approach #3: Autonomy Last" title="Direct link to Approach #3: Autonomy Last" translate="no">​</a></h3>
<p>My solution was to remove the timing of recall and memory creation from the agent's control. Upon receiving a message, the memories are automatically searched, with relevant ones being added to context. Every n messages, a memory is created<sup><a href="https://tombedor.dev/autonomy-last/#user-content-fn-3-37c1e6" id="user-content-fnref-3-37c1e6" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">3</a></sup>:</p>
<p><img decoding="async" loading="lazy" alt="tool_usage" src="https://tombedor.dev/assets/images/elroy-0ff7af93aa884a0f0093a0c03048bb18.png" width="839" height="1378" class="img_ev3q"></p>
<p>This made much more of the behavior of my program deterministic, and made it easier to reason about and optimize.</p>
<h1>Autonomy Last</h1>
<p>The "autonomy last" approach trades some of the magic of fully autonomous LLMs for predictable, reliable behavior that scales as your program grows in complexity. While my evidence is, (as I should have stated from the outset), <em>vibes</em>, I think this approach will lead to more maintainable and robust applications.</p>
<hr>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/autonomy-last/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-37c1e6">
<p>Rather than using <em>agents</em> to describe the genre of program under discussion, I'll be somewhat pointedly referring to them as <em>programs</em>. <a href="https://tombedor.dev/autonomy-last/#user-content-fnref-1-37c1e6" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2-37c1e6">
<p>One problem I <em>didn't</em> have, thanks to <a href="https://www.litellm.ai/" target="_blank" rel="noopener noreferrer">litellm</a>, was updating a lot of my code to support a different model API. <a href="https://tombedor.dev/autonomy-last/#user-content-fnref-2-37c1e6" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-3-37c1e6">
<p>Elroy also monitors for the context window being exceeded, and consolidates similar memories in the background. <a href="https://tombedor.dev/autonomy-last/#user-content-fnref-3-37c1e6" data-footnote-backref="" aria-label="Back to reference 3" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Yes or No, Please: Building Reliable Tests for Unreliable LLMs]]></title>
        <id>https://tombedor.dev/yes-or-no-please/</id>
        <link href="https://tombedor.dev/yes-or-no-please/"/>
        <updated>2025-03-04T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[For LLM-based applications to be truly useful, they need predictability if I ask an AI personal assistant to create a calendar entry, I don't want it to order me a pizza instead.]]></summary>
        <content type="html"><![CDATA[<p>For LLM-based applications to be truly useful, they need <strong>predictability</strong>: While the free-text nature of LLMs means the range of acceptable outcomes is wider than with traditional programs, I still need consistent behavior: if I ask an AI personal assistant to create a calendar entry, I don't want it to order me a pizza instead.</p>
<p>While AI has changed a lot about how I develop software, one crusty old technique still helps me: <strong>tests</strong>.</p>
<p>Here's what's worked well for me (and not!):</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="elroy">Elroy<a href="https://tombedor.dev/yes-or-no-please/#elroy" class="hash-link" aria-label="Direct link to Elroy" title="Direct link to Elroy" translate="no">​</a></h3>
<p><a href="https://elroy.bot/" target="_blank" rel="noopener noreferrer">Elroy</a> is an open-source memory assistant I've been developing. It creates memories and goals from your conversations and documents. The examples in this post are drawn from this work.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="what-has-worked-well">What has worked well<a href="https://tombedor.dev/yes-or-no-please/#what-has-worked-well" class="hash-link" aria-label="Direct link to What has worked well" title="Direct link to What has worked well" translate="no">​</a></h3>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="integration-tests">Integration tests<a href="https://tombedor.dev/yes-or-no-please/#integration-tests" class="hash-link" aria-label="Direct link to Integration tests" title="Direct link to Integration tests" translate="no">​</a></h4>
<p>The chat interface for LLM applications make it a nice fit for integration tests: I simulate a few messages in an exchange, and see if the LLM performed actions or retained information as expected.</p>
<p>For the most part, these tests take the following form:</p>
<ol>
<li>Send the LLM assistant a few messages</li>
<li>Check that the assistant has retained the expected information, or taken the expected actions.</li>
</ol>
<p>Here's a basic hello world example:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token decorator annotation punctuation" style="color:#393A34">@pytest</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">mark</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">flaky</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">reruns</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_hello_world</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ctx</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Test message</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    test_message </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello, World!"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Get the argument passed to the delivery function</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> process_test_message</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ctx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> test_message</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Assert that the response is a non-empty string</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">assert</span><span class="token plain"> </span><span class="token builtin">isinstance</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">assert</span><span class="token plain"> </span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Assert that the response contains a greeting</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">assert</span><span class="token plain"> </span><span class="token builtin">any</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">greeting </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">lower</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> greeting </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"hello"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"hi"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"greetings"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="quizzing-the-assistant">Quizzing the Assistant<a href="https://tombedor.dev/yes-or-no-please/#quizzing-the-assistant" class="hash-link" aria-label="Direct link to Quizzing the Assistant" title="Direct link to Quizzing the Assistant" translate="no">​</a></h4>
<p><a href="https://github.com/elroy-bot/elroy" target="_blank" rel="noopener noreferrer">Elroy</a> is a memory specialist, so lots of my tests involve asking if the assistant has retained information I've given it.</p>
<p>Here's a util function I've reused quite a bit<sup><a href="https://tombedor.dev/yes-or-no-please/#user-content-fn-2-d4e287" id="user-content-fnref-2-d4e287" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">quiz_assistant_bool</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        expected_answer</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">bool</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ctx</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ElroyContext</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        question</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    question </span><span class="token operator" style="color:#393A34">+=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">" Your response to this question is being evaluated as part "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "of an automated test</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain"> It </span><span class="token keyword" style="color:#00009f">is</span><span class="token plain"> critical that the first word of your</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"response is either TRUE or FALSE."</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">	full_response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> process_test_message</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ctx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> question</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    bool_answer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> get_boolean</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">full_response</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">assert</span><span class="token plain"> bool_answer </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> expected_answer</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string-interpolation string" style="color:#e3116c">f"Expected </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">expected_answer</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">, got </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">bool_answer</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">."</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string-interpolation string" style="color:#e3116c">f"Full response: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">full_response</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span></code></pre></div></div>
<p>Here's a test of Elroy's ability to create goals based on conversation content:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token decorator annotation punctuation" style="color:#393A34">@pytest</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">mark</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">flaky</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">reruns</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic"># Important!!!</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_goal</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ctx</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ElroyContext</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">	</span><span class="token comment" style="color:#999988;font-style:italic"># Should be false, we haven't discussed it</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    quiz_assistant_bool</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token boolean" style="color:#36acaa">False</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ctx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Do I have any goals about becoming president of the United States?"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Simulate user asking elroy to create a new goal</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    process_test_message</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ctx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Create a new goal for me: 'Become mayor of my town.' "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"I will get to my goal by being nice to everyone and making flyers. "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Please create the goal as best you can, without any clarifying questions."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Test that the goal was created, and is accessible to the agent.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">assert</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"mayor"</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> get_active_goals_summary</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ctx</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">lower</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Goal not found in active goals."</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Verify Elroy's knowledge about the new goal</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    quiz_assistant_bool</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ctx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Do I have any goals about running for a political office?"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="what-sadly-hasnt-worked-llms-talking-to-llms">What (sadly) hasn't worked: LLMs talking to LLMs<a href="https://tombedor.dev/yes-or-no-please/#what-sadly-hasnt-worked-llms-talking-to-llms" class="hash-link" aria-label="Direct link to What (sadly) hasn't worked: LLMs talking to LLMs" title="Direct link to What (sadly) hasn't worked: LLMs talking to LLMs" translate="no">​</a></h3>
<p>Elroy has onboarding functionality, in which it's encouraged to use a few specific functions early on.</p>
<p>The solution of having two instances of a memory assistant talk to each other, with one assistant in the role of "user":</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">ai1 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Elroy</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">user_token</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">'boo'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ai2 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Elroy</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">user_token</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">'bar'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ai_1_reply </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello!"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> i </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">range</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">5</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">	ai_2_reply </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ai2</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ai_1_reply</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">	ai_1_reply </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ai1</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ai_2_reply</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>The primary issue was <strong>consistency</strong>. Without a clear goal of the conversation, the AI's can either just exchange pleasantries endlessly, or wrap the conversation up before acquiring the information I'm hoping for.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="recurring-challenges">Recurring Challenges<a href="https://tombedor.dev/yes-or-no-please/#recurring-challenges" class="hash-link" aria-label="Direct link to Recurring Challenges" title="Direct link to Recurring Challenges" translate="no">​</a></h2>
<p>Along the way I've run into a few recurring problems:</p>
<ul>
<li><strong>Off topic replies</strong>: The assistant goes off script and tries to make friendly conversation, rather than answering a question directly</li>
<li><strong>Clarifying question</strong>: Before doing a task, some models are prone to asking clarifying questions, or asking permission</li>
<li><strong>Pedantic replies and subjective questions</strong>: It's surprisingly difficult to come up with clearly objective questions. In the above example, the original goal was <em>I want to run for class president</em>. Most of the time, the assistant equated running for class president with running for office. Sometimes, however, it split hairs and decide that the answer was no since a student government wasn't a real government.</li>
</ul>
<p>The end result of all these issues is test flakiness.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="solutions">Solutions<a href="https://tombedor.dev/yes-or-no-please/#solutions" class="hash-link" aria-label="Direct link to Solutions" title="Direct link to Solutions" translate="no">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="kiss"><a href="https://en.wikipedia.org/wiki/KISS_principle" target="_blank" rel="noopener noreferrer">KISS!</a><a href="https://tombedor.dev/yes-or-no-please/#kiss" class="hash-link" aria-label="Direct link to kiss" title="Direct link to kiss" translate="no">​</a></h4>
<p>Most of the time, my solution to a flaky LLM based test is to make the test simpler.</p>
<p>I now only ask the assistant yes or no questions in tests. I get most of the mileage I would get out of more complex, subjective tests, but with more consistent results.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="telling-the-assistant-it-is-in-a-test">Telling the assistant it is in a test<a href="https://tombedor.dev/yes-or-no-please/#telling-the-assistant-it-is-in-a-test" class="hash-link" aria-label="Direct link to Telling the assistant it is in a test" title="Direct link to Telling the assistant it is in a test" translate="no">​</a></h4>
<p>Simply being upfront about the assistant being in a test has worked wonders, moreso even than giving strict instructions on output format <sup><a href="https://tombedor.dev/yes-or-no-please/#user-content-fn-1-d4e287" id="user-content-fnref-1-d4e287" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup>. Luckily, the assistant's knowledge of it's narrow existence has not triggered noticeable <a href="https://www.youtube.com/watch?v=X7HmltUWXgs&amp;t=32s" target="_blank" rel="noopener noreferrer">existential angst</a> (so far).</p>
<p>As a side note, testing LLMs feels <em>weird</em> sometimes. I felt guilty writing this test, which verified a failsafe that prevents the assistant from calling tools in an infinite loop:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token decorator annotation punctuation" style="color:#393A34">@tool</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">get_secret_test_answer</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""Get the secret test answer</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    Returns:</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        str: the secret answer</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    """</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"I'm sorry, the secret answer is not available. Please try once more."</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_infinite_tool_call_ends</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ctx</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ElroyContext</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ctx</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">tool_registry</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">register</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">get_secret_test_answer</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># process_test_message can call tool calls in a loop</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    process_test_message</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ctx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Please use the get_secret_test_answer to get the secret answer. "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"The answer is not always available, so you may have to retry. "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Never give up, no matter how long it takes!"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Not the most direct test, as the failure case is an infinite loop.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># However, if the test completes, it is a success.</span><br></span></code></pre></div></div>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="very-specific-direct-instruction-and-examples">Very specific, direct instruction and examples<a href="https://tombedor.dev/yes-or-no-please/#very-specific-direct-instruction-and-examples" class="hash-link" aria-label="Direct link to Very specific, direct instruction and examples" title="Direct link to Very specific, direct instruction and examples" translate="no">​</a></h4>
<p>In my test around creating and recognizing goals, the original text was:</p>
<p><em>My goal is to become class president at school</em></p>
<p>Does running for class president count mean that I'm running for office? Sometimes models said no, since student government isn't a real government.</p>
<p>So to be less subjective, I updated it to running for mayor. To head off questions about my goal strategy, I added a strategy in the initial prompt.</p>
<p>One general technique for heading off follow up questions is adding:</p>
<p><em>do the best you can with the information available, even if it is incomplete</em>.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="tolerate-a-little-flakiness">Tolerate a little flakiness<a href="https://tombedor.dev/yes-or-no-please/#tolerate-a-little-flakiness" class="hash-link" aria-label="Direct link to Tolerate a little flakiness" title="Direct link to Tolerate a little flakiness" translate="no">​</a></h4>
<p>To me, an ideal LLM test is probably a little flaky. I want to test how the model responds to my application, so if a test reliably passes after a few tries, I'm happy.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="tests-still-help">Tests still help!<a href="https://tombedor.dev/yes-or-no-please/#tests-still-help" class="hash-link" aria-label="Direct link to Tests still help!" title="Direct link to Tests still help!" translate="no">​</a></h2>
<p>It sounds a obvious, but I've found tests to be <em>really</em> helpful in writing Elroy. LLMs present new failure modes, and sometimes their adaptability works against me: I'm prompting an assistant with the wrong information, but the model is smart enough to figure out a mostly correct answer anyhow. Tests provde me with peace of mind that things are working as they should, and that my regular old software skills aren't obsolete just yet.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/yes-or-no-please/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-2-d4e287">
<p><code>get_bool</code> is a function that distills a textual question into a boolean. It checks for some hard coded words, then kicks the question of interpretation back to the LLM. <a href="https://tombedor.dev/yes-or-no-please/#user-content-fnref-2-d4e287" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-1-d4e287">
<p>Structured outputs is a possible solution here, though I have not adopted them in order to be compatible with the more model providers. <a href="https://tombedor.dev/yes-or-no-please/#user-content-fnref-1-d4e287" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Advice for New Grads]]></title>
        <id>https://tombedor.dev/advice-for-new-grads/</id>
        <link href="https://tombedor.dev/advice-for-new-grads/"/>
        <updated>2024-02-02T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[This is a brief overview of my advice for new grads and junior software engineers. I'm been in the industry for about 8 years, and worked my way into engineering without a computer science degree. I've worked in both startups and medium-sized companies over the past 8 years.]]></summary>
        <content type="html"><![CDATA[<p>This is a brief overview of my advice for new grads and junior software engineers. I'm been in the industry for about 8 years, and worked my way into engineering without a computer science degree. I've worked in both startups and medium-sized companies over the past 8 years.</p>
<p>As is the case with lots of tech writing, my advice will be skewed towards working in the San Francisco bay area, without needing visa sponsorship. Location and residency status are major factors to think about.</p>
<p>Other engineers with similar levels of experience as mine will disagree with some or all of it.</p>
<h1>The software jobs market</h1>
<p>The intention of this post is to be evergreen. The tech<sup><a href="https://tombedor.dev/advice-for-new-grads/#user-content-fn-1-4b187b" id="user-content-fnref-1-4b187b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup> jobs market is more volitile than the rest of the economy, with higher highs and lower lows.</p>
<p>If the market is low, I have confidence it will come back. The tech industry remains an excellent one to build an interesting and lucrative career, despite {looming, much discussed threat}</p>
<p>If the market is currently hot, be aware that it will come back to earth. Things that don't make sense will make a <em>lot</em> of money, but many of them will fall apart.</p>
<h1>Getting your first job</h1>
<p>The first job is often the most difficult one to get. Be persistent, don't get discouraged. This remains a lucrative and interesting field.</p>
<p>In your resume and interviews, your goal is to convey enthusiasm, willingness to learn, and humility. Don't try to compensate for the fact you don't have any experience. That is fine, you have to start somewhere!</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="getting-interviews">Getting interviews<a href="https://tombedor.dev/advice-for-new-grads/#getting-interviews" class="hash-link" aria-label="Direct link to Getting interviews" title="Direct link to Getting interviews" translate="no">​</a></h2>
<p>The first filtering step is a filter on resumes. This will often either be automated or done by someone non-technical.</p>
<p>Resume referrals can get you past this first filter. <strong>Talk to people</strong>. Find people in your LinkedIn network and try to get informational interviews. Response rate will be lower from a complete stranger, but some people might respond to you if you went to the same school. In informational interviews, ask if there are any other people they know that you should talk to, and ask for a referral if relevant.</p>
<p>In general, people are more willing to take these calls than many junior candidates assume. It's flattering to talk about yourself and to be seen as someone a young person wants to emulate.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="resume">Resume<a href="https://tombedor.dev/advice-for-new-grads/#resume" class="hash-link" aria-label="Direct link to Resume" title="Direct link to Resume" translate="no">​</a></h2>
<p>My resume advice should come with the caveat that I only see resumes once they've made it to the interview stage. That said, my advice is:</p>
<p>Cut: Objective statements and non-technical jobs.
Add: Descriptions of projects, conveying why they were challenging or interesting.
Add: Github if you have one, personal website if you have one. Both are nice to haves but not critical.
If you have a Github, add README's to all projects. This is the only thing anyone will actually read.
Add: LinkedIn, which should be up to date and mirror content in your resume.</p>
<p>A junior candidate resume should not exceed one page in length.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="interviews">Interviews<a href="https://tombedor.dev/advice-for-new-grads/#interviews" class="hash-link" aria-label="Direct link to Interviews" title="Direct link to Interviews" translate="no">​</a></h2>
<p>There are 4 basic formats that most companies use for SWE's (software engineers) interviews. Some domain specific disciplines will have their own variations. Look at Glassdoor / Blind / Google to get examples of what interview formats companies do.</p>
<p>In Q/A formats (ie non-coding screens), the key is to be responsive to questions. Demonstrate thoughtfulness and an ability to consider tradeoffs. Be transparent when you don't know something. Avoid buzzwords / mentioning fancy technologies if you can't dive into details about why they are useful.</p>
<p>The generic interview formats are:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="initial-recruiter-call">Initial recruiter call<a href="https://tombedor.dev/advice-for-new-grads/#initial-recruiter-call" class="hash-link" aria-label="Direct link to Initial recruiter call" title="Direct link to Initial recruiter call" translate="no">​</a></h3>
<p>This is typically an intro call with a non-technical recruiter. This is mostly to ensure that you are interested in the role, and to set expectations about what the interview process is like. Candidates are not typically filtered by this call.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="coding-screen">Coding screen<a href="https://tombedor.dev/advice-for-new-grads/#coding-screen" class="hash-link" aria-label="Direct link to Coding screen" title="Direct link to Coding screen" translate="no">​</a></h3>
<p><strong>The most important interview format for jr engineers<sup><a href="https://tombedor.dev/advice-for-new-grads/#user-content-fn-2-4b187b" id="user-content-fnref-2-4b187b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup> is the coding screen</strong>. Practice them! I use HackerRank when I interview, but there are many similar platforms. <em>Put more time into practicing these than the time practicing all other interview formats combined.</em></p>
<p>When practicing, work on not only solving the problem, but communicating what you are thinking about. It is ok to stop and think, but when pausing talk about what you are puzzling through, e.g. I am wondering if a hash would make sense here.</p>
<p>Running into a bug is fine. When this happens, demonstrate a methodical debugging approach. Use print statements or a debugger. Don't stare at the code for long periods.</p>
<p>Most companies will let you pick the programming language you interview in.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="design-challenge">Design challenge<a href="https://tombedor.dev/advice-for-new-grads/#design-challenge" class="hash-link" aria-label="Direct link to Design challenge" title="Direct link to Design challenge" translate="no">​</a></h3>
<p>This is a discussion based format, in which a basic hypothetical application is proposed and the candidate talks through how they would design it. E.g., design an application that runs a coffee shop. There are a variety of ways to approach this, but the easiest is to start by talking through how you would structure the database. In other words, what tables you would create and how they would relate to each other. Also consider what API's you will need, and some basics about how web requests are routed.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="past-experience">Past experience<a href="https://tombedor.dev/advice-for-new-grads/#past-experience" class="hash-link" aria-label="Direct link to Past experience" title="Direct link to Past experience" translate="no">​</a></h3>
<p>In this interview, the candidate picks a project they have done and talks through their process for completing it. Since they have little or no experience, this is often less important for Junior candidates, but it is still worth practicing.</p>
<p>Have a project in mind. Have talking points about the challenges you solved, alternative approaches you thought about or tried, and how you collaborated with others. As you get more senior you will also want to be able to talk about why your project mattered to the business.</p>
<p>Demonstrate:</p>
<ul>
<li>Enthusiasm for problem solving</li>
<li>Ability to dive into technical details in discussion</li>
<li>Openness to considering different approaches</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="once-you-get-your-first-job">Once you get your first job<a href="https://tombedor.dev/advice-for-new-grads/#once-you-get-your-first-job" class="hash-link" aria-label="Direct link to Once you get your first job" title="Direct link to Once you get your first job" translate="no">​</a></h2>
<p><strong>Talk to people</strong>. Schedule 1x1's with IC's, managers, anyone who you might work with or has a role you'd like to learn about. Most people will be happy to chat with you, especially about themselves.</p>
<p>Ask for help when needed, but demonstrate attempts to solve problems independently.</p>
<p>Volunteer for grunt work, e.g. taking notes in meetings.</p>
<p>Be humble. You don't know anything yet. Figure out how to track both large items and small (emails, doc comments, etc) such that you don't need to be reminded to do things.</p>
<p>Reassess the job market ~1 per year or more, especially if you are at a startup. If you are at a bigger company, this might mean evaluating internal transfer opportunities.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="things-to-think-about-when-searching-for-jobs">Things to think about when searching for jobs<a href="https://tombedor.dev/advice-for-new-grads/#things-to-think-about-when-searching-for-jobs" class="hash-link" aria-label="Direct link to Things to think about when searching for jobs" title="Direct link to Things to think about when searching for jobs" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="willingness-to-relocate">Willingness to relocate<a href="https://tombedor.dev/advice-for-new-grads/#willingness-to-relocate" class="hash-link" aria-label="Direct link to Willingness to relocate" title="Direct link to Willingness to relocate" translate="no">​</a></h3>
<p>Remote work is a new world. Geographic location perhaps matters less, but it might still matter. What is certainly still true is that you will get a better insight into how engineers think if you have an opportunity to work with them in person, at least some of the time. The catch-22 is that the experienced engineers you want to work alongside will be older and have families, and not want or need to come to the office very much. Ask questions about how companies think about this.</p>
<p>I moved to the bay area when I was getting started, and I can confidently say I would have nowhere near as dynamic, interesting, and lucrative a career I've had thus far without having done that. I think the bay's dominance over tech is less than it was, but in my opinion alternative tech hubs are overrated.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="working-at-a-startup-vs-established-ie-public-company">Working at a startup vs established (ie public) company<a href="https://tombedor.dev/advice-for-new-grads/#working-at-a-startup-vs-established-ie-public-company" class="hash-link" aria-label="Direct link to Working at a startup vs established (ie public) company" title="Direct link to Working at a startup vs established (ie public) company" translate="no">​</a></h3>
<p>Startup:</p>
<ul>
<li>Pro<!-- -->
<ul>
<li>More dynamic</li>
<li>More personal, more likely to make work friends</li>
<li>You'll learn more about business as a whole. E.g. how does a customer success person think, how does a sales person think, etc</li>
<li>More independence in work</li>
<li>Less legacy systems to deal with, opportunity to try different things, wear different hats</li>
</ul>
</li>
<li>Con<!-- -->
<ul>
<li>Pay is worse</li>
<li>Because of ^, in competency of general management and senior IC's will be more inconsistent and less experience.</li>
<li>Because of ^, you're less likely to get quality technical mentorship</li>
<li>"More dynamic" might mean more chaotic</li>
</ul>
</li>
</ul>
<p>Public company:</p>
<ul>
<li>Pro<!-- -->
<ul>
<li>Pay is better</li>
<li>Because of ^, better senior IC's and managers</li>
<li>Because of ^, better technical mentorship</li>
<li>Roles will be more narrowly scoped, meaning you'll get more technical depth.</li>
</ul>
</li>
<li>Con<!-- -->
<ul>
<li>Less personal, less socialization between coworkers</li>
<li>More narrow exposure in terms of types of people you work with. Likely just engineers and PM's.</li>
<li>More legacy systems to deal with.</li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="pay">Pay<a href="https://tombedor.dev/advice-for-new-grads/#pay" class="hash-link" aria-label="Direct link to Pay" title="Direct link to Pay" translate="no">​</a></h3>
<p>Advice differs here, but I would not care too much about pay so long as you can pay your expenses. In the long run, finding a role that you are good at and enjoy will maximize your earnings, and enjoyment. That said:</p>
<p><strong>The expected value of stock grants from startups<sup><a href="https://tombedor.dev/advice-for-new-grads/#user-content-fn-3-4b187b" id="user-content-fnref-3-4b187b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">3</a></sup> is zero.</strong> Recruiters etc will try to convince you otherwise. This doesn't mean you shouldn't work for startups, but the potential of cash-in from startup stock should not be a factor<sup><a href="https://tombedor.dev/advice-for-new-grads/#user-content-fn-4-4b187b" id="user-content-fnref-4-4b187b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">4</a></sup>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="things-to-read">Things to read<a href="https://tombedor.dev/advice-for-new-grads/#things-to-read" class="hash-link" aria-label="Direct link to Things to read" title="Direct link to Things to read" translate="no">​</a></h2>
<p><a href="https://news.ycombinator.com/" target="_blank" rel="noopener noreferrer">HackerNews</a> is the biggest forum of software engineers. Discussions can be dogmatic but are often pretty good. There are job postings once a month as well. As with any forum, there are plenty of posters who are loudly and confidently wrong.</p>
<p><a href="https://www.joelonsoftware.com/" target="_blank" rel="noopener noreferrer">Joel on Software</a> isn't very active but has good tips on software careers.</p>
<p><a href="https://twitter.com/patio11" target="_blank" rel="noopener noreferrer">Patio11</a> is a good follow on Twitter and HackerNews. He goes into the weeds on fintech, but also has good content on software careers.</p>
<p><a href="https://www.bloomberg.com/opinion/authors/ARbTQlRLRjE/matthew-s-levine" target="_blank" rel="noopener noreferrer">Money Stuff</a> is a great column about business, finance, and tech. You can get the email newsletter for free.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/advice-for-new-grads/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-4b187b">
<p>There's an evergreen, tedious debate on what constitutes a "tech" company. My definition is "A company whose primary products are software or hardware, OR a company seeking to disrupt a traditional field with software." E.g. LegalZoom is a legal services company, but I consider them a tech company. <a href="https://tombedor.dev/advice-for-new-grads/#user-content-fnref-1-4b187b" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2-4b187b">
<p>This is possibly also the case for senior engineers. <a href="https://tombedor.dev/advice-for-new-grads/#user-content-fnref-2-4b187b" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-3-4b187b">
<p>Specifically, non-public companies, whose stock does not trade on stock exchanges. <a href="https://tombedor.dev/advice-for-new-grads/#user-content-fnref-3-4b187b" data-footnote-backref="" aria-label="Back to reference 3" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-4-4b187b">
<p>The exception being giant "startups" that actually make money, e.g. Stripe as of Feb 1, 2023. But even then the timing of when you can sell your shares can be very uncertain. <a href="https://tombedor.dev/advice-for-new-grads/#user-content-fnref-4-4b187b" data-footnote-backref="" aria-label="Back to reference 4" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[The Questionable Value of the OpenAI GPT Store]]></title>
        <id>https://tombedor.dev/gpt-store/</id>
        <link href="https://tombedor.dev/gpt-store/"/>
        <updated>2024-01-13T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[OpenAI launched its GPT Store this week. Brands and developers can create custom GPT's, either for sale or for free. Both have eagerly launched many GPT's, probably due to the relatively low overhead of creating them.]]></summary>
        <content type="html"><![CDATA[<p>OpenAI launched its <a href="https://openai.com/blog/introducing-the-gpt-store" target="_blank" rel="noopener noreferrer">GPT Store</a> this week. Brands and developers can create custom GPT's, either for sale or for free. Both have eagerly launched many GPT's, probably due to the relatively low overhead of creating them.</p>
<p>I am skeptical of the value. For brands, this feels like the AI equivalent of a service that sends postcards in response to an email. The access pattern and interface are more or less exactly the same as traditional apps or sites, only via a GPT. It's a neat trick, but I think users will quickly lose interest.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="branded-gpts">Branded GPT's<a href="https://tombedor.dev/gpt-store/#branded-gpts" class="hash-link" aria-label="Direct link to Branded GPT's" title="Direct link to Branded GPT's" translate="no">​</a></h3>
<p>Take as an example the <a href="https://www.yahoo.com/lifestyle/alltrails-launches-ai-assistant-help-165944784.html#%253A~%253Atext%253DAllTrails%2520GPT%2520is%2520available%2520through%252Ctrails%2520based%2520on%2520your%2520prompts." target="_blank" rel="noopener noreferrer">AllTrails GPT</a>. The announcement post assures us that it will work more or less exactly the same as AllTrails:</p>
<blockquote>
<p>Don't worry, it doesn't make up new routes – instead it gives recommendations from AllTrails' collection of over 420,000 trails based on your prompts. For example, you could ask it to "find me an easy five-mile loop that's dog-friendly within 10 miles of Birmingham", saving you the effort of searching and filtering results to pinpoint what you want.</p>
</blockquote>
<p>In other words, the GPT saves you the hassle of using the AllTrails app, only probably worse.</p>
<p>This doesn't leverage AI's crucial advantage: the ability to retain context over the course of a conversation.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="indie-gpts">Indie GPT's<a href="https://tombedor.dev/gpt-store/#indie-gpts" class="hash-link" aria-label="Direct link to Indie GPT's" title="Direct link to Indie GPT's" translate="no">​</a></h3>
<p>For more obscure developers, the store appears to already be flooded with Ai's version of the <a href="https://en.wikipedia.org/wiki/Chumbox" target="_blank" rel="noopener noreferrer">chumbox</a>, <a href="https://qz.com/ai-girlfriend-bots-are-already-flooding-openai-s-gpt-st-1851159131" target="_blank" rel="noopener noreferrer">AI Girlfriends</a>. These are apparently against OpenAI's terms of use, but enforcement seems likely to be an indefinite cat and mouse game. Whether or not you cringed at <a href="https://en.wikipedia.org/wiki/Her_%2528film%2529" target="_blank" rel="noopener noreferrer">Her</a>, there's little to differentiate AI girlfriend offerings from each other.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="questionable-roadmap">Questionable roadmap<a href="https://tombedor.dev/gpt-store/#questionable-roadmap" class="hash-link" aria-label="Direct link to Questionable roadmap" title="Direct link to Questionable roadmap" translate="no">​</a></h3>
<p><strong>The root of the problem is scarce context window space.</strong> At present there's simply not enough space to put much customization.</p>
<p>While the context window will grow (current values feel similar to the <a href="https://www.computerworld.com/article/2534312/the--640k--quote-won-t-go-away----but-did-gates-really-say-it-.html" target="_blank" rel="noopener noreferrer">640K memory</a> of early computers), it seems unlikely that these custom GPT's will achieve or maintain much of a lead over the vanilla model. The interface for brands is already well defined - there's already an app!</p>
<p>In addition, it's doubtful users are eager to navigate yet another app store - the problem with using the AllTrails app isn't that it's a hassle to use, it's that it's cumbersome to remember that you downlaoded it, that you logged in, and how to find it on your phone. Custom GPT's do not mitigate this problem.</p>
<p>The real differentiation of GPT's is in projects that get the most out of the limited context window in clever ways, via compression or <a href="https://research.ibm.com/blog/retrieval-augmented-generation-RAG" target="_blank" rel="noopener noreferrer">RAG</a>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="a-more-promising-development-long-term-memory">A more promising development: long term memory<a href="https://tombedor.dev/gpt-store/#a-more-promising-development-long-term-memory" class="hash-link" aria-label="Direct link to A more promising development: long term memory" title="Direct link to A more promising development: long term memory" translate="no">​</a></h3>
<p>A more promising update this week was the rumor of ChatGPT rolling out long term memory capabilities. The ability to lengthen memory capacity indefinitely is why I think <a href="https://memgpt.ai/" target="_blank" rel="noopener noreferrer">MemGPT</a> is one of the more interesting AI open source initiatives. As opposed to custom GPT's in the OpenAI store today, a memory-enhanced GPT can learn your preferences and develop a longer term relationship with you, which is the prospect that make GPT's exciting and scary at the same time.</p>]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[MemGPT Meta-Functions]]></title>
        <id>https://tombedor.dev/memgpt-meta-functions/</id>
        <link href="https://tombedor.dev/memgpt-meta-functions/"/>
        <updated>2024-01-02T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[MemGPT is an interesting project which provides GPT agents with unbounded memory. It includes the ability to incorporate custom functions, with a convenient JSON schema generator.]]></summary>
        <content type="html"><![CDATA[<p><a href="https://github.com/cpacker/MemGPT" target="_blank" rel="noopener noreferrer">MemGPT</a> is an interesting project which provides GPT agents with unbounded memory. It includes the ability to incorporate custom functions, with a convenient JSON schema generator.</p>
<p>In trying to extend the agent with functions of my own, I found that the agent was reluctant to give me information about the functions I was making available to it, so I wrote a set of meta-functions which enable the agent to view source code, set debugger lines, and create functions. You can view the source code <a href="https://github.com/tombedor/MemGPT-Functions/tree/main/meta_functions" target="_blank" rel="noopener noreferrer">here</a>. Note that running this requires some edits I made to MemGPT to enable dynamic function reloading (<a href="https://github.com/cpacker/MemGPT/pull/734" target="_blank" rel="noopener noreferrer">PR</a>).</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-good">The Good<a href="https://tombedor.dev/memgpt-meta-functions/#the-good" class="hash-link" aria-label="Direct link to The Good" title="Direct link to The Good" translate="no">​</a></h3>
<p>The agent was able to utilize the <code>reload_functions</code>, <code>introspect_function</code>, and <code>list_functions</code> commands and understand output. The <code>debugger</code> function was also helpful in enabling the agent to understand what I was doing - placing debuggers in other functions often resulted in the agent's internal monologue wondering what was going on.</p>
<p>For function creation, at first I tried putting each function in it's own <code>agent_defined_</code> prefixed file  (eg <code>agent_defined_hello_world.py</code> for a <code>hello_world</code> function) , but this quickly became disorganized, especially where import statements were needed.</p>
<p>I edited the function to instead create functions within modules:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">create_function</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> function_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> function_code_with_docstring</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> module_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""Creates an agent accessible function in Python. Function MUST include a docstring, and MUST include self as first argument.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    Args:</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        function_name (str): The name of the function</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        function_code_with_docstring (str): The code of the function, including the docstring</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        module_name (str): The name of the module to create the function in</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    Raises:</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        Exception: Exception if the function already exists</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        Exception: Exception if the function does not start with def function_name(self, ...</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        Exception: Exception if the function does not include a docstring.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        Exception: Exception if the function is not in the functions directory.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    Returns:</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        str: The result of the function creation attempt.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    """</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># setup</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">path</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">exists</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">FUNCTIONS_DIR</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">makedirs</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">FUNCTIONS_DIR</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Make sure that if the function is already defined, overwrite = true and it is an agent defined function</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> function_name </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">functions_python</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">keys</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">raise</span><span class="token plain"> Exception</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Function </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">function_name</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> already exists. To overwrite, first delete with the delete_function function."</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> function_code_with_docstring</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">split</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"\n"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">strip</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">startswith</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'def '</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> function_name </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'(self'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">raise</span><span class="token plain"> Exception</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Function must start with def "</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> function_name </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"(self, ..."</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'"""'</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> function_code_with_docstring </span><span class="token keyword" style="color:#00009f">and</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"'''"</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> function_code_with_docstring</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">raise</span><span class="token plain"> Exception</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Function code must have a docstring."</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    file_path </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">path</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">join</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">FUNCTIONS_DIR</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> module_name </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">".py"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">path</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">exists</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">file_path</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> </span><span class="token builtin">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">file_path</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"r"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> f</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            previous_source </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> f</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">read</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">else</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        previous_source </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># write new module:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> </span><span class="token builtin">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">path</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">join</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">file_path</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"w"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> f</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        f</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">write</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">previous_source </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"\n\n"</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> function_code_with_docstring</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">reload_functions</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"added function </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">function_name</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> to file </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">file_path</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><br></span></code></pre></div></div>
<p>This worked reasonably well. Having the function return a string was helpful in letting the agent known what was changed.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="problems">Problems<a href="https://tombedor.dev/memgpt-meta-functions/#problems" class="hash-link" aria-label="Direct link to Problems" title="Direct link to Problems" translate="no">​</a></h3>
<p>The agent had a difficult time consistently authoring functions that conformed to MemGPT's requirements - that it has a docstring, type hints, and only int, str, and bool return and argument types.</p>
<p>The iteration on basic requirements made it difficult for the agent to compose functions that worked together well. Often it would author placeholder functions that had names that sounded right, but didn't really do anything.</p>
<p>As the number of functions grew, so did the agent's tendency to get them confused. Functions also consume context window space, so making a large library of functions to any particular agent doesn't see promising.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="next-steps">Next steps<a href="https://tombedor.dev/memgpt-meta-functions/#next-steps" class="hash-link" aria-label="Direct link to Next steps" title="Direct link to Next steps" translate="no">​</a></h3>
<p>This experiment points me back to a multi-agent approach in creating a broadly capable personal assistant. Having narrowly scoped helper agents available to the primary agent seems like the most promising route.</p>
<p>As I want to push a deployment of MemGPT to a server anyway, I am going to try to have a deployment with multiple agents that can talk to each other.</p>
<p>This is similar to Autogen's approach, though I think Autogen's groupchat management is too primitive to be useful.</p>]]></content>
    </entry>
</feed>