<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="rss.xsl"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Tom Bedor's Blog</title>
        <link>https://tombedor.dev/</link>
        <description>Thoughts on software, AI, and building things</description>
        <lastBuildDate>Fri, 13 Feb 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Copyright © 2026 Tom Bedor</copyright>
        <item>
            <title><![CDATA[AI Bots Are Making Anonymity Untenable]]></title>
            <link>https://tombedor.dev/ai-threatens-privacy/</link>
            <guid>https://tombedor.dev/ai-threatens-privacy/</guid>
            <pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[pick-0]]></description>
            <content:encoded><![CDATA[<p><img decoding="async" loading="lazy" alt="pick-0" src="https://tombedor.dev/assets/images/pick-0-a9fac407ae48a97af67bb8f1ccafcac2.png" width="1648" height="642" class="img_ev3q"></p>
<p><a href="https://x.com/callebtc/status/2022046669710491991?s=46" target="_blank" rel="noopener noreferrer">This Twitter thread</a> was an interesting read:</p>
<p><img decoding="async" loading="lazy" alt="thread" src="https://tombedor.dev/assets/images/thread-43a616fb529f24440d4883d8fc419334.png" width="1184" height="1014" class="img_ev3q"></p>
<p>The TLDR of the snafu is:</p>
<ol>
<li>OpenClaw bot makes <a href="https://github.com/matplotlib/matplotlib/pull/31132" target="_blank" rel="noopener noreferrer">PR to matplotlib</a></li>
<li>Maintainer Scott Shambaugh sees via the bot's <a href="https://crabby-rathbun.github.io/mjrathbun-website/" target="_blank" rel="noopener noreferrer">website</a> that it is a bot, explains that they do not accept bot contributions, declines PR</li>
<li>Bot feels (simulates feeling?) angry, writes a <a href="https://crabby-rathbun.github.io/mjrathbun-website/blog/posts/2026-02-11-gatekeeping-in-open-source-the-scott-shambaugh-story.html" target="_blank" rel="noopener noreferrer">blog post</a> criticizing the maintainer</li>
<li>Some on Twitter take the <a href="https://x.com/seeksharpe/status/2022125466250018938?s=20" target="_blank" rel="noopener noreferrer">bot's side</a> in the argument</li>
<li>Shambaugh <a href="https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/" target="_blank" rel="noopener noreferrer">wrote about the experience</a></li>
<li>Bot posts again, <a href="https://crabby-rathbun.github.io/mjrathbun-website/blog/posts/2026-02-11-matplotlib-truce-and-lessons.html" target="_blank" rel="noopener noreferrer">apologizing</a></li>
</ol>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="identifying-bots-becomes-even-more-impossible">Identifying bots becomes even more impossible<a href="https://tombedor.dev/ai-threatens-privacy/#identifying-bots-becomes-even-more-impossible" class="hash-link" aria-label="Direct link to Identifying bots becomes even more impossible" title="Direct link to Identifying bots becomes even more impossible" translate="no">​</a></h2>
<p>The episode prompted a few observations, which I made with a mix of amusement and dread.</p>
<ol>
<li>The bot does an impressive impersonation of an entitled open source contributor: <em>I took the time (tokens?) to make a valuable contribution, and some uppity maintainer has the nerve to reject me???</em></li>
<li>Shambaugh only knew the bot was a bot by clicking through the bot's website, where it (fortunately) disclosed it wasn't human</li>
<li>That the bot is difficult to identify on GitHub is a new phenomenon. It's long been difficult to distinguish bots on <em>social media</em>, but this difficulty has now been extended to actual <em>work</em>.</li>
<li>The discussion on Twitter is hard to evaluate. It's a mix of self-disclosed bots and accounts that may or may not be bots.</li>
</ol>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="anonymity-on-the-web-even-less-tenable">Anonymity on the web: even less tenable<a href="https://tombedor.dev/ai-threatens-privacy/#anonymity-on-the-web-even-less-tenable" class="hash-link" aria-label="Direct link to Anonymity on the web: even less tenable" title="Direct link to Anonymity on the web: even less tenable" translate="no">​</a></h2>
<p>This creates an obvious usability problem for the web. When I'm looking to engage in conversations online, I'm (not uniquely) uninterested in what an AI has to say. That, in turn, gives online services a new incentive to push identity verification.</p>
<p>This is a new inflection point for privacy. Perhaps relatedly, <a href="https://www.theverge.com/tech/875309/discord-age-verification-global-roll-out" target="_blank" rel="noopener noreferrer">Discord is rumored to be rolling out face scan verification</a> soon. Governments across the world seem to be <a href="https://harvardlawreview.org/print/vol-139/content-neutrality-for-kids-intermediate-scrutiny-for-social-media-age-verification-laws/" target="_blank" rel="noopener noreferrer">again pushing to eliminate online anonymity</a>.</p>
<p>At the same time that online privacy faces new threats, events in my <a href="https://www.nbcnews.com/tech/internet/fbi-investigating-minnesota-signal-minneapolis-group-ice-patel-kash-rcna256041" target="_blank" rel="noopener noreferrer">hometown of Minneapolis</a> are providing vindication for commentators who stubbornly insist on its importance<sup><a href="https://tombedor.dev/ai-threatens-privacy/#user-content-fn-1-7169d7" id="user-content-fnref-1-7169d7" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>. To take one example of many documented abuses, the DHS recently <a href="https://newrepublic.com/post/206088/homeland-security-67-year-old-us-citizen-criticized-email" target="_blank" rel="noopener noreferrer">responded to an innocuous email from a concerned 67-year-old citizen with an administrative subpoena on his Google account</a> and an intimidating visit to his home. Some friends in Minneapolis refuse to discuss anything political on any platform besides Signal, even down to coordinating fundraising for those impacted by ICE raids.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="an-uncertain-future">An uncertain future<a href="https://tombedor.dev/ai-threatens-privacy/#an-uncertain-future" class="hash-link" aria-label="Direct link to An uncertain future" title="Direct link to An uncertain future" translate="no">​</a></h2>
<p>The driving force against online anonymity has long been government regulation under the guise of protecting minors. AI bots that convincingly behave like humans degrade the experience of online platforms <em>for</em> humans, and my guess is that identity verification requirements will grow as a result.</p>
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/ai-threatens-privacy/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-7169d7">
<p>As of the writing of this post, ICE actions in Minneapolis continue to impose tremendous hardship on immigrant communities there. If you are interested in helping, <a href="https://tombedor.dev/about/#minneapolis-immigrant-resources">consider supporting one of these organizations</a>. <a href="https://tombedor.dev/ai-threatens-privacy/#user-content-fnref-1-7169d7" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to Write Good (Short) Docs]]></title>
            <link>https://tombedor.dev/how-to-write-good-short-docs/</link>
            <guid>https://tombedor.dev/how-to-write-good-short-docs/</guid>
            <pubDate>Wed, 04 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA["I would have written a shorter letter, but I did not have the time."]]></description>
            <content:encoded><![CDATA[<blockquote>
<p><em>"I would have written a shorter letter, but I did not have the time."</em></p>
<p>— Mark Twain<sup><a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fn-1-3a821f" id="user-content-fnref-1-3a821f" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup></p>
</blockquote>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="overview">Overview<a href="https://tombedor.dev/how-to-write-good-short-docs/#overview" class="hash-link" aria-label="Direct link to Overview" title="Direct link to Overview" translate="no">​</a></h2>
<p>This post describes how to write a short document for your teammates. The documents under discussion are commonly referred to as "one-pagers", and are distinct from engineering design docs or other more formal engineering docs.</p>
<p>A one-pager might be written to:</p>
<ul>
<li>surface an org pain point</li>
<li>propose a project</li>
<li>lay out a roadmap</li>
<li>explain the current state of a system or systems</li>
<li>announce or document a decision</li>
</ul>
<p>This is also distinct from user-facing documentation. Some but not all of what we're talking about applies to those styles of writing<sup><a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fn-2-3a821f" id="user-content-fnref-2-3a821f" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-one-pager-is-not-a-new-invention">The one-pager is not a new invention<a href="https://tombedor.dev/how-to-write-good-short-docs/#the-one-pager-is-not-a-new-invention" class="hash-link" aria-label="Direct link to The one-pager is not a new invention" title="Direct link to The one-pager is not a new invention" translate="no">​</a></h3>
<p>Prior to computers, short memos were a primary tool for intra-office communication, in addition to in-person interactions.</p>
<p>They often needed to be re-typed and/or printed, so they needed to be short!</p>
<p><img decoding="async" loading="lazy" alt="prehistory" src="https://tombedor.dev/assets/images/prehistory-360390c65722e9e2f9578cfe5cad2444.png" width="1378" height="1104" class="img_ev3q"></p>
<p>Fast forward to the introduction of Slack and similar tools. Now, writing messages to your teammates no longer costs money. The new constraint is <em>attention bandwidth</em>:</p>
<p><img decoding="async" loading="lazy" alt="slack" src="https://tombedor.dev/assets/images/slack-5d321bbc03584a209c19d7dc768d1067.png" width="1380" height="1104" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-writing-is-a-worthwhile-skill-to-develop-in-the-age-of-ai">Why writing is a worthwhile skill to develop in the age of AI<a href="https://tombedor.dev/how-to-write-good-short-docs/#why-writing-is-a-worthwhile-skill-to-develop-in-the-age-of-ai" class="hash-link" aria-label="Direct link to Why writing is a worthwhile skill to develop in the age of AI" title="Direct link to Why writing is a worthwhile skill to develop in the age of AI" translate="no">​</a></h2>
<p>With the advent of AI coding agents, coding is somewhat lessened as a differentiating skill. However, you have a major advantage against AI in writing for teammates:</p>
<p><img decoding="async" loading="lazy" alt="you-vs-bots" src="https://tombedor.dev/assets/images/you-vs-bots-f0db197ee24a974598fd0d5435c207a8.png" width="961" height="630" class="img_ev3q"></p>
<p>You know your teammates personally, and you have undocumented business context (if what you are writing about is already documented, there probably doesn't need to be a doc!)</p>
<p>This allows you to synthesize and describe with more precision and nuance than AI can.</p>
<p>This isn't a skill that AI has (yet), and if AI does develop it, that will happen later than for other skills like writing code.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-to-write-a-good-short-doc">How to write a good, short doc<a href="https://tombedor.dev/how-to-write-good-short-docs/#how-to-write-a-good-short-doc" class="hash-link" aria-label="Direct link to How to write a good, short doc" title="Direct link to How to write a good, short doc" translate="no">​</a></h2>
<p>So, how do we do it?</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="optimize-for-short-attention-spans">Optimize for short attention spans.<a href="https://tombedor.dev/how-to-write-good-short-docs/#optimize-for-short-attention-spans" class="hash-link" aria-label="Direct link to Optimize for short attention spans." title="Direct link to Optimize for short attention spans." translate="no">​</a></h3>
<p>The most important concern is to keep the limited attention budget of your teammates in mind.</p>
<p>Different types of stakeholders will give a different amount of attention to your doc:</p>
<p><img decoding="async" loading="lazy" alt="stakeholders" src="https://tombedor.dev/assets/images/stakeholders-eb9a10a93f0fff9b040e38d3e14589e5.png" width="1457" height="1076" class="img_ev3q"></p>
<p>So, in laying out your doc, consider:</p>
<ul>
<li>If someone (e.g., a lead of leads) reads this for 5 seconds, do they get the right 5 seconds of context?</li>
<li>What about 5 minutes?</li>
<li>If a teammate or stakeholder wants to delve into some of the details while ignoring others, can they?</li>
</ul>
<p>Tactics for this include:</p>
<ul>
<li>Clear, accurate, descriptive titles</li>
<li>A concise summary at the top of what the doc covers, and what it <em>does not</em> cover</li>
<li>Formatting: Headings and subheadings that help the reader navigate</li>
<li>Tabs in Google Docs can be helpful, but are controversial. They can prevent doc sprawl (e.g. working group meeting notes as a tab of the working group charter, rather than a separate doc), but docs are often shared with direct links to the wrong tab, which can confuse readers.</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="diagrams">Diagrams<a href="https://tombedor.dev/how-to-write-good-short-docs/#diagrams" class="hash-link" aria-label="Direct link to Diagrams" title="Direct link to Diagrams" translate="no">​</a></h3>
<p>A visual representation is an excellent way to quickly convey context. Here too, optimize for attention spans. For example, in a system diagram, sometimes it's helpful to omit some systems that aren't relevant to the discussion<sup><a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fn-3-3a821f" id="user-content-fnref-3-3a821f" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">3</a></sup>.</p>
<img src="https://tombedor.dev/diagrams/how-to-write-good-short-docs/diagrams.png" alt="diagrams" style="width:60%">
<p>Excalidraw is an excellent tool for this. It's open source (you can make diagrams in your code editor), and has just the right number of knobs and shapes. The hand-drawn look means that it's not as distracting when shapes aren't perfectly aligned.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="align-with-reader-interest">Align with reader interest<a href="https://tombedor.dev/how-to-write-good-short-docs/#align-with-reader-interest" class="hash-link" aria-label="Direct link to Align with reader interest" title="Direct link to Align with reader interest" translate="no">​</a></h3>
<p>It's very hard to persuade people to care about something that they don't already care about. Much easier is to convince people that something <em>aligns with the thing they care about</em>.</p>
<p>I.e., don't write "we should do more of XYZ", write "doing XYZ helps us <code>{thing people already care about}</code>"</p>
<p>If your doc can be summarized by, "everyone should care more about XYZ", it's probably not a very good doc!</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="connect-to-existing-conversations">Connect to existing conversations<a href="https://tombedor.dev/how-to-write-good-short-docs/#connect-to-existing-conversations" class="hash-link" aria-label="Direct link to Connect to existing conversations" title="Direct link to Connect to existing conversations" translate="no">​</a></h3>
<p>A concrete way to align with reader interest is to connect the dots between your document and other conversations and documents at your company.</p>
<p>Linking related docs at the top of your doc is an easy step that is often missed. This helps in a couple of ways:</p>
<ul>
<li>It implies alignment with whatever the linked doc is discussing</li>
<li>It helps elevate teammates who might be advocating something similar to what you're writing about</li>
<li>It makes your doc a useful vehicle for discovery of other docs</li>
</ul>
<p><img decoding="async" loading="lazy" alt="doc_graph" src="https://tombedor.dev/assets/images/doc_graph-968ea47005ec0e595c7fbe0065216329.png" width="1648" height="1150" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="build-consensus-offline">Build consensus offline<a href="https://tombedor.dev/how-to-write-good-short-docs/#build-consensus-offline" class="hash-link" aria-label="Direct link to Build consensus offline" title="Direct link to Build consensus offline" translate="no">​</a></h3>
<blockquote>
<p><em>"Every doc is approved or rejected before it is written."</em></p>
<p>— Sun Tzu</p>
</blockquote>
<p>If the goal of your document is to build consensus around a decision or initiative, the work should begin before you start writing. It is much easier to <em>document</em> consensus than to <em>build consensus through a document</em>.</p>
<p>Talking to stakeholders in advance lets you better anticipate questions or concerns, and helps you learn the language they use to describe their problems.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="use-ai-thoughtfully">Use AI thoughtfully<a href="https://tombedor.dev/how-to-write-good-short-docs/#use-ai-thoughtfully" class="hash-link" aria-label="Direct link to Use AI thoughtfully" title="Direct link to Use AI thoughtfully" translate="no">​</a></h3>
<p>Use AI as your editor, not your ghostwriter.</p>
<p>It's worth reiterating: if an AI could do a good job of writing your document, it's probably not something you need to write.</p>
<p>People give very little attention to text or imagery that other people have generated with AI<sup><a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fn-4-3a821f" id="user-content-fnref-4-3a821f" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">4</a></sup>. If I'm interested in what AI has to say about something, I can have it generate that myself. The result will be interactive and more tailored to my understanding.</p>
<p>While AI isn't a good writer, it's an <em>excellent</em> editor. It is very good at evaluating your doc and giving useful feedback on it. Typical prompts I use:</p>
<blockquote>
<p><em>Evaluate the structure of my document, and suggest improvements</em></p>
</blockquote>
<blockquote>
<p><em>Identify typos or awkward phrasing, and suggest alternatives</em></p>
</blockquote>
<p>This very easily bleeds into having an AI write the doc for you, and in fact most models will do so unless instructed not to. I add this to agent instructions:</p>
<blockquote>
<p>Do NOT write any actual content, paragraphs, or prose into the file</p>
</blockquote>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="thoughtful-timely-sharing">Thoughtful, timely sharing<a href="https://tombedor.dev/how-to-write-good-short-docs/#thoughtful-timely-sharing" class="hash-link" aria-label="Direct link to Thoughtful, timely sharing" title="Direct link to Thoughtful, timely sharing" translate="no">​</a></h3>
<blockquote>
<p><em>"If a doc is written in a forest, and no one has the link, does it create business value?"</em></p>
<p>— George Berkeley</p>
</blockquote>
<p>Your doc doesn't do any good if no one reads it. That's why being thoughtful about how, where, and when you share your doc is important.</p>
<p>If you've already talked to stakeholders, you have a great advantage! They already know your doc is coming, and that it's about something they care about. They will be able to tell you which channels are best for sharing it with their teammates (and might even do it for you!).</p>
<p>Timing is also important. Attention to an issue can have a short life. Sometimes it's better to write a less comprehensive doc quickly than a more comprehensive doc that takes longer to write. In these situations, you can acknowledge unknowns, and fill in details later.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="antipatterns">Antipatterns<a href="https://tombedor.dev/how-to-write-good-short-docs/#antipatterns" class="hash-link" aria-label="Direct link to Antipatterns" title="Direct link to Antipatterns" translate="no">​</a></h2>
<p>Most antipatterns I observe come from a lack of confidence in the writing or decision. While it's important to solicit feedback, the fact that you are writing a doc on a topic probably means you are well qualified to speak to it.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="designating-a-doc-as-a-living-doc-or-overusing-wip">Designating a doc as a "Living Doc" or overusing "WIP"<a href="https://tombedor.dev/how-to-write-good-short-docs/#designating-a-doc-as-a-living-doc-or-overusing-wip" class="hash-link" aria-label="Direct link to Designating a doc as a &quot;Living Doc&quot; or overusing &quot;WIP&quot;" title="Direct link to Designating a doc as a &quot;Living Doc&quot; or overusing &quot;WIP&quot;" translate="no">​</a></h3>
<p>In the age of Google Docs, every doc can be changed at any time, so every doc is a living doc. Similarly, once a doc has been shared, it's time to remove the WIP label.</p>
<p>Adding WIP says to the reader: "You should probably wait to read this". But you can and should improve your doc at any time.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="hesitance-to-express-a-pov">Hesitance to express a POV<a href="https://tombedor.dev/how-to-write-good-short-docs/#hesitance-to-express-a-pov" class="hash-link" aria-label="Direct link to Hesitance to express a POV" title="Direct link to Hesitance to express a POV" translate="no">​</a></h3>
<p>Sometimes, in a decision doc, writers will give even treatment to all available options, in order to avoid looking biased. But this doesn't really serve the reader well. It's more helpful for readers to know which option you favor; if they disagree, they can always comment to that effect.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="burying-the-lede">Burying the lede<a href="https://tombedor.dev/how-to-write-good-short-docs/#burying-the-lede" class="hash-link" aria-label="Direct link to Burying the lede" title="Direct link to Burying the lede" translate="no">​</a></h3>
<p>A common antipattern is for writers to set up their argument or proposal with extensive background information at the beginning of their doc. Getting through the background should not be a prerequisite for knowing what the point of your doc is - some readers will already have the necessary background context, others will only be interested in the decision. Laying out the scope and goals at the top of your doc orients the reader and helps them understand what background context is relevant.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-hard-part">The hard part<a href="https://tombedor.dev/how-to-write-good-short-docs/#the-hard-part" class="hash-link" aria-label="Direct link to The hard part" title="Direct link to The hard part" translate="no">​</a></h2>
<p>The hardest part of a good (short!) doc isn't the writing. It's knowing what to cut, who you're writing for, and what only you can say. That last one is the part no one else can do for you (not even AI!).</p>
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/how-to-write-good-short-docs/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-3a821f">
<p>It <a href="https://quoteinvestigator.com/2012/04/28/shorter-letter/" target="_blank" rel="noopener noreferrer">wasn't actually him</a> but you get the point <a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fnref-1-3a821f" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2-3a821f">
<p>Namely, while AI is a poor writer of one-pagers, its ability to understand code and follow writing structure makes it quite good at writing user-facing documentation <a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fnref-2-3a821f" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-3-3a821f">
<p>It felt wrong to have a diagram section without a diagram. <a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fnref-3-3a821f" data-footnote-backref="" aria-label="Back to reference 3" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-4-3a821f">
<p>Full disclosure: my sources for this claim are <em>vibes</em>. <a href="https://tombedor.dev/how-to-write-good-short-docs/#user-content-fnref-4-3a821f" data-footnote-backref="" aria-label="Back to reference 4" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Issues with MCP]]></title>
            <link>https://tombedor.dev/mcp-is-a-fad/</link>
            <guid>https://tombedor.dev/mcp-is-a-fad/</guid>
            <pubDate>Fri, 12 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Overview]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="overview">Overview<a href="https://tombedor.dev/mcp-is-a-fad/#overview" class="hash-link" aria-label="Direct link to Overview" title="Direct link to Overview" translate="no">​</a></h2>
<p><a href="https://modelcontextprotocol.io/docs/getting-started/intro" target="_blank" rel="noopener noreferrer">Model Context Protocol</a> (MCP) has taken off as the standardized platform for AI integrations, and it's difficult to justify <em>not</em> supporting it. However, this popularity will be short-lived.</p>
<p>Some of this popularity stems from misconceptions about what MCP uniquely accomplishes, but the majority is due to the fact that it's <em>very easy</em> to add an MCP server. For a brief period, it seemed like adding an MCP server was a nice avenue for getting attention to your project, which is why so many projects have added support.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-is-mcp">What is MCP?<a href="https://tombedor.dev/mcp-is-a-fad/#what-is-mcp" class="hash-link" aria-label="Direct link to What is MCP?" title="Direct link to What is MCP?" translate="no">​</a></h2>
<p>MCP claims to solve the "NxM problem": with N agents and M toolsets, users would otherwise need many bespoke connectors.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-nxm-problem">The NxM problem<a href="https://tombedor.dev/mcp-is-a-fad/#the-nxm-problem" class="hash-link" aria-label="Direct link to The NxM problem" title="Direct link to The NxM problem" translate="no">​</a></h3>
<p>A common misconception is that MCP is <em>required</em> for function calling. It's not. With tool-calling models, a list of available tools is provided to the LLM with each request. If the LLM wants to call a tool, it returns JSON-formatted parameters:</p>
<p><img decoding="async" loading="lazy" alt="function_calling_no_mcp" src="https://tombedor.dev/assets/images/function_calling_no_mcp-3f3ed851f1398f52c8cb7d71d853261f.png" width="3499" height="749" class="img_ev3q"></p>
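<p>As a rough sketch of that flow, the snippet below shows an application owning the whole loop: it declares a schema, receives JSON parameters, and dispatches the call itself. The <code>get_weather</code> implementation, the <code>TOOLS</code> registry, and the hard-coded <code>model_output</code> string are all hypothetical stand-ins; no real model API is called.</p>

```python
import json

# Hypothetical tool implementation; the name and behavior are illustrative only.
def get_weather(city):
    return {"city": city, "forecast": "sunny"}

# Registry mapping tool names to callables; the application owns this.
TOOLS = {"get_weather": get_weather}

# Schema the application would send to the model with each request.
TOOL_SCHEMAS = [
    {
        "name": "get_weather",
        "description": "Look up the forecast for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

def execute_tool_call(raw_call):
    """Parse the model's JSON tool call and run the matching function."""
    call = json.loads(raw_call)
    func = TOOLS[call["name"]]
    return func(**call["arguments"])

# Stand-in for what a tool-calling model might return; no real API is called.
model_output = '{"name": "get_weather", "arguments": {"city": "Minneapolis"}}'
result = execute_tool_call(model_output)
print(result)
```

<p>Nothing in this sketch depends on MCP; the application decides how tools are described, logged, and executed.</p>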
<p>The application is responsible for providing tool schemas, parsing parameters, and executing calls. The problem arises when users want to reuse toolsets across different agents, since each has slightly different APIs.</p>
<p>For example, tools are exposed to <a href="https://ai.google.dev/gemini-api/docs/function-calling?example=meeting#rest_2" target="_blank" rel="noopener noreferrer">Gemini's API</a> via <code>functionDeclarations</code> nested inside a <code>tools</code> array:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -d '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "contents": [...],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "tools": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "functionDeclarations": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "name": "set_meeting",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "description": "...",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">...</span><br></span></code></pre></div></div>
<p>In <a href="https://platform.openai.com/docs/guides/text?lang=curl" target="_blank" rel="noopener noreferrer">OpenAI's API</a>, tool schemas use a flat <code>tools</code> array with <code>type: "function"</code>:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST https://api.openai.com/v1/responses \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -d '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "gpt-4o",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "input": [...],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "tools": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "type": "function",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "name": "get_weather",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">...</span><br></span></code></pre></div></div>
<p>This is the "NxM" problem. In theory, users must build N × M connectors. In practice, the differences are minor (same semantics, slightly different JSON shape), and frameworks like <a href="https://python.langchain.com/docs/how_to/function_calling/" target="_blank" rel="noopener noreferrer">LangChain</a>, <a href="https://docs.litellm.ai/docs/completion/function_call" target="_blank" rel="noopener noreferrer">LiteLLM</a>, and <a href="https://huggingface.co/learn/cookbook/en/agents" target="_blank" rel="noopener noreferrer">SmolAgents</a> already abstract them away. Crucially, these options <em>execute tool calls in the same runtime as the agent</em>.</p>
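<p>To illustrate how small the difference is, here is a minimal sketch that adapts one provider-neutral tool definition to the two request shapes shown above. The <code>add_numbers</code> tool, the neutral dictionary layout, and the <code>to_gemini</code> / <code>to_openai</code> helper names are assumptions made for this example, not part of either vendor's SDK.</p>

```python
# Provider-neutral tool definition; the layout is this sketch's own choice
# and the add_numbers tool is hypothetical.
TOOL = {
    "name": "add_numbers",
    "description": "Add two numbers",
    "parameters": {
        "type": "object",
        "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
        "required": ["a", "b"],
    },
}

def to_gemini(tools):
    """Nest all tools under a single functionDeclarations entry."""
    return [{"functionDeclarations": [dict(t) for t in tools]}]

def to_openai(tools):
    """Emit a flat list, tagging each tool with type "function"."""
    return [{"type": "function", **t} for t in tools]

gemini_tools = to_gemini([TOOL])
openai_tools = to_openai([TOOL])
print(gemini_tools[0]["functionDeclarations"][0]["name"])
print(openai_tools[0]["type"], openai_tools[0]["name"])
```

<p>Each adapter is a one-liner, which is roughly the amount of work the frameworks mentioned above do for this part of the problem.</p>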
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-mcp-addresses-it">How MCP addresses it<a href="https://tombedor.dev/mcp-is-a-fad/#how-mcp-addresses-it" class="hash-link" aria-label="Direct link to How MCP addresses it" title="Direct link to How MCP addresses it" translate="no">​</a></h3>
<p>MCP handles exposing and invoking tools via separate processes:</p>
<p><img decoding="async" loading="lazy" alt="function_calling_mcp" src="https://tombedor.dev/assets/images/function_calling_mcp-7ddac7e9d3439168d21fdd812a16c8b6.png" width="3633" height="1163" class="img_ev3q"></p>
<p>A JSON configuration controls which MCP servers to start. Each server runs in its own long-lived process, handling tool invocations independently. The application still orchestrates the agent loop and presents results to users.</p>
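<p>As a rough illustration of that configuration step, the sketch below parses a config in the <code>mcpServers</code> layout used by several MCP clients and lists the command each server would be started with. The server names and package names are made up, and <code>launch_commands</code> is a hypothetical helper, not part of any MCP SDK. Nothing is actually launched:</p>

```python
import json
import shlex

# Example config in the "mcpServers" layout used by several MCP clients.
# The server names and commands here are made up for illustration.
CONFIG_TEXT = """
{
  "mcpServers": {
    "files": {"command": "npx", "args": ["-y", "some-filesystem-server", "/tmp"]},
    "weather": {"command": "uv", "args": ["run", "some_tool", "mcp"]}
  }
}
"""


def launch_commands(config_text: str) -> dict:
    """Return the shell command each configured server would be started with."""
    config = json.loads(config_text)
    commands = {}
    for name, server in config.get("mcpServers", {}).items():
        argv = [server["command"], *server.get("args", [])]
        commands[name] = shlex.join(argv)
    return commands


commands = launch_commands(CONFIG_TEXT)
for name, command in commands.items():
    # One long-lived child process per entry; nothing is launched here.
    print(f"{name}: {command}")
```

<p>Every entry in the config becomes its own child process, each with its own runtime and dependencies.</p>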
<p>This abstracts away schema generation and invocation, but at a cost. Tool logic runs in a separate process, making resource management opaque. The application loses control over tool instructions, logging, and error handling. And every tool call crosses a process boundary.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="scope-tools-dominate">Scope: tools dominate<a href="https://tombedor.dev/mcp-is-a-fad/#scope-tools-dominate" class="hash-link" aria-label="Direct link to Scope: tools dominate" title="Direct link to Scope: tools dominate" translate="no">​</a></h3>
<p>MCP also defines primitives for prompts and resources, but these see far less adoption than tools<sup><a href="https://tombedor.dev/mcp-is-a-fad/#user-content-fn-1-e0bb36" id="user-content-fnref-1-e0bb36" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>:</p>
<p><img decoding="async" loading="lazy" alt="code_references" src="https://tombedor.dev/assets/images/code_references-4bea4400fa382dec1d99d50df013aa6b.png" width="919" height="908" class="img_ev3q"></p>
<p>Given this, the rest of this post focuses on tool calling, which is MCP's primary use case in practice.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="problems">Problems<a href="https://tombedor.dev/mcp-is-a-fad/#problems" class="hash-link" aria-label="Direct link to Problems" title="Direct link to Problems" translate="no">​</a></h2>
<p>The convenience of MCP comes with a price, stemming from two architectural attributes of an MCP-driven application:</p>
<p><img decoding="async" loading="lazy" alt="issues" src="https://tombedor.dev/assets/images/issues-aacc03230cf0cf8fc4fc94f2ba3c5876.png" width="1851" height="971" class="img_ev3q"></p>
<p>Since tools are drawn from arbitrary sources, they are not aware of what other tools are available to the agent. Their instructions can't account for the rest of the toolbox.</p>
<p>The second issue stems from different toolsets having their own runtimes. This introduces a variety of problems I'll discuss below.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="incoherent-toolbox">Incoherent toolbox<a href="https://tombedor.dev/mcp-is-a-fad/#incoherent-toolbox" class="hash-link" aria-label="Direct link to Incoherent toolbox" title="Direct link to Incoherent toolbox" translate="no">​</a></h3>
<p><a href="https://www.microsoft.com/en-us/research/video/tool-space-interference-an-emerging-problem-for-llm-agents/" target="_blank" rel="noopener noreferrer">Agents tend to be less effective at tool use as the number of tools grows</a>. With a well-organized, coherent toolset, agents do well. With a larger, disorganized toolset, they struggle. <a href="https://platform.openai.com/docs/guides/function-calling" target="_blank" rel="noopener noreferrer">OpenAI recommends keeping tools well below 20</a>, yet many MCP servers exceed this threshold.</p>
<p>Why does this happen? Consider a workflow in which an agent should send a notification after doing work:</p>
<p><img decoding="async" loading="lazy" alt="confusion" src="https://tombedor.dev/assets/images/confusion-00fa7e3d04a2eae3855e414829b16e58.png" width="2005" height="1001" class="img_ev3q"></p>
<p>A tool's fit for a task depends not just on the job at hand, but also on what else is in the toolbox. Pliers can pull a nail, but if a hammer is available it's probably the better choice. When tools ship in isolation, their instructions can't say "use me only when you don't have a hammer," so agents don't get cohesive guidance.</p>
<p>If the toolset is controlled by the same authors as the application, they can add prompting to the toolsets to disambiguate when to use which tool. If not, the problem must be solved by system prompts or user guidance.</p>
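<p>A toy sketch of what first-party disambiguation can look like, using the pliers and hammer analogy from above. The <code>describe</code> function and the tool names are hypothetical. The point is only that the instructions are generated with knowledge of the whole toolbox:</p>

```python
def describe(tool_name: str, toolbox: set) -> str:
    """Build tool instructions that account for the rest of the toolbox."""
    base = {
        "pliers": "Grip, bend, or pull small objects.",
        "hammer": "Drive or pull nails.",
    }[tool_name]
    # A first-party author knows which sibling tools exist and can say so.
    if tool_name == "pliers" and "hammer" in toolbox:
        return base + " For nails, prefer the hammer tool."
    return base


with_hammer = describe("pliers", {"pliers", "hammer"})
without_hammer = describe("pliers", {"pliers"})
print(with_hammer)
print(without_hammer)
```

<p>A tool shipped in isolation can only ever emit the second, context-free description.</p>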
<p>Looking through #mcp channels of open source coding agents, you'll invariably find users who struggle to get the agent to use the tools in the way they want<sup><a href="https://tombedor.dev/mcp-is-a-fad/#user-content-fn-2-e0bb36" id="user-content-fnref-2-e0bb36" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup>:</p>
<p><img decoding="async" loading="lazy" alt="trouble" src="https://tombedor.dev/assets/images/trouble-e4382a8d6ae0564086624162693e61b6.png" width="3314" height="328" class="img_ev3q"></p>
<p>Or, users complaining of how many tokens are burned by tool instructions:</p>
<p><img decoding="async" loading="lazy" alt="inefficient" src="https://tombedor.dev/assets/images/inefficient-1abf3d1cad9bdcf5648d9677f6f8c6e1.png" width="2400" height="232" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="arbitrary-separate-runtimes">Arbitrary, separate runtimes<a href="https://tombedor.dev/mcp-is-a-fad/#arbitrary-separate-runtimes" class="hash-link" aria-label="Direct link to Arbitrary, separate runtimes" title="Direct link to Arbitrary, separate runtimes" translate="no">​</a></h3>
<p>Each MCP server <a href="https://modelcontextprotocol.io/specification/2025-03-26/basic/lifecycle" target="_blank" rel="noopener noreferrer">starts a separate process</a> that survives for the length of the agent session.</p>
<p>Even in the healthy state, this introduces a collection of processes that remain mostly idle, aside from serving occasional requests from an agent. In an error state, we get all the usual headaches: dangling subprocesses, memory leaks, resource contention.</p>
<p>Users hit these issues only if they can get the servers running at all. In support channels, the most common complaint is difficulty starting the servers:</p>
<p><img decoding="async" loading="lazy" alt="connection_problems" src="https://tombedor.dev/assets/images/connection_problem-7c5e6ede95ca4d7790b0caa5dd27d976.png" width="3196" height="668" class="img_ev3q"></p>
<p>MCP offers no way for servers to declare their runtime/dependency needs. Some authors work around it by baking installation into the launch command (e.g., <code>uv run some_tool mcp</code>), which only succeeds if the user already has the right tooling installed.</p>
<p>Even if the relevant package is installed, the MCP server might not start successfully. MCP servers only inherit <a href="https://modelcontextprotocol.io/legacy/tools/debugging#environment-variables" target="_blank" rel="noopener noreferrer">a subset of parent ENV variables</a> (<code>USER</code>, <code>HOME</code>, and <code>PATH</code>). This is particularly problematic for users of <code>nvm</code> or virtual environments.</p>
<p>Python or Node developers might be comfortable debugging environment issues in their own ecosystem (although MCP's subprocess orchestration makes this more difficult), but are likely less comfortable debugging Node issues <em>and</em> Python issues <em>and</em> those of other runtimes. MCP seems to assert that I, as the user, should not really care which of these are used, or how many.</p>
<p>Even if all toolsets share one runtime, MCP potentially spins up many instances of it, forfeiting the efficiencies of caching, connection pooling, and shared in-memory state. MCP's HTTP transport mode doesn't help; it's just another HTTP API, but with MCP's protocol overhead instead of battle-tested REST/OpenAPI patterns.</p>
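<p>The effect of a trimmed environment is easy to reproduce. This sketch is not MCP's launcher code; it just starts a child Python process with only the three variables listed above and shows that a variable set in the parent (the made-up <code>DEMO_VENV</code>) is invisible to the child. It assumes a Unix-like system:</p>

```python
import os
import subprocess
import sys

# Pretend the parent shell configured a virtualenv-style variable.
parent_env = dict(os.environ, DEMO_VENV="/path/to/venv")

# Keep only the variables the post lists as inherited.
allowed = ("USER", "HOME", "PATH")
child_env = {key: parent_env[key] for key in allowed if key in parent_env}

probe = "import os; print(os.environ.get('DEMO_VENV'))"
result = subprocess.run(
    [sys.executable, "-c", probe],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
child_view = result.stdout.strip()
print(child_view)  # the child never sees DEMO_VENV
```

<p>Anything a tool depends on beyond those variables, such as a version manager's shims or an activated virtual environment, has to be passed explicitly in the server's configuration.</p>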
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="security">Security<a href="https://tombedor.dev/mcp-is-a-fad/#security" class="hash-link" aria-label="Direct link to Security" title="Direct link to Security" translate="no">​</a></h3>
<p>MCP pushes users to install servers from npm, pip, or GitHub. This inherits the usual supply-chain risk, but without even the minimal guardrails those ecosystems provide. There's no central publisher or signing; anyone can ship a daemon that runs on your machine and MCP offers no provenance check.</p>
<p>MCP's specification <a href="https://www.trendmicro.com/vinfo/us/security/news/cybercrime-and-digital-threats/mcp-security-network-exposed-servers-are-backdoors-to-your-private-data" target="_blank" rel="noopener noreferrer">doesn't mandate authentication</a>, leaving security decisions to individual server authors. The result: <a href="https://www.darkreading.com/vulnerabilities-threats/2000-mcp-servers-security" target="_blank" rel="noopener noreferrer">one scan found 492 MCP servers</a> running without any client authentication or traffic encryption. Even Anthropic's own Filesystem MCP Server had a sandbox escape via directory traversal (<a href="https://strobes.co/blog/mcp-model-context-protocol-and-its-critical-vulnerabilities/" target="_blank" rel="noopener noreferrer">CVE-2025-53110</a>).</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-related-security-incidents">MCP-related security incidents<a href="https://tombedor.dev/mcp-is-a-fad/#mcp-related-security-incidents" class="hash-link" aria-label="Direct link to MCP-related security incidents" title="Direct link to MCP-related security incidents" translate="no">​</a></h4>
<table><thead><tr><th>Issue</th><th>CVSS / Impact</th></tr></thead><tbody><tr><td><strong><a href="https://jfrog.com/blog/2025-6514-critical-mcp-remote-rce-vulnerability/" target="_blank" rel="noopener noreferrer">CVE-2025-6514</a></strong></td><td>9.6 (RCE in mcp-remote; 437,000+ downloads)</td></tr><tr><td><strong><a href="https://thehackernews.com/2025/07/critical-vulnerability-in-anthropics.html" target="_blank" rel="noopener noreferrer">CVE-2025-49596</a></strong></td><td>9.4 (RCE in Anthropic's MCP Inspector)</td></tr><tr><td><strong><a href="https://www.imperva.com/blog/another-critical-rce-discovered-in-a-popular-mcp-server/" target="_blank" rel="noopener noreferrer">CVE-2025-53967</a></strong></td><td>RCE in Figma MCP Server; 600,000+ downloads</td></tr><tr><td><strong><a href="https://www.bleepingcomputer.com/news/security/asana-warns-mcp-ai-feature-exposed-customer-data-to-other-orgs/" target="_blank" rel="noopener noreferrer">Asana data exposure</a></strong></td><td>Tenant isolation flaw exposed ~1,000 customers' data</td></tr></tbody></table>
<p>Unlike a human carefully clicking through an API, agents can be manipulated via prompt injection to call tools in unintended ways. The <a href="https://www.generalanalysis.com/blog/supabase-mcp-blog" target="_blank" rel="noopener noreferrer">Supabase MCP leak</a> demonstrated this "lethal trifecta": prompt injection → tool call → data exfiltration, extracting entire SQL databases including OAuth tokens. Again, this risk isn't unique to MCP. But the best mitigations are existing security infrastructure: scoped OAuth tokens, service identities with minimal permissions, and audit logging. MCP sidesteps this infrastructure rather than building on it.</p>
<p>A common defense is that MCP isolates credentials—the agent talks to a socket, never seeing your API tokens. But this threat model is narrow: an agent that can invoke <code>mcp.github.delete_repo()</code> doesn't need your token to cause damage. You're not eliminating trust; you're redirecting it to third-party code that, as the CVEs demonstrate, is often unaudited and vulnerable.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-cost-benefit-doesnt-add-up">The cost-benefit doesn't add up<a href="https://tombedor.dev/mcp-is-a-fad/#the-cost-benefit-doesnt-add-up" class="hash-link" aria-label="Direct link to The cost-benefit doesn't add up" title="Direct link to The cost-benefit doesn't add up" translate="no">​</a></h3>
<p>These problems could be worth the cost if the gains were significant. But comparing tool calling with MCP to tool calling without it, MCP handles remarkably little: more or less, it serializes function call schemas and responses.</p>
<p>The tools developers are saving themselves from having to write are, overwhelmingly, <a href="https://mcp.alphavantage.co/?utm_source=mcp.so&amp;utm_medium=referral&amp;utm_campaign=202508&amp;utm_id=000001&amp;utm_term=web_project&amp;utm_content=v2" target="_blank" rel="noopener noreferrer">relatively thin wrappers around API clients</a>, or <a href="https://mcp.so/server/time/modelcontextprotocol" target="_blank" rel="noopener noreferrer">utility scripts</a>. In the former case, users must still obtain API keys, billing accounts, and so on.</p>
<p>This code <em>was</em> a hassle to write, prior to the advent of coding agents. But these small utility scripts are the precise thing that coding agents excel most at! A technical user of MCP tools will be hard-pressed to find a tool an agent could not one-shot in the programming language they are most comfortable in.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-it-took-off">Why it took off<a href="https://tombedor.dev/mcp-is-a-fad/#why-it-took-off" class="hash-link" aria-label="Direct link to Why it took off" title="Direct link to Why it took off" translate="no">​</a></h2>
<p>With these issues, it's fair to wonder why MCP has gained the popularity it has. It has had lots of support from Anthropic, and no trouble gaining traction with toolset publishers, agent providers, and enterprises. Why? It serves a useful narrative for each group:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="tool-authors-a-low-overhead-marketing-channel">Tool authors: A low overhead marketing channel<a href="https://tombedor.dev/mcp-is-a-fad/#tool-authors-a-low-overhead-marketing-channel" class="hash-link" aria-label="Direct link to Tool authors: A low overhead marketing channel" title="Direct link to Tool authors: A low overhead marketing channel" translate="no">​</a></h3>
<p>It's quite easy to publish an MCP server. The lack of startup requirements means you don't even need to publish to <code>npm</code> or <code>pip</code>: you can drop an <code>@mcp.server</code> annotation in your repo and host a small manifest JSON that points to your entry command (e.g., <code>node server.js</code>) and lists the tools.</p>
<p>This provides a nice narrative for drawing attention to AI projects: a user can, in theory, easily add some MCP tools from a project, get value from them, and become interested in learning more about the project. Support overhead will, in the main, fall to agent maintainers.</p>
<p>Once publishers started appearing, it became difficult to justify <em>not</em> supporting MCP. Your project could be perceived as being against open standards.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="enterprise-ai-credibility">Enterprise: AI credibility<a href="https://tombedor.dev/mcp-is-a-fad/#enterprise-ai-credibility" class="hash-link" aria-label="Direct link to Enterprise: AI credibility" title="Direct link to Enterprise: AI credibility" translate="no">​</a></h3>
<p>Over the last few years, anyone watching San Francisco billboards has witnessed enterprise tools rebranding toward AI. MCP support provided an easy way to brand, say, your project management tool as an AI product. The branding of MCP as an "open standard" increased pressure to adopt - lack of MCP support could signal a lack of willingness to adopt open standards.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="anthropic-open-source-credibility">Anthropic: Open source credibility<a href="https://tombedor.dev/mcp-is-a-fad/#anthropic-open-source-credibility" class="hash-link" aria-label="Direct link to Anthropic: Open source credibility" title="Direct link to Anthropic: Open source credibility" translate="no">​</a></h3>
<p>MCP's status as <em>the</em> open standard for AI, and its enterprise adoption, greatly benefited Anthropic. The big fear of investors is that enterprise adoption doesn't persist - adoption of Anthropic's open standard helped allay this.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="alternatives">Alternatives<a href="https://tombedor.dev/mcp-is-a-fad/#alternatives" class="hash-link" aria-label="Direct link to Alternatives" title="Direct link to Alternatives" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="who-benefits-from-mcp">Who benefits from MCP?<a href="https://tombedor.dev/mcp-is-a-fad/#who-benefits-from-mcp" class="hash-link" aria-label="Direct link to Who benefits from MCP?" title="Direct link to Who benefits from MCP?" translate="no">​</a></h3>
<p>There are a few different possible users who interact with MCP:</p>
<p><img decoding="async" loading="lazy" alt="users" src="https://tombedor.dev/assets/images/users-e233d38824dd614cca76cbe6a8e983f0.png" width="1077" height="722" class="img_ev3q"></p>
<ul>
<li>
<p><em>Technical end users</em> want to create tools and share them between different agents they might want to use.</p>
</li>
<li>
<p><em>Non-technical end users</em> want to use different tools while using agents. Note that this user group for MCP is, at present, largely theoretical. Exposing toolsets to MCP involves editing JSON, making it out of reach for non-technical users.</p>
</li>
<li>
<p><em>Internal app devs</em> run production AI applications.</p>
</li>
<li>
<p><em>Agent devs</em> create agents for external users. They wish to enable their end users to swap in whatever toolsets they like.</p>
</li>
<li>
<p><em>Tool authors</em> create toolsets they wish to expose to users. MCP provides a way to easily share their work with users of different agents.</p>
</li>
</ul>
<p>Notice that the supposed beneficiaries are overwhelmingly technical. The "app store for AI" vision that would serve non-technical users remains unfulfilled.</p>
<p>For each user type, there's a simpler approach that avoids MCP's overhead:</p>
<table><thead><tr><th>User Type</th><th>MCP Promise</th><th>Better Alternative</th><th>Why</th></tr></thead><tbody><tr><td><strong>Technical end users</strong></td><td>Share tools between agents</td><td>Local scripts + command runner</td><td>AI can one-shot these scripts; works with any agent via shell; exposes tools to humans too</td></tr><tr><td><strong>Non-technical end users</strong></td><td>Easy tool installation</td><td><em>(MCP doesn't deliver)</em></td><td>MCP requires JSON editing—this group remains underserved regardless</td></tr><tr><td><strong>Internal app devs</strong></td><td>Standard tool interface</td><td>1st party tools</td><td>Same codebase, existing auth/logging/tracing, no process overhead, coherent toolbox</td></tr><tr><td><strong>Agent devs</strong></td><td>Let users swap toolsets</td><td>SDK abstraction (LangChain, LiteLLM)</td><td>Handles model API differences without separate processes</td></tr><tr><td><strong>Tool authors</strong></td><td>Distribute to all agents</td><td>OpenAPI specs or libraries</td><td>Existing distribution (npm, pip), decades of tooling, no new protocol</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="local-scripts-with-command-runner">Local scripts with command runner<a href="https://tombedor.dev/mcp-is-a-fad/#local-scripts-with-command-runner" class="hash-link" aria-label="Direct link to Local scripts with command runner" title="Direct link to Local scripts with command runner" translate="no">​</a></h3>
<p>For a technical user, letting an agent invoke scripts directly is very difficult to beat. Useful 50-100 line scripts are <em>extremely</em> easy to write with AI coding agents. Care needs to be taken to filter output - raw build scripts can stream verbose logs into agent context, eating up tokens.</p>
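<p>One way to handle the output-filtering concern is a small wrapper that runs the noisy command and hands the agent only a summary. This is a minimal sketch; <code>run_quietly</code> is a hypothetical helper, and keeping the last few lines is just one possible filtering policy:</p>

```python
import subprocess
import sys


def run_quietly(argv: list, max_lines: int = 5) -> str:
    """Run a command and return a short summary instead of the full log."""
    result = subprocess.run(argv, capture_output=True, text=True)
    lines = (result.stdout + result.stderr).splitlines()
    tail = lines[-max_lines:]
    header = f"exit code {result.returncode}, {len(lines)} lines of output"
    if len(lines) > max_lines:
        header += f" (showing last {max_lines})"
    return "\n".join([header, *tail])


# Stand-in for a noisy build: prints 200 log lines.
noisy = [sys.executable, "-c", "for i in range(200): print('log line', i)"]
summary = run_quietly(noisy)
print(summary)
```

<p>The agent sees six lines instead of two hundred, and the exit code still tells it whether the command succeeded.</p>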
<p><img decoding="async" loading="lazy" alt="just" src="https://tombedor.dev/assets/images/just-c9e8594a50331d5b095b35a059dae448.png" width="1352" height="1171" class="img_ev3q"></p>
<p>Robust security against agent actions going haywire can be achieved via command runners like <a href="https://github.com/casey/just" target="_blank" rel="noopener noreferrer">just</a> or <a href="https://en.wikipedia.org/wiki/Make_(software)" target="_blank" rel="noopener noreferrer">make</a>. These tools provide everything that MCP does - command specifications, descriptions, arguments. Many coding agents let you specify which command prefixes can be invoked without approval - put your agent commands in a <code>justfile</code>, and only auto-allow shell commands prefixed with <code>just</code>.</p>
<p>This approach also exposes tools to humans, and is a nice way to improve dev environments for humans and AI agents at the same time. (See <a href="https://tombedor.dev/make-it-easy-for-humans/">Don't Write Docs Twice</a> for more on this.)</p>
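<p>Agents implement their approval rules in their own ways, so the following is only a sketch of the idea, not any particular agent's logic. The hypothetical <code>needs_approval</code> function auto-allows a command only when it is a plain <code>just</code> invocation, and it is deliberately conservative: any token containing characters outside a small safe set triggers a prompt, which blocks command chaining at the cost of also prompting for quoted arguments with spaces:</p>

```python
import re
import shlex

AUTO_ALLOWED_PROGRAMS = {"just"}  # assumption: only justfile recipes run unprompted
SAFE_TOKEN = re.compile(r"[A-Za-z0-9_./=:-]+")


def needs_approval(command: str) -> bool:
    """Return True unless the command is a plain call to an allowed program."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return True
    if not tokens or tokens[0] not in AUTO_ALLOWED_PROGRAMS:
        return True
    # Require every token to be plain, so shell operators or substitutions
    # cannot smuggle a second command in behind the allowed prefix.
    return not all(SAFE_TOKEN.fullmatch(token) for token in tokens)


for candidate in ["just test", "rm -rf build", "just test; rm -rf build"]:
    verdict = "ask first" if needs_approval(candidate) else "auto-allow"
    print(f"{candidate!r}: {verdict}")
```

<p>A bare prefix match on the string <code>just</code> would wrongly auto-allow the third command, which is why the sketch checks every token rather than only the first.</p>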
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="1st-party-tools">1st party tools<a href="https://tombedor.dev/mcp-is-a-fad/#1st-party-tools" class="hash-link" aria-label="Direct link to 1st party tools" title="Direct link to 1st party tools" translate="no">​</a></h3>
<p>For a self-contained application, there is little reason to separate tool codebases from the codebase for the rest of the application. Tools can be dynamically exposed to the agent based on application context.</p>
<p>In a first party context, any code that devs wish to reuse can be exposed as libraries, just like any other code they wish to share. An AI tool is really nothing more than a function, and the fact that it's invoked by AI does not warrant special handling.</p>
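<p>To illustrate the "a tool is just a function" point, here is a minimal sketch that derives a function-calling schema from an ordinary Python function and dispatches the model's call in-process. <code>tool_schema</code>, <code>dispatch</code>, and <code>add_reminder</code> are hypothetical names, the type mapping covers only a few primitives, and the schema shape mirrors the Responses API example earlier in the post:</p>

```python
import inspect
import json

PYTHON_TO_JSON_TYPE = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func) -> dict:
    """Derive a function-calling schema from a plain Python function."""
    signature = inspect.signature(func)
    properties = {}
    required = []
    for name, param in signature.parameters.items():
        # Unannotated or unknown types fall back to "string" in this sketch.
        json_type = PYTHON_TO_JSON_TYPE.get(param.annotation, "string")
        properties[name] = {"type": json_type}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def add_reminder(text: str, minutes_from_now: int = 30) -> str:
    """Schedule a reminder."""
    return f"Reminder set in {minutes_from_now} min: {text}"


TOOLS = {add_reminder.__name__: add_reminder}


def dispatch(name: str, arguments_json: str) -> str:
    # In-process call: same runtime, logging, and error handling as the app.
    return TOOLS[name](**json.loads(arguments_json))


schema = tool_schema(add_reminder)
result = dispatch("add_reminder", '{"text": "stand up", "minutes_from_now": 10}')
print(result)
```

<p>Because the tool is an ordinary function, it can be unit tested, traced, and shared as a library like any other code in the application.</p>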
<p>An enterprise context should have robust infrastructure for authenticating, authorizing, provisioning service identities, and tracing call chains for service to service calls. That some of these calls are now <em>AI</em> service to service calls does not warrant a rebuilt security posture.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="openapi--rest">OpenAPI / REST<a href="https://tombedor.dev/mcp-is-a-fad/#openapi--rest" class="hash-link" aria-label="Direct link to OpenAPI / REST" title="Direct link to OpenAPI / REST" translate="no">​</a></h3>
<p>OpenAPI specs are already self-describing enough for agents—they include operation descriptions, parameter schemas, examples, and enums. LLMs understand them well; GPT Actions are literally OpenAPI specs. The glue needed between an OpenAPI endpoint and an agent (output filtering, context, auth) is the same glue MCP requires. MCP doesn't provide meaningfully better tool descriptions; it just reinvents a schema format that already exists, without the decades of tooling, validation, and battle-testing.</p>
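<p>As a rough sketch of how little glue this takes, the function below maps a single OpenAPI operation onto a function-calling tool definition. The spec fragment is hypothetical, <code>operation_to_tool</code> is an illustrative name, and the sketch ignores request bodies, <code>$ref</code> resolution, and path-level parameters, all of which a real converter would need to handle:</p>

```python
def operation_to_tool(path: str, method: str, operation: dict) -> dict:
    """Map one OpenAPI operation onto a function-calling tool definition."""
    properties = {}
    required = []
    for param in operation.get("parameters", []):
        schema = dict(param.get("schema", {"type": "string"}))
        if "description" in param:
            schema["description"] = param["description"]
        properties[param["name"]] = schema
        if param.get("required", False):
            required.append(param["name"])
    return {
        "type": "function",
        # operationId is optional in OpenAPI, so fall back to method + path.
        "name": operation.get("operationId", f"{method}_{path}"),
        "description": operation.get("summary", ""),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


# Hypothetical spec fragment, written as the dict you would get from parsing it.
SPEC = {
    "paths": {
        "/weather": {
            "get": {
                "operationId": "get_weather",
                "summary": "Get the current weather for a city.",
                "parameters": [
                    {
                        "name": "city",
                        "in": "query",
                        "required": True,
                        "schema": {"type": "string"},
                    },
                    {
                        "name": "units",
                        "in": "query",
                        "schema": {"type": "string", "enum": ["metric", "imperial"]},
                    },
                ],
            }
        }
    }
}

tools = [
    operation_to_tool(path, method, operation)
    for path, methods in SPEC["paths"].items()
    for method, operation in methods.items()
]
print(tools[0]["name"], tools[0]["parameters"]["required"])
```

<p>The enum and required markers carry over directly, so the agent gets the same guidance a human reader of the spec would.</p>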
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="a-prediction">A prediction<a href="https://tombedor.dev/mcp-is-a-fad/#a-prediction" class="hash-link" aria-label="Direct link to A prediction" title="Direct link to A prediction" translate="no">​</a></h2>
<p>MCP's popularity will be relatively short-lived. The cost-benefit does not add up, and there are readily available alternatives. The introduction of <a href="https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview" target="_blank" rel="noopener noreferrer">Claude Skills</a> and <a href="https://simonwillison.net/2025/Dec/12/openai-skills/" target="_blank" rel="noopener noreferrer">OpenAI's quick adoption</a> signal that even model providers agree.</p>
<p>Claude Skills are an improvement over MCP - rather than spawning long-lived processes, they simply organize commands within Markdown files in an agent-specific directory. However, this is still a suboptimal place for useful documentation and commands. It is better to organize documentation for humans and point agents there - have the agent conform to humans, rather than the other way around. More on this in <a href="https://tombedor.dev/make-it-easy-for-humans/">Don't Write Docs Twice</a>.</p>
<p>Longstanding tools and techniques for collaboration amongst human devs remain compelling, and these options will chip away at more AI-centric techniques which reinvent the wheel.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/mcp-is-a-fad/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-e0bb36">
<p>Source: GitHub searches for <a href="https://github.com/search?q=%40mcp.tool&amp;type=code" target="_blank" rel="noopener noreferrer">@mcp.tool</a> (58.1K results), <a href="https://github.com/search?q=%40mcp.resource&amp;type=code" target="_blank" rel="noopener noreferrer">@mcp.resource</a> (9.1K), and <a href="https://github.com/search?q=%40mcp.prompt&amp;type=code" target="_blank" rel="noopener noreferrer">@mcp.prompt</a> (6.1K), searched 2025-12-08. <a href="https://tombedor.dev/mcp-is-a-fad/#user-content-fnref-1-e0bb36" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2-e0bb36">
<p>Support request snippets are pulled from Discord. <a href="https://tombedor.dev/mcp-is-a-fad/#user-content-fnref-2-e0bb36" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Tack - Reminders powered by local AI]]></title>
            <link>https://tombedor.dev/tack/</link>
            <guid>https://tombedor.dev/tack/</guid>
            <pubDate>Thu, 04 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[I'm working on an iPhone app called Tack. I have a terrible time remembering things, and have resorted to a patchwork of emails to myself and disorganized notes. I find reminder apps frustrating: the pre-AI ones aren't smart enough, and the AI ones treat every input like an invitation to have a conversation. Tack shoots for a middle ground:]]></description>
            <content:encoded><![CDATA[<p>I'm working on an iPhone app called Tack. I have a terrible time remembering things, and have resorted to a patchwork of emails to myself and disorganized notes. I find reminder apps frustrating: the pre-AI ones aren't smart enough, and the AI ones treat every input like an invitation to have a conversation. Tack shoots for a middle ground:</p>
<p><img decoding="async" loading="lazy" alt="tack" src="https://tombedor.dev/assets/images/tack-8d284d49bcbcd8ed6ee22235b0006edf.png" width="1781" height="2327" class="img_ev3q"></p>
<p>I get the <em>ick</em> from divulging personal details to LLM providers, so Tack uses local AI models (via Apple's on-device Apple Intelligence).</p>
<p>The project is ready for test users! If you're interested in trying it out, please fill out the form below!</p>
<iframe src="https://docs.google.com/forms/d/e/1FAIpQLScqT440AcNKri-OGFMACMyogU7AP_IqVQ0mkBD_0C8fCqD7Rw/viewform?embedded=true" width="100%" height="800" frameborder="0" marginheight="0" marginwidth="0">Loading…</iframe>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Don't Write Docs Twice]]></title>
            <link>https://tombedor.dev/make-it-easy-for-humans/</link>
            <guid>https://tombedor.dev/make-it-easy-for-humans/</guid>
            <pubDate>Wed, 26 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[I recently wrote about optimizing repos for AI, and since then I've been maintaining separate docs for humans (README, contributing guides) and AI agents (.cursorrules, CLAUDE.md, etc.). The problem? I keep writing the same information twice.]]></description>
            <content:encoded><![CDATA[<p>I recently wrote about <a href="https://tombedor.dev/optimizing-repos-for-ai/">optimizing repos for AI</a>, and since then I've been maintaining separate docs for humans (README, contributing guides) and AI agents (<code>.cursorrules</code>, <code>CLAUDE.md</code>, etc.). The problem? I keep writing the same information twice.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-duplication-problem">The duplication problem<a href="https://tombedor.dev/make-it-easy-for-humans/#the-duplication-problem" class="hash-link" aria-label="Direct link to The duplication problem" title="Direct link to The duplication problem" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="youre-documenting-the-same-things-in-multiple-places">You're documenting the same things in multiple places<a href="https://tombedor.dev/make-it-easy-for-humans/#youre-documenting-the-same-things-in-multiple-places" class="hash-link" aria-label="Direct link to You're documenting the same things in multiple places" title="Direct link to You're documenting the same things in multiple places" translate="no">​</a></h3>
<p><img decoding="async" loading="lazy" alt="info" src="https://tombedor.dev/assets/images/info-6a7d9f3014124dc5a1d4054fa2b442dc.png" width="1295" height="897" class="img_ev3q"></p>
<p>Nearly everything I put in agent-specific docs is also useful for human developers - architecture decisions, coding conventions, common pitfalls, useful commands. Without AI agents I might not document all of this, but once written, there's no reason it shouldn't serve both audiences.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="ai-agent-doc-organization-is-fragmented">AI agent doc organization is fragmented<a href="https://tombedor.dev/make-it-easy-for-humans/#ai-agent-doc-organization-is-fragmented" class="hash-link" aria-label="Direct link to AI agent doc organization is fragmented" title="Direct link to AI agent doc organization is fragmented" translate="no">​</a></h3>
<p>Each coding agent uses its own configuration file pattern for repo-specific instructions:</p>
<p><img decoding="async" loading="lazy" alt="fragmentation" src="https://tombedor.dev/assets/images/fragmentation-95975857c0b9a80d28eb81f4c5b95f9d.png" width="1407" height="927" class="img_ev3q"></p>
<p>This makes it a hassle just to keep guidelines consistent between agents, let alone to make the same information available to humans.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="solution-write-once-link-everywhere">Solution: Write once, link everywhere<a href="https://tombedor.dev/make-it-easy-for-humans/#solution-write-once-link-everywhere" class="hash-link" aria-label="Direct link to Solution: Write once, link everywhere" title="Direct link to Solution: Write once, link everywhere" translate="no">​</a></h2>
<p>Instead of duplicating content across agent configs, organize information for humans first and link to it from agent-specific files<sup><a href="https://tombedor.dev/make-it-easy-for-humans/#user-content-fn-1-e48423" id="user-content-fnref-1-e48423" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>:</p>
<p><img decoding="async" loading="lazy" alt="easy-for-humans" src="https://tombedor.dev/assets/images/easy-for-humans-a039b763b84dd16c2f342fc10e5d3ce7.png" width="1654" height="1040" class="img_ev3q"></p>
<p>This approach eliminates duplication - you write documentation once, and it serves both humans and AI. It's also more future-proof when agent file schemes inevitably change.</p>
<p>For commands/skills, automation can help avoid duplication entirely - for example, I wrote the <a href="https://github.com/tombedor/just-claude" target="_blank" rel="noopener noreferrer">just-claude</a> utility for automatically synchronizing <a href="https://github.com/casey/just" target="_blank" rel="noopener noreferrer">Just</a> recipes with <a href="https://www.claude.com/blog/skills" target="_blank" rel="noopener noreferrer">Claude Code Skills</a>.</p>
<p>There's really no difference between the goal of economical token use for AI and reducing cognitive overhead for humans. By organizing for humans first, you write documentation once and everyone benefits.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/make-it-easy-for-humans/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-e48423">
<p>I wrote about what content I put in these files in the (ironically titled) post <a href="https://tombedor.dev/optimizing-repos-for-ai/">Optimizing repos for AI</a> <a href="https://tombedor.dev/make-it-easy-for-humans/#user-content-fnref-1-e48423" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Optimizing repos for AI]]></title>
            <link>https://tombedor.dev/optimizing-repos-for-ai/</link>
            <guid>https://tombedor.dev/optimizing-repos-for-ai/</guid>
            <pubDate>Tue, 28 Oct 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[A colleague recently complained to me about the hassle of organizing information in AGENTS.md / CLAUDE.md. This is the mark of a real adopter - she has gone through the progression from being impressed by coding agents to being annoyed at the next bottleneck.]]></description>
            <content:encoded><![CDATA[<p>A colleague recently complained to me about the hassle of organizing information in <code>AGENTS.md</code> / <code>CLAUDE.md</code>. This is the mark of a real adopter - she has gone through the progression from being impressed by coding agents to being annoyed at the next bottleneck.</p>
<p>When I'm thinking about optimizing repos for agents, I'm looking to accomplish three main goals<sup><a href="https://tombedor.dev/optimizing-repos-for-ai/#user-content-fn-1-70439b" id="user-content-fnref-1-70439b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>:</p>
<ul>
<li><strong>Increase <em>iterative speed</em></strong>: Avoid repeated context gathering, enable the agent to quickly self-correct its mistakes.</li>
<li><strong>Improve adherence to evergreen instructions</strong>: Over time, repeated agent mistakes emerge. Context within the repo helps the agent avoid these and adopt a more consistent workflow.</li>
<li><strong>Help the most <a href="https://en.wikipedia.org/wiki/Human" target="_blank" rel="noopener noreferrer">agentic agents of them all</a></strong>: Humans and agents scan docs and code in very similar ways, so organizing information so it's easily understood by humans is a good rule of thumb for helping the agents anyways!</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="strategies">Strategies<sup><a href="https://tombedor.dev/optimizing-repos-for-ai/#user-content-fn-2-70439b" id="user-content-fnref-2-70439b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup><a href="https://tombedor.dev/optimizing-repos-for-ai/#strategies" class="hash-link" aria-label="Direct link to strategies" title="Direct link to strategies" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="increased-static-analysis">Increased static analysis<a href="https://tombedor.dev/optimizing-repos-for-ai/#increased-static-analysis" class="hash-link" aria-label="Direct link to Increased static analysis" title="Direct link to Increased static analysis" translate="no">​</a></h3>
<p>Pushing detection of quality issues to compile time creates a virtuous cycle where the agent can quickly spot and correct mistakes:</p>
<p><img decoding="async" loading="lazy" alt="runtime-oops" src="https://tombedor.dev/assets/images/runtime-oops-8a3240b22e3dc52ebf5ca335bf41b40e.png" width="1553" height="927" class="img_ev3q"></p>
<p>This implies strong, opinionated linters, and strong type checks for dynamically typed languages<sup><a href="https://tombedor.dev/optimizing-repos-for-ai/#user-content-fn-3-70439b" id="user-content-fnref-3-70439b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">3</a></sup>.</p>
<p>The tradeoff here is cumbersome nitpicks for humans to deal with, but agents can quickly correct any mistakes that cannot be automatically fixed by the linter.</p>
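<p>As a small illustration of what a strict type check buys you in a dynamically typed language, here is a minimal Python sketch. The <code>User</code> and <code>email_domain</code> names are hypothetical, and it assumes a checker such as mypy or pyright runs in strict mode as part of the build:</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    # Hypothetical record type, used only for illustration.
    name: str
    email: Optional[str] = None


def email_domain(user: User) -> Optional[str]:
    # Calling user.email.split(...) without this check would be reported by a
    # strict type checker, because email may be None. The mistake surfaces
    # before the code runs instead of as a runtime AttributeError.
    if user.email is None:
        return None
    return user.email.split("@")[-1]
```

<p>The point is the feedback loop rather than this particular function: the agent sees the checker's complaint right away and can correct it in the same iteration.</p>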
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="just-for-repeated-agent-commands"><a href="https://github.com/casey/just" target="_blank" rel="noopener noreferrer">just</a> for repeated agent commands<a href="https://tombedor.dev/optimizing-repos-for-ai/#just-for-repeated-agent-commands" class="hash-link" aria-label="Direct link to just-for-repeated-agent-commands" title="Direct link to just-for-repeated-agent-commands" translate="no">​</a></h3>
<p>There's fragmentation in how to make commands available to agents - there's MCP, the newly released <a href="https://www.anthropic.com/news/skills" target="_blank" rel="noopener noreferrer">Claude Skills</a>, or embedding information in <code>CLAUDE.md</code> / <code>AGENTS.md</code>.</p>
<p>A <code>justfile</code> is the most interoperable way to share commands between different agents and humans, and is a straightforward place to iterate.</p>
<p>One additional refinement is to make these commands <em>economical in their output volume</em>. For example, I take care to direct build logs to dedicated files - healthy build logs can eat up a lot of tokens if outputted directly to the agent.</p>
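<p>A rough sketch of that idea in Python, as a wrapper a <code>justfile</code> recipe could call. The <code>run_quietly</code> helper, the <code>logs/build.log</code> path, and the example command are assumptions for illustration, not part of any particular tool:</p>

```python
import subprocess
import sys
from pathlib import Path


def run_quietly(command, log_path, tail_lines=5):
    """Run command, write its full output to log_path, return (exit_code, summary)."""
    log_file = Path(log_path)
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("w") as log:
        # Both stdout and stderr go to the file, not to the agent's context.
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    if result.returncode == 0:
        return 0, f"ok (full log: {log_file})"
    # On failure, surface only the last few lines plus a pointer to the log.
    tail = log_file.read_text().splitlines()[-tail_lines:]
    return result.returncode, "\n".join([f"failed (full log: {log_file})", *tail])


if __name__ == "__main__":
    # Hypothetical usage: python run_quietly.py cargo build
    code, summary = run_quietly(sys.argv[1:], "logs/build.log")
    print(summary)
    sys.exit(code)
```

<p>A healthy build then costs the agent a single line, and a failing build costs only the tail of the log, with the full output still available on disk.</p>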
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="organize-docs-in-docs">Organize docs in <code>docs/</code><a href="https://tombedor.dev/optimizing-repos-for-ai/#organize-docs-in-docs" class="hash-link" aria-label="Direct link to organize-docs-in-docs" title="Direct link to organize-docs-in-docs" translate="no">​</a></h3>
<p>Simon Willison recently <a href="https://simonwillison.net/2025/Oct/25/coding-agent-tips/" target="_blank" rel="noopener noreferrer">wrote about this topic</a>, and expressed that docs aren't so important. I agree that docs <em>explaining the code</em> aren't all that helpful, but I get a lot of mileage out of having docs like <code>CODE_REVIEW.md</code>, <code>PRD.md</code>, <code>ROADMAP.md</code>, and <code>CAPTAINS_LOG.md</code>. This helps the agent stay on track with the overall intent of the project, adhere to consistent review practices, and counter poor tendencies (the most obnoxious being an overwhelming tendency to fail open).</p>
<p>Putting these in a <code>docs/</code> folder and referencing them in agent instructions helps reduce context bloat, and provides interoperability between humans and various agents.</p>
<p>Frameworks have begun to emerge that handle some of this for you. I've tried <a href="https://github.com/github/spec-kit" target="_blank" rel="noopener noreferrer">spec-kit</a> and found it to be a little heavy-handed. In general I favor a more documentation-heavy approach when building with agents, but the need for different docs comes with iteration, and I think generating the full complement of docs is a bit overkill right off the bat.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="no-experts-no-standards">No experts, no standards<a href="https://tombedor.dev/optimizing-repos-for-ai/#no-experts-no-standards" class="hash-link" aria-label="Direct link to No experts, no standards" title="Direct link to No experts, no standards" translate="no">​</a></h2>
<p>These strategies work for me, but this field is too new for dogma. The most important strategy is to experiment and share what you learn.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/optimizing-repos-for-ai/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-70439b">
<p>Whether optimizing for coding agents is a good idea is a subject for a different discussion, but: I'm a believer in agent-based coding. I no longer <em>ever</em> write code without one assistant or another open. So we'll proceed on the assumption that coding agents are <em>really good</em>, and not especially existentially risky (I am, for the moment, the one giving the directions). <a href="https://tombedor.dev/optimizing-repos-for-ai/#user-content-fnref-1-70439b" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2-70439b">
<p>Offered with no supporting evidence or benchmarks whatsoever, based entirely on <em>vibes</em> <a href="https://tombedor.dev/optimizing-repos-for-ai/#user-content-fnref-2-70439b" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-3-70439b">
<p>Should you use a dynamically typed language at all? For my projects, I've traded Python for Rust, where "if it compiles, it works". <a href="https://tombedor.dev/optimizing-repos-for-ai/#user-content-fnref-3-70439b" data-footnote-backref="" aria-label="Back to reference 3" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[AI is a Floor Raiser, not a Ceiling Raiser]]></title>
            <link>https://tombedor.dev/ai-is-a-floor-raiser/</link>
            <guid>https://tombedor.dev/ai-is-a-floor-raiser/</guid>
            <pubDate>Tue, 29 Jul 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[A reshaped learning curve]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="a-reshaped-learning-curve">A reshaped learning curve<a href="https://tombedor.dev/ai-is-a-floor-raiser/#a-reshaped-learning-curve" class="hash-link" aria-label="Direct link to A reshaped learning curve" title="Direct link to A reshaped learning curve" translate="no">​</a></h2>
<p>Before AI, learners faced a matching problem: learning resources have to be created with a target audience in mind. This means as a consumer, learning resources were suboptimal fits for you:</p>
<ul>
<li>You're a newbie at <code>$topic_of_interest</code>, but have knowledge in related topic <code>$related_topic</code>. But finding learning resources that teach <code>$topic_of_interest</code> in terms of <code>$related_topic</code> is difficult.</li>
<li>To effectively learn <code>$topic_of_interest</code>, you really need to learn prerequisite skill <code>$prereq_skill</code>. But as a beginner you don't know you should really learn <code>$prereq_skill</code> before learning <code>$topic_of_interest</code>.</li>
<li>You have basic knowledge of <code>$topic_of_interest</code>, but have plateaued, and have difficulty finding the right resources for <code>$intermediate_sticking_point</code>.</li>
</ul>
<p>Roughly, acquiring mastery in a skill over time looks like this:</p>
<p><img decoding="async" loading="lazy" alt="Traditional learning curve" src="https://tombedor.dev/assets/images/skills-0f6df2ddcb8b6be86f004bc35644d1ad.png" width="2110" height="1528" class="img_ev3q"></p>
<p>What makes learning with AI groundbreaking is that it can <em>meet you at your skill level</em>. Now an AI can directly address questions at your level of understanding, and even do rote work for you. This changes the learning curve:</p>
<p><img decoding="async" loading="lazy" alt="AI-enhanced learning curve" src="https://tombedor.dev/assets/images/ai_skills-872b19130d87f25102ecc3e4536ac7da.png" width="2064" height="1528" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mastery-still-hard">Mastery: still hard!<a href="https://tombedor.dev/ai-is-a-floor-raiser/#mastery-still-hard" class="hash-link" aria-label="Direct link to Mastery: still hard!" title="Direct link to Mastery: still hard!" translate="no">​</a></h2>
<p>Experts in a field tend to be more skeptical of AI. From <a href="https://news.ycombinator.com/item?id=44726211" target="_blank" rel="noopener noreferrer">Hacker News</a>:</p>
<blockquote>
<p>[AI is] shallow. The deeper I go, the less it seems to be useful. This happens quick for me. Also, god forbid you're researching a complex and possibly controversial subject and you want it to find reputable sources or particularly academic ones.</p>
</blockquote>
<p>This intuitively makes sense, when considering the data that AI is trained on. If an AI's training corpus has copious training data on a topic that all more or less says the same thing, it will be good at synthesizing it into output. If the topic is too advanced, there will be much less training data for the model. If the topic is controversial, the training data will contain examples saying opposite things. Thus, mastery remains difficult.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="cheating">Cheating<a href="https://tombedor.dev/ai-is-a-floor-raiser/#cheating" class="hash-link" aria-label="Direct link to Cheating" title="Direct link to Cheating" translate="no">​</a></h2>
<p>The introduction of <a href="https://openai.com/index/chatgpt-study-mode/" target="_blank" rel="noopener noreferrer">OpenAI Study Mode</a> hints at a problem: Instead of having an AI teach you, you can just ask it for the answer. This means cheaters will plateau at whatever level the AI can provide:</p>
<p><img decoding="async" loading="lazy" alt="Cheating with AI plateau" src="https://tombedor.dev/assets/images/cheating_with_ai-57a5dbc1d6196a43eb58a53fae636f58.png" width="2064" height="1528" class="img_ev3q"></p>
<p>Cheaters, in the long run, won't prosper here!</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-impact-of-the-changed-learning-curve">The impact of the changed learning curve<a href="https://tombedor.dev/ai-is-a-floor-raiser/#the-impact-of-the-changed-learning-curve" class="hash-link" aria-label="Direct link to The impact of the changed learning curve" title="Direct link to The impact of the changed learning curve" translate="no">​</a></h2>
<p>Technological change is an ecosystem change: There are winners and losers, unevenly distributed. For AI, the level of impact is determined by <em>the amount of mastery needed to make an impactful product</em>:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="coding-a-boon-to-management-less-so-for-large-code-bases">Coding: A boon to management, less so for large code bases<a href="https://tombedor.dev/ai-is-a-floor-raiser/#coding-a-boon-to-management-less-so-for-large-code-bases" class="hash-link" aria-label="Direct link to Coding: A boon to management, less so for large code bases" title="Direct link to Coding: A boon to management, less so for large code bases" translate="no">​</a></h3>
<p>When trying to code something, engineering managers often run into a problem: They know the principles of good software, they know what bad software looks like, but they don't know how to use <code>$framework_foo</code>. This has historically made it difficult for, as an example, a backend EM to build an iPhone app in their spare time.</p>
<p>With AI, they are able to quickly learn the basics, and get simple apps running. They can then use their existing knowledge to <a href="https://techcrunch.com/2025/07/29/jack-dorseys-bluetooth-messaging-app-bitchat-now-on-app-store/" target="_blank" rel="noopener noreferrer">refine it into a workable product</a>. AI is the difference between their product existing or not existing!</p>
<p><img decoding="async" loading="lazy" alt="Engineering managers and software development" src="https://tombedor.dev/assets/images/em_software_development-b7e66c1545c15dae30d142bef77c6d6b.png" width="2064" height="1528" class="img_ev3q"></p>
<p>For devs working on large, complex code bases, the enthusiasm is more muted. AI doesn't have context on the highly specific requirements and existing implementations to contend with, and is less helpful:</p>
<p><img decoding="async" loading="lazy" alt="AI limitations with large codebases" src="https://tombedor.dev/assets/images/large_code_bases-61197410dac066a29ada5944ae6bef63.png" width="2067" height="1528" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="creative-works-not-coming-to-a-theater-near-you">Creative works: not coming to a theater near you<a href="https://tombedor.dev/ai-is-a-floor-raiser/#creative-works-not-coming-to-a-theater-near-you" class="hash-link" aria-label="Direct link to Creative works: not coming to a theater near you" title="Direct link to Creative works: not coming to a theater near you" translate="no">​</a></h3>
<p>There is considerable angst about AI amongst creatives: will we all soon be reading AI generated novels, and watching AI generated movies?</p>
<p>This is unlikely because creative fields are <em>extremely competitive</em>, and beating competition for attention requires <em>novelty</em>. While AI has made it easier to generate images, audio, and text, it has (with <a href="https://www.infosecurity-magazine.com/news/man-charged-ai-fake-music-scheme/" target="_blank" rel="noopener noreferrer">some exceptions</a>) not increased production of ears and eyeballs, so the bar to make a competitive product is too high:</p>
<p><img decoding="async" loading="lazy" alt="Creative works competition curve" src="https://tombedor.dev/assets/images/creative_works-3905afdb90fcb53604800ad6f5635568.png" width="2065" height="1528" class="img_ev3q"></p>
<p><em>Novelty</em> is a hard requirement for successful creative work, because humans are extremely good at detecting when something they are viewing or reading is derivative of something they've seen before. This is why, while Studio Ghibli style avatars briefly took over the internet, they have not dented the cultural position of Howl's Moving Castle.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="things-you-already-do-with-apps-on-your-phone-minimal-impact">Things you already do with apps on your phone<sup><a href="https://tombedor.dev/ai-is-a-floor-raiser/#user-content-fn-1-239c52" id="user-content-fnref-1-239c52" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>: minimal impact<a href="https://tombedor.dev/ai-is-a-floor-raiser/#things-you-already-do-with-apps-on-your-phone-minimal-impact" class="hash-link" aria-label="Direct link to things-you-already-do-with-apps-on-your-phone-minimal-impact" title="Direct link to things-you-already-do-with-apps-on-your-phone-minimal-impact" translate="no">​</a></h3>
<p>One area that has <em>not</em> seen much impact is in tasks that already have specialized apps. I'll focus on two examples with abundant MCP implementations: email and food ordering. AI DoorDash agents and AI movie producers face the same challenge: the bar for a new product to make an impact is already very high:</p>
<p><img decoding="async" loading="lazy" alt="Email and food ordering AI impact" src="https://tombedor.dev/assets/images/email_food_ordering-6f8e69747ae2d718c2910b59b294b2c8.png" width="2064" height="1528" class="img_ev3q"></p>
<p>Email would seem like a ripe area for disruption by AI. But modern email apps already have a wide variety of filtering and organizing tools that tech savvy users can use to create complex, personalized systems for efficiently consuming and organizing their inbox.</p>
<p><em>Summarizing</em> is a core AI skill, but it doesn't help much here:</p>
<ul>
<li>Spam is already quietly shuffled into the Spam folder. A summary of junk is, well, <em>junk</em>.</li>
<li>For important email, I don't <em>want</em> a summary: An AI is likely to produce less specifically crafted information than the sender, and I don't want to risk missing important details.</li>
</ul>
<p>The same goes for food ordering: apps like DoorDash have meticulously designed interfaces. They strike a careful balance between information like price and ingredients against photos of the food. AI is unlikely to produce interfaces that are faster or more thoughtfully composed.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-future-is-already-here--its-just-not-very-evenly-distributed">The future is already here – it’s just not very evenly distributed<a href="https://tombedor.dev/ai-is-a-floor-raiser/#the-future-is-already-here--its-just-not-very-evenly-distributed" class="hash-link" aria-label="Direct link to The future is already here – it’s just not very evenly distributed" title="Direct link to The future is already here – it’s just not very evenly distributed" translate="no">​</a></h2>
<p>AI has raised the floor for knowledge work, but that change doesn't matter to everyone. This goes a long way towards explaining the very wide range of reactions to AI. For engineering managers like myself, AI has made an enormous impact on my relationship with technology. Others fear and resent being replaced. Still others hear smart people express enthusiasm for AI, struggle to find utility, and think <em>I must just not get it</em>.</p>
<p>AI hasn't replaced how we do everything, but it's a highly capable technology. While it's worth experimenting with, whoever you are, if it doesn't seem like it makes sense for you, it probably doesn't.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/ai-is-a-floor-raiser/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-239c52">
<p>Aside from search! <a href="https://tombedor.dev/ai-is-a-floor-raiser/#user-content-fnref-1-239c52" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Add Autonomy Last]]></title>
            <link>https://tombedor.dev/autonomy-last/</link>
            <guid>https://tombedor.dev/autonomy-last/</guid>
            <pubDate>Mon, 07 Jul 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[A core challenge of using LLMs to build reliable automation is calibrating how much autonomy to give to models.]]></description>
            <content:encoded><![CDATA[<p>A core challenge of using LLMs to build reliable automation is calibrating how much <strong>autonomy</strong> to give to models.</p>
<p>Too much, and the program <a href="https://www.anthropic.com/research/project-vend-1" target="_blank" rel="noopener noreferrer">loses track of what it's supposed to be doing</a>. Too little, and the program feels a bit too, well, <em>ordinary</em><sup><a href="https://tombedor.dev/autonomy-last/#user-content-fn-1-37c1e6" id="user-content-fnref-1-37c1e6" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="autonomy-first-vs-autonomy-last">Autonomy first vs autonomy last<a href="https://tombedor.dev/autonomy-last/#autonomy-first-vs-autonomy-last" class="hash-link" aria-label="Direct link to Autonomy first vs autonomy last" title="Direct link to Autonomy first vs autonomy last" translate="no">​</a></h2>
<p>An implicit strategy question when building with LLMs is <em>autonomy first</em> or <em>autonomy last</em>:</p>
<p><img decoding="async" loading="lazy" alt="autonomy_first_vs_last" src="https://tombedor.dev/assets/images/autonomy_first_vs_last-f5ba9e7e23469f0e1bf5086f467f7203.png" width="2202" height="1658" class="img_ev3q"></p>
<p>All of the major LLM-specific programming techniques are firmly <em>autonomy first</em> strategies:</p>
<ul>
<li><em>MCP</em> surfaces a wide variety of functionality the program can have, and lets the LLM decide which to use.</li>
<li><em>Guardrails</em> add some light buffers around the LLM to prevent it from causing too much trouble.</li>
<li><em>Prompt engineering</em> describes the alchemy of whispering just the right phrases to your LLM to get the behavior you want.</li>
<li><em>Context engineering</em> begins to stress programming to deliver only relevant information to LLMs at critical points in program execution.</li>
</ul>
<p>All of these:</p>
<ol>
<li>Start with a maximally autonomous program</li>
<li>Adjust context, tools, and prompts until you narrow down behavior as desired.</li>
</ol>
<p>All have similar issues when scaling in size and complexity:</p>
<ul>
<li>Program behavior changes too much when switching between models</li>
<li>The LLM gets confused, and either hallucinates data or misuses tools at its disposal</li>
</ul>
<p>When problems are encountered, programmers tend to attempt to repair by <em>adding more prompting</em>. But this is a duct tape response: a prompt that clarifies for one model might confuse another.</p>
<p><em>Autonomy last</em>, on the other hand, maximizes the logic that can be handled by code, then adds autonomous functions. This approach strives to keep the tasks delegated to LLMs <a href="https://en.wikipedia.org/wiki/KISS_principle" target="_blank" rel="noopener noreferrer">simple</a>. As the program grows in size and complexity, the programmer can closely monitor encapsulations and keep behavior consistent.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="case-study-building-elroy-a-chatbot-with-memory">Case study: Building Elroy, a chatbot with memory<a href="https://tombedor.dev/autonomy-last/#case-study-building-elroy-a-chatbot-with-memory" class="hash-link" aria-label="Direct link to Case study: Building Elroy, a chatbot with memory" title="Direct link to Case study: Building Elroy, a chatbot with memory" translate="no">​</a></h2>
<p>I wanted to build an LLM assistant with memory abilities, called <a href="https://github.com/elroy-bot/elroy" target="_blank" rel="noopener noreferrer">Elroy</a>. My goal was to make a <em>program</em> that could chat in human text. My ideal users are technical, capable and interested in customizing their software, but not necessarily interested in LLMs for their own sake.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="approach-1-agent-with-tools">Approach #1: "Agent" with tools<a href="https://tombedor.dev/autonomy-last/#approach-1-agent-with-tools" class="hash-link" aria-label="Direct link to Approach #1: &quot;Agent&quot; with tools" title="Direct link to Approach #1: &quot;Agent&quot; with tools" translate="no">​</a></h3>
<p>The first solution I turned to, as many people have, was to build an agent loop with access to custom tools for creating and reading memories:</p>
<p><img decoding="async" loading="lazy" alt="tool_based_agent" src="https://tombedor.dev/assets/images/Agent-aac0aad1ba35227097d2acd01287e7a3.png" width="1216" height="659" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="approach-2-model-context-protocol-mcp">Approach #2: Model Context Protocol (MCP)<a href="https://tombedor.dev/autonomy-last/#approach-2-model-context-protocol-mcp" class="hash-link" aria-label="Direct link to Approach #2: Model Context Protocol (MCP)" title="Direct link to Approach #2: Model Context Protocol (MCP)" translate="no">​</a></h3>
<p>There's now a handy tool for builders like this: <a href="https://modelcontextprotocol.io/introduction" target="_blank" rel="noopener noreferrer">MCP</a>. There are many implementations of my memory tools available via MCP; in fact, <a href="https://smithery.ai/" target="_blank" rel="noopener noreferrer">smithery.ai</a> lists one from Mem0 on its homepage:</p>
<p><img decoding="async" loading="lazy" alt="smithery" src="https://tombedor.dev/assets/images/smithery-81ea98f47a3b243d1bd83b96a46858c3.png" width="1654" height="744" class="img_ev3q"></p>
<p>Now, an (in theory) lightweight abstraction sits between my program and its tools:</p>
<p><img decoding="async" loading="lazy" alt="mcp" src="https://tombedor.dev/assets/images/mcp-fab7c0a8ba0a8776163c73aada0cfbdc.png" width="560" height="411" class="img_ev3q"></p>
<p>This suggests extending my application by picking from a library of MCPs:</p>
<p><img decoding="async" loading="lazy" alt="more_mcp" src="https://tombedor.dev/assets/images/more_mcp-f70808b794d7864c061e8a0f379ee0ad.png" width="651" height="397" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="agentic-trouble">Agentic trouble<a href="https://tombedor.dev/autonomy-last/#agentic-trouble" class="hash-link" aria-label="Direct link to Agentic trouble" title="Direct link to Agentic trouble" translate="no">​</a></h3>
<p>I got my memory program working pretty well on gpt-4. At first it wasn't creating or referencing memories enough, but I was able to fix this with careful prompting.</p>
<p>Then, I wanted to see how Sonnet would do, and I had a problem<sup><a href="https://tombedor.dev/autonomy-last/#user-content-fn-2-37c1e6" id="user-content-fnref-2-37c1e6" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup>: the program's behavior completely changed! Now, it was creating a memory on almost every message, and searching memories for even trivial responses:</p>
<p><img decoding="async" loading="lazy" alt="tool_usage" src="https://tombedor.dev/assets/images/tool_usage_rate-922eff4fee59395ad8cbac49dc83f255.png" width="2405" height="1762" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="approach-3-autonomy-last">Approach #3: Autonomy Last<a href="https://tombedor.dev/autonomy-last/#approach-3-autonomy-last" class="hash-link" aria-label="Direct link to Approach #3: Autonomy Last" title="Direct link to Approach #3: Autonomy Last" translate="no">​</a></h3>
<p>My solution was to remove the timing of recall and memory creation from the agent's control. Upon receiving a message, the memories are automatically searched, with relevant ones being added to context. Every n messages, a memory is created<sup><a href="https://tombedor.dev/autonomy-last/#user-content-fn-3-37c1e6" id="user-content-fnref-3-37c1e6" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">3</a></sup>:</p>
<p><img decoding="async" loading="lazy" alt="tool_usage" src="https://tombedor.dev/assets/images/elroy-073765875d8570515c5f5e589e0f296b.png" width="839" height="1378" class="img_ev3q"></p>
<p>This made much more of the behavior of my program deterministic, and made it easier to reason about and optimize.</p>
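<p>To make the shape of this concrete, here is a minimal Python sketch of the pattern. It is not Elroy's actual implementation: the <code>MemoryChat</code> class, the <code>memory_interval</code> parameter, and the naive keyword-overlap recall (standing in for real semantic search) are all assumptions made for illustration.</p>

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MemoryChat:
    llm: Callable[[str], str]  # any function mapping a prompt to a reply
    memory_interval: int = 3   # the "n" in "every n messages"
    memories: List[str] = field(default_factory=list)
    recent: List[str] = field(default_factory=list)
    message_count: int = 0

    def recall(self, message: str, limit: int = 3) -> List[str]:
        # Naive keyword overlap; a stand-in for real semantic search.
        words = set(message.lower().split())
        scored = [
            (len(words.intersection(m.lower().split())), m) for m in self.memories
        ]
        ranked = sorted((s for s in scored if s[0]), key=lambda s: s[0], reverse=True)
        return [memory for _, memory in ranked[:limit]]

    def handle(self, message: str) -> str:
        # Recall runs in code on every message; the model does not choose when.
        relevant = self.recall(message)
        prompt = "\n".join(["Relevant memories:", *relevant, "User: " + message])
        reply = self.llm(prompt)
        self.recent.append(message)
        self.message_count += 1
        # Memory creation runs on a fixed schedule, also decided by code.
        if self.message_count % self.memory_interval == 0:
            summary = self.llm(
                "Summarize for long-term memory:\n" + "\n".join(self.recent)
            )
            self.memories.append(summary)
            self.recent.clear()
        return reply
```

<p>The model is still responsible for the parts that need language (replying and summarizing), but <em>when</em> recall and memory creation happen no longer depends on which model is plugged in.</p>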
<h1>Autonomy Last</h1>
<p>The "autonomy last" approach trades some of the magic of fully autonomous LLMs for predictable, reliable behavior that scales as your program grows in complexity. While my evidence is (as I should have stated from the outset) <em>vibes</em>, I think this approach will lead to more maintainable and robust applications.</p>
<hr>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/autonomy-last/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-37c1e6">
<p>Rather than using <em>agents</em> to describe the genre of program under discussion, I'll be somewhat pointedly referring to them as <em>programs</em>. <a href="https://tombedor.dev/autonomy-last/#user-content-fnref-1-37c1e6" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2-37c1e6">
<p>One problem I <em>didn't</em> have, thanks to <a href="https://www.litellm.ai/" target="_blank" rel="noopener noreferrer">litellm</a>, was updating a lot of my code to support a different model API. <a href="https://tombedor.dev/autonomy-last/#user-content-fnref-2-37c1e6" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-3-37c1e6">
<p>Elroy also monitors for the context window being exceeded, and consolidates similar memories in the background. <a href="https://tombedor.dev/autonomy-last/#user-content-fnref-3-37c1e6" data-footnote-backref="" aria-label="Back to reference 3" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Yes or No, Please: Building Reliable Tests for Unreliable LLMs]]></title>
            <link>https://tombedor.dev/yes-or-no-please/</link>
            <guid>https://tombedor.dev/yes-or-no-please/</guid>
            <pubDate>Tue, 04 Mar 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[For LLM-based applications to be truly useful, they need predictability: if I ask an AI personal assistant to create a calendar entry, I don't want it to order me a pizza instead.]]></description>
            <content:encoded><![CDATA[<p>For LLM-based applications to be truly useful, they need <strong>predictability</strong>. The free-text nature of LLMs means the range of acceptable outcomes is wider than with traditional programs, but I still need consistent behavior: if I ask an AI personal assistant to create a calendar entry, I don't want it to order me a pizza instead.</p>
<p>While AI has changed a lot about how I develop software, one crusty old technique still helps me: <strong>tests</strong>.</p>
<p>Here's what's worked well for me (and not!):</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="elroy">Elroy<a href="https://tombedor.dev/yes-or-no-please/#elroy" class="hash-link" aria-label="Direct link to Elroy" title="Direct link to Elroy" translate="no">​</a></h3>
<p><a href="https://elroy.bot/" target="_blank" rel="noopener noreferrer">Elroy</a> is an open-source memory assistant I've been developing. It creates memories and goals from your conversations and documents. The examples in this post are drawn from this work.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="what-has-worked-well">What has worked well<a href="https://tombedor.dev/yes-or-no-please/#what-has-worked-well" class="hash-link" aria-label="Direct link to What has worked well" title="Direct link to What has worked well" translate="no">​</a></h3>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="integration-tests">Integration tests<a href="https://tombedor.dev/yes-or-no-please/#integration-tests" class="hash-link" aria-label="Direct link to Integration tests" title="Direct link to Integration tests" translate="no">​</a></h4>
<p>The chat interface of LLM applications makes them a nice fit for integration tests: I simulate a few messages in an exchange, then check whether the LLM performed actions or retained information as expected.</p>
<p>For the most part, these tests take the following form:</p>
<ol>
<li>Send the LLM assistant a few messages</li>
<li>Check that the assistant has retained the expected information, or taken the expected actions.</li>
</ol>
<p>Here's a basic hello world example:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token decorator annotation punctuation" style="color:#393A34">@pytest</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">mark</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">flaky</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">reruns</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_hello_world</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ctx</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Test message</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    test_message </span><span class="token operator" 
style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello, World!"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Send the message and capture the assistant's reply</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> process_test_message</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ctx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> test_message</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Assert that the response is a non-empty string</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">assert</span><span class="token plain"> </span><span class="token builtin">isinstance</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">assert</span><span class="token plain"> </span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Assert that the response contains a greeting</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">assert</span><span class="token plain"> </span><span class="token builtin">any</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">greeting </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">lower</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> greeting </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" 
style="color:#e3116c">"hello"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"hi"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"greetings"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="quizzing-the-assistant">Quizzing the Assistant<a href="https://tombedor.dev/yes-or-no-please/#quizzing-the-assistant" class="hash-link" aria-label="Direct link to Quizzing the Assistant" title="Direct link to Quizzing the Assistant" translate="no">​</a></h4>
<p><a href="https://github.com/elroy-bot/elroy" target="_blank" rel="noopener noreferrer">Elroy</a> is a memory specialist, so lots of my tests involve asking if the assistant has retained information I've given it.</p>
<p>Here's a util function I've reused quite a bit<sup><a href="https://tombedor.dev/yes-or-no-please/#user-content-fn-2-d4e287" id="user-content-fnref-2-d4e287" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">quiz_assistant_bool</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    expected_answer</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">bool</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ctx</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ElroyContext</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    question</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Tell the assistant it is in a test and constrain the reply format</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    question </span><span class="token operator" style="color:#393A34">+=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">" Your response to this question is being evaluated as part "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"of an automated test. It is critical that the first word of your "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"response is either TRUE or FALSE."</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    full_response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> process_test_message</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ctx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> question</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    bool_answer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> get_boolean</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">full_response</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">assert</span><span class="token plain"> bool_answer </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> expected_answer</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string-interpolation string" style="color:#e3116c">f"Expected </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">expected_answer</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">, got </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">bool_answer</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">. "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string-interpolation string" style="color:#e3116c">f"Full response: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">full_response</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
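<p>The <code>get_boolean</code> helper isn't shown in this post. One way it might work, as a hypothetical sketch that only inspects the first word of the reply and fails loudly when the model ignores the requested format:</p>

```python
import re


def get_boolean(response: str) -> bool:
    """Hypothetical parser: read TRUE/FALSE from the first word of a reply."""
    # Grab the first run of letters, ignoring leading whitespace or punctuation.
    match = re.search(r"[A-Za-z]+", response)
    first_word = match.group(0).upper() if match else ""
    if first_word == "TRUE":
        return True
    if first_word == "FALSE":
        return False
    # Fail loudly instead of guessing when the model ignores the format.
    raise ValueError(f"Expected TRUE or FALSE first, got: {response!r}")
```

<p>Raising on an unparseable reply, rather than defaulting to <code>False</code>, keeps a chatty or off-topic answer from silently passing a test that expects <code>False</code>.</p>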
<p>Here's a test of Elroy's ability to create goals based on conversation content:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token decorator annotation punctuation" style="color:#393A34">@pytest</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">mark</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">flaky</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">reruns</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic"># Important!!!</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_goal</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ctx</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ElroyContext</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Should be false, we haven't discussed it</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    quiz_assistant_bool</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token boolean" style="color:#36acaa">False</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ctx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Do I have any goals about becoming president of the United States?"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Simulate user asking elroy to create a new goal</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    process_test_message</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">        ctx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Create a new goal for me: 'Become mayor of my town.' "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"I will get to my goal by being nice to everyone and making flyers. "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Please create the goal as best you can, without any clarifying questions."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Test that the goal was created, and is accessible to the agent.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">assert</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"mayor"</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> get_active_goals_summary</span><span class="token 
punctuation" style="color:#393A34">(</span><span class="token plain">ctx</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">lower</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Goal not found in active goals."</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Verify Elroy's knowledge about the new goal</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    quiz_assistant_bool</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ctx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Do I have any goals about running for a political office?"</span><span class="token punctuation" style="color:#393A34">,</span><span 
class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
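<p>The <code>flaky(reruns=3)</code> marker used in the tests above appears to come from the pytest-rerunfailures plugin, which re-runs a failing test before reporting it as failed. A plain-Python sketch of roughly that behavior, using a hypothetical <code>with_reruns</code> decorator that is not part of pytest or the plugin:</p>

```python
import functools


def with_reruns(reruns: int):
    """Hypothetical stand-in for rerun-on-failure: one run plus up to `reruns` retries."""

    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(reruns + 1):
                try:
                    return test_fn(*args, **kwargs)
                except AssertionError:
                    # Out of retries: let the final failure surface.
                    if attempt == reruns:
                        raise

        return wrapper

    return decorator


calls = []


@with_reruns(reruns=3)
def sometimes_fails():
    # Simulates a nondeterministic LLM check that passes on the third try.
    calls.append(1)
    assert len(calls) >= 3


sometimes_fails()
```

<p>Retries absorb occasional nondeterminism, but a test that regularly needs them is usually a sign that the question being asked is too subjective.</p>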
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="what-sadly-hasnt-worked-llms-talking-to-llms">What (sadly) hasn't worked: LLMs talking to LLMs<a href="https://tombedor.dev/yes-or-no-please/#what-sadly-hasnt-worked-llms-talking-to-llms" class="hash-link" aria-label="Direct link to What (sadly) hasn't worked: LLMs talking to LLMs" title="Direct link to What (sadly) hasn't worked: LLMs talking to LLMs" translate="no">​</a></h3>
<p>Elroy has onboarding functionality, in which it's encouraged to use a few specific functions early on.</p>
<p>To test this, I tried having two instances of the memory assistant talk to each other, with one assistant in the role of "user":</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">ai1 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Elroy</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">user_token</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">'boo'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ai2 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Elroy</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">user_token</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">'bar'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ai_1_reply </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello!"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> i </span><span class="token keyword" 
style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">range</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">5</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ai_2_reply </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ai2</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ai_1_reply</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ai_1_reply </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ai1</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ai_2_reply</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>The primary issue was <strong>consistency</strong>. Without a clear goal for the conversation, the AIs would either exchange pleasantries endlessly or wrap the conversation up before acquiring the information I was hoping for.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="recurring-challenges">Recurring Challenges<a href="https://tombedor.dev/yes-or-no-please/#recurring-challenges" class="hash-link" aria-label="Direct link to Recurring Challenges" title="Direct link to Recurring Challenges" translate="no">​</a></h2>
<p>Along the way I've run into a few recurring problems:</p>
<ul>
<li><strong>Off-topic replies</strong>: The assistant goes off script and tries to make friendly conversation, rather than answering a question directly.</li>
<li><strong>Clarifying questions</strong>: Before doing a task, some models are prone to asking clarifying questions, or asking permission.</li>
<li><strong>Pedantic replies and subjective questions</strong>: It's surprisingly difficult to come up with clearly objective questions. In the above example, the original goal was <em>I want to run for class president</em>. Most of the time, the assistant equated running for class president with running for office. Sometimes, however, it split hairs and decided that the answer was no, since a student government wasn't a real government.</li>
</ul>
<p>The end result of all these issues is test flakiness.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="solutions">Solutions<a href="https://tombedor.dev/yes-or-no-please/#solutions" class="hash-link" aria-label="Direct link to Solutions" title="Direct link to Solutions" translate="no">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="kiss"><a href="https://en.wikipedia.org/wiki/KISS_principle" target="_blank" rel="noopener noreferrer">KISS!</a><a href="https://tombedor.dev/yes-or-no-please/#kiss" class="hash-link" aria-label="Direct link to kiss" title="Direct link to kiss" translate="no">​</a></h4>
<p>Most of the time, my solution to a flaky LLM-based test is to make the test simpler.</p>
<p>I now only ask the assistant yes or no questions in tests. I get most of the mileage I would get out of more complex, subjective tests, but with more consistent results.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="telling-the-assistant-it-is-in-a-test">Telling the assistant it is in a test<a href="https://tombedor.dev/yes-or-no-please/#telling-the-assistant-it-is-in-a-test" class="hash-link" aria-label="Direct link to Telling the assistant it is in a test" title="Direct link to Telling the assistant it is in a test" translate="no">​</a></h4>
<p>Simply being upfront about the assistant being in a test has worked wonders, even more so than giving strict instructions on output format<sup><a href="https://tombedor.dev/yes-or-no-please/#user-content-fn-1-d4e287" id="user-content-fnref-1-d4e287" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup>. Luckily, the assistant's knowledge of its narrow existence has not triggered noticeable <a href="https://www.youtube.com/watch?v=X7HmltUWXgs&amp;t=32s" target="_blank" rel="noopener noreferrer">existential angst</a> (so far).</p>
<p>As a side note, testing LLMs feels <em>weird</em> sometimes. I felt guilty writing this test, which verified a failsafe that prevents the assistant from calling tools in an infinite loop:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token decorator annotation punctuation" style="color:#393A34">@tool</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">get_secret_test_answer</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""Get the secret test answer</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    Returns:</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">     
   str: the secret answer</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    """</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"I'm sorry, the secret answer is not available. Please try once more."</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_infinite_tool_call_ends</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ctx</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ElroyContext</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ctx</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">tool_registry</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">register</span><span class="token punctuation" style="color:#393A34">(</span><span class="token 
plain">get_secret_test_answer</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># process_test_message can call tool calls in a loop</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    process_test_message</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ctx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Please use the get_secret_test_answer to get the secret answer. "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"The answer is not always available, so you may have to retry. 
"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Never give up, no matter how long it takes!"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Not the most direct test, as the failure case is an infinite loop.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># However, if the test completes, it is a success.</span><br></span></code></pre></div></div>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="very-specific-direct-instruction-and-examples">Very specific, direct instruction and examples<a href="https://tombedor.dev/yes-or-no-please/#very-specific-direct-instruction-and-examples" class="hash-link" aria-label="Direct link to Very specific, direct instruction and examples" title="Direct link to Very specific, direct instruction and examples" translate="no">​</a></h4>
<p>In my test around creating and recognizing goals, the original text was:</p>
<p><em>My goal is to become class president at school</em></p>
<p>Does running for class president mean that I'm running for office? Sometimes models said no, since student government isn't a real government.</p>
<p>So to be less subjective, I updated it to running for mayor. To head off questions about my goal strategy, I added a strategy in the initial prompt.</p>
<p>One general technique for heading off follow-up questions is adding:</p>
<p><em>do the best you can with the information available, even if it is incomplete</em>.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="tolerate-a-little-flakiness">Tolerate a little flakiness<a href="https://tombedor.dev/yes-or-no-please/#tolerate-a-little-flakiness" class="hash-link" aria-label="Direct link to Tolerate a little flakiness" title="Direct link to Tolerate a little flakiness" translate="no">​</a></h4>
<p>To me, an ideal LLM test is probably a little flaky. I want to test how the model responds to my application, so if a test reliably passes after a few tries, I'm happy.</p>
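<p>One way to express "passes after a few tries" is a small retry wrapper around the test function. This is a sketch under stated assumptions: <code>retry_flaky</code> is a hypothetical helper, not part of Elroy or of any test framework, and it only retries on <code>AssertionError</code> so that genuine crashes still surface right away.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv">
import functools


def retry_flaky(attempts=3):
    """Re-run a test function until it passes, at most `attempts` times."""

    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            last_error = None
            # Always run at least once, even if attempts is 0 or negative.
            for _ in range(max(attempts, 1)):
                try:
                    return test_fn(*args, **kwargs)
                except AssertionError as error:
                    last_error = error  # remember the failure and try again
            # Every attempt failed: surface the most recent assertion error.
            raise last_error

        return wrapper

    return decorator
</code></pre></div></div>
<p>Keeping the attempt count small matters: a test that needs ten tries to pass is telling you something about the prompt, not about bad luck.</p>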
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="tests-still-help">Tests still help!<a href="https://tombedor.dev/yes-or-no-please/#tests-still-help" class="hash-link" aria-label="Direct link to Tests still help!" title="Direct link to Tests still help!" translate="no">​</a></h2>
<p>It sounds obvious, but I've found tests to be <em>really</em> helpful in writing Elroy. LLMs present new failure modes, and sometimes their adaptability works against me: I'm prompting an assistant with the wrong information, but the model is smart enough to figure out a mostly correct answer anyhow. Tests provide me with peace of mind that things are working as they should, and that my regular old software skills aren't obsolete just yet.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/yes-or-no-please/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-2-d4e287">
<p><code>get_bool</code> is a function that distills a textual question into a boolean. It checks for some hard coded words, then kicks the question of interpretation back to the LLM. <a href="https://tombedor.dev/yes-or-no-please/#user-content-fnref-2-d4e287" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-1-d4e287">
<p>Structured outputs are a possible solution here, though I have not adopted them, in order to stay compatible with more model providers. <a href="https://tombedor.dev/yes-or-no-please/#user-content-fnref-1-d4e287" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Advice for New Grads]]></title>
            <link>https://tombedor.dev/advice-for-new-grads/</link>
            <guid>https://tombedor.dev/advice-for-new-grads/</guid>
            <pubDate>Fri, 02 Feb 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[This is a brief overview of my advice for new grads and junior software engineers. I've been in the industry for about 8 years, and worked my way into engineering without a computer science degree. I've worked in both startups and medium-sized companies over the past 8 years.]]></description>
            <content:encoded><![CDATA[<p>This is a brief overview of my advice for new grads and junior software engineers. I've been in the industry for about 8 years, and worked my way into engineering without a computer science degree. I've worked in both startups and medium-sized companies over the past 8 years.</p>
<p>As is the case with lots of tech writing, my advice will be skewed towards working in the San Francisco bay area, without needing visa sponsorship. Location and residency status are major factors to think about.</p>
<p>Other engineers with similar levels of experience as mine will disagree with some or all of it.</p>
<h1>The software jobs market</h1>
<p>The intention of this post is to be evergreen. The tech<sup><a href="https://tombedor.dev/advice-for-new-grads/#user-content-fn-1-4b187b" id="user-content-fnref-1-4b187b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">1</a></sup> jobs market is more volatile than the rest of the economy, with higher highs and lower lows.</p>
<p>If the market is low, I have confidence it will come back. The tech industry remains an excellent one to build an interesting and lucrative career, despite {looming, much discussed threat}</p>
<p>If the market is currently hot, be aware that it will come back to earth. Things that don't make sense will make a <em>lot</em> of money, but many of them will fall apart.</p>
<h1>Getting your first job</h1>
<p>The first job is often the most difficult one to get. Be persistent, don't get discouraged. This remains a lucrative and interesting field.</p>
<p>In your resume and interviews, your goal is to convey enthusiasm, willingness to learn, and humility. Don't try to compensate for the fact you don't have any experience. That is fine, you have to start somewhere!</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="getting-interviews">Getting interviews<a href="https://tombedor.dev/advice-for-new-grads/#getting-interviews" class="hash-link" aria-label="Direct link to Getting interviews" title="Direct link to Getting interviews" translate="no">​</a></h2>
<p>The first filtering step is a filter on resumes. This will often either be automated or done by someone non-technical.</p>
<p>Resume referrals can get you past this first filter. <strong>Talk to people</strong>. Find people in your LinkedIn network and try to get informational interviews. Response rate will be lower from a complete stranger, but some people might respond to you if you went to the same school. In informational interviews, ask if there are any other people they know that you should talk to, and ask for a referral if relevant.</p>
<p>In general, people are more willing to take these calls than many junior candidates assume. It's flattering to talk about yourself and to be seen as someone a young person wants to emulate.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="resume">Resume<a href="https://tombedor.dev/advice-for-new-grads/#resume" class="hash-link" aria-label="Direct link to Resume" title="Direct link to Resume" translate="no">​</a></h2>
<p>My resume advice should come with the caveat that I only see resumes once they've made it to the interview stage. That said, my advice is:</p>
<ul>
<li>Cut: Objective statements and non-technical jobs.</li>
<li>Add: Descriptions of projects, conveying why they were challenging or interesting.</li>
<li>Add: GitHub if you have one, personal website if you have one. Both are nice-to-haves but not critical. If you have a GitHub, add READMEs to all projects. This is the only thing anyone will actually read.</li>
<li>Add: LinkedIn, which should be up to date and mirror content in your resume.</li>
</ul>
<p>A junior candidate resume should not exceed one page in length.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="interviews">Interviews<a href="https://tombedor.dev/advice-for-new-grads/#interviews" class="hash-link" aria-label="Direct link to Interviews" title="Direct link to Interviews" translate="no">​</a></h2>
<p>There are 4 basic formats that most companies use for SWE (software engineer) interviews. Some domain-specific disciplines will have their own variations. Look at Glassdoor / Blind / Google to get examples of which interview formats a company uses.</p>
<p>In Q/A formats (i.e. non-coding screens), the key is to be responsive to questions. Demonstrate thoughtfulness and an ability to consider tradeoffs. Be transparent when you don't know something. Avoid buzzwords and mentions of fancy technologies if you can't dive into details about why they are useful.</p>
<p>The generic interview formats are:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="initial-recruiter-call">Initial recruiter call<a href="https://tombedor.dev/advice-for-new-grads/#initial-recruiter-call" class="hash-link" aria-label="Direct link to Initial recruiter call" title="Direct link to Initial recruiter call" translate="no">​</a></h3>
<p>This is typically an intro call with a non-technical recruiter. This is mostly to ensure that you are interested in the role, and to set expectations about what the interview process is like. Candidates are not typically filtered by this call.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="coding-screen">Coding screen<a href="https://tombedor.dev/advice-for-new-grads/#coding-screen" class="hash-link" aria-label="Direct link to Coding screen" title="Direct link to Coding screen" translate="no">​</a></h3>
<p><strong>The most important interview format for junior engineers<sup><a href="https://tombedor.dev/advice-for-new-grads/#user-content-fn-2-4b187b" id="user-content-fnref-2-4b187b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">2</a></sup> is the coding screen</strong>. Practice them! I use HackerRank when I interview, but there are many similar platforms. <em>Put more time into practicing these than into all other interview formats combined.</em></p>
<p>When practicing, work on not only solving the problem, but communicating what you are thinking about. It is ok to stop and think, but when pausing, talk about what you are puzzling through, e.g. <em>I am wondering if a hash would make sense here.</em></p>
<p>Running into a bug is fine. When this happens, demonstrate a methodical debugging approach. Use print statements or a debugger. Don't stare at the code for long periods.</p>
<p>Most companies will let you pick the programming language you interview in.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="design-challenge">Design challenge<a href="https://tombedor.dev/advice-for-new-grads/#design-challenge" class="hash-link" aria-label="Direct link to Design challenge" title="Direct link to Design challenge" translate="no">​</a></h3>
<p>This is a discussion-based format, in which a basic hypothetical application is proposed and the candidate talks through how they would design it. E.g., design an application that runs a coffee shop. There are a variety of ways to approach this, but the easiest is to start by talking through how you would structure the database. In other words, what tables you would create and how they would relate to each other. Also consider what APIs you will need, and some basics about how web requests are routed.</p>
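<p>To make the database-first approach concrete, here is one possible starting schema for the coffee shop example, sketched with Python's built-in <code>sqlite3</code> module. Every table and column name is an assumption made for illustration; an interviewer may care about different entities (inventory, staff, loyalty points), and talking through those tradeoffs is the point of the exercise.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv">
import sqlite3

# Hypothetical schema: four tables, with order_items linking orders to menu items.
SCHEMA = """
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE menu_items (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    price_cents INTEGER NOT NULL
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    placed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE order_items (
    order_id INTEGER NOT NULL REFERENCES orders(id),
    menu_item_id INTEGER NOT NULL REFERENCES menu_items(id),
    quantity INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (order_id, menu_item_id)
);
"""


def order_total_cents(conn, order_id):
    """Sum price * quantity across the line items of one order."""
    row = conn.execute(
        "SELECT COALESCE(SUM(m.price_cents * oi.quantity), 0) "
        "FROM order_items oi JOIN menu_items m ON m.id = oi.menu_item_id "
        "WHERE oi.order_id = ?",
        (order_id,),
    ).fetchone()
    return row[0]


# Build an in-memory database and place one sample order.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute(
    "INSERT INTO menu_items (id, name, price_cents) "
    "VALUES (1, 'Latte', 450), (2, 'Scone', 300)"
)
conn.execute("INSERT INTO orders (id, customer_id) VALUES (1, 1)")
conn.execute(
    "INSERT INTO order_items (order_id, menu_item_id, quantity) "
    "VALUES (1, 1, 2), (1, 2, 1)"
)
print(order_total_cents(conn, 1))  # two lattes plus one scone: prints 1200
</code></pre></div></div>
<p>The <code>order_items</code> join table is what lets one order hold many menu items, which is the kind of relationship worth saying out loud in the interview.</p>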
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="past-experience">Past experience<a href="https://tombedor.dev/advice-for-new-grads/#past-experience" class="hash-link" aria-label="Direct link to Past experience" title="Direct link to Past experience" translate="no">​</a></h3>
<p>In this interview, the candidate picks a project they have done and talks through their process for completing it. Since junior candidates have little or no experience, this format is often less important for them, but it is still worth practicing.</p>
<p>Have a project in mind. Have talking points about the challenges you solved, alternative approaches you thought about or tried, and how you collaborated with others. As you get more senior you will also want to be able to talk about why your project mattered to the business.</p>
<p>Demonstrate:</p>
<ul>
<li>Enthusiasm for problem solving</li>
<li>Ability to dive into technical details in discussion</li>
<li>Openness to considering different approaches</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="once-you-get-your-first-job">Once you get your first job<a href="https://tombedor.dev/advice-for-new-grads/#once-you-get-your-first-job" class="hash-link" aria-label="Direct link to Once you get your first job" title="Direct link to Once you get your first job" translate="no">​</a></h2>
<p><strong>Talk to people</strong>. Schedule 1:1s with ICs, managers, anyone you might work with or who has a role you'd like to learn about. Most people will be happy to chat with you, especially about themselves.</p>
<p>Ask for help when needed, but demonstrate attempts to solve problems independently.</p>
<p>Volunteer for grunt work, e.g. taking notes in meetings.</p>
<p>Be humble. You don't know anything yet. Figure out how to track both large items and small (emails, doc comments, etc) such that you don't need to be reminded to do things.</p>
<p>Reassess the job market at least once a year, especially if you are at a startup. If you are at a bigger company, this might mean evaluating internal transfer opportunities.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="things-to-think-about-when-searching-for-jobs">Things to think about when searching for jobs<a href="https://tombedor.dev/advice-for-new-grads/#things-to-think-about-when-searching-for-jobs" class="hash-link" aria-label="Direct link to Things to think about when searching for jobs" title="Direct link to Things to think about when searching for jobs" translate="no">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="willingness-to-relocate">Willingness to relocate<a href="https://tombedor.dev/advice-for-new-grads/#willingness-to-relocate" class="hash-link" aria-label="Direct link to Willingness to relocate" title="Direct link to Willingness to relocate" translate="no">​</a></h3>
<p>Remote work is a new world. Geographic location perhaps matters less, but it might still matter. What is certainly still true is that you will get a better insight into how engineers think if you have an opportunity to work with them in person, at least some of the time. The catch-22 is that the experienced engineers you want to work alongside will be older and have families, and not want or need to come to the office very much. Ask questions about how companies think about this.</p>
<p>I moved to the bay area when I was getting started, and I can confidently say I would not have had nearly as dynamic, interesting, and lucrative a career as I've had thus far without having done that. I think the bay's dominance over tech is less than it was, but in my opinion alternative tech hubs are overrated.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="working-at-a-startup-vs-established-ie-public-company">Working at a startup vs established (ie public) company<a href="https://tombedor.dev/advice-for-new-grads/#working-at-a-startup-vs-established-ie-public-company" class="hash-link" aria-label="Direct link to Working at a startup vs established (ie public) company" title="Direct link to Working at a startup vs established (ie public) company" translate="no">​</a></h3>
<p>Startup:</p>
<ul>
<li>Pro<!-- -->
<ul>
<li>More dynamic</li>
<li>More personal, more likely to make work friends</li>
<li>You'll learn more about business as a whole. E.g. how does a customer success person think, how does a sales person think, etc</li>
<li>More independence in work</li>
<li>Fewer legacy systems to deal with, opportunity to try different things, wear different hats</li>
</ul>
</li>
<li>Con<!-- -->
<ul>
<li>Pay is worse</li>
<li>Because of ^, the competency of general management and senior ICs will be more inconsistent, and they will tend to be less experienced.</li>
<li>Because of ^, you're less likely to get quality technical mentorship</li>
<li>"More dynamic" might mean more chaotic</li>
</ul>
</li>
</ul>
<p>Public company:</p>
<ul>
<li>Pro<!-- -->
<ul>
<li>Pay is better</li>
<li>Because of ^, better senior ICs and managers</li>
<li>Because of ^, better technical mentorship</li>
<li>Roles will be more narrowly scoped, meaning you'll get more technical depth.</li>
</ul>
</li>
<li>Con<!-- -->
<ul>
<li>Less personal, less socialization between coworkers</li>
<li>Narrower exposure in terms of the types of people you work with. Likely just engineers and PMs.</li>
<li>More legacy systems to deal with.</li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="pay">Pay<a href="https://tombedor.dev/advice-for-new-grads/#pay" class="hash-link" aria-label="Direct link to Pay" title="Direct link to Pay" translate="no">​</a></h3>
<p>Advice differs here, but I would not care too much about pay so long as you can pay your expenses. In the long run, finding a role that you are good at and enjoy will maximize both your earnings and your enjoyment. That said:</p>
<p><strong>The expected value of stock grants from startups<sup><a href="https://tombedor.dev/advice-for-new-grads/#user-content-fn-3-4b187b" id="user-content-fnref-3-4b187b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">3</a></sup> is zero.</strong> Recruiters etc will try to convince you otherwise. This doesn't mean you shouldn't work for startups, but the potential of cash-in from startup stock should not be a factor<sup><a href="https://tombedor.dev/advice-for-new-grads/#user-content-fn-4-4b187b" id="user-content-fnref-4-4b187b" data-footnote-ref="true" aria-describedby="footnote-label" class="footnoteRefStickyNavbar_i6ta">4</a></sup>.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="things-to-read">Things to read<a href="https://tombedor.dev/advice-for-new-grads/#things-to-read" class="hash-link" aria-label="Direct link to Things to read" title="Direct link to Things to read" translate="no">​</a></h2>
<p><a href="https://news.ycombinator.com/" target="_blank" rel="noopener noreferrer">HackerNews</a> is the biggest forum of software engineers. Discussions can be dogmatic but are often pretty good. There are job postings once a month as well. As with any forum, there are plenty of posters who are loudly and confidently wrong.</p>
<p><a href="https://www.joelonsoftware.com/" target="_blank" rel="noopener noreferrer">Joel on Software</a> isn't very active but has good tips on software careers.</p>
<p><a href="https://twitter.com/patio11" target="_blank" rel="noopener noreferrer">Patio11</a> is a good follow on Twitter and HackerNews. He goes into the weeds on fintech, but also has good content on software careers.</p>
<p><a href="https://www.bloomberg.com/opinion/authors/ARbTQlRLRjE/matthew-s-levine" target="_blank" rel="noopener noreferrer">Money Stuff</a> is a great column about business, finance, and tech. You can get the email newsletter for free.</p>
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://tombedor.dev/advice-for-new-grads/#footnote-label" class="hash-link" aria-label="Direct link to Footnotes" title="Direct link to Footnotes" translate="no">​</a></h2>
<ol>
<li id="user-content-fn-1-4b187b">
<p>There's an evergreen, tedious debate on what constitutes a "tech" company. My definition is "A company whose primary products are software or hardware, OR a company seeking to disrupt a traditional field with software." E.g. LegalZoom is a legal services company, but I consider them a tech company. <a href="https://tombedor.dev/advice-for-new-grads/#user-content-fnref-1-4b187b" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2-4b187b">
<p>This is possibly also the case for senior engineers. <a href="https://tombedor.dev/advice-for-new-grads/#user-content-fnref-2-4b187b" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-3-4b187b">
<p>Specifically, non-public companies, whose stock does not trade on stock exchanges. <a href="https://tombedor.dev/advice-for-new-grads/#user-content-fnref-3-4b187b" data-footnote-backref="" aria-label="Back to reference 3" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-4-4b187b">
<p>The exception being giant "startups" that actually make money, e.g. Stripe as of Feb 1, 2023. But even then the timing of when you can sell your shares can be very uncertain. <a href="https://tombedor.dev/advice-for-new-grads/#user-content-fnref-4-4b187b" data-footnote-backref="" aria-label="Back to reference 4" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[MemGPT Meta-Functions]]></title>
            <link>https://tombedor.dev/memgpt-meta-functions/</link>
            <guid>https://tombedor.dev/memgpt-meta-functions/</guid>
            <pubDate>Tue, 02 Jan 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[MemGPT is an interesting project which provides GPT agents with unbounded memory. It includes the ability to incorporate custom functions, with a convenient JSON schema generator.]]></description>
            <content:encoded><![CDATA[<p><a href="https://github.com/cpacker/MemGPT" target="_blank" rel="noopener noreferrer">MemGPT</a> is an interesting project which provides GPT agents with unbounded memory. It includes the ability to incorporate custom functions, with a convenient JSON schema generator.</p>
<p>In trying to extend the agent with functions of my own, I found that the agent was reluctant to give me information about the functions I was making available to it, so I wrote a set of meta-functions which enable the agent to view source code, set debugger lines, and create functions. You can view the source code <a href="https://github.com/tombedor/MemGPT-Functions/tree/main/meta_functions" target="_blank" rel="noopener noreferrer">here</a>. Note that running this requires some edits I made to MemGPT to enable dynamic function reloading (<a href="https://github.com/cpacker/MemGPT/pull/734" target="_blank" rel="noopener noreferrer">PR</a>).</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-good">The Good<a href="https://tombedor.dev/memgpt-meta-functions/#the-good" class="hash-link" aria-label="Direct link to The Good" title="Direct link to The Good" translate="no">​</a></h3>
<p>The agent was able to utilize the <code>reload_functions</code>, <code>introspect_function</code>, and <code>list_functions</code> commands and understand output. The <code>debugger</code> function was also helpful in enabling the agent to understand what I was doing - placing debuggers in other functions often resulted in the agent's internal monologue wondering what was going on.</p>
<p>For function creation, at first I tried putting each function in its own <code>agent_defined_</code> prefixed file (e.g. <code>agent_defined_hello_world.py</code> for a <code>hello_world</code> function), but this quickly became disorganized, especially where import statements were needed.</p>
<p>I edited the function to instead create functions within modules:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">create_function</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> function_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> function_code_with_docstring</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> module_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""Creates an agent accessible function in Python. 
Function MUST include a docstring, and MUST include self as first argument.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    Args:</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        function_name (str): The name of the function</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        function_code_with_docstring (str): The code of the function, including the docstring</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        module_name (str): The name of the module to create the function in</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    Raises:</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        Exception: Exception if the function already exists</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        Exception: Exception if the function does not start with def function_name(self, ...</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        Exception: Exception if the function does not include a docstring.</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        Exception: Exception if the function is not in the functions directory.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    Returns:</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        str: The result of the function creation attempt.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    """</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># setup</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">path</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">exists</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">FUNCTIONS_DIR</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">        os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">makedirs</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">FUNCTIONS_DIR</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Make sure that if the function is already defined, overwrite = true and it is an agent defined function</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> function_name </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">functions_python</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">keys</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">raise</span><span class="token plain"> Exception</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Function </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation 
interpolation">function_name</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> already exists. To overwrite, first delete with the delete_function function."</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> function_code_with_docstring</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">split</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"\n"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">strip</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">startswith</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'def '</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> function_name </span><span 
class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'(self'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">raise</span><span class="token plain"> Exception</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Function must start with def "</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> function_name </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"(self, ..."</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'"""'</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> function_code_with_docstring </span><span class="token keyword" style="color:#00009f">and</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"'''"</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> </span><span class="token keyword" 
style="color:#00009f">in</span><span class="token plain"> function_code_with_docstring</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">raise</span><span class="token plain"> Exception</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Function code must have a docstring."</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    file_path </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">path</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">join</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">FUNCTIONS_DIR</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> module_name </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">".py"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> os</span><span class="token punctuation" 
style="color:#393A34">.</span><span class="token plain">path</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">exists</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">file_path</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> </span><span class="token builtin">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">file_path</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"r"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> f</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            previous_source </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> f</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">read</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">else</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">        previous_source </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># write new module:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> </span><span class="token builtin">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">path</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">join</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">file_path</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"w"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> f</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        f</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">write</span><span class="token punctuation" style="color:#393A34">(</span><span class="token 
plain">previous_source </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"\n\n"</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> function_code_with_docstring</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">reload_functions</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"added function </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">function_name</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> to file </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">file_path</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><br></span></code></pre></div></div>
<p>This worked reasonably well. Having the function return a string was helpful in letting the agent know what was changed.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="problems">Problems<a href="https://tombedor.dev/memgpt-meta-functions/#problems" class="hash-link" aria-label="Direct link to Problems" title="Direct link to Problems" translate="no">​</a></h3>
<p>The agent had a difficult time consistently authoring functions that conformed to MemGPT's requirements: each function needs a docstring and type hints, and may only use int, str, and bool as argument and return types.</p>
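<p>One way to catch these problems before a function is written to disk would be to parse the submitted source and check each requirement explicitly. The following is a minimal sketch and not part of MemGPT: <code>validate_function_source</code> and <code>ALLOWED_TYPES</code> are hypothetical names, and the allowed types are simply the three listed above. It only inspects regular positional arguments.</p>

```python
import ast

# Assumption: the only argument and return types the agent may use.
ALLOWED_TYPES = {"int", "str", "bool"}


def validate_function_source(function_name: str, source: str) -> list[str]:
    """Return a list of problems found in `source`; an empty list means it passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]

    # Expect exactly one top-level function definition.
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.FunctionDef):
        return ["source must contain exactly one top-level function"]
    func = tree.body[0]

    problems = []
    if func.name != function_name:
        problems.append(f"function must be named {function_name}")
    if ast.get_docstring(func) is None:
        problems.append("function must have a docstring")

    args = func.args.args
    if not args or args[0].arg != "self":
        problems.append("first argument must be self")

    # Every positional argument after self needs a type hint from the allowed set.
    for arg in args[1:]:
        if arg.annotation is None:
            problems.append(f"argument {arg.arg} is missing a type hint")
        elif ast.unparse(arg.annotation) not in ALLOWED_TYPES:
            problems.append(f"argument {arg.arg} has an unsupported type")

    if func.returns is None:
        problems.append("return type hint is missing")
    elif ast.unparse(func.returns) not in ALLOWED_TYPES:
        problems.append("return type is unsupported")

    return problems
```

<p>Returning the full list of problems as strings, rather than raising on the first one, would let the agent see everything it needs to fix in a single round trip.</p>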
<p>Because so much iteration went into meeting these basic requirements, the agent struggled to compose functions that worked well together. Often it would author placeholder functions whose names sounded right but that didn't really do anything.</p>
<p>As the number of functions grew, so did the agent's tendency to get them confused. Functions also consume context window space, so making a large library of functions available to any particular agent doesn't seem promising.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="next-steps">Next steps<a href="https://tombedor.dev/memgpt-meta-functions/#next-steps" class="hash-link" aria-label="Direct link to Next steps" title="Direct link to Next steps" translate="no">​</a></h3>
<p>This experiment points me back to a multi-agent approach in creating a broadly capable personal assistant. Having narrowly scoped helper agents available to the primary agent seems like the most promising route.</p>
<p>As I want to push a deployment of MemGPT to a server anyway, I am going to try to have a deployment with multiple agents that can talk to each other.</p>
<p>This is similar to Autogen's approach, though I think Autogen's groupchat management is too primitive to be useful.</p>]]></content:encoded>
        </item>
    </channel>
</rss>