Yes or No, Please: Building Reliable Tests for Unreliable LLMs

· 7 min read

For LLM-based applications to be truly useful, they need to be predictable. The free-text nature of LLMs means the range of acceptable outcomes is wider than with traditional programs, but I still need consistent behavior: if I ask an AI personal assistant to create a calendar entry, I don't want it to order me a pizza instead.

While AI has changed a lot about how I develop software, one crusty old technique still helps me: tests.
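As a rough illustration of the yes-or-no idea in the title, here is a minimal sketch that forces a model's free-text verdict into a strict boolean so an ordinary test can assert on it. Everything in it is an assumption made for illustration: `parse_yes_no`, `check_output`, and `fake_judge` are hypothetical names, and `fake_judge` is a keyword-matching stand-in for a real model call, not an actual LLM client.

```python
from typing import Callable


def parse_yes_no(reply: str) -> bool:
    """Map a free-text reply onto a strict boolean, or raise if it is neither."""
    words = reply.strip().lower().split()
    first = words[0].strip(".,!:;\"'") if words else ""
    if first == "yes":
        return True
    if first == "no":
        return False
    raise ValueError(f"expected a yes/no answer, got: {reply!r}")


def check_output(judge: Callable[[str], str], output: str, question: str) -> bool:
    """Ask a judge (hypothetical: any callable from prompt to reply) a yes/no question."""
    prompt = (
        f"{question}\n\nOutput to evaluate:\n{output}\n\n"
        "Answer with exactly one word: yes or no."
    )
    return parse_yes_no(judge(prompt))


def fake_judge(prompt: str) -> str:
    """Stand-in for a real model call: looks only at the output being evaluated."""
    evaluated = prompt.split("Output to evaluate:\n", 1)[1]
    return "Yes." if "calendar" in evaluated.lower() else "No."


def test_assistant_creates_calendar_entry() -> None:
    question = "Did the assistant create a calendar entry?"
    assert check_output(fake_judge, "Created calendar entry 'Dentist' at 3pm", question)
    assert not check_output(fake_judge, "Ordered one large pizza", question)
```

Raising on anything other than a clear yes or no is a deliberate choice in this sketch: an ambiguous verdict fails loudly instead of silently passing the test.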

Advice for New Grads

· 9 min read

This is a brief overview of my advice for new grads and junior software engineers. I've been in the industry for about 8 years, working at both startups and medium-sized companies, and I worked my way into engineering without a computer science degree.

As is the case with lots of tech writing, my advice is skewed towards working in the San Francisco Bay Area without needing visa sponsorship. Location and residency status are major factors to think about.

Other engineers with experience similar to mine will disagree with some or all of it.

The Questionable Value of the OpenAI GPT Store

· 3 min read

OpenAI launched its GPT Store this week. Brands and developers can create custom GPTs, either for sale or for free. Both groups have eagerly launched many GPTs, probably due to the relatively low overhead of creating them.

I am skeptical of the value. For brands, this feels like the AI equivalent of a service that sends postcards in response to an email. The access pattern and interface are more or less exactly the same as traditional apps or sites, only via a GPT. It's a neat trick, but I think users will quickly lose interest.

MemGPT Meta-Functions

· 4 min read

MemGPT is an interesting project which provides GPT agents with unbounded memory. It includes the ability to incorporate custom functions, with a convenient JSON schema generator.

In trying to extend the agent with functions of my own, I found that the agent was reluctant to give me information about the functions I was making available to it, so I wrote a set of meta-functions which enable the agent to view source code, set debugger lines, and create functions. You can view the source code here. Note that running this requires some edits I made to MemGPT to enable dynamic function reloading (PR).
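To make the meta-function idea concrete, here is a minimal sketch of a function that reports on other functions, using only the Python standard library. It does not use MemGPT's actual API: `REGISTRY`, `register`, `add_numbers`, and `describe_function` are hypothetical names chosen for illustration, and the real project wires functions into the agent differently.

```python
import inspect
from typing import Callable, Dict

# Hypothetical registry of functions exposed to an agent (not MemGPT's API).
REGISTRY: Dict[str, Callable] = {}


def register(func: Callable) -> Callable:
    """Decorator that records a function under its own name."""
    REGISTRY[func.__name__] = func
    return func


@register
def add_numbers(a: int, b: int) -> int:
    """Return the sum of two integers."""
    return a + b


@register
def describe_function(name: str) -> str:
    """Meta-function: report signature, docstring, and source of a registered function."""
    func = REGISTRY.get(name)
    if func is None:
        return f"No function named {name!r}. Known: {sorted(REGISTRY)}"
    try:
        source = inspect.getsource(func)
    except (OSError, TypeError):
        # Source is not always retrievable, e.g. for code defined interactively.
        source = "<source unavailable>"
    return f"{name}{inspect.signature(func)}\n{inspect.getdoc(func)}\n\n{source}"


if __name__ == "__main__":
    print(describe_function("add_numbers"))
```

Because `describe_function` is itself registered, an agent given this registry could ask about the meta-function in the same way as any other function.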