nPlan's Barry

Barry update: A more sophisticated AI agent

Since we launched Barry, our project controls AI, back in June, we’ve shared several videos of it ‘thinking’ and chatting with users - but we haven’t revealed much about what’s going on ‘under the hood’ - until now, that is 😲

We introduced Barry in May this year, and we’ve been updating its capabilities ever since. I don’t often talk about the internals, but this week I’d like to highlight some of how it actually works.

Barry has always been an “AI agent”, not just an LLM. In fact, the LLM that underpins Barry is interchangeable, and we often tweak and update how the LLMs are used. One core component of this agent architecture is the “planner”: an LLM that plans out the series of actions required to answer the question or task posed by the user. For example, an action might be to search the forecast for the details of certain activities, or to retrieve a predecessor/successor network.
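To make that concrete, here’s a minimal sketch of what a one-shot planner interface could look like. It’s illustrative only: the Action and Plan types, the tool names, and the planner_llm.complete call are all hypothetical stand-ins, not Barry’s actual internals.

```python
from dataclasses import dataclass


@dataclass
class Action:
    tool: str        # hypothetical tool name, e.g. "search_activities"
    arguments: dict  # tool-specific parameters


@dataclass
class Plan:
    rationale: str         # the planner's own explanation for the plan
    actions: list[Action]  # ordered actions to execute


def plan_once(planner_llm, question: str) -> Plan:
    """One-shot planning: ask the planner LLM for the full list of actions
    needed to answer the user's question before executing anything."""
    prompt = (
        "You plan actions for a project controls assistant.\n"
        f"User question: {question}\n"
        "Reply with a rationale and an ordered list of actions."
    )
    # planner_llm.complete is an assumed interface that parses the LLM's
    # reply into a Plan; Barry's real parsing is not public.
    return planner_llm.complete(prompt)
```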

This is a massively complicated task, and until now we have been calling the planner once per question and then executing its plan. This week, we made some changes that bring a big improvement to the quality of Barry’s responses. The planner is now recursive, meaning that after executing each action, we ask it again what it should do next. One place we’ve already seen this improvement is in questions about network effects. Previously, the planner could only say “search for the successors of activities X, Y, and Z”. But the planner does not know about all the activities in a schedule, so that search would be limited. Now, it can look at the successors of certain activities, reason about the impact, and follow the network in the next step of the loop.

Even more importantly, we’ve managed to do so without affecting the response time of simple questions: the planner knows when it’s done and has sufficient data to answer.
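Here’s a rough sketch of that loop, under the same assumptions as before. The next_step, compose, and done names are hypothetical; what matters is the shape of the control flow: one action per iteration, re-planning after each result, and an early exit as soon as the planner is satisfied.

```python
def answer(planner_llm, frontend_llm, tools: dict, question: str,
           max_steps: int = 10) -> str:
    """Recursive planning loop: execute one action at a time and re-plan
    after each result, instead of committing to a single upfront plan."""
    observations: list[str] = []
    for _ in range(max_steps):
        # Re-plan with everything gathered so far (hypothetical interface).
        step = planner_llm.next_step(question, observations)
        if step.done:
            # The planner signals it has sufficient data, so a simple
            # question exits after a step or two with no added latency.
            break
        result = tools[step.action.tool](**step.action.arguments)
        observations.append(f"{step.action.tool}: {result}")
    # A separate "frontend" LLM drafts the user-facing answer, grounded
    # in the data the planner gathered along the way.
    return frontend_llm.compose(question, observations)
```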

Let’s look at a concrete example. The question was: “how does a 1 month delay to CON-9800 affect the rest of the project?”

Before this change, we were getting a lot of hallucinations because the relevant data was never provided to the “frontend” LLM. See below:

The previous version was answering in a suboptimal way because it was not able to ground itself with sufficient data about the schedule network

(Yes, we deliberately created a terrible schedule so there is lots of room for improvement)

I’ve included some screenshots of our development environment that highlight the change in reasoning capability. There were more steps than those shown, but you can already see how the new planner reasons about the results of each step to decide the next one.

Step 1:

Notice how the planning agent also “talks to itself”. This has a couple of advantages. First, we always have a record of why it created a specific plan, which is very useful for debugging. Second, in the future we expect much richer communication between the planner agent (or agents?) and the LLM that generates the output you see in the platform.
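As a small illustration of why that self-talk matters for debugging, a sketch like this (reusing the hypothetical step shape from earlier) would keep the rationale next to every action it justifies:

```python
import logging

logger = logging.getLogger("planner")


def log_step(step) -> None:
    """Keep the planner's self-talk next to the action it justifies, so
    every plan carries its own explanation when we debug a bad answer."""
    logger.info("rationale: %s", step.rationale)
    logger.info("action: %s(%r)", step.action.tool, step.action.arguments)
```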

Step 2:

Notice how it is running some of the same commands again. This is OK, because we also instruct the planner to be thorough and err on the side of completeness, so it builds some redundancy into its plans. You or I would call it “being on the safe side”.
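For flavour, instructions along these lines could produce that behaviour. This is a hypothetical fragment, not Barry’s actual prompt:

```python
# Hypothetical fragment of the planner's instructions; Barry's actual
# prompt is not public. It illustrates steering the planner toward
# completeness, accepting the odd repeated action as the price of
# not missing relevant data.
PLANNER_INSTRUCTIONS = """\
Be thorough and err on the side of completeness. Repeating an earlier
action is acceptable if it reduces the risk of missing relevant
schedule data. Stop only when you are confident you have everything
needed to answer the question.
"""
```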

Answer:

As you can see, Barry is now gathering much more information from the platform, and the resulting reasoning is getting better and better. You can expect us to keep making improvements like this. This one is easy to explain, so I decided to open the hood and show the difference in outcome, but there are plenty more: we update Barry’s engine once or twice a day on average to make it more powerful!