Barry update: A more sophisticated AI agent
Since we launched Barry, our project controls AI, back in June, we’ve shared several videos of it ‘thinking’ and chatting with users, but we haven’t revealed much about what’s going on ‘under the hood’. Until now, that is 😲
We introduced Barry in May this year, and we’ve been expanding its capabilities ever since. I don’t often write about the internals, but this week I’d like to highlight some of what happens behind the scenes.
Barry has always been an “AI agent”, not just an LLM. In fact, the LLM that underpins Barry is interchangeable, and we often make tweaks and updates to how the LLMs are used. One core component of this agent architecture is the “planner”: an LLM that plans out the series of actions required to answer the question or task posed by the user. For example, an action might be to search the forecast for the details of certain activities, or to retrieve a predecessor/successor network.
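To make the idea concrete, here is a rough sketch of what a planner’s output could look like. The action names, the `Action` schema, and the stubbed `plan_once` function are all made up for illustration; nPlan has not published Barry’s actual action format or API.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Action:
    tool: str        # hypothetical tool name, e.g. "search_activities"
    arguments: dict  # parameters for that tool call

def plan_once(question: str) -> list[Action]:
    """Single-shot planning: one call produces the whole plan up front.
    Stubbed here with a fixed plan; in a real agent this would be an LLM call."""
    return [
        Action("search_activities", {"query": question}),
        Action("get_successors", {"activity_ids": ["CON-9800"]}),
    ]

plan = plan_once("How does a 1 month delay to CON-9800 affect the project?")
for action in plan:
    print(action.tool)
```

The key limitation of this shape is visible in the signature: the entire plan is fixed before any action has run, so later steps cannot react to what earlier steps discovered.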
This is a massively complicated task, and until now we called the planner once per question and then executed its plan. This week, we made changes that bring a big improvement to the quality of Barry’s responses. The planner is now recursive: after executing each action, we ask it again what it should do next. One place we’ve already seen the improvement is questions about network effects. Previously, the planner could only say “search for the successors of activities X, Y, and Z”. But the planner does not know about all the activities in a schedule, so that search was limited. Now, it can look at the successors of certain activities, reason about the impact, and follow the network in the next step of the loop.
Even more importantly, we’ve managed to do so without affecting the response time for simple questions: the planner knows when it has sufficient data to answer, and stops.
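The recursive loop described above can be sketched in a few lines. Everything here is an assumption for illustration: the function names, the convention that the planner returns `None` when it is done, and the toy schedule. The toy planner simply follows a successor chain one hop at a time, which is exactly the kind of network-following a single up-front plan cannot do.

```python
def run_agent(question, planner, execute, max_steps=10):
    """After each action, ask the planner again what to do next, feeding back
    everything observed so far. The planner returns None when it has enough
    data to answer, which keeps simple questions fast."""
    observations = []
    for _ in range(max_steps):
        action = planner(question, observations)  # an LLM call in a real agent
        if action is None:                        # planner decides it is done
            break
        observations.append((action, execute(action)))
    return observations

# Toy schedule: activity -> list of successors (made up for this sketch).
schedule = {"CON-9800": ["CON-9810"], "CON-9810": ["CON-9900"], "CON-9900": []}

def toy_planner(question, observations):
    # Start from CON-9800; afterwards, follow whatever the last step returned.
    frontier = "CON-9800" if not observations else observations[-1][1]
    if frontier is None:  # the chain has ended, so we have enough data
        return None
    return ("get_successors", frontier)

def toy_execute(action):
    _, activity = action
    successors = schedule[activity]
    return successors[0] if successors else None

steps = run_agent("delay CON-9800", toy_planner, toy_execute)
```

The loop walks CON-9800 → CON-9810 → CON-9900 and then terminates on its own, because the planner sees from the accumulated observations that the chain has ended.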
I’ve included some screenshots of our development environment that highlight the change in reasoning capability. There were more steps than those in the screenshots, but you can already see how the new planner reasons about the results of each step to generate the next one.
The question was “how does a 1 month delay to CON-9800 affect the rest of the project?”
Before this change, we were getting a lot of hallucinations, caused by the fact that the relevant data never reached the “frontend” LLM. See below:
(Yes, we deliberately created a terrible schedule so there is lots of room for improvement.)
Notice how the planning agent also “talks to itself”. This has a couple of advantages: we always have a record of why it created a specific plan, which is very useful for debugging; and in the future we expect much richer communication between the planner agent (or agents?) and the LLM that generates the output you see in the platform.
Notice how it runs some of the same commands again. That’s fine: we instruct the planner to be thorough and err on the side of completeness, so it builds some redundancy into its plans. You or I would call it “being on the safe side”.
As you can see, Barry is now gathering much more information and data from the platform, and the resulting reasoning is getting better and better. You can expect us to keep making improvements like this. This one is easy to explain, so I decided to open the hood and show the difference in outcome, but there are plenty more: we update Barry’s engine once or twice a day on average to make it more powerful!
Why AI is more accurate than humans at probabilistic risk forecasting
Alan Mosca and Georgia Stillwell explore how machine learning surpasses human expertise in risk forecasting. They discuss the limitations of human judgment in traditional risk management, emphasizing overconfidence and inaccuracy in estimating probabilities.
What I wish I knew about ML infrastructure when I was a researcher
Join Carlos Ledezma, Product Manager at nPlan, on an exciting journey to optimize your Machine Learning projects. Learn how Docker can revolutionize your ML infrastructure, making code portable, reproducible, and easily deployable. Explore practical examples and expert insights to unlock the full potential of Docker in ML engineering. Dive into this blog now to elevate your career in Machine Learning!
We’re building a diverse team at the intersection of two male-dominated industries — here’s how you can too
Construction-tech is a man’s world — but it doesn’t have to be. People Ops Associate Freya McDonnell reveals how nPlan is trying to level the playing field.