nPlan's Barry

Barry update: A more sophisticated AI agent

Since we launched Barry, our project controls AI, back in June, we’ve shared several videos of it ‘thinking’ and chatting with users - but we haven’t revealed much about what’s going on ‘under the hood’ - until now, that is 😲

We introduced Barry in May this year, and we’ve been updating its capabilities ever since. I don’t often talk about the internals, but this week I’d like to highlight some of what goes on under the hood.

Barry has always been an “AI agent”, not just an LLM. In fact, the LLM that underpins Barry is interchangeable, and we often make tweaks and updates to how the LLMs are used. One core component of this agent architecture is the “planner”. This is an LLM that plans out the series of actions required to answer the question or task posed by the user. For example, an action may be to search the forecast for the details of certain activities, or for a predecessor/successor network.
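To make the “plan once, then execute” idea concrete, here is a minimal Python sketch. It is an illustration under assumptions, not Barry’s actual code: the `Action` type, the tool names `search_forecast` and `search_network`, and the keyword-based `keyword_planner` (a stub standing in for an LLM call) are all hypothetical. Passing the planner in as a plain callable is one simple way to keep the underlying LLM interchangeable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    tool: str       # hypothetical tool name, e.g. "search_forecast"
    argument: str   # what to pass to that tool


# Any function with this shape can act as the planner, so the LLM
# behind it can be swapped without touching the rest of the agent.
Planner = Callable[[str], list[Action]]


def keyword_planner(question: str) -> list[Action]:
    """Stub in place of an LLM call: returns the whole plan in one go."""
    plan = [Action("search_forecast", question)]
    if "successor" in question.lower():
        plan.append(Action("search_network", question))
    return plan


# Hypothetical tools; real ones would query the schedule and forecast.
TOOLS = {
    "search_forecast": lambda arg: f"forecast results for: {arg}",
    "search_network": lambda arg: f"network results for: {arg}",
}


def execute_once(question: str, planner: Planner, tools=TOOLS) -> list[str]:
    """Call the planner once per question, then run every action in its plan."""
    plan = planner(question)
    return [tools[action.tool](action.argument) for action in plan]


results = execute_once("What are the successors of activity X?", keyword_planner)
for line in results:
    print(line)
```

Because the plan is fixed before any action runs, nothing the tools return can influence later steps in this version.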

Planning is a very complicated task, and until now we have been calling the planner once per question and then executing its plan. This week, we made some changes that bring a big improvement to the quality of the responses you see from Barry. The planner is now recursive, meaning that after executing each action, we ask it again what it should do next. One place where we have already seen an improvement is in questions about network effects. Previously, the planner could only say “search for the successors of activities X, Y, and Z”. However, the planner does not know about all the activities in a schedule up front, so that search was limited to the activities it could already name. Now, it is able to look at the successors of certain activities, reason about the impact, and follow the network in the next step of the loop.

Even more importantly, we’ve managed to do so without affecting the response time of simple questions: the planner knows when it is done and has sufficient data to answer a question.
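The recursive loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not Barry’s implementation: the successor network, the activity ids “A” to “D”, the `get_successors` and `finish` action names, and `stub_planner` (a rule-based stand-in for the LLM planner) are all hypothetical. The point is only the shape of the loop: plan one action, execute it, feed the result back, and stop as soon as the planner says it has enough.

```python
from dataclasses import dataclass, field

# Hypothetical toy network: activity id -> direct successors.
SUCCESSORS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}


@dataclass
class Action:
    name: str            # "get_successors" or "finish"
    activity: str = ""
    reason: str = ""     # the planner's note to itself, kept for debugging


@dataclass
class State:
    start: str
    seen: dict = field(default_factory=dict)  # activity -> successors found


def stub_planner(state: State) -> Action:
    """Stand-in for the LLM planner: pick the next action from what is known."""
    if state.start not in state.seen:
        return Action("get_successors", state.start, "start at the delayed activity")
    for successors in state.seen.values():
        for activity in successors:
            if activity not in state.seen:
                return Action("get_successors", activity, f"follow the network to {activity}")
    return Action("finish", reason="no unexplored successors remain")


def run_recursive(start: str, max_steps: int = 20):
    """Ask the planner for one action at a time until it decides to finish."""
    state, trace = State(start), []
    for _ in range(max_steps):  # hard cap so a confused planner cannot loop forever
        action = stub_planner(state)
        trace.append(action)
        if action.name == "finish":
            break
        state.seen[action.activity] = SUCCESSORS.get(action.activity, [])
    return state, trace


state, trace = run_recursive("A")
for step in trace:
    print(step.name, step.activity, "-", step.reason)
```

A one-shot planner could only have asked about “A”, because it had not yet learned that “B”, “C” and “D” exist. A simple question still terminates quickly: `run_recursive("D")` stops after a single lookup, since “D” has no successors.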

The question in this example was “how does a 1-month delay to CON-9800 affect the rest of the project?”

Before this change, we were getting a lot of hallucinations because the relevant data was not being provided to the “frontend” LLM. See below:

The previous version was answering in a suboptimal way because it was not able to ground itself with sufficient data about the schedule network

(Yes, we deliberately created a terrible schedule so there is lots of room for improvement)

I’ve included some screenshots of our development environment that highlight the change in reasoning capability. There were more steps than those in the screenshots, but you can already see how the new planner reasons about the results of each step to generate the next step.

Step 1:

Notice how the planning agent also “talks to itself”. This provides a couple of advantages. First, we always have a reason for why it created a specific plan, which is very useful for debugging. Second, in the future we expect much richer communication to happen between the planner agent (or agents?) and the LLM that generates the output that you see in the platform.

Step 2:

Notice how it is running some of the same commands again. This is OK, because we also instruct the planner to be thorough and err on the side of completeness, so it builds some redundancy into its plans. You or I would call it “being on the safe side”.

Answer:

As you can see, Barry is now gathering much more information and data from the platform, and the resulting reasoning is getting better and better. You can expect us to continuously make improvements like this. This one is easy to explain, so I decided to open the hood and show the difference in outcome, but there are plenty more: we update Barry’s engine about once or twice a day on average to make it more powerful!

Written by Alan Mosca

Posted on 14.8.2023
