Why AI is more accurate than humans at probabilistic risk forecasting
Alan Mosca and Georgia Stillwell explore how machine learning surpasses human expertise in risk forecasting. They discuss the limitations of human judgment in traditional risk management, emphasizing overconfidence and inaccuracy in estimating probabilities.
Part 1: Machines vs People
nPlan is in the business of forecasting risk on projects, by using machine learning to predict the outcomes of construction schedules. We spend a lot of time thinking about subjectivity, forecasting accuracy, schedule quality, and the way people think about uncertainty and artificial intelligence.
People often ask two questions during the early phases of an engagement with us:
- How can an algorithm be better than an expert at forecasting risk?
- How can your algorithm work on my unique project?
We’ll look at both these questions, providing insight from two angles: my technical experience and Georgia’s risk analysis background.
In part 1, we’ll address the first question.
How can an algorithm be better than an expert?
To understand why a machine could outperform project experts, we first need to know how professionals have an impact on a ‘traditional’ risk management process. Today, most professional bodies (including the IRM, ISO, PRINCE2, API) will prescribe a risk management process using the following steps:
- Review (rinse and repeat)
We can start by looking at the team’s role and influence in the first two steps: identification and assessment.
This process is pretty much the same everywhere, no matter the project size, from the small (£20M) to the mega (£10B+) schemes. This means that project teams are potentially making the same mistakes over and over again. We believe this to be one of the reasons why we still see project delays, even though delays in projects have been ‘a thing’ for decades. The problem is that we are asking humans to identify risks, the likelihood of those risks occurring, and the impact of those risks if they materialise. This creates not just one, but three points of subjectivity for each risk item, and many sterling opportunities to mess up the inputs to a risk analysis.
But I’ve got the best tunnelling team ever, and they are estimating these risks, not guessing them!
This is probably as good a time as any to mention that humans are terrible at estimating probabilities and risk. In the book Superforecasting, the authors talk at length about several years of research into understanding what makes some people incredibly good at creating forecasts, whilst most are no better than random guessing. We recommend that if you are serious about risk and forecasting, you put this book at the top of your reading pile. The tl;dr of it is that it is all about estimating uncertainty, and doing so in a way that is consistent with your own track record of estimation. It turns out that experts are terrible at this: they are consistently overconfident and often wrong. However, the way we appraise experts makes us vulnerable to confirmation bias, and we end up finding signal in the noise by praising people who simply guessed right many times in a row.
Uncertainty, Forecasting and Risk
When we talk about any events in the future, we must consider all three. They exist together and are forever entangled.
Risk, defined as broadly as possible, is the potential for deviation from an intended plan or result.
Forecasting is the act of producing a probabilistic view over any future events.
Uncertainty is the variability of any outcome, and is a direct consequence of Risk. It is also included as a parameter when producing forecasts.
The higher the Risk, the higher the Uncertainty, the wider the forecasts.
Human perception of probability
Humans, as we previously stated, are not good at estimating probabilities. Especially small probabilities. If something is 90% probable, we all tend to feel very secure about it, and are not able to internalise the fact that one in ten times that something will not occur. It is important to accept it as a fact, which comes from our own species’ evolution, that humans will always be overconfident forecasters.
Many of you may recall sitting in a risk workshop with a risk manager/analyst/engineer probing your mind for a probability of some sort of event like sprayed concrete lining collapse, as if you were meant to know that intuitively. “I’ve seen it once in the five major tunnelling jobs I’ve worked on, so there’d probably be a one in five or 20% chance” is the best anyone can do here. The probability may vary, but that doesn’t make it any more accurate. With effects like groupthink and anchoring (and all other cognitive biases too), the team is only likely to adjust in relation to that starting 20%, which is another flaw in the way we make judgements. This process is repeated until we have assessed all the risks we guessed, and married them up with guessed probabilities and impacts.
These “probabilities” are then fed into a Monte Carlo simulation, on which billion-dollar decisions are made. Are you gawking? You should be. Big financial decisions are based on subjective inputs like this every day! Also, in this process we completely ignored all catastrophic risks as well as the low-likelihood, high-impact risks which were too hard to estimate: the Black Swans, plus any others that were just unknown to the project team.
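To make the point concrete, here is a minimal sketch of the kind of Monte Carlo simulation a risk register typically feeds. It is not any particular tool's implementation: the risk names, probabilities and delay impacts are invented for illustration, and each risk is assumed to be independent with a fixed impact. The second run shows how the headline P80 delay responds when a single guessed probability is really 40% rather than 20%.

```python
import random

# Hypothetical risk register: (name, guessed probability, delay in days if it occurs).
# All values are invented for illustration only.
RISK_REGISTER = [
    ("Sprayed concrete lining collapse", 0.20, 60),
    ("Late delivery of materials", 0.30, 20),
    ("Unexpected ground conditions", 0.10, 45),
]


def simulate_total_delay(register, n_runs=10_000, seed=0):
    """Return a sorted list of simulated total delays (days), one per run.

    Assumes risks are independent and each has a single fixed impact.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        # A risk contributes its impact only in runs where it "occurs".
        total = sum(impact for _, p, impact in register if rng.random() < p)
        totals.append(total)
    return sorted(totals)


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, with q in [0, 1]."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


delays = simulate_total_delay(RISK_REGISTER)
p80_as_guessed = percentile(delays, 0.80)

# Same register, but the first risk is really 40% likely, not the guessed 20%.
adjusted = [("Sprayed concrete lining collapse", 0.40, 60)] + RISK_REGISTER[1:]
p80_adjusted = percentile(simulate_total_delay(adjusted), 0.80)

print("P80 delay with guessed inputs:", p80_as_guessed)
print("P80 delay with adjusted input:", p80_adjusted)
```

The simulation itself is mechanical. Whatever goes in as a guess comes out the other end looking like a precise percentile.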
Machine Learning and Forecasting
Machine Learning, in short, is the act of taking data with known outcomes, applying a set of algorithms, and producing a mathematical model that has, to some degree, internalised the nature of the problem at hand. These models are then used to make predictions about new, previously unseen data.
ML models can also be built so that they produce forecasts together with the associated uncertainty, and so that those forecasts are calibrated. A calibrated model is a model whose uncertainty is correct. So if a model says there is a 10% probability of some event occurring, then across all the predictions where it said 10%, that event happens 10% of the time.
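The definition of calibration above can be checked directly once outcomes are known. The following is a minimal sketch, not nPlan's method: the function name `calibration_table` and the toy data are assumptions made for illustration. It groups forecasts into probability bins and compares the average forecast in each bin with how often the event actually happened.

```python
from collections import defaultdict


def calibration_table(predicted_probs, outcomes, n_bins=10):
    """Compare forecast probabilities with observed frequencies.

    Returns a list of (mean_forecast, observed_rate, count) tuples,
    one per non-empty probability bin, ordered from low to high.
    """
    bins = defaultdict(list)
    for p, happened in zip(predicted_probs, outcomes):
        # Clamp so that a forecast of exactly 1.0 falls into the last bin.
        index = min(n_bins - 1, int(p * n_bins))
        bins[index].append((p, happened))

    table = []
    for index in sorted(bins):
        pairs = bins[index]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        observed_rate = sum(o for _, o in pairs) / len(pairs)
        table.append((mean_forecast, observed_rate, len(pairs)))
    return table


# Toy data: ten 10% forecasts (event happened once) and
# ten 90% forecasts (event happened nine times).
probs = [0.1] * 10 + [0.9] * 10
outcomes = [1] + [0] * 9 + [1] * 9 + [0]

table = calibration_table(probs, outcomes)
for mean_forecast, observed_rate, count in table:
    print(f"forecast ~{mean_forecast:.2f}  observed {observed_rate:.2f}  n={count}")
```

For a calibrated forecaster, the two columns agree in every bin, up to sampling noise. An overconfident forecaster shows observed rates that sit closer to 50% than the forecasts do.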
The machine is your friend
If we now think about the process that professionals follow today, two issues appear:
- The teams of experts thinking about risk are doing so by coming up with a limited set of “potential” events, instead of thinking about all possible occurrences and taking an “end to end” view of (in nPlan’s case) project delay. It is guaranteed that there will be “risks” that have been missed.
- Humans are not good at forecasting because of their inherent overconfidence, which leads traditional forecasting exercises to severely underestimate risk.
We trained our ML models to be the Risk Manager’s friend. We trained on over half a million project schedules of projects ranging from datacentres to railways, and have taught our models to provide calibrated forecasts with accurate estimates of uncertainty (if you’ve read Superforecasting, we actually measure the Brier Score of our models).
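For readers who have not met it, the Brier Score is the mean squared difference between the forecast probability and what actually happened (1 if the event occurred, 0 if not); lower is better. The sketch below is a generic illustration with invented numbers, not a description of how we compute it internally. It scores three hypothetical forecasters on ten activities, eight of which turned out to be delayed.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(predicted_probs) != len(outcomes):
        raise ValueError("predicted_probs and outcomes must have the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Invented example: 8 of 10 activities were delayed (1 = delayed, 0 = on time).
outcomes = [1] * 8 + [0] * 2

overconfident = brier_score([0.99] * 10, outcomes)  # "it will certainly happen"
calibrated = brier_score([0.80] * 10, outcomes)     # matches the observed rate
coin_flip = brier_score([0.50] * 10, outcomes)      # no information at all

print(f"overconfident: {overconfident:.4f}")
print(f"calibrated:    {calibrated:.4f}")
print(f"coin flip:     {coin_flip:.4f}")
```

In this toy case the overconfident forecaster is right about the direction and still scores worse than the calibrated one, because the two misses at 99% confidence are penalised heavily.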
Rather than competing with the Risk Manager, this approach empowers her with risk forecasts that are derived from data instead of opinions. This allows her then to focus on the actions that the model will never be able to take, which in our view are a far more valuable use of one’s time: the exploration, and the decisions. By being able to understand how schedule changes and progress updates affect the uncertainty of a project’s outcomes, we want to help everyone be proactive in reducing said uncertainty.
Once the uncertainty is below acceptable thresholds, we observe that activities stop interfering with each other and the schedule becomes “stable”. It is then truly effective to adopt the existing and well-known project management techniques such as the Critical Path Method (CPM), and focus on improving the efficiency of the project, because we are no longer affected by wild changes.