It’s easy to think that keeping a clean and up-to-date inventory of your assets is simple, but more often than not you’ll find that most inventories are full of incomplete or downright incorrect entries.
Detecting assets and components in places that the operator doesn’t expect is extremely valuable from a risk management and liability perspective, and that’s exactly what our Asset Inventory Correction service aims to do.
We recently used computer vision to process over 200,000 images and correct an inventory of almost 40,000 distinct assets for a client in the electricity transmission sector. We made nearly 320,000 predictions and went through more than 40 iterations of our AI pipeline to get there; here’s what we learned along the way.
Problematic Data
How do you update thousands of records without going through each one? After all, that would require a gargantuan (and very tedious) effort of manually going through historical images – 200,000 of them! – and correcting each entry.
Things get even more complicated when you consider that each asset has multiple images, some taken at different times and some not good enough to be analysed. On top of that, you aren’t looking for just a single data point in each image: each asset has multiple attributes that must be logged.

At first glance, this is a clear-cut problem easily solvable by AI. With enough labelled images you can train a classifier to automatically label the rest of the dataset. We labelled a few thousand images and, along the way, developed a technique that allows us to reduce the required training data by up to 80%. Taking one of our classifiers as an example, the training results were good – 93% precision and 86% recall – but not where we wanted them to be. Granted, this includes a class that is extremely rare, which brings the average down. If we exclude it, precision and recall become 97% and 98% respectively. But that’s just at the image level; we’re dealing with assets here.
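To make the effect of that rare class concrete, here is a minimal sketch on synthetic data; the class names, frequencies and noise levels are made up for illustration and have nothing to do with our actual pipeline:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical setup: three common classes and one very rare one.
labels = ["class_a", "class_b", "class_c", "rare_class"]
rng = np.random.default_rng(0)
y_true = rng.choice(labels, size=2000, p=[0.40, 0.35, 0.24, 0.01])

# Simulate a classifier with ~5% label noise; the rare class's precision suffers
# most, because a few stray false positives swamp its handful of true examples.
noise = rng.choice(labels, size=y_true.size)
y_pred = np.where(rng.random(y_true.size) < 0.95, y_true, noise)
# Make the rare class genuinely harder as well: about half of it gets missed.
hard = (y_true == "rare_class") & (rng.random(y_true.size) < 0.5)
y_pred = np.where(hard, "class_a", y_pred)

prec, rec, _, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0
)
for name, p, r, s in zip(labels, prec, rec, support):
    print(f"{name:>10}: precision={p:.2f} recall={r:.2f} support={s}")

# Macro averages with and without the rare class: one sparse class can drag
# the headline numbers down even when every common class looks healthy.
print(f"incl. rare class: precision={prec.mean():.2f} recall={rec.mean():.2f}")
print(f"excl. rare class: precision={prec[:-1].mean():.2f} recall={rec[:-1].mean():.2f}")
```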
Going from image to asset
Because each asset has multiple associated images, our classifier has to make, say, 5 predictions for each asset. How do you reconcile different predictions for the same asset?

We tried a bunch of different strategies:
- Taking the most confident prediction
- Taking the most frequent prediction
- Taking the prediction with the largest average confidence
- Training an auxiliary regression model on additional metadata
Before applying these strategies we also implemented some quality filters. With each classification the model gives a numerical score that indicates how “confident” it is about its prediction, and we excluded predictions that weren’t confident enough. In a similar vein, we also removed predictions that only occurred once or twice: for instance, if the model made 4 predictions of class A and only 1 of class D, we ignore the D prediction no matter how confident it is.
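Putting the filters and the first three aggregation strategies together might look something like the sketch below; the default thresholds, the data structure and the function name are illustrative assumptions rather than our exact implementation:

```python
from collections import Counter
from statistics import mean

def aggregate_asset(predictions, strategy="most_frequent",
                    conf_threshold=0.6, min_votes=3):
    """Collapse several image-level (label, confidence) predictions for one
    asset into a single asset-level label. Threshold defaults are illustrative;
    in practice they would be tuned per class and per model."""
    # Quality filter 1: drop predictions the model isn't confident about.
    preds = [(lbl, conf) for lbl, conf in predictions if conf >= conf_threshold]

    # Quality filter 2: drop classes that only pick up one or two stray votes.
    votes = Counter(lbl for lbl, _ in preds)
    preds = [(lbl, conf) for lbl, conf in preds if votes[lbl] >= min_votes]
    if not preds:
        return None  # nothing trustworthy left, so flag the asset for manual review

    if strategy == "most_confident":
        return max(preds, key=lambda p: p[1])[0]
    if strategy == "most_frequent":
        return Counter(lbl for lbl, _ in preds).most_common(1)[0][0]
    if strategy == "highest_mean_confidence":
        by_label = {}
        for lbl, conf in preds:
            by_label.setdefault(lbl, []).append(conf)
        return max(by_label, key=lambda lbl: mean(by_label[lbl]))
    raise ValueError(f"unknown strategy: {strategy}")

# Four images vote for class A and one very confident image votes for class D:
# the lone D vote is discarded despite its 0.99 confidence.
print(aggregate_asset([("A", 0.8), ("A", 0.7), ("A", 0.9), ("A", 0.75), ("D", 0.99)]))
```

Returning None rather than forcing a guess is a deliberate choice in this sketch: assets with no trustworthy prediction are better routed to manual review than silently mislabelled.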

A new test dataset
These quality filters and aggregating strategies are not a panacea, but by tinkering with the threshold values and exploring different combinations we were able to maximise accuracy when testing against the asset test set. This dataset was created by manually determining the classification for a number of assets and ensuring that it comprised assets representing the whole dataset as evenly as possible.
Model | Image test set (images) | Asset test set (assets) |
---|---|---|
Classifier 1 | 9500 | 100 |
Classifier 2 | 9500 | 100 |
Object Detector 1 | 2100 | 220 |
Object Detector 2 | 2400 | 450 |
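As an aside, an evenly representative asset test set of the sizes above could be assembled roughly like this; the helper and the per-class sample size are hypothetical, not a description of how the client data was actually split:

```python
import random
from collections import defaultdict

def balanced_asset_sample(labelled_assets, per_class=25, seed=42):
    """Sample roughly the same number of manually labelled assets per class,
    so the asset test set represents the whole inventory as evenly as possible.

    `labelled_assets` is assumed to be a list of (asset_id, manual_label) pairs.
    """
    by_class = defaultdict(list)
    for asset_id, label in labelled_assets:
        by_class[label].append(asset_id)

    rng = random.Random(seed)
    sample = []
    for label, members in by_class.items():
        rng.shuffle(members)
        sample.extend(members[:per_class])  # rare classes contribute all they have
    return sample
```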
We ran each strategy for each class in each model and chose the one that yielded the best score against the asset test set; that’s how we ended up with over 40 iterations of various pipelines. As always there were edge cases that threw the model off; for example, some assets actually change entirely from one image to the next, since surveys were conducted at different times.
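Continuing the earlier sketch, the selection step could be as simple as scoring every strategy and threshold combination against the asset test set and keeping the winner; the candidate values and function names below are again assumptions for illustration:

```python
from itertools import product

STRATEGIES = ["most_confident", "most_frequent", "highest_mean_confidence"]
CONF_THRESHOLDS = [0.5, 0.6, 0.7, 0.8]

def asset_accuracy(test_assets, strategy, conf_threshold):
    """Fraction of test assets whose aggregated prediction matches the manual label.

    `test_assets` is assumed to be a list of (image_predictions, true_label) pairs,
    and `aggregate_asset` is the sketch from the previous section.
    """
    correct = sum(
        aggregate_asset(preds, strategy=strategy, conf_threshold=conf_threshold) == label
        for preds, label in test_assets
    )
    return correct / len(test_assets)

def pick_best_pipeline(test_assets):
    # Try every combination and keep the best scorer. Repeating this per class
    # and per model is what produced the 40-plus pipeline iterations mentioned above.
    scores = {
        (strategy, threshold): asset_accuracy(test_assets, strategy, threshold)
        for strategy, threshold in product(STRATEGIES, CONF_THRESHOLDS)
    }
    return max(scores, key=scores.get)
```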
Ultimately, aggregating predictions at the asset level led to even better results; we saw increases of up to 12% across the 4 models.
Rare assets pose a problem
Let’s take a step back and evaluate the models with the original goal in mind: reducing risk by identifying mistakes in the inventory. Across our models, precision and recall both average to ~95%. That means around 5% of assets are likely to be miscategorised by the AI, which sounds great, but what does it mean for assets that occur only rarely on the network or in the inventory?
The rarer the asset, the more likely it is that a prediction of that class is actually a false positive. Let’s visualise this by simulating some data at different accuracy levels:
Accuracy | Rare class frequency | Rare assets | TP | FP | FN | TN |
---|---|---|---|---|---|---|
1 | 0.01 | 300 | 300 | 0 | 0 | 29700 |
0.95 | 0.01 | 300 | 285 | 1485 | 15 | 28215 |
0.90 | 0.01 | 300 | 270 | 2970 | 30 | 26730 |
0.85 | 0.01 | 300 | 255 | 4455 | 45 | 25245 |
0.80 | 0.01 | 300 | 240 | 5940 | 60 | 23760 |
True Positives (TP) = number correctly classified as belonging to the rare class
False Positives (FP) = number incorrectly classed as belonging to the rare class
False Negatives (FN) = number incorrectly classed as not rare
True Negatives (TN) = number of non-rare assets correctly classed as non-rare
Accuracy = the fraction of each class classed correctly; for the rare class this is TP / (TP + FN), and applying the same rate to the non-rare class gives TN and FP
In the example above, we have 30,000 assets in our inventory and each asset could have one of 2 types of components installed on it. Let’s imagine that 1% of these assets are fitted with a rare component; that means we have 300 “rare” assets.
A model with 95% accuracy will correctly flag 285 of the 300 and incorrectly flag 1,485 non-rare assets as rare. From a statistical point of view, missing 15 assets out of 30,000 does not sound like an issue, but in reality it can have implications for how safety and risk are managed by the operator.
As part of our asset inventory service, we manually review detections after the fact to ensure a clean record. It is much easier and quicker to approve/reject records once they’ve already been labelled by the AI. Continuing with our example, we would have to review 1,770 (285 + 1,485) flagged assets. For the effort of manually reviewing 1,770 records instead of 30,000, we recover 95% of the rare class.
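The whole worked example fits in a few lines. This sketch (the function and variable names are ours) reproduces the 0.95 row of the table above, along with the 1,770-asset review queue and the 95% of the rare class it recovers:

```python
def rare_class_confusion(total_assets=30_000, rare_frequency=0.01, accuracy=0.95):
    """Reproduce one row of the table above.

    `accuracy` is applied per class: the model labels that fraction of rare
    assets correctly and the same fraction of non-rare assets correctly.
    """
    rare = round(total_assets * rare_frequency)      # 300 rare assets
    non_rare = total_assets - rare                   # 29,700 non-rare assets

    tp = round(rare * accuracy)                      # rare assets flagged as rare
    fn = rare - tp                                   # rare assets missed
    tn = round(non_rare * accuracy)                  # non-rare assets left alone
    fp = non_rare - tn                               # non-rare assets wrongly flagged
    return tp, fp, fn, tn

tp, fp, fn, tn = rare_class_confusion(accuracy=0.95)
print(tp, fp, fn, tn)        # 285 1485 15 28215, matching the 0.95 row
print(tp + fp)               # 1770 flagged assets to review instead of 30,000
print(tp / (tp + fn))        # 0.95 of the rare class recovered
```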
The key takeaways are:
- for rare assets, the majority of the model’s detections will be false positives
- some rare assets will be missed, but we can identify most of them by reviewing only a fraction of the records a fully manual process would require
- humans are not 100% accurate either, particularly when reviewing hundreds of thousands or millions of images.
Asset inventory is never as simple as it sounds, but with the right computer vision pipelines we’re able to correct large datasets in a fraction of the time and effort of a fully manual process.
Interested in finding out what we can do for your asset inventory? Get in touch!