AI speed reading: Recovering GPS Coordinates using Models Trained with Synthetic Data

18k Videos over 10 Years weren’t Assigned to an Asset

Our customer has been collecting visual data using drones and helicopters for over a decade. Unfortunately, much of it was never mapped to the asset it depicts; the only way to find out was to watch the footage. There is no EXIF data containing GPS coordinates either. So what can we do? How do we automatically assign each survey to the asset it belongs to?

Overlay Showing GPS Coordinates of the Aircraft

Luckily, most videos are captioned with an overlay showing the GPS coordinates of the aircraft at the time of recording. We used a simple convolutional neural network to run Optical Character Recognition (OCR), “reading” each digit, and then assembled the digits into a GPS coordinate. Armed with this information we map each video to the tower closest to the aircraft.
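The final step, snapping a decoded coordinate to the closest tower, is essentially a nearest-neighbour search over the tower list using great-circle distance. A minimal sketch (the tower IDs and coordinates here are illustrative, not the customer’s data):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def nearest_tower(lat, lon, towers):
    """towers: list of (tower_id, lat, lon); returns the id of the closest."""
    return min(towers, key=lambda t: haversine_km(lat, lon, t[1], t[2]))[0]
```

A linear scan is plenty here; with tens of thousands of towers a spatial index would be the next step, but it was not needed for this volume.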

Each GPS coordinate is made up of a latitude and a longitude. In total these are 14 digits. Of these, 2 do not change, leaving 12 to be recognised. We were aiming for 90% accuracy on the entire GPS string, which implies that the accuracy on each individual digit has to be around 99%. No publicly available model reached this level on our data set, so we trained our own model using synthetic data.
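The per-digit requirement follows directly from compounding: if all 12 digits must be right and digit errors are independent, the required per-digit accuracy is the 12th root of the target string accuracy. A quick check:

```python
# Per-digit accuracy required for a target whole-string accuracy,
# assuming digit errors are independent.
target_string_accuracy = 0.90
n_digits = 12

per_digit = target_string_accuracy ** (1 / n_digits)
print(f"required per-digit accuracy: {per_digit:.4f}")  # ≈ 0.9913
```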

Using synthetic data to bootstrap an OCR model

The size and quality of data used to train an OCR model is often more important than the type of model or the choice of architecture. In the beginning, we didn’t have a great deal of time to create data manually by extracting crops and transcribing the coordinates they contained. What we could do was generate synthetic data, use this to train a model and then use the model itself to bootstrap the creation of real data with the correct labels.  

Using the camera maker’s font files, the team produced synthetic data to train the model, creating various ‘fake’ GPS coordinates with differing backgrounds. The model was then trained on this data and its performance evaluated against a test set of real images that had been manually labelled.
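A sketch of this generation step using Pillow. The font path stands in for the camera maker’s .ttf file, and the crop size, offsets, and background range are illustrative assumptions, not the customer’s actual values:

```python
import random
from PIL import Image, ImageDraw, ImageFont

def make_digit_crop(font_path=None, size=(32, 32)):
    """Render one synthetic digit crop on a varying background.

    font_path: path to the camera maker's .ttf file (placeholder);
    falls back to Pillow's built-in font when None.
    Returns the image plus its ground-truth label.
    """
    digit = str(random.randint(0, 9))
    background = random.randint(20, 120)  # vary background brightness
    img = Image.new("L", size, color=background)
    draw = ImageDraw.Draw(img)
    font = (ImageFont.truetype(font_path, 24) if font_path
            else ImageFont.load_default())
    draw.text((8, 6), digit, fill=255, font=font)
    return img, int(digit)
```

Because the label is known at generation time, the training set comes out perfectly annotated for free, which is the whole appeal of the synthetic approach.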

Variations of ‘0’ Produced Using Synthetic Data

Incorporating background knowledge helps improve accuracy

Our code crops out the area of the video containing the 10 longitude and latitude digits and feeds them into the OCR model, which analyses them individually. With 96% accuracy per individual digit, the chance of identifying all 10 digits correctly drops to 66%. Every digit needs to be correctly identified to map the video to the respective tower, so every improvement in per-digit accuracy hugely influences the end-to-end result.

Tom worked on tailoring the model to the customer’s footage format. The real-life data made it apparent that certain longitude and latitude digits are 2-3 times more common than others. He therefore worked on improving the model’s accuracy on these digits, biasing the training towards them based on their appearance frequency, which increased overall accuracy.
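One way to bias training towards the digits that dominate the footage is to sample synthetic digits according to the frequencies observed in the real overlays, so the model sees each digit about as often as it will in production. A sketch with made-up frequencies (the real distribution comes from the customer’s data):

```python
import random

# Digit frequencies observed in the real overlays (illustrative numbers);
# common digits appear 2-3x more often than rare ones.
real_freq = {0: 0.18, 1: 0.14, 2: 0.13, 3: 0.10, 4: 0.09,
             5: 0.09, 6: 0.08, 7: 0.07, 8: 0.06, 9: 0.06}

def sample_digit():
    """Draw a training digit matching the real-world distribution."""
    digits = list(real_freq)
    weights = list(real_freq.values())
    return random.choices(digits, weights=weights, k=1)[0]
```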

Through using synthetic data to train and improve its character recognition, the model’s per-digit accuracy increased to 98%. With the model identifying the full longitude and latitude correctly 82% of the time, the likelihood of a video being mapped to the correct tower increased accordingly.

Identifying and closing the gaps between real and synthetic data

At the risk of stating a truism: synthetic data is not the same as real data. It differs in ways that may not be obvious and/or are difficult to replicate. Models trained on synthetic data with this ‘real to synthetic variance’ perform less well when exposed to real data, so one of the roles of a data scientist is to recognise these gaps and close them during the generation process. Comparing the synthetic data to the real data, we noticed differences in background noise and digit placement. We therefore increased the noise of the synthetic backgrounds and slightly varied the digit placement. Training the model on this data achieved higher accuracy.
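Closing the gap amounted to two augmentations on each synthetic crop: additive background noise and a small random shift of the digit. A NumPy sketch (the jitter and noise magnitudes are illustrative assumptions):

```python
import random
import numpy as np

def close_the_gap(crop, max_jitter=2, noise_sigma=8.0):
    """Make a synthetic digit crop look more like real footage.

    crop: 2-D uint8 array (greyscale digit crop).
    """
    # Jitter the digit placement: shift the image a few pixels in each axis.
    dy = random.randint(-max_jitter, max_jitter)
    dx = random.randint(-max_jitter, max_jitter)
    out = np.roll(crop, shift=(dy, dx), axis=(0, 1))
    # Additive Gaussian noise, clipped back to the valid pixel range.
    noise = np.random.normal(0.0, noise_sigma, out.shape)
    return np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```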

Results and future work

The team trained a model which was able to reliably ‘read’ the GPS coordinates from a video; we then used these coordinates to determine the closest tower and attach the video to that asset. Although not entirely foolproof (near substations, for example, the helicopter can be closer to a different tower than the one it is filming), the approach has worked really well: of the more than 18k videos, we were able to map 14k.

About Sarah Bauroth

Sarah is passionate about sustainability, championing environmental causes to protect our natural world. She is excited by the potential of AI in supporting these goals.
