At the start of 2020 we didn’t know whether it would be possible to collect images of the required quality for an AI-based approach to work. What we’ve found is that, yes, it can. We built a system to collect high-resolution images, labelled them, and trained a model to identify invasive plants in survey images. The model is serviceable, though not yet great, mainly because we weren’t able to collect as much training data as we probably need, and because we need to spend more time preprocessing data and applying other models alongside a neural network. I now feel confident that we have a clear path for developing and deploying this technology in practice.
This post goes through what we did during the project. The results matter because collecting images from a moving car and using them to identify invasive plant species would be a fast way to survey highways for their presence. It would also be less disruptive and safer than manual roadside surveys.
First Some Background
Invasive plant species are a threat to native wildlife and require close management. Keen-AI and UKCEH have developed a platform for collecting footage along roadsides, and have created an AI model for identifying invasive species in these images. Throughout 2020 the team were out collecting images from a car and on foot. This data was then used to train deep neural networks able to identify invasive plants from images.
We used data from the North Wales Trunk Road Agency (NWTRA) to target our collection of images. Initially we focussed on Japanese Knotweed, Himalayan Balsam and Rhododendron, which are all invasive plant species that have to be managed along roadsides.
We developed a hardware and software system able to collect high-resolution images of the roadside from its mounted position on a car roof. However, despite having data on the locations of stands of invasive plants, we found it difficult to collect large volumes of images of our target species. Examples of Japanese knotweed were especially hard to find, with Himalayan balsam being more common. We adapted to this constraint by supplementing our training data with images taken on a handheld camera, and by focussing the early phase of the project on Himalayan balsam. Handheld pictures were used to train the model, with images collected from the car-mounted system held back to evaluate its performance.
We used a Canon 5D Mark IV for the car-mounted survey and handheld images; some handheld images were also taken using a Nikon D5600.
During the project, which ran from May 2020 to February 2021, we collected 89,489 images: 73,836 from a car and 5,353 handheld. We also took opportunities to collect images using the same system onboard a train (10,028 images), and added images from the iRecord database (a total of 272) to help model training. Surveys covered a total of 576 miles of road and 150 miles of railway.
Once collected, we uploaded images to the web-based KAI Remote Sensing Platform. The KAI platform stores raw images, supports image labelling, and extracts survey metadata from images and logs (e.g. GPS location and time). Ecologists from UKCEH labelled survey images by drawing bounding boxes around regions of interest. To date 6,990 samples have been labelled.
Data Pipeline and Model Training
To ease labelling effort and reduce training time we chose to train an image classifier for Himalayan balsam, using Python, TensorFlow and Keras. The model was trained on handheld images. To maintain resolution, rather than compressing the large images generated by the DSLRs, we split them into patches that could be fed to the model. The model’s aim was to predict whether or not a given patch contained Himalayan balsam.
Two surveys, Peterbrook and Kingsbury, contained regions where Himalayan balsam was seen. Images from these surveys were marked as containing HB or not. Each image was split into 16 patches, and each patch was then labelled as containing HB or not. These patches were passed to the classifier, which returned a probability score for whether the patch contained HB.
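As a rough sketch, splitting a large frame into a 4 × 4 grid (16 patches per image, matching the setup above) can be done with NumPy. The helper name, grid size and edge-cropping behaviour here are our own illustration, not the project’s actual code:

```python
import numpy as np

def split_into_patches(image: np.ndarray, rows: int = 4, cols: int = 4):
    """Split an H x W x C image into a rows x cols grid of patches.

    Edges are cropped so the image divides evenly; the real pipeline
    may instead pad or overlap patches (assumption).
    """
    h = image.shape[0] // rows * rows
    w = image.shape[1] // cols * cols
    image = image[:h, :w]
    ph, pw = h // rows, w // cols
    return [
        image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
        for r in range(rows)
        for c in range(cols)
    ]

# Example: a blank frame at the 5D Mark IV's 6720 x 4480 resolution
# becomes 16 patches of 1120 x 1680 pixels each.
frame = np.zeros((4480, 6720, 3), dtype=np.uint8)
patches = split_into_patches(frame)
```

Each patch keeps the full native resolution of its region, which is the point of patching rather than downscaling.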
Initial Results: Peterbrook Survey
The Peterbrook survey contained 286 images, a mixture of HB and not HB, which were split into 4,576 patches. The chart below illustrates the patch-level ROC curve for various threshold values of P. It is possible to identify 80% of the patches containing HB at a roughly 30% false positive rate.
We want to determine whether a whole survey image contains HB; denote this probability IMG_P. Assume an image is split into n patches and let p(i) denote the model’s probability that patch i contains HB. Making the simplifying assumption that patches are independent, then:
IMG_P = P(Img contains HB) = 1 − (1 − p(1)) × (1 − p(2)) × … × (1 − p(n))
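A minimal sketch of this combination rule in Python (the function name is ours):

```python
def image_probability(patch_probs):
    """P(image contains HB) = 1 - prod(1 - p_i), assuming patch independence."""
    prob_all_negative = 1.0
    for p in patch_probs:
        prob_all_negative *= (1.0 - p)
    return 1.0 - prob_all_negative

# One confident patch dominates the image score:
# image_probability([0.9, 0.1, 0.0]) = 1 - (0.1 * 0.9 * 1.0) = 0.91
```

Note one consequence of the independence assumption: many weakly positive patches can push IMG_P very high, which is relevant to the false-positive behaviour discussed later.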
On the image-level ROC curve, the index of the corresponding IMG_P threshold value is shown. We can get an 80% true positive rate at the expense of a 25–30% false positive rate. The code below selects the threshold at index 52 and then generates a confusion matrix and an image-level bamboo plot. There are 3 groups, or stands, of HB along the route. All of them are identified even though not all images are correctly classified.
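The project’s own code isn’t reproduced here; a hedged sketch of the thresholding and confusion-matrix step, using made-up probabilities and labels, might look like:

```python
import numpy as np

# Hypothetical image-level probabilities and ground-truth labels (1 = contains HB).
img_probs = np.array([0.99, 0.97, 0.40, 0.96, 0.10, 0.98])
labels    = np.array([1,    1,    1,    0,    0,    1])

threshold = 0.95  # e.g. the IMG_P value at the chosen ROC index
preds = (img_probs >= threshold).astype(int)

# 2 x 2 confusion matrix entries: actual vs predicted.
tp = int(np.sum((preds == 1) & (labels == 1)))  # HB correctly detected
fp = int(np.sum((preds == 1) & (labels == 0)))  # false alarm
fn = int(np.sum((preds == 0) & (labels == 1)))  # missed HB
tn = int(np.sum((preds == 0) & (labels == 0)))  # correctly rejected
```

With these toy numbers the matrix comes out as 3 true positives, 1 false positive, 1 false negative and 1 true negative.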
At the cutoff selected, 80% of positive images would be correctly classified, while 25% of negatives would be classified as positive.
Validation of Results: Kingsbury Survey
The analysis appears to show the model working, but it is compromised: we’re fixing the cutoff to maximise accuracy for this particular survey, and in reality the cutoff can’t be known ahead of time. To evaluate whether the approach generalises, we can take the cutoff chosen for Peterbrook, apply it to the Kingsbury survey, and look at the results. In this case 84% of images containing HB are correctly classified, whereas 52% of negative images are incorrectly classed as positive. This is a useful result for the following reason: the vast majority of images collected when seeking invasive non-native species (INNS) will be negative, so using this model we can exclude 48% of them before presenting them to an ecologist. This saves time.
How can we improve the results?
The cutoff value is 95%, so any image with an IMG_P value of less than 95% is classed as negative. The model is assigning a very high probability to many negatives, which suggests the patch classifier can be improved. The results could be improved by:
Larger more representative training set
The model was trained on images taken with a handheld camera. These are typically sharper and taken from closer up than a survey image, and a greater proportion of the image contains Himalayan balsam. Very few handheld images were negatives; these were sourced from unused surveys.
Segmentation prior to classification
Survey images often contain regions of road, buildings, vegetation and parts of the survey car. Segmenting the image and only classifying regions containing vegetation should improve model accuracy.
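As an illustrative sketch only (not the project’s pipeline, which would more likely use a trained segmentation model), even a crude colour-based filter could discard patches with little vegetation before classification. The function names and the 20% threshold are invented:

```python
import numpy as np

def vegetation_fraction(patch: np.ndarray) -> float:
    """Fraction of pixels where green dominates red and blue.

    A very rough proxy for vegetation; a real system would use
    proper segmentation (assumption).
    """
    r = patch[..., 0].astype(int)
    g = patch[..., 1].astype(int)
    b = patch[..., 2].astype(int)
    green_dominant = (g > r) & (g > b)
    return float(green_dominant.mean())

def keep_patch(patch: np.ndarray, min_fraction: float = 0.2) -> bool:
    # Only pass patches that look like they contain vegetation to the classifier.
    return vegetation_fraction(patch) >= min_fraction

# A uniform grey patch (road, sky, car bodywork) has no green-dominant
# pixels and would be filtered out before classification.
grey_patch = np.full((8, 8, 3), 128, dtype=np.uint8)
green_patch = np.zeros((8, 8, 3), dtype=np.uint8)
green_patch[..., 1] = 200
```

Filtering out road, building and car regions this way reduces the number of patches the classifier can raise false positives on.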
Habitat classification prior to plant detection
Species prefer particular ecological habitats. Classifying both the habitat and the plant, and feeding both predictions to a higher-level model, may lead to improved accuracy overall.
Use a Recurrent Neural Network Architecture
Patches within an image are not independent, and neither are images from a survey sequence. RNNs maintain and update an internal state that is used to process the next sample. This internal state can learn the conditional relationships between patches from the same image, and between images presented in order from a survey.
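As a toy illustration of state being carried across a patch sequence (all shapes and weights invented, nothing here is the project’s code), a single Elman-style RNN step in plain NumPy looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: each patch summarised as an 8-dim feature vector,
# with a 4-dim hidden state carried between patches.
feat_dim, hidden_dim = 8, 4
W_x = rng.normal(scale=0.1, size=(hidden_dim, feat_dim))    # input -> hidden
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden
b = np.zeros(hidden_dim)

def rnn_step(h, x):
    """One Elman RNN step: the new state depends on the previous state
    and the current input, which is how patch-to-patch context is kept."""
    return np.tanh(W_x @ x + W_h @ h + b)

# Process a sequence of 16 patch feature vectors (one image),
# carrying the hidden state forward through the sequence.
patch_features = rng.normal(size=(16, feat_dim))
h = np.zeros(hidden_dim)
for x in patch_features:
    h = rnn_step(h, x)
# h now summarises the whole patch sequence; a classifier head
# would read the HB prediction from it.
```

The same mechanism extends to sequences of images along a survey route, so that a stand of HB seen in one frame can inform the prediction for the next.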