When U.S. defense forces rely on insights from machine learning (ML) models to determine the next best path forward for their missions, commanders require absolute confidence in their decision-making.
The U.S. Department of Defense (DoD) relies on accurate intelligence about the adversary’s assets (movements, personnel numbers, vehicles, and more) to determine near- and long-term strategies. Building a robust database of information involves deploying a variety of technologies in parallel, but in target identification efforts, many obstacles stand in the way of ensuring that this data is trustworthy and actionable.
Military targeting decisions are often made in real time, in just seconds, so the accuracy of ML models deployed in these applications is vital to reducing commanders’ risk. Too often, however, issues arise that erode trust in the ML models deployed in target identification scenarios.
How ML models can misidentify targets
Object detection models are the types of ML models most commonly used for target identification. Predictions in object detection have two parts: a box that identifies where the object is located in the frame or image and a label that says what object is in that box. Therefore, the most obvious way an ML model can misidentify a target is to locate the object but assign an incorrect label.
Another common failure is a localization error, where the label is correct but the box does not sufficiently cover the target.
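The two failure modes described above can be told apart with a simple overlap check. The following is a minimal sketch, not a standard evaluation pipeline: the `Detection` type, the function names, and the 0.5 intersection-over-union (IoU) threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one prediction or one ground-truth object."""
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def classify_prediction(pred, truth, iou_threshold=0.5):
    """Name the failure mode of one prediction against one ground-truth object.

    The 0.5 threshold is an assumed default, not a requirement.
    """
    well_localized = iou(pred.box, truth.box) >= iou_threshold
    label_ok = pred.label == truth.label
    if label_ok and well_localized:
        return "correct"
    if well_localized:
        return "label_error"          # right place, wrong class
    if label_ok:
        return "localization_error"   # right class, box misses the target
    return "label_and_localization_error"


truth = Detection("humvee", (0, 0, 10, 10))
print(classify_prediction(Detection("bus", (0, 0, 10, 10)), truth))
print(classify_prediction(Detection("humvee", (8, 8, 18, 18)), truth))
```

The first call is a label error (the box matches but the class is wrong), and the second is a localization error (the class is right but the box barely overlaps the target).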
It is also important for MLOps teams to consider why an ML model would misidentify a target. Generally speaking, computer vision models can be thought of as a composition of two sub-components: a feature extractor and a task-specific component. The feature extraction component learns how to represent different aspects of the image, such as textures, colors, and shapes.
These image features are then used by the second component to perform the task, such as identifying key targets in an aerial image. So, when a model misidentifies a target, one likely reason is that something about the extracted features of the image causes the model to mistake an incorrect target for the correct one. The source of that distortion could be visual, environmental, or adversarial in nature.
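To make the two-component picture concrete, here is a deliberately tiny sketch. It is not how production vision models work: the hand-written features (brightness and edge strength), the nearest-centroid "task head," and the centroid values are all hypothetical stand-ins for learned components. It only illustrates how a change in the image shifts the extracted features and, with them, the prediction.

```python
import numpy as np


def extract_features(image):
    """Toy feature extractor for a grayscale image with values in [0, 1]."""
    brightness = float(image.mean())
    # Mean absolute difference between horizontally adjacent pixels,
    # used here as a crude measure of edges and texture.
    edge_strength = float(np.abs(np.diff(image, axis=1)).mean())
    return np.array([brightness, edge_strength])


def classify(features, centroids):
    """Toy task head: pick the class whose centroid is closest in feature space."""
    return min(centroids, key=lambda name: np.linalg.norm(features - centroids[name]))


# Hypothetical class centroids in (brightness, edge_strength) space.
centroids = {
    "vehicle": np.array([0.5, 0.5]),
    "background": np.array([0.5, 0.0]),
}

# A high-contrast striped patch stands in for a sharply imaged target.
sharp = np.tile([0.0, 1.0], (8, 4))
# A low-contrast version of the same patch stands in for haze or fog.
hazy = 0.2 * sharp + 0.4

print(classify(extract_features(sharp), centroids))
print(classify(extract_features(hazy), centroids))
```

In this toy setup the sharp patch is classified as "vehicle" while the hazy version of the same patch falls closer to the "background" centroid, because the loss of contrast weakens the edge feature.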
Successful ML models need to expect the unexpected
Data scientists often train their ML models on data captured under ideal conditions. A model that has never been exposed to a desert sandstorm, for example, is poorly prepared to handle one. The same is true of fog, rain, lens flare, and other visual obstructions that can hinder a model’s performance.
The trained model might recognize an ideal, sharp picture of a Humvee, but will it recognize a blurred one, or a target hidden by cloud cover, in a split second?
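One common way to probe this question is to apply synthetic corruptions to clean test images and see whether predictions hold up. The sketch below uses NumPy only; the function names, the simple box blur, the uniform fog model, and the default parameters are illustrative assumptions, and real weather effects are considerably more complex.

```python
import numpy as np


def box_blur(image, radius=1):
    """Average each pixel with its neighbors in a (2*radius+1) square window."""
    padded = np.pad(image, radius, mode="edge")
    size = 2 * radius + 1
    height, width = image.shape
    out = np.zeros((height, width), dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (size * size)


def add_fog(image, density=0.5):
    """Blend the image toward uniform white; density=1.0 hides everything."""
    return (1.0 - density) * image + density * 1.0


def add_gaussian_noise(image, sigma=0.1, seed=0):
    """Add zero-mean Gaussian noise and clip back to the valid [0, 1] range."""
    rng = np.random.default_rng(seed)
    return np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)


# A single bright pixel on a dark background stands in for a small target.
image = np.zeros((5, 5))
image[2, 2] = 1.0

blurred = box_blur(image)
foggy = add_fog(image, density=0.5)
noisy = add_gaussian_noise(image, sigma=0.1)

# Blur spreads the target's energy; fog compresses the contrast range.
print(blurred[2, 2], foggy.max() - foggy.min())
```

Feeding `blurred`, `foggy`, and `noisy` to a model alongside the clean `image` gives a first indication of how sensitive its predictions are to each kind of degradation.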
A lot rides on this answer.
All of the above disruptions can occur suddenly and with little warning. If one coincides with significant activity the model is meant to observe, such as troop movement, missile deployment, or other changes, that activity can go undetected or lead to target misidentification. All it takes is one split second for an AI-supported mission to fail catastrophically.
These missions are scenarios where even a 1% chance of failure may present an unnecessary risk to decision-makers and warfighters. After all, even if the correct target is identified 99 out of 100 times, that one failure might be the difference between lives lost and lives saved.
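A short calculation shows why a 1% failure rate adds up quickly. Under the simplifying assumption that each identification fails independently with the same probability (real failures are often correlated, for example during a single sandstorm), the chance of at least one failure across n decisions is 1 - (1 - p)^n. The function name below is illustrative.

```python
def prob_at_least_one_failure(per_decision_failure_rate, num_decisions):
    """Chance of one or more failures, assuming independent decisions."""
    return 1.0 - (1.0 - per_decision_failure_rate) ** num_decisions


for n in (1, 10, 100):
    print(n, round(prob_at_least_one_failure(0.01, n), 3))
```

With a 1% per-decision failure rate, the chance of at least one failure is about 9.6% over 10 decisions and about 63% over 100, under the independence assumption stated above.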
How VESPR Validate drives confidence from the battlefield to the boardroom
Commanders who rely on unvalidated models are playing a dangerous game of guesswork. Misidentifying an innocent vehicle such as a bus as an enemy target can escalate into civilian fatalities and call into question the validity of gathered intelligence about the adversary’s military capabilities. This would mark a significant blow to warfighter trust in AI and ML.
CalypsoAI’s independent testing and validation solution, VESPR Validate, provides stakeholders with the tools to measure their ML models’ ability to effectively identify targets in these high-stakes situations. VESPR Validate does this through performance testing, which compares the model’s predictions against what should have been predicted according to the labeled data provided.
Models are tested against a variety of corruptions and perturbations, including weather (such as rain, fog, and frost); visual corruptions (such as blur, contrast, and lens flare); digital corruptions (image compression, Gaussian noise, and more); and adversarial attacks.
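The general idea of this kind of testing, comparing predictions with labels under each corruption, can be sketched in a few lines. This is a generic illustration and not VESPR Validate's API or method: every name below, the toy brightness-threshold model, and the two corruptions are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the provided labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def robustness_report(model, images, labels, corruptions):
    """Accuracy on clean inputs and under each named corruption."""
    report = {"clean": accuracy([model(img) for img in images], labels)}
    for name, corrupt in corruptions.items():
        report[name] = accuracy([model(corrupt(img)) for img in images], labels)
    return report


def toy_model(image):
    """Hypothetical model: calls dark patches 'vehicle', bright ones 'background'."""
    return "vehicle" if sum(image) / len(image) < 0.5 else "background"


def heavy_fog(image):
    """Blend every pixel strongly toward white."""
    return [0.4 * px + 0.6 for px in image]


def slight_darkening(image):
    """Reduce every pixel's brightness by 10%."""
    return [0.9 * px for px in image]


# Images are flat lists of pixel intensities in [0, 1] to keep the sketch small.
images = [[0.1, 0.2, 0.1, 0.2], [0.8, 0.9, 0.8, 0.9]]
labels = ["vehicle", "background"]

report = robustness_report(
    toy_model,
    images,
    labels,
    {"heavy_fog": heavy_fog, "slight_darkening": slight_darkening},
)
print(report)
```

In this toy example the model is fully accurate on clean and slightly darkened inputs but drops to 50% under heavy fog, which is the kind of gap a per-corruption report is meant to surface before deployment.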
For the use cases presented in this article, VESPR Validate can give decision-makers visibility into questions such as: Can an ML model detect sand-obscured Humvees? Can it detect school buses? Can it distinguish one kind of military asset from another when the images are blurred or lower in quality?
Knowing what the model is capable of—and, equally important, not capable of—before deployment can help DoD teams make more informed decisions and mitigate battlefield risk. These missions are too high-stakes to deploy models with any uncertainty about whether they will perform.