On June 22, 2022, the U.S. Department of Defense released its official “Responsible Artificial Intelligence Strategy and Implementation Pathway,” a foundational document that charts the Department’s course toward Responsible AI (RAI) and lays out how it will embrace AI while maintaining the U.S.’s position as a global leader in data and technology.
Supporting AI development is key to maintaining a competitive advantage in the global AI race. As the report states, “Our adversaries and competitors are investing heavily in AI and AI-enabled capabilities in ways that threaten global security, peace, and stability. To maintain our military advantage in a digitally competitive world, the [DoD] must embrace AI technologies to keep pace with these evolving threats.”
Ultimately, the DoD articulates that the desired end state for RAI is “trust.” At CalypsoAI, trust in AI has been our guiding principle, and the Department’s pathway makes clear that testing solutions for AI systems are a critical element in building that trust. Tools for testing, evaluation, verification, and validation (TEVV) can expedite deployments, provide reassurance of AI security, assure stakeholders that their systems will perform as intended, and more, all building toward the Department-wide goal of RAI.
How TEVV Supports the DoD’s Tenets of AI Trust
The DoD has identified six “foundational tenets” of RAI. These are the elements by which trust in AI is achieved. They are listed as: RAI Governance, Warfighter Trust, Product and Acquisition Lifecycle, Requirements Validation, Responsible AI Ecosystem, and AI Workforce.
In particular, TEVV is highlighted as the most significant driving force behind warfighter trust and a critical piece of the product and acquisition lifecycle.
Warfighter Trust

The DoD’s goal:
“Achieve a standard level of technological familiarity and proficiency for system operators to achieve justified confidence in AI capabilities and AI-enabled systems. Trustworthiness is bolstered by the application of TEVV frameworks that allow for the monitoring of system performance, reliability, unintended behavior, and failure modes before fielding the system and during operation. The combination of these factors contributes to a greater understanding of an AI’s capabilities and limitations, which will be critical for the development of an AI-ready force.”
Independent testing can provide performance visibility even to non-experts in AI, driving widespread confidence in the mission and in the effectiveness of ML models deployed in critical environments. Under this tenet, AI is deployed to production only when senior decision-makers, operators, and analysts are confident that it works and will remain effective regardless of mission or environment.
Independent TEVV can also provide a significant advantage for the U.S. over its adversaries. The unreliability of AI systems has already been cited as a reason why other nations, such as Russia, are not deploying them in current war-fighting scenarios. If the U.S. is able to build the next generation of AI in such a fashion that it garners the trust of all stakeholders—from the battlefield to the boardroom—that will provide the DoD with a significant head start in the AI race.
Product and Acquisition Lifecycle
The DoD’s goal:
“Exercise appropriate care in the AI product and acquisition lifecycle to ensure potential AI risks are considered from the outset of an AI project, and efforts are taken to mitigate or ameliorate such risks and reduce the likelihood of unintended consequences while enabling AI development at the pace the Department needs to meet the National Defense Strategy.”
Commanders’ risk is a critical factor in any decision-making based on the outputs of AI systems. There must be complete visibility throughout the process into how these AI solutions work, what risks they will face once deployed, and how resilient they are to unintended or unexpected conditions. Only when all of these possibilities are considered, and, critically, tested against, can stakeholders be confident that these solutions are ready for the battlefield.
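Testing resilience against unexpected conditions can be as simple as measuring how often a model’s output stays stable when its inputs are perturbed. The sketch below is purely illustrative, not any DoD or CalypsoAI tool: the toy model, noise scale, and function names are all assumptions for demonstration.

```python
import random

def toy_model(x):
    """Stand-in classifier: positive inputs map to 1, otherwise 0."""
    return 1 if x > 0 else 0

def stability_under_noise(model, inputs, noise=0.1, trials=100, seed=0):
    """Fraction of predictions that remain unchanged when each input
    is perturbed by uniform noise in [-noise, +noise]."""
    rng = random.Random(seed)
    stable = 0
    total = 0
    for x in inputs:
        baseline = model(x)
        for _ in range(trials):
            if model(x + rng.uniform(-noise, noise)) == baseline:
                stable += 1
            total += 1
    return stable / total

# Inputs far from the decision boundary stay stable; inputs near it flip.
print(stability_under_noise(toy_model, [0.5, -0.5, 0.05]))
```

A real TEVV workflow would use domain-appropriate perturbations (sensor noise, adversarial examples, distribution shifts), but the principle is the same: quantify resilience before fielding the system.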
Requirements Validation

The DoD’s goal:
“Use the requirements validation process to ensure that capabilities that leverage AI are aligned with operational needs while assessing relevant AI risks. System performance requirements validation increases the reliability and safety of systems prior to and during deployment.”
Potential risks, vulnerabilities, and operational needs must be considered at every stage of the ML lifecycle. Independent TEVV acts as a facilitator in this process, constantly stress testing models and identifying vulnerabilities and risks that may need to be addressed through additional training. It also provides an automated tool to further test models following deployment, ensuring robustness as circumstances change or the system faces model drift.
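One concrete form of post-deployment testing is a drift check: comparing a model’s accuracy on recent operational data against its validated baseline and flagging degradation. This is a minimal sketch of that idea; the function name, thresholds, and numbers are illustrative assumptions, not part of any DoD or CalypsoAI specification.

```python
def drift_alert(baseline_accuracy, recent_correct, recent_total, tolerance=0.05):
    """Return True if accuracy on recent data falls more than `tolerance`
    below the validated baseline, signaling possible model drift."""
    if recent_total == 0:
        return False  # no recent evidence yet; nothing to flag
    recent_accuracy = recent_correct / recent_total
    return (baseline_accuracy - recent_accuracy) > tolerance

# Example: a model validated at 92% accuracy now scores 83/100 in the field.
print(drift_alert(0.92, 83, 100))  # True: performance has drifted
```

In practice such a check would run continuously against labeled operational samples, triggering retraining or review when an alert fires.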
Defense leaders must remember that trust in AI one day can give way to uncertainty the next. As the strategy states, “RAI should not be viewed as a static end state where the use of an AI capability is designated as ‘responsible’ and never revisited. Instead, RAI centers around continuous oversight, moving beyond traditional performance metrics to include the aspects of workforce, culture, organization, and governance that affect how AI is implemented throughout the product lifecycle.”
What constitutes RAI will continue to change and evolve, as will the obstacles and challenges that AI systems face. The DoD must invest in the right tools to drive AI innovation and provide confidence in deployment, and TEVV solutions are a far-reaching piece of the RAI puzzle. Embracing a forward-thinking AI policy is an important step, but it is only one step. To truly win the global AI race, the DoD will need to pair this policy with top-of-the-line technology.
Read more about how CalypsoAI is creating independent TEVV solutions to build trust.