This blog post is an excerpt from CalypsoAI’s comments submitted to the National Security Commission on Artificial Intelligence on October 23, 2020.

Effectively managing Artificial Intelligence (AI) and Machine Learning (ML) bias is both a challenge and an opportunity for government and industry, shaped by how organizations interact to procure and develop AI/ML algorithms. AI/ML models are unlike any previously developed software-based capability and are not well served by traditional Federal Acquisition Regulation (FAR) frameworks. FAR acquisition is a linear, requirements-based process. Contracting frameworks such as Fixed Price, Cost Plus, and Time and Materials (T&M) were designed to balance risk, informed by solution complexity, and traditional program management measures were designed to monitor cost, schedule, and performance risk. Whereas FAR and non-FAR acquisition vehicles are designed for the purchase of objects such as planes or software, there is no acquisition process for an algorithm: a snapshot of computational logic that makes objects function in new and better ways.

Consideration of the lifecycle of the algorithms should be built into the acquisition decision process. For example, including AI bias monitoring in a model-based and systems-based engineering process would allow AI performance to drive system requirements, design, analysis, verification, and validation activities, beginning in the conceptual design phase and continuing through development and later lifecycle phases.
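
As a minimal illustration of what such a lifecycle requirement could look like, the sketch below encodes phase-by-phase bias thresholds as an acceptance gate. The phase names, metric, and thresholds are illustrative assumptions, not prescriptions from these comments.

```python
# Minimal sketch of a lifecycle "bias gate" (entirely illustrative):
# each program phase carries a bias-metric requirement, and a model
# cannot advance until its measured disparity meets that requirement.

LIFECYCLE_GATES = {            # phase -> maximum allowed disparity
    "conceptual_design": 0.30,
    "development":       0.20,
    "deployment":        0.10,
    "sustainment":       0.10,  # re-checked after every retraining
}

def may_advance(phase: str, measured_disparity: float) -> bool:
    """True if the model's measured disparity meets this phase's gate."""
    return measured_disparity <= LIFECYCLE_GATES[phase]

print(may_advance("development", 0.15))  # True
print(may_advance("deployment", 0.15))   # False
```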

ML modeling introduces the need to redefine and broaden performance management. For example, bias in ML models is one aspect of performance. Model bias is not inherently predictable or quantifiable in acquisition strategy planning, which makes a contracting strategy determination difficult. During model development, bias is not detectable when performance is defined only in terms of delivered software increments. Further, traditional program management measures can obscure, and even penalize, the discovery and management of conditions detrimental to model effectiveness. Here, the government can work with industry to produce tools that automatically monitor for bias in procured software, or let industry partners serve as independent auditors for software acquisitions.
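
One concrete form such a monitoring tool could take is an automated disparate-impact check. The sketch below applies the widely used four-fifths rule to a model's positive-prediction rates by group; the data, group labels, and 0.8 threshold are illustrative assumptions.

```python
# Sketch of an automated bias monitor (illustrative only). It flags any
# group whose positive-prediction rate falls below 80% of the
# most-favored group's rate (the "four-fifths" rule).
from collections import defaultdict

def selection_rates(predictions, groups):
    """Positive-prediction rate per group (predictions are 0/1)."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_alert(predictions, groups, threshold=0.8):
    """Flag groups whose rate ratio to the best group is below threshold."""
    rates = selection_rates(predictions, groups)
    best = max(rates.values())
    return {g: round(r / best, 2) for g, r in rates.items() if r / best < threshold}

preds  = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(disparate_impact_alert(preds, groups))  # flags group 'B': {'B': 0.33}
```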

“Consideration of the lifecycle of the algorithms should be built into the acquisition decision process.”


Model bias is a specific category of model performance risk and should be addressed as part of a holistic risk management strategy. That strategy is likely to differ from traditional approaches due to the stochastic and dynamic nature of ML. As models are trained (and retrained), oversight intended to curb the unintentional introduction of bias should be pervasive and comprehensive; no single method is likely to detect all potential forms of bias. Industry should be resourced to innovate continuously on bias monitoring, with innovations that can quickly be turned into ready-for-deployment tools. Industry-academic partnerships arranged or funded by the U.S. Government (USG) would greatly reduce the time to market for new bias mitigation techniques. Detecting patterns in vast amounts of data is exactly what ML is designed to do; explicitly testing which patterns a model has identified is a new dimension of performance that goes beyond delivering code. ML models can have tens of neural layers acting on final results; some of the patterns they learn are predictable and valid, while others are unanticipated or, worse, coincidental and not reflective of truth. The latter conditions are detrimental to the overall veracity and validity of ML models and their efficacy in deployment.
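
To make the distinction concrete, one established way to test whether a learned pattern is real or coincidental is a permutation test: refit the model on shuffled labels and ask how often chance alone matches the original score. The sketch below uses scikit-learn's permutation_test_score on synthetic data; the model, data, and settings are illustrative assumptions.

```python
# Sketch: permutation testing to flag patterns that may be coincidental
# rather than real (data and model choices here are illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import permutation_test_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                               # stand-in features
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)   # signal in one feature

score, perm_scores, p_value = permutation_test_score(
    LogisticRegression(max_iter=1000), X, y,
    cv=5, n_permutations=100, random_state=0,
)
# A small p-value (near 1 / n_permutations) means the pattern the model
# found is unlikely to be an accident of this dataset; a large p-value
# suggests the model may be fitting noise.
print(f"accuracy={score:.2f}, p={p_value:.3f}")
```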

“Partnership should reward overall model understanding, predictability, transparency, and trust.”


Given the complexity of ML models and the potential for bias, industry and government need to collaborate continuously on how to address it. That collaboration should take the form and function of independent model Testing, Evaluation, Verification, and Validation (TEV&V). Like human learning, TEV&V is a continuous activity designed to reveal what learning has taken place. Testing algorithms is an evolving capability comprising (but not limited to) bias detection, adversarial poisoning, noise injection, cross-validation, and hyper-parameter tuning. Collectively, these elements quantify bias and form a basis for ongoing industry/government collaboration and validation in support of AI/ML development. In the context of acquisition and program management, TEV&V should be incorporated into acquisition planning at every stage of the AI/ML lifecycle. Left to industry or government alone, TEV&V will not provide the level of rigor required to effectively manage bias; it must be done as a partnership.
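
As one concrete example of a TEV&V element named above, the sketch below performs a simple noise-injection test: perturb held-out inputs with increasing Gaussian noise and watch how accuracy degrades. The data, model, and noise scales are illustrative assumptions; a model whose performance collapses under small perturbations merits further scrutiny.

```python
# Sketch of one TEV&V check: noise injection (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))              # stand-in feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # simple synthetic target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)

# Evaluate the trained model on progressively noisier copies of the test set.
for sigma in (0.0, 0.1, 0.5, 1.0):
    noisy = X_te + rng.normal(scale=sigma, size=X_te.shape)
    print(f"noise sigma={sigma}: accuracy={model.score(noisy, y_te):.2f}")
```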