Andrew Spage | Director of Solutions Architecture at CalypsoAI
Have you ever wondered what it’s like to be the official who completely blew a call? Until a few years ago I was probably like many sports fans who got upset at bad or missed penalties. Then I signed up to be a youth football official. I loved football as a kid, and I wanted to give back to the sport that gave me so much.
Training for officials started in June and continued through the summer in preparation for the fall, and it comprised two major components: classroom and on-field learning. Knowing the rules is a big part, but that portion is really just memorizing the rule book. Some of the best and most revealing learning took place when I was given circumstances or scenarios and asked what to do. On-field training, or “mechanics,” is best described as reflex conditioning: knowing where to go and developing a new way of observing the game. The simplest example I can give is that officials don’t watch the ball sail through the air on a kickoff; there are no rules to be broken up there. Officiating means watching the players on the field and their interactions.
After several months of practice and training, it was time to take the field. Nothing happened the way I had learned, studied, imagined, or anticipated. I was on Mars, in a clown suit, with hundreds of angry locals upset about the rocket I had landed on their kids’ school. Everything was unfamiliar. At the end of the game, I asked one of the veterans when it gets easier. His answer: there will never be a game where you don’t see something for the first time.
Now, the question I asked at the outset was intended as a metaphor. There are parallels between my experience as an official and what we see in AI/ML development and deployment. Obviously, I am not an algorithm, but my training experience was similar to how an algorithm might be developed: I was given a set of rules governing play and a limited number of circumstances in which to observe and apply them. The strangest part was changing focus. For an official, it’s not about a breakout run; it’s about getting to the right spot, then observing and interpreting how players act on one another relative to the rules. After my first few games, it was obvious to me that my training fell short of fully equipping me to do my job. The only way to become competent as an official is to officiate. If you have kids who are learning to drive, you know what I mean: at some point, you’re going to hand them the keys and live with the outcome. I am not suggesting we deploy AI and ML algorithms into production environments and hope for the best. Quite the opposite. I am suggesting that we evaluate their performance, and more specifically, that we evaluate performance relative to the function and mission they’re supporting.
After a few games, the chaos became a little more manageable. Then one Saturday morning I went to a local park to officiate a set of games ranging from young grade-schoolers to junior high kids. As I walked to the fields from the parking lot, I was greeted by a nice man who introduced himself as the league commissioner. Then he said something no one wants to hear: “We have police protection for you; there was an incident with the officials and parents last week.” Like referees, AI and automation exist to observe, interpret, and decide. Some AIs, such as a computer vision algorithm, might only observe. Others might observe and interpret, such as a facial recognition algorithm. Still others may turn observation and interpretation into action, like an autonomous vehicle. Continuously evaluating AI algorithms and their fitness for purpose has to be our goal. Certainty about how a model will perform in every circumstance is a function of potentially thousands of features and relationships, known and unknown, identified in training. I joined CalypsoAI because model insight is essential not only to deployment; it’s the only way to realize the full potential of AI/ML-based solutions. As a Side Judge, I could look back on my performance and know what I needed to do to be better next time. Similarly, data scientists and mission owners can partner to identify collection needs and modeling approaches that best support their objectives, but the only way to know how a model will perform is testing, evaluation, verification, and validation. In all human and machine-based decision-making, hindsight is 20/20.
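To make that concrete, here is a minimal sketch of what mission-relative evaluation can look like in code. Everything in it, the synthetic data, the metric choices, and the thresholds, is an assumption for illustration, not a description of CalypsoAI’s product.

```python
# A minimal sketch of evaluating a model against mission-defined thresholds
# rather than a single aggregate score. All data, metrics, and thresholds
# below are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Stand-in for real mission data.
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
preds = model.predict(X_test)

observed = {
    "recall": recall_score(y_test, preds),        # misses are costly
    "precision": precision_score(y_test, preds),  # false alarms are costly
}

# Hypothetical mission requirements. A different mission would set
# different bars for the same model.
requirements = {"recall": 0.90, "precision": 0.85}

for metric, threshold in requirements.items():
    status = "PASS" if observed[metric] >= threshold else "FAIL"
    print(f"{metric}: {observed[metric]:.3f} (required {threshold:.2f}) {status}")
```

The point is not the specific numbers; it’s that “fit for purpose” is a comparison against requirements the mission owner sets, and a model that passes for one mission can fail for another.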
Midway through my second year, I was working a game of fifth and sixth graders. In the third quarter, a running back took a hand-off and ran off tackle. He got the corner, and before he made it five yards, a cornerback hit him pretty hard. I saw the ball come out, and I saw the defender land on it. I ruled it a fumble and a turnover. When the pile of bodies untangled, the running back had the ball. He looked at me in disbelief, with tears in his eyes, and ran off the field. I’ve never stopped wondering if I got that call right. Seeing isn’t always believing, and believing is not always knowing. Decision-making is almost always predicated on prior knowledge and the circumstances of the decision. For example, automatically identifying a plane in overhead imagery “accurately” is a relative statement, and asserting what constitutes accuracy apart from the decisions the answer informs is inane. That is to say, certainty can be approached but likely not achieved. Adversaries obscure their movements to prevent detection. The inability to decide is not an option; robust AI and augmented decision-making tools must perform even when information is imperfect. That’s a pretty high bar.
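A toy example of that relativity: the right operating point for an imperfect detector falls out of the cost of each kind of error, not out of a universal notion of accuracy. The probabilities, costs, and mission names below are invented for illustration.

```python
# Toy illustration: the "correct" decision threshold for an imperfect
# detector depends on what each error costs, not on accuracy alone.
# All probabilities, costs, and mission labels are invented.

def decision_threshold(cost_false_alarm: float, cost_missed_detection: float) -> float:
    """Expected-cost-minimizing threshold on P(target present).
    Act (e.g., flag the image) when the probability exceeds this."""
    return cost_false_alarm / (cost_false_alarm + cost_missed_detection)

# The detector's estimated probability that a plane is present.
p_plane = 0.35

for fa_cost, miss_cost, mission in [
    (1.0, 1.0, "routine cataloging"),            # both errors cost the same
    (1.0, 20.0, "warning of hostile movement"),  # a miss is far worse
]:
    t = decision_threshold(fa_cost, miss_cost)
    call = "flag it" if p_plane > t else "pass"
    print(f"{mission}: threshold={t:.2f} -> {call}")
```

The same detector output produces a different, and defensible, call under each set of stakes, which is why “accurate” only means something relative to the decision being made.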
I became an official to give back, and giving back meant being fair, protecting the safety of the players, and advocating for the game and the players’ love for it. Officials must be trusted; they have to be. The training and experience they gain over time support that assertion of trust. Similarly, AI that isn’t trusted is in fact potentially dangerous. Gaining trust in AI-based systems is a function of how they are created and the conditions they operate in. Testing an algorithm for trust once it is developed is a function of the decisions it will inform. At CalypsoAI we’ve built our portfolio on developing and testing for secure and trusted AI because all decisions have consequences. Like people, AI/ML models will respond differently depending on what they observe. The fact is, there is no perfect football official and there is no perfect AI. Testing model performance along multiple features and gradients is essential. Our goal is comprehensive insight and optimized outcomes for AI-based systems. Anytime a new component is introduced into a large system, the impacts on the larger mission need to be understood. Just as an official faces an unfamiliar set of circumstances and a decision to make, an AI’s decision-making ability should be assessed in terms of the mission it is supporting. There simply isn’t any other way to know if it’s fit for purpose.
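As one hedged sketch of what testing along a gradient can mean in practice: break the evaluation set into bands of an operating condition and score each band separately, so a weakness in one regime isn’t averaged away. The “brightness” condition and the synthetic results here are assumptions for illustration only.

```python
# Sketch of slice-based evaluation: score the model separately across a
# gradient of an assumed operating condition (image brightness) instead
# of reporting one aggregate number. Data is synthetic.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a real evaluation set: per-example condition and correctness.
brightness = rng.uniform(0.0, 1.0, size=1000)            # operating condition
correct = rng.random(1000) < (0.6 + 0.35 * brightness)   # synthetic: worse in the dark

bins = np.linspace(0.0, 1.0, 6)  # five brightness bands
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (brightness >= lo) & (brightness < hi)
    acc = correct[mask].mean()
    print(f"brightness {lo:.1f}-{hi:.1f}: accuracy {acc:.2f} (n={mask.sum()})")
```

An aggregate score here would look respectable while hiding that the model degrades sharply in low light; the per-slice view surfaces exactly the kind of unfamiliar circumstance an official, or a model, will eventually face.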
Trust and understanding are human conditions, limited by human observation and experience. Before the microscope, we never imagined microorganisms, so we never sought remedies for the diseases they cause. There are questions we should be asking about the circumstances an AI might encounter that we haven’t asked yet. We believe securely developing and testing AI are the new microscope: they will lead to insight and rapid adaptation, and they are the straightest path to new discovery.

Andy has over 30 years of experience as an innovator and thought leader in both commercial and government enterprises. His accomplishments include establishing the first IC-sponsored geospatial data sharing platform to partner with industry and support the development of AI/CV algorithms.