Sunday, August 03, 2025

Failures and how they impact the quality of Generative AI

 I gave a talk on Generative AI at one of PyData Pittsburgh's monthly events.  While the focus of the presentation was on demonstrating the impact of randomness on Gen AI output, during the discussion we talked a lot about how we teach Gen AI a specific domain, what makes a person an expert, and whether Gen AI can learn those things.  There were a few aspects of being an expert that will take a lot of work to replicate when starting with foundation models, but the one that stuck with me was the role of failure in learning, and how hard it will be to teach this to foundation models.

My friend Polly Mitchell-Guthrie talks about Context, Collaboration, and Conscience when discussing the limitations of foundation models. Context is a well-known, much-discussed, and acknowledged issue in Gen AI that we address through variations on prompting and grounding.  Conscience covers issues of ethics, but also of mission.  Collaboration is harder, though, because Generative AI does not have institutional memory.  In particular, it lacks the memory of failures.

In American culture (which is where I am), there is pressure to be perfect, to make no mistakes and have no failures.  But in a wide range of domains with high standards of performance, there is a maxim that is some variation of "if you have not failed, you did not try hard enough."  Even in these communities, though, we rarely document those failures.  That level of training is done person to person, with mentors, trainers, and leaders who provide cover to try different things and tolerate some level of failure in the pursuit of excellence.  More importantly for this discussion, it does not get published, because these communities are cognizant of how intolerant of failure the general population is.  That means the general population does not realize that the performance and excellence were developed through experiences of failure.  And the lack of documentation means that foundation models do not learn this.  (For a counterexample, look at baking websites that explain causes of failure using pictures of baking disasters.)

Rather than reflecting reality, the internet is a record of successes, not failures.  This is a known problem (it is frequently discussed in academia, where journals publish only successes, without providing lessons learned from failures, leading to a lot of wasted effort as research groups go down dead ends that other groups had already explored).  For foundation models, that means they are trained on the successes and not the failures.  So everything seems easy, and the Generative AI built on these foundation models provides answers with assurance, but the people who have to implement those answers run into the myriad problems that come with doing things in real life.

Could you address this through grounding?  This is a cultural issue: you would need a record of failures, built by an organization where those who went into the unknown areas of your domain were allowed to fail without adverse consequence.  Then you could potentially have the Gen AI recognize that a path of action could lead to an unresolved problem.  And you would have to accept the Gen AI discovering those failures and actually telling you about them (things like this are part of the problem Gen AI has with understanding context).  So, similar to problems where there are multiple correct answers, this is as much a cultural problem of what we as a society see fit to write down (which becomes part of foundation models) and what we do not.
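
To make that a little more concrete, here is a minimal sketch (in Python, entirely hypothetical, not something from the talk) of what grounding on a record of failures could look like.  The "failure log," the naive keyword retrieval, and the prompt wording are all assumptions for illustration; in practice the records would live in a document store or vector database, and the prompt would go to whatever model or API you use.

```python
# A minimal, hypothetical sketch of grounding a Gen AI request with an
# organization's record of failures. Documented dead ends are retrieved
# and injected into the prompt so the model has to weigh them before
# recommending a path of action.

from dataclasses import dataclass


@dataclass
class FailureRecord:
    approach: str   # what was tried
    outcome: str    # how it failed
    lesson: str     # what the team learned


# Hypothetical institutional memory: failures written down, not just successes.
FAILURE_LOG = [
    FailureRecord(
        approach="Fine-tune on raw support tickets",
        outcome="Model reproduced customer PII in its responses",
        lesson="Scrub and review training data before fine-tuning",
    ),
    FailureRecord(
        approach="Single mega-prompt for all workflows",
        outcome="Answers drifted and no one could tell which instruction failed",
        lesson="Split prompts per task and test them separately",
    ),
]


def retrieve_failures(question: str, log: list[FailureRecord]) -> list[FailureRecord]:
    """Naive keyword overlap; a real system would use embeddings or search."""
    words = set(question.lower().split())
    return [r for r in log if words & set(r.approach.lower().split())]


def grounded_prompt(question: str) -> str:
    """Build a prompt that includes relevant documented failures as context."""
    relevant = retrieve_failures(question, FAILURE_LOG)
    context = "\n".join(
        f"- Tried: {r.approach}. Result: {r.outcome}. Lesson: {r.lesson}"
        for r in relevant
    ) or "- (no documented failures found for this topic)"
    return (
        "Answer the question below. Before recommending a path, check it "
        "against these documented failures from our own history:\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(grounded_prompt("Should we fine-tune on raw support tickets?"))
```

The code is trivial; the hard part is the first list.  Someone has to have been allowed to fail, and then allowed to write it down, before there is anything to retrieve.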