Tuesday, September 23, 2025

Book review: AI Snake Oil by Arvind Narayanan and Sayash Kapoor

AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference by Arvind Narayanan
My rating: 4 of 5 stars

I read AI Snake Oil as part of the INFORMS Book Club. I work with predictive AI and generative AI at work, and I describe what I do as figuring out how AI fails, then working with my business partners to develop a process and application that makes AI useful and productive. This book falls into the category of demonstrating how AI fails.

There are several chapters, each discussing a way that AI fails and how the authors figured it out. They follow a pattern. First, failures in AI are in part due to how a particular model is trained: the training data may not match the intended use, such as when the data actually represents one characteristic but the model is being used to predict something else. Next, they discuss that the people who made the model do not always have an incentive to get it right. In particular, the large AI companies do not have an incentive either to evaluate the quality of their models or to improve them.

Some things I think they do well.
1. Differentiate between various generations of AI. They specifically break out predictive AI, generative AI, and symbolic AI, each of which works differently from the others.
2. Focus on the training data. This is where AI models need to be examined (by definition, AI does not include a description of the system, so predictive and generative AI have to learn about the world through large amounts of diverse data). And failures come from the data not matching the setting where a model is applied.
3. Be skeptical of claims that come from computer companies. I always say don't let people selling you things define terms. The authors also say don't let industry set the rules, the standards, or the barriers to entry, because industry's goal is to defend its market share, not the benefit of society.

This is a good book to read, especially as part of a discussion. Highly recommended.


View all my reviews

Saturday, September 20, 2025

Business problem framing: The value of frameworks in analysis and communication

Badge signifying completion of the INFORMS Business Problem Framing course

As part of my ongoing professional education, I took the INFORMS Business Problem Framing class, which is also the lead-in to the Certified Analytics Professional training that is being developed.  You may wonder what can be learned in a 5-hour course.  It is a tour of frameworks for looking at business problems, and this is something that gets shortchanged in engineering training, which focuses on analytic methods (i.e., math) at the expense of understanding the problem in context and working with people.  This course is needed by anyone who learned analytics as a math or computational practice and needs to learn to work with business partners/clients, and it is good for those who already work with business partners/clients but want more range and more ways of communicating (which is everyone who works in analytics in the real world).

Back when I was an engineering professor, the supply chain center at the associated business school asked if I would be willing to coach their supply chain case study teams.  As a school with a certain amount of pretension, they found it disheartening that they were never winning these regional competitions.  (Operations management supply chain professors and industrial engineering supply chain professors are drawn from the same pool, so asking me made sense: they needed someone who was not actually part of running a case competition.)  After some conversations, I did not approach it as a refresher on supply chain modeling; I approached our coaching sessions as teaching them how to read a case through the lens of frameworks that they had learned at some point but may not have realized were a key tool.  Because frameworks are tools for organizing how you think about problems and a way to communicate about problems.

In every domain of expertise, experts use frameworks to organize how they look at the world.  Frameworks help experts look at all aspects of a situation, and when provided information, experts can realize that they have not been given critical information that could change how they should approach a problem.  Novices will often work only with what they have, as with a classroom problem.  I have noted that generative AI foundation models make the same mistake: they work only with what they have, in cases where a framework would have told them where to go to get the missing information for a fuller picture of the problem.

The other use of frameworks is communication.  Frameworks provide a way of quickly communicating the essence of an issue and the information that a decision will depend on.  When I was deployed in Afghanistan, I had a brief that was working its way up the chain; it was being scheduled for a 2-star general.  A couple of that general's staff were present when I had given the brief at a lower level, and they told me that the brief was good, but that I had to completely redo it to fit a specific framework, because that is how that general processed information.  (In the end, the staff went through my brief, realized that the recommendations were sound, and we got the result without having to formally present the brief.)

What can a 5-hour class do?  The Business Problem Framing course is a tour of a wide range of frameworks, probably familiar to someone coming out of business school, but not to those whose focus is on analytical methods, math, and programming with data.  Many of the frameworks are very similar in purpose.  But the right framework for a problem is only partially about the particulars of the problem; it is also about which framework communicates the problem to everyone involved.  And just as it is helpful to have many ways of delivering a message, familiarity with a number of frameworks is helpful when communicating with stakeholders, because it makes it more likely that you will find a framework that resonates with everyone.  And that leads to a better and more fruitful effort in solving the right problem at the right time.

Sunday, September 14, 2025

First days with a Corne 46 key split keyboard

I've been feeling the first twinges of soreness in my wrists, so I've started a range of actions to prevent this from getting worse.  These include virtual physical therapy (Sword by Thryve), an ergonomic chair (off Amazon.com), a small standing desk (also from Amazon.com), and, way down the rabbit hole, a Corne split keyboard.  It is notable for (1) having only 46 keys, (2) being split, and (3) being column staggered instead of the more traditional row staggered.  (Column stagger is sometimes called ortholinear, but that term should be reserved for keyboards where keys are laid out in a grid, not staggered in any way.)


I bought the board and case from YMDK on Amazon.com. As my first split keyboard, I got the most basic version: a Corne v4.1 3x6+5-key configuration (46 keys total; no screens, encoders, or other options), wired (as opposed to Bluetooth or a 2.4 GHz USB dongle), with standard MX switches. In addition, I got Outemu silent tactile switches with low-profile keycaps (in purple, Go Northwestern!).

Purple corne keyboard Go Northwestern!


Actually, the first Corne I got I ordered as refurbished (i.e., used), and the right side did not work.  There is a well-known failure mode, and I suspect the person who had this board before caused it to fail and returned it, and the Amazon.com people did not know how to test it properly when processing the return. After some troubleshooting with YMDK, I returned it and bought a new one, which worked just fine.

After making sure this one worked (there is a general instruction to connect the two halves to each other before plugging in the USB-C cable that connects the keyboard to the computer), I added the switches, and finally the keycaps.  For the keycaps: the way split keyboards work, many keys are actually accessed through layers (e.g., capital letters through a shift layer, numbers on one layer, symbols on another layer, etc.).  I touch type, so I was not concerned about knowing where the letters on the default layer are. So I put the keycaps for the left-side number layer on the left side, and the right-side symbol layer on the right side. This is not much of a crutch, though, as my keycaps are not easy to read if I don't have the RGB lighting on.
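Conceptually, each layer is just a lookup table that overrides the default key assignments while the layer key is held. A toy sketch in Python (this is a mental model, not actual keyboard firmware, and the key assignments shown are illustrative):

```python
# Toy model of keyboard layers: each layer maps a physical key position
# to an output. An active layer overrides the base layer; otherwise the
# key falls through to its default assignment.

BASE = {"q": "q", "w": "w", "e": "e"}        # default letter layer
NUMBERS = {"q": ".", "w": "7", "e": "8"}     # number-layer overrides
SYMBOLS = {"q": "~", "w": "&", "e": "*"}     # symbol-layer overrides

def resolve(key, active_layers):
    """Return the output for a physical key, checking the most
    recently activated layer first and falling back to BASE."""
    for layer in reversed(active_layers):
        if key in layer:
            return layer[key]
    return BASE[key]

print(resolve("w", []))          # no layer held -> "w"
print(resolve("w", [NUMBERS]))   # number layer held -> "7"
```

The same physical key produces different outputs depending on which layer key is held, which is how 46 keys can cover a full keyboard's worth of functions.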

Purple corne keyboard Go Northwestern!

So the main layer is mostly the basic letter part of the keyboard, except that the top row of numbers and symbols is gone. On the left side, I keep the Escape key in the top left corner and lose the Caps Lock key; I get Caps Lock through a key combo, tapping both Shift keys simultaneously.  On the right side, I have h-p then the Backspace key, so I had to move the -/_ and =/+ keys to the symbol layer, and similarly the bracket keys and the backslash key from the next row.  The bottom row happens to fit just fine (most keyboards have a wide Shift key to take up the space).  The remaining 10 keys are the thumb keys: <Ctrl>, <Layer 1>, and <Enter> on one side; <Space>, <Layer 2>, and <Alt> on the other; and the four center keys I mapped to <GUI>, Insert, <Menu>, and Delete.  I also set up home row mods, where keys have one function when tapped but another function when held.  So the four home row keys on each side are (from inside to outside) Layer, Shift, Ctrl, Alt, and Menu/GUI, with the Layer and Menu/GUI keys on the opposite side from where their dedicated key is found.  I'm not sure if I'll use these home row mods yet. (Dual-role tap/hold keys like these are usually called mod-taps; 'tap dance' keys, a related feature, change behavior based on how many times they are tapped.)
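The tap-versus-hold decision behind home row mods comes down to timing: a quick press-and-release sends the letter, while holding past a threshold (often called the tapping term) activates the modifier. A toy sketch in Python of that decision rule (the 200 ms threshold is an illustrative default, not my keyboard's actual setting):

```python
# Toy model of a dual-role (tap/hold) key, as used for home row mods:
# released within the tapping term, the key sends its tap character;
# held longer, it acts as a modifier instead.

TAPPING_TERM_MS = 200  # illustrative threshold, not a firmware value

def resolve_dual_role(press_ms, release_ms, tap_output, hold_modifier):
    """Decide whether a dual-role key press was a tap or a hold,
    based on how long the key was down."""
    if release_ms - press_ms < TAPPING_TERM_MS:
        return ("tap", tap_output)        # quick press -> the letter
    return ("hold", hold_modifier)        # long press -> the modifier

print(resolve_dual_role(0, 120, "f", "Shift"))   # -> ('tap', 'f')
print(resolve_dual_role(0, 350, "f", "Shift"))   # -> ('hold', 'Shift')
```

Real firmware adds refinements (e.g., treating the key as held if another key is pressed while it is down), but the timing threshold is the core idea, and tuning it is what makes home row mods comfortable or maddening.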

What makes these small keyboards work is the additional layers. I have three: a number layer, a symbol layer, and an Alt layer.  For the number layer, I have the left side set up as a numeric keypad on the w-e-r columns; 0, +, and - are on the t column, and ., *, and / are on the q column.  The leftmost column holds ~, =, and <-.  The <- is because I program in R, the = needed a place, and the ~ was displaced from the number row and is used to mean approximately or equivalent.  On the right side, the top row holds the Boolean (logic) operators <, >, &, |, and !, for less than, greater than, AND, OR, and NOT.  On the second row, h-j-k-l are navigation arrows for left, down, up, and right; these are the keys used by vim (which I have set up in all of my programming environments).  Below these, on the n-m-<-> keys, are beginning of line, left one word, right one word, and end of line.  The two keys to the right are Page Up and Page Down, and to the right of those on the edge are top of document and end of document.

The symbol layer's top row holds the symbols that are the shifted number keys. On the right side, the second row holds the keys that were displaced: ` - = [ ] \.   The row below them holds the shifted values of those same keys: ~ _ + { } |. These were ordered so that [] and {} are in the same columns as ().  The left side's second and third rows hold mouse navigation keys.  I actually am not sure how those work; we'll see if I figure out how to use them. :-)

The Alt layer is accessed by pressing the Layer 1 and Layer 2 keys simultaneously. The top row holds the function keys F1 through F10, with F11 and F12 continuing around the corner in the right column. The left side holds keys for managing the keyboard itself, specifically the keyboard lighting (this is a wired keyboard, so there is no Bluetooth to worry about); the top left button is a keyboard reset key.  The right side holds audio and screen controls: Mute, then volume down and up, then media back, play/pause, and media forward, then screen brightness down and up, and finally Print Screen, Scroll Lock, and Pause/Break.

The reason for the complication of layers is to reduce finger and hand movement. There is no rhyme or reason to the locations of numbers and symbols on a standard keyboard, which is why people who type a lot of numbers use a numeric keypad; so I put the numbers and basic arithmetic symbols on a number layer.  I don't have a good way of locating the other symbols, so I basically placed them based on their normal locations, with the exception of putting the sets of brackets in the same columns.

For typing on a split, column-staggered keyboard, the basic idea is to put your hands on the table, then arrange the keyboard halves under each hand, with the columns aligned to your fingers.  However, when I do this, my right hand is happy going up and down the columns, but my left hand does not want to, because it is used to drifting to the side when it leaves the home row, so I am still adjusting the left side.  The c and v keys seem to be especially problematic (my middle finger drifts right and hits 'v' when I wanted a 'c').

I've just started down the rabbit hole of split keyboards, so I'll see how I master the layers. One obvious benefit: I now actually use the media keys on the keyboard. I used to always use the mouse, because I could never remember where the media keys were and I did not feel like hunting for them on my keyboard.  Now, those keys are in a logical place (for my definition of logical).  I still have to think about where symbols are, but I do that anyway for symbols that are not on the letter keys. Navigation keys are easy, because I already use vim, and I never liked using the arrow keys in the lower right corner anyway.  Other potential subjects for obsession are the home row mods and key combinations. In addition to the two Shift keys for Caps Lock and the F1 and F2 keys for F3, there are also common combos for *(=[+{ that make it so these common keys can be typed without using the Layer key.

I have a Sofle on the way, and another Kickstarter-backed split keyboard that I'm anticipating in November or December. This Corne is going to be my lower-profile, simple keyboard (no number row, no screens, no encoders, no Bluetooth/2.4 GHz dongle, and a 3D-printed case instead of an acrylic sandwich).  I'm also exploring tenting. My simple way to start is a Steepy laptop riser stand, which allows for three levels of elevation. I'm starting at the lowest elevation; I'll try higher options later.

Purple corne keyboard Go Northwestern!

Now back to work!

Wednesday, September 03, 2025

Subject domains that lead to failure in large language model output

At the 2025 YinzOR conference I was talking with LĂ©onard Boussioux about the types of domains where large language models (LLMs) have a tendency to fail, and other conversations encouraged me to write this down.

There are stories from the early days of aviation of a test pilot coming back to learn that his plane had cracks, and the engineers being delighted, because that meant they were learning the limits of the aircraft.  In that spirit, we want to look for domains where the foundation models will give poor results, so that those developing applications can look for potential failures, design around them, and train users to be attentive to errors.  For this discussion, the cause of the errors is the data used to train the foundation models.  As with other deep learning based models, to uncover categories of errors, we look at the training data.

Large language models tend to fail due to an inability to work with nuance, and due to naivete.  My friend Polly Mitchell-Guthrie describes LLMs as unable to work with context, collaboration, and conscience.  I describe problems in LLMs as failures of nuance, naivete, and novice problems.  Again, this is due to how the foundation models are trained (on effectively all publicly available text), so these are social problems, and they may not be solvable in this realm of LLM-based AI.

Novice problems are due to the characteristics of what is available on the internet.  The majority of information on the internet is aimed at beginners (computing topics are significant exceptions to this), so there is a lot of information that rises to the equivalent of an introductory sequence in college.  So an LLM does have a body of knowledge.  But using a body of knowledge that is targeted at the introductory level leads to nuance, naivete, and novice errors.

Nuance issues are probably the most recognized.  Nuance comes into play in subjects where details matter, where the answer in a specific situation is not the same as in the standard case.  When given a setting, an LLM (like a novice) will take the information provided in the prompt, find other sources that include the same information, and come up with an output (answer).  An expert, however, would take the information and fit it into an applicable framework.  The expert will then recognize that there is missing information that influences the final answer and ask for that information.  Similarly, when considering other references, the same framework tells the expert the extent of the applicability of each reference.  An LLM only matches text in the prompt with the references, so it will not always check that the context of a reference matches the context of the user's setting.  These types of issues lead experts to reach very different conclusions than people who are new to a domain, and LLMs tend to act like novices here.

As an exercise to help people identify domains where LLMs do badly, I ask people to pick a topic that they know well, but not through textbooks or classwork, and not computer related (this tends to lead to topics that they know experientially or through true research).  Most people identify a hobby; my manager did this exercise with his master's thesis topic.  Another variation on nuance involves details that frequently occur together but are not the same.  Since an LLM works by probabilistically choosing words that occur together, it can try to combine related topics or words that should not be combined.  A frequent example of this is in anatomy, where LLMs trained on medical texts will conflate the names of two body parts into a body part that does not actually exist.

Naivete occurs when someone is in possession of facts but does not recognize the consequences of those facts.  For an LLM, it is easy to take a prompt and then, from references that match that prompt, identify other facts and details that are typically associated with the information provided in the prompt.  But unless it finds references that explicitly spell out the consequences of a particular collection of facts, the LLM will not provide the consequence.  As an example, my then 10-year-old daughter had written a story set in a domestic setting in the United States during the 1860s (the U.S. Civil War era). So when I ran her through the exercise of asking about a topic that is not well known, she asked the generative AI about an aspect of domestic life, specifically methods for starting fires.  Her comment was that the generative AI gave details that, as far as she could tell, were all true.  But it did not provide an important consequence.  Given the same set of details, a modern-day chemist would mentally translate the 19th-century terms into their modern counterparts and immediately recognize that the list contains all the ingredients needed to cause an explosion.  In real life that is what happened, and there are very few examples of this technology in museums, because they all exploded.  My daughter regarded the fact that a technology meant for use in domestic (home) life had a tendency to explode as an important detail, and the LLM's not reaching that conclusion as a failure.

Another type of novice error involves exceptions and crossing domains.  Many domains teach general frameworks and rules of thumb at the introductory level.  These are intended to help practitioners succeed and avoid common pitfalls.  However, past the introductory level, practitioners learn the reasons behind the frameworks and rules, either from deeper training or through experience, so experts know the exceptions to the rules and when to modify rules based on the particular circumstances at hand.  This is even more important in cases where multiple domains are involved, which is common outside controlled environments such as academic or teaching settings.  In such cases, the standard rules of the multiple domains can conflict.  Experts resolve this both by establishing exceptions based on the circumstances and by looking at the ultimate goal or intent of the activity, bending or breaking rules based on which rules interfere with the goals or the mission.  But they don't completely throw out the rules; experts keep in mind the intent of a rule and ensure that the intent is addressed.  When LLMs are given both the rules of the domains and a history of prior activity, they will often identify the fact that rules were broken and then no longer follow the rules, which leads to poor outputs that do not respect the issues that arise in these domains in practice.

LLMs are especially handicapped when there are intersecting domains.  When articles or other texts are written or published, the general rule is to have anything you write or publish be on a single topic, which makes it easier to identify the target audience and for the target audience to find your work. Topics that lie within intersecting domains tend to be niche topics, and they are both difficult to get published and difficult to find, and thus less likely to be included in the foundation models' training data.  Another thing not found in published texts is failure.  In many domains, expertise is developed through experiencing failures. However, these domains tend not to document or publish the failures that experts learn from, because of the potential for repercussions or public disapproval. And if these are not published, they will not be available for training foundation models.

The purpose of this exercise is to make generative AI useful. And to be useful, the people who work with generative AI models have to be able to recognize these kinds of errors and look for them, so that they can screen generative AI output for other types of errors.  For example, my now 11-year-old daughter continues to identify errors in generative AI output ranging from trivial to profound, and because she has this ability, I have no concerns about her use of generative AI.  The same holds for my colleagues: once they have experienced identifying errors in AI (and this holds for machine learning models as well), they are able to identify future errors and react appropriately, rather than taking the outputs of AI as automatically true.  And this leads to more productive use of AI.