Wednesday, October 22, 2025

A tale of two Corne: one month with split keyboards

My split keyboard journey started with two Corne keyboards and a Sofle, all purchased over a period of two months. The Sofle is from Ergomech and is Bluetooth enabled, but I use it as a wired work keyboard. The two Corne keyboards are from YMDK, purchased through Amazon. One is an MX, the other is an MX low profile. Here, I talk about the two Cornes. I will look at (1) buying from YMDK on Amazon, (2) use of the two keyboards, and (3) the keymap journey.






I bought the keyboards from YMDK on Amazon.com. YMDK markets pre-soldered, hot-swappable, wired and wireless (2.4 GHz dongle) Corne 4.1 keyboards with a 3D printed enclosed case and 46 keys (3x6 + 5 per side). They also sell a wireless Sofle. I wanted wired only, to make my first steps into split keyboards with fewer complications. In particular, the wireless versions are powered by replaceable button cell batteries that I did not want to deal with. I immediately flashed both keyboards with the 4.1 Vial version of the Corne firmware, and that went without problems.

I have one keyboard with MX Akko Dracula linear switches (35 g weight) and XDA profile PBT keycaps, and one low profile keyboard with Kailh Deep Sea Whale low profile Choc v2 switches (silent tactile). The first keyboard I ordered was a refurbished MX model. The right side did not work, and I suspect this is because Amazon refurbished items are usually returns, and the prior owner probably shorted the keyboard. After a couple of back-and-forths with YMDK customer service (there is a link to them on Amazon, and it is not too hard to get the YMDK customer service email address), we decided that I would return it through Amazon, and I ordered a new one. The low profile keyboard was not a problem. The website makes it clear that it can take either Kailh 1333 (v1) or 1353 (v2) switches, so I got 1353 switches and used Womier low profile (skyline) keycaps. So, with Amazon return policies, I found buying from YMDK and working with their customer service reasonably good, although I am leery of any complications like wireless.

As for using the keyboards, I use them with my own laptop, and I have one at a standing desk that I use for both my work and personal laptops (I take a standing session and connect my laptop to a docking adapter). I also bring the low profile Corne with me when I go in to the office (my acrylic sandwich Sofle looks a little fragile with its openings and the suspended acrylic OLED screen cover). I like the low profile version. With the 3D printed case and low profile keys and switches, it feels relatively durable, with no big failure points like parts that could catch on something. I wrap up the halves in a bandana and put them with the cables into a lined bag, and that seems to work well. I'm not sure I would like the low profile keyboard as my only keyboard; it feels tough because I'm always bottoming out compared to my MX Sofle. But as a secondary keyboard to provide variety, I think it does well. And it does get attention when I take it around :-)

With the Akko Dracula switches, I think the light switches don't work for me. I constantly have accidental key presses with mod-tap keys that I don't have with the silent tactile switches with 45 g weights. I think I'm going to put that keyboard aside until I feel like getting new switches for it.

The keymap journey is an ongoing one. I think I'm at a point where I only make small changes a few days apart. Here are some big choices along the way, in roughly the order I settled on them.

  • QWERTY- I'm staying with the QWERTY layout. I know it well, and I am not so fast a typist that any keymap optimization can make any meaningful difference.


  • Numbers. I started with numbers being a top row of a layer. Eventually I realized that when I need numbers, I need several, so I switched to making a numeric keypad with arithmetic symbols on one side, then the other symbols on the other side of the layer.
  • Symbols. There are seven'ish pairs that need to be taken care of. I touch type, so I wanted the pairs that are usually on the same key to stay together.  These are:  `~, -_, =+, [], {}, '", \|.  The quotes get taken care of by moving enter to the thumb cluster, so quotes stay in place. =+ and -_ I put with the numeric keypad. So two rows of the symbol layer hold these keys: []\'` on one row and the shifted versions {}|"~ in the row below that.  I put the brackets on the outside (left row) because I ended up putting all the brackets on combos, so I still have them on this layer, but on the edge.  And since I program in R, I put the ` and ~ closer to the index finger.  On the top row of the symbol layer I put the logical operator symbols:  <>&|! (less than, greater than, and, or, not).  This makes two columns of all the bracket keys (except parens) and keeps them in an order I would be able to remember.


  • Navigation. The top row of the navigation layer holds the symbols from the shifted number row.  For the remaining two rows, the right side is navigation and the left side is mouse control. The right side is centered on hjkl, because I use VIM and my fingers already know those keys.  Below those are the horizontal keys: beginning of line, previous word, next word, end of line. I put page up and page down on the two keys to the right of those. On the far right I have beginning and end of document, but I don't think I use those that much. The mouse cluster is xdcv.  Then f and s are the click buttons, g and b are scroll up and down, and z and a are scroll left and right.  I use these a surprising amount of the time (especially to help recover after accidental mod presses).


  • Adjust layer:  Left half is for controlling the keyboard (lighting), right half is media controls.  I have volume mute, down, and up on hjk; media back, play/pause, and next on nm,; l and ; are screen brightness; . and / are zoom out/in.  I never really learned to use these keys because they were always on the function row and every keyboard had them in a different place. Now that I got to put them where I wanted, I use them a surprising amount.


  • Screen navigation.  I put window management (switch windows, move windows) on the thumb cluster. I figured if I was in this layer I did not need space, enter, backspace/delete. I knew these key combinations, but they were awkward on a normal keyboard.
  • Programming key combos.  I made combos for the bracket symbols ()<>[]{} with the left bracket on the left side and the right bracket on the right. I tend to use these instead of the normal typewriter layout.  I have additional combos for : <- |> # for R programming. I also have combos for open file and the command palette in Visual Studio Code.
  • Mod-tap and layer-tap. I have extra layer-tap keys on g and h, which I use to mirror my layer keys so each layer is accessible from each side of the keyboard. For example, with the number pad, I can either use it single handed with the thumb holding the layer key, or use the index finger on the other side. I usually use it one handed if I only need one key on the number pad, and the opposite hand if I need more. I let my fingers decide. I also have (from outside in) home row mods of layer, shift, control, alt, gui. But I took out the gui mod-tap as I was doing too many accidental mod presses, especially with the Akko Dracula switches. (I think the other keys are not as noticeable because they have momentary effects, but GUI and menu lead to something happening.)
  • Thumb clusters. I have ended up with the thumb cluster being Insert (held Ctrl), Enter (held GUI), raise layer (navigation), lower layer (numpad/symbol), space, backspace (held Alt). I realized that the menu key was also right click on the mouse (and the key on the mouse layer works).
Some other decisions along the way
  • Delete/Backspace. I first left backspace next to P and the delete key in the thumb cluster, but I repeatedly used delete when I meant backspace, so I moved backspace to the thumb cluster and put delete in the corner.  It makes for the same Ctrl-Alt-Delete chord that I'm used to.
  • Escape and tab. I started with escape in the corner, but my fingers always wanted tab to be next to Q. So Tab went into the corner and Escape went where CAPS LOCK usually is. I don't use CAPS LOCK much,  so I made a combo with both space keys as something easy to remember if I ever want it.
  • Numpad. I started with the  numbers on the top row of a layer, but they were always awkward, just like they are normally. Then I realized they could be a numpad, with room for arithmetic keys around them. I also tried both left and right sides, and ended up with the right side. Because - and _ are used so much, I had those under the index finger and =+ went to the other side of the numpad.
  • Shift and Ctrl/Alt keys. I tried out putting Shift in the thumb cluster and Ctrl, Alt in the corners where shift usually was, but changing that muscle memory was too hard.
  • Space and Enter. I tried space on the left first, then saw I was making too many errors so I switched them.
Observations from use.
  • I don't miss the number row. The only time I notice it is when typing passwords or other things like phone numbers where muscle memory knew where the numbers are, but I'm creating a new set of muscle memory.  And the symbol keys always needed a layer key (Shift).
  • I use the media, zoom, and window management keys all the time. I never used them on a regular keyboard because I could not remember where each keyboard kept them and they were odd key combinations (odd to me). So these mean I am using the mouse a lot less.
  • After one month, using a regular keyboard feels uncomfortable because it feels cramped and flat (I have a variety of tenting solutions). I don't remember it being much more comfortable when I started using split keyboards, but that is probably because I was dealing with all the changes in geometry.
  • I noticed that I only use the sixth column on the base layer with Tab-Escape-Shift and Delete-<'>-Shift.  So I am only six combos away from switching to a 5X3 Corne (must resist . . .)



Friday, October 03, 2025

Setting up a keymap for a Corne split keyboard to be used for data analytics

Continuing my dive into the rabbit hole known as split keyboards, I got a Corne keyboard from YMDK on Amazon.com.  A Corne is an open source keyboard (the circuit board and source code are freely available; the original creator is not in the keyboard building and selling business, so he lets others improve his design and sell it). It is a column staggered board with 3 rows X 6 columns + 3 thumb keys per side. The goal here is comfort from being able to independently place the halves of the keyboard under my fingers and reduce the need to twist my wrists (i.e., reduce repetitive strain injury).

The trick with this keyboard is what to do with the symbol and control modifiers, also known as the keymap. The answer is creating layers, similar to the shift layer used for capital letters and the symbols under the number keys. So I have a symbol/navigation layer and a number/mouse control layer.

There are a few general principles I had in making the keyboard.

  1. The baseline is the standard QWERTY keyboard. I spent a lifetime building up my muscle memory and I'm not going to throw it away.
  2. I wanted to try a numpad on the right side that should also be associated with common math symbols.
  3. I wanted VI type navigation keys  (i.e. h, j, k, l correspond to left, down, up, right)
  4. Keep the symbols associated with shifted number keys on the top row, in order
  5. For symbols that did not make the base layer or the math layer, keep the symbol and the shifted symbol together (shifted symbol below the main one)
  6. As I use it and make mistakes, move characters to the key my fingers wanted them to be.
So, the base layer is as much of the QWERTY layout as could fit.  The left column had tab above escape above shift. I started out with the escape in the corner and tab under it, but clearly my pinky wanted the tab to be next to Q. For the right column, I started out with the backspace next to P and Delete under my thumb, but a bit of use led me to switch them.  The left thumb keys had insert, enter, and lower layer (symbols and navigation), with the insert key doubling as Control when held down (also known as mod-tap). The Enter key doubled as Alt when held. The right thumb keys were raise layer (numpad, math symbols, and mouse control), space (Control when held) and backspace (Alt when held). I also set up home row mods, where on both hands the home row keys doubled as shift, control, alt, and Command when held.
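
To make the mod-tap idea concrete, here is a minimal sketch of how a dual-role key can be resolved: a short press sends the tap key, a long press acts as the modifier. This is only an illustration, not my actual firmware configuration. The 200 ms threshold and the names `ModTap`, `resolve`, and `TAPPING_TERM_MS` are assumptions made up for the example, and real firmware has more options for deciding between tap and hold.

```python
from dataclasses import dataclass

# Assumed threshold for the example; not the setting on my keyboards.
TAPPING_TERM_MS = 200


@dataclass(frozen=True)
class ModTap:
    """A dual-role key: one action when tapped, another when held."""
    tap: str
    hold: str


def resolve(key: ModTap, held_ms: int) -> str:
    """Return the action for a key that was held for held_ms milliseconds."""
    if held_ms >= TAPPING_TERM_MS:
        return key.hold
    return key.tap


# Two of the thumb keys described above.
INSERT_CTRL = ModTap(tap="Insert", hold="Ctrl")
BACKSPACE_ALT = ModTap(tap="Backspace", hold="Alt")

if __name__ == "__main__":
    print(resolve(INSERT_CTRL, 80))     # quick press
    print(resolve(INSERT_CTRL, 350))    # long press
    print(resolve(BACKSPACE_ALT, 120))  # quick press
```

A sketch like this also shows why very light switches can cause trouble: any press that lingers past the threshold turns into a modifier instead of the intended key.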


The lower layer was for symbols and navigation.  The top row had the symbols that would normally be on the shifted number keys.  The left side had the keys that were displaced from the base layer: []{}\|`~.  The right side had the VI arrow keys on h, j, k, l. The row below those was navigation within the row: beginning of row, previous word, next word, end of row. The column to the right had page up and page down. The last column on the right had top and bottom of document (or cell for Jupyter notebooks).



The raise layer was mouse controls, numpad, and math symbols. Left side had mouse controls (left, right, up, down, clicks, scroll) and math logic symbols &|!<>.  Right side was a numpad, with -+/*_= around the numbers.  



The third, or adjust, layer was for controlling the keyboard or computer.  The left side was the keyboard, specifically lighting.  The right side was media controls, screen brightness, and screen zoom.



What really made this work was the use of combos.  I made combos (two key combinations) for the bracket type symbols which were mirrored on the left and right sides. This covered <>, (), [], {}.  In the inside columns, I made combos for :, <-, |>, # which are used in R.
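
As a rough sketch of how a combo works: if two specific keys are pressed close enough together, the keyboard sends a different symbol instead of the two letters. The key pairs, the 50 ms window, and the names `COMBOS`, `COMBO_TERM_MS`, and `resolve_presses` below are all assumptions for illustration; they are not the pairs or timing I actually use, and this is not how the firmware implements it.

```python
# Assumed timing window for the example.
COMBO_TERM_MS = 50

# Hypothetical key pairs: left brackets on the left hand,
# right brackets mirrored on the right hand.
COMBOS = {
    frozenset({"d", "f"}): "(",
    frozenset({"j", "k"}): ")",
    frozenset({"s", "d"}): "[",
    frozenset({"k", "l"}): "]",
}


def resolve_presses(events):
    """Turn a time-sorted list of (time_ms, key) presses into outputs.

    Two consecutive presses that form a known pair within COMBO_TERM_MS
    produce the combo symbol; otherwise each key is sent as itself.
    """
    out = []
    i = 0
    while i < len(events):
        t, key = events[i]
        if i + 1 < len(events):
            t2, key2 = events[i + 1]
            symbol = COMBOS.get(frozenset({key, key2}))
            if symbol is not None and t2 - t <= COMBO_TERM_MS:
                out.append(symbol)
                i += 2
                continue
        out.append(key)
        i += 1
    return out


if __name__ == "__main__":
    print(resolve_presses([(0, "d"), (20, "f")]))   # pressed together
    print(resolve_presses([(0, "d"), (120, "f")]))  # typed one after the other
```

The point of mirroring is visible in the table: the same finger pair on each hand gives the opening and closing version of a bracket, so there is only one thing to remember per bracket type.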

Actually, this was a major modification. My initial layout had the numbers along the top row on the raise layer. But while I don't use numbers that much, when I need them I need many of them, and a numpad is less awkward.  I think.  I will know after much more use.

And here are my keyboards.  I have two Cornes, one MX with Akko Dracula (low weight linear) switches and XDA profile keycaps, and one low profile with Kailh low profile silent tactile switches and low profile MX keycaps.  And a Sofle with Outemu silent tactile switches and XDA keycaps. The Keychron K12 with Cherry Reds and OEM profile keys is what I was using before, shown as a reference.  The Sofle is used with my work computer (having a number row makes typing passwords smoother). The low profile Corne is packed in a bandana and a bag for travel. The MX Corne is used for my personal/non-work laptop. Since I got my first Corne in late August, this represents a big dive into the rabbit hole of split keyboards.

For tenting, the Sofle has the M5 bolts that came with it from Ergomech, and the low profile Corne uses Steepy laptop risers, which are what I take when out and about.  The MX Corne is on Cooper Cases MagSafe stands.






Tuesday, September 23, 2025

Book review: AI Snake Oil by Arvind Narayanan and Sayash Kapoor

AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference by Arvind Narayanan
My rating: 4 of 5 stars

I read AI Snake Oil as part of the INFORMS Book Club. I work with predictive AI and generative AI at work, and I describe what I do as figuring out how AI fails, then working with my business partners to develop a process and application to make AI useful and productive. This book falls into the category of demonstrating how AI fails.

There are several chapters, each with a discussion of a way that AI fails and how the authors figured it out. But they have a pattern. First, the failures in AI are in part due to how a particular model is trained: the training data does not match the intended use, such as when the data actually represents one characteristic but the model is being used for something else. Next, they discuss that the people who made the model do not always have an incentive to get it right. In particular, the large AI companies do not have an incentive to either evaluate the quality of the models or improve them.

Some things I think they do well.
1. Differentiate between various generations of AI. They specifically break out predictive AI, generative AI, and symbolic AI, each of which works differently from the others.
2. Focus on the training data. This is where AI models need to be examined (by definition, AI does not include a description of the system, so predictive and generative AI have to learn about the world through large amounts of diverse data). And failures come from the data not matching the setting where a model is applied.
3. Be skeptical of claims that come from computer companies. I always say don't let people selling you things define terms. They also say don't let industry set the rules, the standards, or barriers to entry, because its goal is to defend its market share, not to benefit society.

This is a good book to read, especially as part of a discussion. Highly recommended.



Saturday, September 20, 2025

Business problem framing: The value of frameworks in analysis and communication

Badge signifying completion of the INFORMS Business Problem Framing course

As part of ongoing professional education, I took the INFORMS Business Problem Framing class, which is also the lead-in to the Certified Analytics Professional training that is being developed.  You may wonder what can be learned in a 5-hour course.  It is a tour of frameworks for looking at business problems.  And this is something that gets shortchanged in engineering training, which focuses on analytic methods (i.e., math) at the expense of understanding the problem in context and working with people.  This course is needed by anyone who learned analytics as a math or computational practice and needs to learn to work with business partners/clients. It is also good for those who already work with business partners/clients but want more range and more ways of communicating (which is everyone who works in analytics in the real world).

Back when I was an engineering professor, the supply chain center at the associated business school asked if I would be willing to coach their supply chain case study teams.  As a school with a certain amount of pretension, they found it disheartening that they were never winning these regional competitions (operations management supply chain professors and industrial engineering supply chain professors are drawn from the same pool, so this actually made sense, as they needed someone who was not actually part of running a case competition). But after some conversations, I did not approach it as a refresher on supply chain modeling. I approached our coaching sessions as teaching them how to read a case through the lens of frameworks that they had learned at some point but may not have realized were a key tool, because frameworks are tools for organizing how you think about problems and a way to communicate about problems.

In every domain of expertise, experts use frameworks to organize how they look at the world.  Frameworks help experts look at all aspects of the situation.  And when provided information, they can realize that they have not been given critical information that can change how they should approach a problem. Novices will often work only with what they have, like a classroom problem.  I have noted that Generative AI foundation models make the same mistake: they work with only what they have in cases where a framework would have told them a direction to go to get missing information for a fuller picture of the problem.

The other use of frameworks is communication. Frameworks provide a way of quickly communicating the essence of the issue and the information that a decision will depend on. When I was deployed in Afghanistan, I had a brief that was working its way up the chain. It was being scheduled for a 2-star general. A couple of that general's staff were present when I had given the brief at a lower level. And they told me that the brief was good, but I had to completely redo it to fit into a specific framework, because that is how that general processed information.  (In the end, the staff went through my brief and realized that the recommendations were sound, and we got the result without having to formally present the brief.)

What can a 5-hour class do?  The Business Problem Framing course is a tour of a wide range of frameworks, probably familiar to someone coming out of business school, but not familiar to those whose focus is on analytical methods, math, and programming with data. And many of the frameworks are very similar in purpose.  But the right framework for the problem is only partially about the particulars of the problem; it is also about the framework that communicates the problem to everyone involved.  And just as having many ways of delivering a message is helpful, familiarity with a number of frameworks is helpful when communicating with stakeholders, because it is more likely that you will find a framework that resonates with everyone. And that leads to a better and more fruitful effort in solving the right problem at the right time.

Sunday, September 14, 2025

First days with a Corne 46 key split keyboard

I've been feeling the first twinges of soreness in my wrists, so I've started a range of actions to prevent this from getting worse.  This includes virtual physical therapy (Sword by Thryve), an ergonomic chair (off Amazon.com), and a small standing desk (also from Amazon.com). And way down the rabbit hole, a Corne split keyboard, which is notable for (1) having only 46 keys, (2) being split, and (3) being column staggered instead of the more traditional row staggered (this is sometimes called ortholinear, but that term should refer to keyboards where keys are laid out in a grid, not staggered in any way).


I bought the board and case from YMDK on Amazon.com. As my first split keyboard, I got the most basic version, a Corne v4.1 in the 3 X 6 + 5 key configuration (46 keys total, no screens, encoders, or other options), wired (as opposed to Bluetooth or a 2.4 GHz USB dongle), and with standard MX switches. In addition I got Outemu silent tactile switches with low profile keycaps (in purple, Go Northwestern!).

Purple corne keyboard Go Northwestern!


Actually, the first Corne I got, I ordered as refurbished (i.e., used). And the right side did not work.  There is a well known failure mode, and I suspect the person who had this before caused it to fail and returned it, and the Amazon.com people did not know how to test it properly when processing the return. But after some troubleshooting with YMDK, I returned it and bought a new one, which worked just fine.

After making sure this worked (there is a general instruction to connect the two halves before plugging in the USB-C cable that connects the keyboard to the computer), I added switches, and finally the keycaps.  For the keycaps, the way that split keyboards work is that many keys are actually accessed through layers (e.g. capital letters through a shift layer, numbers on a layer, symbols on another layer, etc.).  I touch type, so I was not concerned about knowing where the letters on the default '0' layer are. So I put the keycaps for the left side number layer on the left side, and for the right side symbol layer on the right side. Although, this is not much of a crutch, as my keycaps are not easy to read if I don't have the RGB lighting on.

Purple corne keyboard Go Northwestern!

So the main layer is mostly the basic letter part of the keyboard, except the top row of numbers and symbols is gone. On the left side, I keep the escape key in the top left corner, and lose the Caps Lock key. I get Caps Lock through a key combo, tapping both shift keys simultaneously.  On the right side, I have h-p then the Backspace key, so I have to move the -/_ and =/+ keys to the symbol layer.  Similarly the bracket keys and the backslash key from the next row.  The bottom row happens to fit just fine (most keyboards have a wide shift key to take up the space).  The remaining 10 keys (5 per side) are <ctrl>,  <layer 1>, <Enter>, <space>, <layer 2>, <Alt>  and the four center keys, which I mapped to <GUI>, Insert, <Menu>, Delete.  I also set up home row mods, where keys have one function when tapped, but another function when held.  So, the home row keys on each side were (from inside to outside) Layer, Shift, Ctrl, Alt, Menu/GUI, with the layer and menu/GUI keys on the opposite side from where their dedicated key was found.  Not sure if I'll use these home row mods yet (these dual role keys are called 'mod-tap' keys).

What makes these small keyboards work are the additional layers. I have three: a number layer, a symbol layer, and an Alt layer.  For the number layer, I have the left side set up as a numeric keypad on the w-e-r columns. 0, +, and - are on the t column; ., *, and / are on the q column.  The leftmost column has ~, =, and <-.  The <- is because I program in R, the = needed a place, and the ~ was displaced from the number row and is used to mean approximately or equivalent.  On the right side, the top row has the boolean (logic) operators <>&|!, for less than, greater than, AND, OR, and NOT.  For the second row, HJKL are navigation arrows for left, down, up, right.  These are the keys used by VIM (which I have set up on all of my programming environments).  Below these on the NM<> keys are beginning of line, left one word, right one word, end of line.  The two keys to the right are page up and page down. To the right of those, on the edge, are top of document and end of document.

The symbols layer top row has the symbols that are shifted number keys. On the right side, the second row has the keys that were displaced: ` - = [ ] \.   The row below them has the shifted values of those same keys: ~ _ + { } |. These were ordered so that [] and {} were in the same columns as ().  The left side second and third rows have mouse navigation keys.  I actually am not sure how these work; we'll see if I figure out how to use them. :-)

The Alt layer is accessed by pressing the Layer 1 and Layer 2 keys simultaneously. The top row has the Function keys F1 through F10, with F11 and F12 continuing around the corner on the right column. The left side has keys for managing the keyboard, specifically the keyboard lighting.  This is a wired keyboard, so there is no Bluetooth to worry about. The top left button is a keyboard reset key.  The right side has audio and screen controls: mute, then volume down and up, then media back, play/pause, media forward.  Then screen brightness down and up.  Finally Print Screen, Scroll Lock, and Pause/Break.
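
Here is a small sketch of the layer logic just described: holding one layer key selects that layer, holding both selects the Alt layer, and any key a layer does not define falls through to the base letter. This is an illustration only. Which layer sits on the Layer 1 key versus the Layer 2 key is an assumption made for the example, the function names are invented, and only the arrow keys are filled in (in the usual Vim order).

```python
def active_layer(layer1_held: bool, layer2_held: bool) -> str:
    """Pick the layer from the two layer keys; both held means the Alt layer."""
    if layer1_held and layer2_held:
        return "alt"
    if layer1_held:
        return "number"  # assumption: Layer 1 selects the number layer
    if layer2_held:
        return "symbol"  # assumption: Layer 2 selects the symbol layer
    return "base"


# Only a few keys are filled in; everything else falls through to the base layer.
LAYERS = {
    "number": {"h": "Left", "j": "Down", "k": "Up", "l": "Right"},
    "symbol": {},
    "alt": {},
}


def key_output(key: str, layer1_held: bool = False, layer2_held: bool = False) -> str:
    """Return what a key sends given which layer keys are held."""
    layer = active_layer(layer1_held, layer2_held)
    return LAYERS.get(layer, {}).get(key, key)


if __name__ == "__main__":
    print(key_output("h"))                    # base layer letter
    print(key_output("h", layer1_held=True))  # arrow key on the number layer
    print(active_layer(True, True))           # both layer keys held
```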

The reason for the complications of layers is to reduce finger/hand movement. Specifically, there is no rhyme or reason for the locations of numbers and symbols, which is why people who type a lot of numbers will use a numeric keypad. So I put the numbers and basic arithmetic symbols on a number layer.  I don't have a good way of locating symbols, so I basically place them based on their normal location, with the exception of making the sets of brackets in the same columns.

For typing on the split column staggered keyboard, the basic idea is to put your hands on the table, then arrange the keyboard to be under each hand, with the columns aligned to your fingers.  When I do this, my right hand is happy going up and down the columns. But my left hand does not want to, because it is used to having to go to the side when it goes off the home row, so I am still adjusting the left side.  The c and v keys seem to be especially problematic (my middle finger drifts right and hits 'v' when I wanted a 'c').

I just started down the rabbit hole of split keyboards, so I'll see how I master the layers. One obvious benefit: I now actually use the media keys on the keyboard. I used to always use the mouse, because I could never remember where the media keys were and I did not feel like hunting for them on my keyboard.  Now, the keys are in a logical place (for my definition of logical).  I still have to think about where symbols are, but I do that for symbols that are not with the letter keys anyway. Navigation keys are easy, because I already use VIM and I did not like using the arrow keys in the lower right corner anyway.  Other potential subjects for obsession are the home row mods and key combinations. In addition to the two shift keys for Caps Lock and the F1 and F2 keys for F3, there are also common combinations for *(=[+{ that make it so these common keys can be typed without using the Layer key.

I have a Sofle on the way, and another Kickstarter backed split keyboard that I'm anticipating in November-December. This keyboard is going to be my lower profile, simple keyboard (no number row, no screens, no encoders, no Bluetooth/2.4 GHz dongle, and a 3D printed case instead of an acrylic sandwich).  I'm also exploring tenting. My simple way to start is a Steepy laptop riser stand, which allows for 3 levels of elevation. I'm starting at the lowest elevation; I'll try higher options later.

Purple corne keyboard Go Northwestern!

Now back to work!

Wednesday, September 03, 2025

Subject domains that lead to failure in large language models output

At the 2025 YinzOR conference I was talking with Léonard Boussioux about types of domains where large language models (LLM) have a tendency to fail, and other conversations encouraged me to write this down.

There are stories from the early days of aviation where a test pilot would come back and learn that his plane had cracks, and they were delighted because that meant that they were learning the limits of the aircraft.  In that spirit we want to look for domains where the foundation models will give poor results, so that those developing applications can look for potential failures, design applications accordingly, and train users to be attentive for errors.  For this discussion, the cause of the errors is the data used to train the foundation models.  As with other deep learning based models, to uncover categories of errors, we look at the training data.

Large language models tend to fail due to an inability to work with nuance and due to naivete.  My friend Polly Mitchell-Guthrie describes LLMs as unable to work with context, collaboration, and conscience.  I describe problems in LLMs as failures in nuance, naivete, and novice problems.  Again, this is due to how the foundation models are trained (effectively all publicly available text), so these are social problems, and may not be solvable in this realm of LLM based AI.

Novice problems are due to the characteristics of what is available on the internet.  The majority of information on the internet is aimed at beginners (computing topics are significant exceptions to this). So there is a lot of information that rises to the equivalent of an introductory sequence in college.  So an LLM has a body of knowledge.  But using a body of knowledge that is targeted at the introductory level leads to nuance, naivete, and novice errors.

Nuance issues are probably the most recognized.  Nuance comes into play in subjects where details matter and where the answer in a specific situation is not the same as the standard case.  When given a setting, an LLM (like a novice) will take the information provided in the prompt, find other sources that include the same information, and come up with an output (answer).  However, an expert would take the information and fit it into an applicable framework.  Then, the expert will recognize that there is missing information that influences the final answer and ask for that information. Similarly, when considering other references, the same framework tells the expert the extent of applicability of each reference.  An LLM only matches text in the prompt with the references, so it will not always check that the context of the reference matches the context of the user's setting.  These types of issues lead experts to reach very different conclusions than people who are new to a domain, and LLMs tend to act like novices here.  As an exercise to help people identify domains where LLMs do badly, I ask people to pick a topic that they know well, but not through textbooks or classwork, and not computer related (this tends to lead to topics that they know experientially or through true research).  Most people identify a hobby; my manager did this exercise with his master's thesis topic.  Another variation of nuance is details that frequently occur together, but are not the same.  Since the LLM works by probabilistically choosing words that occur together, it can often try to combine related topics or words that should not be combined.  A frequent example of this is in anatomy, where LLMs trained on medical texts will often conflate the names of two body parts into a body part that does not actually exist.

Naivete occurs when someone is in possession of facts but does not recognize the consequences of those facts.  For an LLM, it is easy to take a prompt and then, from references that match that prompt, identify other facts and details that are typically associated with the information in the prompt.  But unless it finds references that explicitly spell out the consequences of a particular collection of facts, the LLM will not provide the consequence.  As an example, my then 10-year-old daughter had written a story that was set in a domestic setting in the United States during the 1860s (U.S. Civil War era).  So when I ran her through the exercise of picking a topic that was not well known, she asked the Generative AI about an aspect of domestic life, specifically methods for starting fires.  Her comment was that the Generative AI gave details that, as far as she could tell, were all true.  But it did not provide an important consequence.  Given the same set of details, a modern day chemist would mentally translate the 19th century terms to their modern day counterparts and immediately recognize that they include all the ingredients needed to cause an explosion.  And in real life this is what happened: there are very few examples of this technology in museums, because they all exploded.  My daughter regarded the fact that a technology meant for use in domestic (home) life had a tendency to explode as an important detail, and the LLM not reaching that conclusion as a failure.

Another type of novice error involves exceptions and crossing domains.  Many domains teach general frameworks and rules of thumb at the introductory level.  They are intended to help practitioners succeed and avoid common pitfalls.  However, past the introductory level practitioners learn the reasons behind the frameworks and rules, either from deeper training or through experience, so experts know the exceptions to the rules and when to modify rules based on the particular circumstances at hand.  This is even more important in cases where multiple domains are involved, which is common outside controlled environments such as academic or teaching settings.  In this case, the standard rules for the multiple domains can conflict.  Experts resolve this both by establishing exceptions based on the circumstances and by looking at the ultimate goal or intent of the activity, bending or breaking the rules that interfere with the goals or the mission.  But they don't completely throw out the rules; experts keep in mind the intent of each rule and ensure that the intent is addressed.  When LLMs are given both the rules of the domains and a history of prior activity, they will often notice that rules were broken and then stop following the rules altogether, which leads to poor outputs that do not respect the issues that arise with these domains in practice.

LLMs are especially handicapped when there are intersecting domains.  When articles or other texts are written or published, the general rule is to keep anything you write or publish on a single topic, which makes it easier to identify the target audience and for the target audience to find your work.  Topics that fall within intersecting domains tend to be niche topics, and are both difficult to get published and difficult to find, and thus less likely to be included in a foundation model's training data.  Another area that is not found in published texts is failures.  In many domains, expertise is developed through experiencing failures.  However, these domains tend not to document or publish the failures that experts learn from because of the potential for repercussions or public disapproval.  And if these are not published, they will not be available for training foundation models.

The purpose of this exercise is to make Generative AI useful.  To be useful, the people who work with Generative AI models have to be able to recognize and look for these kinds of errors so that they can screen Generative AI output for other types of errors as well.  For example, my now 11-year-old daughter continues to identify errors in Generative AI output ranging from trivial to profound, and because she has this ability, I have no concerns about her use of Generative AI.  The same goes for my colleagues: once they have experienced identifying errors in AI (and this holds for machine learning models as well), they are able to identify future errors and react appropriately, rather than taking the outputs of AI as automatically true.  And this leads to more productive use of AI.

Sunday, August 03, 2025

Failures and how they impact the quality of Generative AI

I gave a talk on Generative AI as one of PyData Pittsburgh's monthly events.  While the focus of the presentation was on demonstrating the impact of randomness on Gen AI output, during the discussion we talked a lot about how we teach Gen AI a specific domain, what makes a person an expert, and whether Gen AI can learn those things.  There were a few things about being an expert that will take a lot of work to replicate when starting with foundation models, but one that stuck with me was the role of failure in learning, and how hard it will be to teach this to foundation models.

My friend Polly Mitchell-Gunthrie talks about Context, Collaboration, and Conscience when talking of the limitations of foundation models.  Context is a well known, discussed, and acknowledged issue in Gen AI that we address through variations on prompting and grounding.  Conscience covers issues of ethics but also of mission.  But collaboration is harder, because Generative AI does not have institutional memory, in particular the memory of failures.

In American culture (which is where I am), we have a pressure to be perfect, and to make no mistakes and have no failures.  But in a wide range of domains where there are high standards of performance, there is a maxim that is some variation of "if you have not failed, you did not try hard enough."  Even in these communities, though, we rarely document these failures.  This level of training is done person to person, with mentors, trainers, and leaders who provide cover to try different things and tolerate some level of failure in the pursuit of excellence.  More importantly for this discussion, this does not get published, because these communities are cognizant of how intolerant of failure the general population is.  That means the general population does not realize that the performance and excellence were developed through experiences of failure.  And the lack of documentation means that foundation models do not learn this.  (For a counterexample, look at baking websites that explain causes of failure using pictures of baking disasters.)

Instead of a record of reality, the internet is a record of successes, not failures.  This is a known problem.  It is frequently discussed in academia, where journals publish only successes without providing lessons learned from failures, leading to a lot of wasted effort as research groups go down dead ends that other groups have already explored.  For foundation models, this means they are trained on the successes and not the failures.  So everything seems easy, and the Generative AI that uses these foundation models provides answers with assurance, but the people who have to implement them run into the myriad problems that come with doing things in real life.

Could you address this through grounding?  This is a cultural issue.  You would need to have a record of failures, where those who went into the unknown areas of your domain were allowed to fail without adverse consequences.  Then you could potentially have the Gen AI realize that a path of action could lead to an unresolved problem.  And you would have to accept the Gen AI discovering those failures and actually telling you about them (things like this are part of the problem Gen AI has with understanding context).  So, similar to problems where there are multiple correct answers, this is as much a cultural problem of what we as a society see fit to write down (which becomes part of foundation models) and what we do not.