AI Learning

No question, I am a library nerd: I’ll wander through the shelves, grabbing random, seemingly interesting bound periodicals or journals to glance through. The truly interesting finds are taken to a nearby table where I may deep-dive for hours.

On a recent trip to the University of Minnesota Law Library, I looked at the cover page of one such journal – from the 1950s – and learned that it appears to have had a single checkout between its acquisition and the computerization of the library. I began to wonder how many other unloved books and journals exist in libraries, acquired in good faith to improve the library’s collection but otherwise just taking up shelf space.

Proceedings of the 1898 conference between the United States and the United Kingdom to define the border between Alaska and Canada (the United Kingdom then handled Canada’s foreign affairs). The League of Nations’ annual reports of achievements, laws passed, actions taken, and so on. (Surprisingly, the League of Nations only wound down after the United Nations was established.) Laws enacted by the Austro-Hungarian Empire. British publications on governing the Indian Empire. All useful in very specific research but often overlooked, ignored, forgotten.

AI Blind Spots

Overlooked, ignored, and forgotten publications are extremely low priority for digitizing, and what’s not digitized cannot be ingested by AI models. What AI hasn’t learned cannot be used in whatever analysis AI is asked to do.

The same applies in software engineering: AI models are trained only on code to which they have access, but companies are protective of their intellectual property and are reluctant to share for the greater good. Big Tech orgs have large code bases with which to train internal models, but smaller orgs do not have the same benefit: their AI-generated code is based on public repositories (e.g., GitHub) or their own code. Is this enough to generate solutions better than the engineers on staff? Is this enough to generate viable, future-facing solutions? Sometimes, but not always.

I spoke with a Big Tech AI expert at a tech conference who agreed with my contention that, until AI can learn without disclosing an org’s intellectual property, the benefit of AI will remain limited. Increasing the code from which AI learns can only improve the solutions being generated. However, today’s AI doesn’t discriminate between public and private code when it is trained, so companies can’t trust that their intellectual property will be protected. Will we get there? Probably, but it’s going to take some time.