AI search struggles – Phil Baker

When a new product is released, it’s usually tested to ensure it works before it comes to market, but that convention is not being followed with the AI software from OpenAI, Google and others. Instead, companies that should know better are trying to one-up each other by introducing and overhyping AI software that’s untested, noticeably buggy, and often just plain wrong.

Google, which built its reputation on (once) providing the best search engine, has even placed results from its very buggy AI product ahead of its regular search results. You would think it would know better, but it is responding not to what’s best for its users but to what will keep its stock price elevated and fend off Microsoft.

Suddenly, Google finds itself competing with Microsoft, whose investment in OpenAI threatens to upend the search market and has become an existential threat to the company. Competition is usually good for consumers, but exaggerated claims and untested products are creating an unreliable experience for all of us.

AI has some real value right now, but it’s far from predictable and often returns odd and incorrect results. Today’s AI is based on large language models (LLMs), which learn statistical relationships from millions of text documents during a computationally intensive training process. When answering a question, an LLM draws on the word patterns it learned from that huge body of text to produce the most likely match to the query. But it has no intelligence and can’t distinguish between good documents and bad.

The problem is that its data comes from a range of sources that are not always accurate. As a result, it often takes a question out of context, uses incorrect data, and can create completely false answers, called hallucinations.

In one widely reported example, Google’s AI product, called AI Overviews, recommended using Elmer’s glue to keep cheese from sliding off pizza and suggested that eating rocks is a healthy source of minerals.

Google CEO Sundar Pichai defended the errors, saying the company had tested the software with over a billion queries over the past year. Google has also blamed users, saying, “Many of the examples we’ve seen have been uncommon queries and we’ve also seen examples that were doctored or that we couldn’t reproduce.”

Gary Marcus, an AI expert and NYU professor whom I follow, says AI technology is now about 80% correct, but the final 20% is extremely challenging and will be the hardest thing of all to accomplish.

The best way to use AI today is not to assume its answers are correct when asking it for specific facts. It’s best used for queries whose answers you can assess with common sense. For example, I’ve successfully used ChatGPT to ask how to create a PDF book with pages that flip, what the inside dimensions of typical side-by-side refrigerators are, what the options are for traveling from San Diego to Bozeman, MT, and which sites are best for auto reviews. I’ve also used it to create a job description for an engineering product manager. Most of these answers can be found on the web with a search or two, but the AI software is often much faster and presents the results in a convenient format without requiring additional clicks.

Gary Marcus describes the glue-and-pizza error as partial regurgitation. Compare the automatically generated bullshit on the left (in this case produced by Google’s AI) with what appears to be the original source on the right.

In this case, the AI found a reference that was not meant seriously. Other nonsensical results have been traced to its use of The Onion, a satirical news site, as a source of data.

As far as search is concerned, we are in for a rough time until these companies can figure out what to do. And we are the guinea pigs until they do.