By Edward Zitron

Modern AI models are trained by feeding them "publicly-available" text from the internet, scraped from billions of websites (everything from Wikipedia to Tumblr to Reddit), which the model then uses to discern patterns and, in turn, answer questions based on the probability of an answer being correct.

Theoretically, the more training data that these models receive, the more accurate their responses will be, or at least that's what the major AI companies would have you believe. Yet AI researcher Pablo Villalobos told the Journal that he believes that GPT-5 (OpenAI's next model) will require at least five times the training data of GPT-4. In layman's terms, these machines require tons of information to discern what the "right" answer to a prompt is, and "rightness" can only be derived from seeing lots of examples of what "right" looks like.

[…]

In essence, the AI boom requires more high-quality data than currently exists to progress past the point we're currently at, which is one where the outputs of generative AI are deeply unreliable. The amount of data it needs is many multiples of what currently exists, at a time when algorithms are happily promoting and encouraging AI-generated slop, and thousands of human journalists have lost their jobs, with others being forced to create generic search-engine-optimized slop. One (very) funny idea posed by the Journal's piece is that AI companies are creating their own "synthetic" data to train their models, a "computer-science version of inbreeding" that Jathan Sadowski calls Habsburg AI.

via Zinnia Jones
by Edward Zitron 

While this is speculation, the timing of the March 2019 core update, along with the traffic increases to previously-suppressed sites, heavily suggests that Google's response to the Code Yellow was to roll back changes that had been made to maintain the quality of search results.

A few months later, in May 2019, Google would roll out a redesign of how ads are shown in Google's mobile search, replacing the bright green "ad" label and URL color on ads with a tiny bolded black note that said "ad," with the link looking otherwise identical to a regular search link. I suppose that's one way Google started hitting its numbers following the Code Yellow.

In January 2020, Google would bring this change to the desktop, which The Verge’s Jon Porter would suggest made “Google’s ads look just like search results now.”

Five months later, a little over a year after the Code Yellow debacle, Google would make Prabhakar Raghavan the head of Google Search, with Jerry Dischler taking his place as head of ads. After nearly 20 years of building Google Search, Gomes would be relegated to SVP of Education at Google. Gomes, who was a critical part of the original team that made Google Search work, and who has been credited with establishing the culture of the world's largest and most important search engine, was chased out by growth-hungry managerial types led by Prabhakar Raghavan, a management consultant wearing an engineer costume.