World Wide Web

Big Tech’s Anti-Labor Playbook Has Come for Wikipedia

in Medium  

Do we have anything left now?

In mid-May, the Wikimedia Foundation fired Brooke Vibber.

If that name doesn’t mean anything to you, here is what it should mean. Vibber took over as lead developer of MediaWiki, the platform that runs Wikipedia, in early 2003. She was the first full-time employee the Wikimedia Foundation ever hired, and its first Chief Technical Officer. For more than twenty years she was the engineer you called when something deep in the code was broken. The Foundation itself once described her as one of a very small number of people in the world who deeply understand the technical underpinnings of the system. She was also a union organizer.

A week later, on May 21, the Foundation announced it had disbanded the Community Tech team. Five engineers and a manager: gone. Their job had been to take the wishes Wikipedia editors submitted through an official channel called the Community Wishlist, and build them. It was the one team at WMF whose product owner was, in effect, the volunteer community. Most of the engineers were also union organizers.

[…]

Bernadette Meehan became CEO on January 20, 2026, recruited from a career that included Wall Street stints at J.P. Morgan and Lehman Brothers, a spokesperson role at the National Security Council, senior leadership at the Obama Foundation, and most recently a posting as U.S. Ambassador to Chile. Four months in, the longtime lead developer of MediaWiki is fired, the team that personifies community service is dissolved, and the union is in open confrontation.

This is the standard tech playbook. Fire the engineers who know how the system works, fire the ones organizing labor, hope nothing catastrophic breaks before you can ship something splashy. Twitter did it. Meta did it. Salesforce did it. Google did it. We have all seen this movie.

Brooke is a first-gen Fediversian, and an absolute legend. This is a disgrace.

Google Search as you know it is over

in TechCrunch  

The era of the “ten blue links” is officially over.

At its Google I/O conference on Tuesday, Google unveiled an AI-powered overhaul of Search centered around a reimagined “intelligent search box” — what the company describes as the biggest change to this entry point to the web since the search box debuted more than 25 years ago.

Instead of returning a simple list of links, Google Search will drop users into AI-powered interactive experiences at times. Google is also introducing tools that can dispatch “information agents” to gather information on a user’s behalf, along with tools that let users build personalized mini apps tailored to their needs.

The resulting experience will no longer look much like how people envision Google Search, which has long been defined by ranked links to websites that have the information you need.

[…]

Combined, these changes will likely further decimate Google referrals to publishers, which have already been suffering from declining referrals due to AI Overviews. This has put some ad-dependent media operations out of business, and now things will likely get worse.

There’s little time left for publishers to adapt. The new search box is arriving this week, and generative UI is arriving this summer. Both are free. The mini-app-building feature and information agents will roll out first to Google AI Pro and Ultra subscribers this summer.

The Death of Community Memory

by Joan Westenberg 

Every time I search for a fix to a Drupal 10/11 problem, I get page after page of results for Drupal 6/7. By the time Drupal 8 was released, discussion had moved from groups and issue queues to Slack.

Communities are having the same debates over and over. New members ask questions that were definitely answered six months ago. Teams rediscover solutions to problems they already solved. Users search for solutions to problems that seem to repeat. And repeat. And repeat.

We used to have forums. And forums had one massive advantage: you could find things.

Threads had descriptive titles. There were categories. Search actually worked because the content was structured for retrieval. If someone asked a question that had been answered before, you could link them to the previous discussion instead of retyping everything.

Then Slack happened, and Discord, and Teams, and we all decided that real-time chat was simply better: More modern // more collaborative. More like how humans “naturally communicate” (as if there’s anything natural about the internet itself.)

[…]

Companies pay for Slack per user per month. The cost of storage is real but abstracted. Meanwhile, the cost of fragmenting and decaying knowledge is completely invisible until it’s too late. How do you measure the time wasted rehashing old decisions? How do you quantify the mistakes that could have been avoided if someone had been able to find that old discussion?

These costs are real and large, but they don’t show up in any budget line.

Denial

by Jeremy Keith 

The Wikimedia Foundation, stewards of the finest projects on the web, have written about the hammering their servers are taking from the scraping bots that feed large language models.

[…]

When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.

The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.

If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.

If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.

FOSS infrastructure is under attack by AI companies

in LibreNews  

Three days ago, Drew DeVault - founder and CEO of SourceHut - published a blogpost called "Please stop externalizing your costs directly into my face", where he complained that LLM companies were crawling data without respecting robots.txt and causing severe outages to SourceHut.

[…]

Then, yesterday morning, KDE GitLab infrastructure was overwhelmed by another AI crawler, with IPs from an Alibaba range; this caused GitLab to be temporarily inaccessible to KDE developers.

[…]

By now, it should be pretty clear that this is no coincidence. AI scrapers are getting more and more aggressive, and - since FOSS software relies on public collaboration, whereas private companies don't have that requirement - this is putting some extra burden on Open Source communities.

Configuring Firefox

Really good tips here, including a couple I'd not heard about and promptly followed:

This is the bare minimum necessary to configure Firefox so that it behaves in a reasonable manner.
This document was last updated on 27 January 2025 and was tested with a clean install of Firefox 134.
Verify these steps each time Firefox is updated.

  1. Go to uBlock Origin and click Add to Firefox
       This will filter out most of the advertisements on websites, saving you a shitload of network traffic (and if your computer is slow, not having to show all that crap is a big speedup). Once you get it set up you can just ignore it, but if you care it will tell you how much stuff it's blocked on your behalf.
  2. Go to LocalCDN and click Add to Firefox
       Most websites load the same files over and over from the same places -- primarily Google servers. This thing puts all that right in your browser, making for less network traffic and denying Google the privilege of inspecting your usage patterns. Once it's installed you can ignore it.

[…]

Mozilla's Original Sin

by Jamie Zawinski

Some will tell you that Mozilla's worst decision was to accept funding from Google, and that may have been the first domino, but I hold that implementing DRM is what doomed them, as it led to their culture of capitulation. It demonstrated that their decisions were the decisions of a company shipping products, not those of a non-profit devoted to preserving the open web.

Those are different things and are very much in conflict. They picked one. They picked the wrong one.

[…]

In my humble but correct opinion, Mozilla should be doing two things and two things only:

  1. Building THE reference implementation web browser, and
  2. Being a jugular-snapping attack dog on standards committees.
  3. There is no 3.

Vision for W3C

for World Wide Web Consortium (W3C)  

A pithy little declaration.

This document articulates W3C’s mission, its values, its organizational principles, and our vision for W3C as an organization in the context of our vision for the Web itself. The goal of this vision is not to predict the future, but to define shared principles to guide our decisions.

The goals of this document are to:

  • Help the world understand what W3C is, what it does, and why it matters
  • Communicate shared values and principles of the W3C community
  • Be opinionated enough to provide a framework for making decisions, particularly on controversial issues
  • Be timeless enough to guide W3C yet flexible enough to evolve when needed

Paramount Is Taking Down Decades Worth of Old TV Clips from the Web

in IndieWire  

A rep for Paramount told IndieWire: “As part of broader website changes across Paramount, we have introduced more streamlined versions of our sites, driving fans to Paramount+ to watch their favorite shows.”

For now, though, many of these series are not available on Paramount+, such as “The Colbert Report” or “The Nightly Show.” Even “The Daily Show” has only its two most recent seasons, covering 2023 and 2024, available, despite decades of the show’s history. “South Park” clips used to be hosted on Comedy Central’s website, but the only place to watch full episodes of the show is Max, not Paramount+.

The likely reason for this? Cost cutting. In a town hall this week, Paramount’s “Office of the CEO” including co-chiefs George Cheeks, Chris McCarthy, and Brian Robbins, expressed plans to save $500 million in order to stave off profit drops and one day make Paramount+ profitable.

via Dan Gillmor

Is Google Getting Worse? A Longitudinal Investigation of SEO Spam in Search Engines

Many users of web search engines have been complaining in recent years about the supposedly decreasing quality of search results. This is often attributed to an increasing amount of search-engine-optimized but low-quality content. Evidence for this has always been anecdotal, yet it’s not unreasonable to think that popular online marketing strategies such as affiliate marketing incentivize the mass production of such content to maximize clicks. Since neither this complaint nor affiliate marketing as such have received much attention from the IR community, we hereby lay the groundwork by conducting an in-depth exploratory study of how affiliate content affects today’s search engines. We monitored Google, Bing and DuckDuckGo for a year on 7,392 product review queries. Our findings suggest that all search engines have significant problems with highly optimized (affiliate) content—more than is representative for the entire web according to a baseline retrieval system on the ClueWeb22. Focussing on the product review genre, we find that only a small portion of product reviews on the web uses affiliate marketing, but the majority of all search results do. Of all affiliate networks, Amazon Associates is by far the most popular. We further observe an inverse relationship between affiliate marketing use and content complexity, and that all search engines fall victim to large-scale affiliate link spam campaigns. However, we also notice that the line between benign content and spam in the form of content and link farms becomes increasingly blurry—a situation that will surely worsen in the wake of generative AI. We conclude that dynamic adversarial spam in the form of low-quality, mass-produced commercial content deserves more attention.