The Unluckiest Paragraphs

A Tale of CSS and Why Parts of Medium Sometimes Disappear

Nick Santos
Published in Medium Engineering
7 min read · Dec 4, 2015


Yes, this is the kind of story that starts by quoting the W3C’s CSS 2.1 Specification on how to apply style rules to web pages.

6.4 The cascade

Style sheets may have three different origins: author, user, and user agent [the web browser]. […]

Style sheets from these three origins will overlap in scope, and they interact according to the cascade.

The CSS cascade assigns a weight to each style rule. When several rules apply, the one with the greatest weight takes precedence.

By default, rules in author style sheets have more weight than rules in user style sheets. Precedence is reversed, however, for “!important” rules. All user and author rules have more weight than rules in the UA’s default style sheet.

This text is a small-picture look at one of the grand underpinning ideas of the web. Every web page is a democracy of three parties: the author, the user, and the web browser. Each gets to contribute to the overall style of the page. And a set of laws decides how those contributions collaborate and merge, to get us to the web page you see on your screen.

The precedence rules always remind me of Isaac Asimov’s “I, Robot,” where he lays out simple rules for how robots should behave, then tells stories of edge cases where the rules collide to cause problems. But Asimov’s an optimist. He believes that simple precedence rules can lead us to enlightenment, even though the road may be bumpy. It’s a comforting read for anyone that writes technical specs.

This post is not about enlightenment. This post is about the problems.

In early January, a user complained that they were not able to load the post “Advertising Is Not For Geniuses.” On the author’s profile page, they would click on the post link, and get an error.

“Something went wrong.” It’s hard to write a good error message when we have no idea what the problem is.

Our engineers couldn’t reproduce the bug. One user support guy could. He made a screencast. This became our shining example of why screencasts are useful: one inconspicuous icon in the corner gave the bug away.

That stop sign is an ad blocker.

On most webpages, if you click a link, the browser automatically handles loading a new page. On Medium, we speed this up a bit with JavaScript. We send a request to “https://medium.com/@ritasustelo/advertising-is-not-for-geniuses-5d1ffbc505ac?format=json,” download the article text, and render it in your browser.
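As a rough sketch of that request, the JSON address can be derived from the post's normal address by adding a `format=json` query parameter. The helper name `jsonUrlFor` is made up for illustration; this is not Medium's actual client code.

```javascript
// Hypothetical helper (not Medium's real code): derive the JSON endpoint
// for a post by setting the format=json query parameter on its URL.
function jsonUrlFor(postUrl) {
  const url = new URL(postUrl);
  url.searchParams.set("format", "json");
  return url.toString();
}

const jsonUrl = jsonUrlFor(
  "https://medium.com/@ritasustelo/advertising-is-not-for-geniuses-5d1ffbc505ac"
);
console.log(jsonUrl);
```

The important detail is that the word "advertising" from the post's title ends up in the address the script requests.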

The ad blocker believed we were requesting advertising to show, and blocked the request.

The media has been paying more attention to ad blockers in the past few months. In August, Adobe and PageFair released a report showing a steady rise in users installing ad blockers. In September, Apple launched iOS 9, which allowed ad blockers on iPhones.

If you think of the separation of powers between author, user, and web browser as a weird sort of government, ad blockers are The Freedom Caucus. Also known as The “Hell No” Caucus. The ad blockers are the hell-no backlash against a web plastered with too much advertising.

If this is new to you, go read “Welcome to the Block Party,” which is a great summary of what’s going on.

Most of the media coverage focuses on whether ad blocking is a good idea. That’s not the part I’m interested in here, because — to be obnoxiously pedantic — ad blockers do not “block ads.” They create a set of rules to try to classify ads, and implement a set of measures to block what they classified.

There’s a big gap between what is classified by those rules, and what’s an ad.

Some of that gap is philosophical. How do you define “advertising,” maaaaannnnnn 🌿? Some is technical. Often the classification rules are laughably simple, like checking if the address has the word “advertising” in it. Sometimes they’re more complex.
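To make the "laughably simple" case concrete, here is a minimal sketch of a keyword rule. The keyword list and the function name `looksLikeAd` are assumptions for illustration, not the rule any real blocker ships.

```javascript
// Hypothetical keyword rule: flag any address that contains an
// ad-related word, regardless of what the request is actually for.
const BLOCKED_KEYWORDS = ["advertising"];

function looksLikeAd(address) {
  const lower = address.toLowerCase();
  return BLOCKED_KEYWORDS.some((keyword) => lower.includes(keyword));
}

const articleRequest =
  "https://medium.com/@ritasustelo/advertising-is-not-for-geniuses-5d1ffbc505ac?format=json";

console.log(looksLikeAd(articleRequest));        // true: an article, but it matches
console.log(looksLikeAd("https://medium.com/")); // false
```

A rule like this cannot tell a request for an ad from a request for an article about ads.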

One of Medium’s main theses is that “page views” is a terrible metric. It incentivizes some of the worst parts of the web, like short articles split across multiple pages to increase ad impressions. Our data science team experiments with better metrics, like how much time people spend reading.

To track these metrics, we send requests to the `/_/stat` route.

Many ad blockers also double as privacy protection. Someone added `/_/stat` to the EasyPrivacy block list.

We used `/_/stat` for other types of statistics, including:

  • Are you experiencing errors?
  • How slow is the page?
  • Have you seen the onboarding dialogs yet?

Counterintuitively, users with this ad blocker installed saw lots of redundant onboarding popups. When they emailed us for help, we found it difficult to diagnose their problems.

It gets better, though! Not all ad blockers implemented the EasyPrivacy list consistently. Some matched only `/_/stat`. Others matched anything beginning with `/_/stat`, including `/_/static/icons.svg`. Many users saw their icons vanish.
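The difference between the two interpretations fits in a few lines. This is only a sketch: the function names are invented, and real blockers use their own filter syntax rather than plain string comparison.

```javascript
// The same list entry, interpreted two ways (illustrative only).
const RULE = "/_/stat";

// Interpretation 1: block only the exact path.
const matchesExactly = (path) => path === RULE;

// Interpretation 2: block anything that starts with the rule text.
const matchesPrefix = (path) => path.startsWith(RULE);

for (const path of ["/_/stat", "/_/static/icons.svg"]) {
  console.log(path, "exact:", matchesExactly(path), "prefix:", matchesPrefix(path));
}
```

Under the prefix interpretation the icon file is blocked along with the stats route.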

We wrote a post to assure people that, no, our icons are not clandestine payloads for advertising. After some conversations with the block list maintainers, and the EFF, we simply changed the URL.

By this point, we’re used to random things on Medium disappearing because ad blockers misclassify them. But the best one came in a few weeks ago, when someone complained that a random paragraph in the post “6 step East European weight loss system” was missing.

In the browser on the left, item (3) is missing a paragraph.

To understand what happened, start with the Medium data model.

All posts are represented as a list of paragraphs. We give each paragraph a unique name. The code to generate names is a one-liner:

Math.round(Math.random() * 0xFFFF).toString(16)

The 0xFFFF is a hexadecimal number that translates to 65,535.

This is a programmer in-joke. In “normal” math, numbers are written in base ten. You have ten digits, 0–9, and when you get to the tenth thing, you add a new position with a ‘1’ digit. Programmers sometimes use hexadecimal (base 16) numbers, which have 16 digits: 0–9, a, b, c, d, e, and f. The main advantage of this is that you can spell cool, instantly recognizable numbers like “deadbebad” or “eatbeef.”

“Math.round(Math.random() * 0xFFFF)” means “pick a random number between 0 and 65,535.” Our longest posts are on the order of 1,000-ish paragraphs, so this seems reasonable. “.toString(16)” means “format that number in hexadecimal.”

Coincidentally, “ad” is a hexadecimal number.

Can you guess what happened next?

One of the ad blockers rolled out a rule that blocks anything with the ID “ad01” or “ad02.” Two out of every 65,536 paragraphs in Medium posts disappeared.
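The collision can be checked with a short sketch. The wrapper name `makeParagraphName` and the `BLOCKED_IDS` list are labels chosen here for illustration; only the one-liner inside the function comes from the post.

```javascript
// The one-liner from above, wrapped in a function (the name is hypothetical).
function makeParagraphName() {
  return Math.round(Math.random() * 0xFFFF).toString(16);
}

// The two IDs the blocker rule targeted.
const BLOCKED_IDS = ["ad01", "ad02"];

// Read as hexadecimal, both are ordinary numbers inside the generator's range.
const blockedValues = BLOCKED_IDS.map((id) => parseInt(id, 16));
console.log(blockedValues); // [ 44289, 44290 ]

// 0 through 0xFFFF gives 65,536 possible names, so roughly 2 in 65,536 collide.
const possibleNames = 0xFFFF + 1;
const collisionOdds = BLOCKED_IDS.length / possibleNames;
console.log(possibleNames, collisionOdds);
```

The figure is approximate: `Math.round` makes the two endpoint values (0 and 65,535) half as likely as the rest, which nudges the odds for every other name up very slightly.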

The fix we added was simple, but unsettling. Way back at the top of this post, we talked about the rules of CSS precedence. They are not absolutely calibrated towards one party. They follow a set of weighting rules. There’s an old (slightly inaccurate) joke about it.

Q: Who’s the most important person in CSS?

A: The one with the most class.

We can escalate by adding an extra “.post-article” class to our selector.

.post-article #ad01, .post-article #ad02 {
  /* No, really, display the paragraph,
     even if somebody else says not to. */
  display: block !important;
}

Some blocker could still roll out a yet more heavily-weighted rule to override our rule. We’re hoping they don’t.
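To see why the extra class matters, here is a simplified specificity calculator. It is a sketch that only understands `#id`, `.class`, and element names separated by spaces, and it assumes the blocker's rule used the bare selector `#ad01`, which the post does not confirm. Specificity also only breaks ties between declarations of the same origin and importance.

```javascript
// Simplified CSS specificity: counts [ids, classes, element names].
// Ignores attribute selectors, pseudo-classes, and combinators other than spaces.
function specificity(selector) {
  const ids = (selector.match(/#[\w-]+/g) || []).length;
  const classes = (selector.match(/\.[\w-]+/g) || []).length;
  const types = (selector.match(/(^|\s)[a-zA-Z][\w-]*/g) || []).length;
  return [ids, classes, types];
}

// Compare left to right; a positive result means `a` outweighs `b`.
function compareSpecificity(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

const blockerRule = specificity("#ad01");           // assumed blocker selector
const ourRule = specificity(".post-article #ad01"); // selector from the fix

console.log(blockerRule, ourRule);
console.log(compareSpecificity(ourRule, blockerRule) > 0); // true
```

A blocker could win the tie back by adding a class or a second ID of its own, which is exactly the escalation we are hoping to avoid.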

I feel frustrated by this dynamic.

On the one hand, we have a three-party system that decides how web pages display: the author, the user, and the browser. Each party gets to add their own rules. But the author — site owners — get most of the blame when things go wrong. Power is distributed; accountability is not.

On the other hand, I’m sympathetic to what the ad blockers are trying to accomplish. We don’t like the glut of ads and tracking on the web either. The tools that ad blockers have to fight back are blunt, and imperfect.

Ad blockers will create collateral damage, and page authors are responsible for dealing with it. But here’s the catch: this dynamic is good for Medium. We have a user support team responding to these bug reports, and an engineering team fixing them. We can handle this damage.

Random blogger Pat running their personal blog likely does not have the time or energy or expertise to handle this.

A web ecosystem with ad blockers is more complicated for authors to run and maintain. A more complicated web favors big, centralized players and disfavors small, independent ones. But maybe a web of small independent players is impossible to save.

Thanks to the many people who did the hard work of reporting, diagnosing, and fixing the bugs described in this post, most notably Koop, Greg, and one engineer who asked to remain anonymous (and who also suggested the title). Let us know about your favorite bugs on Medium by writing in to yourfriends@medium.com.

Many of the ideas about CSS are from F A T’s talk on the history of CSS and the cascade, which you can watch online.
