It Takes a Village To Read an RSS Feed

Published in Medium Engineering · 6 min read · Sep 29, 2014

This Sunday morning, I was sitting at home, sipping my coffee, and browsing the feeds of my favorite Medium collections with Feedly. My friend Dan shared a post “@medium your rss feeds are @ mess.” In the post, Alan Levine explains that Medium RSS feeds are broken in all sorts of ways. This surprised me.

I snapped into action! I helped build Medium. Alan took the time to put together a thorough list of bugs to fix, and by God, fixing bugs is what engineers enjoy best. Then I started working through the list, and what I discovered was…not what I expected.

This is not just a story of a list of bugs. It is a story about how the World Wide Web works in theory, how it actually works in practice, and why it takes a village to make an RSS feed work.

URLs

Alan points out that Medium uses the “@” symbol in URL paths.

Everything I publish on medium is available at https://medium.com/@cogdog I am suspicious of having “@” symbols in the URL, but technically it is valid.

Later on, he suspects that this might be causing problems for some feedreaders that don’t handle the “@” symbol properly.

I think I know exactly where this confusion started. RFC 2396 (the URI syntax spec, since superseded by RFC 3986) has this to say about “@”:

 Many URI include components consisting of or delimited by, certain
 special characters. These characters are called "reserved" [...]

 reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
 "$" | ","

But that’s in section 2.2. Further down, in section 3.3, we find:

The path component contains [...]

 path = [ abs_path | opaque_part ]

 path_segments = segment *( "/" segment )
 segment = *pchar *( ";" param )
 param = *pchar

 pchar = unreserved | escaped |
 ":" | "@" | "&" | "=" | "+" | "$" | ","

What this basically says is that “@” is OK in some parts of a URL but not in others. The URL “https://medium.com/@nicksantos” is valid, but “https://medium@/” is not.

This is an obnoxious distinction if you’re writing a library for escaping strings for URLs. No one cares enough to think about whether they’re escaping for the path, or the domain, or whatever. Just make it safe, dammit. JavaScript’s encodeURIComponent("@") returns "%40" and calls it a day.

Thus, a lot of users on the web became skeptical of “@” in URL paths, and suspicious when they see it.

Medium uses Node.js’s url module and Closure Library’s goog.Uri class, which both handle URLs holistically and use a narrower escape function for each part.
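To make the distinction concrete, here is a minimal Python sketch of a blanket escape next to a path-aware one. It uses the standard library’s urllib.parse as a stand-in for the Node.js and Closure code mentioned above, not a copy of it. The helper names escape_component and escape_path_segment are made up for illustration, and the safe-character set is taken from the pchar rule quoted earlier.

```python
from urllib.parse import quote, urlsplit

# Extra characters that the "pchar" rule quoted above allows unescaped in a
# path segment, in addition to the characters quote() never escapes.
PCHAR_EXTRAS = ":@&=+$,"


def escape_component(text: str) -> str:
    """Blanket escape: treat every reserved character as unsafe."""
    return quote(text, safe="")


def escape_path_segment(text: str) -> str:
    """Path-aware escape: keep characters that are legal in a path segment."""
    return quote(text, safe=PCHAR_EXTRAS)


print(escape_component("@nicksantos"))     # %40nicksantos
print(escape_path_segment("@nicksantos"))  # @nicksantos

# A parser that works component by component sees "@" in the path as
# ordinary data, not as the userinfo delimiter.
parts = urlsplit("https://medium.com/@nicksantos")
print(parts.hostname, parts.path)          # medium.com /@nicksantos
```

Both outputs are valid escapes of the same string. The blanket version is simply more conservative than the path rules require, which is probably how “@” got its reputation.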

RSS Subscribe

Alan wishes that Medium had an RSS “subscribe” button.

Or at least in the old days, sites would publish little RSS icons with “Syndicate Here” links. It was a one click operation to add a site to your Google Reader.

But that ignores so many interesting things that have happened between 1999 and 2014! First, every blog had an RSS icon button. Then, we realized that this was dumb, and browsers should show the icon by default. Then browser vendors measured how often the icons were used, found that almost no one clicked them, and made them optional. Here are the relevant issues for Firefox and Chrome.

You can still get the RSS button in Firefox by right-clicking on the address bar and turning it on, or in Chrome by installing an extension. (I don’t know what’s available on Safari or IE or Opera.)

Giving every site its own RSS button has been tried, and it failed as a way to build a user base. I’m a big believer in RSS, but we need to try new things.

Human-Readable HTML

Alan also wishes that Medium’s HTML was more human-readable.

in the effort to minify (minimize the file size), medium decides to remove the carriage return characters it its own source HTML.
It says, “no one looks at the HTML source so we decided to make it unreadable to humans”.

Yes, Medium generates its HTML with Closure Templates, which strip comments and redundant whitespace.

There is a wide range of tools that can auto-format the HTML for you, color-code keywords, and even translate the text. There is no good reason for every site to pretty-print its HTML.

Figure: Medium’s HTML looking pretty after going through xmlstarlet’s formatter

Relative URLs

Alan wrote a tool for auto-discovering RSS feeds, and it doesn’t work with Medium’s HTML.

medium.com does not even report a full URL for my feed — it returns simple /feed/@cogdog — a relative URL a browser might understand, but not another site expecting to autodiscover an RSS feed URL.

The HTML specs are pretty clear on this: a <link>’s href is a URI, and any URI in HTML may be absolute or relative.

The bigger misunderstanding here is who these documents are for. I’ve met many web users who believe that because specs are big and complicated, only big complicated browsers need to follow them.

To me, this is one of the miracles of the modern web. It’s populated by a wide range of engineering levels: browsers written by thousands of people, and one-off tools written in an afternoon, all governed by a single set of messy technical specs that people routinely believe don’t apply to them.

We may compromise and decide to use absolute URIs here anyway, because the web is built on compromise.
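In the meantime, a feed-discovery tool can resolve a relative href against the URL of the page it came from. The following is a minimal Python sketch using only the standard library. The class and function names are made up for illustration, and the sample markup is a simplified stand-in rather than Medium’s actual HTML. It also ignores any <base> element, which a complete implementation would need to honor.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkFinder(HTMLParser):
    """Collects RSS autodiscovery links, resolved to absolute URLs."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "alternate" in rel and attr.get("type") == "application/rss+xml" and href:
            # urljoin leaves absolute hrefs alone and resolves relative ones
            # against the page URL, the same way a browser would.
            self.feeds.append(urljoin(self.page_url, href))


def discover_feeds(page_url: str, html: str) -> list:
    finder = FeedLinkFinder(page_url)
    finder.feed(html)
    return finder.feeds


sample = '<html><head><link rel="alternate" type="application/rss+xml" href="/feed/@cogdog"></head></html>'
print(discover_feeds("https://medium.com/@cogdog", sample))
# ['https://medium.com/feed/@cogdog']
```

Because urljoin handles both cases, the same code works whether a site publishes relative or absolute feed links.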

Channel Author

Alan points out that the Medium RSS feed doesn’t validate, because it contains a channel author.

This is a good call. We’ll remove this in the next release.
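For anyone who wants to check a feed for this particular problem, here is a small Python sketch. It assumes the RSS 2.0 layout, where author is defined as a child of item rather than channel. The function name and the sample feed are made up for illustration, and this is a spot check, not a substitute for a full feed validator.

```python
import xml.etree.ElementTree as ET


def channel_has_author(feed_xml: str) -> bool:
    """Return True if the RSS <channel> has a direct <author> child."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    return channel is not None and channel.find("author") is not None


# Hypothetical feed for illustration only.
bad_feed = """
<rss version="2.0">
  <channel>
    <title>Example</title>
    <link>https://example.com/</link>
    <description>Example feed</description>
    <author>someone@example.com</author>
    <item><title>Post</title><author>someone@example.com</author></item>
  </channel>
</rss>
"""

print(channel_has_author(bad_feed))  # True
```

An author element inside an item is fine; only the one directly under channel is flagged.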

Feedburner

Alan points out that Feedburner can’t scrape feeds from Medium.com.

It might take a knowledgable developer 20 minutes to diagnose and fix this problem on medium.com. It’s a template fix (I guess).

But when we look at the error, we find something more mysterious.

http://feedburner.google.com/fb/a/myfeeds

A 400 error code? I looked this up on HTTP Status Dogs. A 400 (Bad Request) means that the request failed before Medium even gave a response. A template fix definitely won’t fix the problem.

So what’s going on? Medium encrypts all traffic with HTTPS. Feedburner does not support encrypted traffic.

We’re big fans of HTTPS everywhere to protect user privacy. We even use the Strict-Transport-Security header to tell browsers that they shouldn’t even bother trying to connect to Medium over plain HTTP, because we’ll redirect you to an encrypted version.

Google has also said that they’re fans of HTTPS by default, so we’re surprised that Feedburner doesn’t support it. If people believe that this is a widespread issue, it might make sense to serve Medium feeds over a separate, unencrypted domain.
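The redirect-plus-header behavior described above can be sketched as a small piece of WSGI middleware in Python. This is an illustration of the general technique, not Medium’s server code. The function names, the example.com host, and the max-age value are all assumptions chosen for the example.

```python
# One year, in seconds. Illustrative value, not Medium's actual setting.
HSTS_VALUE = "max-age=31536000"


def https_only(app):
    """Wrap a WSGI app: redirect plain HTTP, add HSTS to HTTPS responses."""

    def wrapped(environ, start_response):
        if environ.get("wsgi.url_scheme") != "https":
            host = environ.get("HTTP_HOST", "example.com")
            path = environ.get("PATH_INFO", "/")
            start_response("301 Moved Permanently",
                           [("Location", f"https://{host}{path}")])
            return [b""]

        def add_hsts(status, headers, exc_info=None):
            # Browsers only honor this header on HTTPS responses.
            headers = list(headers) + [("Strict-Transport-Security", HSTS_VALUE)]
            return start_response(status, headers, exc_info)

        return app(environ, add_hsts)

    return wrapped


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def call(app, scheme):
    """Invoke a WSGI app directly and capture status, headers and body."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"wsgi.url_scheme": scheme, "HTTP_HOST": "example.com", "PATH_INFO": "/feed"}
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


print(call(https_only(hello_app), "http")[0])   # 301 Moved Permanently
print(call(https_only(hello_app), "https")[0])  # 200 OK
```

A client that cannot speak HTTPS at all, as Feedburner apparently could not, never gets past the redirect, which is why no change to the feed template would help.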

WordPress Plugins

I saved the juiciest bug for last. Alan observes that if you try to grab Medium feeds with the FeedWordPress plugin, the server returns an empty response.

It might be that medium.com blocks such requests. It might be a problem with “@” symbols in URLs for the curl function (here I am wildly guessing).

Yes, Medium does block the request.

Several months ago, Medium started getting staggering amounts of traffic from machines identifying themselves as WordPress. We had no idea what was happening. But we have many friends who contribute to WordPress and work at Automattic, so we reached out to them.

WordPress has a feature called pingback. Anybody on the web can tell a WordPress machine, “hey, WordPress, make a bunch of requests to site X.” Because there are so many WordPress sites, it does not take much technical skill for a kid to turn those WordPress sites into a botnet, and send crushing amounts of traffic at a site.

WordPress botnets were making sustained attacks on Medium. The recommended solution from the WordPress devs was to block the traffic.

https://twitter.com/stephdau/status/482387243748061184

https://twitter.com/nacin/status/482388589557202944

We are deeply unhappy with this solution. We have been looking for ways to make a more targeted block, to stop bad traffic while allowing good traffic. This is a difficult problem.
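To show why the blanket block catches legitimate feed fetchers, and what a narrower rule might look like, here is a Python sketch. It rests on a loud assumption: that pingback verification requests carry a marker such as “verifying pingback” in their User-Agent header. The article does not confirm this, the sample User-Agent strings are invented, and an attacker can spoof any header, so treat this as a thought experiment rather than Medium’s actual filter.

```python
def is_wordpress_agent(user_agent: str) -> bool:
    """Blanket rule: anything identifying itself as WordPress."""
    return "wordpress" in user_agent.lower()


def is_pingback_verification(user_agent: str) -> bool:
    """Narrower rule. ASSUMPTION: pingback checks label themselves this way."""
    ua = user_agent.lower()
    return "wordpress" in ua and "verifying pingback" in ua


def should_block(user_agent: str, targeted: bool) -> bool:
    if targeted:
        return is_pingback_verification(user_agent)
    return is_wordpress_agent(user_agent)


# Invented User-Agent strings, for illustration only.
FEED_FETCH_UA = "WordPress/3.9.2; http://blog.example.com"
PINGBACK_UA = "WordPress/3.9.2; http://blog.example.com; verifying pingback from 203.0.113.7"

print(should_block(FEED_FETCH_UA, targeted=False))  # True
print(should_block(FEED_FETCH_UA, targeted=True))   # False
print(should_block(PINGBACK_UA, targeted=True))     # True
```

The blanket rule blocks the feed plugin along with the attack traffic. The targeted rule only helps if the marker is present and honest.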

I think I covered all of Alan’s complaints. Leave notes in the comments if I missed any.

To me, RSS embodies the great idyllic vision of the early web.

Yes, it’s governed by a mess of specs that are frequently misunderstood or ignored. It’s stitched together with tools that were written in an afternoon, or written years ago and are no longer maintained. We routinely have to make compromises to make it all work, or just to keep our sites running in the face of attack.

It is a chaotic mess, but it is a chaotic mess that we all built together. ❤


Written by Nick Santos
