The case of the eternal blur

a bug hunt horror novella

Alex Kimi Wolfe
Medium Engineering

--

This story documents a bug fix, a particularly elusive one. The kind you encounter when trying to build something in a stack where the original architects were brilliant, but no longer around, and maybe didn’t completely agree on how some fundamentals should work together.

Damn, it would be satisfying to rewrite the stack one day, but until then you have to navigate + make the best of what you’ve got. It’s probably the skill I’ve developed most working at Medium.

So Brad + I have been working on these elevated collection pages + posts for a small eternity. We forked the post page, ripped out all the cruft, and made this rad index page that shows off all the posts in the collection with some branding. We’ve had a few different collection concepts on Medium, but this one is optimized for our editorial team to showcase some of the collabs they’ve been working on, like Unruly Bodies with Roxane Gay.

our new shiny collection index page

We identified the last stray bugs, patched them, and I made some time for some purely engineering improvements I couldn’t squeeze into our first version before I bounced for 🌴 hawaii 🌴 for a bit of workcation.

One of these improvements was to make that giant blue Unruly Bodies cover image progressive + to have a nice placeholder for it while it loads. (Before, the text would jankily shift to the right on first render, then the image sloooowly painted in over mediocre connections)

Our progressive image code takes a super tiny version of the image that loads in quickly + blurs it out as a placeholder until the real one is ready. This is unfortunately synonymous with when javascript is ready, but it’s still way better than a jagged paint-in.

We used to only use this image treatment on our post page (rendered in the backend) and for horizontal images. I moved the template generation code over to the client, flipped the logic on its side to support vertical images and was extremely pleased with myself.

ahhh yeah once that image loads in this is gonna be so premium
TLDR: my progressive image is definitely broken

Top of the next morning, our valiant editorial team files a bug with me over slack. The issue is a weird one, transient, and I can’t reproduce at all. The blurry placeholder image loads in fine but then is never swapped with the real asset.

I have some wild stabs in the dark theories, but it’s impossible to say until I have my very own eternally blurry image in my grasp.

I give up for the evening, 🙏 for transience and eat a spam musubi on the beach for dinner + fall asleep.

Design + Editorial do not give up trying to help me conjure this eternal blur on my laptop, and are on it top of the morning.

brad is the best, and this is sherlock level sleuthing
pray to be blessed with wonderful co-workers willing to help you repro obscure things, especially when you’re the only eng on a project with a million threads to pursue

Brad + Euni notice the issue only happens at very specific browser sizes and seems to happen reliably in one window size vs another.

Equipped with something to look for, I brace myself to flail around some of our more ancient code.

THE SCENE

Alright before we delve into the progressive image code looking for view height clues, let’s set the scene of this heinous bug crime.

This page looks pretty basic, but there are a few moving parts that have some fairly interesting front-end components. There’s the full bleed progressive image, and also an infinite scroller that handles pulling in new pages of posts.

There’s also a footer that scrolls into view after you reach the end of the potentially infinite div full of posts. (The code for the footer only exists stashed on an in-progress branch on my computer at the time.)

a rough dom → component breakdown. We have a js framework that imposes enough reality to get the job done, and a single Screen component coordinates the rest. It also can fetch data from some app wide singleton services they all share.

All of these components (and my fledgling footer code) subscribe to two services: DomMonitor and ElementTracker. These are singleton services that intercept scroll events and read element positions in a performant way that doesn’t trigger a re-paint.

These two are key to the rest of this rabbit hole, so lemme quickly flesh them out for you.

ElementTracker is a clever piece of code by Daryl Koopersmith that’s a cornerstone of how we handle scroll events in our web client.

Basically scrolling is trash, the browser emits events constantly, and even accessing an element’s layout properties to see where it is on the page can trigger a repaint. If you’re constantly triggering repaints while scrolling tied to the native scroll event you’ve got yourself some choppy garbage animations.

🗑 ☠️ 🗑

ElementTracker “tracks” elements by measuring their start position, keeping track of user scroll actions, and estimating where they’ll be on the page given the scroll delta and the current viewport. It then hooks into window.requestAnimationFrame (familiar if you’ve ever done canvas animation), whose callback only runs right before the browser’s next natural repaint, and piggybacks on that to do our element checks. This is a key optimization in our rendering, because it avoids the excessive repaints in the naive approach.
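Here’s a rough sketch of the measure-once, estimate-later idea. It is not the real ElementTracker: the names (Box, PositionEstimator, measure, estimate) are all made up for illustration, and the real thing also deals with resizes, re-measuring, and the requestAnimationFrame scheduling.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not Medium's actual code.
interface Box {
  top: number;
  left: number;
  width: number;
  height: number;
}

class PositionEstimator {
  // Page-relative boxes, recorded once up front (the only "layout read").
  private measured: Record<string, Box> = {};

  // Remember where an element sat on the page when we measured it.
  measure(id: string, pageBox: Box): void {
    this.measured[id] = { ...pageBox };
  }

  // Estimate the viewport-relative box from the scroll offset alone,
  // without touching layout again.
  estimate(id: string, scrollX: number, scrollY: number): Box | undefined {
    if (!Object.prototype.hasOwnProperty.call(this.measured, id)) {
      return undefined;
    }
    const box = this.measured[id];
    return { ...box, top: box.top - scrollY, left: box.left - scrollX };
  }
}
```

In a browser you’d feed measure() from a single layout read per element and call estimate() from inside a requestAnimationFrame callback with the latest scroll offsets.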

The draft W3C Intersection Observer API does something similar and I’ve been dying to mess with it, but our version works on IE and is fairly battle tested.

ElementTracker sits on top of our DomMonitor service that does the heavy lifting of listening to the browser’s real scroll events and emitting the optimized versions we use for our code.

SLEUTHING

Alright back to the bug at hand, our image loading code. Progressive images are loaded when ElementTracker is “refreshed”, e.g. on throttled scroll or a performant repaint.

I dig into the event handler, and it dictates we decide whether we need to load an image by extending the viewport by 3x (which captures a page above and below), and then we check to see if there are any progressive images within that extended viewport. If so, it triggers some javascript to load the full-resolution image + implement the swap.

on post pages it works a bit like this, checking for images within the window (one page up and one page after) and ensuring their images are loaded in. It does this by making sure the pink rectangle intersects with our progressive image.
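The check described above boils down to two tiny functions. This is a sketch under my own assumptions (the Rect shape and the function names are invented, and I’m treating rectangles that merely touch as not intersecting), not the actual loader code.

```typescript
// Hypothetical sketch of the "3x viewport" check; names are illustrative.
interface Rect {
  top: number;
  left: number;
  width: number;
  height: number;
}

// Grow the viewport to 3x its height: one page above, one page below.
function extendViewport(viewport: Rect): Rect {
  return {
    top: viewport.top - viewport.height,
    left: viewport.left,
    width: viewport.width,
    height: viewport.height * 3,
  };
}

// Axis-aligned rectangle overlap. Edges that only touch do not count.
function intersects(a: Rect, b: Rect): boolean {
  return (
    a.left < b.left + b.width &&
    b.left < a.left + a.width &&
    a.top < b.top + b.height &&
    b.top < a.top + a.height
  );
}

// Pick the progressive images whose full-resolution version should load now.
function imagesToLoad(viewport: Rect, images: Rect[]): Rect[] {
  const extended = extendViewport(viewport);
  return images.filter((image) => intersects(extended, image));
}
```

Note the horizontal half of the check: if either rectangle’s left or width is off, the image never counts as in view, no matter how well the vertical math works.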

Our image is smack in the middle of the viewport + this code is also run once when the ProgressiveMediaLoader is activated. We should be fine for triggering an intersection/image load.

But it also seemed like a sensible place to start debugging and sanity check. I print out the viewport + the rect that represents my progressive image

……there’s some random negative garbage in here
I submit a plea for help into the slack void
these are the two rectangles that progressiveMediaLoader is looking at

Instead of accurately representing the elongated viewport, the ProgressiveMediaLoader on the index page is looking waaaaaay to the left. The bounds for the image are also way off!

Ahh damn. They’re like a hair’s width apart. It’s probably a pixel rounding thing on certain screen sizes triggering an overlap.

But why are these rectangles so recklessly flinging themselves into the void?? Why is this happening only to me?? I get a shaved ice to fend off my existential crisis and eat it on the beach.

the usual culprit

FLASHBACK: a week earlier

euni: oh hey alex aztec yoga is getting cut off, it’s only showing 10 posts.

me: damn, this page used to pull down every post in the sequence async, and i just put in some real paging. 10 posts come down now with the first page load, and the rest are supposed to come in on scroll.

me: let me see what’s up with these scroll events

euni: cool, we’re rolling out 15 posts for unruly bodies on monday, can we fix it by then?

me: for sure

(this conversation is summarized + semi fictional 😂)

FLASHBACK (bug inception!! don't stress, quick pagination detour):

Lists of things that paginate on infinite scroll are Medium’s bread + butter. The code patterns for this are pretty set, I just broke them trying to be clever with this layout. We’re under time pressure for a fix so I want to patch it fast.

Normally you’d just make the image on the left + the metabar fixed. Fixed elements can’t interact with any others in the dom — so instead I set the posts elements to have an overflow-y: scroll to preserve the scrolling behavior, and have the posts + the image as flex siblings so they can figure out the best width for themselves.

Despite the most ominous sounding warnings I could cook up, we’re supporting some text baked into images here. Full bleed split screens kind of look like garbage as they get more square, especially if you have to be super careful never to clip your image. It’s also way too wide to kick the layout to tablet.

I wanted to use flexbox for more granular responsive rules, I needed an overflow-y: scroll on the posts div to make it work, it was all fine. Until we needed a second page of posts for the first time and I realized the posts div wasn’t triggering the scroll event it needed to request more items.

The InfiniteScroller code that appends the next page of items at the end of the list ALSO uses ElementTracker / DomMonitor to check whether we need to load in a new page.
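For a feel of what “check whether we need to load in a new page” usually means, here is a minimal sketch. It is not the InfiniteScroller source: the ScrollMetrics shape, the function name, and the 400px default threshold are all assumptions of mine.

```typescript
// Hypothetical sketch of a "near the bottom?" check; not the real InfiniteScroller.
interface ScrollMetrics {
  scrollTop: number; // how far the scrolling element has scrolled
  clientHeight: number; // visible height of the scrolling element
  scrollHeight: number; // total height of its content
}

// True once the reader is within thresholdPx of the end of the list.
// The 400px default is an arbitrary illustrative value.
function shouldLoadNextPage(metrics: ScrollMetrics, thresholdPx: number = 400): boolean {
  const distanceFromBottom =
    metrics.scrollHeight - (metrics.scrollTop + metrics.clientHeight);
  return distanceFromBottom <= thresholdPx;
}
```

The catch is that these numbers have to come from the element that is actually scrolling, and the check has to run when that element scrolls.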

They decide what element to check for events on by querying this method on the master Screen component, that coordinates all our lesser components, which always tells them to watch document.body.

base screen returns `document.body` every time without fail unless you override it, something we don’t do anywhere in our current incarnation of our codebase.

Since I made the overall layout fixed, this is no good. document.body isn’t actually the one scrolling, the element that contains the post is.

Scroll events don’t bubble for performance reasons, and our event listener for scroll…..is wrapped so deeply in our custom api that changing it to capture would be a bit of a nightmare.

This is unfortunate since we do find the correct scrolling element, and theoretically set InfiniteScroller to watch it.

this piece of code looks for the closest scrolling element to where you initialized the infinite scroller code, and attaches it to our main Screen component which controls the whole page

However, when it calls this._infiniteScroller.attachToScreen() here, InfiniteScroller ignores its scrollElement and just relies on whatever the Screen’s instance of DomMonitor (the service that intercepts all scroll events) is watching, which is always document.body.

Looking deeper, the scroll element for InfiniteScroller really is only used for measuring, and not for grabbing events off of.

aaaaahaha it turns out this method is effectively useless, as diligently documented

This seems like an obvious bug caused by loosely coupled components bound with a layout that demands behavior that we haven’t had to accommodate before.

I move things around so that BaseScreen.getScrollingElement returns the post elements container for this screen vs. document.body.
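In spirit the patch looks something like this. Only getScrollingElement and BaseScreen come from our code; CollectionIndexScreen, ScrollTarget, and describeWatchers are invented stand-ins so the sketch runs without a DOM.

```typescript
// Hypothetical sketch of the override; only BaseScreen.getScrollingElement
// is a real name from the post, everything else is illustrative.
interface ScrollTarget {
  id: string; // stand-in for a DOM element
}

const DOCUMENT_BODY: ScrollTarget = { id: "document.body" };

class BaseScreen {
  // Default: everything watches document.body.
  getScrollingElement(): ScrollTarget {
    return DOCUMENT_BODY;
  }
}

class CollectionIndexScreen extends BaseScreen {
  constructor(private postsContainer: ScrollTarget) {
    super();
  }

  // The quick patch: report the posts container as the scrolling element.
  getScrollingElement(): ScrollTarget {
    return this.postsContainer;
  }
}

// Every component that asks the screen gets the same answer.
function describeWatchers(screen: BaseScreen): string[] {
  const target = screen.getScrollingElement().id;
  return ["InfiniteScroller", "ProgressiveMediaLoader"].map(
    (component) => `${component} watches ${target}`,
  );
}
```

The override isn’t scoped to paging: anything that consults the screen for its scrolling element picks up the new answer too.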

Paging works and we’re gooood tooo goooo 👍

PRESENT DAY: BACK ON THE TRAIL OF THE ORIGINAL PROGRESSIVE IMAGE BUG

……..progressive images are still broken, it’s my fault, and it’s definitely due to my paging patch

Alright so I’m pretty sure I caused the eternal blur myself with my quick fix for paging. By setting the scrollingElement to the posts container waaay off to the right, all of the other pieces of code (including ProgressiveMediaLoader) that rely on the Screen returning document.body broke.

Somehow that culminated in apparating the bounds check we use to see if an image is in the viewport way off to the left. I’m not going to question it too much.
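If I did question it, my best guess is something like the sketch below: a rectangle measured in page coordinates gets re-expressed relative to whatever the scrolling element is. This is speculation, not a trace through the real code, and every name and number in it is made up.

```typescript
// Speculative sketch of how bounds could go negative; not the real code path.
interface Bounds {
  top: number;
  left: number;
  width: number;
  height: number;
}

interface Origin {
  top: number;
  left: number;
}

// Re-express a page-space rectangle relative to a scrolling element's origin.
function relativeTo(pageRect: Bounds, scrollerOrigin: Origin): Bounds {
  return {
    ...pageRect,
    top: pageRect.top - scrollerOrigin.top,
    left: pageRect.left - scrollerOrigin.left,
  };
}
```

With document.body as the scroller the origin is (0, 0) and nothing moves. With a posts container that starts, say, 640px from the left edge, a cover image at left 0 ends up at left -640.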

the current state of the world for recap. I really just wanted that full bleed image to load in, but it’s not because my ProgressiveMediaLoader doesn’t think it’s in the viewport.

cool. cooooooool.

Alright, rather than changing this semi global scrollingElement that a bunch of random code relies on to fix paging, it would have been a much stronger pattern to wrap up koop’s note on the effectively useless method in InfiniteScroller.

I want to set a private scroll handler for paging that watches my overflow-y: scroll container vs relying on the same global domMonitor events on document.body as everything else, so I go ahead and do that.

This allows me to have my overflow-y: scroll element that I needed for the precious layout, but no other service or component has to know.

A lot of the web client is littered with mostly finished just in case api endpoints. InfiniteScroller has one of these that just claims it will attachToElement, passing in a specific element vs the attachToScreen. It’s perfect. I point it to my overflow-y: scroll element.

Progressive images load in reliably + beautifully + paging works on my element on Hatch (our staging environment). I’m very satisfied, eat a bowl of poke on the beach and go to bed.

THE NEXT DAY

…..my patch breaks paging on the homepage, so Eduardo reverts it.

THE SAGA CONTINUES

ok so i check out the homepage

and ….it’s definitely still attaching itself to document.body like it should. So there’s got to be some difference between the private scroll listener + the domMonitor one that’s breaking infinite scroll there.

Errr it looks like chrome is cancelling the deferreds we’re shooting off on scroll that check to see if we need to load more items.

I check the convenient “just in case” api call I hooked my code up to earlier. It sets a private scroll handler, but the throttling behavior is kind of wonk. It’s definitely triggering chrome to physically block these this.onScroll calls which would summon the next page of posts.

Our hard coded throttle delay is 200ms and whatever obv.util.throttle is doing is messing with the listener. If i remove the bit of code that does the scroll throttling, paging works fine again.
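For reference, a bare-bones throttle looks roughly like this. To be clear, this is not obv.util.throttle (I’m not reproducing its behavior here); it’s a generic leading-edge throttle, with an injectable clock added purely so it can be tested.

```typescript
// Generic leading-edge throttle sketch; NOT obv.util.throttle.
// The `now` parameter is an assumption added to make the timing testable.
function throttle<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number,
  now: () => number = Date.now,
): (...args: A) => void {
  let lastCall = -Infinity;
  return (...args: A) => {
    const current = now();
    // Run at most once per delayMs window; calls inside the window are dropped.
    if (current - lastCall >= delayMs) {
      lastCall = current;
      fn(...args);
    }
  };
}
```

A version this simple drops the last scroll event if it lands inside the window, which is why many throttle implementations also schedule a trailing call. Subtle differences like that are exactly the kind of thing that can vary between two throttled scroll handlers in one codebase.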

I can remove the delay, but then we lose the optimizations we get from throttling, and potentially get choppy scroll animations again. Ooooook. This attachToElement api in InfiniteScroller is only used in one other place and is probably old/out of date.

DomMonitor obviously has an up to date and working version of scroll throttling, that this api call was probably deprecated in favor of.

It would be annoying to rewrite, and weird to have two places with that logic. I should probably abstract it into a shared throttled scroll handler and then also do a gradual rollout so I don’t inadvertently break a different part of the site…

I eat another poke.

I look sagely into the infinite void of stars where they meet the waves. The infinite void that definitely doesn’t scroll and instead just exists in peace.

Ok at this point there are 3 ways to fix this horrible slew of problems.

  1. Re-implement domMonitor’s throttled scroll into InfiniteScroller. InfiniteScroller will be a real component tightly coupled to the div it’s adding posts/streamItems to. It will also work with horizontal scroll (like in a carousel) if we ever want to do that in the future.
  2. Re-implement the way ElementTracker looks at the viewport. It probably should use the actual viewport and not derive it relative to the scrolling element (which caused the broken negative left bounds earlier) when deciding whether a progressiveImage is in view.
  3. …..re-write my template so it scrolls with goddamn document.body

The void stares back from behind my eyes and I choose option 3, to restructure my html ☠️ 💀 👻 so the image is fixed and document.body is the one that scrolls like the rest of medium.com. You don’t get nice flexbox image widths but max-width: 45% is roughly what it was doing most of the time and is definitely good enough. I put a horrible placeholder div that lives under my fixed one that keeps the text pushed to the right and somewhat flexed.

…I’m also going to patch solutions 1. and 2. even though they are no longer relevant to this layout but because hot damn there probably will be a case where I have to make infinite scrolling work in an isolated element, like a future carousel of doom…..a death carousel full of progressive images.

I re-write my dockable footer…..haaaah remember that feature……….and it’s easier this time since I don’t have to account for apple’s native rubberbanding on the FOOTER’s scroll events when I hit the bottom of my overflow-y: scroll div, since it can just naturally live on the bottom of document.body and I can look for the border between the two.

I flip the switch to roll out the finished project + my sweeping rampage of bug fixes across all collections to all users.

I lay down on my own grave.

yes i have charles, yes i have.

if you too would like to brave the eternal fires of our web client, and help it emerge like a phoenix from the ashes, come work with us

--
