There is a really interesting story that came out just a few days ago, on how a tech executive was (allegedly) able to get negative content about him removed from Google.

This was reported by investigative journalist Jack Poulson on his Substack here - The dirty tricks of reputation management: from PI firms to sabotaging Google Search and as it was relating to the non-profit organisation Freedom of the Press Foundation, they also wrote about it here - Censorship Whac-A-Mole: Google search exploited to scrub articles on San Francisco tech exec.

You should definitely check out at least the last piece above before continuing here 🙂

The interesting thing with this story is that Google's "Request a refresh of outdated content in Google Search" tool was being misused, but importantly, a bug existed which allowed a 3rd party to get content removed from Google with no questions asked.

Google's remove outdated content tool

As SEOs will know - within Google Search Console you can easily get content removed for a website you manage (via the Removals tab) - but you couldn't, and shouldn't, be able to use this tool for sites you don't manage or control while the content is still live.

The tool is meant for when you want Google to "update search results for pages or images that no longer exist, or pages that have deleted important (sensitive or critical) content".

This might be because there's genuinely wrong/harmful/damaging info that was published, and has since been corrected, but Google hasn't yet updated to reflect that new information. So, there is a genuine need/use case scenario for this tool.

In a similar vein, blackhat SEOs (consultants happy to do things that are illegal, or that go strongly against Google's guidelines) have been known to file fake DMCA takedown requests to attack competitor sites. They might make an illegal copy of a website on a domain they register, then submit a DMCA request claiming that the original article copied their new piece. This was known to work well, and genuine website owners faced a constant battle against it (the requests were often successful, and often filed under a fake name). It still happens today, and as it's a legal process, Google often gets caught in the middle.

But what is going on here in the above story is quite different and involves a different mechanism - the Refresh Outdated Content tool (this is a decent explainer from Google). This is a tool I'd honestly not noticed or needed to use before.

So to recap the above story - someone had been using Google's Refresh Outdated Content tool to hide negative stories about their client which had been appearing on the Freedom of the Press Foundation website AND on investigative journalist Jack Poulson's Substack. There were a lot of legal threats made to Poulson (again, it's worth reading his piece to get a better understanding of the full backstory).

As mentioned earlier - Google's tool wasn't working correctly. Someone had even flagged this in the Google Webmaster Forum a few years ago - and Google's Search Liaison said they would investigate further. Either they didn't, or they did and didn't do a very good job of it.

Screenshot of the Google webmaster forum thread on this tool and its abuse.

The Importance of URL Case Sensitivity with Regard to Content Censorship

[Sidenote: there's a weird irony here as this whole story is based on a Tech Exec and their "case sensitivity" (accusations of domestic violence...) when the actual Google bug/exploit is also based on URL "case sensitivity" of the website in question...]

The Freedom of the Press Foundation website doesn't appear to use a standard web CMS - it might be built on the Django framework (a free, open-source, Python-based web framework).

And vitally, if you visit one of their page URLs with capitalised letters in the path, you will land on a 404 page - i.e. the page cannot be found. For example, try visiting both URLs listed below:

Normal / working URL:

https://freedom.press/issues/anatomy-of-a-censorship-campaign-a-tech-execs-crusade-to-silence-a-journalist/

Same URL with uppercase characters included:

https://freedom.press/ISSues/anatomy-of-a-censorship-campaign-a-tech-execs-crusade-to-silence-a-journalist/

You should find that the latter URL takes you to a 404 error page (Page Not Found).

Now - this leads us more towards infosec territory - but in web terms, servers should usually be configured so that if you try to access an upper-case URL variant, you are redirected to the lower-case variant.

As an example - try going to https://www.GOOgle.com and you will end up at the correct lowercase version of that URL (without noticing anything happening in your browser).

The lower case URL variant is what SEOs would call the canonical version of a URL - the version you want search engines to index. Otherwise technically you could have multiple variants of a URL in existence, which could be very confusing for search engines to deal with, especially if websites link to upper-case versions of a URL.

This is the critical point, and the reason I wrote this entire follow-up article - what happened to the Freedom of the Press Foundation (and later also to Poulson's Substack site) would not have happened had this URL case-sensitivity issue, which still affects their websites, been fixed.

In Poulson's case, fixing this may be trickier, as Substack itself appears to suffer from the very same issue. You will notice this when trying to visit a capitalised version of a Substack post.

To highlight this - try visiting this capitalised URL in your browser which is where Jack Poulson's story was published:
https://jackpoulson.substack.com/p/dirty-TRICKS-reputation-management-blackman

I won't go deep into the technical setup of Substack (because I couldn't - I don't know their setup!) but in general terms, file paths on Linux/Unix-based servers (e.g. Apache or Nginx on Ubuntu) are case-sensitive, while on Windows servers (such as IIS) they are case-insensitive. There may be exceptions to this rule, but that appears to be the default behaviour.

Poulson's write up on this also mentioned how someone at The New York Times was able to replicate the exact same issue - and after checking myself, this is because they also suffer from the same URL capitalisation issue (visit a capitalised URL and you end up on a 404 page).

Google Have Since Fixed This Issue

According to the write up, Google did take actions to fix this "bug" and to prevent it from happening to other sites in the future, which is a rare step for them to take. It's likely that they've realised it's quite a serious problem - to be able to suppress content from their search engine so easily.

But we should also remain aware that it was the URL case-sensitivity issue on the Foundation's site/server that left it vulnerable to such an "attack" in the first place.

For many technical SEOs, checking for URL case sensitivity is likely one of many checks they will make when auditing a website for issues. And even if there's a relatively small chance of such a problem occurring on their site, it's definitely something to be wary of.
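If you wanted to script that audit check yourself, the core of it is just "take a known-good URL, uppercase the path, and see whether it still resolves". A rough sketch using only Python's standard library (the example URL is illustrative, and the actual HTTP comparison is left as commented usage since it needs network access):

```python
from urllib.parse import urlsplit, urlunsplit

def uppercase_variant(url: str) -> str:
    """Return the same URL with its path uppercased. The hostname is left
    alone - hostnames are case-insensitive by spec, but paths are not."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path.upper()))

# Audit usage (requires network access):
#   import urllib.request
#   req = urllib.request.Request(uppercase_variant(url), method="HEAD")
#   ...compare the response status (or the HTTPError code) against that
#   of the original URL. A 404 on the variant means your URLs are
#   case-sensitive; a 200, or a 301 back to lowercase, means you're fine.
print(uppercase_variant("https://example.com/issues/some-article/"))
# -> https://example.com/ISSUES/SOME-ARTICLE/
```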

I ran some tests to check whether the tool may still be vulnerable to abuse, and it does now seem to be OK. You can still submit one of these removal requests - but when you later refresh the page, you will notice the status updates to show "Denied: Page not indexed".

Denied: page not indexed status from Google

This appears to confirm that Google now checks the case of the indexed URL against the URL submitted in the removal request; it can differentiate between the two, which previously wasn't happening.

It's very eye-opening to see how open this system was to abuse, so it's great that Poulson was brave enough to bring it to light (despite the lawsuits sent his way by the tech exec in question), and that the Freedom of the Press Foundation published their own write-up.

What's interesting to note here is that when someone makes a removal request against a website they have no relationship with (i.e. no Google Search Console access), the request still shows up in the site owner's own Search Console property. To prove this, I used a different Google account to make a removal request against a Substack blog that I used to run.

Outdated content requests in Search Console

Whilst the Status column confirms this request was denied, this could be an important report for site owners to check if they fear their content might previously have been removed as a result of this bug.

As the owner of the Search Console property above, I received no notification to warn me that these content removal requests had taken place - and I'd argue this is one of those occasions where I would welcome an email notification, at least to alert me (more so than being alerted about superfluous issues, like new reasons a page can't be indexed...).

Outdated content tab in Search Console

So as a takeaway - I'd urge site owners (particularly those running news publishing sites) to keep a wary eye on the Outdated Content report within Search Console, under the Removals tab, watching for any nasty surprises.

Side note - the image below shows what the Outdated Content report looked like from the Freedom Of The Press Foundation, highlighting the constant requests that were being made. So if your report looks anything like this, be wary of what might be going on!

Outdated page removal abuse in Search Console

If your website is on Apache, you can make a fairly simple .htaccess/server config update to force lowercase redirects on the server side - but most news publisher sites (affected by this) are unlikely to be serving pages directly via Apache; they're more likely running application frameworks on Unix/Linux servers, where URL case-sensitivity is the default behaviour.
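For those who are on Apache, the classic approach uses mod_rewrite with its built-in "int:tolower" map. A sketch, assuming mod_rewrite is enabled - and note that the RewriteMap line itself must live in the main server config or a VirtualHost block (Apache does not allow it inside .htaccess), though rules referencing it can:

```apache
# In the main server config / <VirtualHost> (not .htaccess):
RewriteMap lowercase int:tolower

# Then, in the relevant config context:
RewriteEngine On
# If the requested path contains any uppercase letter...
RewriteCond %{REQUEST_URI} [A-Z]
# ...301-redirect to the all-lowercase version of the same path
RewriteRule ^ ${lowercase:%{REQUEST_URI}} [R=301,L]
```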

Also worth pointing out that you shouldn't jump into making such a technical change across your site without fully understanding the implications and potential impact of doing so - particularly if you care about your SEO performance!

One final note: if you use a CMS like WordPress (as I do), you are probably fine.