A look at PeoplePerHour’s Organic Search Strategy, and issues with indexable internal search results
- March 27, 2018
- Technical SEO
I’m not normally a big fan of the public SEO review/critique that some search marketing blogs make use of. I do like Sistrix’s blog (to give one example) but I also feel sorry for those who are dealing with the SEO for the particular brands that they dissect online for all to see – sometimes it’s not as clear-cut as it may look from the outside (the SEO team may be at loggerheads with the development team, there may be technological limitations they have to work within, and so on). That being said, today I will be doing no different, as I look at the SEO setup and tactics of the popular UK-based freelancer site PeoplePerHour.com, and at its use of internal search results from an SEO perspective.
I’ve got nothing against the People Per Hour (PPH) platform on the whole – I used it quite a bit when I first started off as an SEO freelancer a few years back, having made the jump from working in an agency environment. I found my first few PPC and SEO clients on the platform, which really helped me to get myself up and running as a freelance consultant.
My only complaint is that they take a huge cut of any gigs you do get – upwards of 20% on any jobs awarded. So, get a job doing a technical SEO audit for a client at £200 and you’ll be paying PPH £40 of that – so you’re left with £160. On top of that they have additional withdrawal fees, and they charge various other fees to make your Hourlies (what they call jobs) and profile more visible. Many freelancers aren’t happy with their service fees (take a look at the comments on their blog post announcing it here) so I consider this post justified (somewhat!). I did also reach out to them before publishing this post but didn’t hear anything back.
And – as a further caveat, this article isn’t solely directed at PPH and their own SEO strategy, but more so at Google and their stance with regard to indexing internal search content. As I’ll explain later, it’s not completely clear what Google’s opinion is on this type of strategy, as further research has shown that many other websites have the same type of indexed content. I think that many SEOs believe the use of internal search results (which are indexable and crawlable) is a clear no-go, but to me it looks like it’s very much industry dependent (as with most areas of SEO in 2018!).
PeoplePerHour’s Organic Rankings
You can see below PPH have had somewhat of an uplift in search rankings in the past few months.
Taking a quick look at PPH’s top organic keywords within SEMRush’s tool, you can see that branded search is the number one provider of traffic, as you’d expect. You might be quite surprised at some of the other search terms though…
There are a few surprising results here – namely the redtube keywords, appearing 2nd and 11th on the list. So according to this data (which should always be viewed with a grain of salt), “redtube.com” as a search term is the 2nd biggest provider of organic search traffic from the US in Google for PPH – weird!
Digging in a bit deeper, we can see the particular URL that has been indexed by Google as being https://www.peopleperhour.com/freelance/redtube+com
If you heard about what happened not long back with Giphy then you probably know where this is going – on the PPH site it seems that searching for specific jobs (or freelancers) creates its own specific URL, many of which seem to get indexed (somehow – which I’ll cover shortly).
Looking back at the specific page, I can see that it does indeed generate its own URL, but this does in their defence have a canonical tag pointing back to https://www.peopleperhour.com/freelance (one of their key organic pages).
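So who’s to blame here – Google for not respecting the canonical (although search engines don’t have to – it’s only a signal, not a directive), or PPH for not “noindexing” these types of pages? (Noindex is best applied directly on the page, via a meta robots tag or an X-Robots-Tag header; a robots.txt disallow only blocks crawling and won’t reliably keep a URL out of the index.)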
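To check what a page like this is actually telling search engines, a minimal sketch along these lines could pull out the canonical URL and any meta robots directives. It uses only Python's standard library; the sample markup is hypothetical and only mirrors the situation described above, and the sketch ignores the X-Robots-Tag HTTP header, which can also carry a noindex.

```python
from html.parser import HTMLParser


class IndexingSignals(HTMLParser):
    """Collects the canonical URL and meta robots directives from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = []

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            self.canonical = attrs.get("href")
        elif tag == "meta" and attrs.get("name", "").lower() == "robots":
            # content="noindex, follow" becomes ["noindex", "follow"]
            content = attrs.get("content", "")
            self.robots += [d.strip().lower() for d in content.split(",") if d.strip()]


def indexing_signals(html, page_url):
    """Summarise the canonical and noindex signals found in the HTML."""
    parser = IndexingSignals()
    parser.feed(html)
    return {
        "canonical": parser.canonical,
        "self_canonical": parser.canonical == page_url,
        "noindex": "noindex" in parser.robots,
    }


# Hypothetical markup: a search page canonicalised to its parent, with no noindex.
sample_html = """
<html><head>
<link rel="canonical" href="https://www.peopleperhour.com/freelance">
</head><body>Search results</body></html>
"""
signals = indexing_signals(
    sample_html, "https://www.peopleperhour.com/freelance/redtube+com"
)
print(signals)
```

A page that reports a non self-referencing canonical and no noindex is relying entirely on Google choosing to honour the canonical hint.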
In this particular case, all signs point to People Per Hour. The reason? Their shady (or just lazy?) use of XML sitemaps, which I found by inspecting their robots.txt file (see below).
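For anyone wanting to repeat this kind of inspection, sitemap declarations can be pulled out of a robots.txt body with a few lines of Python. This is only a sketch: the sample file and the example.com URL are made up, and fetching the real file is left out.

```python
def sitemap_urls(robots_txt):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt; example.com stands in for a real site.
sample_robots = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap_index.xml
"""
found = sitemap_urls(sample_robots)
print(found)
```

If the declared file is a sitemap index (a parent), each child sitemap it lists needs opening in turn.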
Using XML Sitemaps to Get Content Indexed
The most suspect XML sitemaps linked to from their robots.txt, via their sitemap parent, are these ones listed below – but be careful if you’re going to open them in your browser as they’re huge files (totalling around 30MB for all of them):
These look like a self-built feed containing some of the most searched-for terms on PPH – but I’ve no idea how they generated it. The issue here is that it amounts to keyword stuffing, built from “location” and various other terms. Take a look at some of the highlights below, found from trawling through their huge file (I have way too much time on my hands it seems…).
You may be thinking yeah well, that’s not going to do anything – the content is so thin they’ve got 0 chance of ranking for it… but then you see:
The big surprise is that they rank so well for a number of these types of terms, and yet their content on page is extremely thin. This shows that sites with high domain authority, and lots of content, do tend to get preferential treatment in search.
Based on many of the URLs included within their sitemap, I think it’s obvious that no-one is reviewing them at all, and that they’re in need of a good clear-out.
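A clear-out like that doesn’t need to be manual. As a rough sketch, a sitemap can be parsed and any search-style URL containing an unwanted term flagged for review. The blocklist and the second sample URL here are hypothetical, and the /freelance/ prefix is simply taken from the URL shown earlier.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus, urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Hypothetical blocklist; a real review would use a maintained list.
BLOCKED_TERMS = {"redtube"}


def flag_sitemap_urls(sitemap_xml, search_prefix="/freelance/"):
    """Return (url, phrase) pairs whose search phrase contains a blocked term."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        path = urlparse(url).path
        if not path.startswith(search_prefix):
            continue
        # "redtube+com" in the path becomes the phrase "redtube com"
        phrase = unquote_plus(path[len(search_prefix):]).lower()
        if any(term in phrase for term in BLOCKED_TERMS):
            flagged.append((url, phrase))
    return flagged


# Small made-up sitemap; the second URL is an assumed example of the pattern.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.peopleperhour.com/freelance/redtube+com</loc></url>
  <url><loc>https://www.peopleperhour.com/freelance/seo+bournemouth</loc></url>
</urlset>"""
flagged = flag_sitemap_urls(sample_sitemap)
print(flagged)
```

For files as large as the ones described here, streaming the XML with ElementTree’s iterparse would be kinder on memory than loading the whole document at once.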
It seems they used a similar tactic for their Hourlies section (jobs advertised by freelancers), and as you can see below it even caught some “spam” Hourlies which were likely removed from the platform at a later date.
The location + keyword + freelance + jobs pattern is used extensively throughout the sitemaps, as you can see from the below.
If we visit one of these (choosing Bournemouth as it’s my home town!) we can see search results listed for “SEO Bournemouth” search within PPH.
Although it’s obviously not a huge search term to rank for, if done at scale this kind of SEO method is clearly going to bring them plenty of organic traffic. It appears that they do this for thousands of cities around the world, but from SEMRush’s data it’s not evident that these pages rank especially well (I only checked those with very high search volumes).
The use of internal search results and the risk of allowing Google to index them
To me it isn’t clear what Google’s stance is on the indexing of internal search results. Judging by what happened with Giphy.com a while ago, it seemed they were hit by a penalty (algorithmic or manual) which tanked only their /search/ URLs within Google. And other sites had been hit with penalties previously, at least according to Sistrix’s post.
Trawling through the Google Webmaster Guidelines I couldn’t find anything specifically related to internal search results. Could it be that PPH in this case were avoiding the wrath of Google by turning these search results into static pages, and then using them to feed Google through their sitemap? Or maybe it’s just how PPH’s search function works – it’s not clear you’re seeing search results within the URL, and this is how they can happily avoid detection from Google?
My next question is whether or not it actually matters if search results are being indexed. You could argue that actually, someone landing on an internal search page of a job site still does have a good user experience, and that Google did indeed satisfy the user’s original search intent – the user had good dwell time on the page, they didn’t return to repeat their search, and Google’s machine learning algorithm picked this up and returned PPH’s results more often. The user was probably forced to refine their search a bit within PPH, but they persevered nonetheless.
Is it just PeoplePerHour that use this tactic to index their content?
Definitely not – although in the case of PPH there are some very dodgy results that they’d probably want to clean up from their sitemap, there are many other sites that use similar methods as part of their SEO strategy.
Again, in the case of UpWork.com the page does have a non self-referencing canonical (just like PPH) – so does this clear UpWork of any underhand tactics, instead placing the blame on Google?
I feel UpWork.com (a big rival of PPH) risks incurring the wrath of Google more, as their internal search results are more obvious and likely much easier for Google to pick up on – the URL clearly ends with the ?q= parameter. I couldn’t see any examples of them employing the same strange XML sitemap methods as PPH though – UpWork’s seemed to have more strategy about them, as they were listing pages based on popular job/freelance roles, as opposed to including thousands of internal search result pages.
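The difference in how “obvious” a search URL looks can be illustrated with a rough heuristic. This is purely a sketch of how a URL might be classified from the outside; the parameter and path names are my own assumptions, not anything Google or either site documents.

```python
from urllib.parse import parse_qs, urlparse

# Assumed conventions for internal search URLs, chosen for illustration only.
SEARCH_PARAMS = {"q", "query", "s", "search"}
SEARCH_PATH_SEGMENTS = {"search"}


def looks_like_internal_search(url):
    """Guess whether a URL is an internal search result from its shape alone."""
    parts = urlparse(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    if SEARCH_PARAMS & {name.lower() for name in params}:
        return True
    segments = [seg.lower() for seg in parts.path.split("/") if seg]
    return any(seg in SEARCH_PATH_SEGMENTS for seg in segments)


# example.com URLs are placeholders; the last one is the PPH URL from earlier.
for url in (
    "https://www.example.com/freelancers/?q=seo",
    "https://www.example.com/search/happy-birthday",
    "https://www.peopleperhour.com/freelance/redtube+com",
):
    print(url, looks_like_internal_search(url))
```

A ?q= parameter or a /search/ path is flagged straight away, whereas a search phrase tucked into a static-looking path like PPH’s slips past this kind of check.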
On a side note UpWork also quite cleverly manipulated title tags to label it as “Top 10 SEO Experts” – giving the impression of a manually curated list, and then reinforcing the result’s relevancy by including the month and year. This likely gives their search listings a CTR boost, which in turn may improve their rankings.
I personally think that if the likes of PPH had pages which did contain top 10 style lists, and which were curated properly, and on-site copy was improved upon, they would see much bigger organic search growth and bigger conversion rates. To swap industries briefly, they could for example build out a page in the style of this one at Booking.com (which ranks highly for “hotels in London”), obviously replacing hotels with freelancers and surrounding the page with local-specific info followed by various reasons to use PPH (money back guarantees, testimonials, client feedback, brands using PPH, etc).
What is Google’s stance on indexed internal search content?
It used to be that Google were quite clear on your internal search results – they should be marked noindex to prevent them ending up in Google’s search results – as first mentioned by Matt Cutts on his blog a few years ago. This was also reflected in the Google Webmaster Guidelines back in 2007 (visible in the screenshot below).
This reference has since been removed – there’s no mention of this in the 2018 version of this doc.
I think this suggests that Google’s stance has changed with regard to indexed search content – they will allow it when it matches the user’s search intent and provides a good experience to the user (which I’d argue is what PPH weren’t doing). But based on this I’d find it hard to argue why Giphy were dropped from their search results when they were caught doing the exact same thing.
To look again at Giphy since their SEO drop, we can clearly see that since adding a noindex tag to their internal search results, and changing their strategy slightly, they are now back on track to making a recovery (and they rank highly for “Happy Birthday” again).
The dangers of indexable internal search content
There are lots of things to be concerned about if your website makes use of user-generated search content which is not noindexed, or where crawler access isn’t controlled. Just a few days ago, it was reported that a customer service number for Spotify (which doesn’t exist – they don’t want customers calling them) was returned as a Featured Snippet within Google – resulting in a number of phone calls going directly to the number given by the spammers. You can read more about how it was done in Martin McDonald’s write-up.
Wow, a brilliant bit of evil:
Spotify, like so many silicon valley companies, doesn’t believe in having a customer service #. So someone took their reflector search box, SEO’ed google with that URL, so “Spotify phone number” instead goes to a bunch of fraudsters. pic.twitter.com/7YcJV3Q3N4
— Nicholas Weaver (@ncweaver) March 17, 2018
There are many sites out there that do indeed allow search results to be indexed – in the job industry this seems commonplace (see screenshot below), but I think that’s an example where it’s fine to do so – jobs change on a regular basis, so presenting these as search results to the user having arrived via a Google search isn’t necessarily a bad user experience.
In the world of retail this is very common too – take a look at the example of Argos.co.uk below. Many of their top ranking pages are indeed search pages – see the Bunk Beds example below.
Key Takeaways on using internal search results as part of your SEO strategy – TLDR
As always I’ve gone off on several tangents within this article, but some of the key takeaways on the use of internal search results on websites (and their SEO impact) are listed below:
• Showing internal search results within Google does work as an SEO strategy in some industries.
• You have to be careful as to how you implement it, and it’s by no means easy to do correctly.
• Having user-generated search content created “on the fly” is incredibly risky – see what happened with Spotify above.
• Sites need to have strategies in place to prevent low quality search content being crawled/indexed.
• Sites (PeoplePerHour in this case) need to take care about what they include in their XML sitemaps.
• Crawl budget (although not mentioned in the article) will be a big concern for any sites that do display internal search results to Google – log file monitoring is a must!
• I’m still not convinced about how (and if) Google differentiates between industries, and which ones can use internal search as a tactic without penalisation; job/freelancer sites yes, but sites like Giphy? It seems not (even if they likely did meet search intent satisfaction).
• Technical SEO changes can be quick to make and quick to show an uplift – see what’s happening with Giphy.com currently.
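On the log file monitoring point in the takeaways, a minimal sketch of the idea is to count how many Googlebot requests land on search-style URLs versus everything else. It assumes the common “combined” access log format and made-up path prefixes and log lines; note too that a user agent string can be spoofed, so a real audit would also verify the crawler (for example with a reverse DNS lookup).

```python
import re
from collections import Counter

# Matches the request and user-agent fields of a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_by_section(lines, search_prefixes=("/search", "/freelance/")):
    """Count Googlebot requests to search-style paths versus all other paths."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        is_search = match.group("path").startswith(search_prefixes)
        counts["search" if is_search else "other"] += 1
    return counts


# Made-up log lines for illustration.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [27/Mar/2018:10:00:00 +0000] "GET /freelance/seo+bournemouth HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [27/Mar/2018:10:00:05 +0000] "GET / HTTP/1.1" 200 9000 "-" "{GOOGLEBOT}"',
    '203.0.113.7 - - [27/Mar/2018:10:00:09 +0000] "GET /search?q=seo HTTP/1.1" 200 4000 "-" "Mozilla/5.0"',
]
hits = googlebot_hits_by_section(sample_log)
print(hits)
```

If the “search” share dominates, crawl budget is probably being spent on internal search pages rather than the pages you actually want crawled.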
I’d welcome any comments and feedback on anything discussed here. Do you have other examples I might have missed about what can go wrong with user-generated search content? Or any examples of internal search queries ranking highly within Google? I’m sure there’s plenty – feel free to share.
great job researching this, I never thought search results could index urls… oh and I hate pph