Wednesday, 28 October 2015

How to prevent negative SEO impacts caused by poor HTML quality

Poor HTML impacts SEO
The question comes up again and again: can HTML markup hurt SEO, and if so, how? Googlebot is indeed a smart HTML interpreter:
  • it is highly tolerant of HTML syntax errors,
  • it doesn't force websites to comply with W3C validation rules.
Nevertheless, some HTML misuses can seriously hurt SEO. To open the topic, I refer to two posts by respected Googlers and, while commenting on them, list the HTML issues that cause negative SEO effects:

Negative SEO impact by wrong HTML tag placement

+Gary Illyes wrote a warning post saying that a <meta name="robots" content="noindex" /> mistakenly placed in the <body> instead of the <head> will nonetheless be interpreted, and the page will be deindexed. What does this mean in general, if it is true? It means that if a webpage does not filter user-generated content restrictively enough and lets users publish HTML tags beyond <a>, <b>, <i>, <u>, and <img>, this becomes an easy way to harm the website. Just imagine what a user comment containing <meta name="robots" content="noindex" /> in the <body> could do to a page.
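To make the filtering idea concrete, here is a minimal sketch of an allowlist filter for user comments, using only Python's standard library. The names ALLOWED_TAGS, CommentSanitizer, and sanitize_comment are hypothetical, and the allowlist simply mirrors the five tags mentioned above. It is an illustration rather than a hardened sanitizer; for production use, a maintained sanitizing library is the safer choice.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: permitted tags and the attributes kept on each.
ALLOWED_TAGS = {
    "a": {"href"},
    "b": set(),
    "i": set(),
    "u": set(),
    "img": {"src", "alt"},
}
VOID_TAGS = {"img"}                      # tags that never get a closing tag
URL_ATTRS = {"href", "src"}              # attributes that must hold a safe URL
SAFE_SCHEMES = ("http://", "https://")   # assumption: only absolute web URLs


class CommentSanitizer(HTMLParser):
    """Rebuilds a comment, keeping only allowlisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drops <meta>, <script>, <iframe>, and anything else
        parts = [tag]
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED_TAGS[tag]:
                continue
            if name in URL_ATTRS and not value.lower().startswith(SAFE_SCHEMES):
                continue
            parts.append('{}="{}"'.format(name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append("</" + tag + ">")

    def handle_data(self, data):
        # Text is re-escaped so it can never be read as markup.
        self.out.append(escape(data, quote=False))


def sanitize_comment(html_text):
    parser = CommentSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Calling sanitize_comment('Nice post! <meta name="robots" content="noindex" /><b>bold</b>') returns 'Nice post! <b>bold</b>': the meta tag is dropped, while the allowed formatting survives.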

In the next Google Plus post I quote, +John Müller said that HTML validation isn't necessary for crawling, indexing, or ranking. That's true, but HTML validation, and especially the use of an HTML code validator such as https://validator.w3.org/, can warn webmasters about issues like meta tags mistakenly placed in the <body>.
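For this one specific issue, a quick local check is also possible. The following is a rough sketch using Python's standard html.parser; MisplacedMetaFinder and find_robots_meta_in_body are hypothetical names. It assumes the page contains an explicit <body> tag, and it does not replace a full validator run.

```python
from html.parser import HTMLParser


class MisplacedMetaFinder(HTMLParser):
    """Collects <meta name="robots"> tags that appear after <body> opens."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.misplaced = []  # (line number, content attribute) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "meta" and self.in_body:
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "robots":
                line, _column = self.getpos()
                self.misplaced.append((line, attr.get("content") or ""))


def find_robots_meta_in_body(html_text):
    """Assumes an explicit <body> tag; returns every robots meta found after it."""
    finder = MisplacedMetaFinder()
    finder.feed(html_text)
    finder.close()
    return finder.misplaced
```

An empty list means no robots meta tag was found after the opening <body>. Any entry is worth a closer look, especially on pages that render user comments.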

Negative SEO impact by iframe

As an HTML element, the iframe violates the basic conceptual model of the web, in which one page has exactly one URL. Because an iframe has its own source, a webpage containing an iframe has two URLs associated with it (its own and the iframe source). Google tries to associate the iframe content with the parent page that contains the iframe, but there is no guarantee that this is processed correctly.

I've seen this iframe-caused SEO issue and its massive negative impact with my own eyes. It was ugly, and the weirdest part was that nobody is safe from it. Look: many providers of automated ad delivery offer a similar implementation scheme: the webmaster implements a JavaScript snippet and provides a place for the ads as an iframe. The ad provider populates the iframe with ads that relate to the page content, the time of day, or similar.

The search visibility of a site that is a Google News publisher and had implemented advertising in the way I described above dropped substantially. I looked into the source code and found the following setup: the iframe content coming from the ad provider included <head><title>Advertisement</title></head>. It seemed clear to me: after Google associated the iframe content with the parent page, it wasn't amused by a Google News publisher's page titled "Advertisement". The ranking potential of a page titled Advertisement can't be as high as it would be (or was) without this title. This is the most likely explanation for the observed drop in search visibility.
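A check of this kind can be scripted. The sketch below assumes you have already saved the HTML that the ad provider serves into the iframe as a string; FrameMetadataFinder and frame_metadata are hypothetical names. It only reports what the framed document declares and says nothing about how Google actually treats it.

```python
from html.parser import HTMLParser


class FrameMetadataFinder(HTMLParser):
    """Records <title> text and <meta> tags found in a framed document."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
            self.titles.append("")
        elif tag == "meta":
            self.metas.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles[-1] += data


def frame_metadata(frame_html):
    """Returns the titles and meta tags declared by an iframe's source document."""
    finder = FrameMetadataFinder()
    finder.feed(frame_html)
    finder.close()
    return {"titles": [t.strip() for t in finder.titles], "metas": finder.metas}
```

For the setup described above, the result would list "Advertisement" under "titles", which is exactly the kind of finding to raise with the ad provider.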

Negative SEO impact by non-valid structured data

The last issue related to HTML code validity concerns structured data. I recently worked with two e-commerce sites:
  • the first site had implemented only a little structured data, but the implementation was completely error-free,
  • the second site had implemented a whole bunch of structured data, nearly every single type and property related to Product and Offer. But the implementation was pretty buggy, according to the structured data testing tool.
The first site outranked the second, for a variety of reasons. It would be simple-minded to argue that the error-free structured data implementation was THE cause of the better ranking. But I guess it is one of the side factors, which, by the way, also influences page crawling time: error-free structured data is crawled more quickly than a buggy implementation.
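As a rough first line of defense, the most basic class of structured data bugs can be caught locally. The sketch below assumes the markup is embedded as JSON-LD in <script type="application/ld+json"> blocks, which is an assumption, since the two sites may just as well have used microdata or RDFa. JsonLdCollector and check_jsonld are hypothetical names. It only verifies that each block parses as JSON and knows nothing about which properties Product or Offer require, so it complements the structured data testing tool rather than replacing it.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self.in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks[-1] += data


def check_jsonld(html_text):
    """Returns one message per JSON-LD block that is not valid JSON."""
    collector = JsonLdCollector()
    collector.feed(html_text)
    collector.close()
    problems = []
    for index, raw in enumerate(collector.blocks):
        try:
            json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append("block {}: invalid JSON ({})".format(index, exc.msg))
    return problems
```

An empty list means every JSON-LD block at least parses. A stray trailing comma or an unclosed brace shows up as one message per broken block.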

Prevent negative SEO impacts by poor HTML code

The lessons are pretty clear:
  • Validate your HTML: not for the sake of W3C standards conformity, but so that the validator notifies you about potential issues
  • Limit and filter user-generated content: disallow all HTML tags you don't really need
  • Know your code: inspect iframe implementations, both your own and those from your ad or content providers
  • Don't disorient Googlebot: use iframes without metadata and, in general, without a head section
  • Validate structured data: error-free markup is loaded and crawled more quickly