Technical SEO Guidelines

Websites have become ever more advanced over the past few years. There are thousands of potential ways to build a site, often using proprietary technology that search engines sometimes struggle to understand.

Technical SEO guidelines mainly revolve around ensuring that search engine spiders (often known as “crawlers”) are able to navigate around, and understand, your website.

The Technical Guidelines in the Google Webmaster Guidelines are covered below in easy-to-understand, jargon-free language.

 

Each point highlighted in grey is taken directly from the Google Webmaster Guidelines; the descriptions underneath each are intended as practical examples and explanations of what it means.

If any of the points are still unclear, or you don’t know how or why they apply to your site, then just post a question in our community!

 

[well]Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. [/well]

This is a fundamental concept in ensuring that search engines can understand the content on your site.  A great example is sites that are built predominantly using technologies such as Flash or Silverlight.

While pages built using these technologies are often very nice to look at, they are nearly impossible for search engines to interpret. So you should look at your site using a text-only browser (like Lynx), or if you don’t want to install it, use one of the various plugins for more popular browsers that disable all the styling and advanced elements of a page.

If you can’t use your site in text only mode, then don’t expect search engines to be able to use it either.
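If you don’t fancy installing Lynx, the Python sketch below (standard library only) strips a page down to its visible text, which is roughly what a text browser or a simple crawler is left with.  The example.com URL is just a placeholder, so swap in one of your own pages.

```python
# Rough "text-only" view of a page, similar in spirit to Lynx.
# Assumes the page is reachable over plain HTTP(S); example.com is a placeholder.
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring scripts, styles and markup."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


html = urlopen("https://example.com/").read().decode("utf-8", errors="replace")
parser = TextExtractor()
parser.feed(html)
print("\n".join(parser.chunks))  # if this prints very little, a crawler sees very little too
```

If the output is mostly empty, the important content on that page is probably being generated by Flash, Silverlight or scripts rather than plain HTML.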

 

[well]Allow search bots to crawl your sites without session IDs or arguments that track their path through the site.[/well]

While less common in 2014, this has historically been a very big problem for a lot of websites.  Many CMS applications append session IDs to the end of a URL, which can cause significant problems for search engines.

If a spider or search engine gets a new session ID on the end of your URLs every time it visits your site, it may assume that each page is different, even if the content is exactly the same as the last time it visited.

While there are many workarounds for this, such as implementing a canonical tag in your webpages, you shouldn’t need to rely on dynamic session IDs in your page addresses as there are better ways of handling this – for instance browser cookies.
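To illustrate the idea, here is a small Python sketch of the clean-up that a canonical URL represents: the same address with the session parameter stripped away.  The parameter names below (sid, sessionid, PHPSESSID) are just common examples, not anything specific to your CMS.

```python
# Minimal sketch: normalising URLs by dropping session-tracking parameters.
# The parameter names are only common examples of session IDs, not a definitive list.
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

def strip_session_params(url: str) -> str:
    parts = urlparse(url)
    clean_query = [(k, v) for k, v in parse_qsl(parts.query)
                   if k.lower() not in SESSION_PARAMS]
    return urlunparse(parts._replace(query=urlencode(clean_query)))

print(strip_session_params("https://example.com/shoes?colour=red&PHPSESSID=abc123"))
# -> https://example.com/shoes?colour=red
```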

An important consideration is how much time search engines will spend on your site.  Every time Google or Bing visits your site it consumes resources: bandwidth, disk space and processor cycles.  If you make it difficult for them to process your site, they will crawl it less often and your rankings will suffer.

 

[well]Make sure your web server supports the If-Modified-Since HTTP header. [/well]

This relates to the amount of resources it takes search engines to crawl your site.  As above, every time a spider visits your site it consumes resources, not only the search engine’s but also your hosting’s, which ultimately costs you money.

If-Modified-Since is a header a crawler sends with its request, effectively asking “has this page changed since I last fetched it?”.  If your web server supports it, the server can answer with a short 304 Not Modified response when nothing has changed, so the bot knows immediately that it doesn’t need to download the page again.

If this is configured correctly, the search engines are more likely to spend their time on content that HAS changed or been created since their last visit, and your fresh content will be indexed more rapidly.  If it is not, the search engines have to re-download every page on your site to find any changes that may or may not have occurred.
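You can check whether your server behaves this way with a quick Python script using only the standard library.  This is only a sketch, and example.com is a placeholder for your own domain.

```python
# Quick check of whether a server honours If-Modified-Since: a well-configured server
# should answer 304 Not Modified when the page hasn't changed since the given date.
# The URL is a placeholder.
from urllib.request import Request, urlopen
from urllib.error import HTTPError

url = "https://example.com/"
first = urlopen(url)
last_modified = first.headers.get("Last-Modified")

if last_modified is None:
    print("No Last-Modified header, so conditional requests can't be tested this way")
else:
    try:
        second = urlopen(Request(url, headers={"If-Modified-Since": last_modified}))
        print("Got", second.status, "- the server re-sent the full page")
    except HTTPError as err:
        # urllib raises HTTPError for 304, which is exactly the answer we want here
        if err.code == 304:
            print("Got 304 Not Modified - the server supports If-Modified-Since")
        else:
            print("Got", err.code)
```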

 

[well]Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it’s current for your site so that you don’t accidentally block the Googlebot crawler. [/well]

The robots.txt file is a plain text file which is open for spiders to check, and is always hosted on the root of your domain with /robots.txt on the end (for example, on this site it would be webmarketingschool.com/robots.txt ).

Its purpose is to tell crawlers which parts of the site they can and cannot look at.  This helps ensure that areas of your site you do not want indexed don’t accidentally end up listed on search engines, and by directing spiders away from those directories it makes their crawl more efficient, potentially allowing more of the pages you do want indexed to be read.

Most CMS systems handle this for you dynamically, but if you cannot work out why certain parts of your site don’t appear in the index, it’s a good idea to check this file to see whether you’ve accidentally stopped the search engines from seeing them.
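To make this concrete, here is a small, made-up robots.txt (the Disallow paths are only examples) parsed with Python’s standard robotparser module, which answers the same “am I allowed to fetch this?” question a spider asks.

```python
# Sketch of what a robots.txt might contain, and how a crawler reads it.
# The Disallow paths and example.com domain are placeholders.
from urllib.robotparser import RobotFileParser

example_robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /search
Allow: /
"""

parser = RobotFileParser()
parser.parse(example_robots_txt.splitlines())

for path in ("/blog/technical-seo", "/admin/login", "/search?q=shoes"):
    allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
    print(path, "->", "crawlable" if allowed else "blocked")
```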

 

[well]Make reasonable efforts to ensure that advertisements do not affect search engine rankings. For example, Google’s AdSense ads and DoubleClick links are blocked from being crawled by a robots.txt file.[/well]

This Google Guideline is not particularly clear, simply because the examples it uses don’t apply in the real world.

What they are getting at here is ensuring that any paid advertisements are not treated as “clean” editorial links to the websites that have paid for them, because paid links that pass ranking value are forbidden under Google’s guidelines for the organic listings.

If you have third-party ad networks included on your site, or links that are paid for by advertisers (whether they are text or image/banner links), then you should either ensure the links carry a nofollow attribute, or block the ads from being crawled with a robots.txt directive.
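As a rough audit, the Python sketch below lists every outbound link on a page that does not carry rel="nofollow"; if a paid placement shows up in that list, it needs a nofollow attribute or a robots.txt block.  The URL is a placeholder.

```python
# Hedged sketch: list the outbound links on a page that are NOT marked rel="nofollow",
# so you can check whether any of them are paid placements.  The URL is a placeholder.
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkAuditor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.followed_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower()
        href = attrs.get("href") or ""
        if href.startswith("http") and "nofollow" not in rel:
            self.followed_links.append(href)


html = urlopen("https://example.com/").read().decode("utf-8", errors="replace")
auditor = LinkAuditor()
auditor.feed(html)
for link in auditor.followed_links:
    print("followed link:", link)   # any paid ad in this list needs nofollow or blocking
```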

 

[well]If your company buys a content management system, make sure that the system creates pages and links that search engines can crawl.[/well]

This should be pretty obvious, but it’s always worth checking before you buy a new content management system.  If search engines cannot read the pages created by the system then they will not index them, and you won’t get any free traffic from them.  It would also make advertising those pages on AdWords much more expensive.

The easiest way to test this on your own site is simply to copy and paste the URL of the page you want to test and check it in Google Webmaster Tools using the “Fetch as Googlebot” function.

If the page you expect to see appears in the results, you know the CMS is producing content that the search engines can read.
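If you want a quick sanity check before opening Webmaster Tools, the Python sketch below requests a page while identifying itself with Googlebot’s user agent string and looks for a phrase you expect to see in the raw HTML.  The URL and the phrase are placeholders, and this is not a substitute for Fetch as Googlebot.

```python
# Quick sanity check that the raw HTML your CMS serves actually contains the content
# you expect a crawler to see.  The URL and the headline phrase are placeholders.
from urllib.request import Request, urlopen

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

request = Request("https://example.com/my-new-page",
                  headers={"User-Agent": GOOGLEBOT_UA})
html = urlopen(request).read().decode("utf-8", errors="replace")

if "My new page headline" in html:
    print("The headline is present in the raw HTML served to crawlers")
else:
    print("The headline is missing - the CMS may only render it with scripts or plugins")
```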

 

[well]Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines.[/well]

This has become particularly important since Google introduced their “Panda” updates a few years ago.

When people run a search on your site, they normally see the results at a URL which is dynamic to the search.  Many CMS systems (including WordPress) also create dynamic pages such as tag or archive pages of your content.  These can produce duplicate content on your website, which is bad for indexing and SEO, even though the pages might be useful for your users.

If your site relies on these kinds of pages for users, but they wouldn’t add any value to visitors arriving from search engines, then it makes sense to stop them being indexed.

If you can do this using a robots.txt file, that is the optimal solution, as it means search engines won’t spend time downloading the pages at all.  If you cannot achieve it with robots.txt, the next best alternative is to include a noindex meta tag in the head of the page itself, to make sure you don’t suffer duplicate content issues.
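As a quick way of confirming that the noindex fallback is actually in place, here is a Python sketch that checks whether a page carries a “noindex” robots meta tag.  The internal search URL is a placeholder.

```python
# Minimal sketch: check whether a page carries a "noindex" robots meta tag,
# the fallback suggested above when a URL can't be blocked via robots.txt.
# The URL is a placeholder for one of your own auto-generated pages.
from html.parser import HTMLParser
from urllib.request import urlopen


class RobotsMetaFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True


html = urlopen("https://example.com/?s=internal+search").read().decode("utf-8", errors="replace")
finder = RobotsMetaFinder()
finder.feed(html)
print("noindex present" if finder.noindex else "page is indexable")
```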

 

[well]Test your site to make sure that it appears correctly in different browsers.[/well]

This might seem obvious to some web developers, but if you are new to building or marketing websites you should be aware that even if your site looks perfect to you, it might look terrible to other people depending on how they are accessing it.

All webpages are rendered on your device’s screen by interpreting code (normally HTML and CSS), and different browsers can display things differently.

Always try to check your website in as many browsers as possible; the common ones are Chrome, Firefox, Internet Explorer, Safari and Opera.  All of these are desktop or laptop browsers, but you should also be aware that a lot of people in 2014 access your site from mobile devices, such as tablets or phones, which may render your pages slightly differently again.

 

[well]Monitor your site’s performance and optimize load times. [/well]

In the last couple of years Google has started using site speed as a factor in its ranking algorithm, but site speed should be a prime consideration even without that.

The faster your site loads, the more people will consume its content.  Faster sites have lower bounce rates. Faster ecommerce sites also have higher conversion rates.  Of course, the faster a site is the easier it is for search engines to crawl as well, so you will also benefit from better indexation and potentially better rankings.

There are many ways to speed up your site, but making sure it’s efficient is really important, so get the basics right, such as serving optimised images, using compression and so on.
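If you just want a rough first measurement before reaching for proper profiling tools, the Python sketch below times how long a few pages take to download (network transfer only, no rendering).  The URLs are placeholders.

```python
# Very rough load-time check: how long does each page take to download?
# This measures network transfer only, not rendering.  The URLs are placeholders.
import time
from urllib.request import urlopen

for url in ("https://example.com/", "https://example.com/blog/"):
    start = time.perf_counter()
    body = urlopen(url).read()
    elapsed = time.perf_counter() - start
    print(f"{url}: {len(body) / 1024:.0f} KB downloaded in {elapsed:.2f}s")
```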