So let’s get Technical, Technical.
The aim of the technical SEO audit is to identify opportunities to improve or optimise the accessibility of content to crawling by search engine robots.
Statement of Purpose
This is highly important as the rest doesn’t matter unless we know where it is we’re trying to get to. The key question is what the client is trying to achieve for the business with the website. Broken down further, how does the website go about serving that purpose? What kind of content is on there?
Do a site: search
Not because you literally want to count what Google shows as the number of indexed pages, but to be generally curious about how the pages appear in the SERPs. It can tell you a lot about how the site is optimised in terms of tagging, the hierarchy of page content, URL formation and so on, all from one command.
Levels of priority
In many cases you can think of several different levels of auditing:
1. Dangerous: where website configurations block or prevent search engines from accessing your content
2. Moderate: the basics are in place but there are things that could be done to prevent duplicate content caused by technical issues
3. Advanced: checking no PageRank is being left on the table via 404 error pages
Whether the search engine chooses to include the content in the index (i.e. indexing) is in my opinion a matter for the content audit. For now though we’re simply concerned with crawling and site health. So some key issues to consider:
A major fashion label managed to kill its search engine traffic by regressing to an older version of its content management system (CMS), uploading a robots.txt containing User-agent: * Disallow: / i.e. blocking all robots from accessing its content. This is the most instant way of ensuring your website content is inaccessible to search engines. Make sure you allow Google and other desired search engines to access the desirable content.
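If you want to automate that check during an audit, here is a minimal sketch using Python’s standard library; the domain is a hypothetical placeholder and real robots.txt files can contain more nuanced rules than this catches.

import urllib.request

# Fetch robots.txt and flag a blanket "Disallow: /" under "User-agent: *".
# The domain is a placeholder; swap in the site being audited.
url = "https://www.example.com/robots.txt"
with urllib.request.urlopen(url) as response:
    lines = response.read().decode("utf-8", errors="replace").splitlines()

applies_to_all = False
for line in lines:
    rule = line.split("#")[0].strip().lower()   # drop comments and whitespace
    if rule.startswith("user-agent:"):
        applies_to_all = rule.split(":", 1)[1].strip() == "*"
    elif applies_to_all and rule.replace(" ", "") == "disallow:/":
        print("Warning: robots.txt blocks all robots from the entire site")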
Since we’re discussing robots, the robots meta tag doesn’t prevent crawling but it does prevent indexing:
<meta name="robots" content="noindex, nofollow">
Placed in the <head> section, this will instruct search engines not to index the content. More detail on how Google treats the meta robots tag may be found here: http://googlewebmastercentral.blogspot.co.uk/2007/03/using-robots-meta-tag.html
• The X-Robots-Tag header – this is not included in the example above. If the HTTP header contains X-Robots-Tag: noindex,nofollow or X-Robots-Tag: nofollow or X-Robots-Tag: noindex then the search engines will not index the page. Again we’re going slightly OTT here as explained earlier.
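To check both in one pass, a rough sketch along these lines can help (standard-library Python, a hypothetical URL, and a naive string match rather than a full HTML parse):

import urllib.request

# Check a page for both indexing blockers: the X-Robots-Tag HTTP header
# and the meta robots tag in the markup. The URL is a placeholder and the
# markup test is a crude string match, so treat hits as prompts to look closer.
url = "https://www.example.com/some-page"
with urllib.request.urlopen(url) as response:
    x_robots = response.headers.get("X-Robots-Tag", "")
    html = response.read().decode("utf-8", errors="replace").lower()

if "noindex" in x_robots.lower():
    print("X-Robots-Tag header contains noindex:", x_robots)
if 'name="robots"' in html and "noindex" in html:
    print("Markup appears to contain a noindex meta robots tag")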
There may be pages you’d like to prevent from being accessed that have zero benefit to the public, such as admin pages. If you really want to prevent search engines from accessing such pages, then either block by IP address if you maintain such a list or, more likely, password-protect them.
The good news is Google Webmaster Tools has the functionality to trend by date when the domain couldn’t be resolved or the server couldn’t be connected to, so both DNS problems and server outages are worth monitoring. Some sites discuss page speed in the context of technical audits; however, the search engine merely needs to be able to connect to the server successfully in order to access and download the content, therefore we will deal with page loading times in the Content Audit features piece.
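A very quick connectivity spot-check can be scripted too; this sketch assumes Python’s standard library and a placeholder hostname, and is no substitute for ongoing monitoring:

import socket

# Can the hostname be resolved, and does it accept a TCP connection on 443?
# The hostname is a placeholder; this is a one-off spot check.
host = "www.example.com"
try:
    ip = socket.gethostbyname(host)
    with socket.create_connection((ip, 443), timeout=5):
        print(f"{host} resolves to {ip} and accepts connections")
except OSError as error:
    print(f"Connection problem for {host}: {error}")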
The HTTP Status Code entry describes the HTTP response code returned by the server. The response code matters because it tells the search engines about the status of the page. In the example above the HTTP response code is HTTP/1.1 200 OK.
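To sweep a list of URLs for their status codes, something like this works as a starting point (standard-library Python with placeholder URLs; note that urlopen follows redirects by default, so 301s and 302s are surfaced separately in a later sketch):

import urllib.request
import urllib.error

# Report the HTTP status code returned for each URL in a sample list.
# The URLs are placeholders. urlopen follows redirects automatically,
# so a redirected URL will report the status of its final destination.
urls = [
    "https://www.example.com/",
    "https://www.example.com/old-page",
]
for url in urls:
    try:
        with urllib.request.urlopen(url) as response:
            print(url, response.status)
    except urllib.error.HTTPError as error:
        print(url, error.code)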
The problem is the search engines may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content. As a result, the search engines may consume much more bandwidth than necessary, which may impact website performance and dilute the authority of the domain away from commercially valuable target webpages (more information is given here). This is not a one-off audit activity: websites add and retire content over time, so this is a continuous issue that must be monitored, analysed and resolved, typically by permanent redirection.
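A simple way to surface these in crawl data is to group URLs after stripping query strings; a minimal sketch, assuming Python’s standard library and placeholder URLs standing in for a crawl export:

from collections import defaultdict
from urllib.parse import urlsplit

# Group crawled URLs by host and path, ignoring query strings such as
# session IDs or tracking parameters, to surface likely duplicates.
crawled = [
    "https://www.example.com/shoes?sessionid=123",
    "https://www.example.com/shoes?sessionid=456",
    "https://www.example.com/shoes",
]
groups = defaultdict(list)
for url in crawled:
    parts = urlsplit(url)
    groups[(parts.netloc, parts.path)].append(url)

for variants in groups.values():
    if len(variants) > 1:
        print("Possible duplicates:", variants)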
The above technologies may indeed facilitate a more interactive experience for users once they arrive on the site. The question is whether the search engine can actually access the content. If not, the likelihood is the visitors are not coming from search engines! One quick check is to use the search engine’s text-only cache to see if the content is visible. This not only applies to text and image alt text but also to menu links. After all, if your navigation is in Flash, the chances are the search engines can’t access the content reliant on Flash links alone.
XML sitemaps in my view are a mere formality. HTML sitemaps on the other hand are important because humans are likely to click on them, so it makes sense to have one that is sensible in size (i.e. not too many links, whatever that magic number may be), and if the site is large then section-specific sitemaps are even more helpful in signposting users and search engines. Essentially, the website navigation is a sitemap in itself: the web design should be logical and intuitive enough that the user knows where they are in the website and can easily move around without much need to resort to an HTML sitemap, which brings this particular audit issue into question. After all, if the user has to resort to an HTML sitemap – is the web design for navigation really working?
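If you do want a quick sanity check on the XML sitemap while you are there, a minimal sketch (standard-library Python, placeholder sitemap location) could look like this:

import urllib.request
import xml.etree.ElementTree as ET

# Count the URLs listed in an XML sitemap as a quick sanity check.
# The sitemap location is a placeholder; sitemap index files would
# need an extra pass over their child sitemaps.
sitemap_url = "https://www.example.com/sitemap.xml"
with urllib.request.urlopen(sitemap_url) as response:
    root = ET.fromstring(response.read())

namespace = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
locs = [element.text for element in root.iter(namespace + "loc")]
print(f"{len(locs)} URLs listed in the sitemap")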
“Valid HTML” does not necessarily mean the same as “standards compliant HTML” – it is not necessary for the site to conform to HTML5 or XHTML standards. Instead the site should avoid errors in structure or HTML tag formation that prevent the engines from parsing the page, and especially those that could stop engines from distinguishing between code and content. Some serious HTML coding issues can stop search engines from properly reading and navigating a site, and it’s vital that:
• All HTML blocks are correctly opened and closed. A <head> tag needs a </head> tag and a <script> tag needs a </script> tag, for example. Be especially careful of accidentally ending a block with errors like <script> or </scipt> instead of </script>.
• All HTML parameters are correctly opened and closed, and ideally demarked by inverted commas when appropriate. <a href=test.html> and <a href="test.html> should be <a href="test.html">. Watch out for stray quotation marks in copy – for example <a href="test.html" title="This is a 30" flat screen TV"> should encode the inch mark, i.e. <a href="test.html" title="This is a 30&quot; flat screen TV">.
• All HTML tags should be correctly closed. <a href="test.html"<img src="image.gif"></a> should be <a href="test.html"><img src="image.gif"></a>.
From an SEO point of view it is not necessary for the site to conform to any particular standard (HTML5 / XHTML for example), but validating against these standards is a good way to be sure that there are no errors. Like web browsers, search engines are extremely good at dealing with bad HTML code. As with browsers, though, not all broken code can be read, and the errors that search engines cannot deal with are not necessarily the same as those that browsers cannot deal with. A page can look fine in a web browser but be unreadable by a search engine. It is therefore highly recommended (as per web development best practice) that the site uses fully valid HTML. Schema.org and microformats will be dealt with in the Content features post.
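A proper validator is the right tool here, but a rough programmatic pass over mismatched tags can act as a first filter; a minimal sketch with Python’s standard library, far less thorough than a real validator and deliberately ignoring void elements:

from html.parser import HTMLParser

# Rough well-formedness check: track opening and closing tags and report
# any closing tag that doesn't match the most recently opened one.
# Void elements never take a closing tag, so they are skipped.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}

class TagBalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            print("Mismatched closing tag:", tag)

checker = TagBalanceChecker()
checker.feed('<a href="test.html"><img src="image.gif"></a>')  # sample markup
if checker.stack:
    print("Unclosed tags:", checker.stack)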
Character Sets and Language Tags
Each page on the site should start with a DOCTYPE definition describing the code and language used on the page. An example might be:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
It should also include a Content-Type description denoting the type of character set used on the page. This should be held within the <head></head> tags. An example might be as follows, for a page using an ISO-8859-1 character set:
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">
Finally, a META description to explain the language of the content should be used:
<meta http-equiv="Content-Language" content="en">
Along with IP address and domain name, on-page language declarations are used to geo-target content to search engine users. HTML coding and SEO best practice should be followed and the language of each page should be declared. A common finding is pages that specify only the following DOCTYPE and namespace:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
This means that no Content-Language meta tag, html lang attribute or Content-Type meta tag is included.
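A quick scripted check for these declarations might look like this (standard-library Python, placeholder URL, simple string matching rather than full parsing):

import urllib.request

# Check the HTTP header and the markup for charset and language declarations.
# The URL is a placeholder and the markup checks are naive string matches.
url = "https://www.example.com/"
with urllib.request.urlopen(url) as response:
    content_type = response.headers.get("Content-Type", "")
    html = response.read().decode("utf-8", errors="replace").lower()

print("Content-Type header:", content_type)
print("charset declared in markup:", "charset=" in html)
print("language declared in markup:",
      "content-language" in html or "<html lang=" in html or ' lang="' in html[:300])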
Root Folder Redirects
The majority of a website domain’s authority and relevancy resides in the root folder, as the root domain is the index of all the major content and is usually the most referenced web address. Therefore, for competitive ranking and commercial traffic purposes, all content should reside in the root domain, with links to sub-folders and sub-pages. The main domain at [home-page-URL] should resolve with a 200 HTTP status, loading the homepage content. The search engines treat content as being of lower importance the more distant it is from the root domain / home page i.e. [home page-URL]
So once we have got past the issues that are usually self-inflicted, we can start looking at overlooked basics such as canonicalisation. Canonicalisation is a fancy word for consolidating authority on a single preferred version of a URL.
The most common canonical problems occur from sites serving both www and non-www versions, which means the site has an exact mirror of its existing content. Other issues to look out for are duplicate URL strings caused by internal site search parameters, affiliate link parameters, session IDs and ecommerce product colour variations.
With all canonicalisation solutions, be it an HTTP 301 redirect, rel="canonical", or a 301 issued from PHP, it’s important they are applied properly and not all pointed at the home page unless there is a valid reason.
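The www / non-www check is easy to script; this sketch uses Python’s standard http.client (which does not follow redirects, so the hop itself stays visible) and a placeholder domain:

import http.client

# Confirm that the bare (non-www) hostname issues a permanent 301 redirect
# to the www version rather than serving a duplicate copy of the site.
# The domain is a placeholder.
connection = http.client.HTTPSConnection("example.com", timeout=10)
connection.request("HEAD", "/")
response = connection.getresponse()
print("Status:", response.status)                 # 301 expected for a clean setup
print("Location:", response.getheader("Location"))
connection.close()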
A link which shows the full URL of the page being linked to is an absolute link. Links that only show relative link paths, instead of having the entire reference URL within the href attribute of the <a> tag, are known as relative links. Due to canonicalisation (i.e. the most authoritative URL a search engine will index in preference to other URL variants) and hijacking-related issues it is preferable to use absolute links over relative links. The search engines will use internal linking structures and links from external sites to determine which version of a URL is the canonical URL. A more detailed article on this is planned.
Temporary 302 redirects are used by webmasters to redirect users and search engines from one URL to another. These redirects do not pass PageRank (as far as we can guess) and are only appropriate for pages that will become available again at a later date, for example while content is being rewritten. For SEO it is generally better to use 301 permanent redirects, as only the new URL is indexed and link popularity is transferred from the old URL to the new one. Used inappropriately, however, the search engines may not follow the redirect, and the accrued authority of the source URL will not be redistributed back to the remaining live site pages. This will impact the search engine rankings of the remaining site pages.
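To see whether a given redirect is a 301 or a 302, and where a chain ends up, a hop-by-hop sketch like this can help (standard-library Python, placeholder start URL, hop count capped):

import http.client
from urllib.parse import urljoin, urlsplit

# Follow a redirect chain manually and report whether each hop is a
# permanent (301/308) or temporary (302/303/307) redirect.
# The start URL is a placeholder; the loop is capped at five hops.
url = "http://www.example.com/old-page"
for _ in range(5):
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    connection = conn_class(parts.netloc, timeout=10)
    connection.request("HEAD", parts.path or "/")
    response = connection.getresponse()
    print(url, "->", response.status)
    location = response.getheader("Location")
    connection.close()
    if response.status not in (301, 302, 303, 307, 308) or not location:
        break
    url = urljoin(url, location)   # Location headers may be relative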
Check that duplicate versions of the content are not appearing under the HTTPS version of the site – ecommerce sites are particularly likely to be susceptible to this issue.
Back up data
Like any good report, there should be evidence for the findings in the form of an appendix listing all the URLs and any other helpful data to back up the audit findings in their respective areas.
Being a whizz at SEO is one thing, but as a good friend and industry peer, Dan, once agreed with me, it’s important to transform knowledge and ideas into consultancy, i.e. by communicating the relevance in a way the client can understand and appreciate. The approach to the audit must also explain the technical issues in the following manner:
Priority – How important is it?
What is it – Explaining the issue
Diagnosis – The location and extent of the issue
Analysis – Why is it happening
Bespoke Solutions – Suggest a range of bespoke solutions, because the ideal one, especially for a corporate website, is likely not to be implemented
Impact – The likely improvement or increase in crawlability as a result of the solution suggested.
EXCEPTIONS TO THE RULES
Of course, like anything in life, especially once you become familiar with technical SEO challenges – whether through working on more and more varied projects or a single but highly complex one – there will be times when you want to intentionally break the rules.