Neatness of Code:
Code should be neat and simple. It doesn't have to conform to W3C standards: Google's own pages don't validate, and it is estimated that only 3% of all sites actually do (as a side note, around half of the sites that claim to conform don't either). With so few sites meeting these standards, how could Google (or any other engine, for that matter) offer decent results if it ignored 97% of the web? Keeping code clean does include keeping all generic information, such as style sheets and scripts, in separate files.
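As a minimal sketch of what this separation looks like (the file names here are placeholders, not a recommendation), the page head references the style sheet and script rather than embedding them:

```html
<!-- Illustrative only: file names and paths are placeholders -->
<head>
  <title>Example Page</title>
  <!-- Styles and scripts live in separate files, not inline in the page -->
  <link rel="stylesheet" href="/css/main.css">
  <script src="/js/main.js"></script>
</head>
```

Besides keeping the markup lean, this lets browsers cache the shared files across pages.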
Robot Tags and Text File:
Often you may not wish for the search engine bots to index certain pages. For a single page you can easily add a robots meta tag with the noindex value. For whole directories you can simply list them in your robots.txt file, located in the root of your website (http://www.website.com/robots.txt). Why is this important for design? Crawl rate and indexing are a concern for all departments. Remember I mentioned that all generic information should be kept separate? This way you can simply block those directories with the robots.txt file, so the search engines will index your actual content pages instead of spending crawl time attempting to read your styling code.
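A sketch of the directory approach, assuming the style sheets and scripts have been moved into directories like those below (the names are illustrative):

```text
# robots.txt - served from http://www.website.com/robots.txt
# Directory names are placeholders for wherever your generic files live
User-agent: *
Disallow: /css/
Disallow: /js/
```

For the single-page case, the equivalent is placing `<meta name="robots" content="noindex">` in that page's head section.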
Do not replicate content on multiple pages: it's a waste of time and effort, and it dilutes keyword value. While the duplicate content “penalty” is a myth, duplication does confuse the search engine as to which page is more important, or even which page came first. Imagine someone giving you the key to a Ferrari and telling you it's the key to the red Ferrari parked outside. Now imagine there are 10 red Ferraris parked outside! Which one does the key fit? If there is only one Ferrari, the choice is easy. Usually the page that is indexed first is the one credited with being unique; the other pages simply dilute their keywords and purpose. Personally I've always aimed for one keyword per page, which also lends itself to long-tail combinations working on a single page.
While it is commonly accepted that the major search engines ignore boilerplate content (such as standard navigation, headers and footers), it has since been suggested that you can explicitly point out which sections Google should ignore. This doesn't seem to be in mainstream use just yet, and I am sure it won't make much of a difference, as it remains open to abuse - like so many other on-page factors.
URLs (or URIs) can make a difference when it comes to ranking. As mentioned before, people may link to a page (home or internal) using the actual URL as the link text, and since anchor text is vital for ranking a page, it makes sense to include keywords in your URL. Long gone are the days when URLs were dynamic and half of them were littered with strange characters and session IDs (a massive source of duplicated pages).
In addition to duplicating pages, session IDs and multiple variables can cause a search engine spider to become trapped in deep pages. Once trapped, a spider will leave the website, which may result in your more important pages not being indexed. We can now specify the preferred URL of a page through the use of a specific tag in the page header. In this instance the search engines that support it (Google & Ask.com) will ignore session variables (or any others you may have generated) and index only the page as you specify.
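This header tag is the canonical link element; a minimal sketch, with an illustrative URL standing in for your preferred version of the page:

```html
<!-- Goes in the <head> of every variant of the page; the URL is a placeholder -->
<link rel="canonical" href="http://www.website.com/products/widget">
```

Whatever session variables get appended to the address, the engines that honour the tag treat the page as living at the canonical URL.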
The easiest way for both humans and bots to get from page to page is through links, but not all links were created equal. Links hidden in Flash, images or scripts may look good to a human but be impossible for a search engine bot to read. Content remains king, and while community (social media) has recently been crowned queen, it is the text link that remains everyone's servant. On your own website you can use your desired anchor text to describe the page you are linking to.
From another website, if a link to a website is a vote, then the anchor text tells you what they are voting for. Because so many webmasters, bloggers and publishers link to pages using the URL as the link text, it becomes quite clear just how valuable it can be to include your desired keywords in your URL. However, no matter how hard you try, you will always have broken links pointing to your site, whether from a typo or because you've moved a page (or restructured the website), so a custom 404 page is crucial. When rebuilding a website and changing your URL structure, it is advisable to 301 (permanent redirect) each old URL to its corresponding new one.
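On an Apache server, one common way to set up such a redirect is a line in the site's .htaccess file; this is a sketch with placeholder paths, and other servers have their own equivalents:

```text
# Apache .htaccess sketch - old and new paths are illustrative
Redirect 301 /old-page.html http://www.website.com/new-page
```

The 301 status tells the engines the move is permanent, so the old URL's link value is passed on to the new one rather than lost.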
Forms and Restricted Pages:
Don’t hide your great content behind forms and logins. Robots can’t fill these in and won’t be able to reach those pages; simply put, they won’t know the content exists. There are ways around this, but why make it difficult for the robots, or for humans, who are becoming more and more reluctant to part with personal information on the web?
Create an XML sitemap for robots (often simply referred to as a Google Sitemap). If you have many pages, consider breaking these down into themes. At present I prefer to set up a static XML sitemap for the pages that won't change, and a dynamic XML sitemap for listings, products and anything else that changes on a regular basis.
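A minimal sketch of such a sitemap, following the standard sitemap protocol (the URL and change frequency are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page; loc is required, changefreq is a hint -->
  <url>
    <loc>http://www.website.com/</loc>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
```

For the static/dynamic split described above, a sitemap index file can list both sitemaps so the engines fetch each on its own schedule.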
An HTML or plain-text sitemap for humans can be a perfect place to get all those keywords in, either in the link itself or next to it. It is also an easy way for a visitor to find anything listed on the website. Make sure this page is easily accessible from the homepage.
It is reported that Google uses over 200 criteria when ranking a website. Many of those aren’t part of the design, but a few that are include:
- Keep code to the minimal required
- Minimise the use of code that search engines can’t read (hide it when possible)
- Unique content - keep navigation consistent
- Use descriptive URLs
- Keep unique URL's
- Descriptive internal linking
- Use text links to reach all of your pages
- Custom 404 page
- Don’t hide great content behind forms and login pages
- Use XML Sitemaps for the search engines
- Use a descriptive HTML sitemap