There are thousands of websites and web applications that never show up in search engine results, or are just poorly ranked. Beautiful websites and useful information get lost in the void, and we never reach them just because the authors didn't read the f* manual. Actually, there is a manual that helps me every time I need to optimize for search engines. Still, there are a few basic and essential rules we need to focus on.
The mandatory rules and tags
Make sure you section the site properly and that your URLs represent what the user will actually see. URLs are meant to be parsed by browsers, of course, but what if a human reads them? Nowadays everyone deals with this kind of address, even your mom. Your home address described where you live long before Google Maps existed, so why shouldn't your URLs?
Do not generate throwaway URLs. It's easy to generate new URL slugs to bypass the user's browser cache, but if you do, the URL that someone shared on Facebook yesterday could be gone forever! That's not pretty: visitors hit a broken link, and search engines eventually drop the dead address along with whatever ranking it had earned.
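A readable, stable URL can come from something as simple as a slug function: the same title always yields the same address, with no timestamps or random tokens that would break shared links. The following is a minimal Python sketch, not how any particular site does it; the `/<section>/<slug>` layout and the names `slugify` and `article_url` are hypothetical.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a page title into a readable, stable URL slug.

    The same title always produces the same slug, so links shared
    yesterday keep working today.
    """
    # Decompose accented characters and drop the accents ("São" -> "Sao").
    normalized = unicodedata.normalize("NFKD", title)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    # Trim hyphens left over at either end.
    return slug.strip("-")


def article_url(section: str, title: str) -> str:
    """Build a hypothetical /<section>/<slug> path for an article."""
    return f"/{slugify(section)}/{slugify(title)}"


print(article_url("Blog", "Robots.txt & Sitemaps: The Essentials"))
# /blog/robots-txt-sitemaps-the-essentials
```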
Be sure that every page has its title and meta description tags filled in with proper information. All the terms used there will be (re)used by the search engines.
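A quick way to catch pages that slipped through is to scan their HTML for those two tags. A minimal sketch using Python's standard `html.parser`; the `missing_seo_tags` helper is hypothetical and only checks that the tags exist and are not empty, not whether their content is any good.

```python
from html.parser import HTMLParser


class HeadTagChecker(HTMLParser):
    """Collects the <title> text and the meta description of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values may be None.
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def missing_seo_tags(html: str) -> list[str]:
    """Return the names of the basic SEO tags that are missing or empty."""
    checker = HeadTagChecker()
    checker.feed(html)
    checker.close()
    missing = []
    if not checker.title.strip():
        missing.append("title")
    if not checker.description:
        missing.append("meta description")
    return missing


page = """<html><head>
<title>Robots.txt and sitemaps</title>
<meta name="description" content="The SEO files nobody sees.">
</head><body></body></html>"""
print(missing_seo_tags(page))  # []
```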
But of course, you already did all this well. There is still one thing we forget almost every time: what we don't actually see, usually the robots.txt and the sitemaps.
The robots.txt is an old file used by search engine crawlers (the robots) to learn where they may crawl and which places to avoid. The aim of crawling a site is to find information, not to abuse it in terms of load or to access restricted areas that the owner doesn't want to be public. What we can specify in the robots.txt:
- Permitted user agents
- Rules for specific user agents
- Allowed and disallowed paths
- Sitemap locations
We can even specify that Yahoo may crawl one of our sections while Google may not (we truly don't want this, but it's possible).
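To see how such per-agent rules are read, Python's standard `urllib.robotparser` can evaluate a robots.txt without any network access. The rules below are a hypothetical example of the Yahoo-versus-Google case (Slurp is Yahoo's crawler, Googlebot is Google's). Keep in mind that this parser applies the first matching rule, which is simpler than Google's own most-specific-rule matching, so treat it as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: Yahoo's crawler (Slurp) may crawl /blog/,
# Googlebot may not, and every other crawler may crawl everything.
ROBOTS_TXT = """\
User-agent: Slurp
Allow: /blog/

User-agent: Googlebot
Disallow: /blog/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so nothing is downloaded.
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.example.com/blog/first-post"
for agent in ("Slurp", "Googlebot", "Bingbot"):
    print(agent, parser.can_fetch(agent, url))
# Slurp True
# Googlebot False
# Bingbot True
```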
Cloudoki's robots.txt as an example:
User-Agent: *
Allow: /
Sitemap: http://cloudoki.com/sitemap.xml
Because we are nice guys, every user agent can crawl our site, and we lend a hand by pointing out where the richest content starts: the sitemap index.
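The same parser can confirm what a crawler picks up from the file above, including the Sitemap line. A small sketch; `site_maps()` needs Python 3.8 or later, and `AnyBot` and the page path are made up.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt above, one directive per line.
CLOUDOKI_ROBOTS = """\
User-Agent: *
Allow: /
Sitemap: http://cloudoki.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(CLOUDOKI_ROBOTS.splitlines())

# site_maps() returns the listed Sitemap URLs, or None if there are none.
print(parser.site_maps())  # ['http://cloudoki.com/sitemap.xml']
# The catch-all "*" entry lets any crawler fetch any path.
print(parser.can_fetch("AnyBot", "http://cloudoki.com/some/page"))  # True
```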
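Keep in mind that well-behaved crawlers request /robots.txt before anything else. If it's missing, they get a 404; Google, for instance, then assumes there are no restrictions at all, and you lose the chance to point crawlers to your sitemaps. So do not forget to add at least a simple robots.txt.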
In my opinion, sitemaps are one of the most important aspects of SEO work. With sitemaps you can do magic! There is a lot to talk about, but I'll focus on the essentials: the sitemaps themselves, indexes and performance.
Basically, with sitemaps you describe your site's navigation for the web robots (crawlers): which sections, articles and images to crawl and index first. In the simplest one, you can describe the URL, the last modification date, the update frequency and the priority of each page the crawler can reach.
This is a simple example of a sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
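Rather than writing this XML by hand, it can be generated from your page list. A minimal sketch with Python's `xml.etree.ElementTree`, assuming the pages come as hypothetical `(loc, lastmod, changefreq, priority)` tuples; it reproduces the example above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemap(pages):
    """Build a <urlset> sitemap from (loc, lastmod, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq, priority in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the text for us.
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. 2005-01-01
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = f"{priority:.1f}"  # 0.0 to 1.0
    return XML_DECLARATION + ET.tostring(urlset, encoding="unicode")


sitemap = build_sitemap([("http://www.example.com/", "2005-01-01", "monthly", 0.8)])
print(sitemap)
```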
Sometimes it's not easy to describe all the sections and content in a single sitemap, mainly because the protocol limits the size of each one (at most 50,000 URLs and 50 MB uncompressed). For that reason there is a sitemap index format:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2004-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml.gz</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>
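Splitting a large site into several sitemaps and pointing an index at them can be sketched like this. It assumes the 50,000-URL limit is the binding one (it does not measure the 50 MB size limit) and reuses the `sitemapN.xml.gz` naming from the example above; `chunk` and `build_index` are illustrative names.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit of the sitemaps.org protocol


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(base_url, sitemap_count, lastmod):
    """Build a <sitemapindex> pointing at sitemap1.xml.gz ... sitemapN.xml.gz."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, sitemap_count + 1):
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/sitemap{n}.xml.gz"
        ET.SubElement(entry, "lastmod").text = lastmod
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(index, encoding="unicode")


urls = [f"http://www.example.com/article/{i}" for i in range(120_000)]
parts = chunk(urls)
print([len(p) for p in parts])  # [50000, 50000, 20000]
print(build_index("http://www.example.com", len(parts), "2005-01-01"))
```

Each chunk would then be written out as its own sitemap file, with the same name the index points to.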
A nice pattern is to create a sitemap for each content type: web pages, articles, images, etc. There are a few content formats that you should describe separately, so the search engines can classify the content and do nice things with it, like Google News.
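As one example of a separate content sitemap, Google supports an image extension to the protocol, declared with its own namespace (`http://www.google.com/schemas/sitemap-image/1.1`). The sketch below builds one with `ElementTree`; the page-to-images mapping and the URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"  # Google's image extension

# Serialize the sitemap namespace as the default and the extension as "image:".
# Note: register_namespace changes a module-wide prefix map.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """Build a sitemap listing the images of each page.

    `pages` maps a page URL to the list of image URLs shown on it.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


print(build_image_sitemap({
    "http://www.example.com/gallery": [
        "http://www.example.com/img/beach.jpg",
        "http://www.example.com/img/sunset.jpg",
    ],
}))
```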
It's good practice to provide gzip (.gz) versions of the sitemaps. They are much smaller to download, and search engines will thank you for the saved bandwidth.
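Compressing a sitemap takes one call to Python's standard `gzip` module. A minimal sketch with a made-up, repetitive sitemap; note that the 50 MB limit of the sitemaps.org protocol applies to the uncompressed file, so gzip saves bandwidth but does not raise the limit.

```python
import gzip


def gzip_sitemap(xml_text: str) -> bytes:
    """Return the gzip-compressed bytes to serve as, e.g., sitemap1.xml.gz."""
    return gzip.compress(xml_text.encode("utf-8"))


# A made-up sitemap; real ones are just as repetitive, so they compress well.
entries = "".join(
    f"<url><loc>http://www.example.com/article/{i}</loc></url>" for i in range(1000)
)
xml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    f"{entries}</urlset>"
)

compressed = gzip_sitemap(xml_text)
print(len(xml_text.encode("utf-8")), "->", len(compressed), "bytes")
# Crawlers decompress it back to the identical document.
assert gzip.decompress(compressed).decode("utf-8") == xml_text
```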
Now you have the essentials: index sitemaps, URL sitemaps and content sitemaps.
As with the robots.txt, don't leave out at least a simple URL sitemap; without one, crawlers can only discover your pages by following links.
There is much more to say about sitemaps; I'll write more about the topic later, including how to take advantage of the full potential of these powerful files.
The Google webmaster tools
Is everything well done? Well... we rarely get everything right the first time. However, Google has a tool that helps us check whether we did: Google Webmaster Tools, now called Google Search Console.
There you will find tools to test whether everything is OK with your robots.txt, sitemaps and indexes. You can actually see whether the sitemaps contain valid content to be crawled, and watch your crawled content grow over time.
Don’t miss it!
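Google's tool is the real judge, but a few of the protocol's rules can also be checked locally before uploading. A rough Python sketch covering only absolute `<loc>` URLs, the seven allowed `<changefreq>` values, the 0.0 to 1.0 `<priority>` range and the 50,000-URL cap; `check_sitemap` is a hypothetical helper, not part of any Google tool.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
MAX_URLS = 50_000


def check_sitemap(xml_bytes: bytes) -> list[str]:
    """Return the problems found in a <urlset> sitemap (empty list if none)."""
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"unexpected root element {root.tag}"]
    problems = []
    urls = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs, more than {MAX_URLS}")
    for url in urls:
        # <loc> must be a full URL, including scheme and host.
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        parsed = urlparse(loc)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"<loc> is not an absolute URL: {loc!r}")
        # Optional fields are only checked when present.
        changefreq = url.findtext(f"{{{SITEMAP_NS}}}changefreq")
        if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
            problems.append(f"invalid <changefreq> {changefreq!r} for {loc}")
        priority = url.findtext(f"{{{SITEMAP_NS}}}priority")
        if priority is not None:
            try:
                value = float(priority)
            except ValueError:
                value = -1.0  # force the range check below to fail
            if not 0.0 <= value <= 1.0:
                problems.append(f"<priority> {priority!r} outside 0.0 to 1.0 for {loc}")
    return problems


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc><changefreq>monthly</changefreq><priority>0.8</priority></url>
  <url><loc>/relative/page</loc><changefreq>sometimes</changefreq><priority>2</priority></url>
</urlset>"""
for problem in check_sitemap(sample):
    print(problem)
```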
Google, the search engine giant, has a guide to help with this task, simply because everyone wins with good SEO. Don't miss the opportunity to read it; it's a short and effective read you must do:
Other good references to keep at hand:
The sitemaps specification