Magento robots.txt
Magento ships without robots.txt functionality. It is useful to add one yourself to tell search engines which parts of your shop they should not crawl. It keeps your JavaScript files and SID parameters out of the index and prevents some duplicate content, which helps your SEO and reduces the load on your server. In this blog post I explain how to set up your own Magento robots.txt, either from an existing example or with an extension. Both solutions are easy to handle.
Manual installation of Magento robots.txt
The Magento robots.txt file we use on many websites has been around on the net since January 2010. I would love to credit the creator of the file, but the original post on the Magento forums is no longer available.
Implementation is easy. Copy the content below into a new file called robots.txt, change the location of the sitemap.xml, and upload the file to the root of your website. Be sure to upload it to the root even if your Magento installation lives in a subdirectory: search engines only read robots.txt from the root of a website.
We commented out the Allow rule for catalogsearch/results, because we use Google CSE for our Magento shops. Read our previous blog post on how to implement Google CSE on your Magento shop.
Since we want search engines to index our product images, we set /media/catalog to Allow and disallow the rest of the directories in /media.
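As a rough sketch of the rules discussed above (not the complete file; the /app/, /js/ and SID entries are typical examples, and the sitemap URL is a placeholder you must replace), the relevant part of the robots.txt looks something like this:

User-agent: *
# Allow product images, block the rest of /media
Allow: /media/catalog/
Disallow: /media/
# Block Magento application and script directories
Disallow: /app/
Disallow: /js/
Disallow: /skin/
# Block session IDs in URLs
Disallow: /*?SID=
# Commented out because we use Google CSE instead of the built-in search
# Allow: /catalogsearch/result/
Sitemap: http://www.example.com/sitemap.xml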
Use an extension for robots.txt instead
If you’re more comfortable working from the Magento backend instead of changing files by hand, there is an extension you can use to generate a Magento robots.txt file: the Robots.txt management tool, which can be downloaded via Magento Connect.
Out of the box this module can generate a robots.txt file for Magento. Via the settings you can alter some main options. After that, go to CMS >> Robots.txt >> Manage and install some standard rules for Magento. You may want to change a few of those standard rules; for example, I like to allow search engines to index /media/catalog/. Out of the box this rule is not present, so you have to add it as a new rule. When you’re done setting up the rules, click the button to generate the robots.txt.
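As an illustration (the exact wording and ordering depend on the extension's output), the extra rule you add should end up in the generated file as something like:

Allow: /media/catalog/
Disallow: /media/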
Reindex robots.txt
Sometimes it can take a while for search engines to pick up a changed Magento robots.txt. In Google Webmaster Tools you can see when Google last fetched your robots.txt. Google should download robots.txt every 24 hours or after 100 visits. If you want Google or other search engines to fetch the updated version sooner, you can set a Cache-Control header in your .htaccess file. Copy the statement below into your .htaccess file.
<FilesMatch "\.(txt)$">
Header set Cache-Control "max-age=60, public, must-revalidate"
</FilesMatch>
This statement tells clients that all .txt files expire after 60 seconds and must be revalidated before being used again. Depending on how often Google crawls your site, it will notice the outdated robots.txt and download a fresh copy. Increase the max-age once you notice that Google uses the new robots.txt, and set it back to 60 seconds the day before you plan to change the file again. It will save resources.
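For example, once the new robots.txt has been picked up you could raise the cache lifetime to a day (the 86400-second value here is only an illustration, not a recommendation from the original post):

<FilesMatch "\.(txt)$">
Header set Cache-Control "max-age=86400, public, must-revalidate"
</FilesMatch>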
Update 18-09-2012: following the blog post Kennisartikel: serverload verlagen by Byte, we have adjusted the Magento robots.txt. URL parameters that we previously blocked are now allowed again; it is up to Google Webmaster Tools to block them.
source of image: www.sxc.hu/photo/1171276