A robots.txt file is a plain text file on your website that search engines use to determine which of your pages you want them to visit and which they should not.
The structure of a robots.txt file is very simple. Essentially, it is a note that tells search engines how you want them to index your pages. The most basic robots.txt file looks like the example below, which allows any search engine to index everything it can find:
User-agent: *
Disallow:
This instruction is broken down into two parts. The first is the User-agent: these are (for the most part) the search engine robots crawling your site. You can structure your robots.txt file to apply rules to specific search engines. For example, you could use the following rule to refer to Bing's robot:
User-agent: bingbot
Disallow:
In most cases, User-agent is followed by a *, which means that the rules apply to all robots.
The second part is the Disallow: directive, which specifies a page or directory you do not want the search engines to index. So the example above is telling Bing that it can access everything, as nothing has been disallowed.
We automatically set up your robots.txt file to be the following:
user-agent: *
Sitemap: http://yourdomain.co.uk/sitemap.xml
disallow: /include/
disallow: /shop/basket_new.php
disallow: /shop/checkout_process.php
disallow: /account/
disallow: /websiteusers.html
To access your robots.txt file as the search engines will, enter your full domain name into the address bar of your browser and add "/robots.txt" to the end of your website's address.
For example - https://www.yourdomain.co.uk/robots.txt
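If you are comfortable running a short script, you can also check how a crawler would interpret your rules before relying on them. The example below is just an optional sketch using Python's built-in urllib.robotparser module; the domain and paths are placeholders, so replace them with your own.
from urllib.robotparser import RobotFileParser

# Placeholder address - replace with your own domain.
robots = RobotFileParser()
robots.set_url("https://www.yourdomain.co.uk/robots.txt")
robots.read()  # downloads and parses the live robots.txt file

# Ask whether a given robot may fetch a given page.
print(robots.can_fetch("bingbot", "https://www.yourdomain.co.uk/account/"))  # False with the default file above
print(robots.can_fetch("bingbot", "https://www.yourdomain.co.uk/"))  # True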
You can fully customise your robots.txt file by following the instructions below:
On your computer, open Notepad (or TextEdit on a Mac)
Use this program to write your new robots.txt file in plain text, without styling or formatting
Save the file as the name: robots.txt
Below are several scenarios you may want for your website and how you can set your robots.txt file to allow this:
1. Allow all search engines access to images
To specify all search engines, add the * symbol as your user-agent, as this represents every robot:
User-agent: *
Allow: /siteimages/
2. Disallow all search engines access to images
User-agent: *
Disallow: /siteimages/
3. Allow only some search engines
If you would like to allow only certain search engines, you will need to specify these, as below:
User-agent: *
Disallow: /sitefiles/
Disallow: /siteimages/
User-agent: googlebot
Disallow:
User-agent: bingbot
Disallow:
In the example above, all search engines are blocked from crawling your files and images apart from Bing (bingbot) and Google (googlebot).
4. Only allow some search engines access to images
If you want only certain search engines to crawl your images, you will need to specify these, as below:
User-agent: *
Disallow: /siteimages/
User-agent: googlebot-image
Disallow:
5. Allow your images to be crawled by Google but not appear in Google Images
If you would like Google to crawl your images but not show them in Google Images, you will need to list the Google Images robot (googlebot-image) in your robots.txt file, as shown below:
User-agent: *
Disallow: /sitefiles/
User-agent: googlebot-image
Disallow: /siteimages/
By specifically listing the Google Images robot you stop your images from appearing in a Google Images search. However, because Google can still crawl them, they may still appear in a Google web search.
6. Disallow some search engines so they cannot crawl anything
If you would like a specific search engine not to crawl your website at all, you will need to add a / symbol, as this represents all of your content:
User-agent: *
Disallow: /sitefiles/
Disallow: /siteimages/
User-agent: bingbot
Disallow: /
In the example above, your robots.txt file is not allowing Bing to access your website, but all other search engines, such as Google, still can!
7. Allow all search engines access to all pages on the site
If you would like to allow all search engines access to everything, you will need to add the following to your robots.txt file:
User-agent: *
Disallow:
8. Disallow search engines access to some pages on the site
If you would like all search engines not to have access to certain pages, you will need to add the page or folder path to your file, as below:
User-agent: *
Disallow: /guestbook/
Disallow: /onlineshop/
Password-protected pages can still be crawled by search engine robots, but they cannot be viewed by a site visitor without a username and password. Because of this, and because such a page is likely to have very little SEO benefit, you could add it to your robots.txt file if you did not want it indexed at all, but this is not necessary, as it cannot be accessed by all visitors.
9. Disallow search engines access to private documents
User-agent: *
Disallow: /sitefiles/27/3/6/273678/contact_form.pdf
Please bear in mind that if you use robots.txt to stop any pages or private documents from being indexed, people can still find the paths listed in your robots.txt file if they look it up. If you do want to restrict access to these resources on your website, we would recommend password protecting the page.
To upload your own robots.txt file and replace the one Create sets up automatically, please follow the steps below:
Click on Content on the top menu
Click on Files on the left hand menu
Click on the green button Add File in the top right-hand corner
Click on the Upload button and choose your file.
Click the green button Upload The File
Publish your website for the change to take effect
Your robots.txt will now be changed.
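If you would like to confirm that your new file is live without opening a browser, the optional sketch below fetches it directly. It uses Python's built-in urllib.request module, and the domain is again a placeholder for your own address.
from urllib.request import urlopen

# Placeholder address - replace with your own domain.
with urlopen("https://www.yourdomain.co.uk/robots.txt") as response:
    print(response.read().decode("utf-8"))  # prints the robots.txt the search engines will now see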
If you have any further questions, please get in touch and we will be happy to help.