Create A Robots.txt File And Increase Your Search Engine Rankings

A robots.txt file is a simple text file that tells search engine bots which parts of your web site they may crawl and which parts they should stay out of. Neil Patel wrote on the Link Building Blog that he created a robots.txt file for his personal blog to keep junk pages and duplicate content out of the search engines. After doing this, he reported that his web site traffic went up 11.3%.

Here are the things Neil did with his blog:

  • removed comment feeds from search results so that no duplicate comment text is indexed
  • stopped trackback URLs from being indexed because they were causing blank pages to be indexed
  • denied search bots access to his blog installation folder (MovableType).

If you are using WordPress, a sample robots.txt file would be:

User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/

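In the above code, Disallow: /wp- blocks every path that begins with /wp-, so search engine bots will not crawl the WordPress files. It already covers the three wp- directories listed above it, as well as files such as wp-login.php.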
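
To see which paths these rules actually cover, the short Python sketch below feeds the sample file to the standard library's urllib.robotparser. The bot name ExampleBot and the test paths are made up for illustration. The parser does plain prefix matching, which is how the rules in this first example are meant to be read.

```python
from urllib.robotparser import RobotFileParser

# The sample WordPress rules from this post.
RULES = """\
User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/
"""


def is_allowed(path, agent="ExampleBot"):
    """Return True if the sample rules let `agent` crawl `path`.

    "ExampleBot" is a hypothetical crawler name used only for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths, chosen to show prefix matching at work.
    for path in (
        "/wp-login.php",
        "/wp-content/uploads/pic.jpg",
        "/feed/",
        "/2007/03/my-post/",
        "/2007/03/my-post/feed/",
    ):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Note that /feed/ and /trackback/ only match paths that start with those strings, so a per-post URL such as /2007/03/my-post/feed/ is not covered by this file.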

Update: Another example of a robots.txt file for WordPress that I found on WordPress.org:

User-agent: *
# disallow files in /cgi-bin
Disallow: /cgi-bin/
Disallow: /comments/
Disallow: /z/j/
Disallow: /z/c/
# disallow all files ending in .php
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow all files in /wp- directories
Disallow: /wp-*/
# disallow all files with ? in url
Disallow: /*?
# disallow any files that are stats related
Disallow: /stats*
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /about/feed/
Disallow: /about/trackback/
Disallow: /contact/
Disallow: /tag
Disallow: /docs*
Disallow: /manual*
Disallow: /category/uncategorized*
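
The second file depends on the * and $ wildcards, which are an extension honoured by Googlebot and some other crawlers rather than part of the original robots.txt convention. As a rough sketch of how such patterns are usually read (* matches any run of characters, a trailing $ anchors the end of the URL), here is a small hand-rolled matcher in Python. The function names and test paths are my own, and real crawlers may differ in the details.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, path):
    """Return True if `path` is covered by the Disallow `pattern`."""
    # Rules are matched from the start of the path, hence re.match.
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    # Hypothetical URLs checked against a few rules from the file above.
    checks = [
        ("/*.php$", "/index.php"),
        ("/*.php$", "/index.php?p=1"),
        ("/*?", "/index.php?p=1"),
        ("/wp-*/", "/wp-admin/index.php"),
        ("/tag", "/tags/foo"),
    ]
    for pattern, path in checks:
        print(pattern, path, matches(pattern, path))
```

Under this reading, /*.php$ alone does not catch /index.php?p=1 because the URL does not end in .php; it is the separate /*? rule that blocks URLs with a query string.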

You can also find an example of a robots.txt file for WordPress here.

If you are not using WordPress, you can substitute the disallow lines with files or folders on your web site that you do not want to be crawled.

After you finish creating the robots.txt file, don’t forget to upload it to your site’s root directory.
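
Crawlers request robots.txt from the root of the host, not from the folder your blog lives in. A small Python sketch (the helper name robots_url and the example.com addresses are only for illustration) shows where the file is expected for any given page:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address a crawler would request for `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page on a blog installed in a /blog/ subfolder.
    print(robots_url("http://www.example.com/blog/2007/03/my-post/"))
```

So even if WordPress is installed in a subfolder such as /blog/, the file still goes in the site root, and the Disallow lines need the folder prefix (for example /blog/wp-admin/).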

I also recommend displaying post excerpts instead of full posts on the homepage, so that the same text does not appear on both the homepage and the single post pages and risk a search engine penalty for duplicated content.

77 responses so far

  1. 1

    Benedict Herold

    March 29, 2007 at 1:58 am

    Hey.. thanks for the heads up… I too face some problem with the comment feed and trackback. Will implement the same in my site too.. Thanks John

  2. 2

    egon

    March 29, 2007 at 2:02 am

    Do I want to include the “Disallow: /wp-content/” in my robots.txt file if I want my pictures to show up in Google Images? I receive a good amount of traffic from that and uploads are in that directory.

  3. 3

    Abdul Aziz

    March 29, 2007 at 3:21 am

    Thanks for readymade robots.txt file for Wordpress blogs. I will implement it as soon as possible.

    For the second tip, for those who are not aware, you can search for the_content() and change it to the_excerpt() in your index.php, archives.php, category.php, etc files via the Theme Editor.

    By default, WordPress shows the first 55 words in an excerpt.

  4. 4

    JohnTP

    March 29, 2007 at 10:57 am

    Benedict Herold- wait for a week and let me know if creating a robots.txt file helped you.

    egon- try adding

    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

    in your robots.txt file to allow Google Image bot to search all images of your site.

    Abdul Aziz- Thanks for the reminder

  5. 5

    Mr.Byte

    March 29, 2007 at 2:07 pm

    When I gave my site link for automatic review in a website, it said there is no robot.txt, now I know what it is, I have to implement it.

  6. 6

    Navjot Singh

    March 29, 2007 at 6:05 pm

    Thanks for the tip John…it definitely helps me. I am using it currently on my blog.

  7. 7

    Madhur Kapoor

    March 30, 2007 at 2:46 am

    Do i need to add the permalinks of all the posts in the file if i want them to be indexed .

  8. 8

    Runa

    March 30, 2007 at 4:01 pm

    From the second one I would remove the following line:

    Disallow: /comments/

    If you have a good spam-detector, there wouldn’t be any problem in allowing search engines indexing comments.

  9. 9

    Deep

    March 30, 2007 at 7:22 pm

    Nice tip but I do not think so it will help to increase the traffic as robots.txt just tells search engines what to scan and what not to. No relation with increase of traffic at all.

  10. 10

    egon

    March 30, 2007 at 7:31 pm

    Search engines look down upon duplicate and unrelated content, so by not letting the robots see that content could help your ranking by making it look more legit.

  11. 11

    Deep

    March 30, 2007 at 7:38 pm

    Nopes, it does not work like that. SE’s work totally different way for duplicate content. Matt Cutts had some details on duplicate content stuff. But as far as robots.txt is concerned I do not see any ways by this it can help to increase the traffic.

  12. 12

    Awsaun Ronald

    March 30, 2007 at 10:35 pm

    John, which of the two examples above do you think is better for WordPress? I have put a robots.txt into my WP install like this:

    # This rule means it applies to all user-agents
    User-agent: *
    # Disallow all directories and files within
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    # The Googlebot is the main search bot for google
    User-agent: Googlebot
    # Disallow Google from parsing individual post feeds and trackbacks..
    Disallow: */feed/
    Disallow: */trackback/
    # Disallow all files with ? in url
    Disallow: /*?*
    Disallow: /*?
    # The Googlebot-Image is the image bot for google
    User-agent: Googlebot-Image
    # Allow Everything
    Allow: /*
    # This is the ad bot for google
    User-agent: Mediapartners-Google*
    # Allow Everything
    Allow: /*

  13. 13

    Garry

    March 31, 2007 at 5:48 am

    John,

    I want to make sure I do the right thing… what is the difference between robots.txt file and doing this in the header file:

  14. 14

    Garry

    March 31, 2007 at 5:49 am

    Sorry.. that didn’t work here you go:

  15. 15

    egon

    March 31, 2007 at 6:01 am

    Well mostly robots.txt is much easier and less time-consuming. With the meta tag you can only say “follow” or “nofollow” and you can’t specify directories without complicated rules. robots.txt is very simple and quick.

  16. 16

    Garry

    March 31, 2007 at 6:03 am

    Thanks man,

    I think I figured out what to do. Tell me if this is correct. I have the robots.txt file uploaded and I have added this line between my head tags:

    I think I got it right?

  17. 17

    egon

    March 31, 2007 at 6:11 am

    No Garry all you have to do is upload your robots.txt file to your root directory.

    To clarify, you can see JohnTP’s robots.txt file by going to https://johntp.com/robots.txt

    So just get an FTP program if you don’t have one, and upload it there.

  18. 18

    Mr.Byte

    March 31, 2007 at 11:33 am

    I was working on this and was wondering if adding ‘Disallow: /tag’ is a good move or not? Can You explain that?

  19. 19

    askApache

    March 31, 2007 at 1:53 pm

    There is a newer and better robots.txt article and example at askapache..

  20. 20

    Robert Irizarry

    March 31, 2007 at 8:34 pm

    I’m missing how this will increase traffic. Can someone elaborate?

  21. 21

    Garry

    March 31, 2007 at 10:59 pm

    Basically you control where crawlers such as the googlebots go. If you go to Google.com and type site:yourname.com you will see a listing of every page Google has indexed on your site. If you control where the bots go, then you control how people find your pages in the search engines.

  22. 22

    Deep

    March 31, 2007 at 11:09 pm

    hmm as mentioned earlier, it does NOT help to increase the traffic because there is nothing in it which will help to increase the traffic. It simply tells search engines what to scan and what to keep protected nothing else.

  23. 23

    Robert Irizarry

    March 31, 2007 at 11:57 pm

    John - I’m curious what you have to say about the matter. Also, are you going to be implementing this as well - maybe reporting on your findings?

  24. 24

    JohnTP

    April 1, 2007 at 11:04 pm

    Madhur Kapoor- No, you do not need to add the permalinks of all your posts in the robots.txt file

    Awsaun Ronald- This is what I have basically added in my robots.txt file

    User-agent: *
    Disallow: /wp-content/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: /cgi-bin/

    I will be doing some tests after a few more days. I will try adding these

    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.txt$

    It may take up to 10 days to take effect.

  25. 25

    JohnTP

    April 1, 2007 at 11:21 pm

    Mr.Byte- If your tag pages don’t show the entire post, you may not need to add ‘Disallow: /tag’ to your robots.txt file, as showing only excerpts already prevents content duplication.

    Check one of my tag pages. It does not show the entire post.

    Also, it is better to not show the entire post on the homepage, so you can prevent content duplication on single post pages and the homepage. Make use of the more feature.

    Robert Irizarry- I have already created a robots.txt file for this site. This is what I have basically added in my robots.txt file

    User-agent: *
    Disallow: /wp-content/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: /cgi-bin/

    I will test further later. I am currently waiting to see the effect of adding the above lines to my robots.txt file.

    About the robots.txt file increasing traffic, I really don’t think it will increase traffic.

    But as far as I know Neil Patel is good at SEO and he says after creating a robots.txt file, his web site traffic went up 11.3%.

    But at the same time my friend Deep,who is good at SEO too, says that robots.txt file won’t help in increasing traffic.

    After a few more days I will be able to see if the robots.txt file helped me.

  26. 26

    Mr.Byte

    April 1, 2007 at 11:35 pm

    John, thanks for the reply. Can you also explain how to activate only the excerpt in the tags page? And how to find out whether the given robots.txt has been taken into consideration and how to identify its impact?

  27. 27

    JohnTP

    April 3, 2007 at 10:11 am

    Mr.Byte- My K2-based theme has always shown excerpts on the tag pages by default.

    It may take up to 10 days for the robots.txt to take effect.

  28. 28

    Patrix

    April 8, 2007 at 7:18 am

    Thanks, John for this tip. I hope you don’t mind that I have used your robots.txt file for my blog as well.

  29. 29

    Habitaquo

    April 10, 2007 at 3:29 pm

    I have a post related to seo for wordpress in spanish, you can go to my site and check that if you want.

  30. 30

    Ronald

    April 13, 2007 at 11:14 am

    Thank’s for your advice John. I’ve edited my robot.txt.

  31. 31

    Ronald

    April 24, 2007 at 11:00 pm

    Hmm my blog got suplement result :(

  32. 32

    shashank

    May 1, 2007 at 4:28 pm

    what about robot.txt for blogger ..
    from the above disallowing the search bots not to index someof the website pages increase traffic…is there any post where the importance of robot.txt file has been explained clearly..

  33. 33

    Robert Irizarry

    May 1, 2007 at 6:27 pm

    Shashank - If you’re on free Blogger, then you don’t have access to place a robots.txt.

    As far as the robots.txt driving traffic, what I gather is that its largely a tool to avoid things like duplicate content which would result in search engines penalizing you. For example, Wordpress users may have multiple ways of getting to the same page - the permalink, the www inclusive address, the non-www inclusive address, the page number, etc. These would appear as duplicate content.

  34. 34

    David Bradley

    May 1, 2007 at 9:15 pm

    Egon, I too see a goodly amount of traffic picking up my images folders (e.g. in wp-content on another site). However, I once checked the actual details of this traffic and discovered that the majority of it was simply images being hotlinked by MySpace users, needless to say, I enabled an antihotlink rewrite in .htaccess to stop that happening. Of course, actual search traffic from Google Images is a different and welcome matter.

    db

  35. 35

    Tech Crunch 2.0

    May 1, 2007 at 9:16 pm

    Nice tip John but I cant use it my blogger though.

  36. 36

    David Bradley

    May 1, 2007 at 10:50 pm

    Lots of comments mentioning how this ain’t poss with Blogspot/Blogger sites. Here’s a kind of workaround for that: http://www.sciencetext.com/blo.....omain.html

  37. 37

    Investing Blog

    May 2, 2007 at 4:32 am

    I tried blocking some files, but Google still indexes the pages. A lot of the posts landed in the supplemental index, which sucks.

    Perhaps I’m blocking too much or doing something wrong?

  38. 38

    Jay

    May 2, 2007 at 4:43 am

    I’ve got to say I really enjoyed this post, I’ve noticed some quirky pages from our blog showing up on some rankings.. Everything from the comments to even the feeds seems to rotate in and out of the Serps.. Anyways we will definitely put this to good use thanks John!

  39. 39

    Dennis Bjørn Petersen

    May 6, 2007 at 9:23 pm

    Thank you very much for this tip. I’ve added a robots.txt to my blog today, so I’m looking forward to how this will work.

    Thank you again!

  40. 40

    SwordMouth

    May 10, 2007 at 11:21 am

    What about blogger.com users? are they sent to hell - lol
    i was asking about robots.txt what about it? we guess have to settle with the rss sitmap

    SwordMouth
    [TechBlo.com]

  41. 41

    SEO

    June 2, 2007 at 2:00 am

    Very good information. Thank you and keep up the good work.

  42. 42

    John T. Pratt

    June 8, 2007 at 7:22 pm

    This is very, very helpful. I see you have it linked as a related post, but you should update this post with the sitemap autodiscover info too, everyone should add this to their robots.txt file asap:

    Sitemap: http://www.jtpratt.com/sitemap.xml

  43. 43

    from home

    June 14, 2007 at 10:36 pm

    should i exclude php too

  44. 44

    Thomas

    June 15, 2007 at 4:33 am

    Thanks a lot for the tipp John. I will use it.
    Regards

  45. 45

    derek

    June 16, 2007 at 10:06 pm

    When I first started I was told that putting your post in multiple categories with the permalink structure set up was a good idea. So if I had a stocks blog and I was talking about Gold I might put the post in three catergories such as Gold Stocks, Hot Stocks, Minerals

    Not the best example but you get the idea, so three links will flow into this post, does that actually hurt me? setting up the Robot in a certain way help me? Sorry if this is a rookie question but that is exactly what I am:)

  46. 46

    Dekoracje

    July 11, 2007 at 3:59 am

    Thanks to all who contributed to this page it’s realy helpful. I found some interesting hints :) Thank’s again and keep up a good work

  47. 47

    Ade Martin

    July 20, 2007 at 11:04 am

    Hi John,

    This is great information and very topical for me as I am trying to drag 40+ pages from the Supplemental Index.

    Quick question. Why disallow php files? I thought the WordPress index page was php so I am confused.

    Thanks,

    Ade

  48. 48

    Kay

    July 31, 2007 at 6:28 pm

    If I installed my WordPress in http://www.k-director.com/blog/
    how do I configure the robot.txt to the most suitable setting?

  49. 49

    TechZilo

    August 4, 2007 at 3:01 pm

    Thanks, I was searching for robots.txt code and went to official sites, but they were not helpful.

  50. 50

    AskApache

    August 11, 2007 at 1:21 am

    JohnTP- Both the example robots.txt from dailyblogtips and the WordPress Codex are from my 1st attempt at creating a robots.txt file. Now I am on my third revision, which I am still checking and tweaking, but yesterday I published it on my blog so check it out..
    http://www.askapache.com/seo/u.....press.html

  51. 51

    suchcenter

    August 20, 2007 at 2:14 am

    do somebody knows if its true that google don´t want the pagrank in the future? is this right? - how long will the pagerank exist? - this year, only next year? greetings from cologne

  52. 52

    Alex

    August 26, 2007 at 8:32 pm

    Does anyone have an opinion on disallowing category on a Wordpress installation? I’ve read that this is done to help eliminate duplicate content but it seems that a category page, with excerpts from all the posts in that category, is useful to readers so it shouldn’t be a problem for Google.

  53. 53

    Web Design Wexford

    September 6, 2007 at 1:54 am

    Search engine positioning, optimization, and increased website traffic are critical elements of a successful Internet business solution. High visibility of your website can make the difference between driving a high volume of sales leads and targeted traffic to your company’s website or being lost in “cyber space.”

    With the burgeoning popularity of the internet, new developmental tools are created daily. With these tools come new challenges, marketing, design, cross-browser transitions, etc. All of these can be a daunting task for those web gurus who aren’t well-versed in the W3 Standards.

  54. 54

    Symbian

    September 25, 2007 at 4:34 pm

    Unfortunately my blogger account doesn’t allow to create custom robots.txt file.

  55. 55

    Ashwini

    September 30, 2007 at 2:58 pm

    Does Anyone know how to change default Robots.txt file of Blogger(I know it sucks) ?

  56. 56

    Suchmaschinenoptimierung

    October 18, 2007 at 3:10 pm

    great post about the robots.txt. I have also a portal, but not so good information about the robost.txt. I will link to your post that my users also get the great informations.

  57. 57

    Udayweer Singh Yadav

    October 25, 2007 at 12:50 am

    This is very useful information about robots.txt. If you do not want to disallow any crawler, then create a blank robots.txt file and upload it to the root.

  58. 58

    Amit N

    October 29, 2007 at 6:23 pm

    Thanks for the nice tip here John, I will implement on my WP blogs.

  59. 59

    property investor

    November 24, 2007 at 3:49 am

    thanks for the ready made robots.txt
    i’m just making some changes to my blog.

  60. 60

    ukmalayalikal

    January 2, 2008 at 2:22 am

    I wasn’t using a robot.txt file on my site http://www.ukmalayalikal.com till now. I have added this file now, let see how it going to help.
    thanks

  61. 61

    Hani

    January 11, 2008 at 5:27 pm

    I didn’t understand what robots.txt all about till I read this post. I just uploaded robots.txt and we’ll see how it goes.

  62. 62

    John

    January 21, 2008 at 2:42 am

    Maybe you should take a closer look at the specification:

    http://www.robotstxt.org/faq/robotstxt.html

    Robotstxt does not allow wildcards except for the User-Agent.

  63. 63

    fedmich

    February 16, 2008 at 10:10 pm

    A little anxious to try this. I guess I’ll rely on askapache’s robots.txt

    Thanks anyway, nice idea :)

  64. 64

    Adi

    April 30, 2008 at 1:32 am

    thank you for this article

  65. 65

    venkat

    July 8, 2008 at 12:33 am

    this was fairly simple and straight forward !

  66. 66

    Paloika (tech magazine)

    July 20, 2008 at 11:08 pm

    While i do believe that Robots.txt is a great tool. but it will perform better on large blogs with a lot of pages

Copyright ©2005-2008 JohnTP, All rights reserved.