Create A Robots.txt File And Increase Your Search Engine Rankings
The robots.txt file is a simple text file that tells search engine bots which pages on your web site they should and should not crawl. Neil Patel wrote on the Link Building Blog that he created a robots.txt file for his personal blog so he could keep junk pages and duplicate content out of the search engines. After doing this, his web site traffic went up 11.3%.
Here are the things Neil did with his blog:
- removed comment feeds from search results so that no duplicate comment text is indexed
- removed trackback URLs from the index because they were causing blank pages to be indexed
- denied search bots access to his blog installation folder (MovableType).
If you are using WordPress, a sample robots.txt file would be:
User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/
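In the above code, Disallow: /wp- blocks every URL whose path starts with /wp-, so the search engine bots will not crawl the WordPress files. It already covers /wp-content/, /wp-admin/ and /wp-includes/, so those three lines are redundant but harmless.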
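One way to check rules like these before uploading them is Python's standard-library urllib.robotparser, which parses a robots.txt file and reports whether a path may be crawled. This is only a minimal sketch: the bot name "ExampleBot" and the test paths are made up for illustration, and this parser does plain prefix matching with no wildcard support.

```python
from urllib.robotparser import RobotFileParser

# The sample WordPress robots.txt from the post.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/
"""


def build_parser(robots_txt):
    """Parse robots.txt text (no network access) and return the parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


sample_parser = build_parser(SAMPLE_ROBOTS_TXT)

# Hypothetical paths, chosen only to exercise the rules.
for path in ("/wp-login.php", "/feed/", "/2007/03/my-post/"):
    allowed = sample_parser.can_fetch("ExampleBot", path)
    print(path, "allowed" if allowed else "blocked")
```

Because matching is by prefix, Disallow: /feed/ in this sketch only covers paths that begin with /feed/; a path such as /2007/03/my-post/feed/ is still reported as allowed.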
Update: Another example of a robots.txt file for WordPress that I found on WordPress.org:
User-agent: *
# disallow files in /cgi-bin
Disallow: /cgi-bin/
Disallow: /comments/
Disallow: /z/j/
Disallow: /z/c/
# disallow all files ending in .php
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow all files in /wp- directories
Disallow: /wp-*/
# disallow all files with ? in url
Disallow: /*?
# disallow any files that are stats related
Disallow: /stats*
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /about/feed/
Disallow: /about/trackback/
Disallow: /contact/
Disallow: /tag
Disallow: /docs*
Disallow: /manual*
Disallow: /category/uncategorized*
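The rules in this second file rely on two wildcard extensions: * for any run of characters and a trailing $ for "end of URL". The sketch below shows one way those patterns can be interpreted, by translating each rule into a regular expression. It is an illustration of the matching idea under that assumption, not any search engine's actual implementation; the helper names and test paths are hypothetical.

```python
import re

# A few of the wildcard rules from the file above.
WILDCARD_RULES = ["/*.php$", "/wp-*/", "/*?", "/stats*"]


def rule_to_regex(rule):
    """Turn a wildcard Disallow rule into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the
    rule to the end of the URL. Everything else is matched literally.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(path, rules=WILDCARD_RULES):
    """Return True if any non-empty rule matches from the start of path.

    'path' is the URL path plus any query string, e.g. '/index.php?p=1'.
    """
    return any(rule_to_regex(rule).match(path) for rule in rules if rule)


for path in ("/index.php", "/index.php?p=12", "/2007/03/my-post/"):
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Not every crawler understands these extensions, so it is worth checking the documentation of the bots you care about before relying on wildcard rules.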
You can also find an example of a robots.txt file for WordPress here.
If you are not using WordPress, you can substitute the disallow lines with files or folders on your web site that you do not want to be crawled.
After you finish creating the robots.txt file, don’t forget to upload it to your site’s root directory.
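Bots look for the file at the root of the host, not next to the page they are crawling. As a small illustration, this sketch derives the robots.txt location from any page URL with Python's urllib.parse; the function name and the example.com URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/2007/03/my-post/"))
print(robots_txt_url("https://example.com/blog/?p=12"))
```

Both calls point at the same file in the site root, which is why a blog installed in a subfolder still needs its robots.txt at the top level, with the subfolder included in each Disallow path.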
I also recommend displaying post excerpts instead of full posts on the homepage, so that the same content does not appear on both the homepage and the single post pages and you do not incur any search engine penalties for duplicate content.
Posted on March 29th, 2007 | Category: SEO, WordPress |
Benedict Herold
March 29, 2007 at 1:58 am
Hey.. thanks for the heads up… I too face some problem with the comment feed and trackback. Will implement the same in my site too.. Thanks John
egon
March 29, 2007 at 2:02 am
Do I want to include the “Disallow: /wp-content/” in my robots.txt file if I want my pictures to show up in Google Images? I receive a good amount of traffic from that and uploads are in that directory.
Abdul Aziz
March 29, 2007 at 3:21 am
Thanks for the ready-made robots.txt file for WordPress blogs. I will implement it as soon as possible.
For the second tip, for those who are not aware, you can search for the_content() and change it to the_excerpt() in your index.php, archives.php, category.php, etc. files via the Theme Editor.
By default, Wordpress shows 120 words in excerpt.
JohnTP
March 29, 2007 at 10:57 am
Benedict Herold- wait for a week and let me know if creating a robots.txt file helped you.
egon- try adding
User-agent: Googlebot-Image
Disallow:
Allow: /*
in your robots.txt file to allow Google Image bot to search all images of your site.
Abdul Aziz- Thanks for the reminder
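The interaction between a general Disallow: /wp-content/ rule and a separate Googlebot-Image group can be checked with Python's standard-library urllib.robotparser. This is a rough sketch: the image path and the bot name "ExampleBot" are invented, and only the empty Disallow: line is kept for the image bot because this parser does not interpret the * in Allow: /*.

```python
from urllib.robotparser import RobotFileParser

# General rule for all bots, plus a group that lets the image bot in.
IMAGE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/

User-agent: Googlebot-Image
Disallow:
"""

image_parser = RobotFileParser()
image_parser.parse(IMAGE_ROBOTS_TXT.splitlines())

# Hypothetical upload path inside /wp-content/.
IMAGE_PATH = "/wp-content/uploads/photo.jpg"

for bot in ("Googlebot-Image", "ExampleBot"):
    allowed = image_parser.can_fetch(bot, IMAGE_PATH)
    print(bot, "allowed" if allowed else "blocked")
```

A bot that finds a group naming it uses that group; every other bot falls back to the User-agent: * rules.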
Mr.Byte
March 29, 2007 at 2:07 pm
When I submitted my site link for an automatic review on a website, it said there was no robots.txt. Now I know what it is, and I have to implement it.
Navjot Singh
March 29, 2007 at 6:05 pm
Thanks for the tip John…it definitely helps me. I am using it currently on my blog.
Madhur Kapoor
March 30, 2007 at 2:46 am
Do I need to add the permalinks of all the posts to the file if I want them to be indexed?
Runa
March 30, 2007 at 4:01 pm
From the second one I would remove the following line:
Disallow: /comments/
If you have a good spam detector, there wouldn’t be any problem in allowing search engines to index comments.
Deep
March 30, 2007 at 7:22 pm
Nice tip, but I do not think it will help to increase traffic, as robots.txt just tells search engines what to scan and what not to. It has no relation to an increase in traffic at all.
egon
March 30, 2007 at 7:31 pm
Search engines look down upon duplicate and unrelated content, so not letting the robots see that content could help your ranking by making your site look more legit.
Deep
March 30, 2007 at 7:38 pm
Nope, it does not work like that. Search engines handle duplicate content in a totally different way. Matt Cutts had some details on the duplicate content stuff. But as far as robots.txt is concerned, I do not see any way it can help to increase traffic.
Awsaun Ronald
March 30, 2007 at 10:35 pm
John, which of the two files above would you say is better for WordPress? I have put a robots.txt on my WordPress site like this:
# This rule means it applies to all user-agents
User-agent: *
# Disallow all directories and files within
Disallow: /wp-admin/
Disallow: /wp-includes/
# The Googlebot is the main search bot for google
User-agent: Googlebot
# Disallow Google from parsing individual post feeds and trackbacks
Disallow: */feed/
Disallow: */trackback/
# Disallow all files with ? in url
Disallow: /*?*
Disallow: /*?
# The Googlebot-Image is the image bot for google
User-agent: Googlebot-Image
# Allow Everything
Allow: /*
# This is the ad bot for google
User-agent: Mediapartners-Google*
# Allow Everything
Allow: /*
Garry
March 31, 2007 at 5:48 am
John,
I want to make sure I do the right thing… what is the difference between robots.txt file and doing this in the header file:
Garry
March 31, 2007 at 5:49 am
Sorry.. that didn’t work here you go:
egon
March 31, 2007 at 6:01 am
Well mostly robots.txt is much easier and less time-consuming. With the meta tag you can only say “follow” or “nofollow” and you can’t specify directories without complicated rules. robots.txt is very simple and quick.
Garry
March 31, 2007 at 6:03 am
Thanks man,
I think I figured out what to do. Tell me if this is correct. I have the robots.txt file uploaded and I have added this line between my head tags:
I think I got it right?
egon
March 31, 2007 at 6:11 am
No, Garry, all you have to do is upload your robots.txt file to your root directory.
To clarify, you can see JohnTP’s robots.txt file by going to https://johntp.com/robots.txt
So just get an FTP program if you don’t have one, and upload it there.
Mr.Byte
March 31, 2007 at 11:33 am
I was working on this and was wondering if adding ‘Disallow: /tag’ is a good move or not? Can You explain that?
askApache
March 31, 2007 at 1:53 pm
There is a newer and better robots.txt article and example at askapache..
Robert Irizarry
March 31, 2007 at 8:34 pm
I’m missing how this will increase traffic. Can someone elaborate?
Garry
March 31, 2007 at 10:59 pm
Basically you control where crawlers such as the googlebots go. If you go to Google.com and type site:yourname.com you will see a listing of every page Google has indexed on your site. If you control where the bots go, then you control how people find your pages in the search engines.
Deep
March 31, 2007 at 11:09 pm
hmm as mentioned earlier, it does NOT help to increase the traffic because there is nothing in it which will help to increase the traffic. It simply tells search engines what to scan and what to keep protected nothing else.
Robert Irizarry
March 31, 2007 at 11:57 pm
John - I’m curious what you have to say about the matter. Also, are you going to be implementing this as well - maybe reporting on your findings?
JohnTP
April 1, 2007 at 11:04 pm
Madhur Kapoor- No, you do not need to add the permalinks of all your posts in the robots.txt file
Awsaun Ronald- This is what I have basically added in my robots.txt file
User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/
I will be doing some tests after a few more days. I will try adding these
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
It may take up to 10 days to take effect.
JohnTP
April 1, 2007 at 11:21 pm
Mr.Byte- If your tag pages don’t show the entire post, you may not need to add ‘Disallow: /tag’ to your robots.txt file as it already prevents content duplication.
Check one of my tag pages. It does not show the entire post.
Also, it is better to not show the entire post on the homepage, so you can prevent content duplication on single post pages and the homepage. Make use of the more feature.
Robert Irizarry- I have already created a robots.txt file for this site. This is what I have basically added in my robots.txt file
User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/
I will test further later. I am currently waiting to see the effect of adding the above lines to my robots.txt file.
About the robots.txt file increasing traffic, I really don’t think it will increase traffic.
But as far as I know Neil Patel is good at SEO and he says after creating a robots.txt file, his web site traffic went up 11.3%.
But at the same time my friend Deep, who is good at SEO too, says that a robots.txt file won’t help in increasing traffic.
After a few more days I will be able to see if the robots.txt file helped me.
Mr.Byte
April 1, 2007 at 11:35 pm
John, thanks for the reply. Can you also explain how to activate only the excerpt in the tags page? And how to find out whether the given robots.txt has been taken into consideration and how to identify its impact?
JohnTP
April 3, 2007 at 10:11 am
Mr.Byte- My K2-based themes have always shown excerpts on the tag pages by default.
It may take up to 10 days for the robots.txt to take effect.
Patrix
April 8, 2007 at 7:18 am
Thanks, John for this tip. I hope you don’t mind that I have used your robots.txt file for my blog as well.
Habitaquo
April 10, 2007 at 3:29 pm
I have a post related to seo for wordpress in spanish, you can go to my site and check that if you want.
Ronald
April 13, 2007 at 11:14 am
Thanks for your advice, John. I’ve edited my robots.txt.
Ronald
April 24, 2007 at 11:00 pm
Hmm, my blog ended up in the supplemental results.
shashank
May 1, 2007 at 4:28 pm
What about robots.txt for Blogger?
From the above, does stopping the search bots from indexing some of a website’s pages increase traffic? Is there any post where the importance of the robots.txt file has been explained clearly?
Robert Irizarry
May 1, 2007 at 6:27 pm
Shashank - If you’re on free Blogger, then you don’t have access to place a robots.txt.
As far as robots.txt driving traffic goes, what I gather is that it’s largely a tool to avoid things like duplicate content, which would result in search engines penalizing you. For example, WordPress users may have multiple ways of getting to the same page: the permalink, the www address, the non-www address, the page number, etc. These would appear as duplicate content.
David Bradley
May 1, 2007 at 9:15 pm
Egon, I too see a goodly amount of traffic hitting my image folders (e.g. in wp-content on another site). However, I once checked the actual details of this traffic and discovered that the majority of it was simply images being hotlinked by MySpace users. Needless to say, I enabled an anti-hotlink rewrite in .htaccess to stop that happening. Of course, actual search traffic from Google Images is a different and welcome matter.
db
Tech Crunch 2.0
May 1, 2007 at 9:16 pm
Nice tip, John, but I can’t use it on my Blogger blog though.
David Bradley
May 1, 2007 at 10:50 pm
Lots of comments mentioning how this ain’t poss with Blogspot/Blogger sites. Here’s a kind of workaround for that: http://www.sciencetext.com/blo.....omain.html
Investing Blog
May 2, 2007 at 4:32 am
I tried blocking some files, but Google still indexes the pages. A lot of the posts landed in the supplemental index, which sucks.
Perhaps I’m blocking too much or doing something wrong?
Jay
May 2, 2007 at 4:43 am
I’ve got to say I really enjoyed this post, I’ve noticed some quirky pages from our blog showing up on some rankings.. Everything from the comments to even the feeds seems to rotate in and out of the Serps.. Anyways we will definitely put this to good use thanks John!
Dennis Bjørn Petersen
May 6, 2007 at 9:23 pm
Thank you very much for this tip. I’ve added a robots.txt to my blog today, so I’m looking forward to how this will work.
Thank you again!
SwordMouth
May 10, 2007 at 11:21 am
What about blogger.com users? Are they sent to hell - lol
I was asking about robots.txt. What about it? I guess we have to settle for the RSS sitemap.
SwordMouth
[TechBlo.com]
SEO
June 2, 2007 at 2:00 am
Very good information. Thank you and keep up the good work.
John T. Pratt
June 8, 2007 at 7:22 pm
This is very, very helpful. I see you have it linked as a related post, but you should update this post with the sitemap autodiscover info too, everyone should add this to their robots.txt file asap:
Sitemap: http://www.jtpratt.com/sitemap.xml
from home
June 14, 2007 at 10:36 pm
Should I exclude PHP files too?
Thomas
June 15, 2007 at 4:33 am
Thanks a lot for the tip, John. I will use it.
Regards
derek
June 16, 2007 at 10:06 pm
When I first started I was told that putting your post in multiple categories with the permalink structure set up was a good idea. So if I had a stocks blog and I was talking about gold, I might put the post in three categories such as Gold Stocks, Hot Stocks and Minerals.
Not the best example, but you get the idea: three links will flow into this post. Does that actually hurt me? Would setting up robots.txt in a certain way help me? Sorry if this is a rookie question, but that is exactly what I am :)
Dekoracje
July 11, 2007 at 3:59 am
Thanks to all who contributed to this page, it’s really helpful. I found some interesting hints. Thanks again and keep up the good work.
Ade Martin
July 20, 2007 at 11:04 am
Hi John,
This is great information and very topical for me as I am trying to drag 40+ pages from the Supplemental Index.
Quick question. Why disallow php files? I thought the WordPress index page was php so I am confused.
Thanks,
Ade
Kay
July 31, 2007 at 6:28 pm
If I installed my WordPress in http://www.k-director.com/blog/
how do I configure the robots.txt to the most suitable setting?
TechZilo
August 4, 2007 at 3:01 pm
Thanks, I was searching for robots.txt code and went to official sites, but they were not helpful.
AskApache
August 11, 2007 at 1:21 am
JohnTP- Both the example robots.txt from dailyblogtips and the WordPress Codex are from my 1st attempt at creating a robots.txt file. Now I am on my third revision, which I am still checking and tweaking, but yesterday I published it on my blog so check it out..
http://www.askapache.com/seo/u.....press.html
suchcenter
August 20, 2007 at 2:14 am
Does anybody know if it’s true that Google won’t want PageRank in the future? Is this right? How long will PageRank exist? This year, only next year? Greetings from Cologne.
Alex
August 26, 2007 at 8:32 pm
Does anyone have an opinion on disallowing category on a Wordpress installation? I’ve read that this is done to help eliminate duplicate content but it seems that a category page, with excerpts from all the posts in that category, is useful to readers so it shouldn’t be a problem for Google.
Web Design Wexford
September 6, 2007 at 1:54 am
Search engine positioning, optimization, and increased website traffic are critical elements of a successful Internet business solution. High visibility of your website can make the difference between driving a high volume of sales leads and targeted traffic to your company’s website or being lost in “cyber space.”
With the burgeoning popularity of the internet, new developmental tools are created daily. With these tools come new challenges, marketing, design, cross-browser transitions, etc. All of these can be a daunting task for those web gurus who aren’t well-versed in the W3 Standards.
Symbian
September 25, 2007 at 4:34 pm
Unfortunately, my Blogger account doesn’t allow me to create a custom robots.txt file.
Ashwini
September 30, 2007 at 2:58 pm
Does anyone know how to change the default robots.txt file of Blogger (I know it sucks)?
Suchmaschinenoptimierung
October 18, 2007 at 3:10 pm
Great post about robots.txt. I also have a portal, but not such good information about robots.txt. I will link to your post so that my users also get this great information.
Udayweer Singh Yadav
October 25, 2007 at 12:50 am
This is very useful information about robots.txt. If you do not want to disallow any crawler, then create a blank robots.txt file and upload it to your root directory.
Amit N
October 29, 2007 at 6:23 pm
Thanks for the nice tip here John, I will implement on my WP blogs.
property investor
November 24, 2007 at 3:49 am
thanks for the ready made robots.txt
i’m just making some changes to my blog.
ukmalayalikal
January 2, 2008 at 2:22 am
I wasn’t using a robots.txt file on my site http://www.ukmalayalikal.com till now. I have added this file now, let’s see how it is going to help.
thanks
Hani
January 11, 2008 at 5:27 pm
I didn’t understand what robots.txt was all about till I read this post. I just uploaded a robots.txt and we’ll see how it goes.
John
January 21, 2008 at 2:42 am
Maybe you should take a closer look at the specification:
http://www.robotstxt.org/faq/robotstxt.html
The robots.txt standard does not allow wildcards except in the User-agent line.
fedmich
February 16, 2008 at 10:10 pm
A little anxious to try this. I guess I’ll rely on askapache’s robots.txt
Thanks anyway, nice idea
Adi
April 30, 2008 at 1:32 am
thank you for this article
venkat
July 8, 2008 at 12:33 am
this was fairly simple and straight forward !
Paloika (tech magazine)
July 20, 2008 at 11:08 pm
I do believe that robots.txt is a great tool, but it will perform better on large blogs with a lot of pages.