If you are new to being a webmaster, one of the first things you should know about is robots.txt. If you are still figuring out what robots.txt is, these two links will give you a fair idea: link 1 and link 2.
When it comes to the advantages of robots.txt, people have different views: some think it’s a very good idea to have one on their site, while others feel it’s not all that necessary. As they say, to each his own. If you believe that having a robots.txt file on your domain will give you an edge, then go ahead and tell the search engine bots what they should crawl and what they should leave out. If you are not concerned about everything on your server being indexed, then don’t bother using one. If you ask my personal opinion, I’d say have one just for the heck of it, even if you want everything on your server to be indexed.
=> This link will give you some insight into the advantages of robots.txt.
If you are wondering what could possibly go wrong with a single ‘robots.txt’ file, then read on:
I had never put up a good robots.txt, and one fine day I decided to put one up. Mistake!
Instead of carefully handcrafting a good robots.txt file, I picked some code that was floating around freely on the web, saved it in a text file named robots.txt and uploaded it to the server. Big mistake!!
So here’s what exactly happened:
Techbuzz is built on PHP, and the code I found blocked all ‘.php’ files, because it had been written for ‘.html’ sites. I overlooked this and went ahead. The price I paid: I lost more than 50% of my traffic in less than 24 hours!!
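A mistake like this can be caught before uploading. Here is a rough sketch in Python using the standard library’s urllib.robotparser. The rules shown are hypothetical stand-ins, not the file from this story, and the example.com URLs, the paths and the is_crawlable helper are assumptions made up for illustration. Note that urllib.robotparser does plain prefix matching and does not understand wildcard patterns such as *.php, so the sketch lists the PHP paths explicitly.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules in the spirit of a copied file: harmless on a
# static .html site, harmful on a site whose pages are served by PHP.
COPIED_RULES = """\
User-agent: *
Disallow: /index.php
Disallow: /post.php
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Made-up URLs: two PHP pages and one static page.
    for url in (
        "https://example.com/index.php",
        "https://example.com/post.php?id=42",
        "https://example.com/about.html",
    ):
        verdict = "allowed" if is_crawlable(COPIED_RULES, url) else "BLOCKED"
        print(f"{verdict:8} {url}")
```

Running something like this against a draft file and a handful of your real page URLs should make an over-broad Disallow rule obvious before any bot sees it.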
Lessons I learnt from this mistake:
- Don’t mess with robots.txt without knowing what you are doing! Read the links given above, they’ll give you a solid idea of how robots.txt works, and then create a tailor-made robots.txt to suit your website’s requirements.
- There is no way to re-submit your robots.txt file.
- You cannot force an update of your robots.txt file once it has been crawled.
- Most importantly, you do not SUBMIT a robots.txt file; the bots read it during every crawl cycle!
- If you end up screwing up your robots.txt file, the best thing you can do is create a new, correct robots.txt file, upload it to replace the old one, and wait until the spiders crawl your site again and update their database with the latest file. There is no fixed schedule for when they crawl your site (in my case it took around 24 hours, and another 24 hours before the changes were noticeable). You may be lucky and get the traffic back, or unlucky and lose it (I was lucky to regain my positions), but you really have to find out for yourself how long the wait is before all your traffic returns.
- The robots.txt file behaves very differently from what I expected: you’ll be surprised to see that most of your URLs can still be indexed even if robots.txt bans the robots from crawling your website. It only prevents the content found at those URLs from being indexed (Ref: Webado)!
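Since the bots pick up whatever file is on the server on their own schedule, the safest moment to catch a bad rule is before the upload. One possible pre-upload check, as a minimal Python sketch with urllib.robotparser: the MUST_CRAWL list, the example.com URLs and the blocked_urls helper are all hypothetical names chosen for illustration. It only checks crawl permission; as the last point says, a blocked URL can still show up in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical list of pages that must stay reachable by crawlers.
MUST_CRAWL = [
    "https://example.com/",
    "https://example.com/index.php",
    "https://example.com/archives.php?page=2",
]

# A permissive draft: an empty Disallow value blocks nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def blocked_urls(robots_txt, urls, agent="*"):
    """Return the subset of `urls` that `agent` may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


if __name__ == "__main__":
    problems = blocked_urls(ALLOW_ALL, MUST_CRAWL)
    if problems:
        # Stop with a non-zero exit status so the draft is not uploaded.
        raise SystemExit(
            "Do not upload, these URLs would be blocked:\n" + "\n".join(problems)
        )
    print("The draft leaves every must-crawl URL reachable.")
```

The ALLOW_ALL text doubles as a starting point if you want a robots.txt on the server but still want everything crawled.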
I guess by now you have enough information to protect your website from any ‘robots.txt’ mishap.
I’m no expert, just an experienced webmaster. If you have any questions about ‘robots.txt’, please do drop a comment. I’ll answer if I have any idea about it, or at least point you to a solution!