Have you ever wondered how Google and other search engines can efficiently organize all the content that ends up on their search engine results pages? They do it with the help of search robots. They aren’t literal robots, but virtual ones that crawl the internet, indexing titles, summaries, and entire contents of websites faster and more completely than human beings ever could. This content includes web pages, PDFs, images, and videos. Then, they rank that information for search queries.
Because of how these virtual crawlers work, many people call these bots spiders, crawling the world wide web. It is their work that allows us to retrieve information so quickly when we search the internet. It’s pretty mind-blowing when you think about it.
Managing Bot Traffic with robots.txt Files
There will be times when you don’t want everything on your website to be found by everyday people searching the internet. Examples include pages meant for employees, thank you pages, and other pages intended for internal business use only.
You can usually keep search engine bots out of these pages, often a whole directory at a time, by using a robots.txt file. Keep in mind that the robots.txt file is just a request. Even though Google’s crawlers are generally respectful of these requests, there is no guarantee that their bots or those belonging to other search engines won’t ignore yours.
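To see what such a request looks like in practice, here is a minimal Python sketch that uses the standard library's robots.txt parser. The directory names and the example.com URLs are made up for illustration, and the check only tells you what a well-behaved crawler would do; it cannot force any bot to comply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the directory names are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /employees/
Disallow: /thank-you/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/employees/handbook.html"))  # False
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/services/"))  # True
```

Each Disallow line covers everything under that path, which is why one rule is enough to fence off a whole directory.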
Robot Meta Tags
Robot meta tags (also known as robots meta directives), like other meta tags, are pieces of code. They tell the search engine bots that crawl websites how to index web page content. Unlike robots.txt files, which try to keep bots out of parts of your site, these tags let crawlers in and then tell them whether a page should be indexed.
Two Kinds of Robot Meta Tags
There are two kinds of robot meta tags:
- Directives that are part of the HTML page (the robots meta tag in the page’s <head>)
- Directives that the web server sends as HTTP headers (the X-Robots-Tag header)
These directives tell search bots how to crawl and index specific web pages on your site. Even though these are directives (meaning they are orders), bots can still ignore them.
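To make the two kinds concrete, here is a rough Python sketch that reads robots directives from both places: a robots meta tag in a page's HTML and an X-Robots-Tag HTTP header. The sample page and header values are assumptions made up for this example, and the parsing is deliberately simplified. It ignores bot-specific tags and user-agent prefixes inside the header value.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def directives_from_html(html: str) -> set:
    """Kind 1: directives that are part of the HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def directives_from_headers(headers: dict) -> set:
    """Kind 2: directives sent by the web server as an HTTP header."""
    found = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            found.update(part.strip().lower() for part in value.split(",") if part.strip())
    return found


# Made-up sample inputs for illustration.
SAMPLE_PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
SAMPLE_HEADERS = {"Content-Type": "application/pdf", "X-Robots-Tag": "noindex"}

print(sorted(directives_from_html(SAMPLE_PAGE)))  # ['follow', 'noindex']
print(sorted(directives_from_headers(SAMPLE_HEADERS)))  # ['noindex']
```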
Why Wouldn’t You Want Something Indexed?
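Beyond private pages, there are situations where the HTTP header form is the practical way to keep content out of the index:
- You want to block a single element on a page, such as an image or a video, rather than the entire page.
- The content isn’t written in HTML, like Flash or video, so it has no place to put a meta tag.
- You can’t access the <head> section of a page’s HTML.
- You can’t change your site’s global header.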
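For cases like the ones above, the directive has to travel as an HTTP header instead of a meta tag. The following Python sketch shows one way that could look using the standard library's simple file server. The list of file extensions and the helper name robots_headers are assumptions for illustration; on a real site you would more likely set this header in your web server or CDN configuration.

```python
import posixpath
from http.server import HTTPServer, SimpleHTTPRequestHandler
from urllib.parse import urlsplit

# Hypothetical list of non-HTML file types that should stay out of the index.
NOINDEX_EXTENSIONS = {".pdf", ".mp4", ".png"}


def robots_headers(request_path: str) -> dict:
    """Return the extra headers to send for a given request path."""
    path = urlsplit(request_path).path  # drop any query string
    extension = posixpath.splitext(path)[1].lower()
    if extension in NOINDEX_EXTENSIONS:
        return {"X-Robots-Tag": "noindex"}
    return {}


class NoIndexHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory and adds X-Robots-Tag where needed."""

    def end_headers(self):
        for name, value in robots_headers(self.path).items():
            self.send_header(name, value)
        super().end_headers()


def serve(port: int = 8000) -> None:
    """Start the demo server; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), NoIndexHandler).serve_forever()
```

Calling serve() starts the demo server locally, and a request for a PDF in the served directory should then come back with the extra header while HTML pages are left alone.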
Since bots need to crawl a page to read and follow its robot meta tags, blocking that same page in a robots.txt file is counterproductive. The robots.txt file will keep the bots out, so they never see your directive. If you are unsure which one you should use, opt for a meta robots tag with “noindex, follow” parameters over a robots.txt rule.
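The conflict described above is easy to check for. Here is a small Python sketch, under the assumption that you already know a page's robots directives and your robots.txt contents; the function names and sample values are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt lets this crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def noindex_is_hidden(robots_txt: str, user_agent: str, url: str, page_directives) -> bool:
    """Return True when a page asks for noindex but robots.txt keeps the bot from reading it."""
    wants_noindex = "noindex" in {d.lower() for d in page_directives}
    return wants_noindex and not crawl_allowed(robots_txt, user_agent, url)


# Made-up example: the thank you page is both blocked in robots.txt and tagged noindex.
BLOCKING_ROBOTS_TXT = "User-agent: *\nDisallow: /thank-you/\n"
THANK_YOU_URL = "https://example.com/thank-you/"

print(noindex_is_hidden(BLOCKING_ROBOTS_TXT, "Googlebot", THANK_YOU_URL, {"noindex", "follow"}))  # True
```

When the check returns True, removing the Disallow rule for that page lets compliant crawlers reach the noindex directive.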
Even if you correctly use a robots.txt file or meta tags to control which pages search bots visit and index, you will still want to get your website set up in Google Search Console. This will let you see what is being indexed. It will also allow you to request that Google remove specific URLs from its search index.
Never count on either of these to keep private information out of the public eye. For that level of security, you will want to keep your data under password protection.
Always be careful when using these files. They have their place, but occasionally people inadvertently make their entire site inaccessible to Google’s bots, which is no good for your SEO at all.
Your Best Bet with Bots
Iceberg Web Design builds custom websites every day. We also have SEO services available for new and existing sites. Are you looking for your next website-based business solution? Contact us today to see how we can help.