Googlebot is Google’s web crawling bot (sometimes also called a “spider”). Crawling is the process by which Googlebot discovers new and updated pages to be added to the Google index.
We use a huge set of computers to fetch (or “crawl”) billions of pages on the web. Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often, and how many pages to fetch from each site.
Googlebot’s crawl process begins with a list of webpage URLs, generated from previous crawl processes and augmented with Sitemap data provided by webmasters. As Googlebot visits each of these websites it detects links (SRC and HREF) on each page and adds them to its list of pages to crawl. New sites, changes to existing sites, and dead links are noted and used to update the Google index.
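The link-detection step described above can be sketched with Python's standard `html.parser`. This is an illustrative approximation, not Google's actual crawler code; the page snippet and URL are made-up examples.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects HREF and SRC targets from a page, roughly as a crawler would."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL before queuing them.
                self.links.append(urljoin(self.base_url, value))

page = '<a href="/about">About</a> <img src="logo.png">'
extractor = LinkExtractor("https://www.example.com/index.html")
extractor.feed(page)
# extractor.links now holds the absolute URLs for /about and logo.png
```

A real crawler would then deduplicate these URLs against pages it has already seen and add the remainder to its crawl list.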
For most sites, Googlebot shouldn’t access your site more than once every few seconds on average. However, due to network delays, the crawl rate may appear to be slightly higher over short periods.
Googlebot was designed to be distributed across many machines to improve performance and scale as the web grows. Also, to cut down on bandwidth usage, we run many crawlers on machines located near the sites they’re indexing. As a result, your logs may show visits from several machines at google.com, all with the user agent Googlebot. Our goal is to crawl as many pages from your site as we can on each visit without overwhelming your server’s bandwidth. If Googlebot is fetching pages too quickly, you can request a change in the crawl rate.
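Because anyone can set the Googlebot user-agent string, a common way to check whether a log entry really came from Googlebot is a reverse-DNS lookup followed by a forward confirmation. The sketch below uses Python's standard `socket` module; the IP address shown is a placeholder, and the hostname suffixes are the ones Googlebot is known to use.

```python
import socket

def is_google_hostname(hostname: str) -> bool:
    # Genuine Googlebot reverse-DNS names end in googlebot.com or google.com.
    return hostname.rstrip(".").endswith((".googlebot.com", ".google.com"))

def verify_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the hostname, then forward-confirm it."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        # The hostname must resolve back to the original IP to rule out spoofing.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False

# Example (requires network access): verify_googlebot("66.249.66.1")
```

The forward-confirm step matters: reverse DNS alone can be forged by whoever controls the IP block, but they cannot make google.com's DNS resolve that hostname back to their address.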
It’s almost impossible to keep a web server secret by not publishing links to it. As soon as someone follows a link from your “secret” server to another web server, your “secret” URL may appear in the referrer tag, and it can be stored and published by the other web server in its referrer log. Similarly, the web has many outdated and broken links. Whenever someone publishes an incorrect link to your site, or fails to update links to reflect changes on your server, Googlebot will try to download an invalid link from your site.
If you want to prevent Googlebot from crawling content on your site, you have a number of options, including using robots.txt to block access to files and directories on your server.
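As a sketch of how such a block behaves, here is a hypothetical robots.txt (the `/private/` path is an invented example) checked with Python's standard `urllib.robotparser`, which applies the same allow/disallow rules a crawler would:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt blocking one directory for Googlebot only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

blocked = rp.can_fetch("Googlebot", "https://www.example.com/private/page.html")
allowed = rp.can_fetch("Googlebot", "https://www.example.com/public.html")
# blocked is False (the /private/ rule applies); allowed is True
```

Note that robots.txt is advisory: well-behaved crawlers like Googlebot honor it, but it is not an access-control mechanism.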
Once you’ve created your robots.txt file, there may be a small delay before Googlebot discovers your changes. If Googlebot is still crawling content you’ve blocked in robots.txt, check that the robots.txt file is in the correct location. It must be in the top-level directory of the server (for instance, www.example.com/robots.txt); placing the file in a subdirectory won’t have any effect.
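The "top-level directory" rule means a crawler always derives the robots.txt location from the host alone, discarding any path. A minimal illustration with Python's standard `urllib.parse` (the page URL is a made-up example):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would check for the given page."""
    parts = urlsplit(page_url)
    # Only the scheme and host survive; the page's path is ignored entirely.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

robots_url("https://www.example.com/blog/post.html")
# -> "https://www.example.com/robots.txt"
```

This is why a file at www.example.com/blog/robots.txt is never consulted: no page URL on the site maps to that location.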