Web Crawler/Spider/Robot
- a program or automated script that browses the World Wide Web in a methodical, automated manner (this process is called web crawling or spidering)
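To make "methodical, automated" concrete, here is a minimal sketch of a breadth-first crawler. It is only an illustration, not how any particular search engine works. The names `LinkExtractor`, `crawl`, and `FAKE_WEB` are made up for this example, and the "web" is an in-memory dictionary so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Visit pages breadth-first from start_url and return them in visit order.

    fetch is any callable that maps a URL to its HTML, or None if unavailable.
    """
    seen = {start_url}          # URLs already queued, so each is fetched once
    queue = deque([start_url])  # frontier of URLs still to visit
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # dead link: skip it
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "web" (assumption for illustration only).
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/missing">gone</a>',
}

order = crawl("http://example.test/", FAKE_WEB.get)
```

A real crawler would replace `FAKE_WEB.get` with an HTTP fetch and add politeness rules (rate limiting, robots.txt checks).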
Web Search Engines
- use web crawlers to keep their search data up to date
Introduction
Subpages
- Google’s web crawler
- to see the information extracted from your domain, use Google Search Console
- or use the Lighthouse tool in Chrome DevTools (in the Audits tab, renamed the Lighthouse tab in newer Chrome versions)
Web Search Engine Problems
- on many websites JavaScript determines what is shown, so the search engine has to execute JavaScript on its end in order to extract the relevant information
- creating HTML snapshots
- Google executes JavaScript itself - it started doing so in 2008
- prerender.io - serves prerendered HTML of your JavaScript website so that search engines can crawl it
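The problem and the snapshot idea above can be sketched as follows. This is a simplified illustration under stated assumptions, not how prerender.io or Google actually work. The pages `RAW_PAGE` and `SNAPSHOT`, the `BOT_MARKERS` list, and the helper names are all hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text outside <script>/<style>, as a crawler that does not run JavaScript would see it."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Page whose content only appears after JavaScript runs (hypothetical).
RAW_PAGE = (
    '<html><body><div id="app"></div>'
    '<script>document.getElementById("app").textContent = "Red shoes";</script>'
    "</body></html>"
)

# HTML snapshot of the same page after rendering (hypothetical).
SNAPSHOT = '<html><body><div id="app">Red shoes</div></body></html>'

# Assumed substrings for recognizing crawlers by User-Agent; real lists are longer.
BOT_MARKERS = ("googlebot", "bingbot")


def page_for(user_agent):
    """Serve the snapshot to known crawlers and the JavaScript page to everyone else."""
    ua = user_agent.lower()
    return SNAPSHOT if any(marker in ua for marker in BOT_MARKERS) else RAW_PAGE
```

Without JavaScript execution, `visible_text(RAW_PAGE)` is empty, while `visible_text(SNAPSHOT)` contains the product text. That gap is what snapshots are meant to close.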
Interfacing With Web Search Engines - Search Engine Optimization (SEO)
- domain.com/any-path-prefix/sitemap.xml - informs search engines which URLs exist on the site
- domain.com/robots.txt - tells search engines what may and may not be crawled (crawlers only look for it at the root of the host, not under a path prefix)
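As a rough illustration of both files, the sketch below builds a small sitemap and checks URLs against a robots.txt using only the Python standard library. The robots.txt contents, the URLs, the crawler name `ExampleBot`, and the helper `build_sitemap` are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everything may be crawled except /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://domain.com/sitemap.xml
"""


def build_sitemap(urls):
    """Return a minimal sitemap.xml document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Parse the rules the way a well-behaved crawler would before fetching.
robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

allowed = robots.can_fetch("ExampleBot", "https://domain.com/products/1")
blocked = robots.can_fetch("ExampleBot", "https://domain.com/admin/users")

sitemap_xml = build_sitemap(["https://domain.com/", "https://domain.com/products/1"])
```

Here `allowed` is `True` and `blocked` is `False`. Note that robots.txt is advisory: it asks crawlers to stay away but does not enforce access control.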