／var／log marcus chiu


Common Crawl

Created on Oct 11, 2025

Common Crawl maintains a free, open repository of web crawl data that anyone can use
as of Oct 2025, it contains over 300 billion web pages spanning the last 18 years, with 3 to 5 billion new pages added each month
each monthly crawl archive holds hundreds of tebibytes (TiB) of uncompressed content, with recent crawls exceeding 460 TiB
some older estimates put the entire corpus at about 6.4 petabytes (PB)
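The figures above mix binary units (TiB, powers of 2) and decimal units (PB, powers of 10), which are easy to conflate. A minimal Python sketch of the conversion using the numbers quoted above; the helper name `convert` is hypothetical and not part of any Common Crawl tooling:

```python
# Decimal (SI) units are powers of 10; binary (IEC) units are powers of 2.
TB = 10**12   # terabyte
PB = 10**15   # petabyte
TiB = 2**40   # tebibyte
PiB = 2**50   # pebibyte


def convert(value: float, from_unit: int, to_unit: int) -> float:
    """Convert a size between two units given as bytes-per-unit multipliers."""
    return value * from_unit / to_unit


if __name__ == "__main__":
    # Figures quoted in the notes above.
    recent_crawl_tib = 460
    corpus_pb = 6.4
    print(f"{recent_crawl_tib} TiB = {convert(recent_crawl_tib, TiB, TB):.1f} TB")
    print(f"{corpus_pb} PB = {convert(corpus_pb, PB, PiB):.2f} PiB")
```

Running it prints `460 TiB = 505.8 TB` and `6.4 PB = 5.68 PiB`, so a "460 TiB" crawl is roughly 10% larger than "460 TB" would suggest.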

Resources

https://commoncrawl.org/