Building a web scraper
8/30/2023

ℹ️ We have a lovely article dedicated to this very subject: What is Web Scraping. Please feel free to check it out if you wish to learn more about web scraping and how it differs from web crawling, or to see a comprehensive list of examples, use cases, and technologies.

Many of us like to play darts, but we shouldn't necessarily pick our scraping platform (or technology) like that, right?

So, before we simply jump in at the deep end, let's establish a few key parameters for our scraping project, which should help us narrow down the list of potential scraping solutions.

- Scraping Intervals - how often do you need to extract information? Is it a one-off thing? Should it happen regularly on a schedule? Once a week? Every day? Every hour? Maybe continuously?
- Data Input - what kind of data are you going to scrape? HTML, JSON, XML, something binary, like DOCX, or maybe even media, such as video, audio, or images?
- Data Export - how do you wish to receive the data? In its original raw format? Pre-processed, maybe sorted, filtered, or already aggregated? Do you need a particular output format, such as CSV, JSON, or XML, or should the data be imported into a database or API?
- Data Volume - how much data are you going to extract? Will it be a couple of bytes or kilobytes, or are we talking about giga- and terabytes?
- Scraping Scope - do you need to scrape only a couple of pre-set pages, or do you need to scrape most or all of the site? This part may also determine whether and how you need to crawl the site for new links.
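To make the five parameters concrete, they can be written down as a small configuration object before any tool is chosen. The following is only a minimal sketch: `ScrapingProject`, `Scope`, all field names, and the two helper methods are hypothetical names made up for this illustration, and the rule that a whole-site scope implies crawling is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scope(Enum):
    """Hypothetical scope categories, mirroring the 'Scraping Scope' question."""
    FIXED_PAGES = "fixed_pages"  # only a couple of pre-set pages
    WHOLE_SITE = "whole_site"    # most or all of the site


@dataclass(frozen=True)
class ScrapingProject:
    """Illustrative container for the five project parameters."""
    interval_hours: Optional[float]  # Scraping Intervals: None means a one-off run
    input_format: str                # Data Input: e.g. "html", "json", "xml", "docx"
    export_format: str               # Data Export: e.g. "csv", "json", "xml", "database"
    expected_bytes: int              # Data Volume: rough size estimate
    scope: Scope                     # Scraping Scope

    def needs_crawler(self) -> bool:
        # Assumption: scraping most or all of a site means discovering links by crawling.
        return self.scope is Scope.WHOLE_SITE

    def is_recurring(self) -> bool:
        # Any interval at all means the scrape has to be scheduled.
        return self.interval_hours is not None


# Example: a daily scrape of a whole site, exported as CSV.
project = ScrapingProject(
    interval_hours=24,
    input_format="html",
    export_format="csv",
    expected_bytes=5_000_000,
    scope=Scope.WHOLE_SITE,
)
print(project.needs_crawler(), project.is_recurring())
```

Filling in such an object first makes it easier to see at a glance whether a candidate solution has to support scheduling, crawling, or large exports.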