Web scraping by end users

Date
2025-11-25
Authors
Tacuri, Alex
Firmenich, Sergio
Fernandez, Alejandro
Riva, Florencia
Urbieta, Matías
Rossi, Gustavo Hector
Publisher
IEEE
Abstract
Scraping has been studied from various perspectives, including automatic and AI-based approaches, and is supported by a wide range of programming libraries that expedite development. As the volume of available web content grows, it becomes increasingly challenging to anticipate end-user requirements regarding what, how, and when to extract data from the web. This challenge is compounded when integrating data from multiple websites, particularly when websites’ search engines dynamically retrieve data that is unavailable via permanent links. Complex scraping processes such as these are difficult to develop using general-purpose programming languages and are challenging to automate with AI-based approaches. Controllability is a crucial aspect of scraping: it concerns how end users can make decisions during the scraper specification process, understand the information sources, and understand how the data are ultimately extracted, compiled, and formatted for output. In response, our study presents an innovative end-user approach for specifying scrapers that focuses on seamlessly integrating data from multiple sources. Through this approach and its supporting toolset, we aim to provide users with greater control and transparency over the extraction, integration, and formatting of data, thereby addressing key concerns in web scraping. An evaluation of the approach and toolset yielded promising results.
Keywords
web mining, end-user computing, human-computer interaction, user-centered design, web scraping, data integration, scraper specification, web data extraction
Citation
Tacuri, A.; Firmenich, S.; Fernández, A.; Riva, F.; Urbieta, M. & Rossi, G. (2025). Web Scraping by End Users. In: IEEE Access, vol. 13, pp. 205027-205044.