Web scraping by end users

dc.contributor.author Tacuri, Alex
dc.contributor.author Firmenich, Sergio
dc.contributor.author Fernandez, Alejandro
dc.contributor.author Riva, Florencia
dc.contributor.author Urbieta, Matías
dc.contributor.author Rossi, Gustavo Hector
dc.date.accessioned 2026-03-24T13:45:13Z
dc.date.available 2026-03-24T13:45:13Z
dc.date.issued 2025-11-25
dc.description.abstract Scraping is a topic studied from various perspectives, encompassing automatic and AI-based approaches and a wide range of programming libraries that expedite development. As the volume of available web content increases, it becomes increasingly challenging to anticipate end-user requirements regarding what, how, and when to extract data from the web. This challenge is compounded when integrating data from multiple websites, particularly when websites’ search engines dynamically retrieve unavailable data via permanent links. Complex scraping processes such as these are difficult to develop using general-purpose programming languages and challenging to automate with AI-based approaches. Controllability is a crucial aspect of scraping: how end users can make decisions during the scraper specification process, understand the information sources, and understand how the data are ultimately extracted, compiled, and formatted for output. In response, our study presents an innovative end-user approach for specifying scrapers that focuses on seamlessly integrating data from multiple sources. Through this approach and its supporting toolset, we aim to provide users with greater control and transparency over the extraction, integration, and formatting of data, thereby addressing key concerns in web scraping. The approach and toolset were evaluated and yielded promising results.
dc.identifier.citation Tacuri, A.; Firmenich, S.; Fernández, A.; Riva, F.; Urbieta, M. & Rossi, G. (2025). Web Scraping by End Users. In: IEEE Access, vol. 13, pp. 205027-205044.
dc.identifier.issn 2169-3536
dc.identifier.other 10.1109/ACCESS.2025.3636662
dc.identifier.uri https://repositorio.uai.edu.ar/handle/123456789/4747
dc.language.iso en
dc.publisher IEEE
dc.subject web mining
dc.subject end-user computing
dc.subject human computer interaction
dc.subject user centered design
dc.subject web scraping
dc.subject data integration
dc.subject scraper specification
dc.subject web data extraction
dc.title Web scraping by end users
dc.type ARTICULO
Files
Original bundle
Name: 0000763621.pdf
Size: 5.03 MB
Format: Adobe Portable Document Format