TableScraper.jl

Scrape WELL-FORMED tables from webpages
Author: xiaodaigh
Popularity: 25 stars
Last updated: 1 year ago
Started: May 2021

This package provides a single function,

scrape_tables(url)

which scrapes the tables wrapped in <table> tags on a page and returns them as a vector of Tables.jl-compatible row-tables.
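As a minimal sketch of basic usage: since the returned objects are described as Tables.jl-compatible, they should be accepted by any Tables.jl sink. The choice of DataFrames.jl as that sink, the helper name `tables_as_dataframes`, and the URL in the comment are assumptions for illustration, not part of the package.

```julia
using TableScraper
using DataFrames  # assumption: one possible Tables.jl sink among many

# Hypothetical helper: scrape every <table> on a page and
# convert each result to a DataFrame.
function tables_as_dataframes(url::AbstractString)
    tables = scrape_tables(url)            # vector of row-tables, one per <table>
    return [DataFrame(t) for t in tables]  # relies on Tables.jl compatibility
end

# Example call with a placeholder URL (substitute a real page):
# dfs = tables_as_dataframes("https://example.com/page-with-tables.html")
```

Pages whose tables are not well formed (for example, rows with differing numbers of cells) may not convert cleanly.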

By default the function uses Cascadia.nodeText to extract the text from each <td> node.

If you need more than the text of each cell, use

scrape_tables(url, identity)

to keep each cell as a Gumbo.HTMLNode for more advanced extraction.

More generally, the second argument, cell_transform, accepts any callable. It is applied to each <td> node before the tables are returned. For example:

scrape_tables(url, cell_transform)
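A custom transform might look like the sketch below, which returns the target of the first link in a cell and falls back to the cell text when there is no link. The names `link_or_text` and `scrape_links` are hypothetical. The sketch assumes each cell is passed to the transform as a Gumbo element, as the `identity` example suggests.

```julia
using TableScraper
using Gumbo     # attrs
using Cascadia  # sel"...", eachmatch on HTML nodes, nodeText

# Hypothetical cell transform: prefer the href of the first <a> inside
# the cell; otherwise return the cell's text.
function link_or_text(cell)
    links = eachmatch(sel"a", cell)
    isempty(links) && return nodeText(cell)
    # Fall back to the cell text if the anchor has no href attribute.
    return get(attrs(first(links)), "href", nodeText(cell))
end

# Hypothetical wrapper passing the transform as the second argument.
scrape_links(url) = scrape_tables(url, link_or_text)
```

Calling `scrape_links(url)` should then yield tables whose cells hold link targets where available, which is useful for pages where each row links to a detail page.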

Video Tutorial

Video: Introducing TableScraper.jl - an easy way to scrape WELL-FORMED tables in Julia
