Julianne Nyhan
Author with expertise in Data Stream Management Systems and Techniques
Achievements
Open Access Advocate
Cited Author
Key Stats
Upvotes received: 0
Publications: 2 (50% Open Access)
Cited by: 13
h-index: 8 / i10-index: 8
Reputation
Biology: < 1%
Chemistry: < 1%
Economics: < 1%
Publications

The challenges and prospects of the intersection of humanities and data science: A White Paper from The Alan Turing Institute

Barbara McGillivray et al., Aug 4, 2020

defoe: A Spark-Based Toolbox for Analysing Digital Historical Textual Data

Rosa Filgueira et al., Sep 1, 2019
This work presents defoe, a new scalable and portable digital eScience toolbox for historical research. It runs text-mining queries in parallel, via Apache Spark, across large datasets such as historical newspapers and books. It handles queries against collections that comprise several XML schemas and physical representations. The tool has been successfully evaluated on five large-scale historical text datasets, in two HPC environments and on desktop machines. The results show that defoe allows researchers to query multiple datasets in parallel from a single command-line interface, in a consistent way, and without any HPC environment-specific requirements.
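The abstract describes running one text-mining query across collections stored in different XML schemas. The sketch below illustrates that general pattern with Python's standard library only; it is not defoe's code or API. The two schemas, the sample documents, and the functions `extract_words`, `count_keywords` and `keyword_query` are all hypothetical. Each document is first normalised to a plain word list, so the query itself does not depend on which schema the document came from; a per-document keyword count is then mapped over the collection and the partial counts are merged. In Spark these two steps would typically be distributed across workers (for example with RDD `map` and `reduceByKey`); here the built-in `map` and `functools.reduce` run them sequentially.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from functools import reduce

# Hypothetical sample documents in two different XML schemas.
NEWSPAPER_XML = """<issue>
  <article><text>Steam engine trials in Glasgow</text></article>
  <article><text>New steam ferry launched</text></article>
</issue>"""

BOOK_XML = """<book>
  <page><line>The steam rose over the river</line></page>
</book>"""

# Hypothetical mapping from each schema's root tag to the element holding text.
TEXT_ELEMENT = {"issue": "text", "book": "line"}


def extract_words(xml_text):
    """Normalise a document in either schema into a flat list of lowercase words."""
    root = ET.fromstring(xml_text)
    tag = TEXT_ELEMENT[root.tag]
    words = []
    for elem in root.iter(tag):
        words.extend((elem.text or "").lower().split())
    return words


def count_keywords(words, keywords):
    """Map step: count keyword occurrences within one document."""
    return Counter(w for w in words if w in keywords)


def keyword_query(documents, keywords):
    """Apply the map step to every document, then merge the partial counts."""
    partial_counts = map(lambda doc: count_keywords(extract_words(doc), keywords), documents)
    return reduce(lambda a, b: a + b, partial_counts, Counter())


if __name__ == "__main__":
    print(dict(keyword_query([NEWSPAPER_XML, BOOK_XML], {"steam", "river"})))
```

Keeping the schema-specific parsing in one function means a new collection format only needs a new entry in the normalisation step, while the query logic stays unchanged.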