DACURA: a new solution to data harvesting and knowledge extraction for the historical sciences

Peter N. Peregrine, Rob Brennan, Thomas Currie, Kevin Feeney, Pieter Francois, Peter Turchin, Harvey Whitehouse

Research output: Contribution to journal › Article › peer-review



New advances in computer science address problems historical scientists face in gathering and evaluating the now vast data sources available through the Internet. As an example, we introduce Dacura, a dataset curation platform designed to assist historical researchers in harvesting, evaluating, and curating high-quality information sets from the Internet and other sources. Dacura uses semantic knowledge graph technology to represent data as complex, interrelated knowledge, allowing rapid search and retrieval of highly specific data without the need for a lookup table. Dacura automates the generation of tools to help non-experts curate high-quality knowledge bases over time and to integrate data from multiple sources into its curated knowledge model. Together, these features allow rapid harvesting and automated evaluation of Internet resources. We provide an example of Dacura in practice as the software employed to populate and manage the Seshat databank.
Original language: English
Pages (from-to): 165-174
Number of pages: 10
Journal: Historical Methods: A Journal of Quantitative and Interdisciplinary History
Issue number: 3
Early online date: 15 Mar 2018
Publication status: Published - 3 Jul 2018


  • Data harvesting
  • RDF triplestore
  • data curation
  • database metamodels
  • database ontology

