
Data mining for “big archives” analysis: A case study
Author(s) -
Esteva Maria,
Tang Jeffrey Felix,
Xu Weijia,
Padmanabhan Karthik Anantha
Publication year - 2013
Publication title -
Proceedings of the American Society for Information Science and Technology
Language(s) - English
Resource type - Journals
eISSN - 1550-8390
pISSN - 0044-7870
DOI - 10.1002/meet.14505001076
Subject(s) - workflow, computer science, data science, association rule learning, class (philosophy), data mining, world wide web, database, artificial intelligence
We present a case of archival analysis that combines several data mining methods. The research team, composed of archivists and computer scientists, used a collection of declassified Department of State cables as a case study. The methods implemented included Support Vector Machine (SVM) classification and association rule mining. Combined in an analysis workflow, the results of the different methods allowed the team to identify the security classes in the collection, understand how they changed over time, and generate descriptions for the cables in each class. Interpreting the results also clarified contextual aspects of the collection. Until now, the archival community has not thoroughly explored the use of data mining for archival analysis and processing. This study offers an initial roadmap for applying and interpreting data mining and for integrating it with archivists' experience and judgment in collaboration with computer scientists. It proposes an inductive approach to archival analysis and raises the possibility of verifying processing decisions.
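As a rough illustration of the association rule mining step named in the abstract, the sketch below finds term pairs that co-occur with a security class label. It is not the authors' implementation: the function `mine_rules`, the thresholds, the `class=...` label encoding, and the toy cables are all hypothetical, and the SVM classification step is not shown.

```python
from collections import Counter
from itertools import combinations


def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return pairwise rules as (antecedent, consequent, support, confidence).

    support    = share of transactions containing both items
    confidence = share of transactions with the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Hypothetical toy data: each cable is reduced to a set of terms plus a
# class label. These values are invented for illustration only.
cables = [
    {"class=SECRET", "embassy", "military"},
    {"class=SECRET", "military", "treaty"},
    {"class=UNCLASSIFIED", "visa", "embassy"},
    {"class=UNCLASSIFIED", "visa", "travel"},
]

rules = mine_rules(cables, min_support=0.5, min_confidence=1.0)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent} "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

Rules whose consequent is a class label, such as `military -> class=SECRET` in this toy data, are the kind of output that could feed a per-class description, under the assumption that each cable has already been reduced to a set of terms.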