workflow for computational dictionary refinement

Toward Computational Literature Review: Refining Expert-Built Dictionaries for Automated Analysis of Academic Texts

Jaren R. Haber, Heather A. Haveman, and Yoon Sung Hong

PDF Code

workflow for computational dictionary refinement

Toward Computational Literature Review: Refining Expert-Built Dictionaries for Automated Analysis of Academic Texts

Jaren R. Haber, Heather A. Haveman, and Yoon Sung Hong

PDF Code

Abstract

The social sciences face growing demand for reproducible tools for processing massive troves of often-complex text data (political speeches, medical notes, etc.). In response, we aim toward computational literature review by developing an inductive method of applying expert-built dictionaries for automated analysis of complex texts. Our workflow begins with developing dictionaries from foundational texts and domain expertise. Next, we apply text-analytic methods of differential domain-specificity and complexity to create vector-space representations of texts. Finally, we compare the validity of these methods by using regression models to evaluate relationships between their representations and ground truth. Taking as our use case a large corpus of academic articles in organizational science, we find that domain-specific, relatively simple embeddings were most valid–while the more sophisticated models were very weak. Thus, we suggest that social science workflows for learning from complex texts incorporate embeddings that are domain-specific and straightforward–rather than convoluted.

Date

Jan 30, 2020

12:00 AM

Event

See my 20-min talk on this project and a succinct 5-min version.