workflow for computational dictionary refinement

Toward Computational Literature Review: Refining Expert-Built Dictionaries for Automated Analysis of Academic Texts, joint with Heather Haveman

Abstract

We develop a general-use, inductive method for generating domain-specific dictionaries with word embedding models (WEMs). Our workflow has three steps: construct (query the model with seed terms to develop core dictionaries), refine (maximize dictionary coherence and distinctiveness), and validate (using unsupervised clustering and hand coding). We are optimizing our approach by varying the core dictionary size, the WEM generation method (pre-trained vs. native), and the dictionary application method (count-based vs. vector projection). We also compare results from two test cases: charter school websites and a corpus of academic journal articles. By supporting the creation and validation of new and even complex dictionaries, our method gives researchers in diverse domains an accessible, reproducible, and valid workflow for analyzing researcher-generated themes in texts. This is a significant improvement on the idiosyncratic, domain-restrictive approach to dictionaries that social scientists have used for decades.
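To make the construct and refine steps and the two application methods more concrete, here is a minimal sketch in plain Python. It is not our project code: the three-dimensional vectors, the dictionary names, and the helper functions (`construct`, `refine`, `count_score`, `projection_score`) are all invented for illustration, and the refinement rule shown (keep a term only in the dictionary whose centroid it is closest to) is just one simple way to encourage distinctiveness. A real run would load vectors from a trained or pre-trained embedding model.

```python
import math

# Toy 3-d "embeddings", invented for illustration only (an assumption,
# not real model output).
TOY_VECTORS = {
    "discipline": [0.9, 0.1, 0.0],
    "rules":      [0.8, 0.2, 0.1],
    "uniform":    [0.7, 0.3, 0.0],
    "inquiry":    [0.1, 0.9, 0.1],
    "explore":    [0.2, 0.8, 0.0],
    "curiosity":  [0.0, 0.9, 0.2],
    "community":  [0.5, 0.5, 0.1],
    "lunch":      [0.1, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(terms, vectors):
    """Element-wise mean of the vectors for the given terms."""
    vecs = [vectors[t] for t in terms]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def construct(seeds, vectors, size):
    """Construct: take the `size` terms most similar to the seed centroid."""
    center = centroid(seeds, vectors)
    ranked = sorted(vectors, key=lambda t: cosine(vectors[t], center), reverse=True)
    return ranked[:size]

def refine(dictionaries, vectors):
    """Refine: keep each term only where its nearest dictionary centroid is its own."""
    centers = {name: centroid(terms, vectors) for name, terms in dictionaries.items()}
    refined = {name: [] for name in dictionaries}
    for name, terms in dictionaries.items():
        for term in terms:
            nearest = max(centers, key=lambda n: cosine(vectors[term], centers[n]))
            if nearest == name:
                refined[name].append(term)
    return refined

def count_score(tokens, terms):
    """Count-based application: share of tokens that are dictionary terms."""
    term_set = set(terms)
    hits = sum(1 for tok in tokens if tok in term_set)
    return hits / len(tokens) if tokens else 0.0

def projection_score(tokens, terms, vectors):
    """Vector-based application: cosine between document and dictionary centroids."""
    known = [tok for tok in tokens if tok in vectors]
    if not known:
        return 0.0
    return cosine(centroid(known, vectors), centroid(terms, vectors))

# Hypothetical seed terms for two themes.
core = {
    "discipline": construct(["discipline", "rules"], TOY_VECTORS, size=4),
    "inquiry": construct(["inquiry", "curiosity"], TOY_VECTORS, size=4),
}
refined = refine(core, TOY_VECTORS)

doc = "our school values curiosity and inquiry".split()
for name, terms in refined.items():
    print(name, sorted(terms),
          round(count_score(doc, terms), 3),
          round(projection_score(doc, terms, TOY_VECTORS), 3))
```

In this toy setup an ambiguous term can enter both core dictionaries during construction, and the refine step then assigns it to only one of them. The count-based score ignores words outside the dictionary, while the projection score uses every word that has a vector, which is the trade-off we are evaluating when we vary the application method.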

See my talk on an early version of this method and our code in development.

Jaren Haber
PhD candidate, Sociology