Is employee involvement universally either good or bad, a “best practice” or an exploitative tool—or do its effects depend on context? …

Work in Progress

Recent studies have established that there is a strong correlation between charter schools and school segregation. We identify …

Research shows that charter schools are more segregated by race and class than traditional public schools. I investigate an …

We develop a general-use, inductive method of generating domain-specific dictionaries through word embedding models. Our workflow has …

Reproducible, Open-Source Research

In keeping with the core values of reproducibility, transparency, and open access, I have developed a complex, open-source project infrastructure that grew out of solving my own domain-specific puzzles and is portable to other computational social science applications. As project manager for nearly 40 coders through the Undergraduate Research Apprentice and Data Science Discovery Programs, I’ve developed a multi-platform, reproducible approach to coordinating several coding teams (e.g., text analysis, web-scraping, and data management) using GitHub, Slack, BaseCamp, and Box. My team and I have created solutions to a range of pressing challenges faced by many computational social scientists, from corpus creation and distributed web-crawling to Docker and virtual machine environment management. I’ve shared these tools through GitHub and public documentation, and I encourage you to take and adapt them to your use case.

For guidance on virtual machine management, web-crawling, and more, see the project documentation. For current code, see my GitHub page and that of the research team I supervise. To reproduce the results of my paper on charter school identities submitted to Sociology of Education (Sorting Schools), you can access the code. You can also read the public pre-registration with the Open Science Framework.


As someone committed to making computational research methods accessible, I’ve led data science workshops at UC Berkeley, including several demos at the D-Lab’s Computational Text Analysis Working Group and the Digital Humanities Faire and a guest lecture on web crawling for a graduate course in computational social science. Example talk titles include “Introduction to a thorough, practical CTA training” and “Web-scraping at scale: How I captured the corpus of inconsistent charter school websites”. I am also experienced in guiding and mentoring undergraduates in computational projects.

As an experienced instructor in sociology, I have three goals for student learning: (1) critical awareness of socially constructed privileges, assumptions, and norms; (2) a sense of connectedness to and compassion for “others”; and (3) accountability for one’s contributions both to the learning environment and to the social world. To meet these goals, I use the active learning practices of small group discussion, a rigorous presentation structure, and incremental writing assignments. For my teaching in sociology at UC Berkeley, I received the Certificate of Teaching and Learning in Higher Education in 2015 and the Outstanding Graduate Student Instructor Award in 2016. As further professional development, I also received a Waldorf Teaching Certificate from the Bay Area Center for Waldorf Teacher Training in 2017.

Academic Leadership

Since Spring 2018, I have coordinated the Computational Text Analysis Working Group (CTAWG): I’ve arranged speakers, led meetings, presented computational text analysis (CTA) tools and resources, contributed to the CTA curriculum for the Data-Intensive Social Sciences Laboratory (D-Lab), developed workflows for collaborative coding, and implemented a collaborative project analyzing the United Nations General Debates Corpus. I’ve also organized and led special events, including two series of Sociology Job Market Practice Talks, the “Making Text Research-Ready” symposium in Spring 2018, and the TextXD (“Text Across Domains”) symposium in 2018 and 2019.

I played several key roles at TextXD 2018. In addition to serving on the core organizing committee and speaking at the event, I worked with the D-Lab to implement cloud infrastructure for the event’s collaborative coding sessions (“hackathons”). I also led a hands-on tutorial on word embeddings during the pre-symposium CTA bootcamp, using a corpus of ancient Akkadian texts that was also featured in the next day’s keynote. Finally, I served as a hackathon data leader, curating and creating a workflow for exploring word embeddings built on my charter schools data. See the code for my TextXD word embeddings workshop and collaborative session.


  • PO Box 134, Woodacre, CA 94973