Is employee involvement universally either good or bad, a “best practice” or an exploitative tool—or do its effects depend on context? …

Work in Progress

Recent studies have established that there is a strong correlation between charter schools and school segregation. We identify …

Research shows that charter schools are more segregated by race and class than traditional public schools. I investigate an …

We develop a general-use, inductive method of generating domain-specific dictionaries through word embedding models. Our workflow has …

Reproducible, Open-Source Research

Guided by the core values of reproducibility, transparency, and open access, I have turned solutions to my own domain-specific puzzles into a complex, open-source project infrastructure that is portable to other computational social science applications. As project manager for nearly 40 coders through the Undergraduate Research Apprentice and Data Science Discovery Programs, I’ve developed a multi-platform, reproducible approach to coordinating several coding teams (e.g., text analysis, web-scraping, and data management) using GitHub, Slack, Basecamp, and Box. My team and I have created solutions to a range of pressing challenges faced by many computational social scientists, from corpus creation and distributed web-crawling to Docker and virtual machine environment management. I’ve shared these tools through GitHub and public documentation, and I encourage you to take and adapt them to your use case.

For guidance on virtual machine management, web-crawling, and more, see the project documentation. For current code, see my GitHub page and that of the research team I supervise. To reproduce the results of my paper on charter school identities submitted to Sociology of Education (Sorting Schools), you can access the code. You can also read the public pre-registration with the Open Science Framework.


As someone committed to making computational research methods accessible, I’ve led data science workshops at UC Berkeley, including several demos at CTAWG (see below) and the Digital Humanities Faire, as well as a guest lecture on web crawling for a graduate course in computational social science. Example talk titles include “Introduction to a thorough, practical CTA training” and “Web-scraping at scale: How I captured the corpus of inconsistent charter school websites”. I am also experienced in guiding and mentoring undergraduates in computational projects.

As an experienced instructor in sociology, I have three goals for student learning: (1) critical awareness of socially constructed privileges, assumptions, and norms; (2) a sense of connectedness to and compassion for “others”; and (3) accountability for one’s contributions both to the learning environment and to the social world. To meet these goals, I use the active learning practices of small group discussion, a rigorous presentation structure, and incremental writing assignments. For my teaching in sociology at UC Berkeley, I received the Certificate of Teaching and Learning in Higher Education in 2015 and the Outstanding Graduate Student Instructor Award in 2016. As further professional development, I also received a Waldorf Teaching Certificate from the Bay Area Center for Waldorf Teacher Training (in California) in 2017.

Academic Leadership

Since Spring 2018, I have coordinated the Computational Text Analysis Working Group (CTAWG): I’ve arranged speakers, led meetings, presented computational text analysis (CTA) tools and resources, contributed to CTA curriculum for the Data-Intensive Social Sciences Laboratory (D-Lab), developed workflows for collaborative coding, and implemented a collaborative project analyzing the United Nations General Debates Corpus. I’ve also organized and led special events, including two series of Sociology Job Market Practice Talks, the “Making Text Research-Ready” symposium in Spring 2018, and the TextXD (“Text Across Domains”) symposium in 2018 and 2019.

I played several key roles at TextXD 2018. In addition to serving on the core organizing committee and speaking at the event, I worked with the D-Lab to implement cloud infrastructure for the event’s collaborative coding sessions (“hackathons”). I also led a hands-on tutorial on word embeddings during the pre-symposium CTA bootcamp, using a corpus of ancient Akkadian texts that was also featured in the next day’s keynote. Finally, I served as a hackathon data leader, curating and creating a workflow for exploring word embeddings built on my charter schools data. See the code for my TextXD word embeddings workshop and collaborative session.


  • PO Box 134, Woodacre, CA 94973