Is employee involvement universally either good or bad, a “best practice” or an exploitative tool—or do its effects depend on context? …

Research shows charter schools are more segregated by race and class than are traditional public schools. I investigate an …

Work in Progress

Recent studies have established that there is a strong correlation between charter schools and school segregation. We investigate …

The social sciences face growing demand for reproducible tools for processing massive troves of often-complex text data (political …

Reproducible, Open-Source Research

In keeping with the core values of reproducibility, transparency, and open access, I have turned my solutions to domain-specific research puzzles into an open-source project management infrastructure that is portable to other computational social science applications. As project manager for more than 50 research apprentices in data science, sociology, and related disciplines, I have worked with my team to solve a range of pressing challenges faced by computational social scientists, from corpus creation and distributed web-crawling to Docker and virtual machine environment management. I’ve shared these tools publicly on GitHub, along with documentation, and I encourage you to learn from them and adapt them to your own use case. I’m also leading the development of Crawl4All, a forthcoming universal platform for web-crawling designed to let scholars with minimal computational resources collect massive datasets from the web.

For guidance on virtual machine management, web-crawling, and more, see the project documentation and my team’s notes on Scrapy, the flexible, scalable web-crawling framework for Python. For current code, see my GitHub page and that of the research team I supervise, especially the well-documented repository applying Scrapy and the many methods we’ve used for text analysis. To reproduce the results of my paper on charter school identities, recently published in Sociology of Education (“Sorting Schools”), you can access the code and read the public pre-registration on the Open Science Framework. I’ve also registered an experimental and text-analytic procedure for studying racial cues in charter school websites, a project I am currently pursuing with collaborators Nick Camp and Jae Yeon Kim.


As someone committed to making computational research methods accessible, I’ve led numerous data science workshops at UC Berkeley and Georgetown University (GU). These teaching and learning materials are open-source and free online, including introductions to computational text analysis (CTA), scalable web-crawling, and text classification with machine learning. I’ve also given many demos, including at the UC Berkeley D-Lab’s Computational Text Analysis Working Group (CTAWG), the GU Interdisciplinary Text Analysis Research working group (GUITAR), and the UC Berkeley Digital Humanities Faire, as well as guest lectures on CTA and web-crawling for graduate and undergraduate courses in computational social science. I also have experience guiding and mentoring undergraduates in computational projects: I recruit for racial and gender diversity and support new programmers with frequent meetings and feedback.

As a teacher in sociology and education, I have served as a Graduate Student Instructor (GSI) for 11 sociology courses at UC Berkeley, including twice designing and solo-teaching a course on school choice. My pedagogical vision is for students to see clearly the social roots of personal ills and their own ability to make a difference. I aim to instill a sense of proportion about the human stakes of social problems, preparing students to be responsible observers and change-makers. This vision translates into four goals for student learning: (1) critical awareness of socially constructed privileges, assumptions, and norms; (2) a sense of connectedness to and compassion for “others”; (3) accountability for their contributions to the class and the social world; and (4) respect for and openness to all perspectives.

To meet these learning goals, I use the active learning practices of small group discussion, a rigorous presentation structure, and incremental writing assignments. I balance lecture with activities and discussion to accommodate multiple learning styles and promote every student’s participation. One example is the “spectrogram” activity: students line themselves up across the room according to how strongly they agree with a claim (e.g., “All students should learn in formal English”). This draws out usually quiet voices through movement, peer support, and my selective “interviews” along the lineup. My pedagogy also adapts to student needs revealed by midterm evaluations, required office hours, and campus circumstances, as when I halved the minimum term paper length in response to students struggling to relocate from the dorms during the COVID crisis. For my teaching in sociology at UC Berkeley, I received the Certificate of Teaching and Learning in Higher Education in 2015 and the Outstanding GSI Award in 2016. As further professional development, I received a Waldorf Teaching Certificate from the Bay Area Center for Waldorf Teacher Training in 2017. Students consistently remark on my enthusiasm, my effectiveness in encouraging participation, and the comfortable, open atmosphere of class discussion.

Academic Leadership

I’ve organized text data working groups since 2017: I coordinated CTAWG while at UC Berkeley, and I founded and lead GUITAR. In these groups, I’ve arranged speakers, led meetings, presented computational text analysis (CTA) tools and resources, contributed to CTA curriculum and workflows, and designed and coordinated collaborative projects—including analysis of charter school websites, Congressional speeches, and the United Nations General Debates Corpus. I’ve also organized and led special events, including two series of Sociology Job Market Practice Talks, the “Making Text Research-Ready” symposium in Spring 2018, the TextXD (“Text Across Domains”) symposium in 2018 and 2019, and the San Francisco Bay Area’s first Summer Institute in Computational Social Science (Bay-SICSS) in 2020.

In addition to serving on the core organizing committee for TextXD and being a featured speaker, I worked with the D-Lab to implement cloud infrastructure for the event’s collaborative coding sessions (“hackathons”). In 2018, I led a hands-on tutorial on word embeddings during the pre-symposium CTA bootcamp, using a corpus of ancient Akkadian texts also featured in the next day’s keynote; in 2019, I kicked off the symposium with an introductory tutorial on CTA. I also served as a hackathon data leader, curating data and creating a workflow for exploring word embeddings trained on my charter school data. See the code for my TextXD word embeddings workshop and collaborative session.

I spearheaded Bay-SICSS, building relationships with local non-profit organizations to give participants the opportunity to apply computational skills for social good. Led by our community partners, Bay-SICSS teams studied how disadvantaged groups bore the brunt of the COVID crisis, as shown by changes in food stamp applications, school funding, and online communities. See the blog post about our successes and our post-mortem. I have used community-engaged research in my teaching in similar ways, bridging students with disadvantaged populations through local partner organizations (e.g., Oakland High, urban tutoring programs). Indeed, I often design my courses and data science workshops around hands-on exercises applying scientific methods and reasoning to real-world data and social problems.


  • McCourt School of Public Policy, 37th and O Streets, NW, Old North, Suite 100, Washington, DC 20057