Chris.

I am in my final year at UC Berkeley pursuing an undergraduate degree in Computer Science and Data Science. I work on campus at the Data Science Education Program as a lead developer where I manage a team of developers helping to create technologies used for furthering the goal of data science pedagogy at UC Berkeley and other institutions. I am also a member of course staff for one of the advanced data science courses on campus, for which I teach sections and also develop course infrastructure. Lastly, I am currently an intern at the Federal Reserve Bank of San Francisco where I work on the Statistics team to analyze financial report data from various depository institutions and utilize this data to build products to make the Statistics business process run more smoothly.

Education

University of California Berkeley

Class of 2021, Majoring in Computer Science and Data Science (Business & Industrial Analytics), Minoring in Demography

I am currently attending UC Berkeley to earn my Bachelor's degree in Computer Science and Data Science, with an anticipated graduation of Spring 2021. I am also trying to take classes that are relevant to my interest in financial systems while working on campus and being active in extra-curriculars. When I came to UC Berkeley, I had intended to major in Chemical Biology; that quickly changed, however, as I realized just how much math interested me and I had the opportunity to begin learning data science. The long and winding path that I took to arrive at my choice of majors has led me into becoming very active on campus within the ecosystem that my majors inhabit, working in various roles in the data science/CS communities, including developing and co-teaching a course about the economic applications of data science.

I recently completed an honors thesis for my data science program. My thesis examined how well movie ratings can be predicted using convolutional neural networks trained on poster images. I trained several different neural networks on posters queried from The Movie Database's API and compared them to control models trained on other movie data to see whether these neural networks were viable models. Some of the other projects that I have completed in the course of my studies include an RDBMS built in Java, Gitlet (a functional miniature version of Git programmed in Java), BearMaps (a mapping and navigation program based on OpenStreetMap and graph algorithms), and a Spam/Ham classifier based on logistic regression.

Projects

Teaching

Undergraduate Student Instructor, UC Berkeley Electrical Engineering and Computer Science

Since January 2020, I have been a UGSI for Data 100: Principles & Techniques of Data Science (approx. 900 students). My role includes leading a discussion section and a lab section of about 30 students each, developing teaching materials for these sections, and holding office hours for students in the course. I also write review and study materials for students, including my discussion materials, and post them on my website.

Data Science Curriculum Development, UC Berkeley

As a part of my time at the Division of Data Sciences at UC Berkeley, I participated in curriculum development for some courses that are taught by Division staff and for other “data-enabled” courses (courses outside the Division but which use Division infrastructure). The first course that I worked on, L&S 88, focused on reproducibility and open science. It was a connector course for Data 8 (the foundational course for data science students) and it was my role as a connector assistant that spurred me into working more and more at the Division.

I present here some of the materials that I developed for courses at UC Berkeley as a part of the work I did at the Division, as well as some details on the courses they are for and my role therein. Most of what I present here is work relating to curriculum development, but I also worked as something of a lab assistant on courses, including L&S 88.

Data 88: Economic Models

Spring 2020, Fall 2019. This course is another Data 8 connector course that looks at how to apply the methods and tools of data science to economic questions. Lecture topics include SymPy, supply & demand, utility, the Cobb-Douglas production function, inequality, and other applied topics. I am a connector assistant for this class, and my contribution was the economic demography lecture along with some ipywidgets-backed applets for use in other notebooks.

SW 282: Social Welfare Research

Fall 2019. This is a module (a set of notebooks presented in non-DS courses) that I am building from scratch. It brings the power of data science to students who have no coding experience so that they can use the tools we show them in their research. It covers subjects including data abstractions for rectangular data, creating data visualizations, and estimating population parameters using the bootstrap.

MCB 32: Introduction to Human Physiology

Summer 2019. This module brings several physiological concepts into the data science framework. My role on this module was mainly in upkeep and updating the notebook styles and code, but I also worked on Lab 9, which deals with building a k-nearest neighbors classifier for diabetes, by adding a section in which we explain hypothesis testing and run an A/B test on the data used in the notebook.

L&S 88: Reproducibility and Open Science

Spring 2019. This course was a Data 8 connector course that focused on questions of reproducibility and open science within the Data Science community. It featured lectures on things like Project Jupyter, Licensing, and Data Repositories & Archiving. My role in this class was as a connector assistant, which primarily involved curriculum development and lab assisting in class. I developed quite a few labs for this course, including a matplotlib tutorial and a lab on Python vs. R in Jupyter notebooks.

Co-Curriculars

Connector Assistant & Modules Developer

UC Berkeley Data Science Education Program

I started at the Division in January 2019 as a connector assistant for L&S 88: Reproducibility & Open Science (discussed above). As connector assistant, my role was twofold: I attended class and acted as a lab assistant during the lab portion of the class, and I worked with the course instructors to develop assignments that fit with the narrative they had for the course. After L&S 88 ended, I stayed on at the Division and worked on a few different modules (sets of notebooks taught in non-DS courses). The modules that I worked on cover a wide range of subjects, including human physiology, sociology, social welfare, and French.

Academic Intern

Department of Electrical Engineering & Computer Science, UC Berkeley

I was an AI for two courses: Data 100: Principles & Techniques of Data Science and CS 88: Computational Structures in Data Science. Being an AI is similar to lab assisting, in that I spent my time in the course lab sections assisting students with completing the assignments, answering theoretical questions, and troubleshooting technical issues with assignments and students' machines.

Eagle Scout

As of October 19, 2016

Eagle Scout is the highest award in the Boy Scouts of America. I worked my way up through seven ranks and twenty-something merit badges before completing an Eagle Scout Service Project to earn this honor. The Eagle Project involved designing, funding, and completing a project to benefit a local nonprofit; my project involved repainting ceiling tiles in my high school’s MPR. You can see my project notebook (from proposal to conclusion) here.

Work Experience

Statistics Intern

Federal Reserve Bank of San Francisco, May 2020 - Present

As a part of my internship, I perform inspection and analysis on data from various bank financial reports utilized for analysis of economic activity by the Federal Reserve Board of Governors, including detecting and writing remarks for anomalous data points and assessing trends and issues by comparing against past financial data. This involves using internal databases to assess financial data reported to the Statistics function, managing data pipelines from various data sources in order to query, merge, shape, filter, and transform data to meet business needs, and applying cleaned data to business problems by creating data products that utilize the data and algorithms to reduce the workload of analysts.

I built a remark prediction algorithm (a two-part program) to accompany anomaly detection performed on weekly balance sheet data from the FR 2644 for 65+ depository institutions. The first part pulls 52 weeks of historical data for the report from the current as-of date and merges the data queried from 4 tables into a usable form for predicting remarks. The second part matches detected anomalies with the most recent viable remark to carry forward, in order to auto-fill the remark for the analyst.
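As a rough illustration of the carry-forward idea in the second part, here is a minimal Python sketch. It is not the code used at the Bank: the function name, the row layout, the sample rows, and the definition of a "viable" remark (non-empty text dated within the 52-week window before the as-of date) are all assumptions made for the example.

```python
from datetime import date, timedelta

LOOKBACK = timedelta(weeks=52)  # the 52-week history window described above


def carry_forward_remarks(anomalies, history, as_of):
    """Map each anomaly key to its most recent viable past remark, if any.

    anomalies: iterable of (institution, item) keys flagged for the as-of date.
    history:   iterable of (institution, item, remark_date, remark_text) rows.
    All names and the row layout are hypothetical.
    """
    latest = {}  # key -> (remark_date, remark_text) of the newest viable remark
    for institution, item, remark_date, text in history:
        # Assumed viability rule: non-empty text, inside the lookback window,
        # and strictly before the as-of date.
        if not text or not (as_of - LOOKBACK <= remark_date < as_of):
            continue
        key = (institution, item)
        if key not in latest or remark_date > latest[key][0]:
            latest[key] = (remark_date, text)
    # Anomalies with no viable remark are left for the analyst to fill in.
    return {key: latest[key][1] for key in anomalies if key in latest}


# Made-up rows for illustration only.
history = [
    ("Bank A", "total_loans", date(2020, 5, 6), "Seasonal loan growth"),
    ("Bank A", "total_loans", date(2020, 6, 3), "Portfolio acquisition"),
    ("Bank B", "deposits", date(2018, 1, 3), "Stale remark"),
]
anomalies = [("Bank A", "total_loans"), ("Bank B", "deposits")]
suggested = carry_forward_remarks(anomalies, history, as_of=date(2020, 6, 10))
print(suggested)
```

In this sketch, the Bank A anomaly picks up the newer of its two remarks, while the Bank B anomaly gets no suggestion because its only remark falls outside the window.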

Undergraduate Student Instructor

UC Berkeley Electrical Engineering and Computer Science, January 2020 - Present

As a UGSI, I am responsible for teaching discussion sections, holding office hours, grading exams, and developing and maintaining infrastructure for a course of nearly 1,000 students. This includes managing the course JupyterHub distribution and developing new features for the class autograding solution, Otter Grader, which I built as a part of my position at the Data Science Education Program.

Lead Developer

UC Berkeley Data Science Education Program, May 2019 - Present

I manage a team of 6 developers working on 3 projects, using Agile methodologies and tools (e.g. JIRA, Trello) to manage development cycles and encourage parallelization of tasks. This involves performing code review to ensure that projects adhere to DevOps best practices and meet business requirements, and leading weekly meetings and tri-weekly standups to keep leadership abreast of project statuses and developers on top of the user stories to be completed during the sprint. I also participate in ongoing campus discussions about standardizing autograding solutions and practices for major courses. This has included presenting on two subjects at the 2020 National Workshop on Data Science Education: autograding and UC Berkeley’s existing solutions, and Otter Grader (see below) as a proposed new UC Berkeley standard for autograding. I also meet with data science campus leadership and course instructors across institutions to discuss autograding solutions, pitch Otter Grader, and advise on setting up grading infrastructure.

The main projects I work on relate to autograding solutions that are designed to reduce the barrier to entry for setting up data science courses. I built a serverless autograding solution, Otter Grader, that grades students’ notebooks locally and integrates with third-party autograding services; it is used in UC Berkeley courses, including a class of over 1,000 students. My team also improved an existing autograding solution by adding telemetry to track students’ submissions and attempts for a massive open online data science course offered through edX with more than 44,000 enrolled students.