Christopher Pyles

CHRIS
PYLES

I am a site reliability engineer at Google, focusing on capacity management. I also have experience in software engineering, web development, and educational technology. I graduated from UC Berkeley with a Bachelor's degree in Computer Science and Data Science (with Honors), where I spent a couple of years building autograders for Python and R that I still contribute to. In my free time, I like to watch movies and contribute to open-source projects.

Education

University of California, Berkeley

Major: Computer Science and Data Science (with Honors), Minor: Demography

August 2017 - May 2021

I graduated from UC Berkeley, earning a Bachelor's degree in Computer Science and Data Science. While at Berkeley, I tried to take classes relevant to my interest in financial systems while working on campus and staying active in extracurriculars. When I came to UC Berkeley, I had intended to major in Chemical Biology; that quickly changed, however, as I realized just how much math interested me and I had the opportunity to begin learning data science. The long and winding path I took to my choice of majors led me to become very active in the campus data science and CS communities, working in various roles, including developing and co-teaching a course on the economic applications of data science.

I completed an honors thesis for my data science program. My thesis examined the efficacy of predicting movie ratings using convolutional neural networks trained on poster images. I trained several different neural networks on posters queried from The Movie Database's API and compared them against control models trained on other movie data to determine whether these networks were viable models. Other projects I completed in the course of my studies include an RDBMS built in Java, Gitlet (a functional miniature version of Git written in Java), BearMaps (a mapping and navigation program based on OpenStreetMap data and graph algorithms), and a spam/ham classifier based on logistic regression.

Open-Source Projects

Here are a few selected projects from my portfolio. You can find a more complete list on my GitHub profile.

Otter-Grader

Otter-Grader is a Python-based server-optional autograding solution that can grade both Python and R assignments. It is designed to be serverless and platform-agnostic, allowing instructors to create an autograding pipeline with whatever hardware or learning management systems they prefer.
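
To give a concrete sense of the workflow, here is a minimal sketch of how a student typically interacts with Otter inside a notebook assignment; the question name and function are illustrative placeholders, not material from a real course.

```python
# A minimal sketch of in-notebook Otter usage ("q1" and square() are illustrative).
import otter

# Loads the test files distributed alongside the assignment.
grader = otter.Notebook()

# The student writes their solution...
def square(x):
    return x ** 2

# ...then runs the check cell, which executes the tests for question "q1"
# and displays the results inline.
grader.check("q1")
```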

PyBryt

PyBryt is a Python pedagogical auto-assessment tool designed to avoid the rigidly structured, unit-test-based format of conventional autograders. Instead, it allows instructors to compare student submissions against any number of correct reference implementations, examining the values created by the student's code to determine whether they have implemented a correct solution.
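
As an illustration of that value-based approach, the sketch below annotates a reference solution with the intermediate values a correct submission should produce; the function, names, and messages are hypothetical, and the exact API may differ slightly from what is shown here.

```python
# A rough sketch of a PyBryt reference solution (names and messages are illustrative).
# Each pybryt.Value annotation records an intermediate value that a correct student
# submission is expected to produce somewhere in its execution, regardless of how
# the student structured their code.
import pybryt

def median(values):
    sorted_values = sorted(values)
    pybryt.Value(
        sorted_values,
        name="sorted-values",
        failure_message="Did you remember to sort the data first?",
    )

    middle = len(sorted_values) // 2
    result = sorted_values[middle]
    pybryt.Value(result, name="median", success_message="Correct median found!")
    return result

median([5, 1, 4, 2, 3])
```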

Scheme Interpreter

This project is a Scheme REPL that runs entirely in the browser. The Scheme interpreter is a Node.js module and its front-end is written with React, bundled with webpack, and transpiled with babel.

Data 88E

Data 88E is a course I co-taught that extends the material of UC Berkeley's introductory data science course into economic applications and a survey of upper division economics topics. For my part, I built and ran the course infrastructure, and taught a unit on game theory.

OK Test Generator

The OK test generator is a small web application that creates OK-formatted autograder test files for the Python autograders developed and used at UC Berkeley. It has gone through several versions as I used it to cut my teeth on new technologies, including versions built with Sinatra (Ruby) and Angular.
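
For context, an OK-format test file is just a Python file defining a test dictionary with doctest-style cases; the sketch below shows roughly the shape of what the generator produces (the question name and cases are made up).

```python
# Roughly the shape of an OK-format test file (question name and cases are illustrative).
# The file defines a single `test` dictionary containing doctest-style cases that the
# autograder executes against the student's environment.
test = {
    "name": "q1",
    "points": 1,
    "suites": [
        {
            "type": "doctest",
            "cases": [
                {
                    "code": ">>> square(2)\n4",
                    "hidden": False,
                },
                {
                    "code": ">>> square(-3)\n9",
                    "hidden": True,
                },
            ],
        }
    ],
}
```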

datascience

datascience is a Python package and powerful pedagogical tool developed and used in UC Berkeley's introductory data science courses. My contributions involved expanding the use of interactive plotly-based plots and extending the mapping functionality built on folium.
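
As a small example of the kind of functionality involved, the sketch below builds a datascience Table and drops its rows onto a folium-backed map via the package's Marker class; the coordinates and labels are made up for illustration.

```python
# A minimal sketch using the datascience package's Table and folium-backed maps
# (coordinates and labels are made up for illustration).
from datascience import Table, Marker

campuses = Table().with_columns(
    "lat", [37.8719, 34.0689],
    "lon", [-122.2585, -118.4452],
    "labels", ["UC Berkeley", "UCLA"],
)

# Marker.map_table expects latitude, longitude, and label columns
# and returns a folium map with one marker per row.
Marker.map_table(campuses)
```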

Work Experience

Site Reliability Engineer

Google, May 2022 - Present

I am a site reliability engineer at Google, working on capacity management.

Software Engineer

IXL Learning, July 2021 - May 2022

At IXL, I was a member of the Teacher Experience team, which works on maintaining and building new features for educators on IXL's platform. My role involved full-stack development in Struts and React, working on an Agile team with biweekly standups and releases. Within my first six months at IXL, I had already led a full-scale project to allow instructors to pin skill plans for specific classes instead of their entire roster, for which I designed the back-end, wrote a detailed design doc, and led the team through implementation. One of the other big projects I worked on at IXL was a complete refactor of the back-end for personalized skill plans for exams like the SAT, ACT, and NWEA MAP, a feature used by more than 250,000 users. For this project, I designed the new back-end and did most of the implementation.

Contract Software Engineer

Microsoft, October 2020 - June 2021

At Microsoft, I architected and engineered an open-source Python auto-assessment solution called PyBryt, which implements a unique autograding structure grounded in the philosophy that intermediate and advanced programming courses should not require students to follow a rigid scaffold for solving problems. I provided engineering support for a series of pilots of PyBryt in courses of up to 1,200 students per semester at UC Berkeley, Imperial College London, and Tel Aviv University, and I developed and designed software on an Agile team with a biweekly standup schedule emphasizing short feedback cycles and iterative development with weekly internal releases. I spent most of the first half of my time at Microsoft working on PyBryt's initial release and developing new features for it.

In the latter half of my time at Microsoft, I shifted focus to furthering the adoption of PyBryt beyond its pilots at UC Berkeley, Imperial College London, and Tel Aviv University (while still working on feature development). I created a GitHub Action to automate the use of PyBryt as a continuous integration tool for student repositories; I then used this Action to orchestrate a full-scale, real-world grading pipeline built on GitHub Classroom for collecting different implementations of algorithms, which were used to construct exercises for interactive Microsoft Learn modules. I also authored a blog post and two Microsoft Learn modules, covering an introduction to PyBryt and its advanced uses, geared toward academics looking to adopt the solution.

Statistics Software Development Intern

Federal Reserve Bank of San Francisco, May 2020 - August 2020

As a part of my internship, I performed inspection and analysis of data from various bank financial reports used by the Federal Reserve Board of Governors to analyze economic activity, including detecting and writing remarks for anomalous data points and assessing trends and issues by comparing against past financial data. This involved using internal databases to assess financial data reported to the Statistics function, managing data pipelines from various data sources to query, merge, shape, filter, and transform data to meet business needs, and applying the cleaned data to business problems by creating data products that use data and algorithms to reduce the workload of analysts.

I built a remark prediction algorithm (a two-part program) to accompany anomaly detection performed on weekly balance sheet data from the FR 2644 for 65+ depository institutions. The first part pulls 52 weeks of historical data for the report, counting back from the current as-of date, and merges the data queried from 4 tables into a form usable for predicting remarks; the second matches each detected anomaly with the most recent viable remark and carries it forward to auto-fill the remark for analysts.
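
The carry-forward step is conceptually similar to an as-of merge: each detected anomaly is matched to the most recent prior remark for the same institution and series. Below is a rough pandas sketch of that idea; the column names and data are made up, since the real implementation ran against internal databases.

```python
# A rough sketch of the remark carry-forward idea using pandas
# (column names and data are made up; not the actual internal schema).
import pandas as pd

anomalies = pd.DataFrame({
    "institution_id": [101, 102],
    "series": ["loans", "deposits"],
    "as_of_date": pd.to_datetime(["2020-07-15", "2020-07-15"]),
})

remarks = pd.DataFrame({
    "institution_id": [101, 102],
    "series": ["loans", "deposits"],
    "as_of_date": pd.to_datetime(["2020-06-24", "2020-05-06"]),
    "remark": ["Seasonal loan growth", "Large deposit outflow to parent"],
})

# For each anomaly, pull the most recent remark at or before its as-of date
# for the same institution and series.
filled = pd.merge_asof(
    anomalies.sort_values("as_of_date"),
    remarks.sort_values("as_of_date"),
    on="as_of_date",
    by=["institution_id", "series"],
    direction="backward",
)
print(filled)
```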

Lead Developer

UC Berkeley Data Science Education Program, May 2019 - June 2021

I led a team of 8 developers working on 3 concurrent open-source projects, often using previously unfamiliar technologies. We used Agile methodologies (including behavior- and test-driven development and continuous integration) to organize development cycles and encourage parallelization of tasks; my role included performing code review and facilitating weekly standups.

The main project I worked on was Otter-Grader, an open-source Python and R autograding solution that scalably grades students' programming assignments and abstracts away autograding internals for instructors; it has been adopted in 15+ courses at and outside of UC Berkeley and has impacted 2,500+ students. Part of this work was building an open-source development community around GitHub and a public Slack, so that instructors and contributors can communicate effectively and iterations on the package can be a community effort. I also led an interactive hands-on demonstration of autograding solutions and gave a presentation on the engineering and infrastructure constraints of those solutions at the 2020 National Workshop on Data Science Education.

Another project I led was the end-to-end development and deployment of a Django web application for the Data Science Discovery program, a data science research program on campus, allowing students to apply to its various projects; the application is hosted on Microsoft Azure. I created an open-source GitHub workflow following engineering best practices for collaboration, including branch-per-feature development, CI/CD pipelines, and multiple staging environments. I also wrote comprehensive internal documentation for the application deployment pipeline, as well as for how to use and develop on the application following the practices set out for this project.

Teaching

Undergraduate Student Instructor, UC Berkeley Division of Computing, Data Science and Society

Data 100: Principles & Techniques of Data Science

Spring 2021, Fall 2020, Spring 2020. Starting in January 2020, I was a UGSI for Data 100: Principles & Techniques of Data Science, a class of approximately 1,200 students. My role included leading a discussion section and a lab section of about 30 students each, developing teaching materials for these sections, and holding office hours for students in the course. My main responsibility covered the course infrastructure, including managing the autograder (Otter-Grader, referenced above) and the cloud computing environment that students used to complete their assignments.

Data Science Curriculum Development, UC Berkeley Data Science Education Program

As a part of my time at DSEP, I participated in curriculum development for some courses that are taught by DSEP staff and for other "data-enabled" courses (courses outside the Division but which use DSEP infrastructure). The first course that I worked on, L&S 88, focused on reproducibility and open science. It was a connector course for Data 8 (the foundational course for data science students) and it was my role as a connector assistant that spurred me into working more and more in the campus data science community.

I present here some of the materials that I developed for courses at UC Berkeley as a part of the work I did at the Division, as well as some details on the courses they are for and my role therein. Most of what I present here is work relating to curriculum development, but I also worked as something of a lab assistant on courses, including L&S 88.

Data 88E: Economic Models

Spring 2021, Fall 2020, Spring 2020, Fall 2019. This course is another Data 8 connector course that looks at how to apply the methods and tools of data science to economic questions. Lecture topics include SymPy, supply & demand, utility, the Cobb-Douglas production function, inequality, game theory, and other applied topics. I was a connector assistant for this class, and my contributions included the game theory lecture along with some ipywidgets-backed applets for use in other notebooks.
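
To give a flavor of those applets, here is a minimal sketch of an ipywidgets-backed demo in the spirit of the course material; the linear supply-and-demand toy model is illustrative, not an actual course notebook.

```python
# A minimal ipywidgets sketch in the spirit of the course applets
# (the supply/demand toy model is illustrative, not actual course material).
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, FloatSlider

def plot_market(demand_intercept=10.0, supply_slope=1.0):
    """Plot linear demand and supply curves and mark their intersection."""
    quantity = np.linspace(0, 10, 100)
    demand = demand_intercept - quantity          # P = a - Q
    supply = supply_slope * quantity              # P = b * Q
    eq_q = demand_intercept / (1 + supply_slope)  # solve a - Q = b * Q
    plt.plot(quantity, demand, label="Demand")
    plt.plot(quantity, supply, label="Supply")
    plt.scatter([eq_q], [supply_slope * eq_q], zorder=3, label="Equilibrium")
    plt.xlabel("Quantity")
    plt.ylabel("Price")
    plt.legend()
    plt.show()

# Sliders let students see how the equilibrium shifts as the curves change.
interact(
    plot_market,
    demand_intercept=FloatSlider(min=5, max=15, step=0.5, value=10),
    supply_slope=FloatSlider(min=0.5, max=3, step=0.1, value=1),
)
```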

SW 282: Social Welfare Research

Fall 2019. This is a module (a set of notebooks presented in non-DS courses) that I built from scratch. It brings the power of data science to students with no coding experience so that they can leverage these tools in their research. It covers subjects including data abstractions for rectangular data, creating data visualizations, and estimating population parameters using the bootstrap.
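
To illustrate the bootstrap piece, here is a minimal numpy sketch of estimating a population mean with a percentile confidence interval; the sample data are made up, not the module's actual dataset.

```python
# A minimal sketch of a bootstrap confidence interval for a population mean
# (the sample data are made up for illustration).
import numpy as np

rng = np.random.default_rng(42)
sample = np.array([23, 31, 27, 45, 29, 38, 33, 26, 41, 30])

# Resample with replacement many times and record the mean of each resample.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(10_000)
])

# The middle 95% of the bootstrap distribution gives an interval estimate of the mean.
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for the mean: ({lower:.1f}, {upper:.1f})")
```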

MCB 32: Introduction to Human Physiology

Summer 2019. This module brings several physiological concepts into a data science framework. My role on this module was mainly upkeep and updating the notebook styles and code, but I also worked on Lab 9, which builds a k-nearest neighbors classifier for diabetes, adding a section in which we explain hypothesis testing and run an A/B test on the data used in the notebook.
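
The A/B test in that section follows the standard permutation approach; below is a minimal numpy sketch of the idea, using made-up measurements rather than the lab's actual data.

```python
# A minimal sketch of a permutation-based A/B test
# (the group measurements are made up; the lab uses real physiological data).
import numpy as np

rng = np.random.default_rng(0)
group_a = np.array([101, 95, 110, 98, 105, 99])
group_b = np.array([112, 108, 115, 104, 111, 109])

observed_diff = group_b.mean() - group_a.mean()
pooled = np.concatenate([group_a, group_b])

# Shuffle the pooled data many times and recompute the difference in group means
# to build the null distribution under "no difference between groups".
diffs = []
for _ in range(10_000):
    shuffled = rng.permutation(pooled)
    diffs.append(shuffled[len(group_a):].mean() - shuffled[:len(group_a)].mean())

p_value = np.mean(np.abs(diffs) >= abs(observed_diff))
print(f"Observed difference: {observed_diff:.2f}, p-value: {p_value:.4f}")
```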

L&S 88: Reproducibility and Open Science

Spring 2019. This course was a Data 8 connector course that focused on questions of reproducibility and open science within the Data Science community. It featured lectures on things like Project Jupyter, Licensing, and Data Repositories & Archiving. My role in this class was as a connector assistant, which primarily involved curriculum development and lab assisting in class. I developed quite a few labs for this course, including a matplotlib tutorial and a lab on Python vs. R in Jupyter notebooks.