Christopher Pyles

I am an incoming software engineer at IXL Learning, having just graduated from UC Berkeley with a Bachelor's degree in Computer Science and Data Science. My primary interest lies in educational technology, as I spent most of my time at Berkeley working at the Data Science Education Program developing software solutions to facilitate data science pedagogy. My work there centered on autograding and full-stack development: I built a Python and R autograding solution and a few web applications for various programs. I have also contracted for Microsoft, developing another autograding solution for Python.

Education

University of California, Berkeley

Class of 2021, Majoring in Computer Science and Data Science (Business & Industrial Analytics), Minoring in Demography

I graduated from UC Berkeley, earning a Bachelor's degree in Computer Science and Data Science. While at Berkeley, I tried to take classes that were relevant to my interest in financial systems while working on campus and being active in extracurriculars. When I came to UC Berkeley, I had intended to major in Chemical Biology; that quickly changed, however, as I realized just how much math interested me and I had the opportunity to begin learning data science. The long and winding path that I took to arrive at my choice of majors has led me to become very active on campus within the ecosystem that my majors inhabit, working in various roles in the data science/CS communities, including developing and co-teaching a course about the economic applications of data science.

I completed an honors thesis for my data science program. My thesis examined how well movie ratings can be predicted by convolutional neural networks trained on poster images. I trained several different neural networks on posters queried from The Movie Database's API and compared them to control models trained on other movie data to see whether these neural networks were viable models. Some of the other projects that I have completed in the course of my studies include an RDBMS built in Java, Gitlet (a functional miniature version of Git programmed in Java), BearMaps (a mapping and navigation program based on OpenStreetMap and graph algorithms), and a Spam/Ham classifier based on logistic regression.

Skills

Open-Source Projects

Here are a few selected projects from my portfolio. You can find a more complete list on my GitHub profile.

Otter-Grader

Otter-Grader is a Python-based server-optional autograding solution that can grade both Python and R assignments. It is designed to be serverless and platform-agnostic, allowing instructors to create an autograding pipeline with whatever hardware or learning management systems they prefer.

PyBryt

PyBryt is a Python pedagogical auto-assessment tool designed to avoid the rigidly structured, unit-test-based format of conventional autograders. Instead, it allows instructors to compare student submissions against any number of correct reference assignments, examining the values created by the student to determine whether they have implemented a correct solution.

Scheme Interpreter

This project is a Scheme REPL that runs entirely in the browser. The Scheme interpreter is a Node.js module, and its front end is written with React, bundled with webpack, and transpiled with Babel.

Data 88E

Data 88E is a course I co-taught that extends the material of UC Berkeley's introductory data science course into economic applications and a survey of upper division economics topics. For my part, I built and ran the course infrastructure, and taught a unit on game theory.

OK Test Generator

The OK test generator is a small web application that creates OK-formatted autograder test files for the Python autograders developed and used at UC Berkeley. It has gone through several versions as I used it to cut my teeth on new technologies, including versions built with Ruby Sinatra and Angular.

datascience

datascience is a Python package and powerful pedagogical tool developed and used in UC Berkeley's introductory data science courses. My contributions focused on expanding the use of interactive plotly-based plots and extending the mapping functionality built on folium.

Teaching

Undergraduate Student Instructor, UC Berkeley Division of Computing, Data Science and Society

Data 100: Principles & Techniques of Data Science

Spring 2021, Fall 2020, Spring 2020. Since January 2020, I have been a UGSI for Data 100: Principles & Techniques of Data Science, a class of approximately 1200 students. My role includes leading a discussion section and a lab section of about 30 students each, developing teaching materials for these sections, and holding office hours for students in the course. My main responsibility is the course infrastructure, including managing the autograder (Otter-Grader, referenced above) and the cloud computing environment that students use to complete their assignments.

Data Science Curriculum Development, UC Berkeley Data Science Education Program

As a part of my time at DSEP, I participated in curriculum development for some courses that are taught by DSEP staff and for other "data-enabled" courses (courses outside the Division but which use DSEP infrastructure). The first course that I worked on, L&S 88, focused on reproducibility and open science. It was a connector course for Data 8 (the foundational course for data science students) and it was my role as a connector assistant that spurred me into working more and more in the campus data science community.

I present here some of the materials that I developed for courses at UC Berkeley as a part of the work I did at the Division, as well as some details on the courses they are for and my role therein. Most of what I present here is work relating to curriculum development, but I also worked as something of a lab assistant on courses, including L&S 88.

Data 88E: Economic Models

Spring 2021, Fall 2020, Spring 2020, Fall 2019. This course is another Data 8 connector course that looks at how to apply the methods and tools of data science to economic questions. Lecture topics include SymPy, supply & demand, utility, the Cobb-Douglas production function, inequality, game theory, and other applied topics. I am a connector assistant for this class, and my contribution was the game theory lecture along with some ipywidgets-backed applets for use in other notebooks.

SW 282: Social Welfare Research

Fall 2019. This is a module (a set of notebooks presented in non-DS courses) that I am building from scratch. It brings the power of data science to students who have no coding experience so that they can use the tools we show them in their own research. It covers subjects including data abstractions for rectangular data, creating data visualizations, and estimating population parameters using the bootstrap.

MCB 32: Introduction to Human Physiology

Summer 2019. This module brings several physiological concepts into the data science framework. My role on this module was mainly upkeep and updating the notebook styles and code, but I also worked on Lab 9, which deals with building a k-nearest neighbors classifier for diabetes, by adding a section in which we explain hypothesis testing and run an A/B test on the data used in the notebook.

L&S 88: Reproducibility and Open Science

Spring 2019. This course was a Data 8 connector course that focused on questions of reproducibility and open science within the Data Science community. It featured lectures on things like Project Jupyter, Licensing, and Data Repositories & Archiving. My role in this class was as a connector assistant, which primarily involved curriculum development and lab assisting in class. I developed quite a few labs for this course, including a matplotlib tutorial and a lab on Python vs. R in Jupyter notebooks.

Co-Curriculars

Connector Assistant & Modules Developer

UC Berkeley Data Science Education Program

I started at DSEP in January 2019 as a connector assistant for L&S 88: Reproducibility & Open Science (discussed above). As connector assistant, my role was twofold: I attended class and acted as a lab assistant during the lab portion of the class, and I worked with the course instructors to develop assignments that fit with the narrative they had for the course. After L&S 88 ended, I stayed on at the division and worked on a few different modules (sets of notebooks taught in non-DS courses). The modules that I worked on include a wide range of subjects, including human physiology, sociology, social welfare, and French.

Academic Intern

Department of Electrical Engineering & Computer Science, UC Berkeley

I was an AI for two courses: Data 100: Principles & Techniques of Data Science and CS 88: Computational Structures in Data Science. Being an AI is similar to lab assisting, in that I spent my time in the course lab sections assisting students with completing the assignments, answering theoretical questions, and troubleshooting technical issues with assignments and students' machines.

Eagle Scout

As of October 19, 2016

Eagle Scout is the highest award in the Boy Scouts of America. I worked my way up through seven ranks and twenty-something merit badges before completing an Eagle Scout Service Project in order to obtain this honor. The Eagle Project involved designing, funding, and completing a project to benefit a local nonprofit; my project involved repainting ceiling tiles in my high school’s MPR. You can see my project notebook (from proposal to conclusion) here.

Work Experience

Software Engineer

IXL Learning, July 2021 - Present

At IXL, I am a member of the Teacher Experience team, which works on maintaining and building new features for educators on IXL's platform. This is a full-stack development role on an Agile team with biweekly standups and releases. The primary tools I use here are Struts 2 and React.

Contract Software Engineer

Microsoft, October 2020 - June 2021

At Microsoft, I architected and engineered an open source Python auto-assessment solution called PyBryt that implements a unique autograding structure grounded in the philosophy that intermediate to advanced programming courses should not require students to follow a rigid scaffold for solving problems. I provided engineering support to a series of pilots of PyBryt in courses of up to 1,200 students per semester at UC Berkeley, Imperial College London, and Tel Aviv University. I developed and designed software on an Agile team with a biweekly standup schedule that emphasized short feedback cycles and iterative development with weekly internal releases. I also collaborated with research and academic engagements within Microsoft Cloud and Ecosystem, working together to develop PyBryt.

Statistics Software Development Intern

Federal Reserve Bank of San Francisco, May 2020 - August 2020

As a part of my internship, I perform inspection and analysis on data from various bank financial reports utilized for analysis of economic activity by the Federal Reserve Board of Governors, including detecting and writing remarks for anomalous data points and assessing trends and issues by comparing against past financial data. This involves using internal databases to assess financial data reported to the Statistics function, managing data pipelines from various data sources in order to query, merge, shape, filter, and transform data to meet business needs, and applying cleaned data to business problems by creating data products that utilize the data and algorithms to reduce the workload of analysts.

I built a remark prediction algorithm (a two-part program) to accompany anomaly detection performed on weekly balance sheet data from the FR 2644 for 65+ depository institutions. The first part pulls 52 weeks of historical data for the report from the current as-of date and merges the data queried from 4 tables into a usable form for predicting remarks. The second part matches each detected anomaly with the most recent viable remark and carries it forward to auto-fill the remark for the analyst.
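The carry-forward step can be illustrated with a minimal Python sketch. This is not the program I wrote at the Bank: the record layout (`institution`, `item`, `as_of`, `remark`), the function name `carry_forward_remarks`, and the rule that a "viable" remark is simply a non-empty one within the 52-week window are all assumptions made for illustration.

```python
from datetime import date, timedelta


def carry_forward_remarks(anomalies, history, as_of, weeks=52):
    """Map each anomaly key to the most recent viable remark, or None.

    anomalies: iterable of (institution, item) keys flagged by detection.
    history:   list of dicts with hypothetical fields
               "institution", "item", "as_of" (date), and "remark" (str).
    A remark is treated as viable if it is non-empty and falls inside
    the look-back window ending at `as_of` (an assumption of this sketch).
    """
    window_start = as_of - timedelta(weeks=weeks)
    latest = {}
    # Walk the history oldest to newest so later remarks overwrite earlier ones.
    for rec in sorted(history, key=lambda r: r["as_of"]):
        in_window = window_start <= rec["as_of"] <= as_of
        if in_window and rec["remark"]:
            latest[(rec["institution"], rec["item"])] = rec["remark"]
    return {key: latest.get(key) for key in anomalies}


history = [
    {"institution": "A", "item": "loans", "as_of": date(2020, 6, 3), "remark": "Seasonal increase"},
    {"institution": "A", "item": "loans", "as_of": date(2020, 7, 1), "remark": "Merger activity"},
    {"institution": "A", "item": "loans", "as_of": date(2020, 7, 8), "remark": ""},
    {"institution": "B", "item": "deposits", "as_of": date(2019, 1, 2), "remark": "Old remark"},
]
result = carry_forward_remarks(
    [("A", "loans"), ("B", "deposits")], history, as_of=date(2020, 7, 15)
)
```

In this example, the anomaly for institution A picks up the most recent non-empty remark, while institution B gets no suggestion because its only remark is older than 52 weeks.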

Lead Developer

UC Berkeley Data Science Education Program, May 2019 - June 2021

I lead a team of 8 developers working on 3 concurrent open source projects, often using previously unfamiliar technologies. We use Agile methodologies (including behavior- and test-driven development and continuous integration) to organize development cycles and encourage parallelization of tasks. My responsibilities include performing code review and facilitating weekly standups.

The main project I work on is an open source Python and R autograding solution, Otter-Grader, that scalably grades students’ programming assignments and abstracts away autograding internals for instructors, which has been adopted in 15+ courses at and outside of UC Berkeley and has impacted 2,500+ students. Part of this work has been creating an open source community for development around GitHub and a public Slack to allow instructors and contributors to communicate effectively and to allow iterations on the package to be a community effort. I also led an interactive hands-on demonstration of autograding solutions and a presentation on the engineering and infrastructure constraints of those solutions at the 2020 National Workshop on Data Science Education.

Another of the projects I led was the end-to-end development and deployment of a Django web application for the Data Science Discovery program, a data science research program on campus. The application, hosted on Microsoft Azure, allows students to apply to the various projects the program runs. I created an open source GitHub workflow following engineering best practices for collaboration including branch-per-feature, CI/CD pipelines, and multiple staging environments. I also wrote comprehensive internal documentation for the application deployment pipeline as well as how to use and develop on the application following the dev practices set out for this project.