Kirk McGraw

Data Scientist  •  Data Engineering  •  Machine Learning

As a data scientist and analyst in the civil engineering field, I've built computer vision pipelines for petabytes of complex lidar roadway data and carried out thorough historical time-series analyses. This page showcases a selection of personal and academic projects. Thanks for stopping by!

Python  •  PySpark  •  Machine Learning  •  Computer Vision  •  RAG / LLMs  •  Data Pipelines  •  R

Projects

A collection of data science, ML, and engineering work

Spring 2026

Lockoftheweek.com — College Football Pick'em App

Full-stack college football office pool web app. Each week a league commissioner selects NCAA matchups; league members pick winners and rank their picks 1–10 by how confident they are in each one. Picks lock Saturday at noon, and standings update live as scores roll in.

React  •  TypeScript  •  Node.js  •  PostgreSQL  •  Docker
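
To illustrate the confidence-ranking rule described above, here is a minimal Python sketch. It is not the app's code (the app itself is listed as TypeScript). It assumes the common confidence-pool convention that a correct pick earns points equal to its rank and that 10 marks the most confident pick; the `Pick` class, `validate_picks`, and `score_week` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pick:
    game_id: str      # hypothetical matchup identifier
    team: str         # team the member picked to win
    confidence: int   # assumed scale: 1 = least sure, 10 = most sure


def validate_picks(picks: list[Pick], n_games: int = 10) -> None:
    """Raise ValueError unless every rank 1..n_games is used exactly once."""
    ranks = sorted(p.confidence for p in picks)
    if ranks != list(range(1, n_games + 1)):
        raise ValueError(f"confidence ranks must be 1..{n_games}, each used once")
    if len({p.game_id for p in picks}) != len(picks):
        raise ValueError("only one pick per game is allowed")


def score_week(picks: list[Pick], winners: dict[str, str]) -> int:
    """Sum confidence points for correct picks (assumed scoring rule).

    Games missing from `winners` (not yet final) contribute nothing,
    so the total can be recomputed as scores come in.
    """
    return sum(p.confidence for p in picks if winners.get(p.game_id) == p.team)
```

Calling `score_week` again each time a game goes final is one simple way to keep standings current, since unfinished games simply add zero.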

Spring 2026

Climate Data Analytics

A suite of interactive analyses exploring long-run climate signals across 309 coastal cities from 1966 to 2026, covering Gulf Stream heat transport, precipitation structure, and geospatial trend mapping.

GCP  •  Plotly  •  Time Series  •  Geospatial

Winter 2025

Collabrium — AI-Powered Collaboration Platform

Multi-modal RAG platform that lets small teams ingest PDFs, images, and math-heavy documents into a secure vector store, then query them through an intelligent AI teammate. Built as a Harvard AC215 capstone project.

RAG  •  ChromaDB  •  VertexAI  •  CLIP  •  Docker  •  Python

Summer 2025

Quantitative Finance Strategy with Machine Learning

Financial time-series workflow implementing a pairs-trading mean-reversion strategy with random forest classification, with Augmented Dickey-Fuller (stationarity) and Breusch-Pagan (heteroskedasticity) tests for statistical validity.

Python  •  Pairs Trading  •  Random Forest  •  Time Series
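
The mean-reversion idea behind pairs trading can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's workflow: the hedge ratio `beta`, the rolling `window`, and the `entry` threshold are hypothetical parameters, and the random forest and statistical-testing steps are not shown.

```python
from statistics import mean, stdev


def spread_zscores(a, b, beta=1.0, window=5):
    """Rolling z-score of the spread a - beta * b.

    Returns None for positions where the window is not yet full.
    `beta` is a fixed, assumed hedge ratio; in practice it would be estimated.
    """
    spread = [x - beta * y for x, y in zip(a, b)]
    out = []
    for i in range(len(spread)):
        if i + 1 < window:
            out.append(None)
            continue
        win = spread[i + 1 - window : i + 1]
        sd = stdev(win)  # sample standard deviation over the window
        out.append((spread[i] - mean(win)) / sd if sd > 0 else 0.0)
    return out


def signal(z, entry=1.0):
    """Map a z-score to a position: +1 long the spread, -1 short, 0 flat."""
    if z is None or abs(z) < entry:
        return 0
    # A spread far above its recent mean is expected to fall back, so short it.
    return -1 if z > 0 else 1


# Example with made-up prices: the spread jumps on the last observation.
prices_a = [10, 10, 10, 10, 14]
prices_b = [10, 10, 10, 10, 10]
positions = [signal(z) for z in spread_zscores(prices_a, prices_b)]
```

The bet only makes sense if the spread is stationary, which is the kind of property an Augmented Dickey-Fuller test is used to check.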

Spring 2025

PPE Object Detection — R-CNN vs. Transformer

Comparative investigation of R-CNN vs. Transformer object detection models on personal protective equipment data. Such models can flag PPE non-compliance in industrial settings, helping prevent fines and liability.

Computer Vision  •  R-CNN  •  Transformers  •  Python

Winter 2024

Data Engineering Lakehouse

End-to-end ML data pipeline with a bronze/silver/gold lakehouse schema built on PySpark. The resulting architecture serves both BI and ML workloads.

PySpark  •  Lakehouse  •  Delta Lake  •  ETL

Summer 2024

Bird Strikes and You

A logistic regression analysis in R of which factors influence bird-strike damage costs. Animal mass and season were the most predictive factors, and actionable recommendations were provided to airlines.

R  •  Logistic Regression  •  Statistics

Get In Touch

If you'd like to reach out, here are a few ways to contact me.