Kirk McGraw

Data Scientist  •  Data Engineering  •  Machine Learning

As a data scientist and analyst in the civil engineering field, I've built computer vision pipelines that process petabytes of complex lidar roadway data and conducted thorough historical time-series analyses. This page showcases a selection of my personal and academic projects. Thanks for stopping by!

Python PySpark Machine Learning Computer Vision RAG / LLMs Data Pipelines R

Projects

A collection of data science, ML, and engineering work

Collabrium — AI-Powered Collaboration Platform

Multi-modal RAG platform that lets small teams ingest PDFs, images, and math-heavy documents into a secure vector store, then query them through an intelligent AI teammate. Built as a Harvard AC215 capstone project.

RAG ChromaDB Vertex AI CLIP Docker Python
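The retrieval step at the heart of a RAG system like this can be sketched in a few lines: embed the documents and the query, rank documents by cosine similarity, and hand the top matches to the LLM as context. This is a minimal illustration, not Collabrium's actual code; the toy three-dimensional vectors and the `top_k` helper are hypothetical stand-ins for real embeddings (for example from CLIP) and for a vector-store query in ChromaDB.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k stored vectors most similar to the query."""
    ranked = sorted(store, key=lambda doc_id: cosine(query, store[doc_id]), reverse=True)
    return ranked[:k]

# Hypothetical toy "embeddings"; a real system would produce these with an embedding model.
store = {
    "invoice.pdf": [0.9, 0.1, 0.0],
    "diagram.png": [0.1, 0.8, 0.3],
    "proof.tex": [0.0, 0.2, 0.95],
}

print(top_k([0.05, 0.3, 0.9], store))  # → ['proof.tex', 'diagram.png']
```

A dedicated vector store replaces the linear scan in `top_k` with an approximate nearest-neighbor index, which is what makes retrieval practical at scale.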

Quantitative Finance Strategy with Machine Learning

Robust financial time-series workflow implementing a mean-reversion pairs-trading strategy with random forest classification, plus Augmented Dickey-Fuller (stationarity) and Breusch-Pagan (heteroskedasticity) tests to check statistical validity.

Python Pairs Trading Random Forest Time Series
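The mean-reversion idea behind a pairs trade can be sketched as follows: form the spread between two price series using a hedge ratio, convert it to a z-score against a trailing window, and trade when the spread drifts unusually far from its recent mean. This stdlib-only sketch is illustrative, not the project's code. The fixed hedge ratio, window length, and ±2 entry threshold are assumptions; in practice the hedge ratio is usually estimated by regressing one series on the other, and the spread is checked for stationarity (for example with an ADF test) before trading. The random forest layer is omitted.

```python
from statistics import mean, stdev

def zscore_signals(a, b, hedge_ratio=1.0, window=5, entry=2.0):
    """Return one signal per time step: +1 long the spread, -1 short the spread, 0 flat.

    spread[t] = a[t] - hedge_ratio * b[t]; the z-score compares spread[t]
    with the mean and sample standard deviation of the previous `window` spreads.
    """
    spread = [x - hedge_ratio * y for x, y in zip(a, b)]
    signals = []
    for t in range(len(spread)):
        if t < window:
            signals.append(0)  # not enough history yet
            continue
        hist = spread[t - window:t]
        sd = stdev(hist)
        z = (spread[t] - mean(hist)) / sd if sd > 0 else 0.0
        if z > entry:
            signals.append(-1)  # spread unusually wide: short A, long B
        elif z < -entry:
            signals.append(1)   # spread unusually narrow: long A, short B
        else:
            signals.append(0)
    return signals

# Hypothetical prices: series A briefly jumps away from B at t = 6.
a = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 11.5, 10.0]
b = [10.0] * 8
print(zscore_signals(a, b))  # → [0, 0, 0, 0, 0, 0, -1, 0]
```

A fuller strategy would also define exit rules (for example, closing the position once the z-score returns near zero) and account for transaction costs.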

PPE Object Detection — R-CNN vs. Transformer

Comparative investigation of R-CNN vs. Transformer object detection models on personal protective equipment (PPE) data. Such models can monitor PPE non-compliance in industrial settings, helping companies avoid fines and liability.

Computer Vision R-CNN Transformers Python

Data Engineering Lakehouse

End-to-end ML data pipeline built on PySpark with a bronze/silver/gold lakehouse architecture. The resulting design efficiently serves both BI and ML workloads.

PySpark Lakehouse Delta Lake ETL
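The bronze/silver/gold layering can be illustrated without a cluster: bronze holds raw records as ingested, silver holds cleaned and deduplicated records, and gold holds business-level aggregates ready for BI dashboards or ML features. The sketch below is a pure-Python stand-in, not the project's PySpark code, and the records are hypothetical. In PySpark the same steps would typically use DataFrame operations such as `dropDuplicates`, `cast`, and `groupBy(...).agg(...)`, with each layer written to its own Delta table.

```python
# Bronze: raw records as ingested, including a duplicate and a bad value (hypothetical data).
bronze = [
    {"id": 1, "region": "north", "amount": "10.5"},
    {"id": 1, "region": "north", "amount": "10.5"},  # duplicate row
    {"id": 2, "region": "south", "amount": "n/a"},   # unparseable amount
    {"id": 3, "region": "north", "amount": "4.5"},
    {"id": 4, "region": "south", "amount": "7.0"},
]

def to_silver(rows):
    """Silver: keep the first row per id, cast amount to float, drop unparseable rows."""
    seen, clean = set(), []
    for r in rows:
        if r["id"] in seen:
            continue
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # bad value: exclude from the cleaned layer
        seen.add(r["id"])
        clean.append({"id": r["id"], "region": r["region"], "amount": amount})
    return clean

def to_gold(rows):
    """Gold: total amount per region, a business-level table for BI or ML consumers."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals

print(to_gold(to_silver(bronze)))  # → {'north': 15.0, 'south': 7.0}
```

Keeping the untouched bronze layer means silver and gold can always be rebuilt if cleaning rules change.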

Bird Strikes and You

A logistic regression analysis in R to determine which factors influence bird-strike damage costs. Animal mass and season were the strongest predictors, and the analysis produced actionable recommendations for airlines.

R Logistic Regression Statistics
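Logistic regression models the probability of a binary outcome as the sigmoid of a linear combination of predictors, p = 1 / (1 + e^-(w·x + b)). The sketch below fits a one-predictor model by plain gradient descent purely to show the mechanics; it is not the project's analysis, which was done in R (where `glm(..., family = binomial)` is the standard tool). The single "mass" predictor, the binary "costly damage" outcome, and all data values are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean log-loss with respect to w and b.
        gw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Hypothetical data: animal mass (kg) vs. whether the strike caused costly damage (1) or not (0).
mass = [0.1, 0.3, 0.5, 1.0, 2.0, 3.5, 4.0, 5.0]
damage = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(mass, damage)

# Predicted probability of costly damage for a light and a heavy animal.
print(round(sigmoid(w * 0.2 + b), 2), round(sigmoid(w * 4.5 + b), 2))
```

A positive fitted weight `w` means heavier animals raise the predicted odds of costly damage; with more predictors (such as season), each coefficient is read the same way, as a change in log-odds.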

Get In Touch

If you'd like to reach out, here are a few ways to contact me.