Kaitlin Khong

Academics

MS in Applied Data Science, USC Viterbi School of Engineering

Function and design of modern storage systems, including cloud; data management techniques; data modeling; network-attached storage, clusters and data centers; relational databases; the map-reduce paradigm.

Data lifecycle; data mining; NoSQL databases; tools for storage, processing, and analytics of large data sets on clusters; in-data techniques.


BS in Statistics & Data Science, University of California, Santa Barbara

Used statistical methods in R for machine learning, exploring data with techniques like classification, regression trees, and random forests. Found patterns in large datasets and built predictive models, comparing different models and evaluating their performance.

Final Project: Predicted San Francisco Airbnb prices using K-Nearest Neighbors, Boosted Trees, and Random Forest. Achieved 73% accuracy using feature engineering on location, amenities, and seasonality, with k-fold cross-validation in R.

Gained familiarity with data retrieval, analysis, and visualization in Python through hands-on projects. Learned the importance of domain knowledge through case studies touching on data ethics, privacy, and statistical traps. Worked with missing data and explored notions of causality.

Final Project: Analyzed national wealth, positivity, and social support systems to understand countries' happiness levels through regression analysis in Python.

Learned to work with relational database management systems and the fundamentals of cloud computing, distributed data storage, and retrieval. Used PySpark to analyze big data, exploring concepts like clustering, dimensionality reduction, and both supervised and unsupervised learning.

Final Project: Predicted 2020 voter turnout and party classification with Gradient Boosting and SVM models in PySpark, analyzing 50 TB of data with a 98% RMSE accuracy.

Applied stationary and non-stationary models, seasonal time series, and ARMA models, including ACF and PACF computation and mean and ACF estimation. Learned diagnostic checking, forecasting, spectral analysis, and the periodogram.

Final Project: Forecasted Uniqlo stock prices using SARIMA, STL, and Holt-Winters. Assessed model performance in R with evaluation metrics such as RMSE and MAE.

Conducted data analyses, modeling, and data management under Dr. Jingyi Wang, focusing on temporal-memory patterns in the brain. Also led the data analysis for Bente Winkler’s meta-analysis on working memory, which included over 200 research articles.

Publication: Winkler, B. (2024). The influence of emotions on working memory performance: A meta-analysis (Master’s thesis, Faculty of Behavioral and Social Sciences, Leiden University).