Overwatch 2 Performance Data

Project Description

In this project, I build a classification model that predicts win-rate classes in Overwatch 2 while addressing class imbalance in the dataset. I begin by exploring and preprocessing the data, identifying missing values and skewed distributions. Missing values are filled with a custom imputation strategy based on feature skewness: median imputation for skewed features and mean imputation for roughly symmetric ones. Numerical features are scaled with StandardScaler and categorical variables are encoded with OneHotEncoder, both inside preprocessing pipelines.

To build the model, I implement a Random Forest Classifier within a scikit-learn pipeline that combines the numerical and categorical preprocessing stages. I first train and evaluate the model on a stratified train-validation split so that class proportions are preserved in both sets. With that baseline established, I address the imbalance by using RandomOverSampler to oversample the minority classes during training. The oversampling step is integrated directly into the pipeline through imblearn's Pipeline, so resampling is applied only when fitting and never to the validation data. I then retrain the pipeline with oversampling enabled and re-evaluate validation accuracy to assess the impact of resampling on model performance.

Accuracy is the only metric reported in this setup. Because accuracy can hide poor performance on minority classes, the pipeline could easily be extended with precision, recall, and F1-score for deeper insight. Through this workflow, I learn how to structure modular machine learning pipelines and how to handle imbalanced classification with oversampling so that minority classes are treated more evenly without changing the core algorithm. The result is a cleaner pipeline and a model trained on balanced classes, which is intended to generalize better to unseen data.

My Role

Data Scientist

Tech Stack

Pandas, NumPy, scikit-learn (Random Forest Classifier), imbalanced-learn

Project Link

https://colab.research.google.com/drive/1GrvdJNBUME7cxRm9P54Hn29o3H438tVl?usp=sharing