Student Grade Classification

Project Description

In this project, I apply the Classification and Regression Tree (CART) algorithm to predict students' final grades from academic and demographic features. I begin by exploring the dataset through visualizations of age distribution, grade frequency, attendance, and stress levels.

I then clean the data by removing irrelevant columns, filling missing values with the median or mode depending on the feature type, and encoding categorical variables with LabelEncoder. To put the numerical features on a common scale, I apply StandardScaler to all numerical columns, and I map the target variable (Grade) from categorical labels (A–F) to ordinal values (5–1). I also handle outliers with an IQR-based method: for features where fewer than 3% of values are outliers, I remove the affected rows; for features with a higher proportion, I cap extreme values at the lower or upper bound.

With the data prepared, I split it into training and validation sets using stratified sampling to preserve the class distribution. I then train a Decision Tree Classifier, tuning max_depth, min_samples_split, and min_samples_leaf with GridSearchCV, and evaluate the selected model on the validation set using accuracy as the metric. Once the best model is selected, I generate predictions on the cleaned test dataset, map the predicted values back to their original labels (A–F), and create a submission file.

This project helps me understand the full pipeline of building an interpretable multiclass classification model with decision trees in an educational context: EDA, preprocessing, outlier handling, hyperparameter tuning, and generating the final submission.
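The IQR-based outlier rule and the grade mapping described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the function name handle_outliers, the 1.5 × IQR fence multiplier, and the assumption that the grade labels are A, B, C, D, and F are all illustrative choices.

```python
import numpy as np
import pandas as pd

# Assumed label set: the text says A-F maps to 5-1, so five labels are assumed here.
GRADE_TO_NUM = {"A": 5, "B": 4, "C": 3, "D": 2, "F": 1}
NUM_TO_GRADE = {num: grade for grade, num in GRADE_TO_NUM.items()}


def handle_outliers(df, columns, threshold=0.03, k=1.5):
    """Hypothetical helper: drop outlier rows when they are rare, cap them otherwise.

    threshold is the 3% cutoff from the text; k=1.5 is the conventional
    IQR multiplier and is an assumption, not a value stated in the project.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in columns:
        # Bounds are computed on the rows that remain at this point.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        is_outlier = (out[col] < lower) | (out[col] > upper)

        if is_outlier.mean() < threshold:
            # Few outliers: remove the affected rows.
            out = out.loc[~is_outlier].copy()
        else:
            # Many outliers: cap values at the lower/upper bound instead.
            out[col] = out[col].clip(lower=lower, upper=upper)
    return out.reset_index(drop=True)


# Small usage example with synthetic data.
example = pd.DataFrame({
    "a": np.append(np.arange(99.0), 10000.0),             # 1% outliers -> row dropped
    "b": np.append(np.arange(90.0), np.full(10, 1000.0)),  # ~9% outliers -> capped
})
cleaned = handle_outliers(example, ["a", "b"])
```

Dropping rows only when outliers are rare keeps most of the data, while capping avoids discarding a large share of rows for features with many extreme values. Predicted numeric grades can be turned back into labels with NUM_TO_GRADE, for example pd.Series(predictions).map(NUM_TO_GRADE).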

My Role

Data Scientist

Tech Stack

pandas, NumPy, scikit-learn (DecisionTreeClassifier, GridSearchCV)

Project Link

https://colab.research.google.com/drive/1W7P512Hw7Vxa30fHGlIO3WJgSCUeHcVH?usp=sharing