Manufacturing Production Efficiency Metrics

Project Description
In this project, I build a regression model that predicts a continuous target variable, real efficiency, from feature-rich production data. I begin by exploring the dataset: I identify missing values and impute them with a distribution-based strategy (the mean for roughly symmetric columns, the median for skewed ones), and I check for duplicate records. I also create several bar plots to examine how efficiency-related variables vary across categorical groupings such as department, weekday, and quarter.

To prepare the data for modeling, I encode the categorical features with LabelEncoder and drop columns that carry no predictive signal, such as timestamps and IDs. I then split the data into training and test sets, with real_efficiency as the target variable.

For modeling, I use Ridge Regression combined with Polynomial Features to capture non-linear relationships and feature interactions. I tune the two key hyperparameters, the polynomial degree and the regularization strength (alpha), with GridSearchCV, using 5-fold cross-validation and negative mean squared error as the scoring metric (scikit-learn always maximizes the score, so the MSE is negated). After identifying the best configuration, I fit it on the training data and generate predictions for the test set.

Through this process, I learn how a polynomial transformation combined with L2 regularization (Ridge) increases model flexibility while preserving generalization: the polynomial terms let the model fit curvature and interactions, and the L2 penalty shrinks their coefficients to limit overfitting. This project deepens my understanding of real-world regression tasks that involve categorical encoding, feature interactions, and hyperparameter optimization within a structured machine learning pipeline.
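The workflow above can be sketched end to end with pandas and scikit-learn. This is a minimal, hypothetical sketch, not the original project code: the dataset is synthetic, and the column names (no_of_workers, over_time, idle_time, record_id, date), the 0.5 skewness cutoff for choosing the median over the mean, the StandardScaler step, and the candidate grids for degree and alpha are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, PolynomialFeatures, StandardScaler


def make_demo_data(n: int = 300, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for the production dataset (all column names are assumptions)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "record_id": np.arange(n),
        "date": pd.date_range("2015-01-01", periods=n, freq="D"),
        "department": rng.choice(["sewing", "finishing"], size=n),
        "weekday": rng.choice(["Mon", "Tue", "Wed", "Thu", "Fri"], size=n),
        "quarter": rng.choice(["Q1", "Q2", "Q3", "Q4"], size=n),
        "no_of_workers": rng.uniform(10, 60, size=n),
        "over_time": rng.exponential(3000.0, size=n),  # right-skewed column
        "idle_time": rng.exponential(1.0, size=n),
    })
    # Non-linear synthetic target, so a polynomial degree above 1 can help.
    df["real_efficiency"] = (
        0.6
        + 0.2 * np.tanh((df["no_of_workers"] - 35) / 10)
        - 0.03 * df["idle_time"]
        + rng.normal(0, 0.02, size=n)
    )
    # Blank out some values so the imputation step has work to do.
    df.loc[rng.choice(n, size=20, replace=False), "over_time"] = np.nan
    df.loc[rng.choice(n, size=15, replace=False), "no_of_workers"] = np.nan
    return df


def impute_by_skew(df: pd.DataFrame, skew_threshold: float = 0.5) -> pd.DataFrame:
    """Fill numeric NaNs with the median if |skew| exceeds the threshold, else the mean."""
    df = df.copy()
    for col in df.select_dtypes(include="number").columns:
        if df[col].isna().any():
            skew = df[col].skew()  # NaNs are skipped by default
            fill = df[col].median() if abs(skew) > skew_threshold else df[col].mean()
            df[col] = df[col].fillna(fill)
    return df


def encode_categoricals(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Replace each categorical column with integer codes from its own LabelEncoder."""
    df = df.copy()
    for col in cols:
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))
    return df


def build_search() -> GridSearchCV:
    """Polynomial expansion -> scaling -> Ridge, tuned over degree and alpha."""
    pipe = Pipeline([
        ("poly", PolynomialFeatures(include_bias=False)),  # Ridge fits its own intercept
        ("scale", StandardScaler()),  # assumed step: the L2 penalty is scale-sensitive
        ("ridge", Ridge()),
    ])
    param_grid = {
        "poly__degree": [1, 2, 3],
        "ridge__alpha": [0.1, 1.0, 10.0, 100.0],
    }
    return GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")


CATEGORICAL = ["department", "weekday", "quarter"]

df = make_demo_data()
df = df.drop_duplicates()                    # remove exact duplicate rows, if any
df = impute_by_skew(df)
df = encode_categoricals(df, CATEGORICAL)
df = df.drop(columns=["record_id", "date"])  # IDs and timestamps are not features

X = df.drop(columns="real_efficiency")
y = df["real_efficiency"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

search = build_search()
search.fit(X_train, y_train)   # refit=True (default) retrains the best config on all of X_train
y_pred = search.predict(X_test)
print("best params:", search.best_params_)
print("mean CV MSE:", -search.best_score_)
```

Because GridSearchCV refits the winning configuration on the full training set by default, the separate "fit the best model" step reduces to calling `predict` on the search object. Computing the fill values on the whole dataset before splitting mirrors the steps as described, but it lets test-set statistics leak into training; a stricter variant would compute them on the training split only (for example with `SimpleImputer` inside the pipeline).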
My Role
Data Scientist
Tech Stack
pandas, NumPy, scikit-learn (Ridge Regression, PolynomialFeatures, GridSearchCV)