This project focused on building and evaluating machine learning models to classify handwritten digits based on pixel-level image data. Using a labeled dataset of numeric character images, I implemented and compared multiple supervised learning algorithms to determine which method produced the most accurate and reliable predictions.
The project demonstrates my ability to apply core machine learning concepts, prepare structured data for modeling, and evaluate algorithm performance using quantitative metrics such as test accuracy, error rates, and confusion matrices.
The dataset used in this project consisted of thousands of digit images represented as numerical feature vectors. Each row corresponded to an image, with pixel intensity values used as input features and the actual digit (0–9) as the target label.
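To make the row layout concrete, here is a small NumPy sketch of how image data can be flattened into feature vectors and then separated into features and target. It is illustrative only. The random 16x16 images, the labels, and the choice to store the label in the first column are all assumptions, not the project's actual data or file format.

```python
import numpy as np

# Hypothetical stand-in for the real data: 5 random 16x16 grayscale images
# with made-up digit labels. Pixel intensities lie in [0, 1).
rng = np.random.default_rng(seed=0)
images = rng.random((5, 16, 16))          # (n_images, height, width)
labels = np.array([3, 0, 7, 7, 1])        # one digit (0-9) per image

# Flatten every image into a single row of 16 * 16 = 256 pixel features and
# store the label in the first column (an assumed layout).
table = np.column_stack([labels, images.reshape(len(images), -1)])
print(table.shape)  # (5, 257)

# Separate the target (first column) from the features (remaining columns).
y = table[:, 0].astype(int)
X = table[:, 1:]
print(X.shape, y)   # (5, 256) [3 0 7 7 1]
```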
The project followed a standard machine learning workflow:
● Data Preparation
○ Loaded and structured the digit dataset
○ Separated the features (pixel intensities) from the target labels
○ Split the data into training and testing sets
● Model Development
○ Implemented three classification algorithms:
■ k-Nearest Neighbors (kNN)
■ Logistic Regression
■ Random Forest
○ Tuned hyperparameters to improve performance
● Model Evaluation
○ Measured accuracy on the test set
○ Compared error rates across models
○ Analyzed confusion matrices and misclassifications
● Result Interpretation
○ Determined which algorithm performed best
○ Evaluated trade-offs between speed and accuracy
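The workflow can be sketched with scikit-learn using the three models reported in the results. This is a minimal sketch, not the project's original code. It assumes scikit-learn is installed and substitutes scikit-learn's bundled 8x8 `load_digits` set for the USPS data. The 25% test split, the candidate values of k, and the model settings are illustrative choices.

```python
import time

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: scikit-learn's bundled 8x8 digits set, NOT the USPS images.
X, y = load_digits(return_X_y=True)  # X: (1797, 64) pixel features, y: digits 0-9

# Hold out 25% for testing; stratify so each digit keeps its proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Tune the number of neighbors with 5-fold cross-validation on the training set.
knn_search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7]}, cv=5)

models = {
    "kNN": knn_search,
    # Standardizing the pixels helps the logistic regression solver converge.
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

accuracies, seconds = {}, {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # Wall time for fit + predict; for kNN this includes the grid search.
    seconds[name] = time.perf_counter() - start
    accuracies[name] = accuracy_score(y_test, predictions)
    print(f"{name}: accuracy {accuracies[name]:.2%}, "
          f"error rate {1 - accuracies[name]:.2%}, {seconds[name]:.2f}s")

print("Best k for kNN:", knn_search.best_params_["n_neighbors"])
```

Because the kNN timing includes cross-validated tuning, the speed column is only a rough indication of the speed/accuracy trade-off.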
This project implemented three machine learning models to classify handwritten digits from the USPS dataset. After evaluating overall accuracy, confusion matrices, and per-digit performance, the findings indicate:
● Best Overall Model: Random Forest (Accuracy: 94.22%)
○ Random Forest provided the strongest and most consistent performance. As an ensemble of decision trees, it handled nonlinear patterns and variability in handwriting effectively.
● kNN also performed competitively (Accuracy: 92.83%), though it proved more sensitive to noisy or ambiguous handwriting.
● Logistic Regression delivered consistent, interpretable results, though its accuracy was slightly lower than that of the nonlinear models.
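The per-digit analysis mentioned above (overall accuracy, per-digit accuracy, and the most frequent misclassification) can be computed directly from a confusion matrix. A minimal NumPy sketch follows. The helper functions are hypothetical names. The convention of rows as true digits and columns as predicted digits is a choice that matches scikit-learn's `confusion_matrix`. The labels and predictions are toy values, not the project's results.

```python
import numpy as np


def build_confusion_matrix(y_true, y_pred, n_classes=10):
    """Count predictions: rows are true digits, columns are predicted digits."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # handles repeated (true, pred) pairs
    return cm


def per_digit_accuracy(cm):
    """Share of each true digit classified correctly; NaN if a digit is absent."""
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals,
                     out=np.full(len(cm), np.nan), where=totals > 0)


def most_confused_pair(cm):
    """Return (true digit, predicted digit, count) for the largest error cell."""
    errors = cm.copy()
    np.fill_diagonal(errors, 0)  # ignore correct predictions
    true_digit, pred_digit = np.unravel_index(np.argmax(errors), errors.shape)
    return int(true_digit), int(pred_digit), int(errors[true_digit, pred_digit])


# Made-up toy labels and predictions, for illustration only.
y_true = np.array([4, 4, 4, 4, 9, 9, 9, 1, 1, 1])
y_pred = np.array([4, 9, 9, 4, 9, 9, 4, 1, 1, 7])

cm = build_confusion_matrix(y_true, y_pred)
acc = per_digit_accuracy(cm)
print(f"overall accuracy: {np.trace(cm) / cm.sum():.0%}")  # 60%
for digit in np.flatnonzero(cm.sum(axis=1)):  # digits present in y_true
    print(f"digit {digit}: {acc[digit]:.0%}")  # 1: 67%, 4: 50%, 9: 67%
print("most confused (true, predicted, count):", most_confused_pair(cm))  # (4, 9, 2)
```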