Artificial Intelligence

Machine Learning Classification of Handwritten Digits

Project Overview

This project focused on building and evaluating machine learning models to classify handwritten digits based on pixel-level image data. Using a labeled dataset of numeric character images, I implemented and compared multiple supervised learning algorithms to determine which method produced the most accurate and reliable predictions.

The project demonstrates my ability to apply core machine learning concepts, prepare structured data for modeling, and evaluate algorithm performance using real quantitative metrics.

KNN Confusion Matrix

Logistic Regression Confusion Matrix

Random Forest Confusion Matrix
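The three confusion matrices above cross-tabulate true digits against predicted digits. Below is a minimal pure-Python sketch of how such a matrix, the per-digit recall derived from it, and the most frequent misclassifications can be computed. It assumes rows are true digits and columns are predicted digits, which is the convention scikit-learn's `confusion_matrix` uses. The function names and sample labels are hypothetical and are not taken from the project.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, n_classes=10):
    """Count (true, predicted) pairs: rows are true digits, columns are predicted digits."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in range(n_classes)] for t in range(n_classes)]


def per_digit_recall(matrix):
    """Share of each true digit classified correctly (diagonal cell / row total).

    Digits with no test samples get None instead of dividing by zero."""
    return [row[d] / sum(row) if sum(row) else None for d, row in enumerate(matrix)]


def most_confused_pairs(matrix, top=3):
    """Largest off-diagonal cells as (true digit, predicted digit, count), most frequent first."""
    errors = [
        (t, p, count)
        for t, row in enumerate(matrix)
        for p, count in enumerate(row)
        if t != p and count > 0
    ]
    # sorted() is stable even with reverse=True, so ties keep their (true, predicted) order.
    return sorted(errors, key=lambda e: e[2], reverse=True)[:top]


# Made-up labels, used only to exercise the functions.
y_true = [4, 4, 9, 9, 7, 1]
y_pred = [4, 9, 9, 9, 1, 1]
cm = confusion_matrix(y_true, y_pred)
print(per_digit_recall(cm)[4])  # 0.5: one of the two 4s was read as a 9
print(most_confused_pairs(cm))  # [(4, 9, 1), (7, 1, 1)]
```

In practice, `sklearn.metrics.confusion_matrix` and `classification_report` produce the same counts and per-class recall directly from the test labels and predictions.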

Data and Methodology

The dataset used in this project (the USPS handwritten digit dataset) contained thousands of digit images, each represented as a numerical feature vector. Each row corresponded to one image, with its pixel intensity values as the input features and the actual digit (0–9) as the target label.
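A minimal sketch of separating features from the target and holding out a test set. The code assumes each row is a list whose first column holds the digit label. That layout is hypothetical, because the page does not say where the label is stored. The 20% test fraction and fixed seed are also illustrative choices, not the project's settings.

```python
import random


def split_features_target(rows, label_col=0):
    """Separate each row into pixel features and a digit label.

    Assumes (hypothetically) that the label is stored in column `label_col`
    and every other column is a pixel intensity."""
    X = [[float(v) for i, v in enumerate(row) if i != label_col] for row in rows]
    y = [int(float(row[label_col])) for row in rows]
    return X, y


def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices with a fixed seed and hold out `test_fraction` of them for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


# Toy rows: the label d followed by four fake pixel values.
rows = [[d] + [d / 10] * 4 for d in range(10)]
X, y = split_features_target(rows)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))  # 8 2
```

scikit-learn's `train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)` does the same job. Its `stratify` option also keeps the proportion of each digit equal in the training and test sets, which this plain shuffle does not guarantee.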

The project followed a standard machine learning workflow:

Data Preparation
- Loaded and structured the digit dataset
- Separated features and target variables
- Split data into training and testing sets
Model Development
- Implemented multiple classification algorithms, including:
  - k-Nearest Neighbors (kNN)
  - Logistic Regression
  - Random Forest
- Tuned model hyperparameters to improve performance
Model Evaluation
- Measured accuracy on test data
- Compared error rates across models
- Analyzed confusion matrices and misclassifications
Result Interpretation
- Determined which algorithm performed best
- Evaluated trade-offs between speed and accuracy
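Of the three models, k-Nearest Neighbors is the simplest to write out by hand. The sketch below shows a plain-Python kNN classifier that uses Euclidean distance and majority voting. It also includes a small validation loop that picks k, which covers the development, tuning, and accuracy-measurement steps listed above. This is not the project's actual code. The distance metric, the tie-breaking rule, the candidate k values, and the toy data are all assumptions.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two pixel vectors (sqrt skipped: it doesn't change the ranking)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(X_train, y_train, x, k=3):
    """Predict a digit by majority vote among the k training images closest to x."""
    nearest = sorted(range(len(X_train)), key=lambda i: squared_distance(X_train[i], x))[:k]
    votes = Counter(y_train[i] for i in nearest)
    top = max(votes.values())
    # Tie-break (an assumption): among tied labels, return the one with the closest neighbor.
    for i in nearest:
        if votes[y_train[i]] == top:
            return y_train[i]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def choose_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5)):
    """Hyperparameter tuning: score each candidate k on validation data and keep the best."""
    scores = {
        k: accuracy(y_val, [knn_predict(X_train, y_train, x, k) for x in X_val])
        for k in candidates
    }
    best_k = max(scores, key=scores.get)  # ties resolve to the earliest (smallest) candidate
    return best_k, scores


# Toy two-class data standing in for pixel vectors.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
best_k, scores = choose_k(X_train, y_train, [[0.5, 0.5], [5.2, 5.2]], [0, 1])
print(best_k, knn_predict(X_train, y_train, [0.2, 0.3], k=best_k))  # 1 0
```

The brute-force neighbor search compares every test image against every training image. This is why kNN prediction is slow on large datasets, and it is one side of the speed/accuracy trade-off. Logistic Regression and Random Forest are normally taken from a library rather than written by hand. scikit-learn provides `LogisticRegression` and `RandomForestClassifier` with the same `fit`/`predict` interface as its `KNeighborsClassifier`.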

Findings and Conclusions

This project successfully implemented three machine learning models to classify handwritten digits using the USPS dataset. After thoroughly evaluating accuracy, confusion matrices, and individual digit performance, the findings indicate:

● Best Overall Model: Random Forest (Accuracy: 94.22%)

○ Random Forest provided the strongest and most consistent performance. As an ensemble of decision trees, it captured nonlinear patterns and handled variation in handwriting effectively.

● KNN also performed competitively, achieving 92.83% accuracy, though it proved more sensitive to noisy or ambiguous handwriting.

● As a linear classifier, Logistic Regression delivered predictable and interpretable results, though its accuracy was slightly lower than that of the nonlinear models.
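Error rate is 1 minus accuracy, so the reported accuracies convert directly into error rates. Only Random Forest and KNN are included here because no exact accuracy is given for Logistic Regression:

```latex
\begin{aligned}
\text{error rate} &= 1 - \text{accuracy} \\
\text{Random Forest:}\quad 1 - 0.9422 &= 0.0578 \;(5.78\%) \\
\text{KNN:}\quad 1 - 0.9283 &= 0.0717 \;(7.17\%) \\
\text{relative error reduction} &= \frac{0.0717 - 0.0578}{0.0717} \approx 0.194
\end{aligned}
```

Assuming both models were scored on the same test set, Random Forest made about 19% fewer errors than KNN, even though the two accuracies differ by only 1.39 percentage points.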
