KNN Classifier in C++
Project Description
This C++ project is a high-performance implementation of the K-Nearest Neighbors (KNN) algorithm, built from scratch with a focus on modularity, extensibility, and computational efficiency. The pipeline supports both brute-force and KD-Tree–based approaches, includes a custom vector class, parallel processing with OpenMP, and tools for benchmarking on real-world datasets such as Fashion MNIST.
Features
- Custom Vector Class: Abstraction over std::vector<double> with support for mathematical operations (dot product, L2 norm, Euclidean distance) and safety checks.
- Brute-Force KNN: Implements exact neighbor search with optional OpenMP parallelization and efficient voting logic.
- Data Utilities: CSV loading, synthetic data generation, and deterministic train-test splitting for reproducibility.
- Evaluation Module: Accuracy calculation with input validation, plus runtime benchmarking using std::chrono.
- KD-Tree Accelerator: Spatial index for exact neighbor search, built recursively, with heap-based pruning during queries.
- KD-Tree KNN: Fast tree-based classification with deterministic results, OpenMP support, and clean modular design.
- CLI Interface: Built with cxxopts to toggle parameters (algorithm mode, number of neighbors k, parallelism, dataset path) and print benchmark metrics.
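A minimal sketch of what such a vector wrapper might look like. The class name `Vector` and its member names are hypothetical, and throwing `std::invalid_argument` on a dimension mismatch is an assumed form of the "safety checks" listed above. A squared-distance method is included because ranking neighbors does not need the square root.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical thin wrapper over std::vector<double>; the project's real
// class and method names may differ.
class Vector {
public:
    Vector(std::initializer_list<double> values) : data_(values) {}
    explicit Vector(std::vector<double> values) : data_(std::move(values)) {}

    std::size_t size() const { return data_.size(); }
    double operator[](std::size_t i) const { return data_.at(i); }  // bounds-checked access

    // Dot product; throws if the two vectors have different dimensions.
    double dot(const Vector& other) const {
        check_same_size(other);
        double sum = 0.0;
        for (std::size_t i = 0; i < data_.size(); ++i) sum += data_[i] * other.data_[i];
        return sum;
    }

    // L2 norm: sqrt(v . v).
    double norm() const { return std::sqrt(dot(*this)); }

    // Squared Euclidean distance: enough to rank neighbors, and avoids sqrt.
    double squared_distance(const Vector& other) const {
        check_same_size(other);
        double sum = 0.0;
        for (std::size_t i = 0; i < data_.size(); ++i) {
            const double d = data_[i] - other.data_[i];
            sum += d * d;
        }
        return sum;
    }

    // Euclidean distance.
    double distance(const Vector& other) const { return std::sqrt(squared_distance(other)); }

private:
    void check_same_size(const Vector& other) const {
        if (data_.size() != other.data_.size())
            throw std::invalid_argument("Vector dimension mismatch");
    }

    std::vector<double> data_;
};
```

For example, `Vector{1, 2, 3}.dot(Vector{4, 5, 6})` evaluates to 32, while mixing a 3-D and a 2-D vector throws instead of reading past the end of the shorter one.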
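Deterministic train-test splitting usually comes down to shuffling row indices with a fixed-seed generator. The sketch below assumes a hypothetical `train_test_split` helper and `Split` struct built on `std::mt19937` and `std::shuffle`; the default seed of 42 is arbitrary. The exact permutation `std::shuffle` produces is not pinned down by the standard, so the split is reproducible for a given standard library rather than across all toolchains.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical result type: row indices assigned to each side of the split.
struct Split {
    std::vector<std::size_t> train;
    std::vector<std::size_t> test;
};

// Shuffle row indices with a fixed-seed Mersenne Twister, then take the first
// `test_fraction` of them as the test set. Same seed -> same split.
Split train_test_split(std::size_t n_samples, double test_fraction, unsigned seed = 42) {
    if (test_fraction < 0.0 || test_fraction > 1.0)
        throw std::invalid_argument("test_fraction must lie in [0, 1]");
    std::vector<std::size_t> idx(n_samples);
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, ..., n-1
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    const auto n_test =
        static_cast<std::ptrdiff_t>(test_fraction * static_cast<double>(n_samples));
    Split s;
    s.test.assign(idx.begin(), idx.begin() + n_test);
    s.train.assign(idx.begin() + n_test, idx.end());
    return s;
}
```

Splitting indices rather than copying rows keeps the helper independent of how features and labels are stored; both can be gathered with the same index lists afterwards.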
Benchmarks
- Brute-force (OpenMP): ~6.5 s (84.04% accuracy) vs. ~362 s serial.
- KD-Tree (OpenMP): ~7.5 s vs. ~19 s serial (same 84.04% accuracy).
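A serial vs. OpenMP comparison like the one above can be measured with a loop of the following shape. This is a hedged sketch: `predict_all` and its parameters are hypothetical, the per-sample `predict` callable is assumed to be safe to call from several threads at once, and `schedule(dynamic)` is an illustrative choice rather than a setting taken from the project. With g++, build with `-fopenmp`; without it the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: applies `predict` to every test row, in parallel when
// `parallel` is true, and stores the elapsed wall-clock time in `seconds`.
// Each iteration writes only its own output slot, so no locking is needed.
template <typename Predict>
std::vector<int> predict_all(const std::vector<std::vector<double>>& X_test,
                             Predict predict, bool parallel, double& seconds) {
    (void)parallel;  // silences an unused-parameter warning when built without OpenMP
    std::vector<int> out(X_test.size());
    const auto start = std::chrono::steady_clock::now();
    const long n = static_cast<long>(X_test.size());
    // OpenMP's `if` clause falls back to a single thread when parallel == false.
    #pragma omp parallel for schedule(dynamic) if(parallel)
    for (long i = 0; i < n; ++i) {
        out[static_cast<std::size_t>(i)] = predict(X_test[static_cast<std::size_t>(i)]);
    }
    const auto stop = std::chrono::steady_clock::now();
    seconds = std::chrono::duration<double>(stop - start).count();
    return out;
}
```

`std::chrono::steady_clock` is used instead of `system_clock` because it is monotonic and therefore suited to measuring intervals.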
Technical Stack
- Language: C++17
- Libraries: OpenMP, STL, cxxopts
- Tools: g++, VS Code, Makefile
How It Works
- Data Loading: CSV files are read and preprocessed into features and labels.
- Training: The training set is either stored as-is for brute-force search or indexed in a KD-Tree.
- Prediction: Serial or parallel (OpenMP) prediction by majority vote among the k nearest neighbors.
- Evaluation: Accuracy is computed and runtimes are benchmarked for comparison.
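For the brute-force path, the prediction and evaluation steps can be sketched as follows. The function names `predict_one` and `accuracy`, integer class labels, and breaking vote ties toward the smallest label are all assumptions; the project's own voting logic may differ. `std::partial_sort` is used so that only the k closest candidates are ordered rather than the whole distance list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

using Sample = std::vector<double>;

// Hypothetical brute-force k-NN: measure the distance to every training sample,
// keep the k closest, and return the majority label (ties go to the smallest label).
int predict_one(const std::vector<Sample>& X_train, const std::vector<int>& y_train,
                const Sample& x, std::size_t k) {
    if (X_train.empty() || X_train.size() != y_train.size() || k == 0)
        throw std::invalid_argument("empty/mismatched training data or k == 0");
    std::vector<std::pair<double, int>> dist;  // (squared distance, label)
    dist.reserve(X_train.size());
    for (std::size_t i = 0; i < X_train.size(); ++i) {
        if (X_train[i].size() != x.size()) throw std::invalid_argument("dimension mismatch");
        double s = 0.0;
        for (std::size_t d = 0; d < x.size(); ++d) {
            const double diff = X_train[i][d] - x[d];
            s += diff * diff;
        }
        dist.emplace_back(s, y_train[i]);
    }
    k = std::min(k, dist.size());
    // Order only the k smallest distances.
    std::partial_sort(dist.begin(), dist.begin() + k, dist.end());
    std::map<int, std::size_t> votes;  // label -> count, iterated in ascending label order
    for (std::size_t i = 0; i < k; ++i) ++votes[dist[i].second];
    int best_label = votes.begin()->first;
    std::size_t best_count = 0;
    for (const auto& [label, count] : votes) {
        // Strict '>' keeps the smaller label when counts tie.
        if (count > best_count) { best_label = label; best_count = count; }
    }
    return best_label;
}

// Fraction of matching labels; rejects empty or misaligned inputs.
double accuracy(const std::vector<int>& y_true, const std::vector<int>& y_pred) {
    if (y_true.empty() || y_true.size() != y_pred.size())
        throw std::invalid_argument("label vectors must be non-empty and equal length");
    std::size_t correct = 0;
    for (std::size_t i = 0; i < y_true.size(); ++i) correct += (y_true[i] == y_pred[i]);
    return static_cast<double>(correct) / static_cast<double>(y_true.size());
}
```

Squared distances are compared throughout, since dropping the square root does not change which neighbors are nearest.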
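The KD-Tree path can be illustrated with a compact sketch: recursive construction that splits at the median of one coordinate per level, and a query that keeps the k best candidates in a bounded max-heap whose worst entry decides which branches to prune. The `KDTree` class, its `knn` method, and cycling the split axis with depth are assumptions for illustration, not the project's confirmed design. Because each level tests only one coordinate, pruning tends to weaken as dimensionality grows, which may help explain why the two OpenMP timings in the Benchmarks are close.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

using Point = std::vector<double>;

// Hypothetical KD-tree sketch: exact k-nearest-neighbor search over a fixed point set.
class KDTree {
public:
    explicit KDTree(std::vector<Point> points) : pts_(std::move(points)) {
        assert(pts_.empty() || !pts_[0].empty());  // need at least one dimension
        std::vector<std::size_t> idx(pts_.size());
        for (std::size_t i = 0; i < idx.size(); ++i) idx[i] = i;
        root_ = build(idx, 0, idx.size(), 0);
    }

    // Indices of the (up to) k nearest points to q, closest first.
    std::vector<std::size_t> knn(const Point& q, std::size_t k) const {
        MaxHeap heap;
        search(root_.get(), q, k, heap);
        std::vector<std::size_t> out(heap.size());
        for (std::size_t i = out.size(); i-- > 0; heap.pop()) out[i] = heap.top().second;
        return out;
    }

private:
    struct Node {
        std::size_t index = 0;  // point stored at this node
        std::size_t axis = 0;   // splitting dimension
        std::unique_ptr<Node> left, right;
    };
    // (squared distance, point index); top() is the worst of the current k best.
    using MaxHeap = std::priority_queue<std::pair<double, std::size_t>>;

    static double sq_dist(const Point& a, const Point& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) { const double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    std::unique_ptr<Node> build(std::vector<std::size_t>& idx, std::size_t lo,
                                std::size_t hi, std::size_t depth) {
        if (lo >= hi) return nullptr;
        const std::size_t axis = depth % pts_[0].size();  // cycle through dimensions
        const std::size_t mid = lo + (hi - lo) / 2;
        // Put the median along `axis` at mid: smaller-or-equal values left, larger-or-equal right.
        std::nth_element(idx.begin() + lo, idx.begin() + mid, idx.begin() + hi,
                         [&](std::size_t a, std::size_t b) { return pts_[a][axis] < pts_[b][axis]; });
        auto node = std::make_unique<Node>();
        node->index = idx[mid];
        node->axis = axis;
        node->left = build(idx, lo, mid, depth + 1);
        node->right = build(idx, mid + 1, hi, depth + 1);
        return node;
    }

    void search(const Node* node, const Point& q, std::size_t k, MaxHeap& heap) const {
        if (node == nullptr || k == 0) return;
        const Point& p = pts_[node->index];
        const double d = sq_dist(p, q);
        if (heap.size() < k) {
            heap.emplace(d, node->index);
        } else if (d < heap.top().first) {  // closer than the current k-th best: swap it in
            heap.pop();
            heap.emplace(d, node->index);
        }
        const double diff = q[node->axis] - p[node->axis];
        const Node* near_side = diff < 0 ? node->left.get() : node->right.get();
        const Node* far_side = diff < 0 ? node->right.get() : node->left.get();
        search(near_side, q, k, heap);
        // Prune: the far side can only help if the splitting plane is closer than the k-th best.
        if (heap.size() < k || diff * diff < heap.top().first) search(far_side, q, k, heap);
    }

    std::vector<Point> pts_;
    std::unique_ptr<Node> root_;
};
```

The query returns the same neighbors (up to ties in distance) as a brute-force scan; the tree only changes how many distances have to be computed.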
Usage
- Suitable for educational purposes and benchmarking ML algorithms in C++.
- Modular components enable future integration with more complex datasets or dimensionality reduction techniques.
GitHub Link
Explore the codebase and contribute on GitHub:
KNN C++ Repository