KNN Classifier in C++

Project Description

This C++ project is a high-performance implementation of the K-Nearest Neighbors (KNN) algorithm, built from scratch with a focus on modularity, extensibility, and computational efficiency. The pipeline supports both brute-force and KD-Tree–based approaches, includes a custom vector class, parallel processing with OpenMP, and tools for benchmarking on real-world datasets such as Fashion MNIST.

Features

  • Custom Vector Class: Abstraction over std::vector<double> with support for mathematical operations (dot product, L2 norm, Euclidean distance) and safety checks.
  • Brute-Force KNN: Implements exact neighbor search with optional OpenMP parallelization and efficient voting logic.
  • Data Utilities: CSV loading, synthetic data generation, and deterministic train-test splitting for reproducibility.
  • Evaluation Module: Accuracy calculation with input validation, plus runtime benchmarking using std::chrono.
  • KD-Tree Accelerator: Optimized spatial index for high-dimensional neighbor search with heap-based pruning and recursive construction.
  • KD-Tree KNN: Fast tree-based classification with deterministic results, OpenMP support, and clean modular design.
  • Command-Line Interface: Built with cxxopts to configure parameters (algorithm mode, number of neighbors, parallelism, dataset path) and print benchmark metrics.
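
The custom vector class described above might look roughly like the following minimal sketch. The class name `Vector`, its method names, and the choice to throw `std::invalid_argument` on a dimension mismatch are assumptions for illustration, not the repository's actual interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical Vector wrapper: stores doubles in a std::vector and checks
// dimensions before every binary operation.
class Vector {
public:
    Vector() = default;
    explicit Vector(std::vector<double> data) : data_(std::move(data)) {}
    Vector(std::initializer_list<double> init) : data_(init) {}

    std::size_t size() const { return data_.size(); }

    // Bounds-checked element access (std::vector::at throws std::out_of_range).
    double operator[](std::size_t i) const { return data_.at(i); }

    // Dot product: sum_i x_i * y_i.
    double dot(const Vector& other) const {
        requireSameSize(other);
        double sum = 0.0;
        for (std::size_t i = 0; i < data_.size(); ++i) sum += data_[i] * other.data_[i];
        return sum;
    }

    // L2 norm: sqrt(x . x).
    double norm() const { return std::sqrt(dot(*this)); }

    // Squared Euclidean distance: sum_i (x_i - y_i)^2.
    double squaredDistance(const Vector& other) const {
        requireSameSize(other);
        double sum = 0.0;
        for (std::size_t i = 0; i < data_.size(); ++i) {
            const double d = data_[i] - other.data_[i];
            sum += d * d;
        }
        return sum;
    }

    // Euclidean distance: square root of the squared distance.
    double distance(const Vector& other) const { return std::sqrt(squaredDistance(other)); }

private:
    void requireSameSize(const Vector& other) const {
        if (data_.size() != other.data_.size())
            throw std::invalid_argument("Vector: dimension mismatch");
    }

    std::vector<double> data_;
};
```

Exposing `squaredDistance` next to `distance` is a common choice for KNN: ranking candidates by squared distance gives the same neighbor order while skipping one square root per comparison.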
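
A brute-force search with OpenMP and majority voting could be sketched as below. The function names `validate`, `predictOne`, and `predictAll`, the use of `std::nth_element` to select the k closest points, and the smallest-label tie-break are hypothetical choices, not taken from the project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance: same neighbor ranking as the true distance, no sqrt.
double squaredDistance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// All checks run up front, because an exception must not escape an OpenMP parallel region.
void validate(const std::vector<Point>& X, const std::vector<int>& y,
              const std::vector<Point>& queries, std::size_t k) {
    if (X.empty() || X.size() != y.size() || k == 0)
        throw std::invalid_argument("need non-empty X, matching y, and k > 0");
    const std::size_t d = X.front().size();
    for (const auto& p : X)
        if (p.size() != d) throw std::invalid_argument("ragged training data");
    for (const auto& q : queries)
        if (q.size() != d) throw std::invalid_argument("query dimension mismatch");
}

// Majority vote over the k nearest training points (assumes validated input).
// Label ties are broken toward the smallest label, so results are deterministic.
int predictOne(const std::vector<Point>& X, const std::vector<int>& y,
               const Point& query, std::size_t k) {
    std::vector<std::pair<double, int>> dist;  // (squared distance, label)
    dist.reserve(X.size());
    for (std::size_t i = 0; i < X.size(); ++i)
        dist.emplace_back(squaredDistance(X[i], query), y[i]);

    k = std::min(k, dist.size());
    // Move the k smallest pairs to the front (unordered); O(n) on average.
    std::nth_element(dist.begin(), dist.begin() + static_cast<std::ptrdiff_t>(k), dist.end());

    std::map<int, std::size_t> votes;  // std::map iterates labels in ascending order
    for (std::size_t i = 0; i < k; ++i) ++votes[dist[i].second];

    int best = votes.begin()->first;
    std::size_t bestCount = 0;
    for (const auto& [label, count] : votes)
        if (count > bestCount) { best = label; bestCount = count; }  // strict '>' keeps the smaller label on ties
    return best;
}

std::vector<int> predictAll(const std::vector<Point>& X, const std::vector<int>& y,
                            const std::vector<Point>& queries, std::size_t k) {
    validate(X, y, queries, k);
    std::vector<int> out(queries.size());
    // Queries are independent and each iteration writes only out[i], so no locks
    // are needed. Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < static_cast<long>(queries.size()); ++i)
        out[i] = predictOne(X, y, queries[i], k);
    return out;
}
```

With g++, compiling with `-fopenmp` enables the parallel loop; the serial and parallel builds produce identical predictions because each query is handled independently.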
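
For the KD-Tree, a common design splits on axis `depth mod d` at the median and answers k-NN queries with a bounded max-heap, visiting the far subtree only when the splitting plane is closer than the current k-th best distance. The sketch below assumes that design; the `KDTree` class and its `knn` method are hypothetical and may differ from the project's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <stdexcept>
#include <utility>
#include <vector>

using Point = std::vector<double>;

class KDTree {
public:
    explicit KDTree(std::vector<Point> points) : pts_(std::move(points)) {
        if (pts_.empty() || pts_.front().empty()) throw std::invalid_argument("KDTree: no data");
        dims_ = pts_.front().size();
        for (const auto& p : pts_)
            if (p.size() != dims_) throw std::invalid_argument("KDTree: ragged input");
        std::vector<std::size_t> idx(pts_.size());
        for (std::size_t i = 0; i < idx.size(); ++i) idx[i] = i;
        root_ = build(idx, 0, idx.size(), 0);
    }

    // Indices of the k nearest points, nearest first.
    std::vector<std::size_t> knn(const Point& q, std::size_t k) const {
        if (q.size() != dims_ || k == 0) throw std::invalid_argument("KDTree: bad query");
        Heap heap;  // max-heap on squared distance: top() is the worst of the current best k
        search(root_.get(), q, k, heap);
        std::vector<std::size_t> out(heap.size());
        for (std::size_t i = out.size(); i > 0; --i) {  // heap pops the farthest first
            out[i - 1] = heap.top().second;
            heap.pop();
        }
        return out;
    }

private:
    struct Node {
        std::size_t point = 0;  // index into pts_
        std::size_t axis = 0;   // splitting dimension
        std::unique_ptr<Node> left, right;
    };
    using Heap = std::priority_queue<std::pair<double, std::size_t>>;

    // Recursive construction: the median along `axis` becomes the node.
    std::unique_ptr<Node> build(std::vector<std::size_t>& idx, std::size_t lo,
                                std::size_t hi, std::size_t depth) {
        if (lo >= hi) return nullptr;
        const std::size_t axis = depth % dims_;
        const std::size_t mid = lo + (hi - lo) / 2;
        auto first = idx.begin();
        std::nth_element(first + lo, first + mid, first + hi,
                         [&](std::size_t a, std::size_t b) { return pts_[a][axis] < pts_[b][axis]; });
        auto node = std::make_unique<Node>();
        node->point = idx[mid];
        node->axis = axis;
        node->left = build(idx, lo, mid, depth + 1);
        node->right = build(idx, mid + 1, hi, depth + 1);
        return node;
    }

    void search(const Node* node, const Point& q, std::size_t k, Heap& heap) const {
        if (!node) return;
        const Point& p = pts_[node->point];
        double d2 = 0.0;
        for (std::size_t i = 0; i < dims_; ++i) { const double t = p[i] - q[i]; d2 += t * t; }
        if (heap.size() < k) heap.emplace(d2, node->point);
        else if (d2 < heap.top().first) { heap.pop(); heap.emplace(d2, node->point); }

        const double diff = q[node->axis] - p[node->axis];
        const Node* nearSide = diff < 0 ? node->left.get() : node->right.get();
        const Node* farSide = diff < 0 ? node->right.get() : node->left.get();
        search(nearSide, q, k, heap);
        // Prune: the far side can only hold a closer point if the splitting
        // plane is nearer than the current k-th best squared distance.
        if (heap.size() < k || diff * diff < heap.top().first) search(farSide, q, k, heap);
    }

    std::vector<Point> pts_;
    std::size_t dims_ = 0;
    std::unique_ptr<Node> root_;
};
```

The returned indices can be mapped to training labels and passed through the same majority vote as the brute-force path. As a general property of KD-Trees, pruning tends to become less effective as the number of dimensions grows.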
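
The data-splitting and evaluation utilities could be approximated as follows: a seeded Fisher-Yates shuffle for a reproducible train-test split, an accuracy function that validates its inputs, and a `std::chrono` timer. The names `Split`, `trainTestSplit`, `accuracy`, and `timeSeconds`, and the default seed of 42, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <stdexcept>
#include <utility>
#include <vector>

struct Split {
    std::vector<std::size_t> train, test;  // row indices into the full dataset
};

// Hypothetical helper: seeded Fisher-Yates shuffle of row indices, then a cut.
Split trainTestSplit(std::size_t n, double testFraction, unsigned seed = 42) {
    if (testFraction < 0.0 || testFraction > 1.0)
        throw std::invalid_argument("testFraction must be in [0, 1]");
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    // Raw mt19937 output is fixed by the standard, so the permutation is the same
    // on every compiler (the modulo bias is negligible unless n nears 2^32).
    std::mt19937 rng(seed);
    for (std::size_t i = n; i > 1; --i) {
        const std::size_t j = rng() % i;  // j in [0, i - 1]
        std::swap(idx[i - 1], idx[j]);
    }
    const auto nTest = static_cast<std::size_t>(testFraction * static_cast<double>(n));
    Split s;
    s.test.assign(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(nTest));
    s.train.assign(idx.begin() + static_cast<std::ptrdiff_t>(nTest), idx.end());
    return s;
}

// Fraction of matching labels; rejects empty or mismatched inputs.
double accuracy(const std::vector<int>& predicted, const std::vector<int>& actual) {
    if (predicted.size() != actual.size() || actual.empty())
        throw std::invalid_argument("accuracy: sizes must match and be non-empty");
    std::size_t correct = 0;
    for (std::size_t i = 0; i < actual.size(); ++i) correct += (predicted[i] == actual[i]);
    return static_cast<double>(correct) / static_cast<double>(actual.size());
}

// Wall-clock seconds spent in f(); steady_clock is monotonic, so it suits benchmarks.
template <typename F>
double timeSeconds(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

The shuffle deliberately avoids `std::shuffle` and `std::uniform_int_distribution`, whose algorithms are implementation-defined, so the same seed yields the same split across standard libraries.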

Benchmarks

  • Brute-force: ~6.5 s with OpenMP vs ~362 s serial (84.04% accuracy).
  • KD-Tree: ~7.5 s with OpenMP vs ~19 s serial (same accuracy).
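
The speedups implied by the rounded timings above work out approximately as follows:

```latex
\begin{aligned}
\text{Brute-force, OpenMP vs serial:} \quad & \frac{362\ \text{s}}{6.5\ \text{s}} \approx 55.7\times \\
\text{KD-Tree, OpenMP vs serial:} \quad & \frac{19\ \text{s}}{7.5\ \text{s}} \approx 2.5\times \\
\text{Serial, KD-Tree vs brute-force:} \quad & \frac{362\ \text{s}}{19\ \text{s}} \approx 19.1\times
\end{aligned}
```

In these measurements the KD-Tree pays off mainly in serial runs; with OpenMP enabled, brute-force search finishes slightly faster (~6.5 s vs ~7.5 s).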

Technical Stack

  • Language: C++17
  • Libraries: OpenMP, STL, cxxopts
  • Tools: g++, VS Code, Makefile

How It Works

  1. Data Loading: CSV files are read and preprocessed into feature vectors and labels.
  2. Training: The training set is either stored as-is for brute-force search or organized into a KD-Tree index.
  3. Prediction: Each test point is classified by majority vote among its k nearest neighbors, either serially or in parallel with OpenMP.
  4. Evaluation: Accuracy is computed and runtimes are benchmarked to compare the approaches.
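
Step 1 could be sketched as a small CSV reader. It assumes a header row, the label in the first column, and numeric features in the remaining columns (the layout of the widely used Fashion-MNIST CSV export); the `Dataset` struct, the `loadCsv` function, and its optional `scale` factor (for example 1/255 to map 8-bit pixels into [0, 1]) are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Dataset {
    std::vector<std::vector<double>> features;
    std::vector<int> labels;
};

// Hypothetical loader: first column = integer label, remaining columns = features.
Dataset loadCsv(std::istream& in, bool hasHeader = true, double scale = 1.0) {
    Dataset data;
    std::string line;
    if (hasHeader) std::getline(in, line);  // skip the column names
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::stringstream row(line);
        std::string cell;
        std::getline(row, cell, ',');
        data.labels.push_back(std::stoi(cell));  // throws std::invalid_argument on bad input
        std::vector<double> x;
        while (std::getline(row, cell, ',')) x.push_back(std::stod(cell) * scale);
        // Every row must have the same number of features as the first one.
        if (!data.features.empty() && x.size() != data.features.front().size())
            throw std::runtime_error("loadCsv: inconsistent column count");
        data.features.push_back(std::move(x));
    }
    return data;
}
```

Reading from a `std::istream` instead of a file path keeps the loader easy to test; for a file on disk, pass a `std::ifstream`.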

Usage

  • Suitable for educational purposes and benchmarking ML algorithms in C++.
  • Modular components enable future integration with more complex datasets or dimensionality reduction techniques.

GitHub Link

Explore the codebase and contribute on GitHub:
KNN C++ Repository