3.1.3. k-Nearest Neighbour

The k-NN algorithm is a supervised learning algorithm that predicts the outcome of a test value (input data) based on its *k* nearest neighbors (i.e., the closest training points) and the distances computed between them [35]. The distances between the test value and the training points are computed, and the test value is assigned to the category that is most common among its *k* closest training points. The distance measure can be the Euclidean, Manhattan, or Minkowski distance [36].
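As a minimal sketch (not taken from the cited works), the following Python snippet illustrates the idea: a Minkowski distance, where *p* = 1 reduces to the Manhattan distance and *p* = 2 to the Euclidean distance, followed by a majority vote among the *k* nearest training points. The toy data points are hypothetical.

```python
import numpy as np
from collections import Counter

def minkowski(a, b, p=2.0):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def knn_predict(X_train, y_train, x_test, k=3, p=2.0):
    """Assign x_test the class most common among its k nearest training points."""
    dists = [minkowski(x, x_test, p) for x in X_train]
    nearest = np.argsort(dists)[:k]  # indices of the k closest training points
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Hypothetical toy data: two clusters labeled 0 and 1
X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 9.0]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.2, 1.5]), k=3))  # -> 0
```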

When using the k-NN algorithm, the data may first need to be standardized/normalized, since features with large variances would otherwise dominate the distance computation and skew the outcome. It is then essential to determine an optimal *k* value, which is often obtained through hyperparameter tuning: a range of *k* values is tested, and the value that minimizes the error rate of the model is selected.
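One common way to combine both steps is a scaled pipeline with a grid search over *k*, sketched below. This assumes scikit-learn and its bundled iris dataset, neither of which is prescribed by the text above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardize features so no large-variance feature dominates the distance
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Test a range of k values; keep the one with the lowest cross-validated error
search = GridSearchCV(pipeline, {"knn__n_neighbors": list(range(1, 31))}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```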

k-NN has remained viable in many application areas because of its simplicity and ease of application, its dependence on only two main settings, namely *k* and the distance metric, and the ease with which new data can be added, since the algorithm requires no explicit training phase. The hyperparameters typically fine-tuned for k-NN are the *k* value and the type of neighbor search algorithm.
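As an illustration, assuming scikit-learn's KNeighborsClassifier (not mentioned in the source), the neighbor search structure is exposed through the `algorithm` parameter; it affects query speed rather than the predictions themselves:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two clusters labeled 0 and 1
X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 9.0]])
y = np.array([0, 0, 1, 1])

# 'brute' scans every training point, while 'kd_tree' and 'ball_tree'
# build spatial indexes that speed up neighbor queries on larger datasets.
for algo in ("brute", "kd_tree", "ball_tree"):
    model = KNeighborsClassifier(n_neighbors=3, algorithm=algo).fit(X, y)
    print(algo, model.predict([[1.2, 1.5]]))  # same class regardless of algo
```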
