## 2.3. Classification Methodology

The proposed classification methodology adopts the active learning (AL) concept, which seeks to improve classification performance by selecting the most informative (i.e., most uncertain) pixels from the unlabeled data and adding them to the training set. However, unlike the conventional AL approach, in which analysts manually assign class labels to the most informative pixels, this new classification methodology is based on a self-learning concept that uses rule information derived from past land-cover maps (e.g., past CDLs of the study area). This approach assigns class labels to the most informative pixels in an automated manner. The whole procedure of the self-learning approach employed in this study is presented in Figure 2.

**Figure 2.** Flow chart of the classification procedures presented in this study.
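The core selection-and-relabeling step can be sketched as follows. This is an illustrative sketch only: the function name, the entropy-based uncertainty measure, and the `n_select` parameter are assumptions for exposition, and the paper's actual rules for extracting labels from past CDLs are richer than a direct pixel lookup.

```python
import numpy as np

def self_learning_step(probs, past_map_labels, train_X, train_y, pixels_X, n_select=100):
    """One self-learning iteration: pick the most uncertain unlabeled pixels
    and label them automatically from a past land-cover map (no analyst)."""
    # Uncertainty measured as Shannon entropy of the class-wise posteriors.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    informative = np.argsort(entropy)[-n_select:]   # highest-uncertainty pixels
    # Self-labeling: class labels come from the past map, not from an analyst.
    new_X = pixels_X[informative]
    new_y = past_map_labels[informative]
    return np.vstack([train_X, new_X]), np.concatenate([train_y, new_y])
```

Each iteration therefore grows the training set with automatically labeled high-uncertainty pixels before the classifier is retrained.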

## 2.3.1. Initial Classification

In the first processing step, an initial classification is carried out with a small amount of training data. For this process, a support vector machine (SVM), which has been widely applied in the supervised classification of remote sensing data [33,34], is used as the main classifier. The SVM finds an optimal hyperplane (i.e., decision boundary) that maximizes the margin between classes [35]. It has been reported that the SVM is superior to other conventional classifiers when only a small number of training samples and many features are available for classification [33,34].
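As a toy illustration of the maximum-margin idea (not the SVM implementation used in the study), a linear SVM can be trained on a small labeled set by subgradient descent on the regularized hinge loss; the data, learning rate, and regularization weight below are arbitrary:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimal linear SVM: minimize lam*||w||^2/2 + mean hinge loss.
    Labels y must be in {-1, +1}. Returns weights w and bias b."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                      # points violating the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Tiny, linearly separable training set (illustrative only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```

In practice a kernelized SVM solved by quadratic programming (as in standard libraries) would be used rather than this simplified linear sketch.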

The class-wise *a posteriori* probability from the SVM classification is used in the next step as an index that quantifies the uncertainty of the initial classification result. Since the SVM does not directly provide probability estimates, the *a posteriori* probabilities were computed using pairwise coupling [36]. Binary classifiers are first constructed for each possible pair of classes (i.e., one-versus-one), and the probability for each class is then estimated from their outputs by pairwise coupling [36]. Let *D* denote the feature vector and *rij* the binary classifier's estimate of *P*(*ωi*|*ωi* or *ωj*, *D*) for a pair of classes (*ωi*, *ωj*). Then, the class-wise probability *P*(*ωi*|*D*) for multi-class classification is estimated by solving the following system [36]:

$$P(\omega_i|D) = \sum_{j:\, j \neq i} \left( \frac{P(\omega_i|D) + P(\omega_j|D)}{M-1} \right) r_{ij}, \ \forall i, \text{ subject to } \sum_{i=1}^{M} P(\omega_i|D) = 1, \ P(\omega_i|D) \ge 0, \ \forall i,\tag{1}$$

where *M* is the total number of classes in the study area.
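Equation (1) can be solved numerically. One simple option, shown below as an illustrative sketch (not necessarily the solver used in [36]), is a normalized fixed-point iteration: when the pairwise estimates *rij* are consistent with some posterior vector, that vector is a fixed point of the update, and the normalization enforces the sum-to-one constraint.

```python
import numpy as np

def pairwise_coupling(r, iters=500):
    """Solve Eq. (1) for the class posteriors P(w_i | D) by fixed-point
    iteration. r[i, j] holds the pairwise estimate P(w_i | w_i or w_j, D)."""
    M = r.shape[0]
    p = np.full(M, 1.0 / M)                       # start from uniform posteriors
    for _ in range(iters):
        p_new = np.empty(M)
        for i in range(M):
            # Right-hand side of Eq. (1) for class i.
            p_new[i] = sum((p[i] + p[j]) * r[i, j]
                           for j in range(M) if j != i) / (M - 1)
        p = p_new / p_new.sum()                   # enforce sum-to-one constraint
    return p
```

For *M* = 2 the system reduces to *P*(*ω1*|*D*) = *r12*, as expected from a single binary classifier.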
