1. Introduction
The protection of new plant varieties is an important part of intellectual property protection and a key element of national intellectual property strategies [1]. Distinctness, uniformity, and stability (DUS) are the three essential criteria that a plant variety must meet to be eligible for Plant Breeder’s Rights (PBR) protection [2]. The International Union for the Protection of New Varieties of Plants (UPOV) is an intergovernmental organization based in Geneva, Switzerland, whose mission is to provide and promote an effective system of plant variety protection, with the aim of encouraging the development of new varieties of plants, for the benefit of society [3]. UPOV has developed a series of DUS guidance documents, including a general introduction to DUS, an associated series of documents specifying Test Guidelines procedures, and 338 crop-specific Test Guidelines. To enable varieties to be tested and a variety description to be established, the range of expression of each characteristic in the Test Guidelines is divided into a number of states for the purpose of description, and the wording of each state is attributed a numerical ‘Note’ [4].
There are three types of DUS characteristics: qualitative characteristics, quantitative characteristics and pseudo-qualitative characteristics [5]. Quantitative characteristics are those where the expression covers the full range of variation from one extreme to the other [6]. The expression can be recorded on a one-dimensional, continuous or discrete, linear scale. The range of expression is divided into a number of states for the purpose of description (e.g., length of stem: very short (1), short (3), medium (5), long (7), very long (9)). The division seeks to provide, as far as is practical, an even distribution across the scale. It is the intention that the states and notes in the Test Guidelines are useful for the assessment of distinctness.
Test guidelines are the basis for DUS testing. The main problems in developing DUS Test Guidelines include the selection of characteristics, the division of expression states, and the selection of example varieties [7]. For qualitative and pseudo-qualitative characteristics, the states are divided directly on the basis of observation results, while for quantitative characteristics, five scales (the “1–9”, “1–5”, “1–3”, “1–4” and “>9” scales) are recommended in TGP/7 [8]. A suitable scale should be selected according to the features of the species. However, only a few studies have examined how to divide the expression states of quantitative characteristics [9]. Traditional equidistant or empirical grading has certain limitations and often fails to accurately reflect the median and dispersion of characteristic variation, or the position of the values at each level within the overall variation [10]. Suitable grading criteria for quantitative characteristics are an important guide for distinctness evaluation. Setting too many grades leads to a high misjudgment rate. Conversely, too few grades may discourage applications, because varieties then need larger differences to be granted plant variety rights. A uniform, scientific grading method is lacking. The 2 × SD method [11], the 2 × LSD0.05 method [12] and the equal-interval method are often used to establish grading criteria. However, when SD or LSD0.05 is used, the multiplier must be at least 2, and determining the exact multiplier is difficult.
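The interval-based grading described above can be sketched in Python. This is a minimal illustration rather than the procedure of any cited method: it assumes that the medium state (note 5 on a “1–9” scale) is centred on the collection mean and that every state spans k × SD. The function name `grading_boundaries` and its parameters are hypothetical.

```python
def grading_boundaries(mean, sd, k=2.0, n_states=9):
    """Return the n_states - 1 cut points between consecutive states.

    Assumption (not from the source): the middle state is centred on
    `mean` and every state has the same width, k * sd.
    """
    interval = k * sd
    n_cuts = n_states - 1
    # Offsets are symmetric around the mean, e.g. -3.5 ... +3.5 intervals
    # for a nine-state scale.
    return [mean + interval * (j - (n_cuts - 1) / 2) for j in range(n_cuts)]


# Hypothetical collection: mean 10 cm, SD 1 cm, state width 2 * SD.
cuts = grading_boundaries(mean=10.0, sd=1.0, k=2.0)
print(cuts)
```

Raising `k` widens every state by the same amount, which is the single parameter the multiplier question comes down to in this simplified picture.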
Whether UPOV test guidelines are used directly or national guidelines are developed, the DUS examination is carried out under the guidance of test guidelines before a new variety is registered. Assessing the distinctness of a new variety looks easy, but in practice it is not. In field and laboratory trials, a new variety is grown alongside the most similar variety and compared for all the characteristics that describe the variety according to the test guidelines. The new variety must be clearly distinguishable by one or more essential characteristics from any other variety whose existence is a matter of common knowledge at the time when protection is applied for [13]. Although statistical procedures such as COYD are used to make the comparison scientific and valid [14], the note comparison method is more often applied in DUS testing. For quantitative characteristics, a difference of two notes often represents a clear difference. Varieties with the same note in the UPOV Test Guidelines for a given characteristic would not normally be considered to be clearly distinguishable with respect to that characteristic [15]. In test guidelines, measurement is recommended for many quantitative characteristics, which means using a ruler, weighing scales or a colorimeter, or recording dates, counts, etc. The measurement is then converted to a note according to the grading criteria [8]. Quantitative traits are under complex genetic control: the genes involved are numerous, usually have minor effects, and are very sensitive to the environment [16]. To increase the comparability of descriptions from different years and sites, the grading standards need to be adjusted according to the expression of example varieties in the same trial; establishing applicable criteria is therefore also a key task in DUS testing.
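Converting a measurement to a note, as described above, amounts to locating the value among the cut points of the grading criteria. The sketch below assumes hypothetical cut points for a “1–9” scale and treats each lower boundary as inclusive; neither choice comes from the test guidelines, and `to_note` is an illustrative name.

```python
from bisect import bisect_right


def to_note(value, cut_points):
    """Map a measurement to a 1-based note given ascending cut points."""
    # bisect_right counts how many cut points are <= value, so a value
    # exactly on a boundary falls into the higher state.
    return bisect_right(cut_points, value) + 1


# Hypothetical cut points (cm) for a nine-state characteristic.
CUTS = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0]

for length in (2.4, 9.0, 10.9, 11.1, 18.2):
    print(length, "->", to_note(length, CUTS))
```

In a real trial the cut points would be re-derived from the example varieties grown in the same trial, so the same code would be fed different boundaries each year or site.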
Anthurium is an attractive and commercially popular ornamental plant used as a cut flower, flowering potted plant and landscape ornamental. Among tropical foliage plants, the genus Anthurium excels in the ornamental market due to its rich diversity of shapes, beautiful leaves and durability [17]. The volume of Anthurium sales ranks second in the world after orchids [18]. Anthurium breeding is gaining importance because the crop holds a prominent place in the floral market. Both classical and biotechnological methods are used in breeding [18]. Crossbreeding, especially interspecific crossing, has contributed to a significant increase in the number of anthurium varieties. Because anthurium has high requirements for temperature, humidity, and light, protected cultivation is commonly used for commercial breeding and production. Anthurium was introduced to China in the 1970s; the industry has developed rapidly in recent years, and China has become a major producer and seller of anthurium [19]. In China, applications for new variety protection of anthurium are relatively numerous, exceeded only by those for Chrysanthemum and Phalaenopsis among ornamental plants; there were 366 applications up to August 2022. Although the “1–9” scale is specified in the national test guidelines, there is still much room for adjusting the intervals. Anthurium DUS testing is normally conducted for one growing period, so COYU and COYD are seldom used, and note comparison is the main method of distinctness assessment. Suitable grading criteria will help the tester to make an accurate determination of distinctness.
To explore the feasibility of using multiple comparison methods to establish grading criteria for quantitative traits, we analyzed the variability and distribution patterns of nine quantitative characteristics of 251 anthurium varieties and applied the multiple comparison methods to establish grading criteria for anthurium. This study was conducted to provide a new method to analyze the quantitative characteristics and set scientific grading criteria.
3. Conclusions and Discussion
The quantitative characteristics of anthurium observed in this study did not follow a normal distribution, except for spadix thickness at the middle and spathe size. The within-variety coefficient of variation ranged from 6.96% to 10.11%. The grading results showed that, for most characteristics, the standard deviation and LSD0.05 were similar, the exception being spathe size. Grading by the multiple comparison method was simpler, and the resulting criteria were more accurate, with a lower error rate.
It is generally accepted that, in the natural state, continuous or discontinuous variables of biological phenomena conform to a normal distribution [20]. Many statistical procedures, such as correlation, regression, t-tests, and ANOVA, namely parametric tests, assume normally distributed data [21]. However, the majority of the characteristics observed in this study did not conform to a normal distribution; this result is consistent with previous studies [22]. Previous research and our own statistical analysis showed significant positive correlations among most quantitative characteristics in Anthurium [23]. Selection can change not only the means of quantitative traits but also their distributions, including variance and skew [24]. Breeding preferences may therefore be the main reason why most quantitative characteristics of Anthurium did not conform to a normal distribution. Anthurium is cultivated primarily for its showy flowers and glossy leaves. The important horticultural features of the flower are its color, size, texture, shape and showiness of the spathe, spadix length, and peduncle length [25]. Breeders typically prefer varieties with long peduncles: in potted varieties, a longer peduncle is usually associated with higher ornamental value because the spathes stand above the leaves, while in cut flower varieties, a longer peduncle usually commands a higher market value. As anthurium evolved as an understory species in tropical forests [26], fewer leaves with larger leaf sizes may have been an adaptive feature. However, during the sympodial phase, one flower is produced from each leaf axil [27]; fewer leaves therefore also mean fewer flowers, which is a shortcoming for both pot flowers and cut flowers. To increase the number of flowers, varieties with shorter or narrower leaf blades are preferred.
A difference of two notes is appropriate if the comparison between two varieties is performed at the level of notes. If the difference is only one note, both varieties could be very close to the same borderline (e.g., the high end of note 6 and the low end of note 7), and the difference might not be clear. When measurement data are compared, a difference smaller than two notes might represent a clear difference. To ensure the accuracy of distinctness assessment by note, appropriate grading criteria must first be in place. The results of this study showed that the criteria obtained by the 2 × SD method would result in 27.22% incorrect determinations, compared with only 0.34% for the multiple comparison method. Moreover, 23.84% of the variety pairs that received the same note under the 2 × SD criteria, and would therefore be considered not clearly different, were distinguishable when a statistical method was used. In species with high breeding levels, such as rice and maize, breeding has generally relied on a narrow genetic base and only a small set of germplasm resources or landraces, despite the richness of the available genetic resources [28]. This breeding process has led to severe homogenization among varieties and very high similarity of morphological traits. In these species, varieties with the same note are often evaluated as distinct if a t-test or another statistical method shows a difference. This means that if the grading criteria are not appropriate, even varieties with the same note need to be analyzed statistically, which greatly increases the computational work, and the note obtained by measurement loses its function.
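The two kinds of disagreement discussed above (pairs two notes apart that are not statistically different, and same-note pairs that are) can be counted with a short script. The sketch below is only an illustration under stated assumptions: a single critical difference stands in for the statistical test, the rates are taken relative to each pair category, and the names `note_vs_statistic`, `means`, `notes` and `critical_diff` are hypothetical. The paper does not state how its percentages were computed.

```python
from itertools import combinations


def note_vs_statistic(means, notes, critical_diff):
    """Return (false_distinct_rate, missed_distinct_rate).

    false_distinct_rate: share of pairs >= 2 notes apart whose means
        differ by less than critical_diff.
    missed_distinct_rate: share of same-note pairs whose means differ
        by at least critical_diff.
    """
    far_pairs = same_pairs = false_distinct = missed = 0
    for a, b in combinations(means, 2):
        measured_distinct = abs(means[a] - means[b]) >= critical_diff
        note_gap = abs(notes[a] - notes[b])
        if note_gap >= 2:
            far_pairs += 1
            false_distinct += not measured_distinct
        elif note_gap == 0:
            same_pairs += 1
            missed += measured_distinct
    return (
        false_distinct / far_pairs if far_pairs else 0.0,
        missed / same_pairs if same_pairs else 0.0,
    )


# Hypothetical trait means and notes for four varieties.
means = {"A": 10.0, "B": 10.5, "C": 12.5, "D": 11.9}
notes = {"A": 5, "B": 5, "C": 7, "D": 5}
print(note_vs_statistic(means, notes, critical_diff=1.0))
```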
In research work, it is often necessary to determine whether differences exist among the means of three or more groups. A multiple comparison test (MCT) addresses this question by clarifying the differences between particular pairs of experimental groups. The earliest example of a multiple comparison procedure dates from 1929 [29]. In DUS testing, a candidate variety commonly needs to be compared with several similar varieties, and an MCT can help testers make a judgment. However, there is no report of using MCTs to group a large number of treatments. The variation of quantitative traits differs greatly among genera because of differences in environmental influence and breeding level [30], so 2 × SD or 2 × LSD0.05 is generally considered the minimum level of variation for establishing quantitative trait classes, and there has been no feasible method to determine the appropriate level of variation. In this study, we used multiple comparisons to classify anthurium varieties, and the results showed that 2.5 times the SD or LSD0.05 was a suitable interval for anthurium. With this interval, the conclusions that varieties with the same note are not different, and that varieties whose notes differ by two are significantly different, hold with high confidence. This will reduce the error of distinctness evaluation by note. Anthurium can be propagated by seed or division, but almost all cultivars are now propagated through tissue culture. Compared with other propagation methods, tissue culture produces plants with more consistent morphology; our research also showed that anthurium varieties have high uniformity. The lower intra-variety CV means a lower SD, which may be the reason that 2.5 × SD was suitable for grading. For other species, especially seed-propagated varieties, 2 × SD may be sufficient as a state interval for grading. The sample standard deviation measures the typical amount of variability in a sample [31]: it indicates, on average, how far each value lies from the sample mean. A high standard deviation means that values are generally far from the mean, while a low standard deviation indicates that values are clustered close to the mean. Because t0.05 changes little while the degrees of freedom are below 3000, the LSD0.05 value mainly depends on the mean square error (MSE) [12]. The MSE is the within-group sum of squared deviations divided by the within-group degrees of freedom. When the number of groups is large enough, LSD0.05 will be similar to SD. The results of this study are consistent with this expectation. When different varieties were randomly selected for comparison, there was no significant difference between LSD0.05 and SD once the number of varieties reached 150 or more, whereas there was a significant difference when the number was below 120. A comparison of the average SD for different numbers of varieties showed no difference in average SD between 20 and 200 varieties. The results for LSD0.05 were different: there was no significant difference between the LSD0.05 values produced by different numbers of varieties, but the difference between the maximum and minimum of 10 random selections at the same number of varieties increased as the number of varieties decreased (analysis results not published). Therefore, the SD method is recommended when the number of varieties is less than 150; when the number of varieties is large enough and the characteristics conform to a normal distribution, either the LSD or the SD method can be used.
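The relationship between MSE, the pooled SD and LSD0.05 can be made concrete with a small calculation. The sketch below is a minimal illustration assuming a balanced design (the same number of plants per variety) and a critical t value taken from a table by the user; the function names and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt


def mean_square_error(groups):
    """Within-group sum of squares divided by within-group degrees of freedom."""
    ss_within = 0.0
    df_within = 0
    for values in groups:
        group_mean = sum(values) / len(values)
        ss_within += sum((x - group_mean) ** 2 for x in values)
        df_within += len(values) - 1
    return ss_within / df_within


def least_significant_difference(mse, n_per_variety, t_crit):
    """Fisher's LSD for two means that are each based on n_per_variety values."""
    return t_crit * sqrt(2.0 * mse / n_per_variety)


# Toy data: two varieties, three plants each, so 4 error degrees of freedom.
plants = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
mse = mean_square_error(plants)
pooled_sd = sqrt(mse)
# 2.776 is the two-sided 5% Student t value for 4 degrees of freedom.
lsd = least_significant_difference(mse, n_per_variety=3, t_crit=2.776)
print(mse, pooled_sd, lsd)
```

Because the critical t value is nearly constant at large degrees of freedom, any instability in the estimated MSE passes directly into LSD0.05 in this formula.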
Owing to a variety of data and statistical considerations, several dozen MCTs have been developed over the decades, such as Fisher’s LSD, Tukey’s HSD, Bonferroni, Scheffé, Games–Howell and Newman–Keuls [32]. Among them, Fisher’s LSD, Tukey’s HSD and Bonferroni are the most frequently used pairwise comparison tests. Bonferroni is known to be very conservative, while Fisher’s LSD is sensitive. Although Fisher recommended using a more stringent alpha when performing his least significant difference (LSD) procedure, researchers have found the LSD procedure inadequate for controlling Type I error [33]. Tukey’s HSD is probably the most recommended and most used procedure for controlling the Type I error rate when making multiple pairwise comparisons [34]. Absent linear combinations of means, Tukey’s HSD is a robust and widely available test for a variety of situations. Because of the large number of varieties to be compared in this experiment and the need to perform pairwise comparisons, Tukey’s HSD method was chosen for multiple comparisons. Like the LSD, the HSD value is mainly determined by the MSE, and the MSE changes with the number of varieties, which affects the results of pairwise comparison. Observing more varieties helps stabilize the MSE. To obtain reliable results when using multiple comparisons for grading, the recommended number of varieties is not less than 50.
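For reference, the textbook form of Tukey’s critical difference for a balanced design can be written as follows. The symbols are assumptions introduced here rather than the paper’s notation: n is the number of observations per variety, k the number of varieties being compared, df_E the within-group (error) degrees of freedom, and q the critical value of the studentized range distribution.

```latex
\mathrm{HSD}_{0.05} = q_{0.05;\,k,\,df_E}\,\sqrt{\frac{\mathrm{MSE}}{n}}
```

The threshold scales with the square root of the MSE, which is why a stable MSE estimate matters, and q itself grows with k, so the same MSE yields a larger critical difference when more varieties are compared.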
Example varieties are provided in the test guidelines to clarify the states of expression of a characteristic [8]. There are many criteria for example varieties, such as availability, minimizing their number, and illustrating the range of expression within the variety collection. For quantitative characteristics that are observed by measurement, example varieties should be provided in the test guidelines. The main reason why example varieties are used in place of actual measurements is that measurements can be influenced by the environment. By comparison with example varieties, the same variety grown in different regions will obtain the same description despite different measurements. In DUS testing practice, each test variety is not compared directly with the example varieties; instead, its measurements are converted to notes according to grading criteria established by analyzing the measurements of the example varieties. Therefore, when selecting example varieties for measured quantitative characteristics, experts choose varieties that represent the average of the state concerned. If multiple comparison methods are used during testing, the tested varieties must be analyzed simultaneously. If a growing trial contains fewer than 50 varieties in total, this method cannot achieve good results.
Multiple comparisons can be performed with various software packages, such as SPSS and GraphPad Prism, but assigning the grouping letters requires further manual analysis or other software. If the amount of data to be compared is relatively large, such as the 251 varieties in this study, which form 31,375 variety pairs, adding the labels manually takes a long time. We wrote a multiple comparison program in Python; the script is only 6 kB, and the packaged program, which runs on Windows, is only 64.1 MB. It took only about 15 min to complete the comparison and labeling for one trait, which was easy and fast. The grading criteria can then be established simply by placing varieties without significant differences into the same class. The process requires no further adjustments or modifications that rely on experience. This means that, with sufficient resources, testers without extensive testing experience can accurately convert measurements to notes, which can greatly reduce the error rate of distinctness evaluation.
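The pair count and the idea of placing varieties without differences into the same class can be illustrated with a short sketch. This is not the program described above, whose details are not given here. It is a deliberately simplified, hypothetical greedy grouping that uses one critical difference (for example an HSD value) and opens a new class whenever a variety mean exceeds the first mean of the current class by at least that amount. Real letter displays from multiple comparisons can overlap, which this simplification ignores.

```python
from math import comb


def count_pairs(n_varieties):
    """Number of distinct variety pairs among n_varieties."""
    return comb(n_varieties, 2)


def group_by_critical_difference(means, critical_diff):
    """Greedily split varieties, sorted by mean, into non-overlapping classes."""
    classes = []  # each class is a list of (name, mean), ascending by mean
    for name, value in sorted(means.items(), key=lambda item: item[1]):
        if classes and value - classes[-1][0][1] < critical_diff:
            classes[-1].append((name, value))
        else:
            classes.append([(name, value)])
    return [[name for name, _ in members] for members in classes]


print(count_pairs(251))

# Hypothetical variety means for one trait and a critical difference of 1.0.
trait_means = {"A": 1.0, "B": 1.4, "C": 2.5, "D": 2.9, "E": 4.0}
print(group_by_critical_difference(trait_means, critical_diff=1.0))
```

Each resulting class could then be mapped to one state of the scale, with the class limits serving as candidate grading criteria.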
This study used anthurium as material to establish grading criteria by multiple comparisons; the results showed that the approach was feasible and simple. Other species, especially vegetatively propagated ones, can benefit from this method as long as the number of varieties is not less than 50. Whether it is suitable for seed-propagated species needs further verification.