Article

Hybrid-Margin Softmax for the Detection of Trademark Image Similarity

Chenyang Wang, Guangyuan Zheng and Hongtao Shan
1 School of Electrical and Electronic Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
2 College of Information Technology, Shanghai Jianqiao University, Shanghai 201306, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(7), 2865; https://doi.org/10.3390/app14072865
Submission received: 21 February 2024 / Revised: 24 March 2024 / Accepted: 27 March 2024 / Published: 28 March 2024
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

Abstract
The detection of image similarity is critical to trademark (TM) legal registration and court judgments in infringement cases. When deep learning is introduced into the task, however, annotating similar pairs and generalizing the model to rapidly growing data pose great challenges. Metric learning is naturally suited to the task, since it learns the similarity of inputs rather than a classification, but current methods are not tailored to the task and need to be upgraded. To address these issues, loss-driven model training is introduced, and a hybrid-margin softmax (HMS) is proposed based on the peculiarities of TM images. Two additive penalty margins are attached to the softmax to expand the decision boundary and develop greater tolerance for slight differences between similar TM images. With the HMS, a Siamese neural network (SNN) serving as the feature extractor is further penalized and its discrimination ability is improved. Experiments demonstrate that the detection model trained on HMS makes full use of a small amount of training data and discriminates well on a much larger amount of test data. Meanwhile, the model reaches high performance with a shallower SNN. Extensive experiments also indicate that the HMS-driven model, trained entirely on TM data, generalizes well to the face recognition (FR) task, which involves another type of image data.

1. Introduction

Trademarks (TMs) are distinctive designations registered to identify products and their sources. The exclusivity of a TM provides rules for orderly marketing [1]. However, the high incidence of TM misappropriation causes substantial revenue and reputation losses for legitimate owners. Consumers can be misled into purchasing counterfeit products, especially when a right-infringing TM image is similar to a legal one [2]. Meanwhile, the TM image database is massive and growing rapidly, which puts great pressure on the governing body.
What further complicates the situation is that there are no defined criteria for conducting the ‘likelihood of confusion’ test [3]. The test is a critical part of the procedure for determining whether a disputed trademark is similar to another one. Thus, there is a chance that inconsistent judgments are handed down by courts of different levels or districts.
Generally, appearance, characters, and sound are taken into consideration during the test [4]. The repetition rate of characters is easy to assess, and a TM can be pronounced differently across regions. By contrast, appearance, the most common and important form of a TM, is more consistent yet more controversial in judgment.
The feature extraction of TM images is crucial to the above issues. Conventional feature engineering involves manually designed descriptors to detect and match features, e.g., SIFT [5] and ORB [6]. SIFT is a local invariant feature descriptor based on keypoints and local image gradient directions. ORB is a fast binary descriptor based on the FAST keypoint detector and the binary BRIEF descriptor. These extraction methods focus on specific image features such as points and edges [7], which makes it expensive to detect TM image similarity comprehensively with a handful of manual descriptors. The great advance of deep learning, namely that features which might be overlooked by human beings can be extracted efficiently by convolutional kernels, makes introducing computer ‘opinions’ into the procedure of human judgment on TM image similarity a convincing prospect.
Building training data is a great challenge when deep learning is introduced into the detection of TM image similarity (TMISD). The performance of the detection model is highly correlated with supervised information, while annotating similar TM image pairs is labor-intensive work requiring skilled annotators. Furthermore, generalizing the model to millions of new TM designs imposes a higher requirement on training data preparation, which would have to cover an extensive range of TM images. Metric learning is naturally suitable for the problem of limited training shots [8]: a metric function of similarity can be learned to detect whether inputs are similar, instead of classifying the input samples.
Siamese neural networks (SNNs) are widely used in metric learning to extract features from pairs of input images [9,10], and metric functions, e.g., Euclidean distance, Manhattan distance, and cosine similarity, are used to compare the embedded feature vectors [11,12]. Usually, an SNN is trained with a contrastive loss that minimizes the distance between feature vectors of samples from the same class and maximizes the distance between samples from different classes; a hyper-parameter in the contrastive loss controls the distance threshold. A data-driven triplet network was proposed on the basis of an SNN with an additional CNN branch [13]. The triplet loss decreases the feature vector distance between the anchor sample and the positive sample while increasing the distance between the anchor sample and the negative sample. The discrimination ability of the triplet network is improved, but the training cost increases greatly owing to the combinatorial explosion in building triplets, i.e., the input data of the triplet network. In this way, the pressure on training data quantity is transferred to the cost of mining the existing annotated data with a more elaborate network architecture.
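For reference, the two loss functions discussed above can be sketched in PyTorch (the framework used later in this paper). This is a minimal illustration of the standard formulations; the margin value of 1.0 is a placeholder, not a setting from the cited works.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(f1: torch.Tensor, f2: torch.Tensor,
                     label: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Contrastive loss for an SNN: label = 1 for a similar pair, 0 for a
    dissimilar one; `margin` is the distance-threshold hyper-parameter."""
    d = F.pairwise_distance(f1, f2)
    return (label * d.pow(2) + (1 - label) * F.relu(margin - d).pow(2)).mean()

def triplet_loss(anchor: torch.Tensor, positive: torch.Tensor,
                 negative: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Triplet loss: decrease the anchor-positive distance while increasing
    the anchor-negative distance, up to the margin."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```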
Another research idea in metric learning is loss-driven training, as in recent works on face recognition (FR) [14,15]. Instead of building a large-scale dataset for training, these metric learning methods transform the softmax function to impose a margin penalty on the decision boundary, aiming to develop the discrimination ability of SNNs. The typical SphereFace [16], CosFace [17,18], and ArcFace [19] are all designed to reduce the intraclass space and expand the interclass space. Part of the motivation is that the performance of a data-driven model relies excessively on the quality of the information contained in the training data, and manual annotation is a major expenditure of human effort. Furthermore, an SNN trained on closed-set data performs poorly when generalizing to open-set data [19].
More specifically, for the TMISD task, Setchi proposed a TM similarity analysis system that conducts the ‘likelihood of confusion’ test with three models [20]. Global and local shape feature descriptors, i.e., the Zernike moment and an edge-gradient co-occurrence matrix, are used to extract TM image features, and Euclidean distance is used to compute similarity. On this basis, Trappey introduced SNNs into the feature extraction of TM images, using VGG16 to build an SNN [21]. Alshowaish built SNNs from pre-trained CNNs, including VGG16 and ResNet50 [22]. Most of these works focused on data-driven metric learning methods. However, building the training database remains a great challenge, both in annotation and in covering the rapid growth of new TM designs.
We choose to study the TMISD task from the perspective of loss-driven metric learning. As a brief introduction, the frequently used loss function in classification, i.e., the softmax loss, is expressed as follows:
$$L_1=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^{T}x_i+b_{y_i}}}{\sum_{j=1}^{n}e^{W_{j}^{T}x_i+b_j}}$$
where $N$ is the number of samples, $n$ is the number of classes, $W$ and $b$ are the weight and bias terms, $x_i$ is the embedded feature vector belonging to the $y_i$-th class, and $W_j$ is the $j$-th column of the weight $W$. Then, by fixing the weight $\left\|W_j\right\|=1$ and the feature $\left\|x_i\right\|=1$ through $l_2$ normalization, and by fixing the bias $b_j=0$, the decision boundary is transferred to the angular space.
The transformed softmax function is as follows:
$$L_2=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\cos\theta_{y_i}}}{e^{\cos\theta_{y_i}}+\sum_{j=1,\,j\neq y_i}^{n}e^{\cos\theta_j}}$$
where $\theta_j$ is the angle between the normalized weight $W_j$ and the feature vector. The prediction then depends only on the angle, and the decision boundary can be optimized by margin penalties.
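To make the transformation concrete, the following minimal PyTorch sketch (our illustration, not code from any cited work) computes the logits of $L_2$: with $l_2$-normalized weights and features and zero bias, the inner product $W_j^{T}x_i$ reduces to $\cos\theta_j$.

```python
import torch
import torch.nn.functional as F

def angular_logits(features: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Logits of the transformed softmax: after l2 normalization of the class
    weights W_j (rows of `weight`) and the features x_i, the inner product
    W_j^T x_i equals cos(theta_j)."""
    w = F.normalize(weight, dim=1)    # fix ||W_j|| = 1
    x = F.normalize(features, dim=1)  # fix ||x_i|| = 1
    return x @ w.t()                  # (batch, n_classes) matrix of cos(theta_j)

# The prediction then depends only on the angle:
# loss = F.cross_entropy(angular_logits(x, W), labels)
```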
The main contributions of this study are as follows:
(1)
We studied the TMISD task with prevalent metric learning methods, both data-driven and loss-driven. The performance of these methods was investigated from several evaluation aspects relevant to the TMISD task, including accuracy, F1 score, training cost, and generalization ability.
(2)
Based on the peculiarities of TM images, a hybrid-margin softmax (HMS) is proposed. Two additive margins are attached to the cosine term and the angular term of the softmax, respectively, to expand the decision boundary in the angular space. The magnitudes of the weight and feature vectors are preserved to retain as much input information as possible. The metric function used to calculate similarity is replaced by a classifier, i.e., a fully connected layer.
(3)
Experiments indicate that the detection model penalized by HMS can be trained on a small amount of annotated data and reaches high detection accuracy with fewer SNN layers. Furthermore, the HMS detection model, trained entirely on TM data, generalizes well to the face recognition (FR) task, which indicates that the model trained on HMS has great discrimination ability over input images.

2. Materials and Methods

2.1. Hybrid-Margin Softmax

The peculiarities of TM images are crucial to the TMISD task. We compared the FR task with the TMISD task to gain a better view of the latter:
(1)
The compositions of images in an FR task are constant. The principal parts of an input pair of samples are human faces, which come either from one person or from different ones. The features extracted from the input are generally fixed, such as the shapes of faces, eyes, and noses. In addition, there are external interfering factors to consider, including pose, illumination, age, image noise, etc.
(2)
The TMISD task aims at detecting the similarity of TM images. Generally, a TM design consists of one element or several. The elements of a disputed TM image are not identical to those of the legal one but are partly similar in contours, colors, and textures, as shown in Figure 1. It is common for two TM images in disputed cases to share both similar and different elements. It should be noted that new outlines can be formed by varying the placement of elements. Furthermore, the interfering factors mentioned for the FR task no longer need to be considered, since TM images are artificially designed in most cases.
To sum up, compared to the FR task, more margin penalties should be placed on the decision boundary to tolerate the wide variety of element design changes in pairs of similar TM images. Meanwhile, the detection model should extract more information from the input images to fully grasp the degree of similarity and avoid false alarms. Therefore, the detection model should be further penalized, and the learnable parameters should be preserved as much as possible.
Given the characteristics of the TMISD task, a hybrid-margin softmax (HMS) is proposed as follows:
$$L_3=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\left(\left\|W_{y_i}\right\|\left\|x_i\right\|\cos\left(\theta_{y_i}+d_1\right)-d_2\right)}}{e^{s\left(\left\|W_{y_i}\right\|\left\|x_i\right\|\cos\left(\theta_{y_i}+d_1\right)-d_2\right)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\left\|W_j\right\|\left\|x_i\right\|\cos\theta_j}}$$
where $s$ is a global scale factor, $W_{y_i}$ is the weight vector of the fully connected layer for the $y_i$-th class, and $x_i$ is the feature vector of the $i$-th sample extracted by the SNN. The weights and feature vectors are not normalized, and the biases are set to zero. Additive margins $d_1$ and $d_2$ are attached to the angle term and the cosine term, respectively.
The decision boundary of HMS loss is as follows:
$$\left\|W_1\right\|\cos\theta_1=\left\|W_2\right\|\cos\left(\theta_2+d_1\right)-d_2$$
The decision boundary can still be considered to lie in the angular space, with a varying amplitude of the cosine curve, as shown in Figure 2a.
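The HMS loss can be sketched in PyTorch as follows. This is our minimal illustration, built directly from the formula above as reconstructed (in particular, the placement of $d_2$ inside the scaled term follows that reconstruction); it is not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

class HMSLoss(torch.nn.Module):
    """Minimal sketch of the hybrid-margin softmax. Weight and feature
    magnitudes are deliberately kept (no normalization, zero bias); d1
    penalizes the angle term and d2 the cosine term of the target class."""

    def __init__(self, in_features: int, n_classes: int,
                 s: float = 90.0, d1: float = 0.003, d2: float = 0.006):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(n_classes, in_features))
        self.s, self.d1, self.d2 = s, d1, d2

    def forward(self, x: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        w_norm = self.weight.norm(dim=1)            # ||W_j||, preserved
        x_norm = x.norm(dim=1)                      # ||x_i||, preserved
        cos = F.normalize(x, dim=1) @ F.normalize(self.weight, dim=1).t()
        logits = w_norm.unsqueeze(0) * x_norm.unsqueeze(1) * cos
        # penalized target-class logit: ||W_yi|| ||x_i|| cos(theta_yi + d1) - d2
        cos_y = cos.gather(1, labels.unsqueeze(1)).clamp(-1 + 1e-7, 1 - 1e-7)
        theta_y = torch.acos(cos_y).squeeze(1)
        target = w_norm[labels] * x_norm * torch.cos(theta_y + self.d1) - self.d2
        logits = logits.scatter(1, labels.unsqueeze(1), target.unsqueeze(1))
        return F.cross_entropy(self.s * logits, labels)
```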

2.2. Interpretation of HMS

Several margins are attached to the inner-product form of the logit to tolerate unpredictable small changes between similar elements of TM images. The magnitudes of the extracted feature vectors and of the weight vectors, which are learnable parameters, are preserved to retain as much information from the input as possible. The model can benefit from this information when similar elements are contained in a pair of dissimilar TM images. Moreover, normalization changes the magnitude and direction of the logit, as shown in Figure 2b, and the SNN has to constantly adapt to these changes during training.
Consider a group of TM images in which an anchor sample, a similar sample, and a dissimilar sample are given. Suppose the class center of the anchor in the feature space is $W_2$, the class center of the dissimilar sample is $W_1$, and the feature vector of the similar sample is $x$, as shown in Figure 2c. The model makes the right prediction when the following holds:
$$\left\|W_1\right\|\cos\theta_1\leq\left\|W_2\right\|\cos\left(\theta_2+d_1\right)-d_2$$
where $\theta_i\ (i=1,2)$ denotes the angle between the feature $x$ and the class center $W_i$.
The value of the cosine term is decreased by the two margins $d_1$ and $d_2$, penalizing the model further to improve its discrimination ability. The magnitude of the weight vector can be scaled for better prediction during model training.
A toy experiment, shown in Figure 3, illustrates the distribution of features extracted by SNNs trained on different transformations of the softmax function. These features are sent to the classifier to produce a prediction, so the SNN's discrimination ability over TM images can be visualized. Red and blue spots are the visualized features of two input TM images: spots in the first row come from dissimilar TM images, and spots in the second row from similar TM images. In the first row, the first four feature distributions are loose and chaotic, which is not solid enough ground for the classifier to judge the images dissimilar. The last distribution in the first row, extracted by the SNN trained on HMS, is tightly oriented and separable at the same time. The distributions in the second row likewise indicate that features learned by the SNN with HMS are compact and adequate for making a judgment.

3. Results

We conducted two branches of experiments. The comparison of loss-driven methods covers detection models based on an SNN trained on different transformations of the softmax. The comparison of data-driven methods covers detection models based on the typical SNN (trained with contrastive loss), the triplet network, and the fine-tuning method.

3.1. Datasets

Two types of image data were involved in the experiments: TM images and human faces. The face data were used to train the feature extractor in the fine-tuning method and for testing in the loss-driven methods. The TM training data were compiled from real-world trademarks and annotated, consisting of 300 pairs of similar samples. The TM test data were collected from real court-disputed cases and cleaned manually, consisting of 1000 pairs of similar samples. Dissimilar TM image pairs were randomly selected and paired in numbers equal to the similar pairs. The public LFW dataset was used as the human face data, consisting of 3000 pairs of positive samples and 3000 pairs of negative samples [23,24].
Data preprocessing consisted of cropping the input images to a fixed size while preserving color. The TM training data were normalized with the corresponding dataset statistics; the TM test data and LFW data were normalized with mean $\bar{X}=0.5$ and variance $\sigma=0.5$.
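A plausible torchvision pipeline matching this description is sketched below; the 224 × 224 input size and the placeholder TM training statistics are our assumptions, not values reported above.

```python
from torchvision import transforms

# Placeholders: the real values are the statistics of the TM training set.
TM_MEAN, TM_STD = [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]

tm_train_tf = transforms.Compose([
    transforms.Resize((224, 224)),           # fixed-size shape, colors preserved
    transforms.ToTensor(),
    transforms.Normalize(TM_MEAN, TM_STD),   # corresponding dataset statistics
])

test_tf = transforms.Compose([               # TM test data and LFW data
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # mean 0.5, sigma 0.5
])
```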

3.2. Experimental Setup

The details of the detection process are given in this section. The detection based on the loss-driven methods includes feature extracting and classifying.
The process of feature extraction is as follows: the backbone of the feature extractor is an SNN consisting of two identical CNNs that share the same structure and weights, as shown in Figure 4. Images passed through the SNN are encoded as vectors in the same feature space. Several CNN structures were implemented, including a simple self-defined six-layer CNN and ResNet18, 34, 50, 101, and 152 [25]. The fully connected layer of each ResNet is removed, and the six-layer CNN has a structure similar to ResNet, including batch normalization (BN) and pooling layers, as shown in Figure 5, but contains no residual module.
The process of classification is as follows: the output vectors of the feature extractor are concatenated, activated by the transformation of the softmax, and then sent to the fully connected layer, i.e., the classifier with a sigmoid activation function. The output judgment of similarity, like the input label, is one-hot encoded.
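Putting the two stages together, the detection model can be sketched as follows. The `backbone` argument and `feat_dim` are assumptions (e.g., a ResNet18 with its final fully connected layer removed); the sketch only illustrates the weight sharing, concatenation, and sigmoid classifier described above.

```python
import torch
import torch.nn as nn

class TMDetector(nn.Module):
    """Sketch of the detection model: two weight-sharing CNN branches (the
    SNN), feature concatenation, and a fully connected classifier with a
    sigmoid activation producing the one-hot similar/dissimilar judgment."""

    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone                      # one module => identical structure and weights
        self.classifier = nn.Linear(2 * feat_dim, 2)  # the classifier replacing a metric function

    def forward(self, img1: torch.Tensor, img2: torch.Tensor) -> torch.Tensor:
        f1 = self.backbone(img1).flatten(1)           # both branches share the same weights
        f2 = self.backbone(img2).flatten(1)
        return torch.sigmoid(self.classifier(torch.cat([f1, f2], dim=1)))
```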
The detection processes of the data-driven methods are as follows:
(1)
Fine-tuning method: an SNN to be transferred is trained on the LFW training dataset, with an original ResNet as its backbone. When the SNN reaches 95% or higher accuracy on the LFW test dataset, the fully connected layer is removed and the remaining weights are frozen. The frozen SNN and a new fully connected layer compose the TM detection model (see the sketch after this list), which is then further trained on the TM training dataset and tested on the TM test dataset.
(2)
Triplet model: each input consists of two similar TM images, a dissimilar one, and the corresponding labels. Triplets are built from the TM training dataset by attaching a random TM image to each pair of similar samples, as sketched below.
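A minimal sketch of this triplet construction (our illustration; images are referenced by file path or ID, and the randomly attached image is simply assumed to be dissimilar):

```python
import random

def build_triplets(similar_pairs, all_images):
    """Build triplet inputs from the TM training data: attach a randomly
    chosen TM image (assumed dissimilar) to each pair of similar samples."""
    triplets = []
    for anchor, positive in similar_pairs:
        candidates = [im for im in all_images if im not in (anchor, positive)]
        triplets.append((anchor, positive, random.choice(candidates)))
    return triplets
```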
The six-layer CNN is excluded from the data-driven methods, since such a shallow CNN cannot meet the fitting demands of the triplet network model and the fine-tuning method.
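For step (1), the freeze-and-replace transfer setup can be sketched as follows; layer names follow torchvision's ResNet, and the pre-training to 95% LFW accuracy is assumed to have been done already.

```python
import torch.nn as nn
from torchvision import models

backbone = models.resnet18()        # one branch of the pre-trained SNN
backbone.fc = nn.Identity()         # remove the fully connected layer
for p in backbone.parameters():
    p.requires_grad = False         # freeze the transferred weights
new_head = nn.Linear(2 * 512, 2)    # new classifier trained on the TM data
```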
Other experiment setups are as follows:
The scale factor $s$ in the softmax was set to 90; the angular margin in SphereFace was 4; the additive margin of the cosine term in CosFace was 0.006; the additive margin of the angular term in ArcFace was 0.003; and the additive margins of the cosine and angular terms in HMS were 0.006 and 0.003, respectively. All experiments were conducted in the PyTorch framework, with CUDA used to accelerate training. The learning rate was set to 0.001 and the batch size to 16. The optimizer for the triplet network was Adam; the optimizer for the other methods was SGD with a momentum of 0.9.
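The optimizer settings above translate directly into PyTorch; the placeholder network below merely stands in for the detection model.

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 2)  # placeholder for the SNN detection model
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# the triplet network used Adam instead:
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
device = "cuda" if torch.cuda.is_available() else "cpu"  # CUDA acceleration
model.to(device)
BATCH_SIZE = 16
```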

3.3. Loss-Driven Method Experiments

To show that HMS enables the SNN to learn sufficiently separable features of TM images, we compared detection models based on different transformations of the softmax under the same experimental conditions. The accuracy and F1 score for detecting similar and dissimilar TM image pairs are shown in Table 1 and Table 2. We also tested the detection models on the LFW dataset while the SNN was still trained only on TM image data; the results are shown in Table 3 and Table 4.
For the TMISD task, SphereFace achieves up to 96.39% accuracy, greatly outperforming CosFace and ArcFace. However, as the depth of the SNN increases, the model overfits severely. The accuracy of HMS with normalized feature and weight vectors is slightly better than that of CosFace and ArcFace. HMS achieves the best performance, with up to 98.97% accuracy, an improvement of 2.58 percentage points over SphereFace. Notably, a simple six-layer SNN trained on HMS also works well on the TMISD task, with 97.45% accuracy and an F1 score of 0.9516.
For the FR task, the detection model trained on SphereFace with TM data is overfitted. The performance of HMS (normalized) is again better than that of CosFace and ArcFace. The model trained on HMS generalizes well on the LFW dataset, with up to 90.57% accuracy and an F1 score of 0.9002; even the simple six-layer SNN trained on HMS reaches 80.45% accuracy and an F1 score of 0.8173.
The performance of HMS with different SNN depths was also tested, as shown in Table 1, Table 2, Table 3 and Table 4. ResNet18 was adequate both for the TMISD task and for generalizing to the FR task, which indicates that an SNN penalized by HMS can learn sufficient, critical information with fewer network layers, reducing training expenses as a result.
The detection accuracy of the model (ResNet18) trained on HMS with different hyper-parameters is shown in Figure 6. The accuracy fluctuation caused by the scale factor $s$ is larger than that caused by the margins $d_1$ and $d_2$.

3.4. Data-Driven Method Experiments

The performance of detection models based on the typical SNN, the triplet network, and the fine-tuning method is shown in Table 5.
The triplet network achieved 92.27% accuracy with an F1 score of 0.9282 on the TMISD task, a significant improvement over the typical SNN. However, this performance gain came at greatly increased training cost in terms of memory and time. The performance of the fine-tuning method on the TMISD task was not satisfactory considering the training cost of the transferred knowledge; however, when a model for a similar task is readily available, fine-tuning makes a good choice, offering fine performance at minimal training cost.
For the detection models based on the triplet and fine-tuning methods, performance changes rapidly with the depth of the network. These data-driven methods obtain a large performance gain only when a CNN of suitable depth is employed for the task.

3.5. Discussion

Metric learning is an appropriate research direction for the TMISD task, since it reduces the amount of annotated data required for training. With the same number of similar TM pairs, the triplet network and fine-tuning data-driven methods greatly improve performance compared to a simple SNN model: the triplet model enhances discrimination ability with an additional input during training, and the fine-tuning method transfers learned information from other tasks, alleviating the pressure of data annotation.
Even so, the advantage of detection models based on the data-driven methods is not prominent compared to the typical loss-driven models, since their training is complicated and expensive. Among the loss-driven models, the performance gap between the SphereFace model and the other two, CosFace and ArcFace, is huge, but SphereFace's performance cannot be sustained when the SNN depth increases or when a new type of image is input for detection; the SphereFace model can be damaged by the diversity of TM images.
The HMS model outperformed the other methods in the following aspects: (1) the compactness of similar TM pairs is noticeably tightened; (2) the discrimination ability for another type of image, i.e., face data, is improved, which indicates that the model trained on HMS is robust; and (3) the training cost is reduced, since the requirements for annotated data and SNN depth are relaxed.
In general, introducing the loss-driven model training idea is meaningful for the TMISD task: the challenges of building training data and generalizing to new data are dealt with in a low-cost way.

4. Conclusions

The detection of TM image similarity (TMISD) is an essential procedure in court judgments on TM infringement cases and in TM legal registration, while building training data of similar TM pairs and generalizing models to fast-growing numbers of new TM designs are huge challenges for the task. To address these issues, similarity detection models based on loss-driven metric learning methods were studied. Compared to data-driven methods, including the triplet network model and the fine-tuning method, optimizing the softmax loss function yielded a larger performance gain with less data preparation and training cost.
A hybrid-margin softmax (HMS) is proposed based on the peculiarities of TM images. Additive margins are attached to the cosine and angular terms of the softmax in the angular space to tolerate the slight differences between the similar parts of similar TM image pairs. The weights of the classifier and the extracted feature vectors in the softmax are not normalized, in order to preserve the information of the input images as fully as possible.
The detection model trained on HMS is further penalized to improve its discrimination ability over TM images, and it can be trained on a small amount of TM training data. Experiments indicate that, compared to other transformations of the softmax, the model trained on HMS achieves the best performance on the TMISD task, with up to 98.97% accuracy and an F1 score of 0.9746. The model also achieves high performance with fewer SNN layers. Furthermore, the HMS-driven model, trained entirely on TM image data, generalized well to the FR task, with up to 90.57% accuracy and an F1 score of 0.9002.

Author Contributions

Methodology, C.W.; software, C.W.; validation, C.W. and H.S.; data curation, H.S.; writing—original draft preparation, C.W.; writing—review and editing, G.Z. and H.S.; visualization, C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62173222).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The TM dataset used in this research can be obtained from the following link: https://pan.baidu.com/s/11gIv9yj327xKCyq4v5TyeQ?pwd=tmid, accessed on 26 March 2024. If the link fails, you can contact the corresponding author for a new link.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Duch-Brown, N.; Martens, B.; Mueller-Langer, F. The Economics of Ownership, Access and Trade in Digital Data. SSRN J. 2017. [Google Scholar] [CrossRef]
  2. Johnson, S. Trademark Territoriality in Cyberspace: An Internet Framework for Common-Law Trademarks. Berkeley Technol. Law J. 2014, 29, 1253–1300. [Google Scholar]
  3. Simon, D.A. The Confusion Trap: Rethinking Parody in Trademark Law. Wash. Law Rev. 2013, 88, 1021. [Google Scholar]
  4. Besen, S.M.; Raskind, L.J. An Introduction to the Law and Economics of Intellectual Property. J. Econ. Perspect. 1991, 5, 3–27. [Google Scholar] [CrossRef]
  5. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  6. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An Efficient Alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar]
  7. Sabry, E.S.; Elagooz, S.S.; El-Samie, F.E.A.; El-Bahnasawy, N.A.; El-Banby, G.M.; Ramadan, R.A. Evaluation of Feature Extraction Methods for Different Types of Images. J. Opt. 2023, 52, 716–741. [Google Scholar] [CrossRef]
  8. Li, S.; Jin, J.; Li, D.; Wang, P. Research on Transductive Few-Shot Image Classification Methods Based on Metric Learning. In Proceedings of the 2023 7th International Conference on Communication and Information Systems (ICCIS), Virtual, 15–17 October 2023; pp. 146–150. [Google Scholar]
  9. Bromley, J.; Bentz, J.W.; Bottou, L.; Guyon, I.; Lecun, Y.; Moore, C.; Säckinger, E.; Shah, R. Signature verification using a “siamese” time delay neural network. Int. J. Patt. Recogn. Artif. Intell. 1993, 7, 669–688. [Google Scholar] [CrossRef]
  10. Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese Neural Networks for One-Shot Image Recognition. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; Volume 37. [Google Scholar]
  11. Melekhov, I.; Kannala, J.; Rahtu, E. Siamese Network Features for Image Matching. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 378–383. [Google Scholar]
  12. Nandy, A.; Haldar, S.; Banerjee, S.; Mitra, S. A Survey on Applications of Siamese Neural Networks in Computer Vision. In Proceedings of the 2020 International Conference for Emerging Technology (INCET), Belgaum, India, 5–7 June 2020; pp. 1–5. [Google Scholar]
  13. Hoffer, E.; Ailon, N. Deep Metric Learning Using Triplet Network. In Similarity-Based Pattern Recognition, Proceedings of the Third International Workshop, SIMBAD 2015, Copenhagen, Denmark, 12–14 October 2015; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
  14. Boutros, F.; Damer, N.; Kirchbuchner, F.; Kuijper, A. ElasticFace: Elastic Margin Loss for Deep Face Recognition. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022; pp. 1577–1586. [Google Scholar]
  15. Choi, J.; Kim, Y.; Lee, Y. Robust Face Recognition Based on an Angle-Aware Loss and Masked Autoencoder Pre-Training. In Proceedings of the ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 3210–3214. [Google Scholar]
  16. Liu, W.; Wen, Y.; Yu, Z.; Li, M.; Raj, B.; Song, L. SphereFace: Deep Hypersphere Embedding for Face Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  17. Wang, H.; Wang, Y.; Zhou, Z.; Ji, X.; Gong, D.; Zhou, J.; Li, Z.; Liu, W. CosFace: Large Margin Cosine Loss for Deep Face Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  18. Wang, F.; Cheng, J.; Liu, W.; Liu, H. Additive Margin Softmax for Face Verification. IEEE Signal Process. Lett. 2018, 25, 926–930. [Google Scholar] [CrossRef]
  19. Deng, J.; Guo, J.; Yang, J.; Xue, N.; Kotsia, I.; Zafeiriou, S. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 5962–5979. [Google Scholar] [CrossRef] [PubMed]
  20. Setchi, R.; Anuar, F.M. Multi-Faceted Assessment of Trademark Similarity. Expert Syst. Appl. 2016, 65, 16–27. [Google Scholar] [CrossRef]
  21. Trappey, C.V.; Trappey, A.J.C.; Lin, S.C.-C. Intelligent Trademark Similarity Analysis of Image, Spelling, and Phonetic Features Using Machine Learning Methodologies. Adv. Eng. Inform. 2020, 45, 101120. [Google Scholar] [CrossRef]
  22. Alshowaish, H.; Al-Ohali, Y.; Al-Nafjan, A. Trademark Image Similarity Detection Using Convolutional Neural Network. Appl. Sci. 2022, 12, 1752. [Google Scholar] [CrossRef]
  23. Huang, G.B.; Ramesh, M.; Berg, T.; Learned-Miller, E. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments; University of Massachusetts: Amherst, MA, USA, 2007. [Google Scholar]
  24. Huang, G.B.; Learned-Miller, E. Labeled Faces in the Wild: Updates and New Reporting Procedures; University of Massachusetts: Amherst, MA, USA, 2014. [Google Scholar]
  25. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Figure 1. Some examples of TM images: (a) Similar pairs. Similar in element shapes and general color. (b) Similar pairs. Similar in contour, some elements. (c) Dissimilar pairs. Similar in partial contour, dissimilar in other factors.
Figure 2. Interpretations of HMS. (a) Decision boundary; (b) logit normalization; (c) penalties of HMS.
Figure 3. Visualized feature spots extracted by SNN trained on different transformations of softmax function. (a) SphereFace; (b) CosFace; (c) ArcFace; (d) HMS (normalized); (e) HMS.
Figure 4. The procedure of TM image similarity detection.
Figure 5. The structure of the 6-layer CNN.
Figure 6. The accuracy of the detection model (resnet18) trained with different hyper-parameters in HMS.
Table 1. The accuracy (%) of the TMISD task.

| Transformations of Softmax | 6-Layer | ResNet18 | ResNet34 | ResNet50 | ResNet101 | ResNet152 |
|---|---|---|---|---|---|---|
| SphereFace [16] | 96.25 | 95.36 | 96.39 | — * | — | — |
| CosFace [17,18] | 43.81 | 51.55 | 52.06 | 52.58 | 55.15 | 58.76 |
| ArcFace [19] | 46.39 | 47.94 | 53.09 | 52.06 | 56.70 | 52.58 |
| HMS (normalized) | 54.12 | 53.61 | 54.12 | 57.73 | 52.06 | 53.09 |
| HMS | 97.45 | 97.42 | 97.94 | 98.39 | 98.97 | 98.52 |

Note: * — indicates that the SNN is overfitted; the same applies below.
Table 2. The F1 score of the TMISD task.

| Transformations of Softmax | 6-Layer | ResNet18 | ResNet34 | ResNet50 | ResNet101 | ResNet152 |
|---|---|---|---|---|---|---|
| SphereFace [16] | 0.9539 | 0.9516 | 0.9574 | — | — | — |
| CosFace [17,18] | 0.4171 | 0.4778 | 0.5131 | 0.5306 | 0.5915 | 0.5789 |
| ArcFace [19] | 0.4800 | 0.4294 | 0.5381 | 0.5373 | 0.6216 | 0.5534 |
| HMS (normalized) | 0.5189 | 0.5714 | 0.4671 | 0.6339 | 0.5373 | 0.5646 |
| HMS | 0.9516 | 0.9735 | 0.9798 | 0.9749 | 0.9746 | 0.9897 |
Table 3. The accuracy (%) of the FR task.

| Transformations of Softmax | 6-Layer | ResNet18 | ResNet34 | ResNet50 | ResNet101 | ResNet152 |
|---|---|---|---|---|---|---|
| SphereFace [16] | — | — | — | — | — | — |
| CosFace [17,18] | 45.63 | 47.25 | 46.97 | 47.38 | 47.82 | 46.38 |
| ArcFace [19] | 46.05 | 48.23 | 49.07 | 48.63 | 46.87 | 46.90 |
| HMS (normalized) | 50.13 | 50.50 | 49.06 | 50.55 | 48.30 | 51.18 |
| HMS | 80.45 | 90.57 | 82.45 | 82.30 | 84.17 | 82.90 |
Table 4. The F1 score of the FR task.

| Transformations of Softmax | 6-Layer | ResNet18 | ResNet34 | ResNet50 | ResNet101 | ResNet152 |
|---|---|---|---|---|---|---|
| SphereFace [16] | — | — | — | — | — | — |
| CosFace [17,18] | 0.3932 | 0.4468 | 0.4348 | 0.4411 | 0.5229 | 0.4389 |
| ArcFace [19] | 0.4564 | 0.4461 | 0.5789 | 0.5466 | 0.5173 | 0.4927 |
| HMS (normalized) | 0.3828 | 0.5380 | 0.3794 | 0.5524 | 0.3872 | 0.5978 |
| HMS | 0.8173 | 0.9002 | 0.8464 | 0.8449 | 0.8517 | 0.8483 |
Table 5. The performance of data-driven methods on the TMISD task.

| Backbone | SNN (Contrastive Loss) Accuracy (%) | SNN F1 | Triplet Network Accuracy (%) | Triplet F1 | Fine-Tuning Accuracy (%) | Fine-Tuning F1 |
|---|---|---|---|---|---|---|
| ResNet18 | 41.53 | 0.4237 | 85.05 | 0.8449 | 53.61 | 0.5588 |
| ResNet34 | 46.73 | 0.4535 | 92.27 | 0.9282 | 70.65 | 0.7149 |
| ResNet50 | 47.18 | 0.4654 | 58.25 | 0.6897 | 56.19 | 0.5685 |
| ResNet101 | 46.53 | 0.4549 | 54.13 | 0.6642 | 47.42 | 0.5049 |
| ResNet152 | 46.84 | 0.4431 | 59.28 | 0.6802 | 46.39 | 0.4851 |