Article

Visual Censorship: A Deep Learning-Based Approach to Preventing the Leakage of Confidential Content in Images

1 Information System, Max Stern Yezreel Valley College, Emek Yezreel 1930600, Israel
2 Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Be’er Sheva 8410501, Israel
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(17), 7915; https://doi.org/10.3390/app14177915
Submission received: 8 August 2024 / Revised: 2 September 2024 / Accepted: 3 September 2024 / Published: 5 September 2024
(This article belongs to the Special Issue Data Mining and Machine Learning in Social Network Analysis)

Abstract

Online social networks (OSNs) are fertile ground for information sharing and public relationships. However, the uncontrolled dissemination of information poses a significant risk of the inadvertent disclosure of sensitive information. This poses a notable challenge to the information security of many organizations. Improving organizations’ ability to automatically identify data leaked within image-based content requires specialized techniques. In contrast to traditional vision-based tasks, detecting data leaked within images presents a unique challenge due to the context-dependent nature and sparsity of the target objects, as well as the possibility that these objects may appear in an image inadvertently as background or small elements rather than as the central focus of the image. In this paper, we investigated the ability of multiple state-of-the-art deep learning methods to detect censored objects in an image. We conducted a case study utilizing Instagram images published by members of a large organization. Six types of objects that were not intended for public exposure were detected with an average accuracy of 0.9454 and an average macro F1-score of 0.658. A further analysis of relevant OSN images revealed that many contained confidential information, exposing the organization and its members to security risks.

1. Introduction

Organizations are increasingly utilizing online social networks (OSNs) as part of their communication and marketing strategies [1,2]. Organizations create official public spaces, such as pages or groups, on OSN platforms. Members participate in these spaces and engage with the content. They may share posts, images, and location data on their personal accounts and add the organization’s name or hashtag to their posts. When someone searches for the organization’s name or hashtag on an OSN, sensitive information may be leaked and end up in unauthorized hands [3,4,5,6].
An intelligent attacker can exploit the information published via OSNs to collect information about a target organization to conduct industrial espionage or launch a targeted attack [7,8]. They can use this information to provide a comprehensive view of the organization’s membership, groups, posts, and locations. They can reconstruct the organization’s structure, identify its leadership and technologies, obtain contact information not available on its official website, and locate specialized branch offices [7,9,10]. Posting an image of an organization member in an office, for example, can reveal information about the premises, layout, or infrastructure. With this information, attackers can gain access to even more sensitive information and then proceed to attack the organization [7,9,11].
Organizations around the world face the challenge of preventing data leakage, which is defined as the disclosure of information to unauthorized entities or individuals [12,13]. Data leaks can result in substantial losses, whether financial or non-financial [14]. The primary cause of data leakage, whether intentional or not [15,16], is action by an organization’s members, a phenomenon referred to as an insider threat [17].
On the one hand, members of an organization may intentionally share images online containing sensitive information. As an example (https://www.bbc.com/news/world-us-canada-68473868 (accessed on 4 March 2024)), consider the case of an Air National Guard member who pleaded guilty to posting dozens of classified documents online to a platform popular with gamers. Maps, satellite images, and intelligence on U.S. allies were included in the uploaded documents. On the other hand, organization members may not be aware that they have captured confidential organizational content in their shared images or may not be aware that certain objects contain confidential information. Due to a lack of awareness or context-specific knowledge, seemingly innocuous visual content shared on social media platforms can result in data leakage [18].
A few examples in which sensitive information was unintentionally revealed were presented by Kutschera [18]. In a YouTube video, a backyard scene reveals the surroundings of a private residence, and a scene in which a rain radar map is shown to the camera reveals the house’s global positioning system (GPS) location. Furthermore, Twitter images revealed a GPS-traced route and implied physical presence. In another case, an image on the Twitter profile of an Australian web security consultant showed a car parked in front of a house; using Google Maps’ 3D mode, the house’s address was located. With the address, it was also possible to gather information about alleged relatives, the property’s size and date of purchase, and the user’s telephone numbers.
Detecting confidential information poses a unique challenge different from other image-processing tasks. The confidential information in the image may be difficult to detect, since it is not the main focus of the image. It may be small or hidden in the background. Furthermore, confidential information can be highly context-dependent and organization-specific. An object that appears innocuous in one setting may be confidential in another. In addition, identifying confidential objects is challenging as they are likely to be rare in the overall dataset. Detecting confidential information online may pose additional challenges due to the vast amount of data shared on social media. Due to the high volume of user-generated content, automated solutions are required to safeguard confidential information.
To address these issues, we propose a method for analyzing image-based data uploaded to OSNs by members of organizations. In this method, objects that meet the specific criteria established by the organization as potentially containing confidential information are automatically identified and classified as censored. Several state-of-the-art deep learning methods are investigated for their ability to detect censored objects in images. We conducted a case study using Instagram images published by members of a large organization. The organization defined six types of objects not intended for public exposure: members, pins, property, hats, screens, and maps. Objects were detected with an average accuracy of 0.9454 and an average macro F1-score of 0.658.
The main contributions of this work are as follows:
  • We propose a novel method that utilizes deep learning for detecting censored objects; this method improves organizations’ ability to automatically inspect and secure their data against unintended disclosures.
  • We evaluate several techniques, investigating the ability of state-of-the-art deep learning methods to detect censored objects in images.
  • We provide real-world validation, demonstrating the practical effectiveness of our method in detecting censored objects in real-world organizational data collected from Instagram. Images published by organization members contained censored objects, exposing the organization and its personnel to potential security risks. This highlights the need for robust methods to detect and mitigate data leakage incidents in image-based content shared on OSNs.

2. Related Work

Numerous studies have examined the detection of objects in several contexts, including the detection of out-of-context objects, concealed objects, and sensitive texts and objects, as discussed below.

2.1. Out-of-Context Object Detection

Detecting out-of-context objects involves identifying objects in images that do not fit the contextual relations within the scene [19,20]. Out-of-context objects can be detected using graph-based models that capture relationships between objects [19]. As an example, Acharya et al. [21] made use of a graph contextual reasoning network to explicitly capture contextual cues from neighboring objects in order to detect out-of-context objects. The proposed graph convolutional recurrent network (GCRN) framework consists of two components: one for learning object representations and one for capturing context for object detection.
Several scene contextual constraints were proposed in [22] to improve detection performance. As a postprocessing step, these constraints can be applied to many state-of-the-art detectors, including faster region-based convolutional neural networks (Faster R-CNN) and You Only Look Once (YOLO). The authors demonstrated that applying the constraints improved detection performance over state-of-the-art models.
An additional form of out-of-context detection involves pairing misleading or false captions with images, creating out-of-context multimodal misinformation. One approach to tackling this issue was presented in [23]: a self-supervised method that uses image–text matching to detect out-of-context image and text pairs. The model uses an input image and two captions from different sources. By comparing the captions to visual elements in the image, the model predicts whether the image was used out of context, achieving a detection accuracy of 85%. Another study by Zhang et al. [24] proposed a method to detect out-of-context misinformation using a combination of neural and symbolic techniques. Using the abstract meaning representation (AMR) of the caption, the method symbolically decodes text modality information into fact queries. The relationships between visual and textual elements are then mapped to a symbolic graph. Compared with various baselines, the model detected out-of-context multimodal misinformation more accurately.
Our study identifies confidential objects in images. This problem is different from identifying objects out of context because confidential objects depicted in an image can be found in their natural environment and are not necessarily out of context. As an example, an image may depict a workplace setting with confidential documents, which may be normal for the setting but still constitutes a privacy risk if leaked.

2.2. Concealed Object Detection

Another problem involves the detection of concealed objects which are perfectly embedded in their backgrounds [25]. The detection of concealed objects is common in security inspections, which are employed to identify dangerous items hidden beneath clothing [25]. Studies have focused on passive terahertz imaging [26], a technology that uses terahertz (THz) radiation to capture images for the detection of concealed objects.
Trofimov et al. [27] presented a real-time algorithm for the computer processing of passive THz images aimed at detecting concealed objects. The algorithm computes a correlation function between the THz image and a standard image and triggers an alarm if the correlation value exceeds a certain level. In the same vein, Kowalski [28] compared two deep learning-based processing methods (YOLOv3 and R-FCN) for the detection and recognition of concealed objects; R-FCN outperformed YOLOv3. A multi-scale filter and geometric transformation matrix (MSFG) augmentation for passive terahertz (THz) images was proposed in [29]. MSFG augmentation significantly improves detection accuracy in passive THz images, and YOLOv5I-MSFG shows the best performance, with an accuracy rate of 93.49% for one-class detection and 90.76% for two-class detection.
Liu et al. [30] introduced an advanced clustering algorithm that enhances concealed object detection by achieving a 90.38% recall rate and 94% precision. Finally, Cheng et al. [31] proposed replacing the SSD backbone network with a more representative residual network to reduce network training difficulty. The SSD algorithm improved its accuracy from 95.04% to 99.92% in experiments and outperformed other algorithms such as Faster R-CNN and YOLO.
Passive millimeter wave technology is a non-invasive imaging technique that detects concealed objects by capturing millimeter wave radiation emitted by objects and their surroundings [32]. In several studies, this type of image is analyzed using deep learning methods such as YOLOv3 [32] and a one-stage anchor-free detector combining a convolutional neural network (CNN) and transformers [33]. The active millimeter wave imaging technique is also used for the detection of concealed objects and involves an external source that emits millimeter waves in the direction of an object. Using fast wavelet transforms (FWTs) to segment such images achieves a detection rate of 80.92%, with a false alarm rate of 11.78% [34]. Another method uses multiple-input, multiple-output radar antennas and synthetic aperture radar techniques for data acquisition, together with an analytical Fourier transform algorithm for image reconstruction; GhostNet_SEResNet56 (a lightweight convolutional neural network) achieved a prediction accuracy of 98.18% [35].
Additionally, Wang et al. [36] presented a normalized accumulation map-based training mechanism. Using binary masks, this method identifies commonly appearing concealed objects and assigns different weights to various locations. With this training mechanism, YOLOv2 shows a 4.43% improvement in mean average precision. An additional innovation in this field can be found in [37], where indirect microwave holography was used in transmission mode to detect and identify concealed objects by analyzing the shapes and sizes of metallic sheets. The self-paced feature attention fusion network (SPFAFN) [38] integrates multi-scale features and uses a hierarchical pyramid attention mechanism. This method combines channel and spatial attention and employs self-paced learning. On real-world datasets, SPFAFN outperforms existing methods.
Sun et al. [39] proposed a multi-source aggregation transformer (MATR) that encodes local relationships within each image using a self-attention module, while a cross-attention module models interactions across multiple images from different viewpoints. Asok et al. [40] presented a novel time-domain beamforming algorithm for ultra-wideband microwave imaging using Vivaldi antennas. The algorithm successfully reconstructs concealed objects attached to a human body. Asok et al. [41] also used a synthetic aperture radar (SAR) beamforming algorithm which successfully removed unwanted clutter from scanning data. Simulations and real-world scenarios demonstrate the effectiveness of single-time scanning.
The concealed object problem refers to the intentional embedding of objects within an image by deliberately concealing or camouflaging them. In our case, objects are not intentionally concealed; they are overtly present and visible, appearing in the image inadvertently.

2.3. Sensitive Text Detection

A number of studies have explored the problem of text leakage, particularly in relation to OSNs and organizations. In [42], a privacy-aware framework was proposed for text data in OSNs. The framework is capable of automatically detecting privacy information shared by users in OSNs and accurately locating which parts of the text leak sensitive information by using the roformerBERT model, the BI_LSTM model, and the global_pointer algorithm to construct a direct privacy entity recognition (DPER) model. According to experimental results, the model achieves an overall accuracy of 98.3%.
Ahmed et al. [43] applied deep learning to identify unstructured context-dependent sensitive information in organizations. The researchers focused on both text and image data but only extracted the text from the images. Using publicly available tweet and image datasets, they explored alternative models, including shallow machine learning classifiers and a rule-based model, and demonstrated that the proposed approach of deep learning outperformed the alternatives. In contrast to the studies mentioned above, which involved text analysis, we are advancing the analysis of confidential content in images.

2.4. Sensitive and Harmful Object Detection

Several studies have investigated the automatic identification of sensitive personal information, such as an ID number or credit card number, in photos. Melad [44] used an object detection model to detect IDs in an image and compared it with other object detection models (SSD, R-FCN) by measuring the mAP. With a mAP of 70.7%, the Faster R-CNN with the ResNet101 model achieved the highest accuracy. Soni and Hiran [45] utilized YOLOv3 to detect personally identifiable information. The purpose of these studies was primarily to analyze text extracted from images. Conversely, our research focuses specifically on censored objects within the image itself, which represents a distinct challenge.
Several studies have focused on the identification of harmful or sensitive categories of images. Yu et al. [46] presented iPrivacy (version of 22 February 2017), a tool for automatically recommending privacy settings for image sharing based on detecting privacy-sensitive objects. They used a deep multitask learning method to jointly learn representative deep CNNs and discriminative tree classifiers for the accurate detection of privacy-sensitive objects.
There have been several studies examining the identification of harmful objects, such as weapons, pornography, and violence. Harvey et al. [47] presented an object detection pipeline for detecting guns in images based on Twitter data. Their model was based on VinVL (visual representations in vision-language models). The model achieved a mean precision of 72.3 on the Twitter Gun dataset, compared with YOLOv5’s 52.6. Mao et al. [48] proposed a multiclassification sensitive-image detection method based on a lightweight CNN for detecting photos associated with pornography, politics, and violence. Based on an EfficientNet model, this method combined the Ghost Module idea with a channel attention mechanism for feature extraction training. According to the experimental results on the sensitive-image dataset, the proposed method detected sensitive data with 94.46% accuracy.
Finally, in [49], harmful objects were detected using six categories: alcohol, insulting gestures, blood, cigarettes, guns, and knives. The researchers used two baseline object detection architectures: YOLOv5 and Faster R-CNN. It was found that YOLOv5 had the best performance.
Unlike previous studies on object detection for sensitive or harmful content, our study examines the specific domain of organizational data leakage using images shared on OSNs, which presents its own set of challenges. With the use of real-world data and deep learning models, we aim to address a critical security concern for organizations in the era of social media and demonstrate the effectiveness of the proposed method in identifying censored objects specific to organizations, highlighting the security risks posed by their leakage and the need for automated solutions to detect and mitigate them.
In Table 1, the main studies related to our study are compared. The rows correspond to the studies, while the last row represents our method. Each research work is compared according to several metrics. These metrics include the use of image-based analysis techniques, the use of social media data, and the application of deep learning techniques. In addition, we check for the implementation of image captioning using deep learning methods and the execution of object detection using deep learning methods. Furthermore, we examine the type of detection: sensitive text information, sensitive or censored objects, out-of-context objects, concealed objects, and harmful objects.
It can be seen in the table that we are the first to evaluate several deep learning technologies, namely object detection, image classification, and image captioning, for the purpose of detecting censored objects in social media data. It should be noted that despite the similarity between the study of Yu et al. [46] and our own, Yu et al. focused on privacy-sensitive objects that could compromise the privacy of an individual. A privacy-sensitive object may be a person, a location, or a text, or it may be a user-dependent object such as a home shrine.
Our research focused on detecting objects that could pose a privacy risk to an organization. It is important to recognize that these objects are non-trivial and context-dependent, depending on the specific needs of the organization. As an example, identifying a member of the organization involves more than just recognizing the person; it requires identifying individuals wearing the specific clothing associated with the organization. Therefore, by detecting censored objects with context-dependent cues and attributes relevant to a specific organization, we can provide a more robust solution to organizational data leakage.

3. Materials and Methods

3.1. The Method

Our method is based on four steps: (1) defining confidential information in images; (2) selecting the organization’s representative hashtags; (3) collecting data from OSN images; and (4) training deep learning classifiers. These steps are presented in Figure 1 and described in the subsections that follow. As was presented in the related work section, to the best of our knowledge, this study is the first to evaluate multiple deep learning technologies, including object detection, image classification, and image captioning techniques, for the purpose of detecting censored objects in social media data.

3.1.1. Defining Confidential Information

As part of this step, the organization decides what information it does not wish to disclose publicly via OSNs in accordance with its security protocols. Organizations may maintain confidential information regarding intellectual property, physical assets, and financial information. Further, confidential data may include R&D data, employee records, trade secrets, policies, business plans and strategies, and innovative technologies [50,51]. The organization needs to define which objects it does not wish to publish online. In some cases, a combination of several objects that appear in the same image can result in the disclosure of classified information. For example, in an organization where certain positions are considered confidential and the organization wishes to prevent the disclosure of the people holding those positions, it may be possible to determine a member’s role in an organization based on the combination of two objects in a single image, such as the member’s clothes and a pin. As a result, the organizational structure can be exposed.
In many cases, members of the organization are unaware that the confidential objects have been captured in an image uploaded online; the objects have been captured inadvertently. This can include, for example, a member of the organization taking a selfie without noticing a map or screen behind him that contains confidential information about the organization. In other cases, members of the organization may be unaware that certain objects are confidential or proprietary to their organization. A lack of awareness and context-specific knowledge can result in data leakage when seemingly innocuous visual content is shared on OSNs.
Figure 2a,b illustrate examples of censored objects. In Figure 2a, a woman takes a selfie without noticing a screen behind her. This could inadvertently reveal confidential information about the organization. The screen in Figure 2b displays a map that may contain sensitive information about the organization.

3.1.2. Selecting the Organization’s Representative Hashtags

Finding the representative pages and hashtags of the organization on OSNs is essential. To determine which pages and hashtags are associated with the organization, its name is entered in the OSN’s search window. Hashtags can be used on any OSN but are most commonly used on Instagram and X (Twitter), where they tag and categorize social media content. Instagram users, in particular, often use hashtags to identify specific locations in their images; when the location corresponds to their workplace, members typically tag their specific organization.

3.1.3. Collecting Data from OSN Images

Searching for the organization’s representative hashtags enabled us to retrieve public images associated with the organization that had been uploaded to OSNs. By using APIs for downloading content from OSNs, the information was gathered based on publicly available OSN content.

3.1.4. Training Deep Learning Classifiers

Using the data collected in the previous step, a deep learning classifier was created for the confidential objects defined by the organization. In order to determine the most effective method for identifying censored objects, it was necessary to perform a comprehensive comparison. Several tasks in computer vision helped us find objects relevant to the organization:
1. Image classification: In computer vision, image classification refers to the process of categorizing an image into predefined classes or categories. Deep learning techniques such as CNNs are common approaches to image classification that have demonstrated remarkable performance in various image recognition tasks [52]. Transfer learning with CNN architectures and pre-trained models can be used for the classification of images. Transfer learning starts with a pre-trained model on large datasets such as ImageNet and fine-tunes its weights using a new dataset. The model learns patterns and relationships relevant to the new, specific task [53].
Several CNN architectures have been proposed, including VGG19 [54], DenseNet121 [55], EfficientNetV2S [56], ResNet50 [57], and InceptionV3 [58]. EfficientNetV2S, for example, a variant of EfficientNetV2, was proposed by Google researchers in 2021 [56]. With fewer parameters, EfficientNetV2S maintains high accuracy across various computer vision tasks; compound scaling, stochastic depth, and inverse square root scaling are used in this architecture. The ResNet50 CNN architecture was introduced by Microsoft Research in 2015 [57]. ResNet50 is a variant of the ResNet architecture, which addresses the vanishing gradient problem. The ResNet50 architecture has 50 layers, including convolutional, pooling, and fully connected layers, as well as skip connections, which bypass one or more layers and connect the input directly to a later layer. ResNet152 is a residual network with a depth of 152 layers; as a result of the increased depth, it is able to learn more complex features and representations. Using labeled images representing each of the selected objects, pre-trained CNN models can be further trained and fine-tuned.
2. Object detection: Object detection is designed to identify objects of a particular class of interest with precise localization in an image [59,60]. A popular real-time object detection algorithm is YOLO. It was first introduced by Joseph Redmon et al. [61] in 2016 and has undergone several versions since. YOLO is a fast algorithm that processes images in a single pass: the image is divided into a grid, and bounding boxes and class probabilities are predicted directly and simultaneously by a single CNN. YOLO can be used to detect the different objects selected by the organization.
3. Image captioning: Image captioning is the process of generating captions containing textual descriptions of images. A combination of deep learning and encoder–decoder architecture is often used for image-to-text generation [62]. A popular method for converting images to text is the bootstrapping language-image pre-training (BLIP) method [63], a new vision-language pre-training (VLP) framework that is flexibly applied to both vision-language understanding and generation tasks using a multimodal combination of encoders and decoders. This method achieves state-of-the-art results in a wide range of vision-language tasks, such as image–text retrieval and image captioning. A generated caption can then be screened for the censored objects defined by the organization, as sketched below.
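One way to connect captioning to the censorship task is to screen the generated caption for censored-object keywords. The following is a minimal sketch, assuming Python with the Hugging Face transformers library and the pre-trained BLIP captioning model; the keyword set and image path are hypothetical placeholders, not values from the study.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Hypothetical list of censored-object keywords defined by the organization.
CENSORED_KEYWORDS = {"screen", "map", "pin", "hat"}

# Pre-trained BLIP captioning model from the Hugging Face Hub.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def screen_image(path: str) -> tuple[str, set[str]]:
    """Generate a caption for an image and return any censored keywords it mentions."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(output_ids[0], skip_special_tokens=True)
    flagged = {kw for kw in CENSORED_KEYWORDS if kw in caption.lower()}
    return caption, flagged

caption, flagged = screen_image("example_post.jpg")  # placeholder image path
print(caption, flagged)
```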

3.2. Case Study

The proposed method was demonstrated through a case study conducted with the assistance of a real organization. For security reasons, the organization will remain anonymous. During the case study, we followed the steps indicated in Figure 1 in Section 3, as discussed in the following subsections.

3.2.1. Defining Confidential Information

As a first step, the organization provided us with the following objects that it did not wish to make publicly available through OSNs; as the organization chose to remain anonymous, these will also be described generally:
  • Members—identification of an image with organization members based on the organization’s clothes can lead to identification of individuals affiliated with the organization;
  • Pin—images with a pin representing the organization, which can identify the specific department;
  • Property—images of specific assets of the organization may provide insight into its resources and capabilities;
  • Screen—images containing computer, TV, and tablet screens, etc., can disclose confidential information;
  • Map—images of maps revealing geographical information relevant to the organization, including sensitive information such as locations;
  • Hat—images with organization hats may reveal affiliation and contribute to member identification.
The combination of some of these objects in the same image may pose a greater risk than each object alone. As an example, an image taken of a member with a specific pin, along with a certain asset, could result in the unintentional disclosure of significant information.

3.2.2. Selecting the Organization’s Representative Hashtags

In step 2, we searched on Instagram for the organization’s name and found two main hashtags related to it. The next step was to download the posts from the two hashtags.

3.2.3. Collecting Data from OSN Images

In order to collect the Instagram images, we used the RapidAPI website (https://rapidapi.com/ (accessed on 1 February 2024)) to connect with the Instagram API. RapidAPI hosts thousands of public APIs. An Instagram post can include an image, a series of images, or a short video. We downloaded Instagram posts in order to collect images. For every post, a single image was saved; if the post included more than one, only the first was saved. Using the API, we collected all the images that were available via Instagram posts from 7 December 2011 to 15 February 2023. A total of 252,628 Instagram posts were downloaded. Instagram posts include the publication date, number of likes, text, and image. There were 13,225 posts from one hashtag and 239,403 from another.
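As a rough illustration of this collection step, the sketch below assumes Python with the requests library and the standard RapidAPI key/host headers; the API host, endpoint path, and response field names are hypothetical placeholders, since the exact Instagram API chosen on RapidAPI is not specified here.

```python
import requests

# Hypothetical RapidAPI endpoint for hashtag media; the actual host, path, and
# response field names depend on the specific Instagram API chosen on RapidAPI.
RAPIDAPI_HOST = "example-instagram-api.p.rapidapi.com"   # placeholder host
URL = f"https://{RAPIDAPI_HOST}/hashtag/media"            # placeholder path
HEADERS = {
    "X-RapidAPI-Key": "YOUR_RAPIDAPI_KEY",  # personal key issued by RapidAPI
    "X-RapidAPI-Host": RAPIDAPI_HOST,
}

def fetch_hashtag_posts(hashtag: str, max_pages: int = 10) -> list[dict]:
    """Page through posts for a hashtag, keeping the metadata and first image URL."""
    posts, cursor = [], None
    for _ in range(max_pages):
        params = {"hashtag": hashtag}
        if cursor:
            params["cursor"] = cursor
        resp = requests.get(URL, headers=HEADERS, params=params, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        for post in data.get("items", []):            # field names are assumptions
            posts.append({
                "date": post.get("taken_at"),
                "likes": post.get("like_count"),
                "text": post.get("caption"),
                "image_url": post.get("image_url"),   # first image of the post only
            })
        cursor = data.get("next_cursor")
        if not cursor:
            break
    return posts
```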
Posts collected per year are shown in Figure 3. Most posts were collected between 2018 and 2022. Increasing access to OSNs and the continuous growth of data available on the Internet can explain why the number of posts increased. In addition, the decrease in posts in 2023 was a result of the limited time available for data collection at the beginning of the year (six weeks).

3.2.4. Training Deep Learning Classifiers

1. Using CNN architecture for image classification
For each object defined by the organization, public images were collected through a Google search. The goal was to create training data for each object. We classified the images of each object into two categories (exists/does not exist). Figure 4 presents the number of training images collected for each object.
It is important to note that in this study, publicly available images were used for training purposes without obtaining the explicit consent of the image owners.
The classification models were trained and evaluated using fivefold cross-validation for each object. The CNN architectures evaluated were ResNet50, ResNet152, VGG16, VGG19, InceptionV3, EfficientNetV2B2, EfficientNetV2L, EfficientNetV2S, EfficientNetV2M, and EfficientNetB0. We ran the CNN architectures with the default settings for each model (dropout = 0.2, number of neurons in middle layer = 1024, activation function = relu, activation function in output layer = sigmoid, optimizer = adam, learning rate = 0.001). A single Intel Core i7-10510U (16 GB RAM) was used in the experiments.

As part of the preprocessing, each image was resized to 224 × 224 × 3. This was necessary to construct the transfer learning neural network, as it ensured compatibility with the input requirements of the pre-trained model. To fine-tune the pre-trained model, the convolutional layers were frozen, and a fully connected layer was added with an output layer consisting of two neurons (one for each class). Training involved up to 100 iterations; however, to prevent overfitting, we stopped the process when performance on a validation dataset declined, using validation loss as the stopping criterion. Losses were calculated using binary cross-entropy. For each of the six objects, we selected the model with the highest F-score across the five folds, resulting in six optimal models. A test set was then used to test each model on new data.
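A minimal sketch of this training configuration, assuming TensorFlow/Keras with a ResNet50 backbone, is given below; the pooling layer, early-stopping patience, and dataset objects are assumptions not stated in the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

IMG_SHAPE = (224, 224, 3)  # images are resized to this shape during preprocessing

# Pre-trained backbone with frozen convolutional layers (transfer learning).
backbone = ResNet50(weights="imagenet", include_top=False, input_shape=IMG_SHAPE)
backbone.trainable = False

model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),        # pooling choice is an assumption
    layers.Dense(1024, activation="relu"),  # middle layer with 1024 neurons
    layers.Dropout(0.2),
    layers.Dense(2, activation="sigmoid"),  # two output neurons, one per class
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Stop when validation loss stops improving (patience value is an assumption).
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)

# train_ds / val_ds are assumed tf.data datasets yielding (image, one-hot label) batches.
# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])
```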
2. Using YOLOv5 for object detection
We tagged each object separately for the YOLOv5 models. Tagging was performed using the Roboflow site (https://roboflow.com/ (accessed on 1 February 2024)), a platform for building computer vision models that provides an image-tagging feature. The resolution of each image was changed to 640 × 640 × 3 during preprocessing. Google Colab was used to train the models with the following parameters: epochs = 100, batch = 16, weights = yolov5s (initial weights). A YOLOv5 model was created for each object, and the training data were divided into 80% for training and 20% for validation.
Similarly to the training data for the CNN models, the training data for the YOLOv5 were created from public images retrieved from Google. Figure 5 shows the number of images and total number of tags for each object used to train the YOLOv5 model. Considering that an image can contain multiple instances of the same object, there are more tags than images. In order to test the YOLOv5 models, we used the training data from the CNN models presented in Figure 4.
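For reference, training one single-object YOLOv5 model with the stated parameters could look roughly as follows, assuming a clone of the official ultralytics/yolov5 repository; "object.yaml" is a placeholder for the Roboflow export of a single object.

```python
import subprocess

# Train one YOLOv5 model per censored object, run from a clone of the official
# ultralytics/yolov5 repository; "object.yaml" is a placeholder for the
# Roboflow-exported dataset description of a single object.
subprocess.run([
    "python", "train.py",
    "--img", "640",             # images resized to 640 x 640 during preprocessing
    "--batch", "16",
    "--epochs", "100",
    "--data", "object.yaml",
    "--weights", "yolov5s.pt",  # initial weights, as used in the case study
], check=True)
```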
3. Using the BLIP model for image captioning
The BLIP model is an image captioning model pre-trained on the Common Objects in Context (COCO) dataset and based on the Vision Transformer (ViT) architecture. The HuggingFace repository includes the BLIP model under the name “Salesforce/blip-image-captioning-base”. To fine-tune the BLIP model for custom image captioning, we used the same training data as for the CNN models. There were 922 unique images in the training dataset, each accompanied by a corresponding caption. In the caption, we highlighted the objects that the organization did not wish to make public via OSNs; for example, an image of a table with a screen and map was captioned “Screen and Map”.

The fine-tuning process was conducted in Google Colab with an A100 GPU. The model parameters were set to batch size = 2, learning rate = 5 × 10⁻⁵, and epochs = 10. The dataset was divided into five folds, and mean metrics (accuracy, recall, precision, and F-score) were calculated for each object separately, before and after the fine-tuning, to assess model performance.
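A minimal sketch of such a fine-tuning loop, assuming PyTorch and the Hugging Face transformers library, is shown below; the dataset of image–caption pairs is an assumed placeholder standing in for the 922 training images.

```python
import torch
from torch.utils.data import DataLoader
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base").to(device)

# Assumed placeholder: a list of (PIL image, caption) pairs built from the
# 922 training images and their object-keyword captions (e.g., "Screen and Map").
train_pairs = []

def collate(batch):
    # Encode a batch of (image, caption) pairs into model inputs.
    images, captions = zip(*batch)
    return processor(images=list(images), text=list(captions),
                     padding=True, return_tensors="pt")

loader = DataLoader(train_pairs, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(10):
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        # BLIP returns the captioning loss when the caption tokens are passed as labels.
        outputs = model(input_ids=batch["input_ids"],
                        attention_mask=batch["attention_mask"],
                        pixel_values=batch["pixel_values"],
                        labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```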

3.3. Data Collection for Testing

For each object, we selected the model with the highest F-score. All the images in the organization’s Instagram dataset were then classified for each object using the selected best models. Due to the large number of images collected, it was not possible to verify the accuracy of all classifications manually.
Therefore, we randomly selected 0.8% of the organization’s Instagram dataset, resulting in 2021 images. After reviewing these images, we found and removed duplicates. Duplicate images can occur, for example, when several organization members publish the same image. This left us with 2007 unique images, which constituted our test set. Each object in these images was manually classified as “exists” or “does not exist”, and then, the model predictions were assessed using accuracy, recall, precision, and F-scores.
As an additional measure, we present the weighted F-score, which is useful when dealing with class imbalance, since it accounts for the frequency of each class in the dataset, unlike the F-score, which was computed only for one class (object exists).
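For reference, these measures can be computed per object with scikit-learn as in the following sketch; the label and prediction arrays are illustrative placeholders, not results from the study.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Manual labels and model predictions for a single object (1 = exists, 0 = does not exist);
# the values below are illustrative placeholders, not results from the study.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)                 # "exists" class only
recall = recall_score(y_true, y_pred)                       # "exists" class only
f_score = f1_score(y_true, y_pred)                          # F-score for the "exists" class
weighted_f = f1_score(y_true, y_pred, average="weighted")   # weighted by class frequency
print(accuracy, precision, recall, f_score, weighted_f)
```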
For each object, Figure 6 shows the number of images manually classified as existing or not existing in the testing set. As can be seen from the figure, the map, hat, and pin objects were relatively uncommon.
Figure 7 summarizes the main methodology employed in this study. For training purposes, we downloaded public images from Google Search. Using fivefold cross-validation, we trained 10 different CNN models and the BLIP model. The YOLO model was evaluated using 20% of the data as a validation set. We selected the best-performing model based on the F-score. Once we had our best model for each object, we applied it to predict all the images in our organization’s Instagram dataset. From this dataset, we randomly selected a subset of images for manual review.

4. Results and Discussion

This section provides a description of the experimental results, their interpretation, and the experimental conclusions. Please refer to Figure 7 for the methodology workflow for training, evaluation, and testing. In the CNN architectures, the classification models were trained and evaluated using fivefold cross-validation for each object. The distribution of data used to train the objects is shown in Figure 4.
The CNN architectures evaluated were ResNet50, ResNet152, VGG16, VGG19, InceptionV3, EfficientNetV2B2, EfficientNetV2L, EfficientNetV2S, EfficientNetV2M, and EfficientNetB0. For each object, Table 2 shows the best model based on the highest weighted F-score among all the architectures. Detailed results from the 10 CNN models on the validation set are presented in Appendix A in Table A1.
Figure 8 illustrates the accuracy of BLIP, YOLO, and CNN (the top CNN models are detailed in Table 2), while their F-scores are depicted in Figure 9. Our findings align with those in the literature, indicating that EfficientNet and its variants, along with ResNet, outperform VGG16 and VGG19 on the ImageNet validation dataset [56,64,65].
The YOLOv5 and BLIP models were tested on validation data, as presented in Figure 4 and Figure 5. A comparison of the accuracy of the three models can be seen in Figure 8. For most of the objects (property, pin, hat, map, and screen), the CNN model is more accurate than the other two. For the member object, the BLIP model showed better performance. In Figure 9, there is a greater difference between the F-scores of the models, with YOLOv5 exhibiting the lowest performance.
According to the validation data, the CNN models performed best for most objects. Therefore, we used them to classify all 252,628 images in the organization’s Instagram dataset. Figure 10 indicates the number of occurrences of each object in our dataset. Most images did not contain the censored objects the organization had selected; the objects most frequently found in the images were screens, members, and properties.
From the 252,628 images classified by each of our models, 2007 random images were selected and manually assessed for the presence of each object. This was carried out to determine whether the models were successful at predicting new real-world data. The number of images manually classified as existing or not existing in the testing set is shown in Figure 6. Table 3 presents the results of the best models on this test set, summarizing the accuracy, F-score, and weighted F-score obtained by each CNN object model. It can be observed that the measures for objects such as property and member were relatively high; however, for low-frequency objects in the test set, such as hat, map, and pin, the measures were relatively low. For example, there was only one image with a pin, and it was not identified, resulting in a zero F-score. In addition to having a low frequency in the test set, the objects that received lower values were generally smaller. Member and property objects are larger than pins, hats, and screens, and these smaller objects were not correctly identified in the images. Small objects present a greater challenge than larger objects in object-detection tasks [66,67]. The difficulty in detecting small objects can be attributed to their indistinguishable features, low resolution, complex backgrounds, and limited contextual information [66].
Across different architectures, such as multi-scale, single-scale, ResNet, feature pyramid network (FPN), dynamic convolutional network (DCN), and YOLO, smaller objects result in lower average precision [68]. In a comparison of YOLOv3, Faster R-CNN (a CNN-based detector), and SSD on three large benchmark datasets focusing on small objects, the detection accuracy for small objects was generally low [66,69]. In our study, the CNN models outperformed YOLO across all objects, which is consistent with these findings.
In Figure 11, we present two images that were previously shown in Figure 2. We used our models to classify each image independently. The classifiers correctly identified the image from Figure 2a as a screen, marking a successful detection. Nevertheless, in the second image (Figure 2b), which shows a screen displaying a map, the classifiers recognized the screen but not the map. It would appear that the classifiers have difficulty classifying more complex objects such as maps. The map is less prominent in the image; its colors are bright but not sufficiently distinctive, and it is relatively small in comparison with the rest of the image. It is likely that this combination of factors contributed to the classifiers’ inability to accurately detect the map.
In order to improve the model’s ability to detect small, low-resolution, and unclear objects, as well as to address data imbalance, it may be necessary to increase the training data to include a broader range of examples in different contexts and sizes. Data augmentation may also be beneficial, including rotations, flips, and color adjustments [70]. Moreover, creating synthetic images of rare objects using synthetic data generation techniques can help balance the dataset [71].
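As an illustration of the kind of augmentation suggested here, a minimal Keras preprocessing pipeline might look as follows; the specific transformation ranges are assumptions, not settings used in the study.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Simple on-the-fly augmentation with flips, rotations, and color (contrast) adjustments;
# the parameter values are illustrative assumptions, not settings from the study.
augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),    # up to roughly +/-36 degrees
    layers.RandomContrast(0.2),
])

# Applied to an assumed tf.data pipeline of (image, label) batches during training.
# train_ds = train_ds.map(lambda x, y: (augmentation(x, training=True), y))
```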
While most studies focus on the leakage of text in images [44,45] and harmful object detection [47,48,49], the case study we conducted demonstrates the importance of detecting censored objects in images. The case study described in this paper revealed multiple sensitive objects related to the studied organization on Instagram. These findings demonstrate the vulnerability of the organization to information leakage through images posted on social networks. Such leakage can lead to the unauthorized dissemination of sensitive information, posing a risk to the organization and personal risk to its members.
Image censoring is important because members of a targeted organization may not be aware that they have captured sensitive organizational content in their shared images or that combining several objects in the same image could result in a disclosure of classified information. There is a need to raise organizational members’ awareness of the risks associated with capturing sensitive visual information in images posted on social networks and to emphasize that even innocuous visual content can result in data leakage.
Due to the vast and growing amount of data available on social media, as illustrated in our case study in Figure 3, it is necessary to go beyond increasing awareness and providing training. Deep learning methods, like those presented in this study, should be integrated into robust, automated systems for effective information monitoring. Automated information monitoring systems allow organizations to operate at the scale and speed required for real-time monitoring, reducing the chances of human error and enhancing overall security. By quickly identifying and managing the risks of sharing sensitive information on social media, organizations are able to maintain better control over what information is shared and ensure that confidential information remains secure.
In addition, our method is flexible to allow for customization in accordance with the needs of specific organizations. For example, models can be fine-tuned to recognize objects relevant to a specific organization or industry. As a result of this adaptability, the proposed approach is applicable to a wide range of industries.

5. Limitations and Future Work

The API utilized in this study, RapidAPI, has certain limitations. Some information may not be accessible through the API due to privacy restrictions, such as limited access to private posts or accounts. As a result, the collected data may be limited or incomplete.
In spite of these limitations, it may be argued that they contribute to the study’s authenticity. An attacker attempting to gather information about an organization from social networks would likely encounter similar challenges. Data collection may not be perfect, or private profile information may not be accessible. As a result, this study simulates the challenges an actual attacker may encounter, increasing our findings’ practical relevance.
As part of our future research, we plan to implement additional deep learning methods to identify censored objects in images. We also plan to apply these methods to additional organizations. Furthermore, since social media posts often include textual content in addition to images, we are planning to extend our method to automatically identify confidential information in text as well. This enhancement will enable the detection and protection of both visual and textual confidential information.

6. Conclusions

In this paper, we investigated the capabilities of several state-of-the-art deep learning methods to detect censored objects in images. The proposed method improves an organization’s ability to detect data leakage by automatically detecting confidential content in images on online social networks. It comprises four main steps: defining confidential information in images, selecting an organization’s representative hashtags, collecting data from social media images, and training deep learning classifiers.
A case study was conducted using Instagram images published by members of a large organization. The following deep learning classifiers were employed and compared: a CNN architecture for image classification, YOLOv5 for object detection, and the BLIP model for captioning images. Six types of objects not intended for public exposure were detected with a mean accuracy of 0.9454 and a mean macro F1-score of 0.658.
After identifying the best-performing models from our evaluations, we employed the best models to classify all 252,628 images in the organization’s Instagram dataset. The results revealed that a significant number of images contained censored objects, potentially posing a security risk to the organization and its members.
The results highlight the importance of implementing robust automated systems for detecting and preventing the exposure of confidential data, particularly in the context of image-based content shared on OSNs.

Author Contributions

Conceptualization, A.P.V., Y.A., R.F. and R.P.; methodology, A.P.V., Y.A., R.F. and R.P.; software, A.P.V., Y.A. and R.F.; validation, A.P.V., Y.A., R.F. and R.P.; formal analysis, A.P.V., Y.A. and R.F.; investigation, A.P.V., Y.A. and R.F.; data curation, A.P.V., Y.A. and R.F.; writing—original draft preparation, A.P.V.; writing—review and editing, A.P.V. and R.P.; supervision, A.P.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was approved by the Institutional Review Board of Max Stern Yezreel Valley College (YVC EMEK 2024-32).

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this article are not readily available due to information security concerns; the organization that participated in the study wishes to remain anonymous. Requests to access the datasets should be directed to the authors.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Full results of CNN models based on validation data. An asterisk (*) marks the best model for each object based on the weighted F-score.
Model | Accuracy | Weighted Recall | Weighted Precision | Weighted F-Score

Screen
ResNet50 * | 0.951 | 0.951 | 0.952 | 0.951
ResNet152 | 0.947 | 0.947 | 0.947 | 0.947
VGG16 | 0.926 | 0.926 | 0.927 | 0.926
VGG19 | 0.922 | 0.922 | 0.925 | 0.922
InceptionV3 | 0.672 | 0.672 | 0.682 | 0.669
EfficientNetV2B2 | 0.918 | 0.918 | 0.919 | 0.918
EfficientNetV2L | 0.943 | 0.943 | 0.945 | 0.943
EfficientNetV2S | 0.935 | 0.935 | 0.938 | 0.934
EfficientNetV2M | 0.939 | 0.939 | 0.940 | 0.939
EfficientNetB0 | 0.938 | 0.938 | 0.940 | 0.938

Member
ResNet50 | 0.842 | 0.842 | 0.845 | 0.838
ResNet152 | 0.859 | 0.859 | 0.860 | 0.857
VGG16 | 0.819 | 0.819 | 0.825 | 0.818
VGG19 | 0.827 | 0.827 | 0.828 | 0.826
InceptionV3 | 0.654 | 0.654 | 0.643 | 0.637
EfficientNetV2B2 | 0.852 | 0.852 | 0.854 | 0.853
EfficientNetV2L | 0.846 | 0.846 | 0.848 | 0.846
EfficientNetV2S | 0.851 | 0.851 | 0.853 | 0.851
EfficientNetV2M * | 0.864 | 0.864 | 0.868 | 0.861
EfficientNetB0 | 0.842 | 0.842 | 0.846 | 0.843

Pin
ResNet50 | 0.977 | 0.977 | 0.977 | 0.976
ResNet152 * | 0.978 | 0.978 | 0.978 | 0.978
VGG16 | 0.976 | 0.976 | 0.977 | 0.976
VGG19 | 0.974 | 0.974 | 0.974 | 0.974
InceptionV3 | 0.900 | 0.900 | 0.895 | 0.896
EfficientNetV2B2 | 0.966 | 0.966 | 0.966 | 0.966
EfficientNetV2L | 0.970 | 0.970 | 0.970 | 0.970
EfficientNetV2S | 0.977 | 0.977 | 0.977 | 0.977
EfficientNetV2M | 0.961 | 0.961 | 0.962 | 0.961
EfficientNetB0 | 0.973 | 0.973 | 0.975 | 0.973

Property
ResNet50 | 0.839 | 0.839 | 0.835 | 0.835
ResNet152 | 0.852 | 0.852 | 0.850 | 0.847
VGG16 | 0.841 | 0.841 | 0.839 | 0.832
VGG19 | 0.837 | 0.837 | 0.832 | 0.829
InceptionV3 | 0.673 | 0.673 | 0.648 | 0.657
EfficientNetV2B2 * | 0.877 | 0.877 | 0.881 | 0.878
EfficientNetV2L | 0.870 | 0.870 | 0.873 | 0.870
EfficientNetV2S | 0.867 | 0.867 | 0.868 | 0.867
EfficientNetV2M | 0.860 | 0.860 | 0.862 | 0.858
EfficientNetB0 | 0.863 | 0.863 | 0.869 | 0.864

Map
ResNet50 | 0.967 | 0.967 | 0.967 | 0.966
ResNet152 | 0.974 | 0.974 | 0.974 | 0.974
VGG16 | 0.971 | 0.971 | 0.971 | 0.971
VGG19 | 0.975 | 0.975 | 0.975 | 0.975
InceptionV3 | 0.903 | 0.903 | 0.903 | 0.898
EfficientNetV2B2 | 0.976 | 0.976 | 0.976 | 0.976
EfficientNetV2L | 0.973 | 0.973 | 0.973 | 0.973
EfficientNetV2S * | 0.981 | 0.981 | 0.981 | 0.981
EfficientNetV2M | 0.963 | 0.963 | 0.965 | 0.963
EfficientNetB0 | 0.971 | 0.971 | 0.971 | 0.971

Hat
ResNet50 | 0.964 | 0.964 | 0.964 | 0.960
ResNet152 * | 0.965 | 0.965 | 0.964 | 0.962
VGG16 | 0.958 | 0.958 | 0.955 | 0.955
VGG19 | 0.960 | 0.960 | 0.959 | 0.957
InceptionV3 | 0.897 | 0.897 | 0.888 | 0.883
EfficientNetV2B2 | 0.962 | 0.962 | 0.960 | 0.960
EfficientNetV2L | 0.953 | 0.953 | 0.953 | 0.952
EfficientNetV2S | 0.956 | 0.956 | 0.955 | 0.952
EfficientNetV2M | 0.957 | 0.957 | 0.954 | 0.953
EfficientNetB0 | 0.956 | 0.956 | 0.963 | 0.960

References

  1. Tsimonis, G.; Dimitriadis, S. Brand Strategies in Social Media. Mark. Intell. Planning 2014, 32, 328–344. [Google Scholar] [CrossRef]
  2. Li, E.Y. (Ed.) Organizations and Social Networking: Utilizing Social Media to Engage Consumers; IGI Global: Hershey, PA, USA, 2013. [Google Scholar]
  3. Fire, M.; Goldschmidt, R.; Elovici, Y. Online Social Networks: Threats and Solutions. IEEE Commun. Surv. Tutor. 2014, 16, 2019–2036. [Google Scholar] [CrossRef]
  4. Li, X.; Smith, J.D.; Dinh, T.N.; Thai, M.T. Privacy Issues in Light of Reconnaissance Attacks with Incomplete Information. In Proceedings of the 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), Omaha, NE, USA, 13–16 October 2016; IEEE: New York, NY, USA, 2016; pp. 311–318. [Google Scholar]
  5. Kaur, K.; Gupta, I.; Singh, A.K. A Comparative Study of the Approach Provided for Preventing the Data Leakage. Int. J. Netw. Secur. Its Appl. 2017, 9, 21–32. [Google Scholar] [CrossRef]
  6. Can, U.; Alatas, B. A New Direction in Social Network Analysis: Online Social Network Analysis Problems and Applications. Phys. A Stat. Mech. Its Appl. 2019, 535, 122372. [Google Scholar] [CrossRef]
  7. Paradise, A.; Puzis, R.; Shabtai, A. Anti-Reconnaissance Tools: Detecting Targeted Socialbots. IEEE Internet Comput. 2014, 18, 11–19. [Google Scholar] [CrossRef]
  8. Hou, T.; Wang, V. Industrial Espionage—A Systematic Literature Review (SLR). Comput. Secur. 2020, 98, 102019. [Google Scholar] [CrossRef]
  9. Elyashar, A.; Fire, M.; Kagan, D.; Elovici, Y. Homing Socialbots: Intrusion on a Specific Organization’s Employee Using Socialbots. In Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Niagara Falls, ON, Canada, 25–28 August 2013; pp. 1358–1365. [Google Scholar]
  10. Fire, M.; Puzis, R. Organization Mining Using Online Social Networks. Netw. Spat. Econ. 2015, 16, 545–578. [Google Scholar] [CrossRef]
  11. Boshmaf, Y.; Muslukhov, I.; Beznosov, K.; Ripeanu, M. Design and Analysis of a Social Botnet. Comput. Netw. 2013, 57, 556–578. [Google Scholar] [CrossRef]
  12. Singh, A.K.; Gupta, I.; Verma, R.; Gautam, V.; Yadav, C.P. A Survey on Data Leakage Detection and Prevention. In Proceedings of the International Conference on Innovative Computing & Communications (ICICC), New Delhi, India, 21–23 February 2020; pp. 1–7. [Google Scholar] [CrossRef]
  13. Kim, J.; Kim, H.J. A Study on Privacy Preserving Data Leakage Prevention System. In Recent Progress in Data Engineering and Internet Technology: Volume 2; Springer: Berlin/Heidelberg, 2012; pp. 191–196. [Google Scholar]
  14. Nayak, S.K.; Ojha, A.C. Data Leakage Detection and Prevention: Review and Research Directions. In Machine Learning and Information Processing: Proceedings of ICMLIP 2019; Springer: Berlin/Heidelberg, Germany, 2020; pp. 203–212. [Google Scholar]
  15. Morrow, B. BYOD Security Challenges: Control and Protect Your Most Sensitive Data. Netw. Secur. 2012, 2012, 5–8. [Google Scholar] [CrossRef]
  16. Herrera Montano, I.; García Aranda, J.J.; Ramos Diaz, J.; Molina Cardín, S.; De la Torre Díez, I.; Rodrigues, J.J. Survey of Techniques on Data Leakage Protection and Methods to Address the Insider Threat. Clust. Comput. 2022, 25, 4289–4302. [Google Scholar] [CrossRef]
  17. Theoharidou, M.; Kokolakis, S.; Karyda, M.; Kiountouzis, E. The Insider Threat to Information Systems and the Effectiveness of ISO17799. Comput. Secur. 2005, 24, 472–484. [Google Scholar] [CrossRef]
  18. Kutschera, S. Incidental Data: Observation of Privacy Compromising Data on Social Media Platforms. Int. Cybersecur. Law Rev. 2023, 4, 91–114. [Google Scholar] [CrossRef]
  19. Choi, M.J.; Torralba, A.; Willsky, A.S. Context Models and Out-of-Context Objects. Pattern Recognit. Lett. 2012, 33, 853–862. [Google Scholar] [CrossRef]
  20. Mottaghi, R.; Chen, X.; Liu, X.; Cho, N.G.; Lee, S.W.; Fidler, S.; Yuille, A. The Role of Context for Object Detection and Semantic Segmentation in the Wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 891–898. [Google Scholar]
  21. Acharya, M.; Roy, A.; Koneripalli, K.; Jha, S.; Kanan, C.; Divakaran, A. Detecting Out-of-Context Objects Using Graph Context Reasoning Network. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Messe Wien, Vienna, Austria, 23–29 July 2022. [Google Scholar]
  22. Alamri, F.; Pugeault, N. Improving Object Detection Performance Using Scene Contextual Constraints. IEEE Trans. Cogn. Dev. Syst. 2022, 14, 1320–1330. [Google Scholar] [CrossRef]
  23. Aneja, S.; Bregler, C.; Nießner, M. Cosmos: Catching Out-of-Context Image Misuse Using Self-Supervised Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 14084–14092. [Google Scholar]
  24. Zhang, Y.; Trinh, L.; Cao, D.; Cui, Z.; Liu, Y. Detecting out-of-context multimodal misinformation with interpretable neural-symbolic model. arXiv 2023, arXiv:2304.07633. [Google Scholar]
  25. Fan, D.-P.; Ji, G.-P.; Cheng, M.-M.; Shao, L. Concealed Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 6024–6042. [Google Scholar] [CrossRef]
26. Grossman, E.N.; Gordon, J.; Novotny, D.; Chamberlin, R. Terahertz Active and Passive Imaging. In Proceedings of the 8th European Conference on Antennas and Propagation, The Hague, The Netherlands, 6–11 April 2014; pp. 2221–2225. [Google Scholar] [CrossRef]
  27. Trofimov, V.A.; Trofimov, V.V.; Shestakov, I.L.; Blednov, R.G. Concealed Object Detection Using the Passive THz Image Without Its Viewing. In Passive and Active Millimeter-Wave Imaging XIX; SPIE: Bellingham, WA, USA, 2016; Volume 9830, pp. 88–98. [Google Scholar]
  28. Kowalski, M. Hidden Object Detection and Recognition in Passive Terahertz and Mid-Wavelength Infrared. J. Infrared Millim. Terahertz Waves 2019, 40, 1074–1091. [Google Scholar] [CrossRef]
  29. Xu, F.; Huang, X.; Wu, Q.; Zhang, X.; Shang, Z.; Zhang, Y. YOLO-MSFG: Toward Real-Time Detection of Concealed Objects in Passive Terahertz Images. IEEE Sens. J. 2021, 22, 520–534. [Google Scholar] [CrossRef]
  30. Liu, Y.; Xu, F.; Pu, Z.; Huang, X.; Chen, J.; Shao, S. AC-SDBSCAN: Toward Concealed Object Detection of Passive Terahertz Images. IET Image Process. 2021, 16, 839–851. [Google Scholar] [CrossRef]
  31. Cheng, L.; Ji, Y.; Li, C.; Liu, X.; Fang, G. Improved SSD Network for Fast Concealed Object Detection and Recognition in Passive Terahertz Security Images. Sci. Rep. 2022, 12, 12082. [Google Scholar] [CrossRef]
  32. Pang, L.; Liu, H.; Chen, Y.; Miao, J. Real-Time Concealed Object Detection from Passive Millimeter Wave Images Based on the YOLOv3 Algorithm. Sensors 2020, 20, 1678. [Google Scholar] [CrossRef] [PubMed]
  33. Yang, H.; Yang, Z.; Hu, A.; Liu, C.; Cui, T.J.; Miao, J. Unifying Convolution and Transformer for Efficient Concealed Object Detection in Passive Millimeter-Wave Images. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 3872–3887. [Google Scholar] [CrossRef]
  34. Du, K.; Zhang, L.; Chen, W.; Wan, G.; Fu, R. Concealed Objects Detection Based on FWT in Active Millimeter-Wave Images. In Seventh International Conference on Electronics and Information Engineering; SPIE: Bellingham, WA, USA, 2017; Volume 10322, pp. 386–391. [Google Scholar]
  35. Liu, J.; Zhang, K.; Sun, Z.; Wu, Q.; He, W.; Wang, H. Concealed Object Detection and Recognition System Based on Millimeter Wave FMCW Radar. Appl. Sci. 2021, 11, 8926. [Google Scholar] [CrossRef]
  36. Wang, C.; Shi, J.; Zhou, Z.; Li, L.; Zhou, Y.; Yang, X. Concealed Object Detection for Millimeter-Wave Images with Normalized Accumulation Map. IEEE Sens. J. 2021, 21, 6468–6475. [Google Scholar] [CrossRef]
  37. Ahmed, A.; Kumari, V.; Sheoran, G. Concealed Object Detection Using Microwave Transmission Holography. In Proceedings of the 2022 2nd International Conference on Intelligent Technologies (CONIT), Hubli, India, 24–26 June 2022; IEEE: New York, NY, USA; pp. 1–4. [Google Scholar]
  38. Wang, X.; Gou, S.; Li, J.; Zhao, Y.; Liu, Z.; Jiao, C.; Mao, S. Self-Paced Feature Attention Fusion Network for Concealed Object Detection in Millimeter-Wave Image. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 224–239. [Google Scholar] [CrossRef]
  39. Sun, P.; Liu, T.; Chen, X.; Zhang, S.; Zhao, Y.; Wei, S. Multi-Source Aggregation Transformer for Concealed Object Detection in Millimeter-Wave Images. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6148–6159. [Google Scholar] [CrossRef]
  40. Asok, A.O.; Nath, G.; Dey, S. Concealed Object Detection with Microwave Imaging Using Vivaldi Antennas Utilizing Novel Time-Domain Beamforming Algorithm. IEEE Access 2022, 10, 116987–117000. [Google Scholar] [CrossRef]
  41. Asok, A.O.; Nath, G.; Dey, S. Microwave Imaging with Novel Time-Domain Clutter Removal Algorithm Using High Gain Antennas for Concealed Object Detections. IEEE Trans. Comput. Imaging 2023, 9, 147–158. [Google Scholar] [CrossRef]
  42. Liu, G.; Sun, X.; Li, Y.; Li, H.; Zhao, S.; Guo, Z. An Automatic Privacy-Aware Framework for Text Data in Online Social Network Based on a Multi-Deep Learning Model. Int. J. Intell. Syst. 2023, 2023, 1727285. [Google Scholar] [CrossRef]
  43. Ahmed, H.; Traore, I.; Saad, S.; Mamun, M. Automated Detection of Unstructured Context-Dependent Sensitive Information Using Deep Learning. Internet Things 2021, 16, 100444. [Google Scholar] [CrossRef]
  44. Melad, N. Detecting and Blurring Potentially Sensitive Personal Information Containers in Images Using Faster R-CNN Object Detection Model with TensorFlow and OpenCV. Master’s Thesis, Asia Pacific College, Kalakhang Maynila, Philippines, 2019. [Google Scholar]
  45. Soni, S.; Hiran, K.K. Personally Identifiable Information (PII) Detection and Obfuscation Using YOLOv3 Object Detector. In Proceedings of the International Conference on Emerging Technologies in Computer Engineering, Jaipur, India, 4–5 February 2022; Springer: Cham, Switzerland, 2022; pp. 274–282. [Google Scholar]
  46. Yu, J.; Zhang, B.; Kuang, Z.; Lin, D.; Fan, J. iPrivacy: Image Privacy Protection by Identifying Sensitive Objects via Deep Multi-Task Learning. IEEE Trans. Inf. Forensics Secur. 2016, 12, 1005–1016. [Google Scholar] [CrossRef]
  47. Harvey, R.; Lebret, R.; Massonnet, S.; Aberer, K.; Demartini, G. Firearms on Twitter: A Novel Object Detection Pipeline. In Proceedings of the International AAAI Conference on Web and Social Media, Limassol, Cyprus, 5–8 June 2023; Volume 17, pp. 1128–1132. [Google Scholar]
  48. Mao, Y.; Song, B.; Zhang, Z.; Yang, W.; Lan, Y. Multi-Classification Sensitive Image Detection Method Based on Lightweight Convolutional Neural Network. KSII Trans. Internet Inf. Syst. 2023, 17, 5. [Google Scholar]
  49. Ha, E.; Kim, H.; Na, D. HOD: New Harmful Object Detection Benchmarks for Robust Surveillance. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024; pp. 183–192. [Google Scholar]
  50. Li, H.; Peng, Z.; Feng, X.; Ma, H. Leakage Prevention Method for Unstructured Data Based on Classification. In Proceedings of the Applications and Techniques in Information Security: 6th International Conference, ATIS 2015, Beijing, China, 4–6 November 2015; Proceedings 6. Springer: Berlin/Heidelberg, Germany, 2015; pp. 337–343. [Google Scholar]
  51. Chae, C.-J.; Shin, Y.; Choi, K.; Kim, K.-B.; Choi, K.-N. A Privacy Data Leakage Prevention Method in P2P Networks. Peer Peer Netw. Appl. 2015, 9, 508–519. [Google Scholar] [CrossRef]
  52. Elngar, A.A.; Arafa, M.; Fathy, A.; Moustafa, B.; Mahmoud, O.; Shaban, M.; Fawzy, N. Image Classification Based on CNN: A Survey. J. Cybersecur. Inf. Manag. 2021, 6, 18–50. [Google Scholar] [CrossRef]
  53. Shin, H.-C.; Roth, H.R.; Gao, M.; Lu, L.; Xu, Z.; Nogues, I.; Yao, J.; Mollura, D.; Summers, R.M. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning. IEEE Trans. Med. Imaging 2016, 35, 1285–1298. [Google Scholar] [CrossRef] [PubMed]
  54. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  55. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  56. Tan, M.; Le, Q. EfficientNetV2: Smaller Models and Faster Training. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; PMLR: New York, NY, USA; pp. 10096–10106. [Google Scholar]
  57. Targ, S.; Almeida, D.; Lyman, K. ResNet in ResNet: Generalizing Residual Architectures. arXiv 2016, arXiv:1603.08029. [Google Scholar]
  58. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  59. Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object Detection with Deep Learning: A Review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232. [Google Scholar] [CrossRef]
  60. Wu, X.; Sahoo, D.; Hoi, S.C.H. Recent Advances in Deep Learning for Object Detection. Neurocomputing 2020, 396, 39–64. [Google Scholar] [CrossRef]
  61. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  62. He, X.; Deng, L. Deep Learning for Image-To-Text Generation: A Technical Overview. IEEE Signal Process. Mag. 2017, 34, 109–116. [Google Scholar] [CrossRef]
  63. Li, J.; Li, D.; Xiong, C.; Hoi, S. BLIP: Bootstrapping Language-Image Pre-Training for Unified Vision-Language Understanding and Generation. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 June 2022; PMLR: New York, NY, USA; pp. 12888–12900. [Google Scholar]
  64. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; PMLR: New York, NY, USA; pp. 6105–6114. [Google Scholar]
  65. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  66. Liu, Y.; Sun, P.; Wergeles, N.; Shang, Y. A Survey and Performance Evaluation of Deep Learning Methods for Small Object Detection. Expert Syst. Appl. 2021, 172, 114602. [Google Scholar] [CrossRef]
  67. Tong, K.; Wu, Y.; Zhou, F. Recent Advances in Small Object Detection Based on Deep Learning: A Review. Image Vis. Comput. 2020, 97, 103910. [Google Scholar] [CrossRef]
  68. Tian, J.; Jin, Q.; Wang, Y.; Yang, J.; Zhang, S.; Sun, D. Performance Analysis of Deep Learning-Based Object Detection Algorithms on COCO Benchmark: A Comparative Study. J. Eng. Appl. Sci. 2024, 71, 76. [Google Scholar] [CrossRef]
  69. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  70. Shorten, C.; Khoshgoftaar, T.M. A Survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 1–48. [Google Scholar] [CrossRef]
  71. Figueira, A.; Vaz, B. Survey on Synthetic Data Generation, Evaluation Methods and GANs. Mathematics 2022, 10, 2733. [Google Scholar] [CrossRef]
Figure 1. An overview of the method's steps. The four main steps are numbered 1–4 in the figure.
Figure 2. (a) Image with a screen: a laptop's digital screen is shown in the center of the image together with an organization member; (b) image with a screen and map: similar to (a), but with a map additionally displayed on the screen.
Figure 3. The number of images collected each year using the API. The number of images collected annually has increased significantly since 2017, and most of the posts were collected between 2019 and 2022.
Figure 4. Data used for training the CNN architecture for each object type (member, screen, property, map, hat, and pin). The pink color represents the number of images that are not defined as containing the specific object, while the blue color represents the number of images that were defined as containing it.
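As a rough illustration of the per-object binary labeling summarized in Figure 4, the following Python sketch splits a multi-label annotation file into "contains the object" and "does not contain the object" subsets. The file name and column layout are hypothetical and not taken from the study.

# Hypothetical annotations.csv with columns: filename, member, screen, property, map, hat, pin (values 0/1).
import pandas as pd

annotations = pd.read_csv("annotations.csv")
objects = ["member", "screen", "property", "map", "hat", "pin"]

for obj in objects:
    with_object = annotations[annotations[obj] == 1]["filename"]
    without_object = annotations[annotations[obj] == 0]["filename"]
    print(f"{obj}: {len(with_object)} images with the object, {len(without_object)} without")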
Figure 5. The number of images and tags used in training the YOLOv5 models for each object type (member, screen, property, map, hat, and pin). The blue color represents the number of photos used to train the object-detection model, and the pink color indicates the total number of tags associated with the object in those photos. Note that a single image can contain multiple tags for the same object, which is why the tag counts (pink) can exceed the image counts (blue).
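To make the image-count versus tag-count distinction in Figure 5 concrete, here is a minimal Python sketch that counts both quantities from YOLO-format label files (one .txt file per image, one line per bounding box). The labels/ directory and the class-index mapping are assumptions for illustration only.

# Count, per object type, the number of images containing it and the total number of bounding-box tags.
from pathlib import Path
from collections import Counter

CLASS_NAMES = {0: "member", 1: "screen", 2: "property", 3: "map", 4: "hat", 5: "pin"}  # assumed mapping
image_counts = Counter()  # images containing at least one tag of the class
tag_counts = Counter()    # total bounding-box tags of the class

for label_file in Path("labels").glob("*.txt"):
    classes = [int(line.split()[0]) for line in label_file.read_text().splitlines() if line.strip()]
    for cls in set(classes):
        image_counts[CLASS_NAMES[cls]] += 1
    for cls in classes:
        tag_counts[CLASS_NAMES[cls]] += 1

for name in CLASS_NAMES.values():
    print(f"{name}: {image_counts[name]} images, {tag_counts[name]} tags")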
Figure 6. Data used for testing each object type (member, screen, property, map, hat, and pin). The blue color represents the number of images that are not defined as containing the specific object, while the pink color represents the number of images that were defined as containing it.
Figure 7. A methodology workflow for training, evaluation, and testing of the CNN architectures, YOLO, and BLIP models.
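The training step of the workflow in Figure 7 can be sketched for one of the per-object CNN classifiers as follows. This is a minimal PyTorch/torchvision example that assumes an ImageFolder-style directory with "present" and "absent" subfolders; the directory name and hyperparameters are illustrative rather than the configuration used in the study.

# Minimal sketch: fine-tune a pretrained ResNet152 as a binary "object present / absent" classifier.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# Hypothetical layout: train_hat/present and train_hat/absent subdirectories.
train_set = datasets.ImageFolder("train_hat", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet152(weights=models.ResNet152_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # two classes: object present / absent
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):  # illustrative number of epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()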
Figure 8. An evaluation of the accuracy of the CNN, YOLOv5, and BLIP models based on the validation data.
Figure 9. An evaluation of the F-scores of the CNN, YOLOv5, and BLIP models based on the validation data.
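The accuracy and F-score values compared in Figures 8 and 9 can be computed for any per-object classifier with scikit-learn; a small sketch follows, in which the label vectors are placeholders rather than data from the study.

# Compute accuracy, binary F-score, and weighted F-score for one object type.
from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 0, 0, 1, 0, 0, 0, 1]  # placeholder ground truth (1 = object present)
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]  # placeholder model predictions

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F-score (positive class):", f1_score(y_true, y_pred))
print("Weighted F-score:", f1_score(y_true, y_pred, average="weighted"))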
Figure 10. Number of appearances of each object in the 252,628 images in the organization’s Instagram dataset.
Figure 11. Results of the best classifiers for the two images that were presented in Figure 2. The ☑ symbol indicates that the classifier determined the object is present in the image, while the ⮽ symbol indicates that the classifier determined the object is not present in the image.
Table 1. A comparative analysis of related works. The works discussed in the related work section are compared with our method, which appears in the final row of the table. The “☑” symbol indicates that the metric in the column was included in the study, while the “⮽” symbol indicates that it was not.
Comparison criteria (table columns): Image-Based Analysis; Social Media Data; Deep Learning Techniques; Image Captioning Using Deep Learning; Object Detection Using Deep Learning; Sensitive/Censored Object Detection; Out-of-Context Object Detection; Concealed Object Detection; Harmful Object Detection; Sensitive Text Detection.
Compared works (table rows): [21,22]; [23]; [24]; [27]; [28,29,31,32,33,35,36,38,39]; [30,34,37,40,41]; [42]; [43]; [44,45]; [46]; [48]; [47,49]; Our method.
Table 2. Results of best CNN models based on testing data for each object type.
Object | CNN Model
Hat | ResNet152
Pin | ResNet152
Map | EfficientNetV2S
Property | EfficientNetV2B2
Screen | ResNet50
Table 3. Results of best CNN models based on testing data in terms of accuracy, F-score, and weighted F-score for each object type (member, screen, property, map, hat, and pin).
Metric | Hat | Pin | Map | Property | Screen | Member
Accuracy | 0.992 | 0.993 | 0.99 | 0.902 | 0.941 | 0.854
F-score | 0.211 | 0 | 0.087 | 0.813 | 0.213 | 0.804
Weighted F-score | 0.603 | 0.498 | 0.541 | 0.873 | 0.591 | 0.844
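As a quick arithmetic check, the per-object values in Table 3 can be averaged to obtain overall accuracy and weighted F-score levels; a short Python snippet using the table's values:

# Average the per-object accuracy and weighted F-score values listed in Table 3.
accuracy = {"hat": 0.992, "pin": 0.993, "map": 0.99, "property": 0.902, "screen": 0.941, "member": 0.854}
weighted_f1 = {"hat": 0.603, "pin": 0.498, "map": 0.541, "property": 0.873, "screen": 0.591, "member": 0.844}

print("Average accuracy:", round(sum(accuracy.values()) / len(accuracy), 4))                # ~0.9453
print("Average weighted F-score:", round(sum(weighted_f1.values()) / len(weighted_f1), 4))  # ~0.6583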
