Exploring Boost Efficiency in Text Analysis by Using AI Techniques in Port Companies
Abstract
1. Introduction
2. Background
2.1. Port System Overview
2.2. Literature Review
3. Methods
3.1. Machine Learning Techniques
3.1.1. Supervised Learning
- (i) Recurrent Neural Networks (RNNs). RNNs are a class of supervised learning algorithms designed to process sequential data by capturing temporal dependencies. They achieve this by maintaining information from previous states in hidden layers, which is essential for understanding dynamic patterns over time [49] (Equation (1)):

  $$h_t = \sigma\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right) \tag{1}$$

  Here, $h_t$ represents the hidden state at time $t$, capturing information from the current input $x_t$ and the previous hidden state $h_{t-1}$. This mechanism allows the RNN to integrate prior data, with weight matrices $W_{xh}$ and $W_{hh}$ influencing the input and the transition from one state to the next, respectively, while $b_h$ serves as the bias and $\sigma$ is a nonlinear activation function. The output at time step $t$ is given by

  $$y_t = g\left(W_{hy} h_t + b_y\right)$$

  In this equation, $y_t$ is the output, $W_{hy}$ is the weight matrix linking the hidden state to the output, $b_y$ is the bias and $g$ represents a task-specific activation function, such as softmax for classification or identity for regression. The process demonstrates how an RNN uses the present inputs, $x_t$, and previous states, $h_{t-1}$, to generate a new hidden state, $h_t$, which serves as a “memory” and influences the subsequent output.

  In the present study, the RNN was selected over alternative sequential models, such as LSTM, because it captures short-term dependencies effectively, which is the critical requirement for this dataset, whose sequences are short. Although LSTMs are better at learning long-term dependencies and at avoiding vanishing gradients, RNNs are preferred here for their simplicity and lower computational cost, which enables faster processing and reduced resource utilization.
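- (ii) Multi-Layer Perceptron (MLP). The Multi-Layer Perceptron (MLP) is a neural network architecture designed for both classification and regression tasks. It consists of multiple layers of interconnected neurons that systematically process input data. The transformation of an input vector through each hidden layer is defined as follows [50]:

  $$h^{(l)} = f\left(W^{(l)} h^{(l-1)} + b^{(l)}\right)$$

  Here, $h^{(0)} = x$ is the input, $h^{(l)}$ is the output of layer $l$ (where $l = 1, \dots, L$), and $W^{(l)}$ and $b^{(l)}$ are the weight matrix and the bias vector, respectively, while $f$ is a nonlinear activation function. The output layer generates the final output as follows:

  $$\hat{y} = g\left(W^{(L+1)} h^{(L)} + b^{(L+1)}\right)$$

  where $g$ is the output activation function (for example, softmax for classification or identity for regression).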
- (iii) Support Vector Machine (SVM). The Support Vector Machine (SVM) is a supervised learning algorithm that constructs a hyperplane to maximize the separation between two classes of data. The model is trained on a dataset represented as $\{(x_i, y_i)\}_{i=1}^{n}$, with $y_i \in \{-1, +1\}$, where each $x_i$ is a feature vector and $y_i$ indicates the class label. The hyperplane, with normal vector $w$ and offset $b$, is defined by the following constraint [51]:

  $$y_i\left(w \cdot x_i + b\right) \geq 1, \quad i = 1, \dots, n$$

  This condition ensures that all data points are correctly classified while maintaining a margin between the hyperplane and the nearest data points. The classification of new instances is determined by

  $$f(x) = \operatorname{sign}\left(w \cdot x + b\right)$$
- (iv) Decision Tree for ML (DT). Decision trees (DTs) use a hierarchical structure to make decisions by recursively splitting data into branches based on selected features. This model is widely used for both classification and regression tasks due to its interpretability. The Gini index is often used to measure impurity within a node [52]:

  $$G = 1 - \sum_{k=1}^{K} p_k^2$$

  where $p_k$ is the proportion of samples in the node that belong to class $k$ and $K$ is the number of classes.
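The recurrence in item (i) and the Gini index in item (iv) can be made concrete with a small sketch. The code below is an illustration only, not the implementation used in the study: it assumes a tanh hidden activation, a zero initial hidden state and an identity output activation (the regression case), and the names `rnn_forward` and `gini_impurity` are hypothetical.

```python
import math


def _matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, W_xh, W_hh, b_h, W_hy, b_y):
    """Run a single-layer RNN over a sequence of input vectors.

    Assumptions (not from the paper): tanh hidden activation, zero initial
    hidden state, identity output activation.
    Returns the list of hidden states and the list of outputs.
    """
    h = [0.0] * len(b_h)  # assumed initial hidden state
    hidden_states, outputs = [], []
    for x_t in inputs:
        from_input = _matvec(W_xh, x_t)   # contribution of the current input
        from_memory = _matvec(W_hh, h)    # contribution of the previous state
        h = [math.tanh(a + b + c)
             for a, b, c in zip(from_input, from_memory, b_h)]
        y = [a + b for a, b in zip(_matvec(W_hy, h), b_y)]
        hidden_states.append(h)
        outputs.append(y)
    return hidden_states, outputs


def gini_impurity(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# A node split evenly between two classes is maximally impure for K = 2.
print(gini_impurity(["a", "a", "b", "b"]))  # 0.5
```

A pure node (all labels identical) yields an impurity of 0, which is why a split that produces purer child nodes is preferred when the tree is grown.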
3.1.2. Unsupervised Learning
- (i) Random Forest Clustering (RFC). RFC adapts traditional random forest methods, typically used for classification and regression, to clustering tasks. This method constructs a forest of decision trees, each trained on a different random subset of the dataset, using an isolation forest approach to handle unlabeled data effectively [53]. Unlike supervised learning, RFC does not rely on predefined categories for data segmentation.

  In RFC, the similarity between instances is quantified by the frequency with which two points land on the same leaf, which is captured in a similarity matrix $S$. The matrix element $S_{ij}$ represents the proportion of trees in which points $i$ and $j$ share the same leaf, with values ranging from 0 to 1; higher values indicate greater similarity. This supports the use of traditional clustering algorithms, such as k-means or hierarchical clustering, because the similarity matrix can be converted into distances or used directly as an affinity matrix in spectral clustering approaches. RFC’s ability to use random forest structures to identify intrinsic data similarities makes it particularly useful for addressing complex clustering challenges in studies involving unlabeled data.
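The leaf co-occurrence similarity described above can be sketched in a few lines. This example assumes the leaf assignments have already been obtained from a fitted forest and are supplied as a plain list of lists; the function name `leaf_similarity`, the input layout and the toy leaf indices are hypothetical, and the conversion to distances as one minus the similarity is one common choice rather than something stated in the source.

```python
def leaf_similarity(leaf_ids):
    """Build the similarity matrix S from per-tree leaf assignments.

    leaf_ids[t][i] is the index of the leaf that tree t assigns to sample i.
    S[i][j] is the proportion of trees in which samples i and j share a leaf.
    """
    n_trees = len(leaf_ids)
    n_samples = len(leaf_ids[0])
    shared = [[0] * n_samples for _ in range(n_samples)]
    for tree in leaf_ids:
        for i in range(n_samples):
            for j in range(n_samples):
                if tree[i] == tree[j]:
                    shared[i][j] += 1
    return [[count / n_trees for count in row] for row in shared]


# Toy example: 4 trees, 3 samples (leaf indices are made up).
leaves = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 3],
    [4, 4, 4],
]
S = leaf_similarity(leaves)
print(S[0][1])  # 0.75: samples 0 and 1 share a leaf in 3 of 4 trees

# One possible distance matrix for hierarchical clustering (assumed form).
D = [[1.0 - s for s in row] for row in S]
```

The matrix is symmetric with ones on the diagonal, so it can be handed to any routine that accepts a precomputed affinity or distance matrix.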
3.2. Metrics
3.3. Hybrid Method
- How can a hybrid ML and NLP model be constructed to classify companies based on strategic port information?
- How can strategic texts be analyzed to identify sustainable and technological aspects?
- What are the potential challenges and opportunities associated with using ML and NLP to classify companies based on port information?
3.3.1. Phase 1: NLP-Driven Strategic Analysis in Port Systems
- System overview and data collection:
- Port system description: The structure of the port system is reviewed to identify key players and their strategic roles, thereby enhancing the understanding of systemic operations. This phase also includes the selection of critical strategic texts that describe the decision-making processes within port companies.
- Strategic data collection: Key information is systematically collected from multiple websites using both manual and automated web scraping techniques to ensure comprehensive data collection. The data collected include company names, roles within the port, geographic locations and strategic variables such as mission, vision, values, goals and other corporate text.
- Data processing and text analysis:
- Automated text extraction: Text is automatically extracted from digital documents and websites, streamlining the data collection process.
- Text preprocessing techniques:
  - Normalization and tokenization: These processes transform raw text into a structured format suitable for further analysis.
  - Stop word removal and lemmatization: These steps refine the dataset by removing extraneous words and applying lemmatization to improve the quality of text analysis.
- NLP and text characterization:
  - Text mining: Statistical and machine learning algorithms are used to identify patterns and extract valuable insights from large text datasets.
  - Web scraping: This technique systematically extracts data from websites, providing a robust database for subsequent text mining.
  - NLP techniques: Advanced NLP techniques are used to deeply characterize textual information, enabling entity identification, text classification, and sentiment analysis to reveal strategic patterns and trends.
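The preprocessing steps listed above (normalization, tokenization, stop word removal and lemmatization) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the stop word list is a toy list, `crude_lemma` is a deliberately simple plural-stripping stand-in for a real lemmatizer, and the sample sentence is invented rather than taken from the collected port texts.

```python
import re
import unicodedata

# Toy stop word list (assumption); a real pipeline would use a fuller list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are",
              "for", "our", "we", "with"}


def normalize(text):
    """Lowercase the text and strip accents so tokens compare consistently."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.lower()


def tokenize(text):
    """Split normalized text into alphabetic tokens."""
    return re.findall(r"[a-z]+", text)


def crude_lemma(token):
    """Stand-in for lemmatization: only handles simple English plurals."""
    if len(token) > 3 and token.endswith("ies"):
        return token[:-3] + "y"
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(text):
    """Normalize, tokenize, drop stop words, then reduce tokens to base forms."""
    tokens = tokenize(normalize(text))
    return [crude_lemma(t) for t in tokens if t not in STOP_WORDS]


sample = "Our mission is to provide sustainable port services to the companies of Valparaíso."
print(preprocess(sample))
# ['mission', 'provide', 'sustainable', 'port', 'service', 'company', 'valparaiso']
```

The resulting token lists are the kind of structured input that later vectorization steps would consume.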
3.3.2. Phase 2: Advanced ML and NLP Methods for Strategic Text Classification
- Option 1: BERT-based text analysis and classification
- Preprocessing and vectorization: Texts are transformed into numerical vectors using BERT to achieve a deep contextual representation.
- Sentiment analysis: This process evaluates and classifies emotions and opinions in the text, focusing on content relevant to port systems.
- Dimensionality reduction and clustering: This step is critical for simplifying models and identifying significant data groups or patterns without compromising essential information.
- Model training and evaluation: The parameters of the BERT-based models are fine-tuned and their performance is evaluated.
- Network diagram integration: Classification models are developed to incorporate network diagrams that help illustrate data connections and dependencies.
- Option 2: Word2Vec-based text analysis and classification
- Construction of representative dictionaries: Dictionaries are created to identify key terms in the areas of innovation, sustainability and technology.
- Compilation of strategic texts: Texts representative of each category are organized for further training and analysis.
- Vectorization: Word2Vec is used to convert text to vector form, capturing its semantic essence.
- Training and evaluation: Machine learning classification models are trained and evaluated using the vector representations as input.
- Integration of additional vectors: Additional vectors are integrated into the models to extend the analysis and improve classification accuracy.
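The dictionary step of Option 2 can be illustrated with a simple term-matching sketch. This is not Word2Vec and not the authors' dictionaries: the terms in `CATEGORY_TERMS` are invented placeholders, and the function names are hypothetical. The sketch only shows how per-category dictionary scores could be computed from a tokenized text and used as additional features alongside embedding vectors.

```python
# Hypothetical dictionaries; the study's actual term lists are not given.
CATEGORY_TERMS = {
    "innovation": {"innovation", "innovative", "research", "development"},
    "sustainability": {"sustainable", "sustainability", "environment", "emissions"},
    "technology": {"technology", "digital", "automation", "data"},
}


def category_scores(tokens):
    """Return, per category, the share of tokens found in its dictionary."""
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return {
        category: sum(token in terms for token in tokens) / total
        for category, terms in CATEGORY_TERMS.items()
    }


def dominant_category(tokens):
    """Return the highest-scoring category, or None if no term matches."""
    scores = category_scores(tokens)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


tokens = "we invest in digital automation and sustainable growth".split()
print(category_scores(tokens))
# {'innovation': 0.0, 'sustainability': 0.125, 'technology': 0.25}
print(dominant_category(tokens))  # technology
```

In a fuller pipeline, the three scores could be appended to each document's vector representation before a classifier is trained.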
3.3.3. Phase 3: Advanced Predictive Analytics and Model Evaluation
- Neural network (NN) and hybrid model development
- NN training: Neural network models are trained to analyze operations within the port environment, enhancing the predictive capabilities of the system.
- Hybrid model implementation: These models integrate neural networks with vectors generated by natural language processing (NLP), creating robust hybrid models that leverage both textual and numerical data.
- NLP processing and vectorization:
- Vectorization for prediction: Textual data are transformed into vector formats using NLP techniques, enabling these data to be used as input for predictive modeling.
- Application of hybrid models: The hybrid models are applied to both classification and prediction tasks in the port context to improve accuracy and reliability.
- Cluster classification of port companies: Port companies are classified into clusters based on operational and strategic characteristics to identify patterns and improve decision-making.
- Hybrid model: NN and ML
- ML algorithms: Machine learning models are implemented and trained, including decision trees, recurrent neural networks, multi-layer perceptrons, random forests and support vector machines.
- Performance evaluation: Models are evaluated using accuracy, precision, recall and F1 score.
- Cluster classification of port companies:
- Cluster assignment: Port companies are categorized into clusters based on their operational and strategic characteristics. This classification helps identify patterns that can significantly streamline decision-making processes.
- Analysis and real-time data integration:
- Results analysis: The effectiveness of the models is analyzed to determine their performance and identify areas for improvement.
- Predictive analytics: Real-time data integration is explored in combination with predictive analytics. This approach is designed to predict trends and behaviors in port operations, enabling proactive management and optimization.
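The evaluation metrics named in Phase 3 (accuracy, precision, recall and F1 score) follow directly from the counts of true and false positives and negatives. The sketch below computes them for a binary case; the function name `classification_metrics` and the toy labels are illustrative assumptions, not results from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (invented): 3 true positives, 2 false positives,
# 1 false negative, 2 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics["accuracy"], metrics["precision"], metrics["recall"])
# 0.625 0.6 0.75
```

For multi-class cluster labels, the same function can be applied once per class and the results averaged, which is one common convention among several.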
4. Implementation and Evaluation of the Hybrid Method
4.1. Phase 1: NLP-Driven Strategic Analysis in Port Systems
4.2. Phase 2: Advanced ML and NLP Methods for Strategic Text Classification
4.2.1. K-Means Algorithm
4.2.2. Integration of TF-IDF and Clustering Models
4.2.3. Pattern and Strategic Alliance Visualization
4.3. Phase 3: Advanced Predictive Analytics and Model Evaluation
4.3.1. Evaluation of Classification Models Using Error Metrics
4.3.2. Performance Analysis of Classification Models
5. Discussion
5.1. Challenges of the Hybrid Model
5.2. Strategic and Technological Analysis in Ports
5.3. Related Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
NLP | Natural Language Processing |
ML | Machine Learning |
LDA | Latent Dirichlet Allocation |
WoS | Web of Science |
RNN | Recurrent Neural Network |
MLP | Multi-Layer Perceptron |
RF | Random Forest |
SVM | Support Vector Machine |
DT | Decision Tree |
MAE | Mean Absolute Error |
MAPE | Mean Absolute Percentage Error |
MDAE | Median Absolute Error |
MSE | Mean Square Error |
RMSE | Root Mean Square Error |
References
- Park, J.S.; Seo, Y.J. The impact of seaports on the regional economies in South Korea: Panel evidence from the augmented Solow model. Transp. Res. Part E Logist. Transp. Rev. 2016, 85, 107–119.
- Hossain, T.; Adams, M.; Walker, T. Role of sustainability in global seaports. Ocean Coast. Manag. 2021, 202, 105435.
- Adom, A.Y.; Nyarko, I.K.; Som, G.N.K. Competitor Analysis in Strategic Management: Is it a Worthwhile Managerial Practice in Contemporary Times? J. Resour. Dev. Manag. 2016, 24, 116–127.
- Durán, C.; Córdova, F. Conceptual model to identify technological synergic relationships of strategic level in a medium-sized Chilean port. Procedia Comput. Sci. 2016, 91, 382–391.
- Menon, A.; Choi, J.; Tabakovic, H. What You Say Your Strategy Is and Why It Matters: Natural Language Processing of Unstructured Text. Acad. Manag. Proc. 2018, 2018, 18319.
- Sharoff, S. What neural networks know about linguistic complexity. Russ. J. Linguist. 2022, 26, 371–390.
- Ranjan Jayanthi, F.C. Big Data Analytics in Building the Competitive Intelligence of Organizations. Int. J. Inf. Manag. 2021, 56, 102231.
- Zhecheva, D.; Nenkov, N. Business demands for processing unstructured textual data – text mining techniques for companies to implement. Access J. Access Sci. Busin. Innov. Digit. Econ. 2022, 3, 107–120.
- Mouratidis, I.; Kamariotou, M.I.; Kitsios, F.C. Big Data Strategy and Business Analytics: A Literature Review. In Operational Research in the Era of Digital Transformation and Business Analytics; Matsatsinis, N.F., Kitsios, F.C., Madas, M.A., Kamariotou, M.I., Eds.; Springer: Cham, Switzerland, 2023; pp. 171–178.
- Evangelopoulos, N.; Zhang, X.; Prybutok, V. Latent Semantic Analysis: Five Methodological Recommendations. Eur. J. Inf. Syst. 2012, 21, 70–86.
- Sobrie, O.; Mousseau, V.; Pirlot, M.; Fortemps, P.; Mahmoudi, S.; De Smet, Y.; Labreuche, C.; Gillis, N.; Ouerdane, W.; de Mons, U.; et al. Learning Preferences with Multiple-Criteria Models; Université Paris Saclay (COmUE); Université de Mons, 2016. Available online: https://theses.hal.science/tel-01370555 (accessed on 9 April 2025).
- Alekberli, R.Z.; Haussmann, R.E. Integrating Big Data Governance and Corporate Strategies in Small and Medium Caspian Basin Seaports. In Proceedings of the 2024 IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT), Dubai, United Arab Emirates, 19–21 November 2024; pp. 1–6.
- Nikolakopoulos, A.; Julian Segui, M.; Pellicer, A.B.; Kefalogiannis, M.; Gizelis, C.A.; Marinakis, A.; Nestorakis, K.; Varvarigou, T. BigDaM: Efficient Big Data Management and Interoperability Middleware for Seaports as Critical Infrastructures. Computers 2023, 12, 218.
- Durlik, I.; Miller, T.; Cembrowska-Lech, D.; Krzemińska, A.; Złoczowska, E.; Nowak, A. Navigating the Sea of Data: A Comprehensive Review on Data Analysis in Maritime IoT Applications. Appl. Sci. 2023, 13, 9742.
- Robert, A.; Frank, L.; Potter, K. Explainable AI: Interpreting and Understanding Machine Learning Models. Artif. Intell. 2024.
- Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
- Gutierrez-Bustamante, M.; Espinosa-Leal, L. Natural language processing methods for scoring sustainability reports—A study of Nordic listed companies. Sustainability 2022, 14, 9165.
- Berry, D.M. Ambiguity in Natural Language Requirements Documents. In Innovations for Requirement Analysis. From Stakeholders’ Needs to Formal Designs; Paech, B., Martell, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 1–7.
- Durán, C.; Palominos, F.; Carrasco, R.; Carrillo, E. Influence of Strategic Interrelationships and Decision-Making in Chilean Port Networks on Their Degree of Sustainability. Sustainability 2021, 13, 3959.
- Rodriguez Estevez, D.; González-Cancelas, N.; Camarero, A.; Vaca Cabrero, J. Development of a “Smart Dry Port” Indicator and Ranking Calculation for Spanish Dry Ports. Future Transp. 2023, 3, 1272–1291.
- Feng, J.; Han, P.; Zheng, W. Identifying the factors affecting strategic decision-making ability to boost the entrepreneurial performance: A hybrid structural equation modeling – artificial neural network approach. Front. Psychol. 2022, 13, 1038604.
- Amoako, G.; Omari, P.; Kumi, D.; Agbemabiase, G.; Asamoah, G. Conceptual Framework—Artificial Intelligence and Better Entrepreneurial Decision-Making: The Influence of Customer Preference, Industry Benchmark, and Employee Involvement in an Emerging Market. J. Risk Financ. Manag. 2021, 14, 604.
- Lienert, J.; Linkov, I. Editorial featured papers on environmental decisions. EURO J. Decis. Process. 2019, 7, 151–157.
- Denktas-Sakar, G.; Karatas-Cetin, C. Port Sustainability and Stakeholder Management in Supply Chains: A Framework on Resource Dependence Theory. Asian J. Shipp. Logist. 2012, 28, 301–319.
- Leong, K.H.; Dahnil, D.P. Classification of Healthcare Service Reviews with Sentiment Analysis to Refine User Satisfaction. Int. J. Electr. Comput. Eng. Syst. 2022, 13, 323–330.
- Haider, M.; Gandomi, A. When big data made the headlines: Mining the text of big data coverage in the news media. Int. J. Serv. Technol. Manag. 2021, 27, 23.
- Chiarello, F.; Gastaldi, L.; Martini, A. Design and implementation of a text mining-based tool to support scoping reviews. Int. J. Technol. Manag. 2023, 91, 147.
- Lee, H.; Lee, S.H.; Lee, K.R.; Kim, J.H. ESG Discourse Analysis Through BERTopic: Comparing News Articles and Academic Papers. Comput. Mater. Contin. 2023, 75, 6023–6037.
- Wang, H.; Lu, Q. Understanding Philosophies of Higher Education between Countries in China’s Belt and Road Initiative: Analysis of University Mottos Based on Natural Language Processing Technology. Sage Open 2022, 12, 21582440221.
- Wang, Y.; Feng, L.; Wang, J.; Zhao, H.; Liu, P. Technology Trend Forecasting and Technology Opportunity Discovery Based on Text Mining: The Case of Refrigerated Container Technology. Processes 2022, 10, 551.
- Dehler-Holland, J.; Okoh, M.; Keles, D. Assessing technology legitimacy with topic models and sentiment analysis – The case of wind power in Germany. Technol. Forecast. Soc. Chang. 2022, 175, 121354.
- Chowdhury, S.; Alzarrad, A. Applications of Text Mining in the Transportation Infrastructure Sector: A Review. Information 2023, 14, 201.
- Karkhanis, G.V.; Chandnani, S.U.; Chakraborti, S. Analysis of employee perception of employer brand: A comparative study across business cycles using structural topic modelling. J. Bus. Anal. 2023, 6, 95–111.
- Jatnika, D.; Bijaksana, M.A.; Suryani, A.A. Word2Vec Model Analysis for Semantic Similarities in English Words. Procedia Comput. Sci. 2019, 157, 160–167.
- Hannigan, T.; Haans, R.F.; Vakili, K.; Tchalian, H.; Glaser, V.L.; Wang, M.; Kaplan, S.; Jennings, P.D. Topic modeling in management research: Rendering new theory from textual data. Acad. Manag. Ann. 2019, 13, 586–632.
- Ahadh, A.; Binish, G.; Srinivasan, R. Text mining of accident reports using semi-supervised keyword extraction and topic modeling. Process Saf. Environ. Prot. 2020, 155, 455–465.
- Batool, A.; Byun, Y.C. Enhanced Sentiment Analysis and Topic Modeling During the Pandemic Using Automated Latent Dirichlet Allocation. IEEE Access 2024, 12, 81206–81220.
- Turkeli, S.; Ozaydin, F. A Novel Framework for Extracting Knowledge Management from Business Intelligence Log Files in Hospitals. Appl. Sci. 2022, 12, 5621.
- Altuntas, F.; Gok, M.S. A data-driven analysis of renewable energy management: A case study of wind energy technology. Clust. Comput. 2023, 26, 4133–4152.
- Pan, X.; Zhong, B.; Wang, X.; Xiang, R. Text mining-based patent analysis of BIM application in construction. J. Civ. Eng. Manag. 2021, 27, 303–315.
- Vinayavekhin, S.; Li, F.; Banerjee, A.; Caputo, A. The academic landscape of sustainability in management literature: Towards a more interdisciplinary research agenda. Bus. Strategy Environ. 2022, 107, 5748–5784.
- Chang, I.C.; Horng, J.S.; Liu, C.H.; Chou, S.F.; Yu, T.Y. Exploration of Topic Classification in the Tourism Field with Text Mining Technology—A Case Study of the Academic Journal Papers. Sustainability 2022, 14, 4053.
- Ozcan, S.; Suloglu, M.; Sakar, C.O.; Chatufale, S. Social media mining for ideation: Identification of sustainable solutions and opinions. Technovation 2021, 107, 102322.
- Tavana, M.; Shaabani, A.; Vanani, I.R.; Gangadhari, R.K. A Review of Digital Transformation on Supply Chain Process Management Using Text Mining. Processes 2022, 10, 842.
- Tavana, M.; Shaabani, A.; Santos-Arteaga, F.J.; Vanani, I.R. A Review of Uncertain Decision-Making Methods in Energy Management Using Text Mining and Data Analytics. Energies 2020, 13, 3947.
- Eddy Soria Leyva, D.P.P. Environmental approach in the hotel industry: Riding the wave of change. Sustain. Future 2021, 3, 100050.
- Mishra, M.K.; Sharma, C.; Sharma, S.; Kumar, S.; Srivastav, A.L. Exploring Antecedents, Consequences, Research Constituents and Future Directions of Circular Economy: A Predictive Analysis in the Preview of Text Mining. J. Knowl. Econ. 2024, 2024, 1–25.
- Wang, Y.; Liu, X.; Zhu, X.L. Enhancing emerging technology discovery in nanomedicine by integrating innovative sentences using BERT and NLDA. J. Data Inf. Sci. 2024, 9, 155–195.
- Al-Smadi, M.; Qawasmeh, O.; Al-Ayyoub, M.; Jararweh, Y.; Gupta, B. Deep Recurrent neural network vs. support vector machine for aspect-based sentiment analysis of Arabic hotels’ reviews. J. Comput. Sci. 2018, 27, 386–393.
- Naskath, J.; Sivakamasundari, G.; Begum, A.A.S. A Study on Different Deep Learning Algorithms Used in Deep Neural Nets: MLP SOM and DBN. Wirel. Pers. Commun. 2022, 128, 2913–2936.
- Mammone, A.; Turchi, M.; Cristianini, N. Support vector machines. Wiley Interdiscip. Rev. Comput. Stat. 2009, 1, 283–289.
- Almunirawi, K.M.; Maghari, A.Y.A. A Comparative Study on Serial Decision Tree Classification Algorithms in Text Mining. Int. J. Intell. Comput. Res. 2016, 7, 754–760.
- Yuan, D.; Huang, J.; Yang, X.; Cui, J. Improved random forest classification approach based on hybrid clustering selection. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020.
- Shani, G.; Gunawardana, A. Evaluating Recommendation Systems. In Recommender Systems Handbook; Springer: Boston, MA, USA, 2011; Chapter 8; pp. 257–297.
- Grebovic, M.; Filipovic, L.; Katnic, I.; Vukotic, M.; Popovic, T. Overcoming Limitations of Statistical Methods with Artificial Neural Networks. In Proceedings of the 2022 International Arab Conference on Information Technology (ACIT), Abu Dhabi, United Arab Emirates, 22–24 November 2022; pp. 1–6.
- Shi, C.; Wei, B.; Wei, S.; Wang, W.; Liu, H.; Liu, J. A quantitative discriminant method of elbow point for the optimal number of clusters in clustering algorithm. EURASIP J. Wirel. Commun. Netw. 2021, 2021, 31.
- Rezaei, J.; van Wulfften Palthe, L.; Tavasszy, L.; Wiegmans, B.; van der Laan, F. Port performance measurement in the context of port choice: An MCDA approach. Manag. Decis. 2018, 57, 396–417.
- Lauriola, I.; Lavelli, A.; Aiolli, F. An introduction to Deep Learning in Natural Language Processing: Models, techniques, and tools. Neurocomputing 2022, 470, 443–456.
- Truong Ngoc, C.; Le Ngoc, L.; Kim, H.S.; You, S.S. Data analytics and throughput forecasting in port management systems against disruptions: A case study of Busan Port. Marit. Econ. Logist. 2022, 25, 61.
Major Contribution | Industry | Method/Technique | NLP | TM | WS | S |
---|---|---|---|---|---|---|
A sentiment analysis algorithm was developed to classify user reviews based on word-level emotion. The responses are structured according to rating criteria to help reference, compare and select medical centers [25]. | Healthcare | Sentiment analysis | ✓ | × | ✓ | × |
Analyzes large amounts of data to identify prevalent industries and locations cited in different contexts. Regression models are used to explore patterns in NLP findings and track shifts in attitudes toward big data over time [26]. | Media and communications | Literature review | ✓ | ✓ | × | × |
Presents NLP4Scoping, an innovative tool designed to support scoping reviews. The study details the requirements, design and implementation of the tool and illustrates its functionality through a scenario analysis focused on innovation management in digital ecosystems [27]. | Information technology and services | Literature review | ✓ | × | ✓ | × |
Examines the alignment between public perception and academic perspectives on environmental, social and governance (ESG) factors by analyzing global news and academic papers through text mining. It shows that media coverage often mirrors academic findings, enabling companies to better align their strategies with market and societal expectations [28]. | Financial services | BERTopic and topic modeling | ✓ | × | × | ✓ |
Analyzes university slogans from 61 countries within the Belt and Road Initiative and identifies five educational themes. The study highlights the impact of China’s BRI and how COVID-19 has fostered innovation in higher education while emphasizing the need for global cooperation in sustainable development [29]. | Education | NLP and text mining | ✓ | ✓ | ✓ | × |
Proposes a framework that integrates text mining and machine learning to predict technology trends and identify opportunities for innovation in refrigerated containers. The approach includes creating a technology roadmap and leveraging expert opinion to explore advances such as lighter, greener compressors [30]. | Transport and logistics | Text mining and the Latent Dirichlet Allocation (LDA) topic model | ✓ | ✓ | ✓ | × |
Examines challenges in the German wind energy sector, including regulatory barriers and declining investment. Proposes the use of NLP techniques to assess the credibility of the sector amid political controversy and declining validation. It shows that integrating big data governance into business strategies can improve environmental sustainability and reduce environmental impacts [31]. | Electricity | Sentiment analysis and topic modeling | ✓ | × | ✓ | ✓ |
Integrates text mining techniques into transportation infrastructure research, with an emphasis on practical applications and methodological choices. The study acknowledges its limited international scope [32]. | Transport infrastructure | Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) | ✓ | ✓ | × | × |
Examines how text analytics can be used to optimize employer branding decisions and reduce training costs during economic downturns. The study highlights how some companies are prioritizing human capital over financial considerations, aligning with Indian leadership culture values of morality and ethics [33]. | Human resources services | NLP and text mining | ✓ | ✓ | × | × |
Validates a Word2Vec model on a large dataset of English Wikipedia articles, enabling the transformation of words into vectors to measure semantic similarities. This technique enhances innovative data management and semantic analysis applications [34]. | Technology and information services | Cosine similarity and Pearson correlation | ✓ | ✓ | ✓ | × |
Explores thematic modeling as a means to improve business management by generating new theories and concepts from textual data. The study identifies key constructs that enhance understanding of online audiences, consumer behavior, and socio-cultural movements [35]. | Financial services | NLP and topic modeling | ✓ | × | × | × |
Develops a semi-supervised text mining technique for analyzing accident reports. By using domain-specific keywords and topic modeling with minimal expert input, the study demonstrates reduced manual intervention and improved keyword organization into topics [36]. | Aviation and process | Text mining and topic modeling | ✓ | ✓ | × | × |
Enhances pandemic-related text analysis by examining public sentiment and emerging patterns. The study aims to support frontline workers and healthcare professionals by providing rapid analysis of large datasets on public attitudes and trending topics [37]. | Healthcare | Automated LDA for topic modeling, WordCloud and Word2Vec | ✓ | ✓ | × | × |
Description | Hybrid Approach | Data Types | C: I | C: S |
---|---|---|---|---|
Combines business intelligence and NLP for healthcare knowledge management. A case study from a Turkish hospital shows how to classify and analyze digital data [38]. | Bag of Words (BoW) and K-means | Log records detailing digital activities and communications | ✓ | × |
Examines wind energy patents and identifies key terms related to towers, shafts and turbines. Highlights innovations that improve shaft and motor efficiency and streamline assembly processes [39]. | Text mining (TM) and K-means | Documents structural and mechanical component innovations | ✓ | × |
Uses text mining and patent analysis to forecast technological advances in refrigerated container technology. Develops a technology roadmap focusing on compressors, power systems and refrigerants [30]. | Term Frequency–Inverse Document Frequency (TF-IDF), LDA | Patents and scientific articles | ✓ | × |
Introduces a novel text mining framework for patent analysis in Building Information Modeling (BIM) that identifies key applications and trends to drive technological advances and innovation in BIM construction [40]. | TM, Social Network Analysis (SNA) and LDA | Patent documents | ✓ | × |
Utilizes interdisciplinary methods, including clustering analysis and text similarity, to understand contributions to sustainable development goals (SDGs) [41]. | TM and clustering in knowledge maps | Academic papers | ✓ | ✓ |
Uses automated classification and keyword extraction to improve tourism analysis. It uses web diagrams for thematic differentiation and integrates the 7P marketing strategy into co-word analysis [42]. | TM and hierarchical cluster analysis | Publicly communicated | ✓ | ✓ |
Uses semi-supervised ML and NLP to extract and classify sustainability insights from Twitter data, demonstrating applicability to sustainable product innovation [43]. | Support Vector Machines (SVMs), transductive SVM | Tweet ideas and opinions | ✓ | ✓ |
Analyzes large supply chain management datasets to assess the impact of digital transformation. The methodology categorizes strategic information to improve decision-making and promote sustainable practices [44]. | TM techniques, clustering and topic modeling | Journal articles | ✓ | ✓ |
Employs large datasets for energy management and decision-making under uncertainty, using a hybrid approach to categorize and interpret data to support strategic decisions in energy policy and sustainability [45]. | TM, clustering and topic modeling techniques | Journal articles | ✓ | ✓ |
Uses text mining on patent data to analyze wind energy innovations, focusing on critical keywords of turbine components such as towers, shafts and assembly methods. This approach improves strategic management and oversight in the renewable energy sector [39]. | TM and K-means | Patent documents | ✓ | × |
Develops an automated classification system for tourism that verifies consistency, identifies keywords, visualizes thematic differences with web diagrams, integrates the 7P marketing strategy into the co-word analysis and highlights expert involvement in ensuring thematic consistency [42]. | TM and hierarchical K-means clustering analysis | Journal articles | ✓ | ✓ |
Incorporates ethical, demographic and legal dimensions into Iberostar’s environmental management strategy to improve community involvement and meet hospitality standards [46]. | TM and hierarchical cluster analysis | Strategic web texts | × | ✓ |
Examines the circular economy as an alternative to traditional economic models, highlighting its role in environmental sustainability. The study uses modeling techniques to analyze 4488 research articles (2005–2023) to identify trends and research gaps [47]. | TM and LDA | Journal articles | ✓ | ✓ |
Integrates advanced natural language processing, noise-free topic modeling and multidimensional bibliometrics to identify emerging topics in nanomedicine and highlight their transformative impact on diagnostics, therapeutics and regenerative medicine [48]. | BERT and Noiseless Latent Dirichlet Allocation (NLDA) | Journal articles | ✓ | ✓ |
Criteria | Traditional Methods | Neural Network Methods |
---|---|---|
Efficiency | Suitable for simple problems that require fewer resources. | Require more resources because they handle large amounts of data and complex relationships. |
Accuracy | Limited by reliance on linear relationships and few predictors. | High, owing to their ability to capture complex, nonlinear dynamics. |
Flexibility | Designed for specific data and relationships with limited adaptability. | Highly adaptable to different types of data and patterns. |
Interpretability | Clear and easy to understand, with well-defined relationships between variables. | Less interpretable due to complexity and opaque operations. |
Keywords | Topic 1 | Topic 2 | Topic 3 | Topic 4 |
---|---|---|---|---|
quality | 0.00% | 39.30% | 23.90% | 23.60% |
customers | 0.00% | 38.00% | 61.60% | 49.70% |
company | 0.00% | 22.50% | 20.10% | 41.90% |
companies | 0.00% | 38.00% | 3.10% | 17.60% |
team | 0.00% | 30.90% | 11.90% | 11.50% |
experience | 0.00% | 12.60% | 13.10% | 21.20% |
management | 0.00% | 28.10% | 10.10% | 18.80% |
group | 0.00% | 29.60% | 8.70% | 13.30% |
needs | 0.00% | 9.80% | 22.60% | 12.70% |
offering | 100.00% | 13.80% | 9.80% | 11.90% |
security | 0.00% | 19.70% | 6.20% | 18.80% |
service | 0.00% | 26.70% | 35.70% | 33.40% |
services | 0.00% | 22.50% | 43.90% | 31.50% |
solutions | 0.00% | 19.70% | 20.10% | 14.60% |
transportation | 0.00% | 4.20% | 22.60% | 28.50% |
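As a reading aid for the keyword table above, the following sketch ranks the keywords of each topic by their reported weight. The values are copied from the table; the helper name `top_keywords` and the cutoff `k=3` are illustrative choices and not part of the study.

```python
# Keyword weights (percent) for Topics 1-4, copied from the table above.
weights = {
    "quality":        (0.0, 39.3, 23.9, 23.6),
    "customers":      (0.0, 38.0, 61.6, 49.7),
    "company":        (0.0, 22.5, 20.1, 41.9),
    "companies":      (0.0, 38.0, 3.1, 17.6),
    "team":           (0.0, 30.9, 11.9, 11.5),
    "experience":     (0.0, 12.6, 13.1, 21.2),
    "management":     (0.0, 28.1, 10.1, 18.8),
    "group":          (0.0, 29.6, 8.7, 13.3),
    "needs":          (0.0, 9.8, 22.6, 12.7),
    "offering":       (100.0, 13.8, 9.8, 11.9),
    "security":       (0.0, 19.7, 6.2, 18.8),
    "service":        (0.0, 26.7, 35.7, 33.4),
    "services":       (0.0, 22.5, 43.9, 31.5),
    "solutions":      (0.0, 19.7, 20.1, 14.6),
    "transportation": (0.0, 4.2, 22.6, 28.5),
}


def top_keywords(weights, topic_index, k=3):
    """Return up to k keywords with the largest nonzero weight in one topic.

    topic_index is 0-based (0 = Topic 1). Ties keep the table's row order
    because Python's sort is stable, also with reverse=True.
    """
    present = [w for w in weights if weights[w][topic_index] > 0]
    ranked = sorted(present, key=lambda w: weights[w][topic_index], reverse=True)
    return ranked[:k]


for i in range(4):
    print(f"Topic {i + 1}:", ", ".join(top_keywords(weights, i)))
```

Read this way, Topic 1 is carried by a single keyword ("offering"), while "customers" ranks first in both Topic 3 and Topic 4.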
Errors | RNN | MLP | RF | SVM | DT | Hybrid |
---|---|---|---|---|---|---|
RMSE | 1.605 | 0.174 | 0.179 | 0.311 | 1.606 | 0.596 |
MAE | 0.758 | 0.030 | 0.032 | 0.097 | 0.387 | 0.161 |
MDAE | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
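The three error measures in the table above can be computed as in the following minimal sketch. The labels `y_true` and `y_pred` are hypothetical and are not the study's data. The example illustrates why a median absolute error (MDAE) of 0 can coexist with nonzero RMSE and MAE: when more than half of the predictions are exact, the median of the absolute errors is 0, even though a few large misses still raise the other two measures.

```python
import math
from statistics import median


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mdae(y_true, y_pred):
    """Median absolute error."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical topic labels (0-3) for ten documents; NOT data from the study.
y_true = [0, 1, 2, 3, 1, 2, 3, 0, 1, 2]
y_pred = [0, 1, 2, 3, 1, 2, 3, 0, 3, 0]  # eight exact hits, two misses of size 2

print("RMSE:", round(rmse(y_true, y_pred), 3))
print("MAE: ", round(mae(y_true, y_pred), 3))
print("MDAE:", mdae(y_true, y_pred))
```

Because the median ignores the minority of wrong predictions, MDAE alone cannot separate the models in the table; RMSE and MAE carry the comparison.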
Technique | RNN | MLP | RF | SVM | DT | Hybrid |
---|---|---|---|---|---|---|
Precision | 0.807 | 0.977 | 0.0359 | 0.0361 | 0.7717 | 0.922 |
Recall | 0.727 | 0.967 | 0.1895 | 0.1895 | 0.7712 | 0.903 |
F1 Score | 0.697 | 0.971 | 0.0604 | 0.0607 | 0.7694 | 0.901 |
Global Precision | 0.727 | 0.970 | 0.1895 | 0.1895 | 0.771 | 0.903 |
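A note on how the rows above relate, under the assumption (not stated in this section) that precision, recall and F1 are support-weighted averages over the classes. In that case weighted recall is algebraically equal to accuracy, which would explain why the Recall and Global Precision rows nearly coincide. The sketch below implements the weighted metrics in plain Python on hypothetical labels and also shows a degenerate case: a model that always predicts one class obtains a weighted precision equal to the square of its accuracy. The RF and SVM columns show a similar pattern (0.1895² ≈ 0.0359), although this is only an arithmetic observation and not a confirmed diagnosis of those models.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1, plus plain accuracy.

    Assumption: a class that is never predicted gets precision 0.
    """
    n = len(y_true)
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = hits[label] / predicted[label] if predicted[label] else 0.0
        r = hits[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # share of samples in this class
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(hits.values()) / n
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical labels, NOT the study's data: 4 classes, 10 documents.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
y_const = [0] * len(y_true)        # degenerate model: always predicts class 0

scores = weighted_scores(y_true, y_const)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

In this toy case accuracy and weighted recall are both 0.3, and weighted precision is 0.3 × 0.3 = 0.09.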
Strategic Decision | Keywords | Text Sources |
---|---|---|
Supports recommendation systems and information platforms that enhance medical center selection by providing reliable and up-to-date data [25]. | Charge, service, bill, price, hospital. | User reviews of medical services on online platforms. |
Explores how interpreting market dynamics and consumer trends through comprehensive data analytics can inform business strategies [26]. | Big data, executives, computer software, business analytics. | News accessible through LexisNexis® Academic. |
Organizes and visualizes large amounts of academic information to quickly identify emerging areas of research and address knowledge gaps [27]. | Business model, digital platforms, firm, management, government. | Journal articles and conference proceedings from SCOPUS databases. |
Facilitates corporate and investor access to critical information on how environmental, social and governance (ESG) issues affect long-term shareholder value and sustainability [28]. | Industry, emission, trade, business, governance. | News and articles from LexisNexis® and Web of Science. |
Supports proactive decision-making processes aimed at improving safety, service quality and customer satisfaction [36]. | Traffic control, flight en-route, turbulence, level flight, cruise climb. | Accident reports, manuals, glossaries and Wikipedia. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Durán, C.; Fernández-Campusano, C.; Espinosa-Leal, L.; Castañeda, C.; Carrillo, E.; Bastias, M.; Villagra, F. Exploring Boost Efficiency in Text Analysis by Using AI Techniques in Port Companies. Appl. Sci. 2025, 15, 4556. https://doi.org/10.3390/app15084556