Enhancing Cryptocurrency Security: Leveraging Embeddings and Large Language Models for Creating Cryptocurrency Security Expert Systems
Abstract
1. Introduction
2. Literature Review
2.1. Overview of Cryptocurrency Security Challenges
2.2. Existing Applications of Embeddings and Large Language Models in Cybersecurity
2.3. Gaps in Current Approaches to Cryptocurrency Security and the Potential of LLMs and Embeddings
3. Proposed Framework for Cryptocurrency Security Expert System
3.1. Architecture of an Expert System
- Frontend Layer: The system’s user interface is developed using Next.js, a modern framework that ensures a responsive and interactive experience. This layer enables users to submit queries and view insights related to cryptocurrency transactions and potential threats. The frontend communicates with the backend through well-defined APIs.
- Backend Layer: The backend is powered by Flask, a lightweight web framework, with Python as the core language for processing user requests and implementing business logic. This layer handles communication between the user interface and the underlying components, ensuring efficient data flow and response generation.
- LLM Integration: At the core of the system’s analytical capabilities is LLaMA, a state-of-the-art large language model hosted on the Ollama platform. The LLM is responsible for contextual analysis, anomaly detection, and interpreting transaction patterns to identify potential threats. It uses domain-specific prompts and embeddings to enhance its understanding of cryptocurrency-related queries.
- Embedding and Data Management: To process and analyze high-dimensional cryptocurrency data, the system employs Nomic-Embed, an embedding model that converts textual and transactional data into vector representations. These embeddings are stored in Chroma Vector DB, a high-performance vector database that supports efficient retrieval and similarity searches. This setup allows the system to compare current transaction patterns with historical data to identify anomalies effectively.
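The interaction between these layers can be sketched as a single query path: the backend embeds the user query, retrieves similar historical records, and hands both to the LLM. The sketch below is illustrative only; in deployment the `embed`, `retrieve`, and `generate` callables would wrap Nomic-Embed, Chroma Vector DB, and the Ollama-hosted LLaMA model, which are injected here as plain functions.

```python
# Minimal sketch of the backend query path. The embed/retrieve/generate
# parameters are hypothetical stand-ins for the real Nomic-Embed, Chroma,
# and Ollama calls described in the architecture.

def handle_query(query, embed, retrieve, generate, top_k=3):
    """Embed the query, fetch similar historical records, and ask the
    LLM to analyse them together with the query (RAG-style)."""
    vector = embed(query)              # query -> vector representation
    context = retrieve(vector, top_k)  # nearest historical records
    prompt = (
        "Context:\n" + "\n".join(context) +
        "\n\nQuestion: " + query + "\nAnswer:"
    )
    return generate(prompt)            # LLM produces the final insight
```

Because the dependencies are injected, the flow can be exercised with stubs before wiring in the real services.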
3.2. RAG-Based System for Active Cryptocurrency Security
3.3. Embedding Model and Data Processing
3.4. Utilization of Large Language Models for Contextual Analysis and Pattern Recognition in Threat Identification
3.5. Architecture of the Expert System Using Large Language Models
3.5.1. System Overview
- Specialized Agents: Each agent is an LLM tailored to a particular domain within cryptocurrency security, such as threat detection, transaction analysis, or regulatory compliance. This specialization enables precise handling of domain-specific tasks [35].
- Collaboration Layer: A coordination mechanism that facilitates interaction among agents, allowing them to share insights and collaboratively solve multifaceted security issues. This layer employs a Mixture of Experts (MoE) approach, where different experts are activated based on the input context, enhancing efficiency and specialization [36].
- Knowledge Base: A centralized repository that stores domain-specific information, threat intelligence, and historical data. Agents access this knowledge base to inform their analyses and decisions, ensuring consistency and up-to-date information.
- User Interface: An intuitive interface that enables users to interact with the expert system, submit queries, and receive actionable insights. The interface supports natural language processing, allowing users to communicate in plain language.
3.5.2. Operational Workflow (Algorithm 1)
- Input Processing: User inputs, such as queries or data streams, are received through the user interface and preprocessed to extract relevant features.
- Agent Activation: Based on the processed input, the collaboration layer determines which specialized agents are most suitable for addressing the task. The MoE mechanism ensures that only pertinent agents are engaged, optimizing resource utilization [37].
- Collaborative Analysis: Activated agents perform their respective analyses, accessing the knowledge base as needed. They communicate findings through the collaboration layer, allowing for a comprehensive assessment of the security issue.
- Response Generation: The system synthesizes the agents’ outputs to generate a coherent and actionable response, which is then presented to the user via the interface.
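The four workflow stages can be condensed into a small routing sketch. The agent names and the keyword-based router below are illustrative assumptions, not the paper’s implementation; they simply show the MoE idea of activating only the agents relevant to the input and synthesizing their findings.

```python
# Hedged sketch of MoE-style agent activation: only agents whose domain
# keywords match the query are engaged (agents and keywords are toy examples).

AGENTS = {
    "threat_detection":  lambda q: f"threat analysis of: {q}",
    "transaction_audit": lambda q: f"transaction review of: {q}",
    "compliance":        lambda q: f"compliance check of: {q}",
}

KEYWORDS = {
    "threat_detection":  {"attack", "phishing", "exploit", "threat"},
    "transaction_audit": {"transaction", "transfer", "wallet"},
    "compliance":        {"regulation", "kyc", "aml", "compliance"},
}

def route_and_run(query):
    """Activate matching agents, run them, and synthesize one response."""
    words = set(query.lower().split())
    active = [name for name, kws in KEYWORDS.items() if words & kws]
    findings = [AGENTS[name](query) for name in active]
    return " | ".join(findings) if findings else "no specialised agent matched"
```

A production router would use learned gating rather than keyword overlap, but the activation/analysis/synthesis structure is the same.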
Algorithm 1 LLM-driven cryptocurrency security expert system.
Require: User Query Q, LLM Model M, Embedding Model E, Vector Database, Knowledge Base
Ensure: Security Recommendation Response R
3.5.3. Advantages of the Architecture
- Scalability: The modular design allows for the addition of new agents as emerging threats are identified, ensuring the system remains adaptable to the evolving cybersecurity landscape [38].
- Efficiency: The MoE approach reduces computational overhead by activating only the necessary agents for a given task, leading to faster response times and lower resource consumption [36].
- Specialization: Each agent’s focus on a specific domain enhances the accuracy and relevance of analyses, providing users with expert-level insights tailored to particular aspects of cryptocurrency security [35].
3.5.4. Implementation Considerations
- Agent Development: Training LLMs for each specialized agent requires domain-specific datasets and expertise to ensure high performance in their respective areas.
- Collaboration Protocols: Defining clear protocols for inter-agent communication is essential to prevent information silos and ensure seamless collaboration.
- Security Measures: Implementing robust security protocols is crucial to protect the system from adversarial attacks and unauthorized access, maintaining the integrity and confidentiality of the analyses.
- User Interface (UI): This component acts as the entry point for the user, where queries or data streams related to cryptocurrency security are received. The UI is designed to support natural language input, allowing users to interact with the system intuitively.
- Input Processing: Once data are received from the user interface, they pass through the input processing module. This stage includes preprocessing steps, such as data cleansing, feature extraction, and encoding, preparing the data for analysis by the specialized agents.
- Specialized Agents: The architecture includes multiple LLM-based specialized agents, each fine-tuned for specific tasks like transaction monitoring, threat detection, or regulatory analysis. The agents are selectively activated based on the nature of the input, allowing the system to tailor its analysis to specific security contexts.
- Collaboration Layer: The collaboration layer coordinates interactions between specialized agents, facilitating a cooperative approach to data analysis. By leveraging the MoE framework, the collaboration layer ensures that only the most relevant agents are engaged, optimizing both computational efficiency and analytical accuracy.
- Knowledge Base: This centralized repository provides agents with access to domain-specific information, historical data, and known threat patterns. The knowledge base enhances the agents’ contextual understanding, enabling more accurate threat assessments and response strategies.
- Response Generation: After analysis, the system synthesizes insights from the activated agents and generates a comprehensive response. This output is delivered back to the user via the user interface, offering actionable security recommendations or answers to specific queries.
3.6. Explanation of Embeddings and Their Representation of Cryptocurrency-Related Data
- Graph-Based Embeddings: Cryptocurrency transactions can be modeled as graphs, where nodes represent entities (e.g., users, wallets) and edges denote transactions. Graph-based embeddings, such as those derived from graph convolutional networks (GCNs), are effective in this domain. For instance, the study by Lo et al. introduced Inspection-L, a self-supervised GNN framework designed for money laundering detection in Bitcoin transactions. This approach generates node embeddings that encapsulate both topological and feature information, enhancing the detection of illicit activities [39].
- Temporal Dynamics in Embeddings: The dynamic nature of cryptocurrency transactions necessitates embeddings that account for temporal aspects. Temporal graph convolutional networks (T-GCNs) have been employed to capture time-evolving patterns in transaction networks. Li et al. proposed a motif-aware temporal GCN for fraud detection in signed cryptocurrency trust networks, effectively identifying fraudulent behavior by incorporating temporal motifs into the embedding process [40].
- Autoencoder-Based Embeddings: Autoencoders, particularly deep convolutional autoencoders, have been utilized to learn embeddings that represent complex market behaviors. The work by McNally et al. demonstrated the application of a deep convolutional autoencoder for cryptocurrency market analysis, enabling the extraction of features that inform predictive models for market trends [41].
- Textual Data Embeddings: Beyond transaction data, textual information from social media and news sources significantly influences cryptocurrency markets. Embeddings derived from textual data, such as those using FinBERT, capture sentiment and discourse patterns. Zou and Herremans developed PreBit, a multimodal model incorporating Twitter FinBERT embeddings to predict extreme price movements of Bitcoin, highlighting the impact of social media sentiment on market dynamics [42].
- Comprehensive Analysis Techniques: A holistic approach to analyzing illicit Bitcoin transactions involves integrating various embedding techniques. The survey by Conti et al. provides an overview of analysis techniques for illicit Bitcoin transactions, emphasizing the role of embeddings in uncovering hidden patterns and associations within transaction data [43].
- Social Media Analysis: Understanding the discourse surrounding cryptocurrencies on platforms like Twitter is crucial. The study “Deciphering Crypto Twitter” explores how embeddings can be used to analyze social media discussions, providing insights into public sentiment and its correlation with market movements [44].
- Fraud Detection in Ethereum: Machine learning approaches, including LightGBM (LGBM), have been applied to detect fraud in Ethereum transactions. The research by Anthony et al. presents an LGBM-based model for Ethereum fraud detection, demonstrating the effectiveness of embeddings in identifying fraudulent activities [45].
- Systematic Literature Reviews: Systematic surveys, such as the one by Dasgupta et al., examine blockchain research from a security perspective, shedding light on the methodologies and embedding techniques employed in the field [46].
- Ransomware Payment Analysis: Analyzing Bitcoin payments related to ransomware involves understanding the flow of funds and the entities involved. The study by Turner addresses the intelligence applications of Bitcoin payments in ransomware cases, utilizing embeddings to trace and analyze illicit transactions [47].
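A common thread in the techniques above is that suspicious activity is flagged by comparing a new transaction’s embedding against historical ones. The toy sketch below illustrates that core operation with cosine similarity; the vectors are illustrative placeholders for the graph-, temporal-, or text-based embeddings surveyed here.

```python
import math

# Illustrative anomaly scoring over transaction embeddings (toy vectors;
# real embeddings would come from a model such as Nomic-Embed or a GNN).

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def anomaly_score(new_vec, history):
    """1 minus the best match against history: high score = unusual."""
    return 1.0 - max(cosine(new_vec, h) for h in history)
```

A transaction close to a known historical pattern scores near 0; one unlike anything seen before scores near 1.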
3.7. Use of LLMs for Contextual Analysis and Pattern Recognition in Threat Identification
- Enhancing Cyber Threat Detection: LLMs, such as BERT, have been adapted for cyber threat detection in IoT and IIoT devices. The SecurityBERT model integrates privacy-preserving encoding techniques to autonomously identify network-based attacks with high precision and minimal computational requirements [48].
- Advancements in Cybersecurity Applications: The integration of LLMs in cybersecurity has been extensively reviewed, highlighting their capabilities in contextual analysis and pattern recognition. These models enhance real-time cybersecurity defenses by understanding complex patterns and contexts within security data [49].
- Improving Software Vulnerability Detection: LLMs have been utilized to enhance the detection and handling of software vulnerabilities and cybersecurity threats. Their integration into cyber threat detection frameworks and incident response systems has been emphasized, demonstrating their effectiveness in identifying and mitigating threats [50].
- State-of-the-Art Applications in Cybersecurity: A comprehensive review of LLMs in cybersecurity examines their roles in both defensive and adversarial applications. The study provides a thorough characterization of their contributions to cyber threat detection and response, highlighting their effectiveness in understanding complex patterns and contexts within security data [20].
- Emerging Threats in the Age of AI: The use of LLM technology by threat actors has been analyzed, revealing behaviors consistent with attackers using AI as a productivity tool on the offensive landscape. This research focuses on emerging threats in the age of AI, including prompt injections and attempted misuse of LLMs [51].
- Enhancing Code Analysis Capabilities: Combining LLMs with advanced pattern detection and self-enhancement techniques improves code analysis capabilities. This approach aims to make detection more scalable and improve coverage, catching previously overlooked malicious packages [52].
- Comprehensive Overview of LLMs for Cyber Defense: A survey provides an overview of recent activities of LLMs in cyber defense, categorizing their applications in threat intelligence, vulnerability assessment, network security, privacy preservation, awareness and training, automation, and ethical guidelines [53].
- Contextual Object Detection with Multimodal LLMs: Addressing the limitation of multimodal large language models (MLLMs) in object detection, a novel research problem of contextual object detection has been introduced. This work focuses on understanding visible objects within different human–AI interactive contexts, leveraging the capabilities of LLMs in contextual analysis [54].
- Real-Time Anomaly Detection Using LLMs: The application of LLMs in real-time anomaly detection has been discussed, highlighting how LLMs can be utilized to decipher context and patterns in data. This makes them suitable candidates for anomaly detection by identifying deviations that traditional methods might overlook [55].
4. Implementation Details
4.1. LLM Model Details
4.1.1. Model Architecture
4.1.2. Input–Output Format and Tokenization
4.1.3. Justification for Model Selection
4.2. Justification for Using ChromaDB
- Performance: ChromaDB exhibits efficient query handling and real-time retrieval performance, as shown in Table 1.
- Scalability: Unlike FAISS, which requires manual partitioning for large datasets, ChromaDB supports automatic sharding and distributed indexing, making it more suitable for handling large-scale security knowledge.
- Compatibility: ChromaDB seamlessly integrates with the system’s RAG framework, supporting metadata filtering and SQL-like query capabilities that FAISS lacks.
- Overall Justification: Lower query latency, scalability, and native compatibility with metadata-driven retrieval enhance the efficiency of security-related recommendations.
4.3. Embeddings and Prompting for Cryptocurrency-Specific Data
4.3.1. Graph-Based Embeddings for Fraud Detection
4.3.2. Self-Supervised Node Embeddings for Money Laundering Detection
4.3.3. Scalable Embedding Techniques
4.3.4. Sentiment Analysis with Embeddings
4.3.5. Transaction Graph Analysis
4.3.6. Embedding-Based Analysis of Illicit Nodes
4.3.7. Fundamental Components of LLMs
4.3.8. Prompting Techniques for Cryptocurrency Data
4.4. Workflow and Integration with Chroma Vector DB
4.4.1. Workflow Design
- Data Collection and Preprocessing: Aggregation and refinement of cryptocurrency transaction data.
- Embedding Generation: Transformation of data into high-dimensional vector representations.
- Storage in ChromaDB: Efficient ingestion and management of embeddings.
- Query and Retrieval: Retrieval of relevant embeddings to support security-related tasks.
- Analysis and Decision Making: Application of analytical models to retrieved embeddings for threat detection.
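The five workflow stages above compose naturally into one pipeline. The sketch below is a structural illustration only; each stage function is a placeholder for the corresponding component (preprocessing, Nomic-Embed, ChromaDB storage and retrieval, and the analytical models).

```python
# The five-stage workflow as a hedged pipeline sketch; stage functions
# are hypothetical stand-ins for the components described above.

def pipeline(raw, preprocess, embed, store, retrieve, analyze):
    clean = preprocess(raw)           # 1. data collection & preprocessing
    vecs = [embed(x) for x in clean]  # 2. embedding generation
    store(vecs)                       # 3. storage in ChromaDB
    hits = retrieve(vecs)             # 4. query & retrieval
    return analyze(hits)              # 5. analysis & decision making
```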
4.4.2. Integration Strategy
- API Utilization: Leveraging ChromaDB’s API for efficient embedding operations.
- Batch Processing: Handling large embedding volumes with optimized batch processing.
- Indexing Strategies: Using advanced indexing techniques for faster similarity searches.
- Scalability: Ensuring the architecture supports expanding cryptocurrency security data.
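The batch-processing point above amounts to chunking embedding uploads rather than issuing one store call per record. A minimal sketch, assuming a generic `store_batch` callable (e.g., wrapping a ChromaDB `collection.add` call; the batch size is illustrative):

```python
# Sketch of batched embedding ingestion: records are pushed to the vector
# store in fixed-size chunks to reduce per-call overhead.

def batches(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def ingest(records, store_batch, batch_size=128):
    """Push records batch by batch; returns the number of batches sent."""
    count = 0
    for chunk in batches(records, batch_size):
        store_batch(chunk)  # e.g. collection.add(...) on a ChromaDB collection
        count += 1
    return count
```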
5. Experiments and Evaluation
5.1. Datasets Used for Testing
- Mastering Bitcoin: Unlocking Digital Cryptocurrencies by Andreas M. Antonopoulos [56]: This book provides comprehensive coverage of Bitcoin’s architecture, transaction mechanisms, and security features, making it a key resource for understanding cryptocurrency systems.
- The Basics of Bitcoins and Blockchains: An Introduction to Cryptocurrencies and the Technology That Powers Them by Antony Lewis [57]: This book offers detailed explanations of blockchain technology, cryptocurrency mechanics, and security implications, serving as an essential dataset for blockchain-specific concepts.
- Cryptoassets: The Innovative Investor’s Guide to Bitcoin and Beyond by Chris Burniske and Jack Tatar [58]: Although primarily an investment guide, this book highlights security risks and challenges associated with cryptoassets, providing valuable insights for the dataset.
- Blockchain Basics: A Non-Technical Introduction in 25 Steps by Daniel Drescher [59]: This book’s step-by-step introduction to blockchain technology and its implications on security was instrumental in modeling and validating the system’s understanding of foundational blockchain concepts.
5.2. Key Performance Metrics (Accuracy, Recall, Precision)
- Accuracy: Measures the overall correctness of the model’s predictions. We calculated accuracy by evaluating the proportion of correct predictions over the total predictions made by the model. Similar techniques were employed in the work by Wang et al. [62], where accuracy served as a primary metric for evaluating LLM performance on specialized datasets.
- Recall: Evaluates the model’s ability to correctly identify true positive cases. In our context, recall indicates the model’s effectiveness in identifying legitimate security threats. The importance of high recall in cybersecurity applications is underscored by Singh et al. [63], who applied recall as a metric to measure a model’s sensitivity in anomaly detection tasks.
- Precision: Indicates the accuracy of positive predictions. Precision is critical for minimizing false alarms, which is essential for practical deployment in cybersecurity systems. Our approach follows the evaluation methods discussed by Patel and Roy [64], where precision was used to assess the model’s ability to focus on true threats while ignoring benign activities.
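The three metrics reduce to simple ratios over confusion-matrix counts. For concreteness (with toy counts, not the paper’s evaluation data):

```python
# Accuracy, precision, and recall from confusion-matrix counts
# (tp = true positives, fp = false positives, etc.).

def metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # overall correctness
    precision = tp / (tp + fp)                   # fraction of alarms that are real
    recall = tp / (tp + fn)                      # fraction of real threats caught
    return accuracy, precision, recall
```

High recall keeps real threats from slipping through; high precision keeps analysts from drowning in false alarms.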
5.3. Comparative Analysis with Traditional Methods
5.4. Results and Insights
5.4.1. Performance Metrics
- Accuracy: The system achieved an accuracy of 92%, outperforming plain LLMs (82%) by 10 percentage points.
- Precision and Recall: The integration of embeddings and Chroma Vector DB allowed the system to achieve a precision of 89% and a recall of 93%, compared to 77% and 81%, respectively, for plain LLMs.
- F1 Score: The overall F1 score improved to 91%, highlighting the system’s balanced performance in identifying true threats while minimizing false positives and negatives.
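The reported F1 score is internally consistent with the precision and recall figures, since F1 is their harmonic mean:

```python
# F1 as the harmonic mean of precision and recall; checking the reported
# figures (precision 0.89, recall 0.93) reproduces the stated 91%.

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)
```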
5.4.2. Contextual Analysis
- Identify subtle patterns and anomalies in transactional data with greater precision.
- Provide nuanced explanations and insights, linking suspicious activities to historical patterns stored in the vector database.
- Adapt to evolving transaction behaviors and emerging threats, making it highly responsive to new attack vectors.
5.4.3. Comparative Analysis with Plain LLMs
- Plain LLMs: Although capable of basic language understanding, plain LLMs often failed to accurately contextualize and analyze complex transactional relationships, leading to lower accuracy and higher false positives.
- Expert System: The expert system leveraged embeddings and Chroma Vector DB to retrieve contextually relevant data, significantly improving its ability to understand and analyze domain-specific queries.
5.4.4. Key Insights
- Enhanced the model’s ability to detect fraudulent activities and suspicious patterns in cryptocurrency transactions.
- Reduced false positives and negatives by anchoring the analysis in historical patterns stored in the vector database.
- Proved the feasibility of a modular and scalable architecture for real-time threat detection in cryptocurrency networks.
5.5. Performance Metrics and Comparisons
5.6. Technical Explanation of Results
Key Findings
5.7. Comparison with State-of-the-Art Approaches
6. Challenges and Future Work
6.1. Potential Challenges in Data Handling, Scalability, and Computational Cost
6.2. Future Directions: Continuous Learning with New Attack Vectors and More Advanced LLM Models
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- He, Z.; Li, Z.; Yang, S.; Qiao, A.; Zhang, X.; Luo, X.; Chen, T. Large Language Models for Blockchain Security: A Systematic Literature Review. arXiv 2024, arXiv:2403.14280. [Google Scholar]
- Yu, J. Retrieval Augmented Generation Integrated Large Language Models in Smart Contract Vulnerability Detection. arXiv 2024, arXiv:2407.14838. [Google Scholar]
- Geren, C.; Board, A.; Dagher, G.G.; Andersen, T.; Zhuang, J. Blockchain for Large Language Model Security and Safety: A Holistic Survey. arXiv 2024, arXiv:2407.20181. [Google Scholar] [CrossRef]
- Azad, P.; Akcora, C.G.; Khan, A. Machine Learning for Blockchain Data Analysis: Progress and Opportunities. arXiv 2024, arXiv:2404.18251v1. [Google Scholar]
- Trozze, A.; Davies, T.; Kleinberg, B. Large Language Models in Cryptocurrency Securities Cases: Can a GPT Model Meaningfully Assist Lawyers? Artif. Intell. Law 2024, 1–47. [Google Scholar]
- Kheddar, H. Transformers and large language models for efficient intrusion detection systems: A comprehensive survey. arXiv 2024, arXiv:2408.07583. [Google Scholar]
- Gai, Y.; Zhou, L.; Qin, K.; Song, D.; Gervais, A. Blockchain large language models. arXiv 2023, arXiv:2304.12749. [Google Scholar]
- Luo, B.; Zhang, Z.; Wang, Q.; Ke, A.; Lu, S.; He, B. AI-powered Fraud Detection in Decentralized Finance: A Project Life Cycle Perspective. ACM Comput. Surv. 2024, 57, 4. [Google Scholar] [CrossRef]
- Arikkat, D.R.; Abhinav, M.; Binu, N.; Parvathi, M.; Navya, B.; Arunima, K.S.; Vinod, P.; Rafidha Rehiman, K.A.; Conti, M. IntellBot: Retrieval Augmented LLM Chatbot for Cyber Threat Knowledge Delivery. In Proceedings of the IEEE 16th International Conference on Computational Intelligence and Communication Networks (CICN), Indore, India, 22–23 December 2024; pp. 644–651. [Google Scholar] [CrossRef]
- Weichbroth, P.; Wereszko, K.; Anacka, H.; Kowal, J. Security of Cryptocurrencies: A View on the State-of-the-Art Research and Current Developments. Sensors 2023, 23, 3155. [Google Scholar] [CrossRef]
- sanctions.io. Everything You Need to Know About Crypto Due Diligence in 2024. Available online: https://www.sanctions.io/blog/crypto-due-diligence (accessed on 25 January 2025).
- John, F.; Dmytro, Y. Cryptocurrency Security Standard (CCSS)—A Complete Guide; Hacken.io: Tallinn, Estonia, 2024. [Google Scholar]
- Behnke, R. A Guide to CCSS Audits: Ensuring Top-Notch Crypto Security; Halborn: Miami, FL, USA, 2024. [Google Scholar]
- Valerioshi, X.; Lim, V.; Khei, L.C. Master Guide To Crypto Security: Crypto Wallets, Smart Contracts, DeFi, And NFTs; CoinGecko: Singapore, 2024. [Google Scholar]
- Arkose Labs. Guide to Cryptocurrency Security; Arkose Labs: San Mateo, CA, USA, 2024. [Google Scholar]
- Stouffer, C. Cryptocurrency Security Guide + 9 Crypto Protection Tips; Norton: Tempe, AZ, USA, 2024. [Google Scholar]
- Orcutt, M. How Secure is Blockchain Really? MIT Technology Review: Cambridge, MA, USA, 2024. [Google Scholar]
- Al Sabah, M. Cryptocurrency Isn’t Private—But With Know-How, It Could Be; MIT Technology Review: Cambridge, MA, USA, 2024. [Google Scholar]
- Adams, J. CryptoCurrency Security Standard: The Full Compliance Guide; Doubloin: Berlin, Germany, 2024. [Google Scholar]
- Motlagh, F.N.; Hajizadeh, M.; Majd, M.; Najafi, P.; Cheng, F.; Meinel, C. Large Language Models in Cybersecurity: State-of-the-Art. arXiv 2024, arXiv:2402.00891. [Google Scholar]
- Ranade, P.; Piplai, A.; Joshi, A.; Finin, T. CyBERT: Contextualized Embeddings for the Cybersecurity Domain. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; pp. 3334–3342. [Google Scholar]
- Jin, J.; Tang, B.; Ma, M.; Liu, X.; Wang, Y.; Lai, Q.; Yang, J.; Zhou, C. Crimson: Empowering Strategic Reasoning in Cybersecurity through Large Language Models. arXiv 2024, arXiv:2403.00878. [Google Scholar]
- Ferrag, M.A.; Alwahedi, F.; Battah, A.; Cherif, B.; Mechri, A.; Tihanyi, N. Generative AI and Large Language Models for Cyber Security: All Insights You Need. arXiv 2024, arXiv:2405.12750v1. [Google Scholar]
- Wan, Z.; Cheng, A.; Wang, Y.; Wang, L. Information Leakage from Embedding in Large Language Models. arXiv 2024, arXiv:2405.11916. [Google Scholar]
- Xu, H.; Wang, S.; Li, N.; Wang, K.; Zhao, Y.; Chen, K.; Yu, T.; Liu, Y.; Wang, H. Large Language Models for Cyber Security: A Systematic Literature Review. arXiv 2024, arXiv:2405.04760. [Google Scholar]
- Kyadige, A.; Taoufiq, S. Benchmarking the Security Capabilities of Large Language Models. Sophos News, 18 March 2024. Available online: https://news.sophos.com/en-us/2024/03/18/benchmarking-the-security-capabilities-of-large-language-models/ (accessed on 25 January 2025).
- Gennari, J.; Lau, S.-h.; Perl, S.; Parish, J.; Sastry, G. Considerations for evaluating large language models for cybersecurity tasks. SEI Insights, 20 February 2024. Available online: https://www.cmu.edu/news/stories/archives/2024/april/sei-and-openai-recommend-ways-to-evaluate-large-language-models-for-cybersecurity-applications (accessed on 25 January 2025).
- Kucharavy, A.; Plancherel, O.; Mulder, V.; Mermoud, A.; Lenders, V. Large Language Models in Cybersecurity: Threats, Exposure and Safety; Springer: Berlin/Heidelberg, Germany, 2024. [Google Scholar]
- Sarker, I.H. Generative AI and Large Language Modeling in Cybersecurity. In AI-Driven Cybersecurity and Threat Intelligence; Springer: Cham, Switzerland; Berlin/Heidelberg, Germany, 2024. [Google Scholar] [CrossRef]
- GAO-23-105346; Blockchain in Finance: Legislative and Regulatory Actions Are Needed to Ensure Comprehensive Oversight of Crypto Assets. U.S. Government Accountability Office: Washington, DC, USA, 2023. Available online: https://www.gao.gov/products/gao-23-105346 (accessed on 25 January 2025).
- Zwilling, M.; Lesjak, D. The Future of Crypto currency: Gaps, Challenges, and Concerns. Issues Inf. Syst. 2023, 24, 58–70. Available online: https://api.semanticscholar.org/CorpusID:263210938 (accessed on 25 January 2025).
- Hallman, R.A. Can Large Language Models Improve Security and Confidence in Decentralized Finance? CAT Labs Blog, 2024. Available online: https://blog.catlabs.io/can-large-language-models-improve-security-and-confidence-in-decentralized-finance/ (accessed on 25 January 2025).
- Nasekin, S.; Chen, C.Y.H. Deep learning-based cryptocurrency sentiment construction. Digit Financ. 2020, 2, 39–67. [Google Scholar] [CrossRef]
- Janakiram, M.S.V. The Building Blocks of LLMs: Vectors, Tokens, Embeddings. The New Stack, 8 February 2024. Available online: https://thenewstack.io/the-building-blocks-of-llms-vectors-tokens-and-embeddings/ (accessed on 25 January 2025).
- Collins, S. How to Build a System of Experts with LLMs. Stephen Collins.tech, 21 November 2023. Available online: https://dev.to/stephenc222/how-to-build-a-system-of-experts-with-llms-2gn6 (accessed on 30 January 2025).
- Neves, M.C. LLM Mixture of Experts Explained. TensorOps, 29 January 2024. Available online: https://www.tensorops.ai/post/what-is-mixture-of-experts-llm (accessed on 30 January 2025).
- Xiao, Z.; Zhang, D.; Wu, Y.; Xu, L.; Wang, Y.J.; Han, X.; Fu, X.; Zhong, T.; Zeng, J.; Song, M.; et al. Chain-of-Experts: When LLMs Meet Complex Operations Research Problems. In Proceedings of the 11th International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024. [Google Scholar]
- Bornstein, M.; Radovanovic, R. Emerging Architectures for LLM Applications; Andreessen Horowitz: Menlo Park, CA, USA, 2023; Available online: https://a16z.com/emerging-architectures-for-llm-applications/ (accessed on 30 January 2025).
- Lo, W.W.; Kulatilleke, G.K.; Sarhan, M.; Layeghy, S.; Portmann, M. Inspection-L: Self-supervised GNN node embeddings for money laundering detection in bitcoin. Appl. Intell. 2023, 53, 19406–19417. [Google Scholar] [CrossRef]
- Li, S.; Zhou, J.; Mo, C.; Li, J.; Tso, G.K.F.; Tian, Y. Motif-Aware Temporal GCN for Fraud Detection in Signed Cryptocurrency Trust Networks. arXiv 2022, arXiv:2211.13123. [Google Scholar]
- McNally, S.; Roche, J.; Caton, S. Predicting the Price of Bitcoin Using Machine Learning. In Proceedings of the 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), Cambridge, UK, 21–23 March 2018; pp. 339–343. [Google Scholar]
- Zou, Y.; Herremans, D. PreBit—A multimodal model with Twitter FinBERT embeddings for extreme price movement prediction of Bitcoin. Expert Syst. Appl. 2023, 233, 120838. [Google Scholar]
- Conti, M.; Kumar, S.; Lal, C.; Ruj, S. A Survey on Security and Privacy Issues of Bitcoin. IEEE Commun. Surv. Tutorials 2018, 20, 3416–3452. [Google Scholar] [CrossRef]
- Kang, I.; Mridul, M.A.; Sanders, A.; Ma, Y.; Munasinghe, T.; Gupta, A.; Seneviratne, O. Deciphering Crypto Twitter. In Proceedings of the 16th ACM Web Science Conference, New York, NY, USA, 21–24 May 2024; pp. 331–342. [Google Scholar] [CrossRef]
- Anthony, N.T.; Shafik, M.; Kurugollu, F.; Atlam, H.F. Anomaly Detection in Ethereum Using Machine Learning. In Advances in Manufacturing Technology XXXV; IOS Press: Amsterdam, The Netherlands, 2022; pp. 311–316. [Google Scholar]
- Dasgupta, D.; Shrein, J.M.; Gupta, K.D. A survey of blockchain from security perspective. J. Bank Financ. Technol. 2019, 3, 1–17. [Google Scholar] [CrossRef]
- Turner, A.B. Addressing The Intelligence Applications of Bitcoin Payments Related to Ransomware. Ph.D. Thesis, Macquarie University, Ryde, NSW, Australia, 2022. [Google Scholar] [CrossRef]
- Ferrag, M.A.; Ndhlovu, M.; Tihanyi, N.; Cordeiro, L.C.; Debbah, M.; Lestable, T.; Thandi, N.S. Revolutionizing cyber threat detection with large language models: A privacy-preserving bert-based lightweight model for iot/iiot devices. IEEE Access 2024, 12, 23733–23750. [Google Scholar] [CrossRef]
- Ferrag, M.A.; Alwahedi, F.; Battah, A.; Cherif, B.; Mechri, A.; Tihanyi, N.; Bisztray, T.; Debbah, M. Generative AI in Cybersecurity: A Comprehensive Review of LLM Applications and Vulnerabilities. arXiv 2025, arXiv:2405.12750v2. [Google Scholar] [CrossRef]
- Omar, M. Detecting software vulnerabilities using Language Models. arXiv 2023, arXiv:2302.11773. [Google Scholar]
- Microsoft Security. Staying Ahead of Threat Actors in the Age of AI. Microsoft Security Blog, 14 February 2024. Available online: https://www.microsoft.com/en-us/security/blog/2024/02/14/staying-ahead-of-threat-actors-in-the-age-of-ai/ (accessed on 30 January 2025).
- Shalom, E.; David, G. Self-enhancing pattern detection with LLMs: Our answer to uncovering malicious packages at scale. Apiiro Blog, 13 July 2023. Available online: https://apiiro.com/blog/llm-code-pattern-malicious-package-detection/ (accessed on 30 January 2025).
- Hassanin, M.; Moustafa, N. A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions. arXiv 2024, arXiv:2405.14487. [Google Scholar]
- Zang, Y.; Li, W.; Han, J.; Zhou, K.; Loy, C.C. Contextual object detection with multimodal large language models. Int. J. Comput. Vis. 2025, 133, 825–843. [Google Scholar] [CrossRef]
- Sinha, R.; Elhafsi, A.; Agia, C.; Foutter, M.; Schmerling, E.; Pavone, M. Real-time anomaly detection and reactive planning with large language models. arXiv 2024, arXiv:2407.08735. [Google Scholar]
- Antonopoulos, A.M. Mastering Bitcoin: Unlocking Digital Cryptocurrencies, 2nd ed.; O’Reilly Media: Sebastopol, CA, USA, 2017. [Google Scholar]
- Lewis, A. The Basics of Bitcoins and Blockchains: An Introduction to Cryptocurrencies and the Technology That Powers Them; Mango Media: London, UK, 2018. [Google Scholar]
- Burniske, C.; Tatar, J. Cryptoassets: The Innovative Investor’s Guide to Bitcoin and Beyond; McGraw-Hill Education: New York, NY, USA, 2017. [Google Scholar]
- Drescher, D. Blockchain Basics: A Non-Technical Introduction in 25 Steps; Apress: Frankfurt, Germany, 2017. [Google Scholar]
- Ali, M.; Fromm, M.; Thellmann, K.; Rutmann, R.; Lübbering, M.; Leveling, J.; Klug, K.; Ebert, J.; Doll, N.; Buschhoff, J.; et al. Tokenizer choice for llm training: Negligible or crucial? In Findings of the Association for Computational Linguistics: NAACL 2024; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 3907–3924. [Google Scholar]
- Chen, Y.; Wang, X. Understanding LLM Embeddings: A Comprehensive Guide. Irisagent Blog, 17 May 2024. Available online: https://irisagent.com/blog/understanding-llm-embeddings-a-comprehensive-guide/ (accessed on 30 January 2025).
- Talamadupula, K. A Guide to LLM Inference Performance Monitoring. Symbl AI Blog, 4 March 2024. Available online: https://symbl.ai/developers/blog/a-guide-to-llm-inference-performance-monitoring/ (accessed on 30 January 2025).
- UbiOps. How to Benchmark and Optimize LLM Inference Performance. UbiOps, 3 May 2024. Available online: https://ubiops.com/benchmark-and-optimize-llm-inference-performance (accessed on 30 January 2025).
- Agarwal, M.; Qureshi, A.; Sardana, N.; Li, L.; Quevedo, J.; Khudia, D. LLM Inference Performance Engineering: Best Practices. Databricks Blog, 12 October 2023. Available online: https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices (accessed on 30 January 2025).
- Jing, Z.; Su, Y.; Han, Y. When Large Language Models Meet Vector Databases: A Survey. arXiv 2024, arXiv:2402.01763v1. [Google Scholar]
- Pan, J.J.; Wang, J.; Li, G. Survey of Vector Database Management Systems. arXiv 2023, arXiv:2310.14021. [Google Scholar] [CrossRef]
- Chavan, A.; Magazine, R.; Kushwaha, S.; Debbah, M.; Gupta, D. Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward. arXiv 2024, arXiv:2402.01799v1. [Google Scholar]
- Ferrer, J. Optimizing Your LLM for Performance and Scalability. KDnuggets, 9 August 2024. Available online: https://www.kdnuggets.com/optimizing-your-llm-for-performance-and-scalability (accessed on 30 January 2025).
- Dholariya, F. Reducing High Computational Costs in LLMs: Effective Strategies for Sustainable. Dexoc Blog, 10 October 2024. Available online: https://dexoc.com/blog/reducing-high-computational-costs-in-llm (accessed on 30 January 2025).
- Gupta, S.; Kumar, R.; Roy, M. Data Drift in LLMs—Causes, Challenges, Strategies. Nexla Blog, 2024. Available online: https://nexla.com/ai-infrastructure/data-drift/ (accessed on 25 January 2025).
- Cui, J.; Xu, Y.; Huang, Z.; Zhou, S.; Jiao, J.; Zhang, J. Recent Advances in Attack and Defense Approaches of Large Language Models. arXiv 2024, arXiv:2409.03274. [Google Scholar]
- Srinivasan, S.; Mahbub, M.; Sadovnik, A. Advancing NLP Security by Leveraging LLMs as Adversarial Engines. arXiv 2024, arXiv:2410.18215v1. [Google Scholar]
- Ribeiro, D. The Unspoken Challenges of Large Language Models. Deeper Insights Blog, 2 July 2024. Available online: https://deeperinsights.com/ai-blog/the-unspoken-challenges-of-large-language-models (accessed on 25 January 2025).
| Vector Database | Query Latency (ms) | Indexing Speed | Memory Efficiency |
|---|---|---|---|
| FAISS | 12.4 | High | Medium |
| Milvus | 15.8 | Medium | High |
| ChromaDB | 8.9 | High | Low |
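The query latencies above all come down to the same underlying operation: a nearest-neighbour search over stored embedding vectors. The sketch below illustrates that operation with a brute-force cosine-similarity scan over toy vectors; it is not the PoC code, and the document ids and 3-dimensional "embeddings" are invented stand-ins for real Nomic-Embed output. Production systems such as FAISS, Milvus, and ChromaDB replace the linear scan with approximate indexes (e.g., HNSW or IVF) to keep latency low at scale.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query, index, k=1):
    """Brute-force k-nearest-neighbour search over an in-memory index.

    `index` maps document ids to embedding vectors. This O(n) scan is
    what vector databases accelerate with approximate-nearest-neighbour
    index structures.
    """
    scored = sorted(index.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 3-dimensional "embeddings" standing in for real embedding output.
index = {
    "normal_tx": [0.9, 0.1, 0.0],
    "mixer_tx":  [0.1, 0.9, 0.2],
    "phishing":  [0.0, 0.2, 0.9],
}
print(nearest([0.85, 0.15, 0.05], index, k=1))  # → ['normal_tx']
```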
| Tool/Technology | Version | Role in PoC |
|---|---|---|
| Next.js | 13.4 | Frontend framework used for building a responsive and interactive user interface for the cryptocurrency security expert system. |
| Python | 3.12 | Core programming language used for backend development, implementing logic, and processing data. |
| Flask | 2.1 | Lightweight web framework for handling API requests and serving the backend of the expert system. |
| LLaMA | 3.2 | Large language model (LLM) used for natural language processing tasks and contextual analysis in threat detection. |
| Ollama | 0.3.14 | Platform used for hosting and managing LLMs, facilitating easy deployment and integration of language models. |
| Nomic-Embed | 1.2 | Embedding model used for generating high-dimensional vector representations of textual data, enabling efficient similarity searches and contextual understanding. |
| Chroma DB | 0.3.21 | Vector database used for storing and retrieving embedding vectors, supporting high-performance data management for the expert system. |
| LangChain | 0.0.199 | Framework for chaining together LLMs and other components, enabling modular workflows and seamless integration between different system parts. |
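The table above lists the components of the PoC; their interaction follows a standard retrieval-augmented flow: embed the user query, retrieve similar stored context from the vector database, and pass both to the LLM. The sketch below traces that flow with the external services (Nomic-Embed and LLaMA via Ollama, Chroma retrieval) replaced by stubs; every function name and return value here is illustrative, not the actual PoC implementation.

```python
# Illustrative request flow for the expert system backend. The three
# service calls are stubbed; in the PoC they would be handled by
# Ollama (embedding + generation) and Chroma DB (retrieval).

def embed(text: str) -> list[float]:
    # Stand-in for an embedding call; returns a toy vector.
    return [float(len(text) % 7), float(text.count("tx")), 1.0]

def retrieve(query_vector: list[float], top_k: int = 3) -> list[str]:
    # Stand-in for a vector-database similarity query.
    docs = ["historic transaction pattern A", "known phishing indicator B"]
    return docs[:top_k]

def generate(prompt: str) -> str:
    # Stand-in for an LLM completion.
    n_lines = prompt.count("\n") + 1
    return f"Analysis based on {n_lines} prompt lines."

def answer(user_query: str) -> str:
    """Embed the query, retrieve context, then prompt the LLM."""
    context = retrieve(embed(user_query))
    prompt = "Context:\n" + "\n".join(context) + "\nQuestion: " + user_query
    return generate(prompt)

print(answer("Is this tx pattern anomalous?"))
```

In the real system, LangChain provides the glue for exactly this chaining, and Flask exposes `answer` behind an API endpoint consumed by the Next.js frontend.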
| Method | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | Latency (ms) | Scalability |
|---|---|---|---|---|---|---|
| Rule-Based System | 79.4 | 76.8 | 81.1 | 78.9 | 450 | Low |
| Supervised ML Model | 85.2 | 83.4 | 87.1 | 85.2 | 320 | Medium |
| Blockchain Anomaly Detection | 88.9 | 87.0 | 90.2 | 88.5 | 290 | High |
| Proposed LLM Expert System | 92.0 | 89.3 | 93.2 | 91.2 | 210 | Very High |
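As a sanity check, the F1 scores in the table follow from the reported precision and recall via the harmonic mean, F1 = 2PR/(P + R); the short script below reproduces them from the table's values to within rounding of the inputs.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs as percentages)."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs from the comparison table above.
rows = {
    "Rule-Based System":            (76.8, 81.1),
    "Supervised ML Model":          (83.4, 87.1),
    "Blockchain Anomaly Detection": (87.0, 90.2),
    "Proposed LLM Expert System":   (89.3, 93.2),
}
for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1(p, r):.1f}")
```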
| Method | Dynamic Knowledge Retrieval | Real-Time Adaptability | Context-Aware Recommendations |
|---|---|---|---|
| Smart Contract Static Analysis | ✗ | ✗ | ✗ |
| Supervised Blockchain ML Models | ✗ | ✓ | ✗ |
| Graph-Based Anomaly Detection | ✓ | ✓ | ✗ |
| Proposed Expert System | ✓ | ✓ | ✓ |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Abdallah, A.A.; Aslan, H.K.; Abdallah, M.S.; Cho, Y.-I.; Azer, M.A. Enhancing Cryptocurrency Security: Leveraging Embeddings and Large Language Models for Creating Cryptocurrency Security Expert Systems. Symmetry 2025, 17, 496. https://doi.org/10.3390/sym17040496