Computational Resources and Infrastructures for a Novel Bioinformatics Laboratory: A Case Study
Abstract
1. Introduction
2. Case Description
2.1. Existing Resources and Adaptations
2.1.1. Dry-Lab Room Adaptation
2.1.2. Existing Computational Designs Adaptation
Desktop | Component | Specifications | Units (Number) |
---|---|---|---|
HP EliteDesk 800 G4 TWR | RAM | Kingston 16 GB DDR4-2666 MHz | 16 |
HP EliteDesk 800 G4 TWR | SSD NVMe | SSD NVMe WD M.2 PCIe 4.0 4 TB Black SN850X | 4 |
HP EliteDesk 800 G4 TWR | SSD NVMe | SSD WD M.2 PCIe 4.0 NVMe 1 TB Black SN850X | 2 |
G.Skill | SSD NVMe | SSD NVMe WD M.2 PCIe 4.0 4 TB Black SN850X | 1 |
G.Skill | SSD SATA | SSD 2.5″ SATA SAMSUNG 1 TB 870 EVO | 2 |
G.Skill | Kit Keyboard + mouse | Logitech MK120 | 1 |
Brand/Model | HP EliteDesk 800 G4 TWR | G.Skill |
---|---|---|
#Specimens | 4 | 1 |
Year | 2019 | 2019 |
MB | Intel Q370 PCH-H—vPro | ASUS TUF B450-PLUS GAMING |
CPU | Intel Core i7-8700 (6C12T) | AMD Ryzen 7 2700X (8C16T) |
RAM | Kingston 64 GB DDR4-2666 MHz | G.Skill AEGIS 64 GB DDR4-2400 MHz |
Storage 1 | SSD NVMe Samsung PM981 256 GB | 2x SSD SATA Samsung 1 TB 870 EVO |
Storage 2 | SSD NVMe WD 4 TB SN850X | SSD NVMe WD 4 TB SN850X + HDD Toshiba P300 2 TB 7200 rpm |
GPU | NVIDIA GeForce GT 730 | NVIDIA GeForce RTX 2060 SUPER |
ECC | ns | ns |
2.2. Novel Computational Resources
2.2.1. NAS Servers—The Centralized Data Storage Solution
Component | Specification | Units (Number) |
---|---|---|
Motherboard | Supermicro X12DPL-I6 (dual socket) | 1 |
CPU | Intel Xeon Gold 6326 16C32T 2.9 GHz 24 MB | 1 |
RAM | DIMM Samsung 32 GB DDR4 3200 MHz ECCR 2Rx4 (128 GB) | 4 |
Boot drives (mirror) | SSD Samsung 240 GB PM893; SSD Samsung 240 GB SM883 | 1 + 1 |
HBA controller | Broadcom 9500-8e Tri-Mode PCI-e 4.0; Broadcom 9500-8i Tri-Mode PCI-e 4.0 | 1 + 1 |
Enclosure (main 8 HDD slots) | HDD Seagate Exos 7e8 6 TB SATA3; HDD Seagate Exos 7e10 6 TB SATA3 | 4 + 4 |
Enclosure (JBOD 24 HDD slots) | HDD Seagate Exos 7e8 8 TB SATA3; HDD Seagate Exos 7e10 8 TB SATA3 | 5 + 5 |
NIC | Broadcom P210TP dual port 10GBE BASE-T | 1 |
UPS | Riello SDH 3000VA/2700W | 1 |
Extra Boot drives | SSD Samsung 240 GB PM893; SSD Samsung 240 GB SM883 | 1 + 1 |
Extra Storage drives (main 2 slots used) | HDD Seagate Exos 7e8 6 TB SATA3; HDD Seagate Exos 7e10 6 TB SATA3 | (1 + 1) * |
Extra Storage drives (JBOD 2 slots used) | HDD Seagate Exos 7e8 8 TB SATA3; HDD Seagate Exos 7e10 8 TB SATA3 | (1 + 1) * |
Component | Specification | Units (Number) |
---|---|---|
Motherboard | Supermicro X12STL-F | 1 |
CPU | Intel Xeon E-2336 6C12T 2.9 GHz 12 MB | 1 |
RAM | Samsung 32 GB DDR4 3200 MHz ECC UNBUFFERED | 1 |
Boot drives | Samsung 256 GB PM9A1 PCI-E 4.0 | 1 |
HBA controller | Broadcom 9341-8i SATA/SAS3 | 1 |
Enclosure (main 8 HDD slots) | HDD Seagate Exos 7e8 6 TB SATA3; HDD Seagate Exos 7e10 6 TB SATA3 | 4 + 4 |
UPS | Riello SDH 2200VA/1980W | 1 |
Extra Storage drives (main 2 slots used) | HDD Seagate Exos 7e8 6 TB SATA3; HDD Seagate Exos 7e10 6 TB SATA3 | (1 + 1) * |
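As a quick consistency check on the two NAS builds listed above, the raw HDD capacity of each server follows directly from the drive counts and sizes in the enclosure rows. The sketch below is illustrative only: the `Bay` structure and the function name are hypothetical, the spare drives marked with an asterisk are assumed to sit outside the storage pools, and no RAID or file-system overhead is modeled, so the space actually available to users will be lower.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bay:
    """One group of identical drives in an enclosure (hypothetical helper)."""
    label: str
    drives: int
    size_tb: int


# Drive counts and sizes copied from the enclosure rows of the two tables.
# DC = Data Center server, DCB = Data Center Backup server.
DC = [Bay("main enclosure", 4 + 4, 6), Bay("JBOD enclosure", 5 + 5, 8)]
DCB = [Bay("main enclosure", 4 + 4, 6)]


def raw_capacity_tb(bays):
    """Sum drives x size over all bays; ignores redundancy and spares."""
    return sum(bay.drives * bay.size_tb for bay in bays)


if __name__ == "__main__":
    for name, bays in (("DC", DC), ("DCB", DCB)):
        parts = " + ".join(f"{b.drives * b.size_tb} TB" for b in bays)
        print(f"{name}: {parts} = {raw_capacity_tb(bays)} TB raw")
```

Under these assumptions the totals agree with the HDD figures listed for the two servers in the hardware specification table (48 TB for the backup server, 48 TB + 80 TB for the main server).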
2.2.2. Servers RAID Configurations and Novel OS Requirements
2.2.3. Desktop Computers
Component | Specification | Units (Number) |
---|---|---|
Motherboard | Supermicro X13SAE | 1 |
CPU (a) | Intel Core i9-13900K 24C32T 3.0 GHz 36 MB | 1 |
RAM | Hynix 32 GB DDR5 4.8 GHz ECC Unbuffered | 2 |
Storage 1 (b) | Western Digital SN850X 1TB NVMe | 1 |
Storage 2 | Samsung U.2 PM9A3 3.84 TB PCI-E 2.5″ | 1 |
GPU | Intel UHD 770 | 1 |
Monitor | Philips C-LINE 279C9 27″ 4K | 1 |
UPS (c) | Riello NPW 1500VA/900W; Riello NPW 1000VA/600W | 1 + 2 |
2.2.4. Workstation—The Higher-Throughput Computer
Component | Specification | Units (Number) |
---|---|---|
Motherboard | Supermicro X12DAI-N6 (dual socket) | 1 |
CPU | Intel Xeon Gold 5320 26C52T 2.2 GHz (52C104T) | 2 |
RAM | Samsung 32 GB DDR4 3200 MHz ECCR 2Rx4 (384 GB) | 12 |
Storage 1 (*) | Western Digital SN850X 1 TB SSD NVMe | 2 |
Storage 2 | Samsung U.2 PM9A3 3.84 TB SSD PCI-E 2.5″ | 1 |
Storage 3 | HDD Seagate EXOS 7e10 4 TB SATA3 | 1 |
GPU | PNY QUADRO T400 4 GB | 1 |
Monitor | Philips 329P1H 31.5″ 4K | 1 |
UPS | Riello NPW 2000VA/1200W | 1 |
2.2.5. Network and Connectivity
Component | Specification | Units (Number) | Cost (EUR) |
---|---|---|---|
Managed switch (a) | Netgear M4300-28G GSM4328S-100NES 24X1GBE + 4X10GBE | 1 | 1519.05 |
Switch (b) | TP-LINK SG-105 1GBE | 3 | 67.16 |
Copper Cables (a) | Fscom cat8—1.5m | 2 | 49.20 |
GBIC (a) | Netgear Prosafe 10G SR SFP+ Multi Mode; Fscom 10GBE SFP+ > 10GBE Base-T; Fscom 10GBE SFP+ | 1 + 1 + 1 | 455.10; 227.55; 47.97 |
Power strips (not final) (a) | Nexus Energy Ruler 5 Outlets White Without Switch | 9 | 46.38 |
Power strips (final) (b) | Monolyth 3050001 19″ 8 sockets with switch and 16A Schuko cable | 9 | 203.02 |
Fiber optics cable (c) | Patch cord FO LC-LC duplex OM3 70 m | 1 | 124.29 |
Fiber optics adapters (d) | Coupler LC/LC multi-mode OM3; Cable fiber optics LC/SC 1 m multi-mode OM3 | 1 + 1 | 15.21 |
2.3. Total and Final Costs
Category | Cost (EUR) | Year | Company | Reference Material |
---|---|---|---|---|
Novel computational resources | 51,482.19 | 2023 | Soon—Business Solutions, S.A./Nexus Solutions, S.A., Maia, Portugal | Table 4, Table 5, Table 6 and Table 7, Table 8 minus (a,c,d) and Supplementary Materials S2: Proposal S2. |
Upgrades to existing desktops | 3468.63 | 2024 | RuiPolana/Eurobit—Sistemas Informaticos e Manutenção, Lda, Covilhã, Portugal | Table 2 and Table 8 (d) and Supplementary Materials S4. |
Dry lab adaptation material | 3508.14 | 2023 | L3W Material Eléctrico, Lda, Famalicão, Portugal | Supplementary Materials S5. |
Additional connectivity (fiber cable) | 124.29 | 2023 | Copper2Fiber—Soluções de Conectividade, Lda, Barcarena, Portugal | Table 8 (c) and Supplementary Materials S3. |
Total | 58,583.25 | - | - | - |
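The total in the table above can be reproduced from its four category rows. The following is a minimal sketch, not part of the original accounting: the dictionary keys mirror the table's category names, `Decimal` is used to avoid binary floating-point rounding on currency values, and the percentage breakdown is an added illustration rather than a figure reported in the source.

```python
from decimal import Decimal

# Category costs in EUR, copied from the table above.
COSTS_EUR = {
    "Novel computational resources": Decimal("51482.19"),
    "Upgrades to existing desktops": Decimal("3468.63"),
    "Dry lab adaptation material": Decimal("3508.14"),
    "Additional connectivity (fiber cable)": Decimal("124.29"),
}
REPORTED_TOTAL_EUR = Decimal("58583.25")


def total_cost(costs):
    """Exact sum of all category costs."""
    return sum(costs.values(), Decimal("0"))


def shares_percent(costs):
    """Share of each category in the total, rounded to one decimal place."""
    total = total_cost(costs)
    return {
        name: (value / total * 100).quantize(Decimal("0.1"))
        for name, value in costs.items()
    }


if __name__ == "__main__":
    print("Computed total (EUR):", total_cost(COSTS_EUR))
    print("Matches reported total:", total_cost(COSTS_EUR) == REPORTED_TOTAL_EUR)
    for name, share in shares_percent(COSTS_EUR).items():
        print(f"{name}: {share} %")
```

The computed sum equals the reported total, so the four rows are internally consistent.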
3. Discussion and Evaluation
3.1. General Remarks
3.2. Computational Resources and Bioinformatics Applications
System | HS | Desktop | Desktop | Desktop | Workstation | Server | Server |
---|---|---|---|---|---|---|---|
Brand | na | HP | G.Skill | Nexus | Nexus | Nexus DCB | Nexus DC |
OS | na | Kubuntu 24.04 LTS | Kubuntu 24.04 LTS | Kubuntu 24.04 LTS | Kubuntu 24.04 LTS | TrueNAS SCALE 24.04.2 | TrueNAS SCALE 24.04.2 |
Kernel | na | 6.8.0-41-generic #41-Ubuntu | 6.8.0-41-generic #41-Ubuntu | 6.8.0-41-generic #41-Ubuntu | 6.8.0-41-generic #41-Ubuntu | 6.6.32-production+truenas #1 | 6.6.32-production+truenas #1 |
CPU | Cores/Threads | 6C12T | 8C16T | 24C32T | 52C104T (2x 26C52T) | 6C12T | 16C32T |
CPU | Clock Frequency (GHz) | 3.2–4.6 | 3.7–4.35 | 2.2–5.8 | 2.2–3.4 | 2.9–4.8 | 2.9–3.5 |
CPU | Cache (MB) | 12 | 16 | 36 | 78 (2x 39) | 12 | 24 |
RAM | Standard | DDR4 | DDR4 | DDR5 | DDR4 | DDR4 | DDR4 |
RAM | Clock Frequency (GHz; installed) | 2.6 | 2.4 | 4.8 | 3.2 | 3.2 | 3.2 |
RAM | Clock Frequency (GHz; working) | 2.6 | 2.4 | 4.4 | 2.9 | 3.2 | 3.2 |
RAM | Size (GB) | 64 | 64 | 64 | 384 (2x 192) | 32 | 128 |
ECC | Supported | No | No | Yes | Yes | Yes | Yes |
Storage | Usable SSD | 4 TB | 4 TB | 3.84 TB | 3.84 TB | na | na |
Storage | Usable HDD | na | 2 TB | na | 4 TB | 48 TB | 48 TB + 80 TB |
Storage | Total (*) | 4 TB + 256 GB (#) | 6 TB + 2 TB (#,$) | 3.84 TB + 1 TB (#) | 7.84 TB + 2 TB (#,$) | 48 TB + 256 GB ($) | 128 TB + 480 GB ($) |
Storage | 4Kn Supported Drives | All | Usable SSD | All | All | Usable HDD | Usable HDD |
3.3. Computational Resources Performance Evaluation
Benchmark (*) | Benchmark (*) | Desktop | Desktop | Desktop | Workstation | Server | Server |
---|---|---|---|---|---|---|---|
Run | Mode | HP | G.Skill | Nexus | Nexus | Nexus DCB | Nexus DC |
1 | Single-core | 999.6 | 1058.4 | 1920.4 | 1534.5 | 1666.6 | 1214.1 |
1 | Multi-core | 5935.7 | 6358.5 | 15755.8 | 12784.6 | 9424.0 | 17088.8 |
2 | Single-core | 966.0 | 1054.8 | 1923.0 | 1535.0 | 1676.2 | 1206.3 |
2 | Multi-core | 5947.7 | 6371.6 | 15822.5 | 12360.5 | 9445.1 | 17174.3 |
3 | Single-core | 979.4 | 1053.8 | 1919.2 | 1525.9 | 1666.0 | 1195.8 |
3 | Multi-core | 5947.7 | 6371.0 | 15870.8 | 12655.6 | 9449.0 | 17029.0 |
Average | Single-core | 981.7 | 1055.7 | 1920.9 | 1531.8 | 1669.6 | 1205.4 |
Average | Multi-core | 5943.7 | 6367.0 | 15816.4 | 12600.2 | 9439.4 | 17097.4 |
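The "Average" rows of the benchmark table are the arithmetic means of the three runs, rounded to one decimal place. The sketch below recomputes them from the per-run scores; the list layout and function names are hypothetical, and the multi-core to single-core ratio is an added rough indicator of how well each machine scales across cores, not a metric reported in the source.

```python
from statistics import mean

# Column order follows the table: three desktops, the workstation, two servers.
SYSTEMS = [
    "Desktop HP", "Desktop G.Skill", "Desktop Nexus",
    "Workstation Nexus", "Server Nexus DCB", "Server Nexus DC",
]

# Scores for runs 1 to 3, copied from the table above.
SINGLE_CORE = [
    [999.6, 1058.4, 1920.4, 1534.5, 1666.6, 1214.1],
    [966.0, 1054.8, 1923.0, 1535.0, 1676.2, 1206.3],
    [979.4, 1053.8, 1919.2, 1525.9, 1666.0, 1195.8],
]
MULTI_CORE = [
    [5935.7, 6358.5, 15755.8, 12784.6, 9424.0, 17088.8],
    [5947.7, 6371.6, 15822.5, 12360.5, 9445.1, 17174.3],
    [5947.7, 6371.0, 15870.8, 12655.6, 9449.0, 17029.0],
]


def column_means(runs):
    """Mean score per system across runs, rounded to one decimal place."""
    return [round(mean(column), 1) for column in zip(*runs)]


def multi_to_single_ratio(single_means, multi_means):
    """Ratio of multi-core to single-core mean score for each system."""
    return [round(m / s, 2) for s, m in zip(single_means, multi_means)]


if __name__ == "__main__":
    single = column_means(SINGLE_CORE)
    multi = column_means(MULTI_CORE)
    ratios = multi_to_single_ratio(single, multi)
    for name, s, m, r in zip(SYSTEMS, single, multi, ratios):
        print(f"{name}: single={s}, multi={m}, multi/single={r}")
```

The recomputed means match the "Average" rows of the table for all six systems.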
3.4. Comparison with EMBL-EBI Course Material from 2024
Development Areas | 2003 | 2020 | This Work |
---|---|---|---|
Hardware resources (servers, cluster) | Yes | Yes | Yes * |
Local copies of public databases | Yes | No | No * |
Disk storage | Yes | Yes | Yes |
Help-desk via email | Yes | Yes | Yes |
Web-site (help documentation) | Yes | Yes | No * |
Web-site (web-based programs) | Yes | No | No * |
One-to-one tutorial sessions | Yes | Yes | No * |
Formal teaching (short practical training courses) | Yes | Yes | No * |
Formal teaching (undergraduate/masters teaching) | Yes | Yes | Yes * |
Development of scripts, pipelines, and interfaces—research | Yes | Yes | Yes * |
Project-based consultation and collaboration | Yes | Yes | Yes * |
Brokering, skills sharing, advocacy | Yes | Yes | No * |
Grant writing | No | Yes | Yes |
Project-specific databases | Yes | Yes | No * |
Sample-tracking/LIMS development | No | Yes | No |
Analysis as a service | Yes | Yes | No * |
Support for other Bioinformatics and core facilities | No | Yes | No * |
Web hosting | No | Yes | No * |
Tissue banking data infrastructure—biobank | No | Yes | Yes * |
Cybersecurity ** | - | - | Yes * |
Data sensitivity, privacy and protection ** | - | - | No * |
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
4Kn | 4KB native sector size |
AF | Advanced Format |
AWS | Amazon Web Services |
BCF | Bioinformatics Core Facility |
C4-UBI | Cloud Computing Competence Centre |
CICS-UBI | Health Sciences Research Centre |
CICS-LTIS | Local Technical Informatics Services |
CPU | Central Processing Unit |
DC | Data Center server |
DCB | Data Center Backup server |
DDR | Double Data Rate |
dRAID | declustered RAID |
ECC | Error Correction Code |
EMBL-EBI | European Molecular Biology Laboratory-European Bioinformatics Institute |
GBE | Gigabit Ethernet |
GBIC | Gigabit Interface Converter |
GPU | Graphics Processing Unit |
HBA | Host Bus Adapter controller |
HDD | Hard Disk Drive |
HP | Hewlett-Packard |
HPC | High-Performance Computing |
HS | Hardware Specification |
HTC | High-Throughput Computing |
IPMI | Intelligent Platform Management Interface |
JBOD | Just-a-Bunch-of-Disks |
LIMS | Laboratory Information Management System |
MAC | Medium Access Control address |
MB | Motherboard |
MMAP | Multi-core Method for Analysis Pipelines |
NAS | Network-Attached Storage |
NGS | Next-Generation Sequencing |
NIC | Network Interface Card |
NVMe | Non-Volatile Memory express |
OpenZFS | Open Zettabyte File System |
OS | Operating system |
PAS | Private Accounting Storage |
RAID | Redundant Array of Independent Disks |
RAM | Random Access Memory |
RJ45 | Registered Jack-45 |
SAS | Serial Attached SCSI |
SATA | Serial Advanced Technology Attachment |
SFP+ | Small Form-Factor Pluggable + |
SSD | Solid-State Drive |
UBI | University of Beira Interior |
UBI-CIS | Coordinating Informatics Services |
UBI-TIS | Technical Infrastructures Services |
UHD | Ultra-High Definition |
UPS | Uninterruptible Power Supply |
VLAN | Virtual Local Area Network |
WD | Western Digital |
WES | Whole-Exome Sequencing |
WGS | Whole-Genome Sequencing |
- Lucas, K. Kdlucas/Byte-Unixbench—A Unix Benchmark Suite. Available online: https://github.com/kdlucas/byte-unixbench (accessed on 27 August 2024).
Brand/Model | HP EliteDesk 800 G4 TWR | G.Skill |
---|---|---|
#Specimens | 4 | 1 |
Year | 2019 | 2019 |
MB | Intel Q370 PCH-H—vPro | ASUS TUF B450-PLUS GAMING |
CPU | Intel Core i7-8700 (6C12T) | AMD Ryzen 7 2700X (8C16T) |
RAM | Samsung 16 GB DDR4-2666 MHz | G.Skill AEGIS 64 GB DDR4-2400 MHz |
Storage 1 | SSD NVMe Samsung PM981 256 GB | SSD SATA BlueRay M8S 240 GB |
Storage 2 | na | HDD Toshiba P300 2 TB 7200 rpm |
GPU | NVIDIA GeForce GT 730 | NVIDIA GeForce RTX 2060 SUPER |
ECC | ns | ns |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Maldonado, E.; Lemos, M.C. Computational Resources and Infrastructures for a Novel Bioinformatics Laboratory: A Case Study. Technologies 2025, 13, 285. https://doi.org/10.3390/technologies13070285