An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer

Yang, Xi; Wu, Chengkun; Lu, Kai; Fang, Lin; Zhang, Yong; Li, Shengkang; Guo, Guixin; Du, YunFei

doi:10.3390/molecules22122116


by Xi Yang ¹, Chengkun Wu ¹,*, Kai Lu ¹, Lin Fang ², Yong Zhang ², Shengkang Li ², Guixin Guo ³ and YunFei Du ⁴,*

¹ College of Computer, National University of Defense Technology, Changsha 410073, China

² Beijing Genomics Institute (BGI) Shenzhen, Shenzhen 518083, China

³ National Supercomputing Center of Guangzhou, Guangzhou 510006, China

⁴ School of Data and Computer Science, Sun Yat-Sen University, Guangzhou 510000, China

* Authors to whom correspondence should be addressed.

Molecules 2017, 22(12), 2116; https://doi.org/10.3390/molecules22122116

Submission received: 25 October 2017 / Accepted: 29 November 2017 / Published: 1 December 2017

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))


Abstract

Big data, cloud computing, and high-performance computing (HPC) are on the verge of convergence. Cloud computing already plays an active part in big data processing with the help of big data frameworks such as Hadoop and Spark. The recent upsurge of high-performance computing in China provides additional possibilities and capacity to address the challenges associated with big data. In this paper, we propose Orion, a big data interface on the Tianhe-2 supercomputer, which enables big data applications to run on Tianhe-2 via a single command or a shell script. Orion supports multiple users, and each user can launch multiple tasks. Through automated configuration, it minimizes the effort needed to initiate big data applications on Tianhe-2. Orion follows the “allocate-when-needed” paradigm, which avoids the idle occupation of computational resources. We tested the utility and performance of Orion using a large genomic dataset and achieved satisfactory performance on Tianhe-2 with very few modifications to existing applications implemented in Hadoop/Spark. In summary, Orion provides a practical and economical interface for big data processing on Tianhe-2.

Keywords: big data; Tianhe-2; Hadoop; Spark; genomics big data

