Moving huge scientific datasets over the Internet
Author(s) -
Wantao Liu,
Brian Tieman,
Rajkumar Kettimuthu,
Ian Foster
Publication year - 2011
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.1779
Subject(s) - computer science , terabyte , petabyte , data transmission , robustness (computer science) , distributed computing , the internet , scheduling (computing) , transfer (computing) , big data , operating system , computer network
SUMMARY Modern scientific experiments can generate hundreds of gigabytes to terabytes or even petabytes of data, often maintained as large numbers of relatively small files. Frequently, these data must be disseminated to remote collaborators or computational centers for analysis. Moving such datasets with high performance and strong robustness, while providing a simple interface for users, is a challenging task. We present a data transfer framework comprising a high‐performance data transfer library based on GridFTP, an extensible data scheduler with four data scheduling policies, and a GUI that allows users to transfer their datasets easily, reliably, and securely. The system incorporates automatic tuning mechanisms that select at runtime the number of concurrent threads to use for transfers, as well as restart mechanisms for handling client, network, and server failures. Experimental results indicate that our data transfer system significantly improves transfer performance and recovers well from failures. Copyright © 2011 John Wiley & Sons, Ltd.
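As a rough illustration of the concurrency and restart ideas described in the abstract (not the paper's actual implementation), the sketch below moves a list of files with a configurable number of worker threads and retries failed files with a backoff delay. The function `transfer_file` is a hypothetical stand-in for a call into a GridFTP-based transfer library; here it is simply a local copy so the sketch runs as written.

import concurrent.futures
import os
import shutil
import time

def transfer_file(src: str, dst: str) -> None:
    # Hypothetical stand-in for one GridFTP file transfer.
    # In the paper's framework this role is played by a
    # GridFTP-based library; a local copy is used here so the
    # sketch is self-contained and runnable.
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    shutil.copy2(src, dst)

def run_transfers(files, dst_root, threads=4, retries=3, backoff=5.0):
    # `threads` mimics the framework's tunable concurrency level;
    # `retries`/`backoff` mimic its restart-on-failure behaviour.
    pending = list(files)
    for attempt in range(1, retries + 1):
        failed = []
        with concurrent.futures.ThreadPoolExecutor(max_workers=threads) as pool:
            futures = {pool.submit(transfer_file, f, os.path.join(dst_root, f)): f
                       for f in pending}
            for fut in concurrent.futures.as_completed(futures):
                name = futures[fut]
                try:
                    fut.result()
                except Exception as exc:
                    print(f"attempt {attempt}: {name} failed: {exc}")
                    failed.append(name)
        if not failed:
            return
        pending = failed
        time.sleep(backoff)  # wait before restarting the failed transfers
    raise RuntimeError(f"{len(pending)} files still failing after {retries} attempts")

A production version would also vary `threads` at runtime (the paper's automatic tuning) rather than fixing it per call, and would distinguish client, network, and server failures when deciding whether to restart.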
