

Best practices for management and operation of large HPC installations

Author(s) - Lathrop Scott; Mendes Celso; Enos Jeremy; Bode Brett; Bauer Gregory; Sisneros Roberto; Kramer William

Publication year - 2018

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.5069

Subject(s) - software deployment, best practice, supercomputer, computer science, resource, engineering management, installation, scale (ratio), operating system, engineering, management, computer network, physics, quantum mechanics, economics

Summary - To achieve their mission and goals, HPC centers continually strive to improve the effectiveness of their resources and services to best serve their constituencies. Collectively, the community has learned a great deal about how to manage and operate HPC centers, provide robust and effective services, and develop new communities, among other important aspects. Yet best practices are seldom catalogued to inform and guide the broader HPC community. To improve the situation, the Blue Waters project has documented sets of best practices adopted over the past five years for the deployment and operation of the Blue Waters leadership system, a large Cray XE6/XK7 supercomputer at NCSA. Those practices, described in this paper, cover managing and operating the system and its resources, supporting its users, and expanding the diversity of applications and communities. Although the technical practices are sometimes discussed relative to Cray systems and leadership‐scale systems, we believe that they would benefit the deployment and operation of other large HPC installations as well.
