Cloud‐based simulation studies in R ‐ A tutorial on using doRedis with Amazon spot fleets
Author(s) - Hirschfeld G., Thiele C.
Publication year - 2019
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.8188
Subject(s) - computer science, cloud computing, installation, database, parallel computing, programming language, operating system
Simulation studies are helpful in testing novel statistical methods. From a computational perspective, they constitute embarrassingly parallel tasks. We describe parallelization techniques in the programming language R that can be used on Amazon's cloud‐based infrastructure. After a short conceptual overview of the parallelization techniques in R, we provide a hands‐on tutorial on how the doRedis package, in conjunction with a Redis server, can be used on Amazon Web Services, specifically with spot fleets. The tutorial proceeds in seven steps: (1) starting up an EC2 instance, (2) installing a Redis server, (3) using doRedis with a local worker, (4) using doRedis with a remote worker, (5) setting up instances that automatically fetch tasks from a specific master, (6) using spot fleets, and (7) shutting down the instances. As a basic example, we show how these techniques can be used to assess the effects of heteroscedasticity on the equal‐variance t‐test. Furthermore, we address several advanced issues, such as multiple conditions, cost management, and chunking.
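To make the basic example concrete, the following is a minimal sketch in base R of the kind of simulation the abstract mentions: estimating the Type I error rate of the equal‐variance t‐test when the variances are in fact unequal. It is not the authors' code. The function names `sim_pvalue` and `type1_rate`, the sample sizes, the standard deviations, and the number of replications are all assumptions chosen for illustration, and the sketch runs sequentially on one machine rather than on a Redis‐backed cluster.

```r
# Hypothetical helper: simulate one data set with equal true means and
# return the p-value of the equal-variance (Student) t-test.
sim_pvalue <- function(n1, n2, sd1, sd2) {
  x <- rnorm(n1, mean = 0, sd = sd1)
  y <- rnorm(n2, mean = 0, sd = sd2)
  t.test(x, y, var.equal = TRUE)$p.value
}

# Hypothetical helper: estimate the Type I error rate at level alpha
# from n_rep independent replications.
type1_rate <- function(n_rep, n1, n2, sd1, sd2, alpha = 0.05) {
  p <- replicate(n_rep, sim_pvalue(n1, n2, sd1, sd2))
  mean(p < alpha)
}

set.seed(1)

# Assumed conditions: equal variances versus a design in which the
# smaller group has the larger variance.
rate_equal   <- type1_rate(2000, n1 = 10, n2 = 40, sd1 = 1, sd2 = 1)
rate_unequal <- type1_rate(2000, n1 = 10, n2 = 40, sd1 = 4, sd2 = 1)

print(c(equal = rate_equal, unequal = rate_unequal))
```

Each replication is independent of the others, which is what makes the task embarrassingly parallel. In the setting the abstract describes, the replications would presumably be distributed across workers, for instance through a `foreach` loop with `%dopar%` after a doRedis backend has been registered. That part is left out here because it requires a running Redis server.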