Scalable network analytics for characterization of outbreak influence in voluminous epidemiology datasets

Author(s) - Shah Naman, Malensek Matthew, Shah Harshil, Pallickara Shrideep, Pallickara Sangmi Lee

Publication year - 2018

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.4998

Subject(s) - computer science, scalability, data science, analytics, data mining, scale (ratio), spark (programming language), set (abstract data type), identification (biology), transmission (telecommunications), field (mathematics), geography, database, cartography, telecommunications, botany, mathematics, pure mathematics, biology, programming language

Summary

Planning for large-scale epidemiological outbreaks in livestock populations often involves executing compute-intensive disease spread simulations. To capture the probabilities of various outcomes, these simulations are executed several times over a collection of representative input scenarios, producing voluminous data. The resulting datasets contain valuable insights, including sequences of events that lead to extreme outbreaks. However, discovering and leveraging such information is also computationally expensive. In this study, we set out to achieve two goals: (1) providing a distributed framework for modeling disease transmission at scale using Spark, including improvements to the default GraphX partitioning strategy, and (2) giving planners and epidemiologists a means to analyze interactions between entities (herds) during simulated disease outbreaks. Using our disease transmission network (DTN), planners or analysts can isolate herds that have a disproportionate effect on epidemiological outcomes, enabling effective allocation of limited resources such as vaccinations and field personnel. We use a representative dataset to verify our approach and optimize the underlying graph partitioning algorithm to ensure the system will scale with increases in the dataset size or number of participating machines. Our analysis includes identification of influential herds as well as the creation of machine learning models for accurate classifications that generalize to other datasets.
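To make the idea of a disease transmission network more concrete, the following is a minimal single-machine sketch in Python, not the Spark/GraphX implementation the summary describes. It assumes that simulation output can be reduced to (source herd, infected herd) pairs and that a herd's influence can be approximated by the number of herds reachable downstream from it. The herd identifiers, the event format, the function names, and the influence measure are all illustrative assumptions rather than details taken from the paper.

```python
from collections import deque


def build_dtn(events):
    """Build a directed graph from (source_herd, infected_herd) pairs.

    The pair format is an assumption made for this sketch.
    """
    graph = {}
    for source, target in events:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())  # make sure leaf herds appear as nodes
    return graph


def downstream_count(graph, start):
    """Count herds reachable from `start` via breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        herd = queue.popleft()
        for nxt in graph.get(herd, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) - 1  # exclude the starting herd itself


def rank_herds(events):
    """Rank herds by downstream reach (an assumed influence proxy)."""
    graph = build_dtn(events)
    scores = {herd: downstream_count(graph, herd) for herd in graph}
    # Highest reach first; ties broken by herd identifier.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical transmission events from one simulated outbreak.
events = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("E", "F")]
ranking = rank_herds(events)
print(ranking[:3])
```

In this toy input, herd A ranks first because three other herds are reachable from it. A distributed version would need to partition the graph across machines, which is the part of the problem the summary says the authors address with their GraphX partitioning improvements.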
