Runtime Verification of Scientific Computing: Towards an Extreme Scale.
Author(s) - Minh Ngoc Dinh, Chao Jin, David Abramson, Clinton L. Jeffery
Publication year - 2016
Publication title - 2016 5th Workshop on Extreme-Scale Programming Tools (ESPT)
Language(s) - English
DOI - 10.1109/espt.2016.5
Relative debugging helps trace software errors by comparing two concurrent executions of a program: a reference version and a faulty version. By locating data divergence between the runs, relative debugging is effective at finding coding errors when a program is scaled up to solve larger problems or migrated from one platform to another. In this work, we envision changes to our current relative debugging scheme to address exascale factors such as an increase in faults and nondeterministic outputs. First, we propose a statistics-based comparison scheme to support verifying stochastic results. Second, we leverage a scalable data reduction network to adapt to the complex network hierarchy of an exascale system, and extend our debugger to support the statistics-based comparison in an environment subject to failures.
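The abstract does not say which statistical test the comparison scheme uses. As a minimal sketch of the general idea only, the Python below assumes that each run yields several samples of the same stochastic variable and flags a divergence with Welch's t-statistic. The function names `welch_t` and `diverges` and the threshold of 3.0 are hypothetical choices made for illustration; they are not taken from the paper or from the authors' debugger.

```python
import math
import statistics


def welch_t(reference, suspect):
    """Return Welch's t-statistic for two independent samples.

    Each sample needs at least two values so a variance can be estimated.
    """
    n_ref, n_sus = len(reference), len(suspect)
    mean_ref = statistics.fmean(reference)
    mean_sus = statistics.fmean(suspect)
    var_ref = statistics.variance(reference)
    var_sus = statistics.variance(suspect)
    std_err = math.sqrt(var_ref / n_ref + var_sus / n_sus)
    if std_err == 0.0:
        # Both samples are constant: equal means agree, unequal means diverge.
        return 0.0 if mean_ref == mean_sus else math.inf
    return (mean_ref - mean_sus) / std_err


def diverges(reference, suspect, threshold=3.0):
    """Flag a divergence when |t| exceeds the (hypothetical) threshold."""
    return abs(welch_t(reference, suspect)) > threshold


# Illustrative data: the suspect run is shifted by 1.0 relative to the reference.
reference_run = [1.0, 1.1, 0.9, 1.05, 0.95]
suspect_run = [2.0, 2.1, 1.9, 2.05, 1.95]
print(diverges(reference_run, suspect_run))
```

A bitwise comparison would report a mismatch between any two stochastic runs, even correct ones. Comparing summary statistics instead tolerates run-to-run noise and reports only shifts that are large relative to the observed variance.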