Big Data and the Quality Profession
Author(s) - Montgomery, Douglas C.
Publication year - 2014
Publication title - Quality and Reliability Engineering International
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.913
H-Index - 62
eISSN - 1099-1638
pISSN - 0748-8017
DOI - 10.1002/qre.1669
Over the last 20 or so years, the quality profession has moved from what Juran once called the 'little q' view of quality to the 'big Q'. By 'little q', he meant that we focused rather narrowly on issues related to product quality, and most quality engineers and other quality professionals worked in the manufacturing sector of our economy. 'Big Q' represents the shift in thinking that has occurred, where quality tools are used to improve all aspects of a business, not just products, and are no longer restricted to manufacturing businesses. Six Sigma and Lean Six Sigma have played a big role in expanding the use of quality improvement techniques into a broader set of business improvement techniques. There is another 'small to large' movement underway that can have the same impact on quality engineers, statisticians, and other quality professionals. I am referring, of course, to what is popularly called the 'big data' revolution. Big data refers to collections of data sets that are growing so large that traditional information processing systems and analytical techniques have become inadequate to deal with them usefully. The volume of data being collected and stored is astounding and is expected to increase by another order of magnitude in the next year or two. There is also a velocity dimension: these data are being collected, stored, transmitted, and processed at increasing speeds. Finally, there is the variety of data being collected; numerical data, transaction data, documents, emails, voice/phone records, and video are examples. There is a dark side to this. The recent disclosure of the National Security Agency's phone record and email surveillance programs is an example. The availability of big data presents many opportunities for invasion of privacy, and possibly for criminal elements to use these data to commit fraud and other illegal activities.
However, there are many positive aspects of the big data era, and quality professionals can play an important role in it. Some examples include: improved ability to predict product failure modes, so that automobiles (for example) can be recalled and repaired before drivers are injured or killed; fraud detection that predicts which transactions or applications are fraudulent; computer network protection that predicts which Internet traffic originates from sources likely to damage your computer system, or which emails are spam and can be diverted to a spam folder; prediction of customer retention, with assistance in developing retention strategies; and prediction of which individuals are at higher risk of certain illnesses, to improve healthcare options. Notice that an essential component of all of these big data applications is prediction. To be successful in the world of big data, quality professionals are going to have to master some of the skills of computer science, such as understanding the structure of large databases, basic data mining techniques, image processing, and data visualization techniques. Regression analysis is one of the most widely used techniques for prediction in the big data era. Techniques beyond standard linear regression are frequently necessary: logistic regression for binary data and methods for modeling count data are widely used. Time-series methods are important, because much of the available big data has a time dimension, and predicting changes over time can be critical. Finally, it is also important to realize that most techniques for mining and analyzing big data simply exploit the correlative structure of the data. And as we all learned back in basic statistics, correlation does not imply causality. But with enough data and with sufficiently strong correlations that hold up over time, useful business decisions can often be made.
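To make the logistic regression point concrete, here is a minimal sketch of fitting a binary-outcome predictor by gradient descent. The data are entirely hypothetical (a scaled transaction feature and a fraud flag, invented for illustration); a practitioner would normally use an established statistical package rather than hand-coded optimization, but the mechanics are the same.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit w, b so that P(y=1 | x) = sigmoid(w*x + b), by batch gradient descent
    on the log-loss. xs: feature values; ys: 0/1 labels."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            gw += (p - y) * x   # gradient of log-loss w.r.t. w
            gb += (p - y)       # gradient of log-loss w.r.t. b
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Hypothetical training data: scaled transaction amounts and fraud flags.
xs = [0.2, 0.5, 0.9, 1.5, 2.1, 2.8, 3.3, 3.9]
ys = [0,   0,   0,   0,   1,   1,   1,   1]

w, b = fit_logistic(xs, ys)

def predict(x):
    """Predicted probability that a transaction with feature x is fraudulent."""
    return sigmoid(w * x + b)
```

Large transaction values receive a high predicted fraud probability and small values a low one; the same fitted model could then score new, unseen transactions, which is the predictive use that the editorial emphasizes.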
The output of data mining activities should at least be a set of hypotheses that potentially can be tested by more rigorous means, including designed experiments. To paraphrase John Tukey, it is a great time to be a statistician because you get to play in everybody else’s backyard. And the world of big data is a very big backyard for us to explore.
