Linear regression model with histogram‐valued variables
Author(s) - Sónia Dias, Paula Brito
Publication year - 2015
Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.11260
Subject(s) - mathematics, histogram, linear regression, quantile regression, regression analysis, linear model, range (aeronautics), statistics, artificial intelligence, computer science, materials science, composite material, image (mathematics)
Histogram‐valued variables are a particular kind of variable studied in Symbolic Data Analysis: each entity under analysis is described by a distribution, which may be represented by a histogram or by a quantile function. Linear regression models for this type of data are necessarily more complex than a simple generalization of the classical model, because the parameters cannot be negative, yet the linear relation between the variables must still be allowed to be either direct or inverse. In this work, we propose a new linear regression model for histogram‐valued variables that solves this problem, named the Distribution and Symmetric Distribution Regression Model. Determining the parameters of this model requires solving a quadratic optimization problem subject to non‐negativity constraints on the unknowns; the error between the predicted and observed distributions is measured with the Mallows distance. As in classical analysis, the model is associated with a goodness‐of‐fit measure whose values range between 0 and 1. Applications of the proposed model to real and simulated data are presented.
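As an illustration of the error measure mentioned above, the following minimal sketch computes the squared Mallows distance between two histograms, that is, the integral over [0, 1] of the squared difference between their quantile functions. It rests on assumptions the abstract does not state: values are taken to be uniformly distributed within each subinterval, and both histograms are assumed to have been rewritten with the same number of subintervals and the same weights. The function name `mallows_distance_squared` and the data layout are hypothetical, chosen only for this example.

```python
def mallows_distance_squared(hist_x, hist_y, weights):
    """Squared Mallows distance between two histograms.

    hist_x, hist_y: lists of (lower, upper) subinterval bounds.
    weights: shared subinterval probabilities, summing to 1.

    Assumption (not from the abstract): values are uniform within each
    subinterval, so each quantile function is piecewise linear and the
    integral of the squared difference has a closed form per piece.
    """
    if not (len(hist_x) == len(hist_y) == len(weights)):
        raise ValueError("histograms and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    total = 0.0
    for (lx, ux), (ly, uy), p in zip(hist_x, hist_y, weights):
        # Centre and half-range of each subinterval.
        cx, rx = (lx + ux) / 2.0, (ux - lx) / 2.0
        cy, ry = (ly + uy) / 2.0, (uy - ly) / 2.0
        # On this piece the quantile difference is (cx - cy) + s * (rx - ry)
        # with s running uniformly over [-1, 1]; its mean square is
        # (cx - cy)^2 + (rx - ry)^2 / 3.
        total += p * ((cx - cy) ** 2 + (rx - ry) ** 2 / 3.0)
    return total


# Two histograms with the same shape, shifted by one unit.
x = [(0.0, 1.0), (1.0, 2.0)]
y = [(1.0, 2.0), (2.0, 3.0)]
shifted = mallows_distance_squared(x, y, [0.5, 0.5])

# Two single-bin histograms: uniform on [0, 2] versus uniform on [0, 4].
stretched = mallows_distance_squared([(0.0, 2.0)], [(0.0, 4.0)], [1.0])
```

In the first case only the centres differ, so the squared distance equals the squared shift. In the second case the quantile functions are 2t and 4t, and the closed form agrees with integrating (2t)^2 over [0, 1]. A regression fit of the kind described in the abstract would minimize a sum of such squared distances under non‐negativity constraints; that optimization step is not shown here.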