
Handling Massive Proportion of Missing Labels in Multivariate Long-Term Time Series Forecasting
Author(s) -
Jr Cristovão Iglesias,
Varun Mehta,
Alina Venereo-Sánchez,
Xingge Xu,
Julien Robitaille,
Robert Voyer,
René Richard,
Nabil Belacel,
Amine Kamen,
Miodrag Bolić
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/2090/1/012170
Subject(s) - missing data , imputation (statistics) , computer science , interpolation (computer graphics) , kriging , multivariate statistics , data mining , domain knowledge , artificial intelligence , machine learning , time series , gaussian process , gaussian , motion (physics) , physics , quantum mechanics
Training Deep Learning (DL) models with missing labels is a challenge in diverse engineering applications. Missing value imputation methods have been proposed to address this problem, but their performance degrades under a Massive Proportion of Missing Labels (MPML). This paper presents an approach for handling MPML in Multivariate Long-Term Time Series Forecasting. It is a two-step process in which interpolation (using Gaussian Process Regression (GPR) and domain knowledge from experts) and the prediction model are separated to enable the integration of prior domain knowledge. First, a set of samples of possible interpolations of the missing outputs is generated by the GPR based on the domain knowledge. Second, the observed input sensor data and the interpolated labels from the GPR are used to train the prediction model. We evaluated our approach by developing a soft sensor, using one real dataset, to forecast the biomass during recombinant adeno-associated virus (rAAV) production in bioreactors. Our experimental results demonstrate the potential of the approach through quantitative evaluation of the generated forecasts in a case where training a DL model would be extremely difficult due to MPML.
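The two-step process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the RBF kernel and its hyperparameters are arbitrary, the expert domain knowledge is assumed to enter as a prior mean function (a hypothetical logistic growth curve), and a linear least-squares fit stands in for the DL prediction model. All function and variable names are illustrative.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=5.0, variance=1.0):
    """Squared-exponential kernel between 1-D time vectors a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gpr_sample_labels(t_obs, y_obs, t_all, prior_mean, n_samples=5,
                      noise=1e-4, seed=0):
    """Step 1: sample plausible label trajectories on t_all from the GP
    posterior, conditioned on the few observed labels."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(t_obs, t_obs) + noise * np.eye(len(t_obs))
    K_s = rbf_kernel(t_all, t_obs)
    K_ss = rbf_kernel(t_all, t_all)
    resid = y_obs - prior_mean(t_obs)
    mean = prior_mean(t_all) + K_s @ np.linalg.solve(K, resid)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    # Symmetrize and add jitter so sampling is numerically stable.
    cov = 0.5 * (cov + cov.T) + 1e-6 * np.eye(len(t_all))
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    return mean, samples

# Synthetic stand-in for a bioreactor run (assumption, not real data).
t_all = np.arange(100.0)
true_y = 10.0 / (1.0 + np.exp(-(t_all - 50.0) / 10.0))
rng = np.random.default_rng(1)
X = np.column_stack([true_y + 0.1 * rng.standard_normal(100),  # noisy sensor
                     t_all / 100.0])                            # elapsed time
obs_idx = np.array([0, 25, 50, 75, 99])  # only 5 of 100 labels observed

# Hypothetical expert prior: a roughly right, slightly misplaced growth curve.
def prior_mean(t):
    return 10.0 / (1.0 + np.exp(-(t - 45.0) / 12.0))

mean, samples = gpr_sample_labels(t_all[obs_idx], true_y[obs_idx],
                                  t_all, prior_mean)

# Step 2: train the prediction model on observed inputs paired with every
# sampled label trajectory (linear least squares as a placeholder model).
design = np.column_stack([X, np.ones(len(X))])
X_aug = np.tile(design, (samples.shape[0], 1))
y_aug = samples.reshape(-1)
w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
pred = design @ w
```

Keeping the two steps separate means the interpolation stage can be revised (for example, with a different prior mean or kernel) without touching the prediction model, which only ever sees complete input-label pairs.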