
AI Training Datasets & Article 14 GDPR
Author(s) - Iakovina Kindylidi, Inês Antas de Barros
Publication year - 2021
Publication title - Revista de Direito, Estado e Telecomunicações
Language(s) - English
Resource type - Journals
eISSN - 1984-9729
pISSN - 1984-8161
DOI - 10.26512/lstr.v13i2.36253
Subject(s) - computer science, general data protection regulation, transparency, artificial intelligence, notice, obligation, machine learning, risk analysis, data protection, data science, computer security, law, political science, business
[Purpose] At the earliest stages of the AI lifecycle, the training, verification, and validation of machine learning and deep learning algorithms require vast datasets that usually contain personal data. This data, however, is not obtained directly from the data subjects, and very often the controller is not in a position to identify the data subjects, or such identification would require disproportionate effort. This situation raises the question of how the controller can comply with its obligation to inform the data subjects about the processing, especially when providing the information notice is impossible or requires a disproportionate effort. There is little to no guidance on the matter. The purpose of this paper is to address this gap by designing a clear risk-assessment methodology that controllers can follow when providing information to the data subjects is impossible or requires a disproportionate effort.

[Methodology] After examining, through a doctrinal analysis, the scope of the transparency principle, Article 14, and its proportionality exemption in the training and verification stages of machine learning and deep learning algorithms, we assess whether existing tools and methodologies can be adapted to accommodate the GDPR requirement of carrying out a balancing test, in conjunction with, or independently of, a DPIA.

[Findings] Based on an interdisciplinary analysis, comprising theoretical and descriptive material from legal and technological points of view, we propose a risk-assessment methodology as well as a series of risk-mitigating measures to ensure the protection of the data subjects' rights and legitimate interests while fostering the uptake of the technology.

[Practical Implications] The proposed balancing exercise and additional measures are designed to assist entities training or developing AI, especially SMEs, within and outside the EEA, that wish to ensure and demonstrate the data protection compliance of their AI-based solutions.