
From "F" to "A" on the New York Regents Science Exams: An Overview of the Aristo Project
Author(s) - Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, Dirk Groeneveld, Michal Guerquin, Michael Schmitz
Publication year - 2020
Publication title - AI Magazine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.597
H-Index - 79
eISSN - 2371-9621
pISSN - 0738-4602
DOI - 10.1609/aimag.v41i4.5304
Artificial intelligence has achieved remarkable mastery over games such as chess, Go, and poker, and even Jeopardy!, but the rich variety of standardized exams has remained a landmark challenge. Even as recently as 2016, the best artificial intelligence system could achieve only 59.3 percent on an eighth-grade science exam (Schoenick et al. 2017). This article reports success on the Grade 8 New York Regents Science Exam, where, for the first time, a system scores more than ninety percent on the exam's non-diagram, multiple-choice questions. In addition, our Aristo system, building upon the success of recent language models, exceeded eighty-three percent on the corresponding Grade 12 Science Exam's non-diagram, multiple-choice questions. The results, on unseen test questions, are robust across different test years and different variations of this kind of test. They demonstrate that modern natural language processing methods can achieve mastery of this task. While this is not a full solution to general question answering (the questions are limited to eighth-grade multiple-choice science), it represents a significant milestone for the field.
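The percentages reported above are scores on multiple-choice questions. The abstract does not say how answers are selected or scored, so the following is only a minimal sketch under stated assumptions. It assumes that a system assigns a numeric score to each answer option and picks the highest-scoring one. It also assumes a tie rule in which a k-way tie that includes the correct option earns 1/k credit. That rule is a convention used by some science-QA benchmarks and is not stated in this record. All function names and scores are hypothetical and are not Aristo's actual outputs or code.

```python
from typing import Dict, List, Sequence


def select_answers(option_scores: Dict[str, float]) -> List[str]:
    """Return every option tied for the highest score (usually just one)."""
    best = max(option_scores.values())
    return sorted(label for label, s in option_scores.items() if s == best)


def question_credit(chosen: Sequence[str], correct: str) -> float:
    """Assumed tie rule: 1/k credit for a k-way tie that includes the key."""
    return 1.0 / len(chosen) if correct in chosen else 0.0


def exam_score(predictions: List[Dict[str, float]], answer_key: List[str]) -> float:
    """Percentage score over a set of multiple-choice questions."""
    if len(predictions) != len(answer_key):
        raise ValueError("one prediction per question is required")
    total = sum(
        question_credit(select_answers(scores), key)
        for scores, key in zip(predictions, answer_key)
    )
    return 100.0 * total / len(answer_key)


if __name__ == "__main__":
    # Four toy questions with made-up option scores (hypothetical data).
    preds = [
        {"A": 0.1, "B": 0.7, "C": 0.1, "D": 0.1},    # picks B, key B -> 1
        {"A": 0.4, "B": 0.4, "C": 0.1, "D": 0.1},    # A/B tie, key A -> 0.5
        {"A": 0.2, "B": 0.2, "C": 0.5, "D": 0.1},    # picks C, key D -> 0
        {"A": 0.9, "B": 0.05, "C": 0.03, "D": 0.02},  # picks A, key A -> 1
    ]
    key = ["B", "A", "D", "A"]
    print(f"{exam_score(preds, key):.1f}")  # 62.5
```

Under these assumptions, the toy exam earns 2.5 of 4 possible points. This corresponds to 62.5 percent. The tie rule keeps a system from gaining full credit by reporting several options at once.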