Open Access
Ungoliant: An Optimized Pipeline for the Generation of a Very Large-Scale Multilingual Web Corpus
Author(s) - Julien Abadji, Pedro Ortiz Suarez, Laurent Romary, Benoît Sagot
Publication year - 2021
Publication title - HAL (Le Centre pour la Communication Scientifique Directe)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.14618/ids-pub-10468
Subject(s) - computer science, pipeline (software), metadata, natural language processing, license, artificial intelligence, text corpus, information retrieval, resource (disambiguation), corpus linguistics, modular design, world wide web, programming language, computer network, operating system
