Context & Scope
This International Workshop & Panel is organized in the context of the collaboration that has linked, since 2014, the University Sidi Mohamed Ben Abdellah (USMBA) and the Istituto di Linguistica Computazionale "A. Zampolli" (ILC-CNR), Pisa, Italy. The aim is to extend CLARIN to the university centers in Morocco, federate and share existing and new data, expand collaborations and leverage resources for collaborative research.
The panel therefore intends to open a discussion on how to make Language Resources produced in Morocco more visible and accessible to a broader research community, and on how the experience and resources gained in setting up various CLARIN data centres and competence centres could serve this purpose.
CLARIN is the Language Resources Infrastructure for Social Sciences and Humanities. More precisely, it is a European Research Infrastructure Consortium, or ERIC, a specific legal form that facilitates the establishment and operation of Research Infrastructures of European interest. CLARIN ERIC was established in 2012 with the vision that all digital language resources and tools from all over Europe and beyond should be accessible through a single sign-on online environment, in support of researchers in the humanities, the social sciences and other fields. Its mission is to create and maintain an infrastructure to support the sharing, use and sustainability of language data and tools for research.
Although CLARIN is a European research infrastructure, it is not limited to European languages or resources. Non-European languages are made available by several European CLARIN centres, and the South African Centre for Digital Language Resources (SADiLaR) has recently joined CLARIN. CLARIN also has partnerships with centres in the USA, and its deposit, metadata and single sign-on framework can be a viable solution for anyone wishing to set up a Language Resources Repository easily. Language Resources for Arabic and its varieties can be found via the CLARIN Virtual Language Observatory (VLO), the meta-catalogue that harvests metadata from all CLARIN centres and makes it searchable from a single access point. These resources can be oral recordings, written corpora, or lexicons. However, many important corpora and lexical resources are currently not represented. Moreover, the CLARIN Language Resources Switchboard, a tool that helps users find language processing Web applications, currently lacks any NLP tool for Arabic.
This panel will consist of short presentations by ILC-CNR and USMBA researchers introducing CLARIN ERIC and its technical and scientific infrastructure, and showing how that infrastructure complies with the internationally recognised FAIR principles, which recommend that data be Findable, Accessible, Interoperable and Reusable. Examples of resources and tools from various national consortia, notably CLARIN-IT, will also be presented, as well as user involvement activities and ongoing projects (such as ParlaMint). The presentations will be followed by a panel discussion.