EVALDA

The EVALDA project was financed by the French Ministry of Research as part of its Technolangue programme. Its aim was to establish a permanent evaluation infrastructure for the language engineering sector in France and for the French language.

A first aim of the project was to assemble reusable components such as organisation, logistics, language resources, evaluation protocols, methodologies and metrics, and to bring together major actors in the field (scientific advisory boards, panels of experts, partners, etc.). This made it possible to capitalise on the results of previous experiments, and also to encourage collaborative research and the setting up of new and improved evaluation campaigns. It was imperative that the evaluations envisaged in this project could be reproduced by third parties, using the resources assembled over the course of the project, in order to enable a genuine comparison of system performance and benchmarking of the state of the art in language engineering. All evaluation resources were made available in the ELRA catalogue at the end of the project in the form of an evaluation package.

A second aim of the project was to set up evaluation campaigns involving several linguistic technologies including both written and spoken media. Industrial and academic partners took part in the project. The campaigns were largely based around black box evaluation protocols and quantitative methods, drawing and expanding upon previous evaluation campaigns, such as ARC-AUPELF, GRACE, TREC etc.

Each evaluation campaign was largely independent; however, a certain amount of synergy between the campaigns was envisaged, involving the sharing of know-how, resources or even personnel.

The linguistic technologies to evaluate were chosen from those that appeared to be the most crucial in the field. Details on the selected projects are provided below, along with the link to the corresponding Evaluation Package in the catalogue.

Action de Recherche Concertée sur l’Alignement de Documents et son Evaluation

Evaluation of bilingual text and vocabulary alignment systems. Following the success of ARCADE I, this follow-up campaign aims to evaluate alignments between more distant or ’exotic’ languages, e.g. Greek, Russian, Japanese, Chinese.

ARCADE Evaluation Package

Introduction

The ARCADE project, started in 1995 and completed in 1999, was designed to provide standard methods for the evaluation and comparison of French-English parallel text alignment systems. ARCADE II aims at exploring the techniques of multilingual text alignment through a fine-grained evaluation of existing techniques and the development of new alignment methods.

ARCADE II consists of two tracks devoted to the evaluation of alignment at sentence level and at word level respectively. It differs from the previous ARCADE campaign in its multilingual aspect and in its investigation of lexical alignment. The languages concerned include 5 European languages (English, French, German, Italian and Spanish) and 6 languages with different writing systems (Arabic, Russian, Chinese, Japanese, Greek and Persian). Multilingual reference corpora have been made available for the evaluation exercise.

Contact : Khalid Choukri - choukri@elda.org

For more information (in French), please visit technolangue.net.

Méthodologie d’Evaluation automatique de la compréhension hors et en contexte du DIAlogue

Evaluation of man-machine dialogue systems. The task envisaged is hotel room reservation (including some local tourist information).

MEDIA Evaluation Package & MEDIA Speech Database for French

Introduction

The aim of the MEDIA evaluation campaign is to test an automatic evaluation methodology for man-machine dialogue systems. The methodology is based on a paradigm that uses test sets taken from a corpus of real-world dialogues, a semantic representation of dialogue and common evaluation metrics. The protocol is designed to test the capacity of dialogue systems both with and without taking the dialogue context into account.

In order to validate the evaluation protocol and the semantic representations, an evaluation campaign will take place in which each partner in the project tests their system. The task chosen is hotel room reservation, with tourist information as an additional point of entry into the dialogue.

The final Media Workshop took place at the Sainte-Marthe University in Avignon, France, on July 6-7 2006.

Contact : Khalid Choukri - choukri@elda.org

For more information (in French), please visit technolangue.net.

Campagne d’Evaluation de Systèmes de Traduction Automatique

Evaluation of machine translation systems. French is the pivotal language; translation from and into French is envisaged for several languages (English, Spanish, German, Arabic), according to the capabilities of the participants’ systems.

CESTA Evaluation Package

Introduction

Final CESTA Report (in French, PDF, 99 pages, 1028 KB)

The CESTA campaign comprises a series of evaluation campaigns for machine translation systems translating from various source languages into French. The statistical metrics BLEU/NIST (IBM) are used for the evaluations and adapted to French as a target language, along with other automatic metrics based on grammatical and semantic scores (X-Score and D-Score). The Weighted N-gram Model (WNM), WER and PER are also used. A further aim of CESTA is to conduct a meta-evaluation, comparing the automatic results with human judgments.
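To give an idea of what two of the metrics named above measure, here is a minimal sketch of WER (word error rate) and PER (position-independent error rate) for a single hypothesis against a single reference. It assumes whitespace tokenisation and one common formulation of PER; the function names are hypothetical and this is not the scoring software used in CESTA.

```python
from collections import Counter


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def per(reference: str, hypothesis: str) -> float:
    """Position-independent error rate: like WER, but word order is ignored."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Count words shared by both sentences, respecting multiplicity.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


if __name__ == "__main__":
    reference = "le chat est sur le tapis"
    print(wer(reference, "le chat sur le tapis"))
    print(per(reference, "sur le tapis le chat est"))
```

Because PER ignores word order, a hypothesis that contains exactly the reference words in a different order scores 0 under PER while still being penalised by WER.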

Coordinator

ELDA

Participants

  Université de Lille 3, IDIS/CESARTES
  Ecole Polytechnique Fédérale de Lausanne, LIA
  Université de Leeds
  Temis S.A.
  Systran S.A.
  Softissimo S.A.
  CIMOS S.A.
  Université de Grenoble, IMAG
  Université de Montréal, Dept. Linguistique et Traduction
  Université de Montréal, RALI
  Université de Genève, ISSCO
  University of Aachen, RWTH
  Universitat Politècnica de Catalunya (UPC)
  SDL International
  Comprendium S.L.

Contact : Khalid Choukri - choukri@elda.org

For more information (in French), please visit technolangue.net.

CESART - Evaluation de Systèmes d’Acquisition de Ressources Terminologiques

Evaluation of terminology extraction tools, including tools for extracting ontologies and semantic relations. Evaluation is to take place with reference to a predetermined list of terms/relations.

CESART Evaluation Package

Introduction

The CESART project deals with the user-oriented evaluation of terminological resource acquisition tools. This kind of user-oriented evaluation relies on the support of experts in information management who are capable of assessing terminological data and confirming usage. The aim is to propose and validate an evaluation protocol that makes it possible to objectively evaluate and compare different systems for terminology applications such as terminological resource creation and semantic relation extraction. The project also aims to create quality-controlled resources such as domain-specific corpora, an automatic scoring tool, etc.

CESART consists of two tracks devoted to the evaluation of term extraction and term structuring. Five French-language terminology acquisition tools participated in the CESART evaluation exercise. As these tools are based on different models and designed for different applications, two evaluation tasks were defined, term extraction and semantic relation extraction (synonymy), in order to reflect the contexts in which these tools are used.
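As an illustration of evaluating term extraction against a predetermined list of terms, the sketch below scores a tool's output with precision, recall and F1. The choice of these three scores, the exact-match comparison after lowercasing, the function name and the sample terms are all assumptions made for the example; the source does not specify how CESART scored its participants.

```python
def score_terms(extracted, reference):
    """Compare extracted candidate terms with a reference term list.

    Returns (precision, recall, f1). Matching is exact after trimming
    whitespace and lowercasing, which is an assumption of this sketch.
    """
    extracted_set = {term.strip().lower() for term in extracted}
    reference_set = {term.strip().lower() for term in reference}
    hits = extracted_set & reference_set

    precision = len(hits) / len(extracted_set) if extracted_set else 0.0
    recall = len(hits) / len(reference_set) if reference_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical tool output and hypothetical reference list.
    tool_output = ["Insuffisance cardiaque", "patient", "valve mitrale"]
    reference_list = [
        "insuffisance cardiaque",
        "valve mitrale",
        "infarctus du myocarde",
        "tension artérielle",
    ]
    print(score_terms(tool_output, reference_list))
```

Exact matching is the simplest possible choice; a real protocol would probably also need to handle inflectional variants and partial matches, which is one reason expert judgment is useful in this kind of evaluation.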

Contact : Khalid Choukri - choukri@elda.org

For more information, please visit technolangue.net (in French).

Evaluation des Analyseurs Syntaxiques du français

An evaluation campaign designed to test syntactic parsers. A side effect of the campaign is the creation of a syntactically parsed reference text composed of several genres of text (newspapers, literary texts, electronic texts, etc.).

EASY Evaluation Package

Introduction

The EASY project is dedicated to the evaluation of syntactic analysers for the French language. The project is financed by the French Ministry of Research in the context of the Technolangue programme.

The aim of the EASy campaign is to design and test an evaluation methodology for comparing syntactic analysers for French, and to produce a large validated linguistic resource obtained by automatically combining the annotated corpora produced. The corpora consist of texts taken from various domains (literature, medicine, technical, general, ...) and of different types : newspapers, questions, websites, oral transcriptions, ...

The project will last 24 months. The evaluation campaign is currently running and will last until 15th December 2004.

Contacts

  Khalid Choukri
  Olivier Hamon
  Patrick Paroubek (LIMSI)
  Anne Vilnat (LIMSI)
  Isabelle Robba (LIMSI)

Coordinators

ELDA
LIMSI

Corpora providers

  ATILF
  LLF
  DELIC
  STIM
  ELDA

Participants

  ERSS
  FT R&D
  GREYC
  INRIA
  LATL
  LIC2M
  LIRMM
  LORIA
  LPL
  STIM
  SYNAPSE
  SYSTAL
  TAGMATICA
  VALORIA
  XRCE

Contact : Khalid Choukri - choukri@elda.org

For more information (in French), please visit technolangue.net.

Evaluation en Question-Réponse

Evaluation of question answering systems. Three reference corpora are envisaged : a large general corpus (newspapers, general texts), a web corpus and a corpus made up of medical texts.

EQUER Evaluation Package

Introduction

The EQueR Evaluation Campaign provides an evaluation framework for question answering systems for the French language. It aims to support this research activity by establishing the state of the art, especially in France.

EQueR includes two tasks of automatic answer retrieval : a generic task over a heterogeneous collection of texts, mainly newspaper articles, and a specialised task over a corpus of medical texts.

Contact : Khalid Choukri - choukri@elda.org

Participants

  ELDA / ELRA, Organiser
  CISMEF Centre Hospitalier de Rouen
  Systal / Pertimm S.A.S.
  France Telecom R&D, DMI/GRI
  iSmart S.A.R.L.
  CNRS/Université d’Avignon, Laboratoire d’Informatique d’Avignon (LIA)
  CEA, Laboratoire d’ingénierie de la connaissance multimédia multilingue (LIC2M)
  CNRS, Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur (LIMSI)
  Université de Neuchâtel, Laboratoire Interfacultaire d’Informatique
  Sinequa S.A.S.
  Assistance Publique / Hôpitaux de Paris, Sciences et Technologies de l’Information Médicale (STIM)
  Synapse S.A.

Scientific committee

  Brigitte Grau, LIMSI - Chair
  Patrice Bellot, LIA
  Michel Benoit, iSmart
  Malek Boualem, France Telecom R&D
  Mohand Boughanem, IRIT
  Patrick Constant, Systal
  Olivier Ferret, CEA
  Martine Hurault-Plantet, LIMSI
  Dominique Laurent, Synapse
  Claude de Loupy, Sinequa
  Jacques Savoy, Université de Neuchâtel
  Pierre Zweigenbaum, STIM

For more information (in French), please visit technolangue.net.

Evaluation des Systèmes de Transcription Enrichie d’émissions Radiophoniques

Evaluation of automatic broadcast news transcription systems. This campaign includes the evaluation of segmentation tasks and of named entity identification.

ESTER Evaluation Package & ESTER Corpus

Introduction

The purpose of the ESTER Campaign is to evaluate the performance of broadcast news transcription systems.

Contact : Khalid Choukri - choukri@elda.org

For more information, please visit technolangue.net (in French).

Evaluation des Synthétiseurs de parole en français

Evaluation of speech synthesis systems. This campaign is to feature a novel method for the evaluation of prosody in synthesised speech.

EVASY Evaluation Package

Introduction

The EVASY project is dedicated to the evaluation of speech synthesis systems for the French language. The project is financed by the French Ministry of Research in the context of the Technolangue programme.
This evaluation campaign is intended to expand upon the ARC-AUPELF (now AUF) campaign of 1996-1999, the only previous evaluation campaign for text-to-speech systems for the French language. The EvaSy campaign is subdivided into three components :

evaluation of the grapheme-to-phoneme module,
evaluation of prosody and expressivity,
global evaluation of the quality of the synthesised speech.

If you would like to obtain more information about the project and the related work-in-progress report, you are kindly invited to contact :

Contacts

Khalid Choukri - choukri@elda.org

Christophe d’Alessandro (LIMSI)

Coordinator

ELDA

Consortium Partners

  DELIC
  Bell Labs - Lucent Technologies
  CRISCO
  Elan Speech
  ICP
  LIMSI
  LATL
  LIA
  MULTITEL ASLB

For more information (in French), please visit technolangue.net.
