8th LREC 2012: Istanbul, Turkey
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Ugur Dogan, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis:
Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey, May 23-25, 2012. European Language Resources Association (ELRA) 2012, ISBN 978-2-9517408-7-7
Session O1 - Corpora for Machine Translation
- Iñaki San Vicente, Iker Manterola:
PaCo2: A Fully Automated tool for gathering Parallel Corpora from the Web. 1-6
- Mark Fishel, Ondrej Bojar, Maja Popovic:
Terra: a Collection of Translation Error-Annotated Corpora. 7-14
- Ahmet Aker, Evangelos Kanoulas, Robert J. Gaizauskas:
A light way to collect comparable corpora from the Web. 15-20
- Volha Petukhova, Rodrigo Agerri, Mark Fishel, Sergio Penkale, Arantza del Pozo, Mirjam Sepesy Maucec, Andy Way, Panayota Georgakopoulou, Martin Volk:
SUMAT: Data Collection and Parallel Corpus Compilation for Machine Translation of Subtitles. 21-28
- Daniele Pighin, Lluís Màrquez, Lluís Formiga:
The FAUST Corpus of Adequacy Assessments for Real-World Machine Translation Output. 29-35
Session O2 - Infrastructures and Strategies for LRs (1)
- Stelios Piperidis:
The META-SHARE Language Resources Sharing Infrastructure: Principles, Challenges, Solutions. 36-42
- Riccardo Del Gratta, Francesca Frontini, Francesco Rubino, Irene Russo, Nicoletta Calzolari:
The Language Library: supporting community effort for collective resource production. 43-49
- Khalid Choukri, Victoria Arranz, Olivier Hamon, Jungyeul Park:
Using the International Standard Language Resource Number: Practical and Technical Aspects. 50-54
- Valérie Mapelli, Victoria Arranz, Matthieu Carré, Hélène Mazo, Djamel Mostefa, Khalid Choukri:
ELRA in the heart of a cooperative HLT world. 55-59
- Christopher Cieri, Marian Reed, Denise DiPersio, Mark Liberman:
Twenty Years of Language Resource Development and Distribution: A Progress Report on LDC Activities. 60-65
Session O3 - Semantics
- Dan I. Moldovan, Eduardo Blanco:
Polaris: Lymba's Semantic Parser. 66-72
- Sylvia Springorum, Sabine Schulte im Walde, Antje Roßdeutscher:
Automatic classification of German 'an' particle verbs. 73-80
- Livio Robaldo, Jakub Szymanik:
Pragmatic identification of the witness sets. 81-87
- Orphée De Clercq, Véronique Hoste, Paola Monachesi:
Evaluating automatic cross-domain Dutch semantic role annotation. 88-93
- Benoît Robichaud:
Logic Based Methods for Terminological Assessment. 94-98
Session O4 - Speech corpora
- Luis Javier Rodríguez-Fuentes, Mikel Peñagarikano, Amparo Varona, Mireia Díez, Germán Bordel:
KALAKA-2: a TV Broadcast Speech Database for the Recognition of Iberian Languages in Clean and Noisy Environments. 99-105
- Tommaso Raso, Heliana Mello, Maryualê Malvessi Mittmann:
The C-ORAL-BRASIL I: Reference Corpus for Spoken Brazilian Portuguese. 106-113
- Guillaume Gravier, Gilles Adda, Niklas Paulsson, Matthieu Carré, Aude Giraudel, Olivier Galibert:
The ETAPE corpus for the evaluation of speech-based TV content processing in the French language. 114-118
- Daniel Stein, Bela Usabaev:
Automatic Speech Recognition on a Firefighter TETRA Broadcast Channel. 119-124
- Anthony Rousseau, Paul Deléglise, Yannick Estève:
TED-LIUM: an Automatic Speech Recognition dedicated corpus. 125-129
Session P1 - Anaphora and Coreference
- Abdul-Baquee M. Sharaf, Eric Atwell:
QurAna: Corpus of the Quran annotated with Pronominal Anaphora. 130-137
- Stefanie Dipper, Melanie Seiss, Heike Zinsmeister:
The Use of Parallel and Comparable Data for Analysis of Abstract Anaphora in German and English. 138-145
- Lucie Poláková, Pavlína Jínová, Jirí Mírovský:
Interplay of Coreference and Discourse Relations: Discourse Connectives with a Referential Component. 146-153
- Luz Rello, Iria Gayo:
A Portuguese-Spanish Corpus Annotated for Subject Realization and Referentiality. 154-157
- Marilisa Amoia, Kerstin Kunz, Ekaterina Lapshinova-Koltunski:
Coreference in Spoken vs. Written Texts: a Corpus-based Analysis. 158-164
- Marta Recasens, Maria Antònia Martí, Constantin Orasan:
Annotating Near-Identity from Coreference Disagreements. 165-172
- Thomas Kaspersson, Christian Smith, Henrik Danielsson, Arne Jönsson:
This also affects the context - Errors in extraction based summaries. 173-178
- Natsuko Nakagawa, Yasuharu Den:
Annotation of anaphoric relations and topic continuity in Japanese conversation. 179-186
- Olga Uryupina, Massimo Poesio:
Domain-specific vs. Uniform Modeling for Coreference Resolution. 187-191
- Mateusz Kopec, Maciej Ogrodniczuk:
Creating a Coreference Resolution System for Polish. 192-195
Session P2 - Tools, Systems and Evaluation
- Felix Burkhardt:
Fast Labeling and Transcription with the Speechalyzer Toolkit. 196-200
- Bart Jongejan:
Automatic annotation of head velocity and acceleration in Anvil. 201-208
- Przemyslaw Lenkiewicz, Binyam Gebrekidan Gebre, Oliver Schreer, Stefano Masneri, Daniel Schneider, Sebastian Tschöpel:
AVATecH - automated annotation through audio and video analysis. 209-214
- Henk van den Heuvel, Eric Sanders, Robin Rutten, Stef Scagliola, Paula Witkamp:
An Oral History Annotation Tool for INTER-VIEWs. 215-218
- Han Sloetjes, Aarthy Somasundaram:
ELAN development, keeping pace with communities' needs. 219-223
- Michal Marcinczuk, Jan Kocon, Bartosz Broda:
Inforex - a web-based tool for text corpus management and semantic annotation. 224-230
- Binyam Gebrekidan Gebre, Peter Wittenburg, Przemyslaw Lenkiewicz:
Towards Automatic Gesture Stroke Detection. 231-235
- Thomas Schmidt:
EXMARaLDA and the FOLK tools - two toolsets for transcribing and annotating spoken language. 236-240
- Leonardo Campillos Llanos:
Designing a search interface for a Spanish learner spoken corpus: the end-user's evaluation. 241-248
Session P3 - Lexical Resources
- Satoshi Sato:
Dictionary Look-up with Katakana Variant Recognition. 249-255
- Karin Friberg Heppin, Maria Toporowska Gronostaj:
The Rocky Road towards a Swedish FrameNet - Creating SweFN. 256-261
- Marie-Claude L'Homme, Janine Pimentel:
Capturing syntactico-semantic regularities among terms: An application of the FrameNet methodology to terminology. 262-268
- David Graff, Mohamed Maamouri:
Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects. 269-274
- Judith Eckle-Kohler, Iryna Gurevych, Silvana Hartmann, Michael Matuschek, Christian M. Meyer:
UBY-LMF - A Uniform Model for Standardizing Heterogeneous Lexical-Semantic Resources in ISO-LMF. 275-282
- Frantisek Cvrcek, Karel Pala, Pavel Rychlý:
Legal electronic dictionary for Czech. 283-287
- Amir Hazem, Emmanuel Morin:
Adaptive Dictionary for Bilingual Lexicon Extraction from Comparable Corpora. 288-292
- Jennifer Williams, Graham Katz:
A New Twitter Verb Lexicon for Natural Language Processing. 293-298
Session P4 - Annotation and Corpora
- Ritesh Kumar:
Challenges in the development of annotated corpora of computer-mediated communication in Indian Languages: A Case of Hindi. 299-302
- Christian Chiarcos:
Ontologies of Linguistic Annotation: Survey and perspectives. 303-310
- Johanka Spoustová, Miroslav Spousta:
A High-Quality Web Corpus of Czech. 311-315
- Xavier Tannier:
WebAnnotator, an Annotation Tool for Web Pages. 316-319
- Chi-Hsin Yu, Yi-jie Tang, Hsin-Hsi Chen:
Development of a Web-Scale Chinese Word N-gram Corpus with Parts of Speech Information. 320-324
- Dominique Fohr, Odile Mella:
CoALT: A Software for Comparing Automatic Labelling Tools. 325-332
- Valentina Bartalesi Lenzi, Giovanni Moretti, Rachele Sprugnoli:
CAT: the CELCT Annotation Tool. 333-338
- Radu Ion, Elena Irimia, Dan Stefanescu, Dan Tufis:
ROMBAC: The Romanian Balanced Annotated Corpus. 339-344
- Ismaïl El Maarouf, Jeanne Villaneau:
A French Fairy Tale Corpus syntactically and semantically annotated. 345-350
- Carlos Morell, Jorge Vivaldi, Núria Bel:
Iula2Standoff: a tool for creating standoff documents for the IULACT. 351-356
- Frédéric Landragin, Thierry Poibeau, Bernard Victorri:
ANALEC: a New Tool for the Dynamic Annotation of Textual Data. 357-362
- Georgios Petasis:
The SYNC3 Collaborative Annotation Tool. 363-370
- Heba Elfardy, Mona T. Diab:
Simplified guidelines for the creation of Large Scale Dialectal Arabic Annotations. 371-378
Session O5 - Crowdsourcing (Special Session)
- Arno Scharl, Marta Sabou, Stefan Gindl, Walter Rafelsberger, Albert Weichselbraun:
Leveraging the Wisdom of the Crowds for the Acquisition of Multilingual Language Resources. 379-383
- Anoop Kunchukuttan, Shourya Roy, Pratik Patel, Kushal Ladha, Somya Gupta, Mitesh M. Khapra, Pushpak Bhattacharyya:
Experiences in Resource Generation for Machine Translation through Crowdsourcing. 384-391
- Elena Filatova:
Irony and Sarcasm: Corpus Generation and Analysis Using Crowdsourcing. 392-398
- Luís Marujo, Anatole Gershman, Jaime G. Carbonell, Robert E. Frederking, João Paulo Neto:
Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization. 399-403
Session O6 - Dialogue and Multimodality
- Kristiina Jokinen, Graham Wilcock:
Constructive Interaction for Talking about Interesting Topics. 404-410
- Florian Nothdurft, Wolfgang Minker:
Using multimodal resources for explanation approaches in intelligent systems. 411-415
- Shota Yamasaki, Hirohisa Furukawa, Masafumi Nishida, Kristiina Jokinen, Seiichi Yamamoto:
Multimodal Corpus of Multi-party Conversations in Second Language. 416-421
- Takenobu Tokunaga, Ryu Iida, Asuka Terai, Naoko Kuriyama:
The REX corpora: A collection of multimodal corpora of referring expressions in collaborative problem solving dialogues. 422-429
- Harry Bunt, Jan Alexandersson, Jae-Woong Choe, Alex Chengyu Fang, Kôiti Hasida, Volha Petukhova, Andrei Popescu-Belis, David R. Traum:
ISO 24617-2: A semantically-based standard for dialogue annotation. 430-437
Session O7 - Machine Translation and Language Resources (1)
- Inguna Skadina, Ahmet Aker, Nikos Mastropavlos, Fangzhong Su, Dan Tufis, Mateja Verlic, Andrejs Vasiljevs, Bogdan Babych, Paul D. Clough, Robert J. Gaizauskas, Nikos Glaros, Monica Lestari Paramita, Marcis Pinnis:
Collecting and Using Comparable Corpora for Statistical Machine Translation. 438-445
- Casey Redd Kennington, Martin Kay, Annemarie Friedrich:
Suffix Trees as Language Models. 446-453
- Ralf Steinberger, Andreas Eisele, Szymon Klocek, Spyridon Pilos, Patrick Schlüter:
DGT-TM: A freely available Translation Memory in 22 languages. 454-459
- Reinhard Rapp, Serge Sharoff, Bogdan Babych:
Identifying Word Translations from Comparable Documents Without a Seed Lexicon. 460-466
- Gideon Kotzé, Vincent Vandeghinste, Scott Martens, Jörg Tiedemann:
Large aligned treebanks for syntax-based machine translation. 467-473
Session O8 - Corpus Processing and Infrastructure
- Lars Borin, Markus Forsberg, Johan Roxendal:
Korp - the corpus infrastructure of Språkbanken. 474-478
- Jonathan Wright, Kira Griffitt, Joe Ellis, Stephanie M. Strassel, Brendan Callahan:
Annotation Trees: LDC's customizable, extensible, scalable, annotation infrastructure. 479-485
- Roland Schäfer, Felix Bildhauer:
Building Large Corpora from the Web Using a New Efficient Tool Chain. 486-493
- Young-Min Kim, Patrice Bellot, Elodie Faath, Marin Dacos:
Annotated Bibliographical Reference Corpora in Digital Humanities. 494-501
- Jan Pomikálek, Milos Jakubícek, Pavel Rychlý:
Building a 70 billion word corpus of English from ClueWeb. 502-506
Session P5 - Information Extraction (1)
- Michael Wiegand, Benjamin Roth, Eva Lasarcyk, Stephanie Köser, Dietrich Klakow:
A Gold Standard for Relation Extraction in the Food Domain. 507-514
- Mathias Bank, Robert Remus, Martin Schierle:
Textual Characteristics for Language Engineering. 515-519
- Ziqi Zhang, Philip Webster, Victoria S. Uren, Andrea Varga, Fabio Ciravegna:
Automatically Extracting Procedural Knowledge from Instructional Texts using Natural Language Processing. 520-527
- Xavier Tannier, Véronique Moriceau, Béatrice Arnulphy, Ruixin He:
Evolution of Event Designation in Media: Preliminary Study. 528-531
- Yunqing Xia, Guoyu Tang, Peng Jin, Xia Yang:
CLTC: A Chinese-English Cross-lingual Topic Corpus. 532-537
- Julia Maria Schulz, Daniela Becks, Christa Womser-Hacker, Thomas Mandl:
A Resource-light Approach to Phrase Extraction for English and German Documents from the Patent Domain and User Generated Content. 538-543
- Md. Faisal Mahbub Chowdhury, Alberto Lavelli:
An Evaluation of the Effect of Automatic Preprocessing on Syntactic Parsing for Biomedical Relation Extraction. 544-551
- Wei Wang, Romaric Besançon, Olivier Ferret, Brigitte Grau:
Evaluation of Unsupervised Information Extraction. 552-558
- Stéphanie Weiser, Patrick Watrin:
Extraction of unmarked quotations in Newspapers. 559-562
- Martin Aleksandrov, Carlo Strapparava:
NgramQuery - Smart Information Extraction from Google N-gram using External Resources. 563-568
Session P6 - Word Sense Disambiguation and Evaluation
- Héctor Martínez Alonso, Núria Bel, Bolette Sandford Pedersen:
A voting scheme to detect semantic underspecification. 569-575
- Verena Henrich, Erhard W. Hinrichs:
A Comparative Evaluation of Word Sense Disambiguation Algorithms for German. 576-583
- Piek Vossen, Attila Görög, Rubén Izquierdo, Antal van den Bosch:
DutchSemCor: Targeting the ideal sense-tagged corpus. 584-589
- Samuel Fernando, Mark Stevenson:
Mapping WordNet synsets to Wikipedia articles. 590-596
- Myriam Rakho, Éric Laporte, Matthieu Constant:
A new semantically annotated corpus with syntactic-semantic and cross-lingual senses. 597-600
- Minoru Sasaki, Hiroyuki Shinnou:
Detection of Peculiar Word Sense by Distance Metric Learning with Labeled Examples. 601-604
- Soojeong Eom, Markus Dickinson, Graham Katz:
Using semi-experts to derive judgments on word sense alignment: a pilot study. 605-611
- John Vogel, Marc Verhagen, James Pustejovsky:
ATLIS: Identifying Locational Information in Text Automatically. 612-616
Session P7 - Multiword Expressions and Term Extraction
- Behrang QasemiZadeh, Paul Buitelaar, Tianqi Chen, Georgeta Bordea:
Semi-Supervised Technical Term Tagging With Minimal User Feedback. 617-621
- Miriam Buendía-Castro, Beatriz Sánchez-Cárdenas:
Linguistic knowledge for specialized text production. 622-626
- Rita Marinelli, Laura Cignoni:
In the same boat and other idiomatic seafaring expressions. 627-631
- Sabine Schulte im Walde, Susanne Borgwaldt, Ronny Jauch:
Association Norms of German Noun Compounds. 632-639
- Doaa Samy, Antonio Moreno-Sandoval, Conchi Bueno-Díaz, Marta Garrote Salazar, José María Guirao:
Medical Term Extraction in an Arabic Medical Corpus. 640-645
- Matthieu Constant, Isabelle Tellier:
Evaluating the Impact of External Lexical Resources into a CRF-based Multiword Segmenter and Part-of-Speech Tagger. 646-650
- Anita Gojun, Ulrich Heid, Bernd Weißbach, Carola Loth, Insa Mingers:
Adapting and evaluating a generic term extraction tool. 651-656
- Mladen Karan, Jan Snajder, Bojana Dalbelo Basic:
Evaluation of Classification Algorithms and Features for Collocation Extraction in Croatian. 657-662
- Thibault Mondary, Adeline Nazarenko, Haïfa Zargayouna, Sabine Barreaux:
The Quaero Evaluation Initiative on Term Extraction. 663-669
- Shiva Taslimipoor, Afsaneh Fazly, Ali Hamzeh:
Using Noun Similarity to Adapt an Acceptability Measure for Persian Light Verb Constructions. 670-673
- Dhouha Bouamor, Nasredine Semmar, Pierre Zweigenbaum:
Identifying bilingual Multi-Word Expressions for Statistical Machine Translation. 674-679
- Aude Grezka, Céline Poudat:
Building a database of French frozen adverbial phrases. 685-692
- Marc Luder:
German Verb Patterns and Their Implementation in an Electronic Dictionary. 693-697
Session P8 - Authoring Tools, Proofing
- Flore Barcellini, Camille Albert, Corinne Grosse, Patrick Saint-Dizier:
Risk Analysis and Prevention: LELIE, a Tool dedicated to Procedure and Requirement Authoring. 698-705
- Mohammad Hoseyn Sheykholeslam, Behrouz Minaei-Bidgoli, Hossein Juzi:
A Framework for Spelling Correction in Persian Language Using Noisy Channel Model. 706-710
- Nizar Habash, Mona T. Diab, Owen Rambow:
Conventional Orthography for Dialectal Arabic. 711-718
- Khaled Shaalan, Mohammed Attia, Pavel Pecina, Younes Samih, Josef van Genabith:
Arabic Word Generation and Modelling for Spell Checking. 719-725
- Jan Rygl, Ales Horák:
Similarity Ranking as Attribute for Machine Learning Approach to Authorship Identification. 726-729
- Shaohua Yang, Hai Zhao, Xiaolin Wang, Bao-Liang Lu:
Spell Checking for Chinese. 730-736
- Jordi Atserias, María Fuentes Fort, Rogelio Nazar, Irene Renau:
Spell Checking in Spanish: The Case of Diacritic Accents. 737-742
- Michael Rosner, Albert Gatt, Andrew Attard, Jan Joachimsen:
Incorporating an Error Corpus into a Spellchecker for Maltese. 743-750
Session O9 - Endangered Languages
- Melanie Seiss:
A Rule-based Morphological Analyzer for Murrinh-Patha. 751-758
- Dirk Goldhahn, Thomas Eckart, Uwe Quasthoff:
Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages. 759-765
- Helen Aristar-Dry, Sebastian Drude, Menzo Windhouwer, Jost Gippert, Irina Nevskaya:
"Rendering Endangered Lexicons Interoperable through Standards Harmonization": the RELISH project. 766-770
- Ryan Georgi, Fei Xia, William D. Lewis:
Measuring the Divergence of Dependency Structures Cross-Linguistically to Improve Syntactic Projection Algorithms. 771-778
Session O10 - Document Classification, Text Categorisation
- Julian Brooke, Graeme Hirst:
Measuring Interlanguage: Native Language Identification with L1-influence Metrics. 779-784
- John Noecker Jr., Michael Ryan:
Distractorless Authorship Verification. 785-789
- Monica Lestari Paramita, Paul D. Clough, Ahmet Aker, Robert J. Gaizauskas:
Correlation between Similarity Measures for Inter-Language Linked Wikipedia Articles. 790-797