Text Rewriting Improves Semantic Role Labeling

K. Woodsend; M. Lapata

doi:10.1613/jair.4431

Published: Sep 19, 2014

Abstract

Large-scale annotated corpora are a prerequisite for developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, and often require linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses rewrite rules automatically extracted from comparable corpora and bitexts to generate multiple versions of sentences annotated with gold-standard labels. We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset.

Issue

Vol. 51 (2014)

Section

Articles

JAIR is published by AI Access Foundation, a nonprofit public charity whose purpose is to facilitate the dissemination of scientific results in artificial intelligence. JAIR, established in 1993, was one of the first open-access scientific journals on the Web, and has been a leading publication venue since its inception.
