Neural Character-Level Syntactic Parsing for Chinese

Zuchao Li; Junru Zhou; Hai Zhao; Zhisong Zhang; Haonan Li; Yuqi Ju

doi:10.1613/jair.1.13052

Published: Jan 31, 2022

DOI: https://doi.org/10.1613/jair.1.13052

Keywords:

Chinese parsing, character-level parsing, character-based NLP, dependency parsing, constituent parsing

Zuchao Li

Shanghai Jiao Tong University

Abstract

In this work, we explore character-level neural syntactic parsing for Chinese with two typical syntactic formalisms: the constituent formalism and a dependency formalism based on a newly released character-level dependency treebank. Prior work in Chinese parsing has struggled with whether to define words when modeling character interactions. We choose to integrate full character-level syntactic dependency relationships using neural representations from character embeddings and richer linguistic syntactic information from human-annotated character-level part-of-speech and dependency labels. This approach has the potential to better capture the deeper structure of Chinese sentences and provides a better structural formalism for avoiding unnecessary structural ambiguities. Specifically, we first compare two different character-level syntax annotation styles: constituency and dependency. Then, we discuss two key problems for character-level parsing: (1) how to combine constituent and dependency syntactic structure in full character-level trees and (2) how to convert from character-level to word-level for both constituent and dependency trees. In addition, we explore several other key parsing aspects, including different character-level dependency annotations and joint learning of part-of-speech tagging and syntactic parsing. Finally, we evaluate our models on the Chinese Penn Treebank (CTB) and our published Shanghai Jiao Tong University Chinese Character Dependency Treebank (SCDT). The results show the effectiveness of our model on both constituent and dependency parsing. We further provide empirical analysis and suggest several directions for future study.

Issue

Vol. 73 (2022)

Section

Articles
