Abstract

  • Japanese CCGBank is used to develop Japanese CCG parsers
  • Linguistic validity of Japanese CCGBank needs to be verified
  • This paper focuses on analysis of passive/causative constructions in Japanese CCGBank
  • When combined with the semantic parser ccg2lambda, the analysis in Japanese CCGBank yields wrong predictions for nested passive and causative constructions

Paper Content

Introduction

  • The process of generating wide-coverage syntactic parsers from treebanks was established in the 1990s
  • At the time, formal syntactic theories were widely believed to be too inflexible to describe real texts
  • The theoretical development of formal grammars and the emergence of linguistically oriented treebanks dispelled this misconception
  • Combinatory Categorial Grammar (CCG) and CCGbank gave rise to CCG parsers
  • Research on Japanese syntax and parsing has also been influenced by CCG
  • Japanese CCGBank was generated from the Kyoto Corpus by automatic conversion
  • CCG syntactic structures carry more elaborate information than CFG structures
  • CCGBank serves as training and evaluation data for CCG parsers
  • Research from the perspective of formal syntax has examined the adequacy of the syntactic structures in treebanks
  • This paper assesses the syntactic structures in Japanese CCGBank from the viewpoint of theoretical linguistics

Passive and causative constructions in Japanese

  • Japanese passives and causatives are first described as they are analyzed in a standard Japanese CCG
  • Ga-marked noun phrases in passive sentences correspond to ni-marked or o-marked noun phrases in active sentences
  • The syntactic structures of the left-hand sentences of (1) and (2) are shown in Figure 1
  • The semantic representations of words are defined using event semantics
  • Passive and causative suffixes have access to the argument structure of their first argument (the verb they attach to)
  • NP_ga corresponds to NP_o or NP_ni in passive constructions, and NP_ni|o corresponds to NP_ga in causative constructions
  • The validity of this analysis can be tested against inference data on various constructions, including passives and causatives
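The point that the suffixes have access to their argument's structure can be illustrated with a minimal Python sketch. Everything in it is a hypothetical simplification, not the paper's lexicon: the entries HOMERA ('praise') and HASIRA ('run'), the functions passive and causative, the choice of NP_o (rather than NP_ni) for the causee, and the decomposition of the causative into a separate cause event whose Theme is the caused event are all assumptions made for illustration. Formulas are lists of neo-Davidsonian conjuncts with existential closure left implicit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Semantics: fillers keyed by case marker + an event variable -> conjuncts.
Sem = Callable[[Dict[str, str], str], List[str]]


@dataclass(frozen=True)
class Pred:
    cases: Tuple[str, ...]  # case markers subcategorized for, ga-marked first
    sem: Sem


# Hypothetical lexical entries (illustrative only).
HOMERA = Pred(("ga", "o"), lambda a, e: [
    f"praise({e})", f"Agent({e},{a['ga']})", f"Theme({e},{a['o']})"])
HASIRA = Pred(("ga",), lambda a, e: [f"run({e})", f"Agent({e},{a['ga']})"])


def passive(v: Pred) -> Pred:
    """Passive suffix: reads v's argument structure. v's non-ga argument
    surfaces as NP_ga, and v's NP_ga surfaces as NP_ni."""
    assert len(v.cases) == 2, "sketch handles two-place predicates only"
    lower = v.cases[1]  # "o" or "ni" in v's own case frame
    return Pred(("ga", "ni"),
                lambda a, e: v.sem({"ga": a["ni"], lower: a["ga"]}, e))


def causative(v: Pred) -> Pred:
    """Causative suffix on an intransitive v: adds a ga-marked causer and
    demotes v's NP_ga to NP_o (assumed cause-event decomposition)."""
    assert v.cases == ("ga",), "sketch handles intransitive predicates only"
    return Pred(("ga", "o"), lambda a, e: [
        f"cause({e})", f"Agent({e},{a['ga']})", f"Theme({e},{e}1)",
    ] + v.sem({"ga": a["o"]}, e + "1"))


# Illustrative nesting: 'Taro ga Hanako ni hasira-sera-re-ta'.
nested = passive(causative(HASIRA)).sem({"ga": "taro", "ni": "hanako"}, "e")
print(" & ".join(nested))
# cause(e) & Agent(e,hanako) & Theme(e,e1) & run(e1) & Agent(e1,taro)
```

Because each suffix rewires the case frame of whatever it attaches to, nesting composes correctly in this sketch: Taro still ends up as the Agent of the running event, which is what the inference 'Taro ran' requires.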

ccg2lambda and the S\S analysis

  • The analysis of Japanese CCGBank relies on two CCG parsers
  • The lexical assignments for the left-hand sentences of (1) and (2) are shown in Figure 2
  • The semantic representation of the two-place predicate homera ('praise') is given
  • The thematic relations Agent and Theme are abstracted as higher-order variables
  • The semantic representation of the right-hand sentence of (1) is a standard neo-Davidsonian representation
  • The semantic representation of hasira-se ('make run') is obtained using a semantic template
  • The semantic representation of hasira-sera-re is obtained by applying (13) to (17)
  • The error occurs because the passive suffix assumes that its first argument is assigned Theme and its second argument is assigned Agent
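The failure described above can be reproduced in a toy Python model. This is a hypothetical sketch, not ccg2lambda's actual templates: role_abstracted, passive_fixed_roles and ran are invented names; the stem's two surface arguments are given roles through open parameters R1 and R2 (standing in for the higher-order variables); and the passive template is assumed to fill them as (Theme, Agent) without inspecting what it attaches to. The treatment of hasira-se as a role-abstracted two-place stem over the running event is likewise an assumption made for illustration.

```python
from typing import Callable, List

# Toy stem: (R1, R2) -> (x, y, e) -> neo-Davidsonian conjuncts.
Stem = Callable[[str, str], Callable[[str, str, str], List[str]]]


def role_abstracted(event_pred: str) -> Stem:
    """Hypothetical S\\S-style stem: the roles of its two surface arguments
    are left open as parameters r1 and r2."""
    return lambda r1, r2: lambda x, y, e: [
        f"{event_pred}({e})", f"{r1}({e},{x})", f"{r2}({e},{y})"]


def passive_fixed_roles(stem: Stem) -> Callable[[str, str, str], List[str]]:
    """Toy passive template: first (ga) argument -> Theme, second (ni)
    argument -> Agent, whatever predicate the stem denotes."""
    return stem("Theme", "Agent")


def ran(conjuncts: List[str], who: str, e: str = "e") -> bool:
    """True if the formula has a running event e whose Agent is `who`."""
    return f"run({e})" in conjuncts and f"Agent({e},{who})" in conjuncts


# Simple passive: the fixed assignment happens to be right.
homera_re = passive_fixed_roles(role_abstracted("praise"))
print(homera_re("taro", "hanako", "e"))
# ['praise(e)', 'Theme(e,taro)', 'Agent(e,hanako)']

# Nested passive of a causative: the same fixed assignment goes wrong.
hasira_sera_re = passive_fixed_roles(role_abstracted("run"))
nested = hasira_sera_re("taro", "hanako", "e")
print(ran(nested, "taro"), ran(nested, "hanako"))  # False True
```

In this toy model, 'Taro ga Hanako ni hasira-sera-re-ta' fails to entail 'Taro ran' and wrongly entails 'Hanako ran', the kind of false prediction the paper attributes to the fixed role assignment of the passive suffix.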

Conclusion

  • The syntactic analysis in Japanese CCGBank produces false predictions for nested passive and causative constructions
  • The standard analysis correctly accounts for all the inferences considered
  • The burden of proof lies with the CCGBank side
  • Outreach to the linguistic community is needed to keep treebanks and parsers linguistically sound