Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Japanese CCGBank is used to develop Japanese CCG parsers
- Linguistic validity of Japanese CCGBank needs to be verified
- This paper focuses on analysis of passive/causative constructions in Japanese CCGBank
- Together with ccg2lambda, Japanese CCGBank yields wrong predictions for nested passives and causatives
Paper Content
Introduction
- Process of generating wide-coverage syntactic parsers from treebanks established in 1990s
- Belief at the time that formal syntactic theories too inflexible to describe real texts
- Theoretical development of formal grammars and emergence of linguistically-oriented treebanks dispelled misconception
- Combinatory Categorial Grammar (CCG) and CCGbank gave rise to CCG parsers
- Research on Japanese syntax and parsers impacted by CCG
- Japanese CCGbank generated from Kyoto Corpus by automatic conversion
- Syntactic structures of CCG have more elaborated information than CFG
- CCGBank serves as training and evaluation data for CCG parsers
- Research from perspective of formal syntax conducted regarding adequacy of syntactic structures in treebanks
- This paper assesses syntactic structures exhibited by Japanese CCGbank from viewpoint of theoretical linguistics
Passive and causative constructions in japanese
- Japanese passives and causatives are described in a standard Japanese CCG
- Ga-marked noun phrases in passive sentences correspond to nimarked or omarked noun phrases in active sentences
- Syntactic structure of left-side sentences of (1) and (2) are shown in Figure 1
- Semantic representations of words are defined using event semantics
- Passive and causative suffixes know the argument structure of its first argument
- N P ga corresponds to N P o or N P ni in passive constructions and N P ni|o corresponds to N P ga in causative constructions
- Validity of analysis can be verified by inference data on various constructions including passives and causatives
Ccg2lambda and the s\s analysis
- Analysis of Japanese CCGBank relies on two CCG parsers
- Lexical assignments for left-side sentences of (1) and (2) are shown in Figure 2
- Semantic representation of two-place predicate homera is given
- Relations between Agent and Theme are relativized by higher-order variables
- Semantic representation of right-side of (1) is a standard neo-Davidsonian representation
- Semantic representation of hasira-se is obtained using a semantic template
- Semantic representation of hasira-sera-re is obtained by applying (13) to (17)
- Error occurs because passive suffix assumes first argument is given Theme and second argument is given Agent
Conclusion
- Syntactic analysis of Japanese CCGBank produces false predictions for passive and causative nesting
- Standard analysis correctly explains all inferences
- Burden of proof is on CCGBank side
- Need for outreach to linguistic community to keep treebanks and parsers sound