Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Legal Judgment Prediction (LJP) from text on European Court of Human Rights cases is cast as an entailment task.
The case outcome is classified from a combined input of case facts and convention articles.
The model is evaluated on its ability to generalize to zero-shot settings.
Domain adaptation methods are applied to improve zero-shot transfer performance.

Paper Content

Introduction

Legal Judgment Prediction (LJP) is a task to classify/predict the outcome of a case based on a textual description of case facts
Legal practitioners determine relevant rules from legal sources to deduce the outcome of the case
Most current LJP approaches tackle this as a classification problem with the textual descriptions of case facts as the sole input
This work casts LJP into an entailment task to enable the model to learn more authentic reasoning between rules and case facts
The task of LJP as entailment has been explored on Chinese criminal case corpora and US tax law
This work develops and evaluates the model on a public dataset of cases by the European Court of Human Rights
The model pairs case fact descriptions with candidate ECHR articles and assigns a binary target label
Results show that the entailment model outperforms the traditional classification setup
The work extends LJP as entailment to the zero-shot transfer setting
Domain adaptation improves the model’s performance on unseen articles
Domain-specific pre-trained encoders have an impact on the zero-shot transferability of LJP systems

Related work

Legal Judgment Prediction (LJP) has been studied using corpora from different jurisdictions
Early works used bag-of-words features
Large pre-trained transformer models have become the dominant model family
Legal-domain specific pre-trained variants have been employed
Going beyond case fact classification, prior work on Chinese criminal case corpora treats LJP as an entailment problem
This is the first work to apply a similar entailment approach to the ECHR corpus
Domain Adaptation (DA) is tackled under three different settings
This work is the first to benchmark domain adaptation for LJP
Methods proposed to deal with domain adaptation settings can be categorized into four types
Loss based methods are employed to deal with domain adaptation settings of LJP

Dataset, tasks & settings

ECHR dataset provided by LexGLUE consists of 11k case fact descriptions and target label information
Chronologically split into training (2001-2016), validation (2016-2017) and test set (2017-2019)
Label set includes 10 prominent ECHR articles
Model predicts target from fact description alone
Dataset augmented with texts of 10 articles in label set
Formulate entailment variant for both tasks
Binary outcome of whether article has been alleged/found to be violated
Domain adaptation to determine outcomes based on case facts with regard to particular convention article
Zero-shot transfer to determine violation/allegation of case facts with respect to unseen articles
Two settings: UDA and ADA
Dataset split into two non-overlapping groups of articles of various frequencies
Evaluate UDA and ADA on split_0 as source and split_1 as target, and vice-versa
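
The pairing of case facts with candidate articles and the article-based split can be illustrated with a minimal sketch. The helper `build_pairs`, the toy cases, and the article groups assigned to `split_0` and `split_1` below are hypothetical and chosen only for illustration; they are not the paper's actual split.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str, int]  # (case id, article id, binary label)


def build_pairs(case_id: str, positives: Set[str], candidates: List[str]) -> List[Pair]:
    """Create one entailment instance per candidate article.

    The label is 1 when the article is in the case's label set, else 0.
    """
    return [(case_id, art, int(art in positives)) for art in candidates]


# Hypothetical article groups, for illustration only (not the paper's split).
split_0 = ["2", "3", "5"]
split_1 = ["6", "8", "P1-1"]

# Toy cases: case id -> articles in its label set.
cases: Dict[str, Set[str]] = {"case_a": {"3", "6"}, "case_b": {"8"}}

# Source pairs only involve split_0 articles; target pairs only split_1 articles.
source_pairs = [p for cid, pos in cases.items() for p in build_pairs(cid, pos, split_0)]
target_pairs = [p for cid, pos in cases.items() for p in build_pairs(cid, pos, split_1)]
```

Swapping the roles of `split_0` and `split_1` gives the reverse transfer direction.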

Method

Employs hierarchical neural entailment model to take case fact description and article as input and output binary outcome
Adapted to deal with long input sequences using hierarchical attention networks
Experiments with two domain adaptation components based on adversarial training

Entailment model

Model outputs a binary label
Model contains an encoding layer, interaction layer, post-interaction encoding layer, and classification header
LegalBERT used to encode case facts
Token attention used to aggregate sentence level representations
Dot product attention used to interact case facts and articles
Article-dependent final representation of case facts obtained using two step procedure
Sentence attention used to obtain article representation
Article representation used to condition GRU layer for case facts
Non-linear projection used to classify entailment outcome
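
As a rough illustration of the interaction step, the sketch below lets each case fact sentence vector attend over article sentence vectors with plain dot-product attention. The function names, the unscaled scores, and the direction of attention (facts attending to the article) are assumptions; the summary does not specify these details.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def attend(fact_sents: List[Vector], article_sents: List[Vector]) -> List[Vector]:
    """For each fact sentence, return the attention-weighted mix of article sentences."""
    dim = len(article_sents[0])
    attended = []
    for f in fact_sents:
        weights = softmax([dot(f, a) for a in article_sents])
        attended.append(
            [sum(w * a[i] for w, a in zip(weights, article_sents)) for i in range(dim)]
        )
    return attended


# Toy 2-dimensional sentence vectors (made up for the example).
facts = [[1.0, 0.0], [0.0, 1.0]]
article = [[10.0, 0.0], [0.0, 10.0]]
attended = attend(facts, article)
```

In this toy input each fact sentence ends up dominated by the article sentence it is most similar to, which is the behaviour an article-dependent representation of the facts relies on.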

Domain adaptation components

Domain Adaptation seeks to make models generalize from one domain to another.
The domains are mapped to a common latent space to reduce differences between their distributions.
The model is trained to read two texts and interrelate them towards an outcome determination.
A two layer feed forward network is used as a discriminator to predict the domain.
A min-max adversarial objective is optimized: the discriminator learns to predict the domain, while the model learns representations that retain the information needed for the entailment outcome task but make the domains hard to tell apart.
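
One common way to write such an adversarial objective is the generic domain-adversarial formulation below. It is a sketch, not necessarily the exact loss used in the paper. All symbols are assumptions introduced here: \theta_f for the encoder parameters, \theta_c for the entailment classifier, \theta_d for the domain discriminator, and \lambda for a trade-off weight.

```latex
E(\theta_f, \theta_c, \theta_d)
  = \mathcal{L}_{\text{ent}}(\theta_f, \theta_c)
  - \lambda \, \mathcal{L}_{\text{dom}}(\theta_f, \theta_d)

(\hat{\theta}_f, \hat{\theta}_c) = \arg\min_{\theta_f, \theta_c} E(\theta_f, \theta_c, \hat{\theta}_d),
\qquad
\hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_c, \theta_d)
```

The discriminator maximizes E, which amounts to minimizing its own domain loss. The encoder minimizes E, which lowers the entailment loss while raising the domain loss and so pushes both domains toward a shared latent space.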

Experiments & discussion

Models

Employed entailment architecture with fact based encoding and 10 classes in output layer
Binary cross entropy loss used to train model
Weights in LegalBERT sentence encoder frozen to save resources and reduce susceptibility to shallow surface signals

Does entailment perform better than fact classification?

Micro-F1 and macro-F1 scores for both tasks A and B are given in Table 1.
Entailment performance is better than classification.
Macro-F1 score shows greater improvement, indicating entailment approach helps with sparser articles.
Task A saw greater improvement than Task B, as Task B can be understood as topic classification.
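
To see why macro-F1 is more sensitive to sparse articles than micro-F1, the following minimal sketch computes both for a toy multi-label setting. The function names and the toy predictions are made up for illustration and are unrelated to the numbers in Table 1.

```python
from typing import List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float]:
    counts = {label: [0, 0, 0] for label in labels}  # tp, fp, fn per label
    for g, p in zip(gold, pred):
        for label in labels:
            if label in g and label in p:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in g:
                counts[label][2] += 1
    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro: pool the counts first, so frequent labels dominate.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    return f1(tp, fp, fn), macro


# Toy data: article "6" is frequent and always right, article "2" is rare and missed.
gold = [{"6"}, {"6"}, {"6"}, {"2"}]
pred = [{"6"}, {"6"}, {"6"}, set()]
micro, macro = micro_macro_f1(gold, pred, ["6", "2"])
```

Here the single missed rare label halves the macro score (0.5) but only lowers the micro score to 6/7, so gains on sparse articles show up mainly in macro-F1.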

Does domain adaptation help to improve zero-shot transferability?

Baseline model performs worse than domain adaptation counterparts on target data
UDA Wasserstein distance performs better on target data than Domain Discriminator
UDA Wasserstein distance performs worse on source data than baseline
ADA Domain Discriminator and Wasserstein distance are comparable on target data
ADA Wasserstein distance performs better on source data in Task A
Zero-shot transfer entailment task is difficult and discrepancy between source and target data is still large

How does encoder pre-training influence zero-shot transferability?

Replacing LegalBERT embeddings with BERT base embeddings in an experiment on Task A resulted in worse performance on the target data.
Domain specific pre-training is beneficial for generalizing to unseen target articles.
LegalBERT may have injected domain-specific information about the target articles into the encoding.

How does article relatedness affect zero-shot transferability?

Experiment tested whether article relatedness affects performance
Experiment used Article P1-1 as target domain
Constructed one related and one unrelated source domain
Related domain consists of Articles 6 and 8
Unrelated domain consists of Articles 2, 3 and 5
Related source domain performs better
UDA achieves higher performance overall
Wasserstein method outperforms Domain Discriminator for the related source, and vice versa for the unrelated source

Conclusion

Casting LJP as an entailment task with non-fine-tuned encoders yields a benefit over a simple case fact classification model
Created a zero-shot benchmark on the ECtHR corpus
Task difficulty, absolute performance, and zero shot transferability depend on how case facts are drafted
Major hurdle dealing with legal domain corpora is their lengthy nature
Hierarchical models limited in that tokens across long distances cannot directly attend to one another
Weights in LegalBERT sentence encoder frozen to save computational resources and reduce model’s susceptibility to shallow surface signals
Experiment with publicly available datasets of ECtHR decisions
Task of legal judgment prediction raises ethical, civil rights, and legal policy concerns
Aim to make incremental technical progress to enable systems to acquire legal reasoning capability
Models developed and trained on Google Colab
Models incorporate pre-trained language models and do not train them from scratch
Employ maximum sentence length of 256 and document length of 50
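
The length limits can be pictured as forcing every document into a fixed grid. The sketch below assumes that "sentence length" counts tokens and "document length" counts sentences, that overlong inputs are cut from the end, and that a padding id of 0 is used. None of these details is confirmed by the summary, and `pad_document` is a hypothetical helper.

```python
from typing import List

MAX_SENT_LEN = 256  # tokens per sentence (value from the text)
MAX_DOC_LEN = 50    # sentences per document (value from the text)
PAD_ID = 0          # assumed padding id, for illustration only


def pad_document(doc: List[List[int]]) -> List[List[int]]:
    """Truncate and pad a document of token-id sentences to a 50 x 256 grid."""
    grid = []
    for sent in doc[:MAX_DOC_LEN]:
        kept = sent[:MAX_SENT_LEN]
        grid.append(kept + [PAD_ID] * (MAX_SENT_LEN - len(kept)))
    # Add all-padding sentences until the document has MAX_DOC_LEN rows.
    grid += [[PAD_ID] * MAX_SENT_LEN for _ in range(MAX_DOC_LEN - len(grid))]
    return grid


# Toy document: one overlong sentence and one short sentence.
grid = pad_document([[1] * 300, [2, 3]])
```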
