ITALIAN-LEGAL-BERT: A Pre-trained Transformer Language Model for Italian Law
Giovanni Comandè; Daniele Licari
2022-01-01
Abstract
The state of the art in natural language processing is based on transformer models that are pre-trained on general-domain text and enable efficient transfer learning to a wide variety of downstream tasks, even with limited datasets. However, the performance of these models drops significantly in specialized, sector-specific domains. This is a problem in the Italian legal context, where the language of generic open-source corpora (e.g., Wikipedia and news articles) differs markedly from legal language, which can be cryptic, Latin-based, and rich in domain-specific idiolectal formulas. In this paper, we introduce ITALIAN-LEGAL-BERT, obtained by further pre-training the Italian BERT model on Italian civil law corpora. It outperforms the ‘general-purpose’ Italian BERT on several domain-specific tasks.
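The abstract does not spell out the objective used for the additional pre-training. Assuming it reuses BERT's standard masked-language-modeling (MLM) objective, the sketch below shows how a single tokenized sequence from a legal corpus would be corrupted before being fed to the model. The function name `mask_tokens`, the token ids, and the `IGNORE_INDEX = -100` label convention (the default `ignore_index` of PyTorch's `CrossEntropyLoss`) are illustrative assumptions, not details taken from the paper.

```python
import random
from typing import List, Optional, Set, Tuple

# Label for positions the loss should skip (assumed convention: -100 is the
# default ignore_index of PyTorch's CrossEntropyLoss).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Apply BERT-style MLM corruption to one tokenized sequence (hypothetical helper).

    Each non-special token is selected with probability mlm_prob. A selected
    token is replaced by [MASK] 80% of the time, by a random vocabulary id 10%
    of the time, and left unchanged 10% of the time. Labels hold the original
    id at selected positions and IGNORE_INDEX everywhere else.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never corrupt special tokens such as [CLS], [SEP] or padding.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model is trained to recover this original id
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # Otherwise (the remaining 10%) the original token is kept as-is.
    return inputs, labels


if __name__ == "__main__":
    # Hypothetical ids: 2 = [CLS], 3 = [SEP], 4 = [MASK]; the rest are word pieces.
    ids = [2, 101, 57, 893, 12, 3]
    inputs, labels = mask_tokens(
        ids, mask_id=4, vocab_size=1000, special_ids={2, 3}, rng=random.Random(0)
    )
    print(inputs)
    print(labels)
```

Selected tokens are sometimes left unchanged or swapped for a random id so that the model does not learn to expect `[MASK]` at every position it must predict, since that token never appears when the model is later fine-tuned on downstream tasks.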