Automatic Anonymization of Italian Legal Textual Documents using Deep Learning


The dissemination of judicial decisions not only provides a valuable source of decision support for judges and legal practitioners but also strengthens public confidence in the judicial system. However, the nature of the data raises privacy concerns, because the documents include personal and often sensitive data, such as information about health, finances, religious beliefs, and sexual orientation. In recent years, especially since the introduction of the GDPR, the international scientific community has paid considerable attention to privacy and automatic anonymization tools, but no such work has addressed the Italian legal context. In this paper, we present a first solution for the automatic anonymization of documents in the Italian National Jurisprudential Archive (Archivio Giurisprudenziale Nazionale) domain, based on pre-trained Transformer embeddings (Clark et al., 2020; Devlin et al., 2019) and spaCy’s transition-based parsing for entity recognition (Honnibal and Montani, 2017). It achieves more than 94.7% recall (>99% for Person and ID entities) and supports several anonymization methods that can be applied to the text depending on the purpose of the anonymization.
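As an illustration of how different anonymization methods can be applied once entities have been recognized, the following minimal sketch replaces entity spans in a text under three strategies. It is not the authors' implementation: the `Span` type, the `anonymize` function, and the method names `label`, `pseudonym`, and `mask` are hypothetical, and the spans are located by plain string matching here rather than predicted by a trained spaCy model. The names and the ID value in the example are invented.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    """A recognized entity: character offsets plus an entity label (hypothetical type)."""
    start: int
    end: int
    label: str


def anonymize(text: str, spans: list[Span], method: str = "label") -> str:
    """Replace each entity span in `text` according to `method`.

    label     -> "[PERSON]"            (entity type only)
    pseudonym -> "[PERSON_1]"          (same surface form always gets the same alias)
    mask      -> "***********"         (one asterisk per original character)
    """
    counters: dict[str, int] = {}            # next alias number per label
    aliases: dict[tuple[str, str], str] = {}  # (label, surface) -> alias
    pieces: list[str] = []
    cursor = 0
    for span in sorted(spans):               # process spans left to right
        if span.start < cursor:
            raise ValueError("overlapping spans are not supported in this sketch")
        surface = text[span.start:span.end]
        if method == "label":
            replacement = f"[{span.label}]"
        elif method == "pseudonym":
            key = (span.label, surface)
            if key not in aliases:
                counters[span.label] = counters.get(span.label, 0) + 1
                aliases[key] = f"[{span.label}_{counters[span.label]}]"
            replacement = aliases[key]
        elif method == "mask":
            replacement = "*" * len(surface)
        else:
            raise ValueError(f"unknown method: {method}")
        pieces.append(text[cursor:span.start])  # untouched text before the entity
        pieces.append(replacement)
        cursor = span.end
    pieces.append(text[cursor:])                 # untouched tail
    return "".join(pieces)


# Invented example; in a real pipeline the spans would come from an NER model.
text = "Mario Rossi sued Luigi Bianchi. Mario Rossi holds ID AB1234567."
mentions = [("Mario Rossi", "PERSON"), ("Luigi Bianchi", "PERSON"), ("AB1234567", "ID")]
spans = [
    Span(m.start(), m.end(), label)
    for surface, label in mentions
    for m in re.finditer(re.escape(surface), text)
]

print(anonymize(text, spans, "label"))
print(anonymize(text, spans, "pseudonym"))
print(anonymize(text, spans, "mask"))
```

The choice among such strategies reflects the purpose of anonymization: type labels keep the text readable, consistent pseudonyms preserve who did what across a decision, and masking removes the most information.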


Licari D; Romano MF; Comandé G

2022-01-01



Year of publication: 2022
ISBN: 9791280153319
Appears in the types: 4.1 Conference Proceedings Contribution / Full-length Articles


Use this identifier to cite or link to this document: https://hdl.handle.net/11382/548773
