Proteverb – Legal, ethical and technological aspects of processing textual and speech data for scientific, research and development purposes

ABOUT THE PROJECT

From the perspective of language technology development, Slovenian is a language with few digital resources. As a result, computer science research and the development of products based on natural language processing progress significantly more slowly than for languages with many digital resources. To acquire language resources adequately and to make secondary use of them in as natural a form as possible, a form that may also contain some personal data, it is important to interpret the General Data Protection Regulation (GDPR) and its exceptions for research purposes. These exceptions make it possible to achieve the purposes of the present targeted research project. For the first time in Slovenia, the project will systematically address the acquisition and processing of (personal) data in a way that serves the interest of science. Through application in a pilot project, it will contribute to the development of science and of the economy on the basis of new insights and practices.

The research project will bring together, intertwine and deepen the knowledge of several scientific disciplines across the social sciences, natural sciences, technical sciences and humanities. Such synergies are essential to ensure that advances in technological development are understood in the appropriate context and regulated in a way that maximises societal benefits while minimising negative impacts and interference with ethical and legal standards and human rights. Such a comprehensive approach is the only way to develop the concept of open science to its full extent.

Funded by the Slovenian Research and Innovation Agency and the Ministry of Digital Transformation

Project no: V5-2265

Project duration: 2022 – 2024

Partner research organisations: the Jožef Stefan Institute, the Institute of Criminology at the Faculty of Law in Ljubljana, the Faculty of Electrical Engineering at the University of Ljubljana, the Faculty of Computer Science and Informatics at the University of Ljubljana.

CONTENT OF THE PROJECT

The targeted research project will be divided into several phases:

  1. We will examine the legal framework of data processing for research and scientific purposes. The starting point will be the General Data Protection Regulation (GDPR) and the ZVOP-1 (the Slovenian Personal Data Protection Act), which we will build upon through a comparative legal analysis and by monitoring the development of the ZVOP-2 legislative proposal.
  2. We will look at current practices of data collection for scientific research purposes, with an interest both in access to data by researchers and research organisations and in the experience of data sharing by public authorities and institutions (e.g. courts). We will identify the key risk factors that have prevented access to data in the past, in order to develop a protocol to protect privacy in the course of data processing for scientific research purposes.
  3. The project will develop procedures for appropriate data access and anonymisation, based on the adaptation and improvement of existing anonymisers. Recommendations will be made on machine-learning-based methods for biometric anonymisation of speech recordings, with the aim of reducing the impact of anonymisation on the reliability of automatic speech recognisers.

In the pilot, we will attempt to acquire data using the privacy protocol and the data access procedures, including anonymisation. The pilot part of the research will consist of preparing the necessary groundwork for data acquisition, taking over the data, anonymising the data, and organising the documentation, procedures and rules needed for data processing within the research institution. On the basis of the data obtained in the pilot part of the targeted research project, we will specialise a text anonymiser as well as a speech recogniser for the Slovene language.
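To illustrate what a text anonymiser does at its simplest, the following is a minimal rule-based sketch in Python. It is not the project's anonymiser: the patterns, the placeholder format and the function name `anonymise` are assumptions made for illustration only, and a realistic tool for Slovene would also need named-entity recognition for names, addresses and similar identifiers.

```python
from __future__ import annotations

import re

# Hypothetical patterns, for illustration only. They cover e-mail addresses
# and Slovenian-style nine-digit phone numbers, nothing else.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"(?<!\w)(?:\+386[ -]?|0)\d(?:[ -]?\d){7}(?!\d)"),
}


def anonymise(text: str) -> tuple[str, dict[str, str]]:
    """Replace each matched identifier with a stable placeholder.

    The same original value always maps to the same placeholder, so
    references within a document stay consistent after anonymisation.
    Returns the anonymised text and the value-to-placeholder mapping.
    """
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():

        def substitute(match: re.Match, label: str = label) -> str:
            value = match.group(0)
            if value not in mapping:
                # Number placeholders separately for each label.
                count = sum(
                    1 for p in mapping.values() if p.startswith(f"[{label}_")
                )
                mapping[value] = f"[{label}_{count + 1}]"
            return mapping[value]

        text = pattern.sub(substitute, text)
    return text, mapping


if __name__ == "__main__":
    sample = "Pišite na ana@example.si ali pokličite 041 123 456."
    anonymised, mapping = anonymise(sample)
    print(anonymised)
    print(mapping)
```

The mapping is returned separately so that it can be stored under stricter access controls than the anonymised text, or discarded entirely when re-identification must not be possible.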

RELEVANCE TO THE DEVELOPMENT OF SCIENCE

The project will primarily make a significant contribution to the development of three scientific fields: law, informatics and computer science, and the humanities.

In all three fields, the research team will immediately transfer the findings and new knowledge into undergraduate and postgraduate studies at national and foreign universities. This will happen both through the participation of students in the development of the above-mentioned technologies and through the teaching carried out by the researchers involved in this project, who also work as professors at different universities.

The academic results of this project will overcome key obstacles to the advancement of science and will optimise the use of data for research purposes without violating legal standards and human rights.

Project team

Inštitut za kriminologijo
Poljanski nasip 2
1000 Ljubljana
inst.crim@pf.uni-lj.si

 

Copyright © Institute of Criminology at the Faculty of Law in Ljubljana
