Towards creating an ELSA Curriculum for Data Scientists 

  • When? Wednesday, 1 June 2022, 2-4 PM
  • Where? Online
  • Language: English

The explosion of data-driven applications in recent years has put the Ethical, Legal, and Societal Aspects (ELSA) of data science in the spotlight. One way to tackle these issues is to raise awareness through education.

This workshop is organized within the framework of the BMBF-funded project FAIR Data Spaces. It aims to bring together stakeholders with ELSA and technical backgrounds who are interested in identifying the main concepts for an ELSA curriculum for Data Scientists. We hope it will lead to the creation of a community that continues to work towards this goal.

The workshop agenda comprises presentations, impulse talks, discussions and a Q&A session. Audience participation is highly encouraged.


Agenda

    1. Welcome and Introduction (Daniela Mockler, NFDI)
    2. Data Scientists and ELSA: Describing the Landscape (Maria Christoforaki, UzK)
      A brief description of the status quo: ELSA as part of the Data Science curriculum in tertiary education; ELSA awareness demands in industry; determining the Data Scientist profile now and in the future.
    3. Data Protection in FAIR-DS (Dara Hallinan, FIZ)
      In the FAIR-DS project, data flows are foreseen between a scientific data infrastructure and an industrial data infrastructure. Some of this data will relate to identifiable individuals and will thus constitute personal data, which is subject to specific legal protection under data protection law. This presentation will consider the work being done on the applicability of data protection law to the data flows foreseen in FAIR-DS.
    4. What’s the Point of Ethics in Data Sharing? (Andreas Bruns, UKHD)
      The talk gives a very short introduction to some of the themes within the ethics of data sharing. What are (some of) the ethical questions surrounding data sharing and use? What is the relationship between ethics and other normative domains, such as law and institutionalized (ethical) codes? And what is the point of ethical reflection where more tangible (e.g., legal) rules already exist?
    5. Legal and Ethical Aspects of a Personalized Learning Dashboard (Mohammadreza Tavakoli, TIB)
      eDoer is a platform that provides AI-based personalized educational recommendations to help users reach their learning goals. In eDoer, learners can 1. set their goals, 2. receive personalized recommendations (learning paths and resources) toward those goals, 3. request mentoring, and 4. test the knowledge they have acquired. Moreover, eDoer offers a hybrid Human-AI solution that lets authors and content curators create and maintain up-to-date curricula in various contexts (e.g., knowledge area, location, language). In this presentation, we will discuss the legal and ethical aspects we faced while developing the eDoer platform. We will also outline further ethical and legal areas that need to be considered in future versions.
    6. Towards an ELSA curriculum for Data Scientists: Setting the program and learning objectives (Maria Christoforaki, UzK)
      A review of existing approaches and an identification of the challenges specific to developing a curriculum for practitioners, such as variance in experience, education, and cultural background.
    7. Panel Discussion / Q&A

    Recap

    The FAIR Data Spaces project is going to develop a curriculum for Ethical, Legal and Societal Aspects (ELSA) training for data science professionals. For that purpose, we are organizing a series of workshops that bring together experts from a variety of disciplines, with whose help we will collect the requirements for such a curriculum. We kicked off the workshop series on 1 June 2022 and virtually welcomed a mix of ethics and legal experts, computer scientists and participants from other fields. All slides and the respective recordings have been published in the FAIR Data Spaces community on Zenodo.

    The workshop comprised five talks and a subsequent panel discussion. After a short introduction to FAIR Data Spaces and the purpose of the workshop series, Maria Christoforaki opened the talks with a description of the current landscape, namely the profile of data scientists:

    • What are their demographics?
    • What is their education?
    • What is their knowledge and awareness regarding ELSA topics?

    A review of multiple studies showed that data scientists form a heterogeneous group with a variety of educational, social and national backgrounds. As many of them do not follow a university education path, they miss out on whatever ELSA courses universities may offer. The review thus points to a need for ELSA training aimed at data scientists who are already working in the profession.

    In the second talk, Dara Hallinan focused on data protection in FAIR DS. The development of a common cloud-based data space between industry and science involves the handling of personal data. The processing of this data is subject to specific legal rules; therefore, a general analysis of the data protection conditions under which data in Gaia-X and NFDI can be processed and exchanged was necessary as a first step towards connecting both domains. The analysis of the conditions under which personal data can be processed in the FAIR DS demonstrators is currently ongoing. These demonstrators are technical applications that showcase the feasibility of connecting the two domains, industry and science. This task raises several questions: how should data scientists determine which law applies to them, how should they understand the uncertainty inherent in the law, and how should they deal with this uncertainty? The discussion of these questions highlighted the importance of understanding which aspects of the law apply and of being aware of the uncertainty that remains in its interpretation.

    After the legal aspects, Andreas Bruns continued with a presentation on the ethics of data sharing. Data ethics is defined as the area of applied ethics that addresses real-world moral issues in relation to data, mainly personal data. Andreas presented a mapping of ethical principles built on four main principles: autonomy, non-maleficence (not harming), beneficence (aiding) and justice. These principles go beyond the usually quite narrow understanding of data ethics as primarily concerned with privacy. From this mapping, the following points were highlighted in particular:

    1. Ethics governs everyone’s conduct, not just that of data scientists. Good governance has to consider all vulnerable groups.
    2. When building a framework of cooperation in which the potential of big data can be realized, building trust is of utmost importance.
    3. Value alignment concerns, i.e., personal and ethical concerns about data science and technology, should be addressed in data science and research.
    4. Finally, the ethical dimension of data science stretches across the whole data life cycle.

    In the fruitful discussion that followed, we examined how data scientists can operationalize ethics, the relation between the FAIR data principles and ethics, and more.

    After hearing from the legal and ethics experts, Mohammadreza Tavakoli brought in the perspective of computer scientists in the fourth presentation, describing how they dealt with the ELSA issues they encountered while developing eDoer, a platform that provides AI-based personalized educational recommendations to help users reach their learning goals. To enable this personalized environment, eDoer lets learners define their journeys (i.e., a chosen profession) with individual target courses, according to their preferences. The system crawls the web for educational material from Google, YouTube and Wikipedia to help content developers add educational resources for each topic. Since eDoer is offered as a service, several legal issues had to be addressed in its Terms of Use, including the scope and goal of the application, the registration and admission processes, and the deletion of user accounts and their personal data, among others. Ethical questions also arise, such as the trade-off between transparency and usability, or how to compare one learner with another in assessments.

    In the final presentation, which served as a starting point for the concluding discussion round, Maria considered the steps needed to formulate an ELSA curriculum for data scientists. The needs and context of the learners have to be taken into account when developing such a curriculum. Furthermore, the desired learning objectives and suitable teaching methods have to be identified. In the discussion round at the end of the workshop, participants pointed out that the curriculum should not aim to turn data scientists into legal or ethics experts, but to equip them with the knowledge to recognize when and how to consult the domain experts. More often than not, data scientists are unaware of basic legal and ethical concepts and of the limits of their own understanding of the law and ethical principles. This may prompt them to act on assumptions about what the law is, which in turn may lead to unlawful decisions. Hence, the goal is to raise awareness of these issues and to promote knowledge of ethical and legal frameworks among data scientists in both science and industry, so that researchers and scientists establish a collaborative relationship with their legal and data protection representatives.

    We will organize the second FAIR DS ELSA workshop in June 2023. Stay tuned.
