MultiplEYE Data Collection Metadata Form

!! Attention !!

1. It is not possible to save this form and continue later!
2. Read through all the questions beforehand and make sure you know all the answers.
3. When you are ready to fill in the form, take 30 minutes of uninterrupted time to fill it in all at once.

1. Enter the title for your data collection

To ensure consistency within the COST Action project “MultiplEYE”, the name of your data collection/dataset must follow the MultiplEYE naming convention and be applied consistently throughout. The name is made up of the following parts, separated by underscores: the term “MultiplEYE”, the tested language (ISO-639-1, 2-letter language code), your country (ISO-3166-1 alpha-2, 2-letter country code), your city, your identifier, and the year in which your data collection will end. The name should already have been generated at pre-registration. The entire name MUST be identical to the one that was pre-registered.

MultiplEYE_

Example: MultiplEYE_DE_DE_Berlin_1_2024
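The following minimal Python sketch shows how the parts of the name fit together. It assembles a name from its components and checks an existing name against the pattern. Beyond what the form states, it assumes three things: the codes are written in upper case as in the examples, the city name contains no underscores or spaces, and the identifier is a number. The helper names `build_name` and `parse_name` are hypothetical. The name generated at pre-registration always remains authoritative.

```python
import re

# Naming convention: MultiplEYE_<lang>_<country>_<city>_<identifier>_<endYear>
# Assumptions (not stated in the form): upper-case codes as in the examples,
# a city without underscores or spaces, and a numeric identifier.
NAME_PATTERN = re.compile(
    r"^MultiplEYE_(?P<lang>[A-Z]{2})_(?P<country>[A-Z]{2})_"
    r"(?P<city>[^_\s]+)_(?P<identifier>\d+)_(?P<end_year>\d{4})$"
)


def build_name(lang: str, country: str, city: str, identifier: int, end_year: int) -> str:
    """Hypothetical helper: assemble a dataset name from its parts."""
    return f"MultiplEYE_{lang.upper()}_{country.upper()}_{city}_{identifier}_{end_year}"


def parse_name(name: str) -> dict:
    """Hypothetical helper: split a name into its parts or raise ValueError."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid MultiplEYE dataset name: {name!r}")
    return match.groupdict()


if __name__ == "__main__":
    name = build_name("de", "de", "Berlin", 1, 2024)
    print(name)  # MultiplEYE_DE_DE_Berlin_1_2024
    print(parse_name(name))
```

If the check fails, it most likely means a part is missing (for example, the city) or the parts are in the wrong order.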

For an additional data collection (e.g., with elderly participants, children, or alternative stimuli), please use the following field to provide a short descriptor (max. 10 characters) specifying the nature of the dataset. This descriptor will later be added to the title of your data collection or dataset when it is published in EyeStore. If you already provided a descriptor when completing the pre-registration form, please provide the same descriptor here.

Examples for descriptors: “elderly” → for datasets collected with older adult participants, “children” → for datasets collected with child participants, “shortstim” → for datasets collected with shortened or alternative stimuli. Please propose a clear and concise descriptor.

2. Enter all person(s) responsible for the research data / for creating the dataset at the collection site

2.1. Please provide the contact information (i.e., full name and email address of the corresponding contributor at the responsible institution).

Contact:

2.2. Provide the name of any person(s) as lead creator(s)/contributor(s) who contributed to your dataset and how they contributed. A lead contributor to a data collection is anyone who is responsible for the data collection at a lab. This includes those who take on central roles in planning, organizing, and executing the data collection process, ensuring the quality and accuracy of the data collected, and providing documentation. Please name all lead contributors specifically from your collection site / lab / institution.

Note: Add as many names as you need. Use a semicolon to separate the persons. Use full names (omit titles) and state the type(s) of contribution.

Lead Creator(s):

Responsible researcher
Supervision of the data collection process
Organizational duties
Administrative duties
Translation duties (e.g., translation of stimuli, comprehension questions, instructions etc.)
Other type of contribution (please specify below)
Not specified

If you would like to add another lead creator / contributor, please select below:

2.3. If applicable, provide the name of any person(s) as supporting creator(s) who contributed to your dataset and how they contributed. Supporting contributors include students or other support staff who assist with data collection and help run the experiment with participants in the lab. They support the lead contributors in various aspects of the data collection process but may not hold overall responsibility for it.

Note: Add as many names as you need. Use a semicolon to separate the persons.

Supporting Creator(s):

Example: Martin Average, collecting data from participants in the lab; Hailey Miller, coordinating participant appointments; …

3. Please specify the time frame of your data collection

When did your data collection start? (use: yyyy-mm)

When did your data collection end? (use: yyyy-mm)

Example: 2022-11, 2023-10
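You can check both dates before entering them. This minimal Python sketch does three checks: it verifies the yyyy-mm format, confirms that the end date is not before the start date, and confirms that the end year matches the last part of the dataset name. Section 1 defines that last part as the year in which your data collection ends. The helper names are hypothetical, and the sketch assumes the name ends in `_<year>` as in the examples.

```python
import re
from datetime import datetime

# Strict yyyy-mm check; datetime.strptime alone would also accept "2022-1".
YEAR_MONTH = re.compile(r"\d{4}-\d{2}")


def parse_year_month(value: str) -> datetime:
    """Parse a 'yyyy-mm' string such as '2022-11'; raise ValueError otherwise."""
    if not YEAR_MONTH.fullmatch(value):
        raise ValueError(f"expected yyyy-mm, got {value!r}")
    return datetime.strptime(value, "%Y-%m")  # also rejects months like '2022-13'


def check_time_frame(start: str, end: str) -> None:
    """Raise ValueError if a date is malformed or the end lies before the start."""
    if parse_year_month(end) < parse_year_month(start):
        raise ValueError(f"end {end} lies before start {start}")


def end_year_matches(dataset_name: str, end: str) -> bool:
    """Hypothetical check: the last '_' part of the name should be the end year."""
    name_year = dataset_name.rsplit("_", 1)[-1]
    return name_year == end[:4]


if __name__ == "__main__":
    check_time_frame("2022-11", "2023-10")  # passes silently
    print(end_year_matches("MultiplEYE_DE_DE_Berlin_1_2024", "2024-03"))  # True
```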

4. State the location where the study took place

Note: Complete below: name of institution or lab, city and country

Location “institution”:

Example: University of Zurich, Department of Computational Linguistics

Location “city”:

Example: Zurich

Location “country”:

Example: Switzerland

Please indicate below whether your data collection took place at more than one location (e.g., at two different labs) and is therefore connected to another data collection registration.
If applicable, please enter the pre-registered title / ID of the corresponding data collection below.

Note: If your data collection happens at two different labs, BOTH labs must have pre-registered and will have to complete this form.

Corresponding data collection name:

Example: MultiplEYE_DE_DE_Berlin_1_2024

5. Enter a description of your dataset

Please specify whether your dataset is part of the core data collection of MultiplEYE or if it pertains to any other additional data collection related to the MultiplEYE core dataset. This includes additional datasets serving different purposes or aimed at investigating supplementary research questions.

Note: Please see the MultiplEYE data management plan or data collection guidelines for definitions of core and additional data collection. If you have also collected or are planning to collect data for additional datasets (next to this core dataset), please provide further information below, so the core dataset can be linked to the additional dataset(s).

Please select:

a. Data collection description: The core data collection for the MultiplEYE COST Action aims to foster an interdisciplinary network of research groups working on eye-tracking data from reading across multiple languages. For this purpose, eye trackers are used to collect data from adult native speakers in many different countries. Building such a large multilingual eye-tracking corpus enables researchers to study human language processing from a psycholinguistic perspective and to improve and evaluate computational language processing from a machine learning perspective.

Please indicate here whether any other additional datasets related to this dataset already exist. If applicable, please enter the name of the related data collection (i.e., “MultiplEYE_languageCode_countryCode_city_identifier_endYearOfDataCollection”):

Example: MultiplEYE_DE_CH_Zurich_2_2025

Please specify here how your core dataset is related to the other dataset(s) mentioned.

Example: This core dataset is related to an additional dataset investigating individual differences in human language processing.

b. This dataset belongs to an additional data collection that was collected in addition to the core data collection for MultiplEYE. Please provide a short description of your additional data collection.

The additional dataset has tested / consists of:

Same participants but different stimuli or a different experiment from the MultiplEYE stimuli/experiment
Direct replication of the MultiplEYE experiment (i.e., different participants but the same stimuli as MultiplEYE)
MultiplEYE experiment conducted as a pilot study
Different group of participants with the core MultiplEYE stimuli (e.g., L2 speakers, elderly participants, children, or participants with dyslexia)
Same MultiplEYE stimuli but a different procedure (e.g., different stimulus presentation such as a different font, a different order, or a different number of stimuli presented per session)

Other, please specify:

Please state here which core dataset this dataset is related to. If applicable, also provide the name of the related core dataset (i.e., “MultiplEYE_languageCode_countryCode_city_identifier_endYearOfDataCollection”):

Example: MultiplEYE_DE_CH_Zurich_1_2024

Please specify here how this dataset is related to the core dataset.

If your core dataset (which is related to the above-mentioned additional dataset) has already been published, insert the link to its publication below:

Are there any other additional datasets related to this dataset? If so, enter their names here (i.e., “MultiplEYE_languageCode_countryCode_city_identifier_endYearOfDataCollection”):

6. Please provide information about sponsoring agencies, individuals, or contractual arrangements for your study

This study is part of the COST Action “MultiplEYE” (CA21131) funded by the European Union. COST Actions are research networks supported by the European Cooperation in Science and Technology.

If your data collection was funded by any funding agency, please add the information below.

Note: Include the grant ID. If more than one funder/sponsor is involved in your research, please separate the sponsors with a semicolon.

Example: Institute for Advanced Research (IAR), grant no./funding code: IAR-2024-001; Foundation for Scientific Innovation (FSI), grant no./funding code: FSI-2024-123

7. Describe the research goals and objectives for your data collection

Please specify whether the goals and objectives refer to the core or additional data collection by checking one of the following boxes. If you are filling out this form for an additional data collection, check the associated box and describe the research objectives in your own words.

The following description of the objectives refers to the core dataset: The aim of this data collection/study, within the framework of the COST Action MultiplEYE, is to contribute to the development of a multilingual eye-tracking corpus. Each dataset contributed to the core data collection will be made accessible to collaborators of the COST Action MultiplEYE, as well as to the broader scientific community and the general public. These datasets will serve to investigate research inquiries related to human language processing from a psycholinguistic perspective and to enhance computational language processing techniques using machine learning methodologies. The primary focus of the MultiplEYE core dataset is to investigate reading comprehension across multiple languages based on eye movement measurements.

The following description of the objectives refers to the additional dataset:

8. Choose the language that has been tested within your reading experiments / for your data collection

Note: Use ISO-639-1 codes to represent language names.

8.1. Language code:

8.2. Select the language family to which the language belongs.

Other, namely:

8.3. Select the language script.

Other, namely:

9. State your type of research design and describe the used methods

9.1. If you are filling out this form for a core dataset please check the following box describing the research design of the MultiplEYE experiment.

The study, as part of the MultiplEYE core data collection, is an observational study using a standardized experimental setup. Although participants take part in a structured eye-tracking-while-reading experiment, no independent variables are experimentally manipulated; rather, the goal is to collect naturalistic reading behavior in response to continuous, ecologically valid text stimuli. The design allows detailed observation of spontaneous reading processes under controlled conditions. With respect to stimulus exposure, the experiment follows a within-subjects design, as each participant reads multiple texts presented in a (pseudo-)randomized order. With respect to the language of the stimuli, however, the study follows a between-subjects design, since each participant is tested only in their native language.

9.2. If you are filling out this form for an additional dataset and its research design differs from the one above (for example, if your research design includes an experimental manipulation, or if it is not an experimental research design at all), please describe it here and select the method you used.

Have you used a reading experiment for the collection of your additional data?

Yes
No

Method used:

Other, namely:

10. Tests and measures

10.1. Select all measures collected in your study:

Recording of eye movements (i.e., horizontal and vertical gaze coordinates) via eye tracker
Responses to the MultiplEYE comprehension questions
Standard MultiplEYE participant questionnaire (i.e., demographic data, text familiarity and difficulty assessment)
Psychometric tests (please indicate further in section 10.2.)

Other measures, namely:

10.2. State here if you collected any data through psychometric test(s).

MultiplEYE has selected and implemented six psychometric tests to assess various cognitive capacities. Collecting data with these tests was optional for the core data collection. Please indicate below whether you collected data from psychometric tests. If so, select which of the following tests you conducted.

Verbal and non-verbal working memory: Lewandowsky et al. Working Memory Capacity (LWMC) battery

Collected for all participants