Curating a CSV batch file

(The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.)

These instructions are intended to help with the use of a CSV file, provided to you (as a Responsible Organisation) by the IRD service, so that you can update some information about the repositories listed in the file.

This CSV has been exported from IRD. It contains the most recent information, known to IRD, about repositories in your area of responsibility. It contains 13 columns. Several of the columns are constrained. Some accept only values from a controlled list of allowed values. All of this is detailed in the table (below).

The values in each row of this file can be edited, constrained by the rules in the table. The file can then be sent back to the IRD in order to update the records there. When you receive the CSV file it will contain records for repositories known to IRD. This means that the first column - ID - contains an ID for each of them. If you edit any value in a row which has one of these IDs then the record in IRD with that ID will be updated with those values.

You MAY also add new rows to the CSV file, if you believe that there is a repository which is not already known to IRD. Such rows will also have values, constrained by the same rules in the table below, but they MUST NOT have an ID. The absence of an ID will cause IRD to create a new record for that repository.

When you submit a CSV file back to the IRD to update records, it will be automatically checked before being processed. The structure of the file MUST exactly match the original file (e.g. the number of columns, the order of the columns, the allowed values in the rows etc.).

If there are rows where no update is required, you MAY include those rows (but this is OPTIONAL). When the CSV file is processed by IRD, any rows which do not require an update to the IRD record are simply ignored.

If the CSV file contains a row for a repository which you believe no longer exists, then you MUST indicate this by entering the value “archived” in the record_status column.

Summary of process for using the CSV file update

  1. Receive the CSV file from IRD
  2. Check each row (repository) in the file for accuracy of the information
  3. Add new rows for any new repositories that you think should be included in the IRD
  4. Update the record_status column for each row:

In the table below, the following terms are used

Columns in the CSV file

Column Name Type Description Requirement Constraints & Rules Example
id UUID or nil The IRD ID for this repository (will be blank for a new repository) OPTIONAL - If there is already a record in the IRD for this repository, then this is REQUIRED
- If there is not yet a a record in the IRD for this repository, then this MUST be nil
1b74aa75-db97-4ea3-a344-baafc0911ee8
name Free text The name of the repository REQUIRED   Zenodo
homepage URL The URL of the homepage of the repository REQUIRED This MUST be a valid HTTP or HTTPS URI. https://zenodo.org/
contact Free text A way for users to contact the repository service. RECOMMENDED This is often a support email address, but MAY also be a URL to a “help-desk” or contact form, or even just a free-text instruction [email protected]
owner_ror HTTPS URL This is the Research Organization Registry (ROR) identifier for the organisation which owns this repository RECOMMENDED This MUST be a valid ROR in its HTTPS URI form. https://ror.org/01ggx4157
owner_name Free text The name of the organisation which owns this repository OPTIONAL This field is not used when the CSV is processed by IRD - it is provided only as a convenience for you to be able to recognise the owning organisation. The organisation is identified only by the value in the owner_ror column. European Organization for Nuclear Research
repository_type term This describes the “scope” of the repository. REQUIRED This field MUST contain one value from the list Repository Types (see below) generalist_repository
software term This identifies the software platform that the repository runs on. RECOMMENDED This field MUST contain one value from the list Software Platforms (see below) invenio
software_version Free text This is the version number or label of the software platform that the repository runs on. OPTIONAL   3
oai_pmh_base_url URL This is the Base URL of the repository’s OAI-PMH interface REQUIRED This MUST be a valid HTTP or HTTPS URI https://zenodo.org/oai2d
media_types list (terms) This describes the type(s) of content in the repository. REQUIRED This field MUST contain one or more values from the list Media Types (see below).
Separate each value with a “pipe” character: “|”
research-articles|conference-papers|research-data
primary_subject term This describes the primary subject/discipline of the content in the repository. REQUIRED This field MUST contain one value from the list Primary Subjects (see below) multidisciplinary
record_status term This identifies the status of the IRD record REQUIRED This field MUST contain one value from the list Record Statuses (see below).
- If the repository is no longer valid for inclusion in IRD, use the value “archived”
- If the repository has been checked, and all information is up-to-date, then use the value “reviewed”
- Otherwise, use the value “under_review”.
verified

Controlled lists of terms

Repository Types

Media Types

Primary Subjects

Software Platforms

Record Statuses