Description of "Transform a SKOS/RDF-XML file to a comma-separated CSV file" service
This service processes files containing "skos:Concept" elements or "rdf:Description" elements of type "Concept".
This service generates a comma-separated CSV file from a valid SKOS/RDF-XML file. The output file can be imported into a spreadsheet (Excel, LibreOffice, etc.) for editing (see the Excel import procedure below).
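The selection rule above can be illustrated with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, not the service's own code: the function name `find_concepts` is hypothetical, and it assumes that an "rdf:Description" counts as a concept when one of its `rdf:type` URIs has "Concept" as its local name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the RDF and SKOS specifications.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def find_concepts(root):
    """Collect skos:Concept elements and rdf:Description elements typed as Concept."""
    concepts = list(root.iter(f"{{{SKOS}}}Concept"))
    for desc in root.iter(f"{{{RDF}}}Description"):
        types = [t.get(f"{{{RDF}}}resource", "") for t in desc.findall(f"{{{RDF}}}type")]
        # Assumption: any type URI whose local name is "Concept" qualifies
        # (so "...#ConceptScheme" is not selected).
        if any(uri.endswith(("#Concept", "/Concept")) for uri in types):
            concepts.append(desc)
    return concepts
```

For a file on disk, the root would come from something like `ET.parse(path).getroot()`.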
The data are transformed as follows:
- A first row of column headers is created from the elements (SKOS or other properties) used to describe the concepts of the RDF-XML file:
  - An "ID" header is created for the concept identifiers.
  - Properties with an "xml:lang" attribute are listed by concatenating the element name (without namespace) with the language code (for example, "skos:prefLabel/@xml:lang='en'" gives the header "prefLabel_en").
  - For properties that have an attribute other than "xml:lang":
    - those corresponding to the semantic relations ("skos:broader", "skos:narrower" and "skos:related") are translated into "broader_en", "narrower_en" and "related_en",
    - the others (mapping properties, etc.) are output with the element name only (without namespace, for example, "exactMatch" for "skos:exactMatch").
  - Properties that have no attributes are output with the element name only (without namespace).
  - If the file contains collections, a "group_en" header is created. This column can be redundant if the concepts already contain properties indicating their membership in groups (domain, microthesaurus, etc.).
- Then, a line is generated for each concept of the file:
  - the value of the "rdf:about" attribute is put in the "ID" column,
  - the content of the textual elements (terms, definitions, notes, etc.) is put in the column corresponding to the element and its language code,
  - hierarchical and associative relations (links) are replaced by the corresponding English preferred terms,
  - the content of the other elements is output as is,
  - if the concept belongs to a collection, the English name of the collection is put in the "group_en" column.
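The header and row rules above can be sketched in Python with the standard library. This is an illustration under stated assumptions, not the service's implementation: the function names are hypothetical, only "skos:Concept" elements are scanned (not typed "rdf:Description" elements), relations are assumed to point to their target with "rdf:resource", a link whose target has no English preferred term falls back to the target URI, and "group_en" is assumed to be the last column.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the RDF, SKOS and XML specifications.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ABOUT, RESOURCE = f"{{{RDF}}}about", f"{{{RDF}}}resource"
RELATIONS = {"broader", "narrower", "related"}


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def english_label(elem):
    """Return the English skos:prefLabel of a concept or collection ("" if none)."""
    for label in elem.findall(f"{{{SKOS}}}prefLabel"):
        if label.get(XML_LANG) == "en":
            return label.text or ""
    return ""


def column_name(prop):
    """Apply the header naming rules to one property element."""
    name = local_name(prop.tag)
    lang = prop.get(XML_LANG)
    if lang is not None:
        return f"{name}_{lang}"  # e.g. skos:prefLabel xml:lang="en" -> prefLabel_en
    if name in RELATIONS and prop.attrib:
        return f"{name}_en"      # links are rendered as English preferred terms
    return name                  # mapping properties and attribute-less properties


def build_table(root):
    """Return (headers, rows); each row maps a header to a list of values."""
    concepts = list(root.iter(f"{{{SKOS}}}Concept"))
    collections = list(root.iter(f"{{{SKOS}}}Collection"))
    labels = {c.get(ABOUT): english_label(c) for c in concepts}
    groups = {}  # concept URI -> English names of the collections it belongs to
    for coll in collections:
        for member in coll.findall(f"{{{SKOS}}}member"):
            groups.setdefault(member.get(RESOURCE), []).append(english_label(coll))

    headers, rows = ["ID"], []
    for concept in concepts:
        uri = concept.get(ABOUT)
        row = {"ID": [uri]}
        for prop in concept:
            col = column_name(prop)
            if col not in headers:
                headers.append(col)
            target = prop.get(RESOURCE)
            if local_name(prop.tag) in RELATIONS and target:
                value = labels.get(target) or target  # English term, else the URI
            else:
                value = target or prop.text or ""     # text or URI, output as is
            row.setdefault(col, []).append(value)
        if uri in groups:
            row["group_en"] = groups[uri]
        rows.append(row)
    if collections:
        headers.append("group_en")
    return headers, rows
```

Each row keeps a list of values per column so that multiple occurrences (for example, several "altLabel_en" values) stay separate until they are written to a single cell.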
It should be noted that:
- the contents of all fields are enclosed in quotation marks, so that commas used as punctuation inside a field are not taken as field separators,
- any quotation marks inside a field are doubled to escape them,
- the values of multiple-occurrence fields (for example, "skos:altLabel") are placed in the same cell, separated by "§§".
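These three output rules correspond to standard CSV quoting, which Python's `csv` module can reproduce. A minimal sketch, assuming each row is a dictionary mapping a header to a list of values; the name `write_csv` is hypothetical. `csv.QUOTE_ALL` encloses every field in quotes, and the writer doubles embedded quotes because `doublequote` is true by default.

```python
import csv
import io

SEPARATOR = "§§"  # separator for multiple occurrences in one cell


def write_csv(headers, rows, out):
    """Write headers and rows, quoting every field and joining repeated values."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    writer.writerow(headers)
    for row in rows:
        # Missing columns become empty (but still quoted) fields.
        writer.writerow([SEPARATOR.join(row.get(h, [])) for h in headers])
```

When writing to a file, opening it with `open(path, "w", encoding="utf-8", newline="")` keeps the module's own line endings and produces UTF-8 output.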
How to import a CSV file into Excel:
- Create a new workbook in Excel ("File" / "New").
- Click the "Data" menu, choose "From Text", and then select the file to import.
- Import the file ("Import" button).
- In the Text Import Wizard:
  - choose "Delimited",
  - in the "File origin" list, choose "65001 : Unicode (UTF-8)",
  - click the "Next" button,
  - under "Delimiters", select "Comma",
  - keep the double quote (") as the "Text qualifier",
  - check the imported data in the "Data Preview",
  - click the "Finish" button.
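Outside a spreadsheet, the same import settings (UTF-8, comma delimiter, double quote as text qualifier) can be reproduced with Python's `csv` module. This sketch is an illustration only; the name `read_rows` is hypothetical, and it also splits multi-occurrence cells back into lists on "§§".

```python
import csv
import io


def read_rows(text):
    """Parse the exported CSV text: comma delimiter, double quote as qualifier."""
    reader = csv.DictReader(io.StringIO(text), delimiter=",", quotechar='"')
    # Empty or missing cells become empty lists; others are split on "§§".
    return [{k: v.split("§§") if v else [] for k, v in row.items()} for row in reader]
```

For a file, the text can be read with `open(path, encoding="utf-8", newline="").read()`, where `utf-8` matches the "65001 : Unicode (UTF-8)" file origin.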
A file modified in Excel and saved as CSV can be transformed back into SKOS using the "Transform a semicolon-separated CSV file into a SKOS-XML file" service or the "Transform a comma-separated CSV file into a SKOS-XML file" service, depending on the separator used when saving the CSV file.
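When it is unclear which separator the spreadsheet used on saving, a heuristic check such as `csv.Sniffer` can suggest the matching service. A minimal sketch; the function name `reverse_service` is hypothetical, only comma and semicolon are considered, and `sniff` raises `csv.Error` when it cannot decide.

```python
import csv


def reverse_service(sample):
    """Suggest the reverse service from a sample of the saved CSV text."""
    delimiter = csv.Sniffer().sniff(sample, delimiters=",;").delimiter
    if delimiter == ";":
        return "Transform a semicolon-separated CSV file into a SKOS-XML file"
    return "Transform a comma-separated CSV file into a SKOS-XML file"
```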