This infographic is available in Catalan and English at this link: http://hdl.handle.net/2072/537020
Best practices recommended for the preservation of CSV files
The general guidelines for ensuring data quality apply to any data format, but some guidelines apply specifically to certain formats. Here are some that are relevant to CSV.
Include one table of data per file
Each CSV file should only contain one table. If you are publishing a spreadsheet containing multiple sheets, you should create a CSV file for each sheet.
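As a minimal sketch of this rule, the Python snippet below writes one CSV file per sheet. The `workbook` dictionary is a hypothetical stand-in for a spreadsheet that has already been loaded (reading a real .xlsx or .ods file would need a spreadsheet library, which is not shown), and the function name, sheet names, and values are illustrative assumptions.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical stand-in for a loaded spreadsheet: sheet name -> rows.
workbook = {
    "stations": [["id", "name"], ["1", "Girona"], ["2", "Lleida"]],
    "readings": [["station_id", "temp_c"], ["1", "18.5"], ["2", "21.0"]],
}

def export_sheets(sheets, out_dir):
    """Write each sheet to its own CSV file and return the paths created."""
    out_dir = Path(out_dir)
    paths = []
    for sheet_name, rows in sheets.items():
        path = out_dir / f"{sheet_name}.csv"
        # newline="" lets the csv module control line endings itself.
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
        paths.append(path)
    return paths

out_dir = tempfile.mkdtemp()
created = export_sheets(workbook, out_dir)
print([p.name for p in created])  # ['stations.csv', 'readings.csv']
```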
Avoid including additional information in the data file
A CSV file should contain only the data that will actually be used when the file is reused, i.e., the column header (optional) and the values of each record in the table.
Ensure that all rows have the same number of columns
Each row (record) in a CSV file must contain the same number of columns. In practice, every row must have the same number of field delimiters, not counting any delimiter characters that appear inside quoted values.
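One way to check this before publishing is to count the fields of every record with a CSV parser rather than counting raw delimiter characters. The sketch below uses Python's standard `csv` module; the function name `find_ragged_rows` and the sample data are assumptions made for illustration.

```python
import csv
import io

def find_ragged_rows(text, delimiter=","):
    """Return (record_number, field_count) for records whose width differs from record 1."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    expected = None
    problems = []
    for record_number, row in enumerate(reader, start=1):
        if expected is None:
            expected = len(row)  # the first record sets the expected width
        elif len(row) != expected:
            problems.append((record_number, len(row)))
    return problems

sample = 'id,name,city\n1,"Puig, Anna",Girona\n2,Joan\n3,Marta,Lleida,extra\n'
print(find_ragged_rows(sample))  # [(3, 2), (4, 4)]
```

Record 2 is not reported even though its line contains three commas, because the comma in "Puig, Anna" sits inside a quoted value and the parser does not treat it as a delimiter.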
Include a single first row header
A data table may contain at most one header line, placed in the first row, to specify the field names. The header row is a kind of annotation or metadata that names each column; it is not part of the data.
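To illustrate how a single first-row header is consumed in practice, the sketch below reads a small, made-up table with `csv.DictReader` from Python's standard library, which takes the column names from the first row and does not count that row as a record.

```python
import csv
import io

text = "id,name\n1,Anna\n2,Joan\n"

reader = csv.DictReader(io.StringIO(text, newline=""))
records = list(reader)

print(reader.fieldnames)   # ['id', 'name'], taken from the header row
print(len(records))        # 2, the header is not counted as a record
print(records[0]["name"])  # Anna
```

If the file had a second header line, it would be read as an ordinary record, which is why only one header row should be present.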
Try to avoid Microsoft Excel when exporting CSVs
Excel often mishandles text encodings when exporting. We recommend using LibreOffice Calc for manual work with CSVs.
Use UTF-8 encoding
Save CSVs in UTF-8 (or UTF-16) encoding whenever possible.
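When an existing file was saved in a different encoding, it can be rewritten as UTF-8 as long as the original encoding is known. The sketch below assumes a file exported as Windows-1252 (`cp1252`); the function name, file names, and sample values are hypothetical, and detecting an unknown encoding is not covered.

```python
import tempfile
from pathlib import Path

def reencode_to_utf8(src, dst, source_encoding):
    """Rewrite a text file as UTF-8, given its known original encoding."""
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding=source_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)

work_dir = Path(tempfile.mkdtemp())
legacy = work_dir / "legacy.csv"
converted = work_dir / "utf8.csv"

# Simulate a file exported in a legacy Windows encoding.
content = "id,city\n1,Sabadell\n2,Sant Cugat del Vallès\n"
legacy.write_bytes(content.encode("cp1252"))

reencode_to_utf8(legacy, converted, source_encoding="cp1252")
print(converted.read_bytes().decode("utf-8"))
```

Opening the file with the wrong `source_encoding` would either raise an error or silently produce garbled characters, so the original encoding should be confirmed before converting.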
Incorporate a .csvt file to specify CSV column data type
Add an accompanying CSVT metadata file or a data dictionary. This lets you document the meaning and data type of each variable (column header).
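The sketch below writes such a file, assuming the CSVT convention used by GDAL/OGR-based tools such as QGIS: a file with the same base name as the CSV and the extension .csvt, holding a single line of quoted type names, one per column. The function name, file name, columns, and chosen types are illustrative assumptions; check the documentation of the tool that will read the file for the exact type names it accepts.

```python
import csv
import tempfile
from pathlib import Path

def write_csvt(csv_path, column_types):
    """Write a .csvt sidecar file next to csv_path and return its path."""
    csvt_path = Path(csv_path).with_suffix(".csvt")
    with open(csvt_path, "w", newline="", encoding="utf-8") as f:
        # One line, every type name quoted, in the same order as the columns.
        writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerow(column_types)
    return csvt_path

work_dir = Path(tempfile.mkdtemp())
csv_path = work_dir / "readings.csv"
csv_path.write_text(
    "id,station,temp_c,date\n1,Girona,18.5,2024-02-23\n", encoding="utf-8"
)

csvt_path = write_csvt(csv_path, ["Integer", "String", "Real", "Date"])
print(csvt_path.read_bytes().decode("utf-8"))  # "Integer","String","Real","Date"
```

A CSVT file records only the data types. The meaning of each column still has to be described separately, for example in a data dictionary.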
Source:
“Guía práctica para la mejora de la calidad de datos abiertos”. Secretaría de Estado de Digitalización e Inteligencia Artificial, Ministerio de Asuntos Económicos y Transformación Digital (September 2022). https://datos.gob.es/ca/documentacion/guia-practica-para-la-mejora-de-la-calidad-de-datos-abiertos
Last updated: 23/02/2024