When planning a research project, consider which file formats you will use to store your research data, because this decision directly affects how the data can be shared and reused.

File formats suitable for long-term preservation can be classified into two categories: preferred formats and acceptable formats.

  • Preferred formats: the formats that offer the best long-term guarantees of usability, accessibility and sustainability. Use them whenever possible.
  • Acceptable formats: formats that are frequently used in the scientific community and that will remain moderately to reasonably usable, accessible and robust in the long term.

As a general guideline, the file formats best suited for long-term sustainability and accessibility have the following features:

  • Are frequently used by the scientific community.
  • Have open and documented standards.
  • Are not proprietary; that is, they are independent of specific software, developers or vendors.
  • Have a standard representation (ASCII, Unicode).
  • Are unencrypted and uncompressed.
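
As an illustration of the "standard representation" feature, the following minimal Python sketch checks whether a file decodes as ASCII or UTF-8. The function name detect_text_encoding is hypothetical, and treating UTF-8 as the only Unicode encoding of interest is an assumption; a file stored as UTF-16, for example, would be reported as neither.

```python
from pathlib import Path


def detect_text_encoding(path):
    """Return 'ascii', 'utf-8', or None if the file decodes as neither.

    Assumption: UTF-8 is the only Unicode encoding checked here.
    """
    raw = Path(path).read_bytes()
    # ASCII is tried first because every ASCII file is also valid UTF-8.
    for encoding in ("ascii", "utf-8"):
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return None
```

A result of None suggests that the file uses a legacy or binary encoding and may need converting before deposit.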

Best practices

Whenever possible, save data in an open and sustainable format (proprietary software often allows "Save as" an open format without difficulty). If converting to an open format causes some data to be lost from the files, consider saving them in both proprietary and open formats. In this way, at least part of the information will remain available.

When files must be saved in a proprietary format, include a README file that documents the name and version of the software used to generate the files, as well as the company that made the software. This can be very helpful if someone later needs to open these files.
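
Such a README can be written by hand, but it can also be generated. The following is a minimal Python sketch, not a prescribed template: the function name write_format_readme, the field labels and the file name README.txt are all assumptions made for illustration.

```python
from pathlib import Path


def write_format_readme(folder, file_name, software, version, vendor):
    """Write a plain-text README describing how a proprietary file was made.

    The field labels and the README.txt name are illustrative assumptions.
    """
    lines = [
        f"File: {file_name}",
        f"Software: {software}",
        f"Version: {version}",
        f"Vendor: {vendor}",
    ]
    readme_path = Path(folder) / "README.txt"
    # UTF-8 plain text keeps the README itself in an open format.
    readme_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme_path
```

For example, write_format_readme(".", "survey.dat", "ExampleStats", "1.0", "Example Corp") creates README.txt in the current folder; the values shown are placeholders.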

To avoid the risk of obsolescence and ensure the accessibility and sustainability of files, several measures can be taken. One is to use file formats that have a high probability of remaining usable for many years.

Preferred formats (recommended for sharing, reuse and preservation) and acceptable formats, by type of data:

Text:
  • PDF/A (.pdf)
  • ODT (.odt)
  • Microsoft Word (.doc)
  • Office Open XML (.docx)
  • Rich Text File (.rtf)
  • PDF other than PDF/A (.pdf)

Plain text:
  • Preferred: Unicode text (.txt)
  • Acceptable: Non-Unicode text (.txt)

Mark-up language:
  • XML (.xml)
  • YAML (.yaml)
  • JSON (.json)
  • ReStructuredText (.rst)
  • Related files: .css, .xslt, .js, .es
  • HTML (.html)
  • SGML (.sgml)
  • Markdown (.md)

Programming languages:
  • NetCDF
  • TextFabric
  • R (.r)
  • MATLAB (.mat)

Spreadsheets:
  • CSV (.csv)
  • ODS (.ods)
  • Microsoft Excel (.xls)
  • Office Open XML Workbook (.xlsx)

Databases:
  • CSV (.csv)
  • SQL (.sql)
  • SIARD (.siard)
  • FITS (.fits, .fit, .fts)
  • (Apache) Parquet (.parquet)
  • Microsoft Access (.mdb, .accdb)
  • dBase (.dbf)
  • HDF5 (.hdf5, .he5, .h5)

Statistical data:
  • SPSS (.dat/.sps)
  • STATA (.dat/.do)
  • R (.rdat/.rdata)
  • SPSS Portable (.por)
  • SPSS (.sav)
  • STATA (.dta)
  • SAS (.7dat; .sd2; .tpt)

Images:
  • TIFF (.tif, .tiff)
  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • JPEG 2000 (.jp2)

Vector images:
  • Preferred: SVG (.svg)
  • Acceptable: Adobe Illustrator (.ai)
  • Acceptable: EPS (.eps)
  • Acceptable: WMF/EMF (.wmf, .emf)
  • Acceptable: CDR (.cdr)

Audio:
  • BWF (.bwf)
  • MXF (.mxf)
  • Matroska (.mka)
  • FLAC (.flac)
  • OPUS
  • WAVE (.wav)
  • MP3 (.mp3)
  • AAC (.aac, .m4a)
  • AIFF (.aif, .aiff)
  • OGG (.ogg)

Video:
  • MXF (.mxf)
  • Matroska (.mkv)
  • MPEG-4 (.mp4, .m4a, .m4v)
  • MPEG-2 (.mpg, .mpeg, .m2v, .mpg2)
  • AVI (.avi)
  • QuickTime (.mov, .qt)

Computer Aided Design (CAD):
  • AutoCAD DXF version R12 (ASCII) (.dxf)
  • SVG (.svg)
  • AutoCAD versions other than R12 (ASCII) (.dwg, .dxf)
  • DWG (.dwg)
  • DGN (.dgn)

3D:
  • WaveFront Object (.obj)
  • Polygon file format (.ply)
  • X3D (.x3d)
  • COLLADA (.dae)
  • Autodesk FBX (.fbx)
  • Blender (.blend)
  • 3D PDF (.pdf)

Geographical Information Systems (GIS):
  • GML (.gml)
  • MIF/MID (.mif/.mid)
  • Esri Shapefiles (.shp & related files)
  • MapInfo (.tab & related files)
  • KML (.kml)
  • Esri Geodatabase (.gdb)
  • Project files/Workspaces (.mxd, .wor, .qgs)

Georeferenced images:
  • Preferred: GeoTIFF (.tif, .tiff)
  • Acceptable: TIFF World File (.tfw & .tif, possibly with additional files)
  • Acceptable: JPEG World File (.jgw & .jpg, possibly with additional files)
  • Acceptable: ERDAS IMAGINE File Format (.img)

Raster GIS:
  • Preferred: ASCII GRID (.asc, .txt)
  • Acceptable: Esri GRID (.grd & related files)
  • Acceptable: Surfer Grid (.grd; .srf)
  • Acceptable: ERDAS IMAGINE File Format (.img)

RDF:
  • RDF/XML (.rdf)
  • TriG (.trig)
  • Turtle (.ttl)
  • N-Triples (.nt)
  • JSON-LD

Computer Assisted Qualitative Data Analysis (CAQDAS):
  • Preferred: REFI-QDA (Qualitative Data Analysis)
  • Acceptable: ATLAS.ti Copy bundle
  • Acceptable: NVivo Project file

LiDAR (Light Detection and Ranging):
  • LAS (.las)
  • PTS (.pts)
  • PTX (.ptx)

Based on the tables of file formats of Data Archiving and Networked Services (DANS) and recommendations of the Library of Congress and Consorci de Serveis Universitaris de Catalunya (CSUC).
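
One way to apply the table is to classify files by their extension. The sketch below does this in Python for the vector images row only, as a small example: the function name classify and the "unlisted" label are assumptions made for illustration, and an extension alone does not prove that a file is valid or complete.

```python
from pathlib import Path

# Extensions taken from the vector images row of the table above.
PREFERRED = {".svg"}
ACCEPTABLE = {".ai", ".eps", ".wmf", ".emf", ".cdr"}


def classify(file_name):
    """Return 'preferred', 'acceptable' or 'unlisted' for one file name."""
    # Lower-casing makes the check case-insensitive (e.g. ".SVG").
    suffix = Path(file_name).suffix.lower()
    if suffix in PREFERRED:
        return "preferred"
    if suffix in ACCEPTABLE:
        return "acceptable"
    return "unlisted"


if __name__ == "__main__":
    for name in ("site_plan.svg", "logo.ai", "notes.xyz"):
        print(name, classify(name))
```

The two sets can be extended with other rows of the table as needed.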

Questions?

For more information and assistance, contact the Documentation Centre and Library.


Source:

File formats recommended by the Catalan Institute of Classical Archaeology (ICAC). http://hdl.handle.net/2072/530624


Last updated: 21/02/2024