When planning a research project, consider which file formats you will use to store your research data, because this decision directly affects how the data can be shared and reused.
File formats suitable for long-term preservation can be classified into two categories: preferred formats and acceptable formats.
- Preferred formats: formats that offer the best long-term guarantees of usability, accessibility and sustainability. Use them whenever possible.
- Acceptable formats: formats that are frequently used in the scientific community and are expected to remain moderately to reasonably usable, accessible and robust in the long term.
As a general guideline, the file formats best suited for long-term sustainability and accessibility have the following features:
- Are frequently used by the scientific community.
- Have open and documented standards.
- Are not proprietary, that is, they are independent of specific software, developers or vendors.
- Have a standard representation (ASCII, Unicode).
- Are unencrypted and uncompressed.
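As a rough illustration of the last two features, the sketch below checks whether a file's bytes decode as UTF-8 text or start with the signature of a common compressed container. The function names and the short signature list are assumptions made for this example; the check is neither exhaustive nor an official validation tool.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of two common compressed containers.
# Illustrative only: many other compressed formats exist.
COMPRESSED_SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
}


def describe_bytes(data: bytes) -> str:
    """Return 'compressed (<kind>)', 'utf-8 text' or 'other'."""
    for magic, kind in COMPRESSED_SIGNATURES.items():
        if data.startswith(magic):
            return f"compressed ({kind})"
    try:
        data.decode("utf-8")  # plain ASCII is also valid UTF-8
    except UnicodeDecodeError:
        return "other"
    return "utf-8 text"


def describe_file(path: str) -> str:
    """Apply describe_bytes to the contents of a file on disk."""
    return describe_bytes(Path(path).read_bytes())
```

Note that some formats listed as preferred or acceptable, such as ODT and Office Open XML, are themselves ZIP containers, so a "compressed (zip)" result calls for a closer look rather than automatic rejection.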
Best practices
Whenever possible, save data in an open and sustainable format (proprietary software often allows «saving as» an open format without difficulty). If conversion to an open format causes a loss of data, consider saving the files in both the proprietary and the open format. In this way, at least part of the information will remain available.
When files must be saved in a proprietary format, include a README file that documents the name and version of the software used to generate the file, as well as the company that made the software. This can be very helpful if someone later needs to open these files.
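One simple case of this practice is moving a non-Unicode text file to Unicode while keeping the original next to it. The sketch below is only an example under assumptions: the function name is hypothetical, and the source encoding (`cp1252` by default here) must be known or determined beforehand, since it cannot be reliably guessed from the bytes alone.

```python
from pathlib import Path


def save_utf8_copy(src: str, source_encoding: str = "cp1252") -> Path:
    """Write a UTF-8 copy of a text file beside the original.

    The original file is left untouched, so both versions are kept.
    """
    original = Path(src)
    # Decode with the assumed legacy encoding; this raises an error
    # if the bytes are not valid in that encoding.
    text = original.read_bytes().decode(source_encoding)
    # e.g. notes.txt -> notes.utf8.txt
    target = original.with_name(original.stem + ".utf8" + original.suffix)
    target.write_bytes(text.encode("utf-8"))
    return target
```

Reading and writing bytes, rather than opening the files in text mode, avoids any silent change to line endings during the conversion.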
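A README of this kind can be written by hand. As a minimal sketch, it could also be generated with a few lines of Python. The function name, the field labels and the `README.txt` file name are assumptions chosen for this example, not a required layout.

```python
from pathlib import Path


def write_readme(folder: str, file_name: str, software: str,
                 version: str, vendor: str) -> Path:
    """Create a plain-text README describing how a proprietary file was made."""
    lines = [
        f"File: {file_name}",
        f"Software: {software}",
        f"Version: {version}",
        f"Vendor: {vendor}",
    ]
    readme = Path(folder) / "README.txt"
    # UTF-8 plain text keeps the README itself in a preferred format.
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

For example, `write_readme(".", "survey.dat", "ExampleStats", "1.0", "Example Corp")` would record a made-up software name, version and vendor for a file called `survey.dat`. All four values are placeholders.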
To avoid the risk of obsolescence and ensure the accessibility and sustainability of files, several measures can be taken. One is to use file formats that have a high probability of remaining usable for many years.
Type of data | Preferred formats (recommended for sharing, reuse and preservation) | Acceptable formats |
Text | PDF/A (.pdf) ODT (.odt) | Microsoft Word (.doc) Office Open XML (.docx) Rich Text File (.rtf) PDF other than PDF/A (.pdf) |
Plain text | Unicode text (.txt) | Non-Unicode text (.txt) |
Mark-up language | XML (.xml) YAML (.yaml) JSON (.json) ReStructuredText (.rst) Related files: .css, .xslt, .js, .es | HTML (.html) SGML (.sgml) Markdown (.md) |
Programming languages | NetCDF TextFabric R (.r) | MATLAB (.mat) |
Spreadsheets | CSV (.csv) ODS (.ods) | Microsoft Excel (.xls) Office Open XML Workbook (.xlsx) |
Databases | CSV (.csv) SQL (.sql) SIARD (.siard) FITS (.fits, .fit, .fts) (Apache) Parquet (.parquet) | Microsoft Access (.mdb, .accdb) dBase (.dbf) HDF5 (.hdf5, .he5, .h5) |
Statistical data | SPSS (.dat/.sps) Stata (.dat/.do) R (.rdat/.rdata) | SPSS Portable (.por) SPSS (.sav) Stata (.dta) SAS (.7dat; .sd2; .tpt) |
Images | TIFF (.tif, .tiff) JPEG (.jpg, .jpeg) PNG (.png) JPEG 2000 (.jp2) | |
Vector images | SVG (.svg) | Adobe Illustrator (.ai) EPS (.eps) WMF/EMF (.wmf, .emf) CDR (.cdr) |
Audio | BWF (.bwf) MXF (.mxf) Matroska (.mka) FLAC (.flac) OPUS | WAVE (.wav) MP3 (.mp3) AAC (.aac, .m4a) AIFF (.aif, .aiff) OGG (.ogg) |
Video | MXF (.mxf) Matroska (.mkv) | MPEG-4 (.mp4, .m4a, .m4v) MPEG-2 (.mpg, .mpeg, .m2v, .mpg2) AVI (.avi) QuickTime (.mov, .qt) |
Computer Aided Design (CAD) | AutoCAD DXF version R12 (ASCII) (.dxf) SVG (.svg) | AutoCAD other versions than R12 (ASCII) (.dwg, .dxf) DWG (.dwg) DGN (.dgn) |
3D | WaveFront Object (.obj) Polygon file format (.ply) X3D (.x3d) COLLADA (.dae) | Autodesk FBX (.fbx) Blender (.blend) 3D PDF (.pdf) |
Geographical Information Systems (GIS) | GML (.gml) MIF/MID (.mif/.mid) | Esri Shapefiles (.shp & related files) MapInfo (.tab & related files) KML (.kml) Esri Geodatabase (.gdb) Project files/Workspaces (.mxd, .wor, .qgs) |
Georeferenced images | GeoTIFF (.tif, .tiff) | TIFF World File (.tfw & .tif, possibly with additional files) JPEG World File (.jgw & .jpg, possibly with additional files) ERDAS IMAGINE File Format (.img) |
Raster GIS | ASCII GRID (.asc, .txt) | Esri GRID (.grd & related files) Surfer Grid (.grd; .srf) ERDAS IMAGINE File Format (.img) |
RDF | RDF/XML (.rdf) TriG (.trig) Turtle (.ttl) N-Triples (.nt) JSON-LD | |
Computer Assisted Qualitative Data Analysis (CAQDAS) | REFI-QDA (Qualitative Data Analysis) | ATLAS.TI Copy bundle NVivo Project file |
LiDAR (Light Detection and Ranging) | LAS (.las) PTS (.pts) PTX (.ptx) |
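To show how the table can be applied to a folder of files, the sketch below classifies file names by extension. It is an illustration only: the two sets hold a small subset of the table, the function name is hypothetical, and extensions that appear in both columns (such as .pdf, .txt or .dxf) are left out on purpose because the extension alone cannot tell the variants apart.

```python
from pathlib import Path

# Partial, unambiguous subset of the table above.
PREFERRED = {".odt", ".csv", ".ods", ".xml", ".json", ".yaml",
             ".svg", ".png", ".flac", ".mkv"}
ACCEPTABLE = {".doc", ".docx", ".rtf", ".xls", ".xlsx",
              ".wav", ".mp3", ".mp4", ".avi"}


def classify(filename: str) -> str:
    """Return 'preferred', 'acceptable' or 'unlisted or ambiguous'."""
    ext = Path(filename).suffix.lower()  # compare case-insensitively
    if ext in PREFERRED:
        return "preferred"
    if ext in ACCEPTABLE:
        return "acceptable"
    return "unlisted or ambiguous"
```

A file reported as "unlisted or ambiguous" needs a manual check against the full table, for instance to find out whether a .pdf file is actually PDF/A.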
Source:
File formats recommended by the Catalan Institute of Classical Archaeology (ICAC). http://hdl.handle.net/2072/530624
Last updated: 21/02/2024