Data policy and guidelines for researchers
Data policy
The safe deposit was developed at the unit for field-based forest research (“the unit”) at SLU with the specific intent that it should act as a permanent repository and archive for data generated by researchers using the research infrastructures provided by the unit. Additionally, we would like to see ongoing projects as living entities that grow with time and can act as platforms for collaboration. We therefore provide two modes for files that have been uploaded to a project:
1. When a file is first uploaded, it can be modified and deleted by any project collaborator with sufficient access rights to the project.
2. When the project moves more toward acting as storage for complementary data, files can be made immutable, meaning that they can no longer be changed or deleted by any user of the system.
Immutable files are easily spotted on project pages since they are marked with a key symbol.
Whenever a file is uploaded to the safe deposit it is given a Universally Unique Identifier (UUID). This UUID remains the same even if the file is made immutable later on. Each file is accessible by visiting https://www.safedeposit.se/assets/xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxx
where the string of x’s and dashes is replaced by the specific UUID. File immutability can also be verified by performing a GET request to https://www.safedeposit.se/assets/xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxx/immutable
(for example by visiting the address using a web browser) which will return a plain-text answer including the UUID and the filename.
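The two addresses above can also be built and queried from a script. The following is a minimal sketch rather than an official client: the function names are hypothetical, the response is assumed to be UTF-8 text, and the exact wording of the answer (as well as the response for a file that is not yet immutable) is not specified here, so the text is returned unparsed.

```python
import uuid
from urllib.request import urlopen

# Base address taken from the policy text above.
BASE_URL = "https://www.safedeposit.se/assets"


def asset_url(file_uuid: str) -> str:
    """Return the asset address for a file, validating the UUID first."""
    # uuid.UUID raises ValueError if the string is not a well-formed UUID.
    canonical = str(uuid.UUID(file_uuid))
    return f"{BASE_URL}/{canonical}"


def immutable_url(file_uuid: str) -> str:
    """Return the address used to verify that a file is immutable."""
    return f"{asset_url(file_uuid)}/immutable"


def fetch_immutability_answer(file_uuid: str, timeout: float = 10.0) -> str:
    """Perform the GET request and return the plain-text answer as-is.

    Assumption: the body is UTF-8 encoded. urlopen raises an HTTPError
    for error status codes, which the caller is expected to handle.
    """
    with urlopen(immutable_url(file_uuid), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_immutability_answer` with the UUID of one of your own files should, according to the description above, return text that includes the UUID and the filename if the file is immutable.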
Data safety
The safe deposit is developed in-house at the unit and hosted on our own hardware. The system builds on tried-and-true open-source software and follows strict security conventions. This means that you can feel confident that your data will not be compromised before you are ready to share it with the world.
Additionally, all storage used for the safe deposit is configured in RAID 1, meaning that each hard drive has one completely redundant fallback. Furthermore, encrypted backups of the system and all files are taken once per day and stored in two separate locations for several days, providing robust protection against drive failure and data loss.
Data upload guidelines
As a digital platform, the safe deposit faces certain challenges regarding storage techniques and making sure data remains available in the future. We therefore provide these guidelines for users of our platform.
- Whenever possible, store information in plain text:
- Prefer delimited text files (e.g. CSV files) over Excel files or similar formats.
- If this is not possible, always use .xlsx or .ods instead of .xls, since .xls files are binary while .xlsx and .ods files are compressed XML. This means that the data in these formats can, at least in theory, more easily be extracted as plain text without specialized software.
- Do not upload primarily tabular data as PDF files, since PDFs are not designed to store tabular data.
- Prefer plain text files over Word files or similar formats.
- For example, Markdown is a concise formatting language that is easy to learn and provides many formatting options (including equations) while still being easily readable in any old text editor.
- If this is not possible, always use .docx or .odt instead of .doc, since .doc files are binary while .docx and .odt files are compressed XML. This means that the data in these formats can, at least in theory, more easily be extracted as plain text without specialized software.
- Prefer PDF files over Word documents (or similar), since PDF is a very well established and self-contained format.
- Avoid the shapefile format when uploading GIS data to the safe deposit; prefer coordinates in a table of comma-separated values, or GeoJSON.
- The shapefile format is cumbersome, both because several files are needed to contain the GIS data and the attribute table and because the format is binary, proprietary, and based on old technology. The safe deposit can automatically convert shapefiles to GeoJSON.
- In case you upload custom software packages (developed as part of the project), include the source code whenever possible.
- With the source code available, the software can be recompiled for different processor architectures and relatively easily translated to more modern technologies in the future.
- Alternatively, describe very clearly how the software works.
- When uploading archived files, prefer the .zip or .tar formats over alternatives, since these formats have been around for a very long time and are unlikely to go away any time soon.
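As an illustration of the plain-text and GIS guidelines above, the following sketch writes the same set of point coordinates both as a CSV table and as GeoJSON, using only the Python standard library. The plot identifiers, coordinates, and column names are hypothetical examples, and the coordinates are assumed to be in WGS 84. Note that GeoJSON stores each position as longitude first, then latitude.

```python
import csv
import io
import json

# Hypothetical sample plots: (plot_id, latitude, longitude), assumed WGS 84.
PLOTS = [
    ("plot-1", 64.25, 19.75),
    ("plot-2", 64.5, 19.5),
]


def plots_to_csv(plots) -> str:
    """Serialize the plots as comma-separated values with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["plot_id", "latitude", "longitude"])
    writer.writerows(plots)
    return buffer.getvalue()


def plots_to_geojson(plots) -> str:
    """Serialize the plots as a GeoJSON FeatureCollection of points."""
    features = [
        {
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"plot_id": plot_id},
        }
        for plot_id, lat, lon in plots
    ]
    collection = {"type": "FeatureCollection", "features": features}
    return json.dumps(collection, indent=2)


if __name__ == "__main__":
    print(plots_to_csv(PLOTS))
    print(plots_to_geojson(PLOTS))
```

Both outputs are plain text and can be opened in any text editor, which is the property the guidelines above aim for.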