Keeping your data files well organised with a consistent system of file naming, versioning and folder structure will help you and your collaborators to easily locate and keep track of your data.
Consider developing file naming conventions early on in your project. Such conventions include
Other tips for naming your files
Also consider using a hierarchical file structure, where folders are nested in other folders. Try to make your folder categories neither too broad, so that no folder contains so many files that it becomes difficult to manage, nor too deep, so that you do not have to click through a large number of folders to find a file. The UK Data Archive advises restricting folders to three or four levels deep and having no more than 10 subfolders in each folder.
It may be worth reassessing your folder structure now and then, perhaps moving unused items to an ‘Archive’ folder.
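If you would like to check an existing project folder against the limits suggested above, a short script can do the counting for you. The following is only a minimal sketch in Python: the function name check_structure and the way depth is counted (the project folder itself is level 0, its direct subfolders are level 1) are assumptions made for illustration, not part of the UK Data Archive advice.

```python
import os
from pathlib import Path

MAX_DEPTH = 4        # assumption: "three or four deep" read as at most 4 levels
MAX_SUBFOLDERS = 10  # no more than 10 subfolders in each folder


def check_structure(root, max_depth=MAX_DEPTH, max_subfolders=MAX_SUBFOLDERS):
    """Return a list of warnings for folders under root that break the limits."""
    root = Path(root)
    warnings = []
    for current, subdirs, _files in os.walk(root):
        current_path = Path(current)
        # Depth is 0 for the project folder itself, 1 for its direct subfolders.
        depth = len(current_path.relative_to(root).parts)
        if depth > max_depth:
            warnings.append(f"too deep ({depth} levels): {current_path}")
        if len(subdirs) > max_subfolders:
            warnings.append(
                f"too many subfolders ({len(subdirs)}): {current_path}"
            )
    return warnings
```

Calling check_structure on your project folder returns an empty list when nothing exceeds the limits; otherwise each entry names a folder you may want to reorganise.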
It is easy to lose important information by accidentally saving over an existing file. Version control might help to prevent this. It may also help you to avoid working with outdated files. File versioning will also provide a record of how your work and the thinking behind it have developed.
There are several options when versioning your files. If you only need the latest version of each file, then you do not need to version: each new version can simply overwrite the old one. If versioning is required, then you could include a version number, a date, and/or the author’s initials in the filename when saving. This can be combined with the ‘track changes’ feature available in many software packages such as Microsoft Word. For some purposes dedicated version control software (such as Git) can be useful.
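If you choose to put the version number, date and initials in the filename, it helps to build the name the same way every time. The sketch below shows one possible pattern in Python. The function name versioned_name, the underscore separators, the two-digit version number and the ISO date format are all assumptions chosen for illustration; your project may agree on a different convention.

```python
from datetime import date
from pathlib import Path


def versioned_name(filename, version, initials=None, on=None):
    """Build a name such as 'interviews_v02_2024-03-15_AB.csv'.

    Only the file name is returned; any folder part of `filename` is dropped.
    """
    path = Path(filename)
    on = on or date.today()          # default to today's date
    parts = [path.stem, f"v{version:02d}", on.isoformat()]
    if initials:
        parts.append(initials.upper())
    return "_".join(parts) + path.suffix
```

For example, versioned_name("interviews.csv", 2, "ab", date(2024, 3, 15)) gives "interviews_v02_2024-03-15_AB.csv". Writing the date as year-month-day means that files sort chronologically when listed by name.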
It is good practice to start documenting your data at the very beginning of your research project, and to keep it up as the project progresses.
Good documentation makes your data discoverable, understandable and usable by yourself and others. It includes all contextual information that a future user may need to interpret your data, for example information about when, why, and by whom the data was created, what methods were used to collect the data, and any explanations of acronyms, coding, or jargon.
There are three ways you can add documentation to your data: data-level documentation, study-level documentation, and metadata.
Data-level documentation describes the data that is contained within a file. This information can be integrated into the data file, for example as a header in a spreadsheet. It can also be recorded in a separate document, often a .txt file.
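As an illustration of documentation kept in a separate .txt file, the sketch below assembles a small plain-text description from the contextual details mentioned earlier (when, why and by whom the data was created, the methods used, and explanations of acronyms or codes). It is a hedged example only: the function name build_readme and the field labels are hypothetical and do not follow any particular standard.

```python
def build_readme(title, created_on, created_by, purpose, methods,
                 abbreviations=None):
    """Return plain-text documentation for one data file as a single string."""
    lines = [
        f"Title: {title}",
        f"Created on: {created_on}",
        f"Created by: {created_by}",
        f"Purpose: {purpose}",
        f"Methods: {methods}",
    ]
    if abbreviations:
        lines.append("Abbreviations and codes:")
        # Sort the terms so the file looks the same each time it is generated.
        for term, meaning in sorted(abbreviations.items()):
            lines.append(f"  {term} = {meaning}")
    return "\n".join(lines) + "\n"
```

The returned text can be saved next to the data file, for example with pathlib.Path("README.txt").write_text(text, encoding="utf-8").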
The UK Data Service offers useful guidance on data-level documentation. According to their guidance, data-level documentation can include:
Study-level documentation is usually contained in a separate file that accompanies your data. It provides context to your research project.
This type of documentation can be seen as all the information necessary to allow for reuse.
It could also be seen as the information necessary to support reproducibility of your data: in experimental research this would be the information needed to re-run the experiment so that results can be confirmed; in observational research this would be the information required to derive the final results from your raw data, or to collect new data that may legitimately be compared with the original.
According to the UK Data Service, good study-level data documentation includes information on
Often you will have already given sufficient study-level documentation in the form of laboratory notebooks, questionnaires, interview guides and protocols, working papers, final project reports, and publications. You can include these in your dataset, or refer to them if they are deposited in SHURA or otherwise made publicly available.
Metadata simply means ‘data about data’. It can refer to all the contextual information that describes your data, as described above, but the term is often used more specifically to indicate highly structured information that conforms to certain international standards — a list of fields — and that is machine readable.
The UK Data Service has an excellent overview of catalogue metadata.
Catalogue metadata are usually assigned to your datasets by a repository or data archive at the moment when you deposit your materials with them. Examples of catalogue metadata are
These catalogue metadata include all the information that researchers who re-use your data need in order to cite your dataset appropriately.
Repositories may have different metadata requirements. For example, the SHU Research Data Archive (SHURDA) adds several metadata fields to the catalogue metadata, which include
Guidance and further reading
Data storage during your research project
The University’s Research Data Management Policy asks for all live research data to be stored on the University’s networked storage facilities, and it recommends the use of the SHU Research Store. There is no cap on the amount of storage a specific research project can use. Data are backed up automatically to several locations on a daily basis, and the backups are securely kept for a period of 90 days. The Research Store is accessible from wherever and whenever required, and access can be granted to students and third parties when needed. It is therefore ideal for master copies of your research data.
For more information see SHU Research Store below.
Generally there are four options for data storage:
Backing-up your research data
Ideally, backing up happens automatically and to several locations. If you are using the SHU Research Store, all files are automatically backed up every night. Two copies of each daily backup are kept in two separate locations, where they are protected by firewalls and access permissions. Each backup is kept for 90 days. Files can be restored on request if they are deleted by mistake or if older versions need to be recovered. The SHU Research Store is therefore a good place for the master copy of your data.
The 3-2-1 rule is a simple way to remember best practice for backing up.
3. Keep 3 copies of important files (a primary and two backups)
2. on 2 different media types (such as encrypted hard drives, memory sticks, CDs and online storage)
1. with 1 copy being stored offsite (or online)
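The three parts of the rule can also be written down as a simple check. The sketch below is illustrative only: the DataCopy record, its fields and the function check_3_2_1 are hypothetical names, and you would need to describe your own copies by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    location: str    # free-text label, e.g. "SHU Research Store"
    media_type: str  # e.g. "network storage", "encrypted hard drive"
    offsite: bool    # True if stored offsite or online


def check_3_2_1(copies):
    """Return the parts of the 3-2-1 rule that are not met (empty if all are)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need at least 3")
    media_types = {c.media_type for c in copies}
    if len(media_types) < 2:
        problems.append(
            f"only {len(media_types)} media type(s), need at least 2"
        )
    if not any(c.offsite for c in copies):
        problems.append("no copy is stored offsite or online")
    return problems
```

An empty list means that the copies you described satisfy all three parts of the rule; otherwise each entry tells you which part still needs attention.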
Data security is needed to prevent unauthorised access to data, which may lead to disclosure of personal or sensitive data, or to changes to data or even their destruction. The principal investigator is responsible for ensuring data security.
Personal, confidential or sensitive data
Storing data on portable devices and transferring personal information from one medium to another (for example via email) need to be done with special care. If research data needs to be stored temporarily on portable devices such as laptops, tablets, phones, CDs and USB sticks, the researcher must ensure that this is done securely and that they comply with the University’s Electronic Data Encryption Policy. DTS publish Data Encryption Guidance (staff only).
Transferring any personal, confidential and sensitive information also requires encryption. When sending these data via email, the email needs to be encrypted.
When sending data via a USB stick, an encrypted USB stick should be used. FIPS 140-2 compliant USB sticks (conforming to normal encryption requirements) are available through the DTS self-service portal. Please note: at present DTS cannot supply encrypted USB sticks.
Data can also be sent and received securely using the SHU ZendTo service.
SHU policies and guidance
The SHU Research Store service (aka Q:\Research drive) provides shared storage for currently active research projects. This includes research data as well as supporting materials.
How to request a folder
If you are a researcher or doctoral student attached to a Research Centre please email your Research Store Custodian to arrange project space and access.
If you are not attached to a research centre or your research centre has no custodian then please contact the IT Service Desk. Indicate where you are based, and who should have access.
If you are a doctoral student not attached to a Research Centre, please ask your Director of Studies to request a folder on the SHU Research Store for you.
In all cases:
What are the benefits of the SHU Research Store?
Overview of the service
The SHU Research Store service (aka Q:\Research drive) provides shared storage for currently active research projects. This includes research data as well as supporting materials. From a SHU Managed Desktop PC this storage is accessed through Q:\Research. A shared folder is provided for each research centre. Sub-folders are created for research projects as required. Access to these folders is restricted to researchers working on the project. This includes SHU staff and students as well as researchers based in other organisations. The SHU Research Store is accessible both on and off campus from a wide range of devices. At project close down all data relating to the project should be securely archived and deleted from this service.
Folder structures and access rights
Access to data stored in the Research Store is restricted by permissions set on the file system. Each research centre has a folder in the Research Store. By default, members of a research centre can access files and folders within their research centre’s main folder. Subfolders can be created for individual research projects and access can be restricted to researchers working on that project. This can include staff and students within the research centre owning the project as well as staff and students from other faculties in the university and external researchers working in other organisations. When changes to permissions are requested, a nominated person within the owning research centre authorises the change. DTS staff administering the service then set the permissions. The only people who can access the data in any particular folder are those researchers who have been authorised to do so and the system administrators in DTS. Authorised researchers can access data held in the Research Store from university PCs and Macs as well as through mobile devices such as smartphones and tablets. When working off campus the Research Store is accessible using a web browser.
The primary copy of the data is stored on a storage array located in one of the university’s data centres. As data is written it is replicated over a secure private network to a storage array located in the other data centre. This provides an up-to-date second copy of the data, giving us excellent disaster recovery capabilities. Either data centre is capable of delivering the Research Store service should the operation of the other be impacted for any reason.
Backup copies of all files are stored on a separate system. This is primarily to protect against the accidental deletion of files. It also enables us to recover files if data stored on the Research Store is corrupted for any reason. Backup copies are taken every evening and retained for 90 days. Two copies of the backup data are created and one is retained in each university data centre. The transfer of backup data between data centres happens electronically so data on backup media is never removed from the secure environment of the data centres. All backup media is securely disposed of when it is decommissioned and certificates of secure data destruction are obtained.
All data placed in the Research Store is stored on the university’s Storage Area Network (SAN). This provides highly available, robust storage. The SAN comprises a number of separate storage arrays located in two separate university data centres. Each of these arrays has a number of high availability features with no single point of failure.
Each data centre provides a secure controlled environment housing the university’s servers, storage and network equipment. Physical access is controlled by a DTS Operations team and monitored via CCTV. Intruder detection alarms connect back to the university’s central security control. Each data centre has its own uninterruptible power supply (UPS) and generator. In the event of disruption to the mains power supply, these are capable of maintaining power to all equipment in the data centre for an extended period of time. The data centres are located on separate university campuses approximately 1 mile apart.
Access to the Research Store over the network is secured by a number of methods. Users are required to enter a valid username and password before accessing the Research Store. The service is protected from malicious attack by firewalls and anti-virus software. Systems are patched on a regular basis to protect against known vulnerabilities. All data transfers over the internet are encrypted.
All storage in the SAN is securely disposed of when it is decommissioned at the end of its useful life. Certificates of secure data destruction covering the SAN storage are obtained as part of the decommissioning process. Disks and tapes that are used for the SHU Research Store are destroyed in ways that comply with ISO 27001:2005 Security Management certification, and ISO 14001:2004 Environmental Standards.