LibGuides: Library Research Support: Managing your data

Organising data

Keeping your data files well organised with a consistent system of file naming, versioning and folder structure will help you and your collaborators to easily locate and keep track of your data.

File naming

Consider developing file naming conventions early on in your project. Such conventions include

which terms to use in your file names (vocabulary)
which abbreviations to use
punctuation and spelling, eg will you use CamelCase or not, and will you use dashes (-) or underlines (_) instead of spaces
format of dates, eg YYYY-MM-DD is easier to sort than DD-MM-YYYY
versioning
the order of the elements in the filename
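
The point about date formats can be checked with a short Python sketch. The dates below are hypothetical and serve only as illustration: names that start with YYYY-MM-DD fall into chronological order under a plain alphabetical sort, whereas names that start with DD-MM-YYYY do not.

```python
from datetime import date, datetime

# Hypothetical dates, used only for illustration.
dates = [date(2015, 5, 1), date(2014, 12, 31), date(2015, 1, 15)]

# Sort the formatted strings alphabetically, as a file browser would.
iso_names = sorted(d.strftime("%Y-%m-%d") for d in dates)
dmy_names = sorted(d.strftime("%d-%m-%Y") for d in dates)


def is_chronological(names, fmt):
    """Return True if the names, read in list order, are in date order."""
    parsed = [datetime.strptime(name, fmt).date() for name in names]
    return parsed == sorted(parsed)


print(iso_names, is_chronological(iso_names, "%Y-%m-%d"))
print(dmy_names, is_chronological(dmy_names, "%d-%m-%Y"))
```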

Other tips for naming your files

Make sure your file names are unique, and keep them independent of their location (‘interview_2015_05_01’ is better than ‘2015_05_01’ even if the file is located in a folder called ‘interview’)
Use file names that are concise but informative, so that you can tell the contents of the file without having to open it
Be consistent
Think about what comes first in the filename, because operating systems usually sort files alphabetically
It can be helpful to include a version number in the file name, especially when you have multiple versions of a file and when it is important to keep several versions
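
One way to apply a convention consistently is to generate file names with a small helper rather than typing them by hand. The Python sketch below is only an illustration: the function name build_filename, the element order (description, date, version) and the two-digit version number are assumptions, not a prescribed scheme.

```python
import re
from datetime import date


def build_filename(description, created, version, extension):
    """Build a name of the form description_YYYY-MM-DD_vNN.ext (assumed convention)."""
    # Lower-case the description and turn spaces and punctuation into underscores.
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    if not slug:
        raise ValueError("description must contain at least one letter or digit")
    return f"{slug}_{created:%Y-%m-%d}_v{version:02d}.{extension.lstrip('.')}"


print(build_filename("Interview Smith", date(2015, 5, 1), 2, "txt"))
```

Putting the description first keeps related files together in an alphabetical listing, and the date and version then order the files within each group.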

Folder structure

Also consider using a hierarchical file structure, where folders are nested in other folders. Try to make your folder categories not too broad, so that no folder contains so many files that it becomes difficult to manage, and not too deep, so that you do not have to click through a large number of folders to find a file. The UK Data Archive advises restricting the folder hierarchy to three or four levels deep and having no more than 10 subfolders in each folder.

It may be worth reassessing your folder structure now and then, perhaps moving unused items to an ‘Archive’.
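
When reassessing a folder structure, the UK Data Archive figures mentioned above can be checked automatically. The following Python sketch assumes a limit of four levels and 10 subfolders; the function name check_folder_structure and the wording of the warnings are made up for this example.

```python
import os

MAX_DEPTH = 4        # "three or four deep"; the upper figure is assumed here
MAX_SUBFOLDERS = 10  # no more than 10 subfolders in each folder


def check_folder_structure(root, max_depth=MAX_DEPTH, max_subfolders=MAX_SUBFOLDERS):
    """Return a list of warnings for folders under root that exceed the limits."""
    warnings = []
    root = os.path.abspath(root)
    for current, subfolders, _files in os.walk(root):
        relative = os.path.relpath(current, root)
        # The root itself is depth 0; each nested folder adds one level.
        depth = 0 if relative == "." else len(relative.split(os.sep))
        if depth > max_depth:
            warnings.append(f"{relative}: nested {depth} levels deep")
        if len(subfolders) > max_subfolders:
            warnings.append(f"{relative}: {len(subfolders)} subfolders")
    return warnings
```

Calling check_folder_structure on the top-level folder of a project returns an empty list when the structure stays within both limits.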

Version control

It is easy to lose important information by accidentally saving over an existing file. Version control might help to prevent this. It may also help you to avoid working with outdated files. File versioning will also provide a record of how your work and the thinking behind it have developed.

There are several options when versioning your files. If you only need the latest version of each file, then you do not need to version: each new version can simply overwrite the old one. If versioning is required, then you could include a version number, a date, and/or the author’s initials in the filename when saving. This can be combined with the ‘track changes’ feature available in many software packages such as Microsoft Word. For some purposes dedicated version control software (such as Git) can be useful.
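
If version numbers are kept in the filename, the next free number can be worked out from the names that already exist, which avoids accidentally saving over an earlier version. This Python sketch assumes names of the form stem_vNN.ext; both the pattern and the function name next_version_name are illustrative only.

```python
import re


def next_version_name(existing_names, stem, extension):
    """Return the next unused name of the form stem_vNN.ext (assumed pattern)."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+)\.{re.escape(extension)}$")
    versions = []
    for name in existing_names:
        match = pattern.match(name)
        if match:
            versions.append(int(match.group(1)))
    # Start at version 1 when no earlier version exists.
    next_version = max(versions, default=0) + 1
    return f"{stem}_v{next_version:02d}.{extension}"


print(next_version_name(["report_v01.docx", "report_v02.docx", "notes.txt"], "report", "docx"))
```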

Guidance and good practice

Naming and organising files from Cambridge University Library
Choosing a filename from Jisc Digital Media (archived)

Online training module

Organising data is an excellent interactive online training module from the MANTRA project

Documenting data

It is good practice to start documenting your data at the very beginning of your research project, and to keep it up as the project progresses.

Good documentation makes your data discoverable, understandable and usable by yourself and others. It includes all contextual information that a future user may need to interpret your data, for example information about when, why, and by whom the data was created, what methods were used to collect the data, and any explanations of acronyms, coding, or jargon.

There are three ways you can add documentation to your data

data-level (embedded) documentation, which covers descriptions and annotations that are embedded in a data file
study-level (supporting) documentation, which describes the research project, the data creation process, and the general context, and is usually not embedded in a data file
catalogue metadata for discovery and machine-readable description

Data-level documentation

Data-level documentation describes the data that is contained within a file. This information can be integrated into the data file, for example as a header in a spreadsheet. It can also be recorded in a separate document, often a .txt file.

The UK Data Service offers useful guidance on data-level documentation. According to that guidance, data-level documentation can include:

names, labels and descriptions for variables, records and their values
explanation of codes and classification schemes used
codes of, and reasons for, missing values
derived data created after collection, with code, algorithm or command file used to create them
weighting and grossing variables created and how they should be used
data list describing cases, individuals or items studied, for example for logging qualitative interviews
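
As an illustration of the first three items, variable names, code lists and missing-value codes can be written out as a plain-text codebook to sit alongside the data file. The Python sketch below is one possible layout, not a format prescribed by the UK Data Service; the variables and codes are hypothetical.

```python
def render_codebook(variables):
    """Render variable descriptions as plain text for a separate .txt file."""
    lines = []
    for variable in variables:
        lines.append(f"{variable['name']}: {variable['label']}")
        # List each code with its meaning, if the variable is coded.
        for code, meaning in variable.get("codes", {}).items():
            lines.append(f"    {code} = {meaning}")
        if "missing" in variable:
            lines.append(f"    missing value code: {variable['missing']}")
    return "\n".join(lines) + "\n"


# Hypothetical variables, used only for illustration.
example_variables = [
    {
        "name": "age_grp",
        "label": "Age group of respondent",
        "codes": {1: "18 to 34", 2: "35 to 54", 3: "55 and over"},
        "missing": -9,
    },
    {"name": "int_date", "label": "Interview date (YYYY-MM-DD)"},
]

codebook_text = render_codebook(example_variables)
print(codebook_text)
```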

Study-level documentation

Study-level documentation is usually contained in a separate file that accompanies your data. It provides context to your research project.

This type of documentation can be seen as all the information necessary to allow for reuse.

It could also be seen as the information necessary to support reproducibility of your data: in experimental research this would be the information needed to re-run the experiment so that results can be confirmed; in observational research this would be the information required to derive the final results from your raw data, or to collect new data that may legitimately be compared with the original.

According to the UK Data Service, good study-level data documentation includes information on

the context of data collection: project history, aims, objectives and hypotheses
data collection methods: data collection protocols, sampling design, instruments used, hardware and software used, data scale and resolution, temporal coverage and geographic coverage, and digitisation or transcription methods
structure of data files, number of cases, records, variables and relationships between files
data sources used and provenance of materials, eg for transcribed or derived data
data validation, checking, proofing, cleaning and other quality assurance procedures carried out, such as checking for equipment and transcription errors, calibration procedures, data capture resolution and repetitions, or editing, proofing or quality control of materials
modifications made to data over time since their original creation and identification of different versions of datasets
for time series or longitudinal surveys, changes made to methodology, variable content, question text, variable labelling, measurements or sampling
information on data confidentiality, access and use conditions, where applicable

Often you will have already given sufficient study-level documentation in the form of laboratory notebooks, questionnaires, interview guides and protocols, working papers, final project reports, and publications. You can include these in your dataset, or refer to them if they are deposited in SHURA or otherwise made publicly available.

Catalogue metadata

Metadata simply means ‘data about data’. It can refer to all the contextual information that describes your data, as described above, but the term is often used more specifically to indicate highly structured information that conforms to certain international standards — a list of fields — and that is machine readable.

The UK Data Service has an excellent overview of catalogue metadata.

Catalogue metadata are usually assigned to your datasets by a repository or data archive at the moment when you deposit your materials with them. Examples of catalogue metadata are

creator
title
description
keywords

These catalogue metadata include all the information that researchers who re-use your data need to cite your dataset appropriately.
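
To show what ‘machine readable’ means in practice, the example fields above can be held as a structured record and checked before deposit. This Python sketch uses JSON purely for illustration; it does not follow any particular repository’s schema, and the record contents are hypothetical.

```python
import json

# The example fields listed above; a real repository may require others.
EXAMPLE_FIELDS = ("creator", "title", "description", "keywords")


def missing_fields(record, fields=EXAMPLE_FIELDS):
    """Return the fields that are absent or empty in the record."""
    return [field for field in fields if not record.get(field)]


# Hypothetical record, used only for illustration.
example_record = {
    "creator": "A. Researcher",
    "title": "Interview transcripts, example project",
    "description": "",
    "keywords": ["interviews", "example"],
}

record_json = json.dumps(example_record, indent=2)
print(record_json)
print("Still to complete:", missing_fields(example_record))
```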

Repositories may have different metadata requirements. For example, the SHU Research Data Archive (SHURDA) adds several metadata fields to the catalogue metadata, which include

keywords
collection period
geographic coverage
data collection method
data processing and preparation activities (which covers how the data was processed after it was collected)
resource language
additional information

Guidance and further reading

The UK Data Service has an excellent overview of documenting your data.
The University of Cambridge’s research data management Web pages offer information on Documentation and Metadata.

Online training

Documentation, metadata, citation is an interactive online training module from the MANTRA Project.
MANTRA also offers software practicals designed to enhance data handling skills in four software packages: SPSS, R, ArcGIS, and NVivo.

Storing and backing up

Data storage during your research project

Research Store

The University’s Research Data Management Policy asks for all live research data to be stored on the University’s networked storage facilities, and it recommends the use of the SHU Research Store. There is no cap on the amount of storage a specific research project can use. Data are backed up automatically to several locations on a daily basis, and each backup is securely kept for 90 days. The Research Store is accessible wherever and whenever required, and access can be granted to students and third parties when required. It is therefore ideal for master copies of your research data.

Personal, confidential or sensitive data requires a storage solution that is compliant with the Data Protection Act and the University’s privacy policy. The Research Store can be used for these types of data under the condition that access permissions have been set up appropriately for a limited number of users. It is important to periodically review these access permissions, for example to reflect staff changes.

For more information see SHU Research Store below.

Options

Generally there are four options for data storage:

Networked drives.
You can store all kinds of information on your personal F: drive or on the N: drive, which is shared with members of staff. However, storage is limited and files on the N: drive cannot be shared with anyone other than members of staff. Access to the Research Store can be granted to students and third parties when required.
Local drives on your PC or laptop.
Data can be lost because local drives can fail, or the computer may be lost or stolen — and, unless the information on them is encrypted, could be used by other people. Local drives may be convenient for short-term storage and data processing but they should normally not be relied upon for storing master copies.
Cloud-based storage.
The University has no control over cloud-based storage such as Dropbox, and the company hosting your data will have access to all the material. Depending on their terms and conditions, they may also have the right to use or publish the information in any way they choose. Their levels of security may not meet the level expected by members of staff, and the data may not be backed up on a regular basis.
The University does have an agreement with Google which allows staff to use a version of Google Drive through their University login details. This is a different type of agreement from the personal versions of Google Drive and keeps the copyright with the University, but it is still not as secure as using the SHU Research Store. Please refer to the University’s Cloud Storage Policy if you are considering this option.
Using cloud storage solutions such as Dropbox and Google Drive for personal, confidential and sensitive information is not permitted. Exceptions may be made only if the solution meets the requirements of the University’s Electronic Data Encryption Policy and if the exception is agreed by local management and documented.
External portable storage devices.
External hard drives, USB drives, DVDs and CDs may be very convenient, cheap and portable, but they are not recommended for long-term storage. These types of portable storage can easily be lost, damaged or stolen — and, unless the information on them is encrypted, could be used by other people. They should never be used for unencrypted sensitive data.

Backing up your research data

Ideally, backing up happens automatically and to several locations. If you are using the SHU Research Store, all files are automatically backed up every night. Two copies of each daily backup are kept in two separate locations, where they are protected by firewalls and access permissions. Each backup is kept for 90 days. Files can be restored on request if they are deleted by mistake or if older versions need to be recovered. The SHU Research Store is therefore a good place for the master copy of your data.
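
The 90-day retention period means that an older version can only be restored for a limited time. The Python sketch below works out whether a nightly backup from a given date should still exist. It assumes that a backup is kept up to and including the 90th day after it was taken; the exact cut-off is not stated here, so check with the service if it matters.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90  # each backup is kept for 90 days


def is_still_retained(backup_date, today, retention_days=RETENTION_DAYS):
    """Return True if a backup taken on backup_date should still exist today.

    Assumes the retention window is inclusive of the final day.
    """
    age = today - backup_date
    return timedelta(0) <= age <= timedelta(days=retention_days)


print(is_still_retained(date(2024, 1, 1), date(2024, 3, 31)))
print(is_still_retained(date(2024, 1, 1), date(2024, 4, 1)))
```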

The 3-2-1 rule is a simple way to remember best practice for backing up.

3. Keep 3 copies of important files (a primary and two backups)
2. on 2 different media types (such as encrypted hard drives, memory sticks, CDs and online storage)
1. with 1 copy being stored offsite (or online)
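
The rule can be written down as three simple checks. In this Python sketch the Copy class, its field names and the example locations are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str    # where the copy is kept (free text)
    media_type: str  # for example "network drive" or "external hard drive"
    offsite: bool    # True if stored offsite or online


def follows_3_2_1(copies):
    """Return True if the copies satisfy all three parts of the 3-2-1 rule."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.media_type for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_media and has_offsite


# Hypothetical set of copies, used only for illustration.
example_copies = [
    Copy("Research Store", "network drive", offsite=False),
    Copy("office cabinet", "encrypted external hard drive", offsite=False),
    Copy("second campus", "network drive", offsite=True),
]

print(follows_3_2_1(example_copies))
```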

Security

Data security is needed to prevent unauthorised access to data, which may lead to disclosure of personal or sensitive data, or to changes to data or even their destruction. The principal investigator is responsible for ensuring data security.

Personal, confidential or sensitive data

Personal, confidential or sensitive data need higher levels of security than other data. In those cases, it is important that the storage solution you choose is compliant with the Data Protection Act and the University’s privacy policy. The University has Guidance on Data Protection (staff only) and guidance on the use of personal data by students.

Portable devices

Storing data on portable devices and transferring personal information from one medium to another (for example via email) need to be done with special care. If research data needs to be stored temporarily on portable devices such as laptops, tablets, phones, CDs and USB sticks, the researcher must ensure that this is done securely and that they comply with the University’s Electronic Data Encryption Policy. DTS publish Data Encryption Guidance (staff only).

Transferring

Transferring any personal, confidential and sensitive information also requires encryption. When sending these data via email, the email needs to be encrypted.

When sending data via a USB stick, an encrypted USB stick should be used — FIPS 140-2 compliant USB sticks (conforming to normal encryption requirements) are available through the DTS self-service portal.

Data can also be sent and received securely using the SHU ZendTo service.

SHU policies and guidance

the University’s privacy policy
the University’s Research Data Management Policy
the University’s Cloud Storage Policy (DTS) as well as the Electronic Information Security Framework (EISF) for all SHU policies relating to storing and transmitting electronic information
the University’s Electronic Data Encryption Policy (DTS) and Data Encryption Guidance (DTS) (staff only)
Guidance on the use of personal data by students: your responsibilities

Research store (Q:)

The SHU Research Store service (aka Q:\Research drive) provides shared storage for currently active research projects. This includes research data as well as supporting materials.

How to request a folder

If you are a researcher or doctoral student attached to a Research Centre please email your Research Store Custodian to arrange project space and access.

If you are not attached to a research centre or your research centre has no custodian then please contact the IT Service Desk. Indicate where you are based, and who should have access.

If you are a doctoral student not attached to a Research Centre, please ask your Director of Studies to request a folder on the SHU Research Store for you.

In all cases:

If you need one or more additional folders with tighter access restrictions, for example to deal with confidential materials that not all project members should be able to access, then this can be included in your request.
If external access is required this must also be approved by the custodians. Include the external person’s name and email address in the request.

What are the benefits of the SHU Research Store?

Keeping research data in a central store allows them to be easily accessible to all researchers in a group on and off campus from a wide range of devices.
Files can be easily shared between all members of a research group whether staff, students or external researchers. Each folder will be restricted to the group unless a special request is made to give access to others.
All files are backed up every night to a remote location which means they are secure and protected by firewalls and access permissions.
Files can be restored on request if deleted by mistake or if older versions of the files need to be recovered.
Alternative methods of storage, such as USB sticks, DVDs or portable hard drives, are not as secure unless they are encrypted, and they can be lost or stolen. Cloud-based drop boxes may give the companies hosting the storage rights to use or publish the information should they wish.

Overview of the service

From a SHU Managed Desktop PC the Research Store is accessed through Q:\Research. A shared folder is provided for each research centre. Sub-folders are created for research projects as required. Access to these folders is restricted to researchers working on the project. This includes SHU staff and students as well as researchers based in other organisations. The SHU Research Store is accessible both on and off campus from a wide range of devices. At project close-down all data relating to the project should be securely archived and deleted from this service.

Folder structures and access rights

Access to data stored in the Research Store is restricted by permissions set on the file system. Each research centre has a folder in the Research Store. By default, members of a research centre can access files and folders within their research centre’s main folder. Subfolders can be created for individual research projects and access can be restricted to researchers working on that project. This can include staff and students within the research centre owning the project as well as staff and students from other faculties in the university and external researchers working in other organisations. When changes to permissions are requested, a nominated person within the owning research centre authorises the change. DTS staff administering the service then set the permissions. The only people who can access the data in any particular folder are those researchers who have been authorised to do so and the system administrators in DTS. Authorised researchers can access data held in the Research Store from university PCs and Macs as well as through mobile devices such as smartphones and tablets. When working off campus the Research Store is accessible using a web browser.

Automatic backups

The primary copy of the data is stored on a storage array located in one of the university’s data centres. As data is written it is replicated over a secure private network to a storage array located in the other data centre. This provides an up to date second copy of the data giving us excellent disaster recovery capabilities. Either data centre is capable of delivering the Research Store service should the operation of one of the data centres be impacted for any reason.

Backup copies of all files are stored on a separate system. This is primarily to protect against the accidental deletion of files. It also enables us to recover files if data stored on the Research Store is corrupted for any reason. Backup copies are taken every evening and retained for 90 days. Two copies of the backup data are created and one is retained in each university data centre. The transfer of backup data between data centres happens electronically so data on backup media is never removed from the secure environment of the data centres. All backup media is securely disposed of when it is decommissioned and certificates of secure data destruction are obtained.

Security

All data placed in the Research Store is stored on the university’s Storage Area Network (SAN). This provides highly available, robust storage. The SAN comprises a number of separate storage arrays located in two separate university data centres. Each of these arrays has a number of high availability features with no single point of failure.

Each data centre provides a secure controlled environment housing the university’s servers, storage and network equipment. Physical access is controlled by a DTS Operations team and monitored via CCTV. Intruder detection alarms connect back to the university’s central security control. Each data centre has its own uninterruptible power supply (UPS) and generator. In the event of disruption to the mains power supply, these are capable of maintaining power to all equipment in the data centres for an extended period of time. The data centres are located on separate university campuses approximately 1 mile apart.

Access to the Research Store over the network is secured by a number of methods. Users are required to enter a valid username and password before accessing the research store. The service is protected from malicious attack by firewalls and anti-virus software. Systems are patched on a regular basis to protect against known vulnerabilities. All data transfers over the internet are encrypted.

All storage in the SAN is securely disposed of when it is decommissioned at the end of its useful life. Certificates of secure data destruction covering the SAN storage are obtained as part of the decommissioning process. Disks and tapes that are used for the SHU Research Store are destroyed in ways that comply with ISO 27001:2005 Security Management certification, and ISO 14001:2004 Environmental Standards.