Research data are a valuable resource. It is University policy that all research data, including that from doctoral research, should be kept for a minimum of 10 years after completion of the project. This covers both physical and digital data. For research funded by the Medical Research Council this may be considerably longer.
These data are required to substantiate any research findings that are published or reported, so that your peers can validate the findings. Sharing research outcomes also enables future researchers to open up new lines of inquiry or develop new insights based on your data without duplicating the effort of collecting the data again, assuming the data could be re-collected at all. Increasingly, research funders encourage the sharing of data.
Thus when you get to the end of your research project, there are several things you will need to do:
One of the first things to do when preparing your data for preservation and sharing is to select the data that you are going to keep.
Decisions on what data to keep are left to the discretion of researchers, taking into account
Decisions about what data to keep should ideally be considered at the planning stage, i.e. when you write your data management plan and obtain ethical approval, and in any case well before the end of your project.
Before depositing data ensure that all personally identifying information has been removed.
Guidelines for selecting data
According to guidance from the Digital Curation Centre, at least three considerations should be made when determining what primary research data to keep.
1. What is the purpose that the data could fulfil?
Datasets can be defined by the purpose of keeping them
The minimum requirements for keeping datasets depend on this purpose.
The preserved dataset(s) should allow full scrutiny of the research output; this should usually consist of the dataset that was used to reach the conclusions in the research output, and any additional data that is required to replicate the reported study findings in their entirety. This is known as the ‘replication standard’.
The preserved dataset(s) would normally consist of the raw primary data that was collected or created, possibly after any noise has been removed, and always under the condition that these data are fully documented in such a way that they are usable by other researchers within relevant subject domain(s).
2. What data must be kept (or destroyed) because of policies and regulations?
Policies and regulations may require you to keep certain data
The General Data Protection Regulation (GDPR) and Freedom of Information Act (FOI) may require you to keep and/or destroy certain data.
General Data Protection Regulation
The GDPR states that non-anonymised personal and sensitive personal data ‘processed for any purpose or purposes shall not be kept for longer than is necessary for that purpose or those purposes’. This means that non-anonymised data can usually only be kept beyond the duration of the research project if the conditions for the research exemption in the GDPR are met, i.e.:
In addition it can be shown that the data is of long-term academic interest to the researcher or the academic community and that the data will be protected against unauthorised access. SHU provides detailed guidance on data protection for research participants.
Freedom of Information
The University may be required to disclose some research data to third parties via a Freedom of Information (FOI) or Environmental Information Regulations (EIR) request. Once the request has been received, it is a criminal offence to delete the data or datasets that have been requested (a so-called ‘shredding offence’). There may be reasons not to provide the requested data; these could be the same as the constraints on data sharing you may have mentioned in your data management plan. Furthermore, on 1 October 2014 a new exemption for research was added to the FOI Act, which states that research data may be exempt from FOI requests if
This and other exemptions need to be considered on a case by case basis.
The University has published information on its arrangements in respect of Freedom of Information. The Information Commissioner has published guidance on FOI and research data. Jisc have also published an overview of how FOI relates to research data.
3. What data should be kept because it is of long-term value?
Here is a short checklist that may help you to determine whether your data may be of long-term value
If your data can be considered of long-term value for any of the reasons above, as a general rule your data should be kept.
The following classification of research data, based on the 2008 report Stewardship of digital research data – principles and guidelines from the Research Information Network (archived), may be useful to help you determine your data’s long term value.
|Type of data||Description||Examples|
|observational data||these data are captured in real-time and usually cannot be reproduced; they are primary candidates for archiving||observations of ocean temperature on a specific date, medical scans and images, SEM images, interviews, and surveys|
|experimental data||these data are captured from laboratory equipment and are usually reproducible, but reproduction may be costly or too complex because of the number of experimental variables||gene sequences, chromatograms, microassays|
|computational or simulation data||these data are generated by computational or simulation models; when complete information about the computer model and its execution (e.g. hardware, software, input data) is preserved, the output can in theory be reproduced; the model and its associated metadata may be more important than the output from the model||climate, mathematical and economic models|
|derived or compiled data||these data result from processing or combining ‘raw’ data and are often reproducible but reproduction may be costly||text and data mining, compiled databases|
A researcher collects information via paper questionnaires with both open ended and closed questions. Informed consent is captured on paper forms. The information in the questionnaires is digitally recorded in an Excel spreadsheet and the quantitative data is analysed in SPSS.
Paper consent forms
Keeping the paper consent forms is the responsibility of the individual researcher, as stated in the University’s records retention schedule. These consent forms do not need to be shared.
The paper questionnaires contain the raw primary data. If these data are digitally recorded, for example in an Excel spreadsheet including transcriptions of written answers to open ended questions, it may not be necessary to keep the paper questionnaires, in which case they may be shredded. Otherwise it is essential to keep the paper questionnaires.
All paper data — consent forms and questionnaires — can be deposited in their paper form in the SHU Research Data Archive (SHURDA).
Digital processed data
The Excel spreadsheet in which the answers are recorded and analysed should be kept and documented. A pdf of the original questionnaire should also be retained.
Data that you keep for the long term also need sufficient documentation. There are two levels of documentation
Study-level documentation could be provided in a separate file outlining the research context and introducing the constituent parts of your dataset, but often you will have given sufficient study-level documentation in any research outputs that are based on these data, such as publications and final reports to funders. You can include these in your dataset, or refer to them if they are deposited in SHURA or otherwise publicly available.
A researcher interviews a number of participants who have given consent for their interviews to be audio recorded, transcribed, and their data to be shared once anonymised. The audio recordings are transcribed, and analysed using NVivo.
Paper consent forms
Audio recordings and transcriptions
The audio recordings may contain valuable information that cannot be fully captured in transcription but may be considered useful for future analysis. These files may be kept, but because they may be difficult to anonymise (just as video files would be difficult to anonymise) it may not be possible to share the audio recordings with others. It should be feasible, however, to anonymise the transcriptions, and share these as the primary data emanating from this research project.
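As a rough illustration of the transcript anonymisation described above, the following sketch replaces known identifiers with pseudonyms. The mapping, the names and the sample sentence are hypothetical, and simple substitution like this is only a first pass: indirect identifiers (job titles, places, events) still need to be checked by a person.

```python
import re


def pseudonymise(text, replacements):
    """Replace each known identifier with its pseudonym (whole words, case-insensitive)."""
    # Longest identifiers first, so that "Jane Smith" is handled before "Jane".
    for original in sorted(replacements, key=len, reverse=True):
        pseudonym = replacements[original]
        pattern = re.compile(r"\b" + re.escape(original) + r"\b", re.IGNORECASE)
        # A function replacement avoids any special handling of backslashes.
        text = pattern.sub(lambda _match: pseudonym, text)
    return text


# Hypothetical mapping; keep it separate from the shared data and do not deposit it.
replacements = {
    "Jane Smith": "[Participant 1]",
    "Jane": "[Participant 1]",
    "Sheffield": "[City A]",
}

transcript = "Interviewer: Jane, how long have you lived in Sheffield?"
redacted = pseudonymise(transcript, replacements)
print(redacted)
```

Keeping the mapping in one place makes the redaction repeatable across all transcripts in the project, which is easier to document than manual edits.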
Analysis in NVivo
The analysis in NVivo can be fully documented and saved. The University of Edinburgh’s online learning modules MANTRA Research Data Management Training has an excellent data handling tutorial in NVivo.
3. Laboratory measurements
A researcher produces experimental data by taking measurements with laboratory equipment located in the basement of the Harmer building. These experiments are documented in lab notebooks and the measurements are taken with proprietary software and saved as CSV files. These raw data are then entered into Excel spreadsheets, where any noise in the data is removed. Analysis of the data, resulting in graphs where the measurements are plotted against time and compared to calibration data from previous experiments, usually takes place in Excel as well.
This research produces a number of datasets which may need to be kept. It may be that only part of the paper lab notebooks used in the experiments is relevant for this particular project. These pages can be digitised and added to the digital dataset as necessary study-level documentation. If whole lab notebooks are relevant to the research project, then these may be kept in their non-digital form and deposited in the SHU Research Data Archive. When these analogue notebooks are deposited, they should be referred to in the digital dataset that is deposited, for example by using a persistent URL or DOI.
CSV and Excel files
Depending on common practice in the discipline and the judgment of the researcher in question, the raw data, the processed data, or both may need to be kept. In this case study, the measurements captured in CSV files directly from the laboratory equipment constitute the raw data. The Excel files in which any noise is removed are the processed data. The “replication standard” would require the raw data to be made available, together with a clear description of how these data were processed to arrive at the results in the research paper, so that peers are able to replicate the results.
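One way to make the processing step describable is to perform it in a short script rather than by hand in a spreadsheet. The sketch below is only an illustration: the column names, the sample readings and the noise rule (discarding readings outside a plausible range) are all assumptions, not the method used in this case study.

```python
import csv
import io

# Hypothetical raw export from the instrument: time in seconds, reading in arbitrary units.
RAW_CSV = """time_s,reading
0,1.02
1,1.05
2,9.87
3,1.04
"""

# Assumed noise rule for this sketch: discard readings outside a plausible range.
VALID_RANGE = (0.0, 5.0)


def remove_noise(raw_text, valid_range):
    """Return (kept_rows, dropped_rows) so that every exclusion is documented."""
    low, high = valid_range
    kept, dropped = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        value = float(row["reading"])
        (kept if low <= value <= high else dropped).append(row)
    return kept, dropped


def to_csv(rows, fieldnames):
    """Serialise rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


kept, dropped = remove_noise(RAW_CSV, VALID_RANGE)
processed_csv = to_csv(kept, ["time_s", "reading"])
processing_note = (
    f"Dropped {len(dropped)} of {len(kept) + len(dropped)} readings "
    f"outside the range {VALID_RANGE[0]} to {VALID_RANGE[1]}."
)
print(processed_csv)
print(processing_note)
```

Depositing the raw CSV, the script and the generated note together lets a peer rerun the step and see exactly which readings were excluded.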
Making decisions about the long-term preservation of your research data includes thinking about retention periods, choosing file formats suitable for long-term preservation, and finding a place to deposit the data.
Where to deposit
Preserving the data generally means that the data should be deposited in a repository or archive during the project or shortly afterwards. There are many options, but in any case a record should be created in the SHU Research Data Archive (SHURDA) which points to the URL of the deposited data.
1. Deposit with the SHU Research Data Archive (SHURDA)
If your funder does not expect you to deposit your data in a designated data archive and your journal does not provide a facility to preserve your data that also meets your funder’s requirements, then you could use SHU’s institutional data repository.
2. Deposit with an external data archive
Some funders have set up data archives specifically for the curation and dissemination of data created as part of their funded programmes. Examples are ESRC’s UK Data Service and the seven data centres that NERC supports. Researchers are often expected to deposit their data in these designated data archives. Other research funders expect you to deposit your data in an institutional or subject specific repository that is not supported by a research council.
Find your funders’ requirements
Find data archives
Some considerations when deciding on a repository
If you are considering using an external data archive and need advice, please contact firstname.lastname@example.org.
3. Submit your data to a journal
An increasing number of journals require that authors make their data promptly available to others without undue restrictions, such as the journals that are part of the Nature Publishing Group and the Public Library of Science (PLOS). These data must generally be available to readers from the date of publication, and must be provided to the editor and peer-reviewers at submission. Some of these journals encourage data to be submitted as supplementary materials to the article; other journals require the data to be deposited and published in a repository. It is worth checking your journal’s data policy.
A note on project websites
You can also make your data available on your own project website, but this is generally not recommended. If you make your data available via a website, then you should also deposit the data in a discipline-based repository or SHU’s institutional data repository. Project websites offer little sustainability for your data in the longer term, and unless you put specific procedures in place, it may be difficult to control who uses your data and how they use it. Also, a dataset in a repository is usually far easier for your peers to find than one on an individual’s website.
The University’s Research Data Management policy stipulates that ‘data must be stored for a period at least as long as that required by any funder or sponsor of the research, any publisher of the research or as set out in the University’s Research and Knowledge Transfer Records Retention Schedule’.
The University Records Retention Schedule - Research clarifies that primary data generated by research (both on paper and in electronic form, and both by staff and postgraduate students) should be kept for a period of
expiry of “privileged access” / embargo period + 10 years
last date on which access to the data was requested by a third party + 10 years
Should an external funder stipulate a longer retention period, then the longer retention period shall apply. If the legal contract governing the research stipulates a longer or shorter period, then the retention period set out in the contract shall apply. For clinical and health studies that are funded by the Medical Research Council, retention periods are considerably longer: 20 years if consent of individuals/patients was obtained, 30 years if it was not.
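The retention rules above can be sketched as a small calculation. This is an illustration under stated assumptions, not an official tool: it assumes that when both an embargo expiry date and a later third-party access request exist, the later date starts the clock, and that a contract period takes precedence over a funder period. The function and parameter names are hypothetical.

```python
from datetime import date


def add_years(d, years):
    """Add whole years to a date, mapping 29 February to 28 February when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_end(embargo_expiry, last_access_request=None,
                  university_years=10, funder_years=None, contract_years=None):
    """Return the earliest date on which the data may be reviewed for disposal."""
    # Assumption: when both start dates exist, the later one starts the clock.
    start = max(d for d in (embargo_expiry, last_access_request) if d is not None)
    if contract_years is not None:
        # A legal contract overrides the default, whether longer or shorter.
        years = contract_years
    else:
        # A funder requirement applies only when it is longer than the University's.
        years = max(university_years, funder_years or 0)
    return add_years(start, years)


print(retention_end(date(2020, 6, 30)))
print(retention_end(date(2020, 6, 30), last_access_request=date(2023, 2, 1)))
print(retention_end(date(2020, 6, 30), funder_years=20))
print(retention_end(date(2020, 6, 30), contract_years=5))
```

The four calls cover the default ten-year period, a later access request, a longer funder period and a shorter contract period.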
The UKRI bodies have varying requirements, within a common data management framework. You will find an overview of funders’ data sharing and retention policies in the table below. The DCC also provides information about funders’ data policies.
Overview of funders’ data sharing and retention policies
|Research Council||Minimum length of time data should be kept||Starting from||Where to be kept|
|AHRC||3 years||within 3 months of project completion||archaeology grant holders to deposit in the Archaeology Data Service (http://ads.ahds.ac.uk/); for other subjects no archival service is provided|
|BBSRC||10 years||no later than the release of main findings through publication, or after completion of project||no archival service is provided|
|EPSRC||10 years||end of researcher ‘privileged access’ period or from last date on which access to the data was requested by a third party, whichever is later||no archival service is provided|
|ESRC||not stated||within 3 months after project completion||UK Data Service (http://ukdataservice.ac.uk/)|
|MRC||10 years minimum but some data need to be kept longer (depending on the type of study)||in a timely manner but a limited and defined period of exclusive data use is reasonable||no archival service is provided|
|NERC||not stated||at the end of a project, or after a ‘reasonable period’ of exclusive use, normally a maximum of 2 years from the end of data collection||expected to deposit in a network of seven data centres|
|STFC||10 years but data that is not re-measurable should be kept ‘in perpetuity’||within 6 months of publication||several data centres are in place but deposit is not mandated|
|NC3Rs||10 years minimum but some data need to be kept longer (depending on the type of study)||in a timely manner but a limited and defined period of exclusive data use is reasonable||no archival service is provided|
|Cancer Research UK||5 years following the end of a grant||no later than the acceptance for publication of the main findings. A limited period of exclusive use of data for primary research is reasonable||no archival service provided|
|Wellcome Trust||10 years||on publication but opportunities for timely and responsible pre-publication sharing of data should also be maximised||no archival service provided|
It is useful to consider which file formats you will use for your data, since the choice of file format affects long-term access to your data. All digital files depend upon hardware and software for access, and the file formats you choose may become obsolete in the future.
The safest option is to use open formats (such as comma-separated values or CSV) and not proprietary formats, although some proprietary formats (such as SPSS, PDF, Excel and Word) are widely used and likely to be accessible in the long term. Formats that enable long-term preservation and sharing of data are listed in this table of recommended formats from the UK Data Service.
It may be that you will use different file formats for creating and processing your data, depending on the hardware, software and staff expertise available, or on discipline specific practices. In that case, you may need to consider converting from the original formats into formats that are suitable for preservation.
If you are applying for UKRI funding, any anticipated costs that you incur for preparing and ingesting the data into a repository or archive can be directly costed into your grant proposal. You should provide adequate justification for the costs. Also keep in mind that any expenditure must take place before the actual end date of your project.
For more information, see
Open access and data availability statements
This is also the policy of UK Research and Innovation (UKRI – formerly RCUK). Some funders, such as EPSRC, further specify that the data availability statement should identify how the underpinning data can be accessed and on what terms, including any compelling legal or ethical reasons to protect access, if there are any. The statement would also require a persistent identifier such as a DOI. If access needs to be requested via email, EPSRC deem a personal email address to be insufficient and ask for an institutional email address instead; this should be email@example.com. This email address is monitored on a daily basis and any incoming requests will be forwarded to the relevant researcher.
1. Funding statement
‘This work was supported by FUNDER [grant reference XXX].’
There is more detailed guidance on acknowledgement of funders in scholarly journal articles by the Research Information Network.
2. Data availability statement
You should include a data availability statement even where there is no data associated with the research.
The exact format and placement of the data availability statement will depend on your journal’s house-style. Some journals have their own template for a data availability statement.
The University of Bath has example data access statements that may be useful.
For further advice email firstname.lastname@example.org
There are many benefits to sharing your data.
More information on citation impact
Studies have indicated a significant increase in citation impact of publications where datasets have been made openly available
Restrictions to openness
SHU has an extensive guidance document on restrictions to openness of primary research data which you are encouraged to consult.
Although the default position of many public funders is that the data need to be made openly available, they accept there might be restrictions to openness, as recognised in the UKRI’s Common Principles on Data Policy, in particular
Please keep in mind that personal and sensitive data can often be shared ethically if informed consent for data sharing has been given and if the data are processed as required, e.g. redaction of interview transcripts.
The UK Data Service has excellent guidance on consent and ethics for data sharing.
All research data that is shared needs to have a license that indicates what users may or may not do with the data. Licenses can only be granted by the holder of the Intellectual Property Right, so it is important that this is established from the outset. Data archives and repositories will indicate what licenses are available for the data they house.
There are many licenses that you could apply when sharing your data. You can either choose a license that applies to your data from the moment it is published, or negotiate an ad hoc agreement in response to particular requests. You can also use a dual licensing strategy, permitting some rights automatically to all users and agreeing additional rights for specific users (such as collaborators or commercial companies) on request.
When choosing a license, you must take into account
The Information Commissioner’s Office provides a Data Sharing Code of Practice.
There are many license models available that may be applied to your research outputs. Some of your options are
Data archives may have specific requirements for the kinds of licenses you can attach to your data. When depositing in the SHU Research Data Archive (SHURDA), the default license is a Creative Commons Attribution license (CC BY). This license allows others to use your datasets as long as they acknowledge your work. You can also choose one of the other Creative Commons licenses, or the GNU GPLv3 license if you are depositing computer code.
If you deposit your data in SHURDA, users will need to register before they can download your dataset; you may share your data with all users that register or you may require that the PI or their nominee give consent for sharing on an individual basis if the data is of a (commercially) sensitive nature.
If you wish to negotiate license terms with someone, you should seek advice from Research and Innovation Services.
More information on licensing research data
Making your data accessible: writing a README document
You should add a README document to your data. This provides information about the dataset that makes it easier to find, understand, and re-use.
We provide a template to help you write your README document.
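If you prefer to generate the README alongside your data files, a minimal sketch follows. The headings and values are hypothetical and are not the fields of the template mentioned above; adapt them to whatever the template asks for.

```python
def build_readme(fields):
    """Render README text as 'Label: value' lines in the order given."""
    lines = [f"{label}: {value}" for label, value in fields]
    return "\n".join(lines) + "\n"


# Hypothetical headings and values for illustration only.
fields = [
    ("Title", "Example questionnaire dataset"),
    ("Creator", "A. Researcher"),
    ("Date of data collection", "2023-01 to 2023-06"),
    ("Files", "responses.csv (survey answers); questionnaire.pdf (blank questionnaire)"),
    ("Methods", "Paper questionnaires transcribed into a spreadsheet"),
    ("License", "CC BY 4.0"),
    ("Contact", "see the repository record"),
]

readme_text = build_readme(fields)
print(readme_text)
```

The resulting text can be saved as a plain-text file, for example README.txt, in the top-level folder of the dataset.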
Data citation allows your work to be attributed and credited. A citation provides the information necessary to discover data and access them. A dataset citation should include
You can arrange these elements following the order and punctuation specified by your style guide, such as APA, MLA or Chicago, or you can use the format preferred by DataCite, the organisation that assigns DOIs to datasets.
See the following example from the DataCite website
More information on citing data
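As an illustration of the DataCite pattern referred to above (Creator (PublicationYear). Title. Version. Publisher. ResourceType. Identifier, where version and resource type are optional), the following sketch assembles a citation string. The function name and all metadata values, including the DOI, are placeholders and do not refer to a real dataset.

```python
def datacite_citation(creators, year, title, publisher, identifier,
                      version=None, resource_type=None):
    """Assemble a dataset citation in the DataCite element order."""
    parts = [f"{'; '.join(creators)} ({year})", title]
    if version:
        parts.append(version)
    parts.append(publisher)
    if resource_type:
        parts.append(resource_type)
    # The identifier comes last and is not followed by a full stop.
    return ". ".join(parts) + ". " + identifier


# Placeholder metadata for illustration only.
citation = datacite_citation(
    creators=["Researcher, A", "Colleague, B"],
    year=2024,
    title="Example interview transcripts",
    publisher="Example Data Archive",
    identifier="https://doi.org/10.1234/example",
    version="Version 1.0",
    resource_type="Dataset",
)
print(citation)
```

Leaving the identifier without a trailing full stop avoids the punctuation being copied as part of the DOI link.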