Library Research Support

Sharing data

Research data are a valuable resource. It is University policy that all research data, including that from doctoral research, should be kept for a minimum of 10 years after completion of the project. This covers both physical and digital data. For research funded by the Medical Research Council this may be considerably longer.

These data are required to substantiate any research findings that are published or reported, so that your peers are able to validate the findings. Sharing research outcomes also enables future researchers to open up new lines of inquiry or develop new insights based on your data, without duplicating the effort of collecting the data again, assuming that re-collecting the same data would be possible at all. Increasingly, research funders encourage the sharing of data.

Thus when you get to the end of your research project, there are several things you will need to do:

  • select the research data you will want or need to keep and destroy the rest
  • preserve the research data by depositing them in an institutional repository or a data archive
  • publish those data in the repository or archive and share them with others, with access restrictions if appropriate

Selecting

One of the first things to do when preparing your data for preservation and sharing is to select the data that you are going to keep.

Decisions on what data to keep are left to the discretion of researchers, taking into account

  • all relevant SHU policies
  • requirements and contractual arrangements from the relevant research funder(s), sponsor(s) and contractual partners
  • guidelines and requirements from the repository or archive where you intend to deposit your data
  • guidance available within the relevant subject domains (good practice within your discipline)

Decisions about what data to keep should ideally be considered at the planning stage, ie when you write your data management plan and obtain ethical approval, and in any case well before the end of your project.

Before depositing data ensure that all personally identifying information has been removed.

Guidelines for selecting data

According to guidance from the Digital Curation Centre, at least three considerations should be made when determining what primary research data to keep.

1. What is the purpose that the data could fulfil?

Datasets can be defined by the purpose of keeping them

  • verification — the dataset supports research outputs such as journal articles, PhD theses, and patent applications
  • further analysis — the dataset is of long-term value and could be useful to you and other researchers at some point in the future

The minimum requirements for keeping datasets depend on this purpose.

Verification

The preserved dataset(s) should allow full scrutiny of the research output; this should usually consist of the dataset that was used to reach the conclusions in the research output, and any additional data that is required to replicate the reported study findings in their entirety. This is known as the ‘replication standard’.

Further analysis

The preserved dataset(s) would normally consist of the raw primary data that was collected or created, possibly after any noise has been removed, and always under the condition that these data are fully documented in such a way that they are usable by other researchers within relevant subject domain(s).

2. What data must be kept (or destroyed) because of policies and regulations?

Policies and regulations may require you to keep certain data

  • SHU’s Research Data Management Policy. This policy specifically states that all research data that underpin publications and patent applications must be kept, as well as data that can be considered of long-term value
  • SHU’s University Records Retention Schedule. This schedule states that primary research data must be kept for a minimum of 10 years after the end of the research project, and that ethics records such as consent forms must also be kept
  • your research funder’s or other sponsor’s Research Data Management requirements
  • any contractual or legal reasons to keep certain data, eg data that is used in a patent application; data that falls under contractual terms and conditions when working with external partners such as NHS, governmental bodies and SMEs; data that underpins evaluative reports that could be legally challenged
  • any requirements or restrictions imposed by the repository or archive where you intend to deposit the data; these could be a national archive (eg UK Data Archive), a generic data sharing platform (eg Dryad, figshare), a publisher who may want to include the data as supplementary material to your research article, or the SHU Research Data Archive (SHURDA)
  • ethics approval for your research project

The General Data Protection Regulation (GDPR) and Freedom of Information Act (FOI) may require you to keep and/or destroy certain data.

General Data Protection Regulation

The GDPR states that non-anonymised personal and sensitive personal data ‘processed for any purpose or purposes shall not be kept for longer than is necessary for that purpose or those purposes’. This means that non-anonymised data can usually only be kept beyond the duration of the research project if the conditions for the research exemption in the GDPR are met, i.e.:

  • the data must not be used to ‘support measures or decisions with respect to particular individuals’
  • the data are ‘not processed in such a way that substantial damage or substantial distress is, or is likely to be, caused to any data subject’

In addition, it must be possible to show that the data are of long-term academic interest to the researcher or the academic community and that the data will be protected against unauthorised access. SHU provides detailed guidance on data protection for research participants.

Freedom of Information

The University may be required to disclose some research data to third parties via a Freedom of Information (FOI) or Environmental Information Regulations (EIR) request. Once the request has been received, it is a criminal offence to delete the data or datasets that have been requested (a so-called ‘shredding offence’). There may be reasons not to provide the requested data; these could be the same as the constraints on data sharing you may have mentioned in your data management plan. Furthermore, on 1 October 2014 a new exemption for research was added to the FOI Act, which states that research data may be exempt from FOI requests if

  • the data will be used for a future publication
  • and disclosure of the data before the date of publication would, or would be likely to, prejudice
    • the research programme
    • the interests of any individual participating in the programme
    • the interests of the authority which holds the data and/or the authority which expects to publish

This and other exemptions need to be considered on a case by case basis.

The University has published information on its arrangements in respect of Freedom of Information. The Information Commissioner has published guidance on FOI and research data. Jisc have also published an overview of how FOI relates to research data.

3. What data should be kept because it is of long-term value?

Here is a short checklist that may help you to determine whether your data may be of long-term value

  • Is the data of good enough quality in terms of completeness, sample size, accuracy, validity, reliability or any other criterion relevant in your subject domain?
  • Is the data sufficiently documented to allow re-use by your peers?
  • Is there likely to be a demand for your data?
  • Is it difficult to replicate your data?
  • Are the barriers to re-using your data sufficiently low for the intended or likely audience of your research data? For example, does it require proprietary hardware or software, and if so, how widely used are these in your field of study?

If your data can be considered of long-term value for any of the reasons above, as a general rule your data should be kept.

The following classification of research data, based on the 2008 report Stewardship of digital research data – principles and guidelines from the Research Information Network (archived), may help you to determine your data’s long-term value.

data | long-term value | examples
observational data | these data are captured in real time and usually cannot be reproduced; they are primary candidates for archiving | observations of ocean temperature on a specific date, medical scans and images, SEM images, interviews, and surveys
experimental data | these data are captured from laboratory equipment and are usually reproducible, but reproduction may be costly or too complex because of the number of experimental variables | gene sequences, chromatograms, microassays
computational or simulation data | these data are generated by computational or simulation models; when complete information about the computer model and its execution (eg hardware, software, input data) is preserved, the output can in theory be reproduced; the model and its associated metadata may be more important than the output from the model | climate, mathematical and economic models
derived or compiled data | these data result from processing or combining ‘raw’ data and are often reproducible, but reproduction may be costly | text and data mining, compiled databases

Case studies

1. Questionnaires

A researcher collects information via paper questionnaires with both open ended and closed questions. Informed consent is captured on paper forms. The information in the questionnaires is digitally recorded in an Excel spreadsheet and the quantitative data is analysed in SPSS.

Paper consent forms
Keeping the paper consent forms is the responsibility of the individual researcher, as stated in the University’s records retention schedule. These consent forms do not need to be shared.

Paper questionnaires
The paper questionnaires contain the raw primary data. If these data are digitally recorded, for example in an Excel spreadsheet including transcriptions of written answers to open ended questions, it may not be necessary to keep the paper questionnaires in which case they may be shredded. Otherwise it is essential to keep the paper questionnaires.

All paper data — consent forms and questionnaires — can be deposited in their paper form in the SHU Research Data Archive (SHURDA).

Digital processed data
The Excel spreadsheet in which the answers are recorded and analysed should be kept and documented. A pdf of the original questionnaire should also be retained.

Further documentation
Data that are kept for the long term also need sufficient documentation. There are two levels of documentation

  • data-level documentation, which covers descriptions and annotations at the file and within-file level,
  • study-level documentation, which describes the research project, the data creation process, and the general context.

Study-level documentation could be provided in a separate file outlining the research context and introducing the constituent parts of your dataset, but often you will have given sufficient study-level documentation in any research outputs that are based on these data, such as publications and final reports to funders. You can include these in your dataset, or refer to them if they are deposited in SHURA or otherwise publicly available.

2. Interviews

A researcher interviews a number of participants who have given consent for their interviews to be audio recorded, transcribed, and their data to be shared once anonymised. The audio recordings are transcribed, and analysed using NVivo.

Paper consent forms
As above.

Audio recordings and transcriptions
The audio recordings may contain valuable information that cannot be fully captured in transcription but may be considered useful for future analysis. These files may be kept, but because they may be difficult to anonymise (just as video files would be difficult to anonymise) it may not be possible to share the audio recordings with others. It should be feasible, however, to anonymise the transcriptions, and share these as the primary data emanating from this research project.

Analysis in NVivo
The analysis in NVivo can be fully documented and saved. MANTRA, the University of Edinburgh’s online Research Data Management Training course, has an excellent data handling tutorial for NVivo.

Further documentation
As above.

3. Laboratory measurements

A researcher produces experimental data by taking measurements with laboratory equipment located in the basement of the Harmer building. These experiments are documented in lab notebooks and the measurements are taken with proprietary software and saved as CSV files. These raw data are then entered into Excel spreadsheets, where any noise in the data is removed. Analysis of the data, resulting in graphs where the measurements are plotted against time and compared to calibration data from previous experiments, usually takes place in Excel as well.

Lab notebooks
This research produces a number of datasets which may need to be kept. It may be that only part of the paper lab notebooks used in the experiments is relevant for this particular project. These pages can be digitised and added to the digital dataset as necessary study-level documentation. If whole lab notebooks are relevant to the research project, then these may be kept in their non-digital form and deposited in the SHU Research Data Archive. When these analogue notebooks are deposited, they should be referred to in the digital dataset that is deposited, for example by using a persistent URL or DOI.

CSV and Excel files
Depending on common practice in the discipline and the judgment of the researcher in question, the raw data, the processed data, or both may need to be kept. In this case study, the measurements captured in CSV files directly from the laboratory equipment constitute the raw data. The Excel files in which any noise is removed are the processed data. The ‘replication standard’ would require the raw data to be made available, together with a description of how these data were processed to arrive at the results in the research paper that is clear enough for peers to replicate the results.

Preserving

Making decisions about the long-term preservation of your research data includes thinking about retention periods, file formats suitable for long-term preservation, and finding a place to deposit the data.

Where to deposit

Preserving the data generally means that the data should be deposited in a repository or archive during the project or shortly afterwards. There are many options, but in any case a record should be created in the SHU Research Data Archive (SHURDA) which points to the URL of the deposited data.

Before depositing data ensure that all personally identifying information has been removed.

1. Deposit with the SHU Research Data Archive (SHURDA)

If your funder does not expect you to deposit your data in a designated data archive and your journal does not provide a facility to preserve your data that also meets your funder’s requirements, then you could use SHU’s institutional data repository.

2. Deposit with an external data archive

Some funders have set up data archives specifically for the curation and dissemination of data created as part of their funded programmes. Examples are ESRC’s UK Data Service and the seven data centres that NERC supports. Researchers are often expected to deposit their data in these designated data archives. Other research funders expect you to deposit your data in an institutional or subject specific repository that is not supported by a research council.

Find your funders’ requirements

  • SHERPA/JULIET is a database that lists all research funders’ open access policies, including their rules for depositing in specific data archives

Find data archives

Some considerations when deciding on a repository

  • Are the repository’s terms and conditions acceptable?
  • Will your dataset be given a permanent DOI? A DOI provides a permanent link to your dataset that will never change, even if the website is redesigned or if you leave the university
  • What type of data does the repository accept and what is its subject focus? Is the repository used by the people in your discipline? Does the repository already have a good reputation in your field and is it recommended by your funder or your journal?
  • Does the repository allow you to describe your data sufficiently, so that it is easy to find and easy to cite by others?
  • Are access restrictions and embargoes permitted?
  • Is the archive established and well funded so that you can rely on it still preserving your data in 10 years’ time?

If you are considering using an external data archive and need advice, please contact library-research-support@shu.ac.uk.

3. Submit your data to a journal

An increasing number of journals, such as those that are part of the Nature Publishing Group and the Public Library of Science (PLOS), require that authors make their data promptly available to others without undue restrictions. These data must generally be available to readers from the date of publication, and must be provided to the editor and peer reviewers at submission. Some of these journals encourage data to be submitted as supplementary materials to the article; other journals require the data to be deposited and published in a repository. It is worth checking your journal’s data policy.

A note on project websites

You can also make your data available on your own project website, but this is generally not recommended. If you make your data available via a website, then you should also deposit the data in a discipline-based repository or SHU’s institutional data repository. Project websites offer little sustainability for your data in the longer term, and unless you put specific procedures in place, it may be difficult to control who uses your data and how they use it. Also, a dataset in a repository is usually far easier for your peers to find than a dataset on an individual’s website.

Retention periods

SHU

The University’s Research Data Management policy stipulates that ‘data must be stored for a period at least as long as that required by any funder or sponsor of the research, any publisher of the research or as set out in the University’s Research and Knowledge Transfer Records Retention Schedule’.

The University Records Retention Schedule - Research clarifies that primary data generated by research (both on paper and in electronic form, and both by staff and postgraduate students) should be kept for a period of

expiry of “privileged access” / embargo period + 10 years
OR
last date on which access to the data was requested by a third party + 10 years

Should an external funder stipulate a longer retention period, then the longer retention period shall apply. If the legal contract governing the research stipulates a longer or shorter period, then the retention period set out in the contract shall apply. For clinical and health studies that are funded by the Medical Research Council, retention periods are considerably longer: 20 years if consent of individuals/patients was obtained, 30 years if it wasn’t.
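The retention rule above can be worked through as a small calculation. The following is a minimal sketch only, not an official SHU tool. It assumes that, where both start dates exist, the later of the two applies (the schedule itself only says "OR"), and that a contractual period is counted in whole years from the same start date. The function and parameter names are hypothetical.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_end(
    embargo_expiry: date,
    last_access_request: Optional[date] = None,
    base_years: int = 10,
    funder_years: Optional[int] = None,
    contract_years: Optional[int] = None,
) -> date:
    """Estimate the earliest date on which retained data could be reviewed for disposal."""
    if contract_years is not None:
        # A contractual period applies whether it is longer or shorter.
        years = contract_years
    elif funder_years is not None:
        # A funder period only applies when it is longer than the baseline.
        years = max(base_years, funder_years)
    else:
        years = base_years

    # Assumption: when a third party has requested access after the embargo
    # expired, the clock restarts from that later date.
    start = embargo_expiry
    if last_access_request is not None and last_access_request > start:
        start = last_access_request
    return add_years(start, years)


if __name__ == "__main__":
    # Hypothetical project: embargo expired mid-2020, data requested in 2023.
    print(retention_end(date(2020, 6, 30), last_access_request=date(2023, 1, 15)))
```

Treat the result as an estimate to check against the schedule, your funder's terms and any contract, since those documents take precedence over this simplification.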

UKRI 

The UKRI bodies have varying requirements, within a common data management framework. You will find an overview of funders’ data sharing and retention policies in the table below. The DCC also provides information about funders’ data policies.

Overview of funders’ data sharing and retention policies

Funder | Minimum length of time data should be kept | Starting from | Where to be kept
AHRC | 3 years | within 3 months of project completion | archaeology grant holders to deposit in the Archaeology Data Service (http://ads.ahds.ac.uk/); for other subjects no archival service is provided
BBSRC | 10 years | no later than the release of main findings through publication, or after completion of project | no archival service is provided
EPSRC | 10 years | end of researcher ‘privileged access’ period or from last date on which access to the data was requested by a third party, whichever is later | no archival service is provided
ESRC | not stated | within 3 months after project completion | UK Data Service (http://ukdataservice.ac.uk/)
MRC | 10 years minimum but some data need to be kept longer (depending on the type of study) | in a timely manner but a limited and defined period of exclusive data use is reasonable | no archival service is provided
NERC | not stated | at the end of a project, or after a ‘reasonable period’ of exclusive use, normally a maximum of 2 years from the end of data collection | expected to deposit in a network of seven data centres
STFC | 10 years but data that is not re-measurable should be kept ‘in perpetuity’ | within 6 months of publication | several data centres are in place but deposit is not mandated
NC3Rs | 10 years minimum but some data need to be kept longer (depending on the type of study) | in a timely manner but a limited and defined period of exclusive data use is reasonable | no archival service is provided
Cancer Research UK | 5 years following the end of a grant | no later than the acceptance for publication of the main findings. A limited period of exclusive use of data for primary research is reasonable | no archival service is provided
Wellcome Trust | 10 years | on publication but opportunities for timely and responsible pre-publication sharing of data should also be maximised | no archival service is provided

File formats

It is useful to consider which file formats you will use for your data, since the choice of file format has repercussions for the long-term access to your data. All digital files depend upon hardware and software for access, and it may be that the file formats you choose will become obsolescent in the future.

The safest option is to use open formats (such as comma-separated values or CSV) and not proprietary formats, although some proprietary formats (such as SPSS, PDF, Excel and Word) are widely used and likely to be accessible in the long term. Formats that enable long-term preservation and sharing of data are listed in this table of recommended formats from the UK Data Service.

It may be that you will use different file formats for creating and processing your data, depending on the hardware, software and staff expertise available, or on discipline specific practices. In that case, you may need to consider converting from the original formats into formats that are suitable for preservation.
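As an illustration of such a conversion, the sketch below rewrites a delimited text export as a UTF-8, comma-separated CSV file using only the Python standard library. It is a minimal example under stated assumptions: the tab delimiter, the Latin-1 source encoding and the function name are hypothetical and should be replaced with whatever your instrument or software actually produces. Converting binary formats such as Excel or SPSS files needs additional tools and is not covered here.

```python
import csv
from pathlib import Path


def convert_to_csv(
    src: Path,
    dst: Path,
    src_delimiter: str = "\t",      # assumption: the export is tab-delimited
    src_encoding: str = "latin-1",  # assumption: the export uses a legacy encoding
) -> int:
    """Rewrite a delimited text export as UTF-8 CSV and return the number of rows written."""
    rows_written = 0
    with src.open("r", encoding=src_encoding, newline="") as fin, \
            dst.open("w", encoding="utf-8", newline="") as fout:
        reader = csv.reader(fin, delimiter=src_delimiter)
        writer = csv.writer(fout)  # comma-separated, quoting only where needed
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

Keep the original file alongside the converted copy, and note the conversion in your data-level documentation so that others can see how the preserved file was produced.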

Costs

If you are applying for UKRI funding, any anticipated costs that you incur for preparing and ingesting the data into a repository or archive can be directly costed into your grant proposal. You should provide adequate justification for the costs. Also keep in mind that any expenditure must take place before the actual end date of your project.

Data statement

Open access and data availability statements

According to the University’s policies on Open Access and Research Data Management, published results should always include

  • details of the funding that supported the research
  • if applicable, a statement on how to access the supporting data

This is also the policy of UK Research and Innovation (UKRI – formerly RCUK). Some funders, such as EPSRC, further specify that the data availability statement should identify how the underpinning data can be accessed and on what terms, including any compelling legal or ethical reasons to restrict access, if there are any. The statement also requires a persistent identifier such as a DOI. If access needs to be requested via email, EPSRC deem a personal email address to be insufficient and ask for an institutional email address instead; this should be library-research-support@shu.ac.uk. This email address is monitored on a daily basis and any incoming requests will be forwarded to the relevant researcher.

Templates

1. Funding statement

‘This work was supported by FUNDER [grant reference XXX].’

The Research Information Network provides more detailed guidance on acknowledging funders in scholarly journal articles.

2. Data availability statement

You should include a data availability statement even where there is no data associated with the research.

The exact format and placement of the data availability statement will depend on your journal’s house-style. Some journals have their own template for a data availability statement.

Process

  • Check whether your journal has its own template or requirements for a data availability statement.
  • Agree the wording of your data availability statement.
  • Obtain a DOI or persistent URL from your data repository for inclusion in the data availability statement.

The University of Bath has example data access statements that may be useful.

For further advice email library-research-support@shu.ac.uk

Publishing

Benefits

There are many benefits to sharing your data.

  • It can help you build your academic track record. An increasing number of studies are showing that there is a correlation between open access to data and citation impact of articles based on those data. See below for more information.
  • It enables your research outputs to be validated and tested, improving the scientific record.
  • It means that data can be re-used for scientific and educational purposes, thus creating new insights.
  • It may reduce duplication of effort.
  • It meets funding body requirements.
  • It is in the public interest, where research data has been publicly funded.

More information on citation impact

Studies have indicated a significant increase in citation impact of publications where datasets have been made openly available

Restrictions to openness

SHU has an extensive guidance document on restrictions to openness of primary research data which you are encouraged to consult.

Although the default position of many public funders is that the data need to be made openly available, they accept there might be restrictions to openness, as recognised in the UKRI’s Common Principles on Data Policy, in particular

  • legal, ethical and commercial constraints, which may include
    • making a patent application
    • arrangements with commercial partners sponsoring your research that classify your data as confidential
    • confidential human patient data

  • embargo periods to enable research teams to publish the results of their research in order to get appropriate recognition for the effort involved in collecting and analysing data. The length of this period varies by research discipline and, where appropriate, is discussed further in the published policies of individual Research Councils

Please keep in mind that personal and sensitive data can often be shared ethically if informed consent for data sharing has been given and if the data are processed as required, e.g. redaction of interview transcripts.

The UK Data Service has excellent guidance on consent and ethics for data sharing.

Licensing

All research data that is shared needs to have a license that indicates what users may or may not do with the data. Licenses can only be granted by the holder of the Intellectual Property Right, so it is important that this is established from the outset. Data archives and repositories will indicate what licenses are available for the data they house.

There are many licenses that you could apply when sharing your data. You can either choose a license that applies to your data from the moment it is published, or negotiate an ad hoc agreement in response to particular requests. You can also use a dual licensing strategy, permitting some rights automatically to all users and agreeing additional rights for specific users (such as collaborators or commercial companies) on request.

When choosing a license, you must take into account

  • what permissions were established in participant consent forms
  • who owns the Intellectual Property Rights over the data
  • what funding contracts specify
  • Data Protection

The Information Commissioner’s Office provides a Data Sharing Code of Practice.

There are many license models available that may be applied to your research outputs. Some of your options are

  • placing your data in the public domain (eg with a CC0 or PDDL license)
  • using a Creative Commons license, eg with attribution required (CC BY), share-alike required (CC SA), for non-commercial use only (CC NC), no derivatives allowed (CC ND), or a combination of these
  • using an Open Data Commons license that is specifically designed for data and databases, either with attribution required (ODC-By) or with attribution and share-alike required (ODC-ODbL)
  • using an Open Government Licence (OGL) or Non-Commercial Government Licence which are designed for UK public sector information
  • using a license specifically designed for computer code, such as the Apache License, the GNU General Public License and the MIT License. The GNU General Public License v3 (GPL3) requires that copies and modified versions of your code carry the same license conditions as the original, allowing you to reuse any improved code in your own projects
  • terms negotiated individually with requestors on an ad hoc basis

Data archives may have specific requirements for the kinds of licenses you can attach to your data. When depositing in the SHU Research Data Archive (SHURDA), the default license is a Creative Commons Attribution license (CC BY). This license allows others to use your datasets as long as they acknowledge your work. You can also choose one of the other Creative Commons licenses or the GNU GPL3 license if you are depositing computer code.

If you deposit your data in SHURDA, users will need to register before they can download your dataset; you may share your data with all users that register or you may require that the PI or their nominee give consent for sharing on an individual basis if the data is of a (commercially) sensitive nature.

If you wish to negotiate license terms with someone, you should seek advice from Research and Innovation Services.

More information on licensing research data

Making your data accessible: writing a README document

You should add a README document to your data. This provides information about the dataset that makes it easier to find, understand, and re-use.

We provide a template to help you write your README document.

Citing

Data citation allows your work to be attributed and credited. A citation provides the information necessary to discover data and access them. A dataset citation should include

  • Creator: Name(s) of each individual or organisational entity responsible for the creation of the dataset.
  • Publication Year: Year the dataset was published or disseminated.
  • Title: Complete title of the dataset.
  • Version: Optional element.
  • Publisher: Organisational entity that makes the dataset available by archiving, producing, publishing, and/or distributing the dataset.
  • Resource Type: Optional element.
  • Electronic Location or Identifier: Web address or unique, persistent, global identifier such as a DOI, preferably as a linkable, permanent URL.

You can arrange these elements following the order and punctuation specified by your style guide, such as APA, MLA or Chicago, or you can use the format preferred by DataCite, the organisation that assigns DOIs to datasets

  • Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier
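
The DataCite pattern above can be assembled mechanically from the citation elements. The sketch below is illustrative only: the function name, the choice to join several creators with semicolons, and the example values (including the DOI) are all hypothetical, and the optional elements are simply left out when they are not supplied.

```python
from typing import Optional, Sequence


def format_datacite_citation(
    creators: Sequence[str],
    publication_year: int,
    title: str,
    publisher: str,
    identifier: str,
    version: Optional[str] = None,
    resource_type: Optional[str] = None,
) -> str:
    """Build 'Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier'."""
    # Assumption: multiple creators are separated by semicolons.
    creator_part = "; ".join(creators)

    # Version and ResourceType are optional, so they are skipped when absent.
    parts = [title]
    if version:
        parts.append(version)
    parts.append(publisher)
    if resource_type:
        parts.append(resource_type)

    return f"{creator_part} ({publication_year}): " + ". ".join(parts) + ". " + identifier


if __name__ == "__main__":
    # Entirely made-up dataset details, for illustration only.
    print(
        format_datacite_citation(
            creators=["Example, A.", "Sample, B."],
            publication_year=2020,
            title="Hypothetical interview transcripts",
            publisher="Example Data Archive",
            identifier="https://doi.org/10.1234/example",
            version="Version 2",
            resource_type="Dataset",
        )
    )
```

If your journal or style guide prescribes a different order or punctuation, follow that instead; the elements themselves stay the same.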

See the following example from the DataCite website

More information on citing data
