The Data Rescue Projects are a way to connect data custodians to highly skilled graduate students who will work to tidy, document and archive priority Canadian datasets.
WHAT IS DATA RESCUE
Data rescue is the identification, preservation, and sharing of valuable data and associated metadata at risk of loss¹.
For data custodians, hosting a data rescue project provides a qualified workforce to archive important data that could otherwise be lost to inadequate preservation, and contributes to the open data ecosystem in Canada.
For students, this is an opportunity to build your portfolio of data-wrangling skills, connect with researchers and organizations throughout Canada, be mentored in open science practices, and help breathe new life into legacy data. Examples of some of the data that have been rescued to date can be found on the Outputs -> Datasets page.
¹Read more about the data rescuing process in our paper: https://doi.org/10.1098/rspb.2022.0938
Criteria for legacy datasets
- The dataset includes biological data that is relevant to ecology, evolution or environmental science. The dataset does not need to be entirely biological; physical and chemical data is acceptable if in the context of an ecological or evolutionary process. For example, water chemistry is an important part of understanding aquatic ecology.
- The dataset should be important in one of the following ways: (1) it is extensive in either space or time (e.g. bird surveys over more than a decade, or national surveys of caribou density); (2) it describes a study that was ground-breaking in the history of science; (3) it concerns a species or ecosystem that is considered at risk in Canada, or of high cultural or societal value.
- Priority will be given to datasets that concern a Canadian species or ecosystem, or have been collected by a Canadian researcher or Canadian organization.
- The custodian of the dataset commits to having the data permanently archived, open and accessible. We realize that some dataset custodians may need to temporarily delay the publication of their data, and in such cases we may consider a short-term embargo.
FREQUENTLY ASKED QUESTIONS
1. Is the Living Data Project setting up its own repository?
No. There are now a large number of excellent data repositories that have been set up by government agencies, university libraries, synthesis centres, non-profit organizations and international consortia. We will work with you to decide what repository fits your dataset best – this is your data, and you can decide on its home as long as the repository meets our criteria for long-term digital archiving. We also realize that some employers (e.g. Canadian government) mandate particular repositories.
2. What if the data owner is no longer alive or the data belonged to a defunct organization?
No problem. In this case, the students will be doing a lot of data forensics, piecing together the meaning of each variable by cross-referencing other sources or by interviewing contemporaneous researchers.
3. If I make my data public, won’t other people use it erroneously because they do not understand the context?
Misuse of data is a valid concern, but we have an important tool to combat it: metadata. Students and faculty will work together with you to develop detailed descriptions (metadata) that explain the complexities of each variable in the dataset. Furthermore, we can create additional variables that describe any heterogeneity in data quality or context. The goal is to provide enough information about the data that no further explanations are needed to use it.
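As a rough illustration of what per-variable metadata can look like, here is a minimal Python sketch of a data dictionary kept alongside a dataset. Everything in it is an assumption made for the example: the column names, the invented survey records, the `count_quality` variable (standing in for an added variable that records differences in data quality) and the helper functions are hypothetical, and this is not a description of the workflow the students actually follow.

```python
import csv
import io

# Invented example records; a real dataset would be read from the custodian's files.
records = [
    {"site_id": "A1", "year": 1987, "caribou_count": 42, "count_quality": "aerial"},
    {"site_id": "A1", "year": 1988, "caribou_count": 37, "count_quality": "ground"},
]

# Hypothetical data dictionary: one entry per variable in the dataset.
data_dictionary = [
    {"variable": "site_id", "description": "Identifier of the survey site", "units": ""},
    {"variable": "year", "description": "Calendar year of the survey", "units": "year"},
    {"variable": "caribou_count", "description": "Number of animals counted", "units": "individuals"},
    # Extra variable added to document heterogeneity in how the counts were made.
    {"variable": "count_quality", "description": "Survey method used for the count", "units": ""},
]


def undocumented_variables(rows, dictionary):
    """Return the column names that appear in the data but have no dictionary entry."""
    documented = {entry["variable"] for entry in dictionary}
    columns = {name for row in rows for name in row}
    return sorted(columns - documented)


def to_csv(rows):
    """Serialize a list of dicts to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `undocumented_variables(records, data_dictionary)` lists any column that still lacks a description, and `to_csv(data_dictionary)` produces plain text that could be archived next to the data file.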
4. I can’t make my data public yet because I still have publications planned.
Just because your data is archived doesn’t mean that you cannot publish papers based on it. In fact, many journals now require data to be properly archived before accepting a manuscript for publication. You may also worry that another researcher will ‘scoop’ you on your own data. This is often more fear than fact. Instead, what tends to happen when other researchers see your data carefully organized is that you receive more invitations to collaborate.
5. If I make my data public, won’t I lose credit for all the time I invested in collecting this data?
Not necessarily. You can decide the type of Creative Commons open licence associated with your archived dataset. If you choose a CC BY licence, future researchers are required to credit you when they use the data.
6. I have already planned to give my data to a younger colleague so I don’t need to archive it.
That’s great that you have found such a colleague. However, it would be much more valuable to pass on this dataset if it were already properly documented, validated, organized and archived – otherwise they will be stuck doing this without your valuable input or, even worse, not at all. We would be happy to work with both you and your junior colleague to design your data archiving project so that it meets both of your needs as well as those of the greater research community.
7. If my data is in [database program X], isn’t it already archived?
Unfortunately not. If it isn’t in a data repository, in a simple machine-readable format like a .csv or a .txt file, then it isn’t futureproof. Any data held in a proprietary software package may become unreadable if that package is no longer supported by the software company or the company ceases to exist. Remember those Lotus 1-2-3 files you used to have? Good luck opening them.
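To make the idea of a simple machine-readable copy concrete, here is a minimal Python sketch that writes every table in a database to its own .csv file. It assumes the data sits in an SQLite file purely because Python can read that format without extra software; SQLite is a stand-in for [database program X], and a proprietary program would need its own export tool or driver. The function name `export_tables_to_csv` is hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(connection, out_dir):
    """Write each user table in an SQLite database to <out_dir>/<table>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # List the tables, skipping SQLite's own internal bookkeeping tables.
    table_names = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]

    written = []
    for table in table_names:
        cursor = connection.execute(f'SELECT * FROM "{table}"')
        # cursor.description holds one tuple per column; the first item is the name.
        header = [column[0] for column in cursor.description]
        path = out_dir / f"{table}.csv"
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(cursor)
        written.append(path)
    return written
```

With a connection opened by `sqlite3.connect(...)`, the call `export_tables_to_csv(connection, "export")` returns the paths of the files it wrote. A plain export like this only covers the file format; the variable descriptions and the deposit in a repository still have to follow.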
8. What will I get out of this?
A lasting legacy for your research career, and the satisfaction that your data will continue to inform future generations of scientists. You will also have a career summary posted on the CIEE – Living Data Project website honouring your achievements and the importance of your dataset.