The Data Rescue Intern: Nadia Páez
The Nature Conservancy of Canada (NCC) is a non-profit land conservation organization. In Alberta, the NCC has collected species records for more than 30 years as part of baseline assessments and monitoring projects. These records would be of high value to the province of Alberta if they were made publicly available. However, they are currently stored in a private NCC database that is not compatible with Alberta's two provincial biodiversity repositories: the Fisheries and Wildlife Management Information System (FWMIS) database for wildlife records and the Alberta Conservation Information Management System (ACIMS) database for plant and invertebrate records.
The goal of this internship was to build a digital pipeline between the NCC and the Alberta repositories, so that both past and future data could move from the private to the public realm. To accomplish this, Ph.D. student Nadia Páez worked with NCC Alberta's Data and Information Coordinator, Beth McLarnon, and its Director of Conservation Science and Planning, Craig Harding, under the guidance of LPD members Dr. Ellen Bledsoe and Dr. Diane Srivastava. Together they developed an R project that first assessed the quality of the data to be migrated and then formatted the data to meet the submission requirements of the provincial repositories. Writing reproducible scripts made it possible to manage large volumes of data. These scripts are not only useful for the one-time migration of historical records; they will also remain suitable for data collected in the future.
For the historical data, we used a script to check whether each record had all the information mandatory for migration (observation date, location, and species-level identification), manually completing missing information where possible and excluding records where it was not. The script also checked that location coordinates fell within Alberta's provincial boundaries, which allowed us to detect errors in some of the original coordinates. Another challenge was evaluating the validity of species names, since the NCC's and Alberta's databases use different taxonomic systems. Once the data were cleaned, a migration script standardized coordinates, dates, and species names to the required formats. Because only certain plant and invertebrate records can be submitted to ACIMS, we also filtered the NCC's records by these taxonomic groups.
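The project's scripts were written in R and are not reproduced here. As a rough illustration of the record-level checks described above, the following Python sketch flags records that lack a mandatory field, are not identified to species, or fall outside Alberta. The field names (`obs_date`, `latitude`, `longitude`, `scientific_name`) are hypothetical, and the location test uses a coarse bounding box (about 49 to 60°N, 110 to 120°W) as an assumed stand-in for a proper test against the provincial boundary polygon, which is irregular in the southwest.

```python
from datetime import date

# Hypothetical field names; the actual NCC schema is not described in the text.
MANDATORY_FIELDS = ("obs_date", "latitude", "longitude", "scientific_name")

# Assumed coarse bounding box around Alberta (decimal degrees). A real check
# should test points against the provincial boundary polygon instead.
LAT_MIN, LAT_MAX = 49.0, 60.0
LON_MIN, LON_MAX = -120.0, -110.0


def missing_fields(record):
    """Return the mandatory fields that are absent or empty in a record."""
    return [f for f in MANDATORY_FIELDS if record.get(f) in (None, "")]


def is_species_level(name):
    """Treat a name as species-level if it has a genus and an epithet,
    and the epithet is not a placeholder such as 'sp.' or 'spp.'."""
    parts = name.split()
    return len(parts) >= 2 and parts[1].lower() not in ("sp.", "sp", "spp.", "spp")


def in_alberta_bbox(lat, lon):
    """Coarse location check against the assumed bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def triage(record):
    """Return 'ok', or the first reason the record needs attention."""
    missing = missing_fields(record)
    if missing:
        return "missing: " + ", ".join(missing)
    if not is_species_level(record["scientific_name"]):
        return "not species-level"
    if not in_alberta_bbox(record["latitude"], record["longitude"]):
        return "outside Alberta"
    return "ok"


# Illustrative records (invented for this example, not NCC data).
records = [
    {"obs_date": date(2015, 6, 3), "latitude": 51.05, "longitude": -114.07,
     "scientific_name": "Bison bison"},
    # Same point with a dropped minus sign on the longitude.
    {"obs_date": date(2015, 6, 3), "latitude": 51.05, "longitude": 114.07,
     "scientific_name": "Bison bison"},
    {"obs_date": None, "latitude": 53.5, "longitude": -113.5,
     "scientific_name": "Carex sp."},
]

for r in records:
    print(triage(r))
# ok
# outside Alberta
# missing: obs_date
```

Records flagged as missing information would then be completed by hand where possible or excluded, as the text describes; a dropped sign on a longitude is one example of the kind of coordinate error a location check can surface.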
In the historical data migration, we processed 32,300 records collected over 18 years, of which 8,745 could be exported. Of these, 8,701 are wildlife observations of 318 species from 113 locations, and 44 are plant and invertebrate observations of 25 species from 22 localities. This information is now available through the Fisheries and Wildlife Management Information System (FWMIS) and the Alberta Conservation Information Management System (ACIMS).