CIEE/ICEE
  • Home
  • Living Data
    • Courses
    • LDP Certificates
    • Internships
    • Living Data Stories
    • Working Groups
  • Working Groups
    • About
    • Apply
  • Training
    • CIEE Workshops and Training
  • Apply
    • For a Data Rescue Internship
    • To host a workshop
  • News
  • Outputs
    • Datasets
    • Publications
    • Documentary
  • Get Involved
    • Membership
    • Donations
    • Governance
    • Contact

LIVING DATA PROJECT STORIES

Rescuing four-decade-old benthic invertebrate data from the Turkey Lakes Watershed in Ontario

7/31/2025

 
Data Rescue Intern: Diana Bertuol Garcia

From December 2024 to February 2025, I worked as a Data Rescue Intern with the Department of Fisheries and Oceans (DFO) in Ontario. Since the 1980s, the DFO office in Sault Ste. Marie, Ontario has collected lake and stream sediment samples to characterize benthic communities across Ontario water bodies and to assess how much species composition varies across seasons, years, depths, and lakes. Much of the work has been concentrated in the Turkey Lakes Watershed (TLW), as part of the whole-ecosystem monitoring project on the effects of acid rain on terrestrial and aquatic ecosystems. Many sub-projects on benthic communities have been conducted over the years, and the consistency of data collection and methods depended on each sub-project's duration and available funding. Moreover, most of the data had been collected in the 1980s, so most of the people involved in collecting it had already left the organization. As a result, the data was scattered across Excel files and many PDFs of old scanned documents, with no accompanying documentation. Hence, my goal for the internship was to organize these files into a cohesive, well-documented relational dataset that could be readily uploaded to the government open data portal.
Photos courtesy of Fisheries and Oceans Canada in Sault Ste. Marie, ON
At the beginning of the internship, I spent a considerable amount of time going over each Excel and PDF file, trying to understand what data was available and how the files linked together. I felt like a detective, piecing together small bits of information to form a larger picture of the data. For example, while some PDF files reported the number of specimens of each invertebrate species across different samples, I could not figure out from those pages where the samples had been taken or what methods had been used. Later, I would find notes in another file explaining the sampling codes and how they related to locations on a map, and then yet another file linking those codes to particular methods, such as Ekman dredges or Kick-and-Sweep samples. Piece by piece, I was able to work out what data was available and how and where it had been collected.
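The kind of linking described above can be sketched in a few lines. This is only an illustration in Python, not the code used during the internship: the sample codes, site names, counts, and the `link_records` helper are all hypothetical, invented to show how count records become interpretable once the code-to-location and code-to-method notes are found.

```python
# All values below are hypothetical, invented for illustration only.
counts = [
    {"sample_code": "A1", "taxon": "Chironomidae", "n": 12},
    {"sample_code": "B2", "taxon": "Oligochaeta", "n": 3},
    {"sample_code": "C3", "taxon": "Chironomidae", "n": 5},
]

# Lookups recovered from separate notes files (hypothetical contents).
code_to_site = {"A1": "Lake site 1", "B2": "Stream site 2"}
code_to_method = {"A1": "Ekman dredge", "B2": "Kick-and-Sweep"}


def link_records(counts, code_to_site, code_to_method):
    """Attach site and method to each count row via its sample code.

    Rows whose code is missing from either lookup are not guessed at;
    their codes are returned separately so they can be investigated.
    """
    linked, unresolved = [], []
    for row in counts:
        code = row["sample_code"]
        if code in code_to_site and code in code_to_method:
            linked.append(
                {**row, "site": code_to_site[code], "method": code_to_method[code]}
            )
        else:
            unresolved.append(code)
    return linked, unresolved


linked, unresolved = link_records(counts, code_to_site, code_to_method)
```

Keeping the unresolved codes in their own list, rather than dropping them, mirrors the detective work: each one is a pointer to a notes file that has not been found yet.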


The next step was to transcribe data from PDF to tabular format, a task for which I thankfully had the help of an undergraduate research assistant (and of AI tools that facilitate transcribing data from PDF to tables). Finally, with everything in tabular format, I spent the rest of my internship using R to identify duplicated data among files, standardize taxon names, format everything into a relational database, and perform data validation to ensure the data was clean and organized. It was really rewarding to see the many unorganized PDF and Excel files take shape in a format that is usable and understandable for future researchers.
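To give a flavour of those cleaning steps, here is a minimal sketch. The actual work was done in R; this example swaps in Python with the standard-library `sqlite3` module, and every table name, column name, synonym entry, and data value in it is a hypothetical assumption rather than part of the real dataset.

```python
import sqlite3

# Hypothetical synonym table mapping transcribed spellings to one accepted name.
TAXON_SYNONYMS = {
    "chironomid": "Chironomidae",
    "chironomidae": "Chironomidae",
    "oligochaeta": "Oligochaeta",
}


def standardize_taxon(name):
    """Normalize whitespace and case; return None for unrecognized names."""
    key = " ".join(name.split()).lower()
    return TAXON_SYNONYMS.get(key)


def clean_rows(rows):
    """Standardize taxon names, drop exact duplicates, set aside invalid rows."""
    seen, cleaned, rejected = set(), [], []
    for sample_code, raw_taxon, n in rows:
        taxon = standardize_taxon(raw_taxon)
        if taxon is None or not isinstance(n, int) or n < 0:
            rejected.append((sample_code, raw_taxon, n))
            continue
        key = (sample_code, taxon, n)
        if key not in seen:  # the same record transcribed from two files
            seen.add(key)
            cleaned.append(key)
    return cleaned, rejected


def build_database(samples, counts):
    """Load two related tables; the schema itself enforces basic validation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE sample (sample_code TEXT PRIMARY KEY, method TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE taxon_count (
               sample_code TEXT NOT NULL REFERENCES sample(sample_code),
               taxon TEXT NOT NULL,
               n INTEGER NOT NULL CHECK (n >= 0),
               PRIMARY KEY (sample_code, taxon))"""
    )
    conn.executemany("INSERT INTO sample VALUES (?, ?)", samples)
    conn.executemany("INSERT INTO taxon_count VALUES (?, ?, ?)", counts)
    conn.commit()
    return conn


# Hypothetical transcribed rows: one duplicate and one unrecognized taxon.
raw_rows = [
    ("A1", "Chironomidae", 12),
    ("A1", " chironomid ", 12),
    ("B2", "Oligochaeta", 3),
    ("B2", "Unknownus", 1),
]
samples = [("A1", "Ekman dredge"), ("B2", "Kick-and-Sweep")]

cleaned, rejected = clean_rows(raw_rows)
conn = build_database(samples, cleaned)
total = conn.execute("SELECT SUM(n) FROM taxon_count").fetchone()[0]
```

Putting the rules in the schema (a foreign key from counts to samples, a non-negative check, one row per sample and taxon) means that a count pointing at an undocumented sample, or two conflicting counts for the same taxon in one sample, fails loudly at load time instead of slipping into the published dataset.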

Overall, I learned a lot about best practices for data management, data documentation, and reproducibility, as well as how to build and structure relational databases, and (why not, being the plant ecologist that I am) what the heck Kick-and-Sweep and other benthos sampling methods are.




