CIEE/ICEE

LIVING DATA PROJECT STORIES

Data Curation for PFAS in Quebec Surface Waters

7/28/2025

 
Data Rescue Intern: Ming Qiu

In the winter of 2024, I had the opportunity to participate in a Data Rescue Internship focused on data curation for per- and polyfluoroalkyl substances (PFAS) in Quebec’s surface waters. PFAS are a group of synthetic chemicals widely used in products such as food packaging, clothing, and non-stick cookware. Often referred to as “forever chemicals,” they are highly persistent in the environment due to their resistance to degradation. Dr. Sébastien Sauvé and his team at the Université de Montréal have been conducting regular monitoring of these harmful substances across freshwater bodies in Quebec, Canada. Curating this dataset enables researchers to track changes in PFAS levels over time and assess their potential impact on the environment and public health.
The primary goal of the internship was to prepare and upload a clean, well-formatted PFAS dataset to DataStream, an open-access platform for water quality data. One of the biggest challenges was mapping each measured PFAS compound to its standardized name in the Water Quality eXchange (WQX), a national framework for formatting and sharing water quality data. DataStream uses an adapted version of WQX for uploaded data (see DataStream’s open data schema, DS-WQX [https://datastream.org/en-ca/documentation/data-schema]). This mapping required obtaining the unique Chemical Abstracts Service (CAS) Registry Number for each PFAS so that every compound aligned with the correct entry in the standard database. I carefully screened the raw dataset to verify that each target PFAS had been assigned the correct CAS number. For compounds not yet included in the WQX database, my mentor and I submitted requests to have them registered. Once the PFAS records and sampling site data had passed quality checks, I developed a series of R scripts to format the dataset according to the DataStream schema for publication.
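The CAS screening and name-mapping steps described above can be sketched roughly as follows. This is not the intern's code (the original work used R scripts archived on OSF); it is a minimal Python sketch under stated assumptions. The helper names (`is_valid_cas`, `map_compounds`, `to_long`), the lookup table, and its placeholder characteristic names are hypothetical, the `ng/L` unit is assumed, and the output column names are assumed to follow DS-WQX and should be checked against DataStream's schema page. The check-digit test follows the standard CAS rule, so it catches transcription typos but cannot detect a valid number that was assigned to the wrong compound.

```python
import re

# A CAS Registry Number has three hyphen-separated parts:
# 2-7 digits, 2 digits, and a single check digit (e.g. "1763-23-1").
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")

# Hypothetical lookup from CAS number to a standardized characteristic name.
# The names are placeholders; real values must come from the WQX / DS-WQX lists.
CAS_TO_NAME = {
    "1763-23-1": "PFOS (placeholder WQX name)",
    "335-67-1": "PFOA (placeholder WQX name)",
}


def is_valid_cas(cas: str) -> bool:
    """Check the format and check digit of a CAS Registry Number.

    The check digit is the sum of the other digits, each multiplied by its
    position counted from the right (1, 2, 3, ...), taken modulo 10.
    """
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


def map_compounds(raw: dict[str, str]) -> tuple[dict[str, str], list[str], list[str]]:
    """Sort {compound label: CAS number} pairs into mapped, invalid, and unmatched."""
    mapped: dict[str, str] = {}
    invalid: list[str] = []
    unmatched: list[str] = []
    for label, cas in raw.items():
        if not is_valid_cas(cas):
            invalid.append(label)        # likely a typo in the raw CAS number
        elif cas.strip() in CAS_TO_NAME:
            mapped[label] = CAS_TO_NAME[cas.strip()]
        else:
            unmatched.append(label)      # valid CAS, but absent from the lookup
    return mapped, invalid, unmatched


def to_long(rows: list[dict], mapped: dict[str, str], unit: str = "ng/L") -> list[dict]:
    """Reshape wide rows (one column per compound) into one record per result.

    Column names are assumed to follow DS-WQX; the default unit is an assumption.
    Unmapped compounds, and compounds with no value in a row, are skipped.
    """
    records = []
    for row in rows:
        for label, name in mapped.items():
            value = row.get(label)
            if value is None:
                continue
            records.append({
                "MonitoringLocationID": row["site"],
                "ActivityStartDate": row["date"],
                "CharacteristicName": name,
                "ResultValue": value,
                "ResultUnit": unit,
            })
    return records


if __name__ == "__main__":
    # Toy input, not taken from the PFAS dataset. The PFHxS check digit is
    # deliberately wrong (the registered number is 355-46-4).
    raw = {
        "PFOS": "1763-23-1",
        "PFOA": "335-67-1",
        "HFPO-DA": "13252-13-6",
        "PFHxS": "355-46-5",
    }
    mapped, invalid, unmatched = map_compounds(raw)
    print("invalid CAS:", invalid)              # ['PFHxS']
    print("not in lookup:", unmatched)          # ['HFPO-DA']
    rows = [{"site": "SITE-1", "date": "2024-01-15", "PFOS": 1.0, "PFOA": None}]
    print(to_long(rows, mapped))
```

Splitting the output into three lists mirrors the workflow in the post: invalid numbers go back for correction, valid numbers with no standardized name become candidates for a registration request, and only mapped compounds are reshaped for upload.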
 
This experience has been incredibly valuable, particularly in helping me understand the entire process of preparing an open-access water quality dataset. I am especially grateful to my DataStream mentor, Charlotte, for her patience and responsiveness; she set an excellent example of effective workplace communication. I also want to thank my coordinator, Pierre, for his dedication in organizing meetings that accommodated everyone’s schedules and kept the internship running smoothly. His meticulous meeting notes greatly improved our communication and workflow efficiency.
 
Because a few PFAS could not yet be uploaded while their registration in DataStream and WQX is still being verified, I archived all the R scripts and metadata details on the Open Science Framework (https://osf.io/9rxsw/; DOI: 10.17605/OSF.IO/9RXSW) to showcase my work. Once the data are published, the DataStream DOI will be added to the OSF project page.


