Data Rescue Intern: Ming Qiu

In the winter of 2024, I had the opportunity to participate in a Data Rescue Internship focused on curating data on per- and polyfluoroalkyl substances (PFAS) in Quebec’s surface waters. PFAS are a group of synthetic chemicals widely used in products such as food packaging, clothing, and non-stick cookware. They are often called “forever chemicals” because they resist degradation and persist in the environment. Dr. Sébastien Sauvé and his team at the Université de Montréal have been regularly monitoring these harmful substances across freshwater bodies in Quebec, Canada. Curating this dataset lets researchers track changes in PFAS levels over time and assess their potential impact on the environment and public health.

The primary goal of the internship was to prepare a clean, well-formatted PFAS dataset and upload it to DataStream, an open-access platform dedicated to water quality data. One of the biggest challenges was correctly mapping each measured PFAS compound to its standardized name in the Water Quality eXchange (WQX), a U.S. Environmental Protection Agency framework for sharing water quality data that DataStream has adapted for the data uploaded to its platform (see DataStream’s open data schema DS-WQX [https://datastream.org/en-ca/documentation/data-schema]). This mapping required obtaining the unique Chemical Abstracts Service (CAS) Registry Number for each PFAS so that every compound aligned with the standard database. I carefully screened the raw dataset to verify that every target PFAS had been assigned the correct CAS number. For compounds not yet included in the WQX database, my mentor and I submitted requests to have them registered. Once the PFAS records and sampling-site data had been quality-checked, I developed a series of R scripts to format the dataset according to the DataStream schema for publication.
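The internship workflow itself was written in R and is not reproduced here. As an illustration only, here is a minimal Python sketch of one generic sanity check that can support screening CAS numbers. Every CAS Registry Number ends in a check digit, which is the sum of the preceding digits, each weighted by its position counting from the right (starting at 1), taken modulo 10. The function name `is_valid_cas` and the `raw_records` table are hypothetical. This check only catches typos and malformed entries. It cannot detect a valid CAS number that has been attached to the wrong compound, so matching against the WQX database is still required.

```python
import re

# CAS Registry Number layout: 2-7 digits, then 2 digits, then 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well formed and its check digit is consistent.

    Hypothetical helper for illustration; it does not confirm that the
    number belongs to the intended compound.
    """
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)  # every digit except the check digit
    check = int(match.group(3))
    # Weight the rightmost body digit by 1, the next by 2, and so on.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == check


# Hypothetical screening table: compound label -> CAS number as recorded in a raw file.
raw_records = {
    "PFOA": "335-67-1",
    "PFOS": "1763-23-1",
    "typo_example": "335-67-2",  # deliberately wrong check digit
}

flagged = [name for name, cas in raw_records.items() if not is_valid_cas(cas)]
print(flagged)  # ['typo_example']
```

In practice, a check like this would run before the name mapping, so that malformed identifiers are fixed at the source rather than surfacing later as failed lookups.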
This experience has been incredibly valuable, particularly in helping me understand the entire process of preparing an open-access water quality dataset. I am especially grateful to my DataStream mentor, Charlotte, for her patience and responsiveness; she set an excellent example of effective workplace communication. I also want to thank my coordinator, Pierre, for his dedication in organizing meetings that accommodated everyone’s schedules and kept the internship running smoothly. His meticulous meeting notes greatly improved our communication and workflow efficiency.

A few PFAS could not be uploaded yet because their registration in DataStream and WQX is still being verified. I have therefore archived all the R scripts and metadata details in the Open Science Framework (OSF; https://osf.io/9rxsw/; DOI: 10.17605/OSF.IO/9RXSW) to showcase my work. Once the data are published, the DataStream DOI will be added to the OSF project page.