Abstract

The concentration and chemical composition of airborne aerosol particles are important indicators of air quality and of the sources of air pollution. The particles’ chemical composition reveals probable emission sources, such as traffic, biomass burning, wildfires, agriculture, or industry. Single-particle mass spectrometry (SPMS), combined with rapid spectral classification, uniquely enables in-situ, real-time analysis of the chemical composition of individual aerosol particles for environmental monitoring and other tasks. Modern SPMS devices analyze hundreds of individual particles per minute, and rapid, accurate classification of such large amounts of data remains challenging. Conventional clustering algorithms require tedious manual post-processing. Because a mass spectrum can be understood as a 1D image of each analyzed particle, we applied algorithms based on convolutional neural networks (CNNs) to perform a fully automated classification. Training such models usually requires a large amount of labeled data. With a manually created benchmark dataset containing 10,400 samples in 13 classes of emission sources (800 samples per class), we achieved an accuracy of ~90%. When the models are trained using only 100 labeled samples per class (1/8 of the labeled data), their accuracy drops significantly to ~75%. We therefore explored suitable augmentation methods to improve the reliability and performance of multi-class classification of aerosol particle mass spectra when labeled data are limited (1/8 of the labeled data). With the augmented data, accuracy improved from ~75% to 86.8%. This paves the way to sharply reduce the expensive and time-consuming work of expert labeling. Furthermore, we verified that converting the 1D mass spectra into 2D representations and classifying them with a 2D-CNN is more efficient than using a 1D-CNN, both with and without data augmentation.
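
The abstract does not specify how the 1D spectra are converted to 2D or which augmentations are used, so the following is only a minimal sketch under stated assumptions. It folds a spectrum row by row into a zero-padded square array and applies a small bin shift plus multiplicative intensity noise. The function names `spectrum_to_image` and `augment_spectrum`, the folding scheme, and the parameters `side`, `max_shift`, and `noise_scale` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def spectrum_to_image(spectrum, side):
    """Fold a 1D spectrum row by row into a (side, side) array.

    Assumption: simple zero-padding and reshaping; the paper's actual
    2D representation may differ.
    """
    spectrum = np.asarray(spectrum, dtype=np.float32)
    if spectrum.size > side * side:
        raise ValueError("spectrum has more bins than side * side")
    padded = np.zeros(side * side, dtype=np.float32)
    padded[: spectrum.size] = spectrum
    return padded.reshape(side, side)


def augment_spectrum(spectrum, rng, max_shift=1, noise_scale=0.01):
    """Return a perturbed copy of a spectrum (hypothetical augmentation).

    Shifts the spectrum by up to max_shift bins (zero-filled at the edge)
    and scales each intensity by Gaussian noise centred on 1.
    """
    spectrum = np.asarray(spectrum, dtype=np.float32)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(spectrum)
    if shift > 0:
        shifted[shift:] = spectrum[:-shift]
    elif shift < 0:
        shifted[:shift] = spectrum[-shift:]
    else:
        shifted[:] = spectrum
    noise = rng.normal(1.0, noise_scale, size=spectrum.shape).astype(np.float32)
    # Intensities cannot be negative, so clip at zero.
    return np.clip(shifted * noise, 0.0, None)
```

Under these assumptions, a 250-bin spectrum could be augmented first and then folded with `spectrum_to_image(augment_spectrum(s, rng), side=16)`, giving a 16 x 16 input suitable for a 2D-CNN, with the last six cells left as zero padding.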

URL

https://ieeecai.org/2024/wp-content/pdfs/540900b164/540900b164.pdf