Article

Optimizing Data Pipelines for Green AI: A Comparative Analysis of Pandas, Polars, and PySpark for CO2 Emission Prediction

1 Paragraphe Laboratory, Paris 8 University of Paris, Vincennes–Saint-Denis, 93200 Saint-Denis, France
2 Laboratory of Engineering, Modeling, and Systems Analysis (LIMAS), Faculty of Sciences, Sidi Mohamed Ben Abdellah University (USMBA), Fez 30000, Morocco
3 ESISA ANALYTICA Laboratory (LEA), Department of Artificial Intelligence, School of Engineering in Applied Sciences (ESISA), Fez 30050, Morocco
4 Department of Computer Engineering High School of Technology, Moulay Ismail University, Meknes 50050, Morocco
* Author to whom correspondence should be addressed.
Computers 2025, 14(8), 319; https://doi.org/10.3390/computers14080319
Submission received: 12 June 2025 / Revised: 28 July 2025 / Accepted: 1 August 2025 / Published: 7 August 2025
(This article belongs to the Section Internet of Things (IoT) and Industrial IoT)

Abstract

This study evaluates the performance and energy trade-offs of three popular data processing libraries (Pandas, PySpark, and Polars) applied to GreenNav, a CO2 emission prediction pipeline for urban traffic. GreenNav is an eco-friendly navigation app that predicts CO2 emissions and determines low-carbon routes using a hybrid CNN-LSTM model, integrated into a complete pipeline for ingesting and processing large, heterogeneous geospatial and road data. For each library applied to the GreenNav pipeline, the study quantifies the end-to-end execution time, cumulative CPU load, and maximum RAM consumption, and then converts these metrics into energy consumption and CO2 equivalents. Experiments on datasets ranging from 100 MB to 8 GB demonstrate that Polars in lazy mode offers substantial gains, reducing the processing time by a factor of more than twenty, memory consumption by about two-thirds, and energy consumption by about 60%, while maintaining the predictive accuracy of the model (R² ≈ 0.91). These results show that the careful selection of data processing libraries can reconcile high computing performance and environmental sustainability in large-scale machine learning applications.
Keywords: green AI; CO2 emissions; data preprocessing; energy efficiency; energy benchmarking; Pandas; Polars; PySpark; dataframe libraries; CNN-LSTM; road traffic; emission modeling; sustainable computing; Paris dataset
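The abstract describes measuring execution time and memory for a pipeline step and converting those measurements into energy and CO2 equivalents. The following is a minimal sketch of that idea, not the authors' measurement code. The function names, the average power draw, and the grid carbon intensity are all hypothetical placeholders chosen for illustration; the article's actual instrumentation and conversion factors are not given in this section.

```python
import time
import tracemalloc

# Hypothetical constants for illustration only; not taken from the article.
ASSUMED_AVG_POWER_WATTS = 65.0            # assumed average machine power draw
ASSUMED_GRID_INTENSITY_G_PER_KWH = 60.0   # assumed grid carbon intensity


def measure(step, *args):
    """Run step(*args); return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = step(*args)
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak_bytes


def energy_kwh(elapsed_s, avg_power_w=ASSUMED_AVG_POWER_WATTS):
    """Convert a runtime into energy: watts * seconds = joules, 3.6e6 J per kWh."""
    return avg_power_w * elapsed_s / 3_600_000.0


def co2_grams(kwh, intensity_g_per_kwh=ASSUMED_GRID_INTENSITY_G_PER_KWH):
    """Convert energy into grams of CO2 equivalent for a given grid intensity."""
    return kwh * intensity_g_per_kwh


if __name__ == "__main__":
    # Stand-in workload; a real run would call a Pandas, Polars, or PySpark step.
    _, seconds, peak = measure(sum, range(1_000_000))
    kwh = energy_kwh(seconds)
    print(f"time={seconds:.4f}s peak={peak} B energy={kwh:.3e} kWh "
          f"co2={co2_grams(kwh):.3e} g")
```

Note that `tracemalloc` only tracks allocations made through Python's allocator, so memory used by native engines would need a process-level measurement instead. The constant-power model is also a simplification of a CPU-load-based estimate.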

Share and Cite

MDPI and ACS Style

Mekouar, Y.; Lahmer, M.; Karim, M. Optimizing Data Pipelines for Green AI: A Comparative Analysis of Pandas, Polars, and PySpark for CO2 Emission Prediction. Computers 2025, 14, 319. https://doi.org/10.3390/computers14080319

AMA Style

Mekouar Y, Lahmer M, Karim M. Optimizing Data Pipelines for Green AI: A Comparative Analysis of Pandas, Polars, and PySpark for CO2 Emission Prediction. Computers. 2025; 14(8):319. https://doi.org/10.3390/computers14080319

Chicago/Turabian Style

Mekouar, Youssef, Mohammed Lahmer, and Mohammed Karim. 2025. "Optimizing Data Pipelines for Green AI: A Comparative Analysis of Pandas, Polars, and PySpark for CO2 Emission Prediction" Computers 14, no. 8: 319. https://doi.org/10.3390/computers14080319

APA Style

Mekouar, Y., Lahmer, M., & Karim, M. (2025). Optimizing Data Pipelines for Green AI: A Comparative Analysis of Pandas, Polars, and PySpark for CO2 Emission Prediction. Computers, 14(8), 319. https://doi.org/10.3390/computers14080319

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
