Leveraging Sentinel-2 Data and Machine Learning for Drought Detection in India: The Process of Ground Truth Construction and a Case Study

Article

Publication Date:

2025

Abstract:

Highlights:

What are the main findings?

Ensemble learning models trained on Sentinel-2 multispectral indices reliably classified regional drought conditions in India during the Rabi season, with Bagging Classifier and Random Forest yielding accuracies above 83%, and seasonal majority voting raising performance to 94%.

SHAP-based feature attribution consistently identified the Normalized Multi-band Drought Index (NMDI) and Day of the Season (DOS) as dominant predictors, with RECI, EVI, NDMI, and RDI emerging as additional key contributors across models.

What is the implication of the main finding?

Integrating multispectral drought-sensitive indices with ensemble classifiers provides a scalable and robust methodological framework for regional drought detection and monitoring, complementing conventional ground-based drought assessments.

Feature importance rankings demonstrate that vegetation stress and soil-moisture–related indices are central for model generalization, offering transferable insights for agricultural risk management and operational drought early warning systems.

Droughts significantly impact agriculture, water resources, and ecosystems. Their timely detection is essential for implementing effective mitigation strategies. This study explores the use of multispectral Sentinel-2 remote sensing indices and machine learning techniques to detect drought conditions in three distinct regions of India (Jodhpur, Amravati, and Thanjavur) during the Rabi season (October–April). Twelve remote sensing indices were studied to assess different aspects of vegetation health, soil moisture, and water stress, and their possible joint use and influence as indicators of regional drought events. Reference data used to define drought conditions in each region were primarily sourced from official government drought declarations and regional and national news publications, which provide seasonal maps of drought conditions across the country.
Based on this information, a district-by-year (3 × 10) ground truth table was created, indicating the presence or absence of drought (Drought/No Drought) for each region across the ten-year period. Using this ground truth table, we extended the remote sensing dataset by adding a binary drought label to each observation: 1 for “Drought” and 0 for “No Drought”. The dataset is organized by year (2016–2025) in a two-dimensional format, with indices as columns and observations as rows. Each observation represents a single measurement of the remote sensing indices. This enriched dataset serves as the foundation for training and evaluating machine learning models that classify drought conditions from spectral information. The resulting dataset was used to predict drought events with several machine learning models: Random Forest, XGBoost, Bagging Classifier, and Gradient Boosting. Among these, XGBoost achieved the highest accuracy (84.80%), followed closely by the Bagging Classifier (83.98%) and Random Forest (82.98%). In terms of precision, Bagging Classifier and Random Forest performed comparably (82.31% and 81.45%, respectively), while XGBoost achieved a precision of 81.28%. We then applied a seasonal majority voting strategy, assigning a final drought label to each region and Rabi season based on the majority of predicted monthly labels. Using this method, XGBoost and Bagging Classifier achieved (Formula presented.) accuracy, precision, and recall, while Random Forest and Gradient Boosting reached (Formula presented.) and (Formula presented.), respectively, across all metrics. Shapley Additive Explanation (SHAP) analysis revealed that Normalized Multi-band Drought Index (NMDI) and Day of Season (DOS) consistently emerged as the most influential feature
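The labeling and seasonal voting steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the names `GROUND_TRUTH`, `attach_labels`, and `seasonal_majority_vote`, the record fields, and all numeric values are hypothetical placeholders, and the rule that ties resolve to "Drought" is an assumption the abstract does not state.

```python
from collections import defaultdict

# Hypothetical ground-truth table: (region, season_year) -> 1 (Drought) or 0 (No Drought).
# Placeholder values for illustration only, not the study's actual table.
GROUND_TRUTH = {
    ("Jodhpur", 2016): 1,
    ("Jodhpur", 2017): 0,
    ("Amravati", 2016): 0,
}


def attach_labels(observations, ground_truth):
    """Return copies of the observations with a binary 'drought' label added."""
    labeled = []
    for obs in observations:
        key = (obs["region"], obs["season_year"])
        labeled.append({**obs, "drought": ground_truth[key]})
    return labeled


def seasonal_majority_vote(predictions):
    """Collapse per-observation predictions into one label per (region, season).

    predictions: iterable of (region, season_year, predicted_label) tuples.
    Ties are resolved as Drought (1); this tie rule is an assumption.
    """
    votes = defaultdict(list)
    for region, season_year, label in predictions:
        votes[(region, season_year)].append(label)
    return {key: int(2 * sum(labels) >= len(labels)) for key, labels in votes.items()}


# Illustrative observations: one row per measurement, indices as columns.
observations = [
    {"region": "Jodhpur", "season_year": 2016, "NMDI": 0.41, "DOS": 12},
    {"region": "Amravati", "season_year": 2016, "NMDI": 0.58, "DOS": 40},
]
labeled = attach_labels(observations, GROUND_TRUTH)

# Illustrative monthly model outputs for one Rabi season in two regions.
monthly_predictions = [
    ("Jodhpur", 2016, 1), ("Jodhpur", 2016, 1), ("Jodhpur", 2016, 0),
    ("Amravati", 2016, 0), ("Amravati", 2016, 1), ("Amravati", 2016, 0),
]
season_labels = seasonal_majority_vote(monthly_predictions)
```

Voting at the season level means a few misclassified months do not change the final label for a region, which is one plausible reason the seasonal metrics reported above exceed the per-observation ones.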

CRIS Type:

1.1 Journal article

Keywords:

agricultural applications; bagging classifier; Borda Count; Copernicus; drought detection; India; machine learning; remote sensing indices; Sentinel-2; SHAP; XGBoost

Author list:

Sharma, Shubham Subhankar; Mukherjee, Jit; Dell'Acqua, Fabio

University authors:

DELL'ACQUA FABIO

Link to full record:

https://iris.unipv.it/handle/11571/1549324

Published in:

REMOTE SENSING

Journal

General Data

URL

https://www.mdpi.com/2072-4292/17/18/3159
