Semidefinite programming relaxatio…
Updated:
March 17, 2025
In this paper, we consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of two sub-Gaussian distributions in $\mathbb{R}^p$. We consider semidefinite programming relaxations of an integer quadratic program that is formulated as finding the maximum cut on a graph, where edge weights in the cut represent dissimilarity scores between two nodes based on their $p$ features. We are interested in the case where individual features are of low average quality $\gamma$, and we want to use as few of them as possible to correctly partition the sample. Denote by $\Delta^2 := p\gamma$ the squared $\ell_2$ distance between the two centers (mean vectors) in $\mathbb{R}^p$. The goal is to allow a full range of tradeoffs between $n$, $p$, and $\gamma$, in the sense that partial recovery (success rate $< 100\%$) is feasible once the signal-to-noise ratio $s^2 := \min\{np\gamma^2, \Delta^2\}$ is lower bounded by a constant. For both balanced and unbalanced cases, we allow each population to have a distinct covariance structure, with diagonal matrices as special cases. In the present work, (a) we provide a unified framework for analyzing three computationally efficient algorithms, namely SDP1, BalancedSDP, and spectral clustering; and (b) we prove that the misclassification error decays exponentially with respect to the SNR $s^2$ for SDP1. Moreover, for balanced partitions, we design an estimator, BalancedSDP, with a superb debiasing property. Indeed, with this new estimator, we remove an assumption (A2) bounding the trace difference between the two population covariance matrices while proving the exponential error bound stated above. These estimators and their statistical analyses are novel to the best of our knowledge. We provide simulation evidence illuminating the theoretical predictions.
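As a toy illustration of the spectral clustering algorithm analyzed in this framework (the SDP estimators themselves are not reproduced here), the following NumPy sketch partitions hypothetical two-component Gaussian data via the pairwise dissimilarity matrix that plays the role of the max-cut edge weights; all sizes and the separation $\Delta$ are made-up values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, delta = 60, 40, 4.0            # sample size, features, center separation
mu = np.zeros(p); mu[0] = delta      # the two mean vectors differ along one axis
z = rng.integers(0, 2, n)            # hidden partition labels
X = rng.normal(size=(n, p)) + np.outer(z, mu)

# Pairwise squared-distance "dissimilarity" matrix, as in the max-cut view
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)

# Spectral relaxation: double-centre -D/2 to recover the centred Gram matrix,
# then partition by the sign of its top eigenvector.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D @ J
w, V = np.linalg.eigh(B)             # eigenvalues in ascending order
labels = (V[:, -1] > 0).astype(int)

err = min(np.mean(labels != z), np.mean(labels == z))  # error up to label swap
print(f"misclassification rate: {err:.2f}")
```

With this strong separation the sign of the top eigenvector recovers the partition almost exactly; shrinking `delta` toward the SNR threshold degrades recovery gracefully.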
Harnessing Large Language Models O…
Updated:
January 14, 2024
In an era where the silent struggle of underdiagnosed depression pervades globally, our research delves into the crucial link between mental health and social media. This work focuses on early detection of depression, particularly in extroverted social media users, using LLMs such as GPT-3.5, GPT-4, and our proposed GPT-3.5 fine-tuned model DepGPT, as well as advanced deep learning models (LSTM, Bi-LSTM, GRU, BiGRU) and Transformer models (BERT, BanglaBERT, SahajBERT, BanglaBERT-Base). The study categorized Reddit and X datasets into "Depressive" and "Non-Depressive" segments, translated into Bengali by native speakers with expertise in mental health, resulting in the creation of the Bengali Social Media Depressive Dataset (BSMDD). Our work provides full architecture details for each model and a methodical way to assess their performance in Bengali depressive text categorization using zero-shot and few-shot learning techniques. Our work demonstrates the superiority of SahajBERT and Bi-LSTM with FastText embeddings in their respective domains, tackles explainability issues with transformer models, and emphasizes the effectiveness of LLMs, especially DepGPT, which demonstrates flexibility and competence across a range of learning contexts. According to the experimental results, the proposed model, DepGPT, outperformed not only Alpaca Lora 7B in zero-shot and few-shot scenarios but also every other model, achieving a near-perfect accuracy of 0.9796 and an F1-score of 0.9804, with high recall and exceptional precision. Although competitive, GPT-3.5 Turbo and Alpaca Lora 7B show relatively poorer effectiveness in zero-shot and few-shot situations. The work emphasizes the effectiveness and flexibility of LLMs in a variety of linguistic circumstances, providing insight into the complex field of depression detection models.
AGSPNet: A framework for parcel-sc…
Updated:
January 11, 2024
Real-time and accurate information on fine-grained changes in crop cultivation is of great significance for crop growth monitoring, yield prediction, and agricultural structure adjustment. Existing semantic change detection (SCD) algorithms struggle with serious spectral confusion in visible high-resolution unmanned aerial vehicle (UAV) images of different phases, interference from large complex backgrounds, and salt-and-pepper noise. To effectively extract deep image features of crops and meet the demands of practical agricultural engineering applications, this paper designs and proposes an agricultural geographic scene and parcel-scale constrained SCD framework for crops (AGSPNet). The AGSPNet framework contains three parts: an agricultural geographic scene (AGS) division module, a parcel edge extraction module, and a crop SCD module. Meanwhile, we produce and introduce a UAV image SCD dataset (CSCD) dedicated to agricultural monitoring, encompassing multiple semantic variation types of crops in complex geographical scenes. We conduct comparative experiments and accuracy evaluations in two test areas of this dataset, and the results show that the crop SCD results of AGSPNet consistently outperform other deep learning SCD models in terms of quantity and quality, with the evaluation metrics F1-score, kappa, OA, and mIoU obtaining improvements of 0.038, 0.021, 0.011, and 0.062, respectively, on average over the sub-optimal method. The method proposed in this paper can clearly detect the fine-grained change information of crop types in complex scenes, which can provide scientific and technical support for smart agriculture monitoring and management, food policy formulation, and food security assurance.
Bayesian estimation of a 1D hydrod…
Updated:
December 6, 2023
Monitoring water stage and discharge at hydrometric stations is essential for flood characterization and prediction. Continuous measurement is feasible for stage records, whereas discharge must be calculated, typically using a rating curve. Several methods have been developed, such as surface velocity measurement or rating curves. Nevertheless, hydrometric stations may be influenced by tidal fluctuations, leading to transient flow conditions that can disrupt the stage-discharge relation, complicating the accurate estimation of discharge. In the case of quasi-unsteady flow, the dynamics can be captured by a relationship between water level, elevation, and discharge based on the Manning-Strickler formula and measurements of water level and the slope of the waterline. However, in unsteady flow, when tidal effects are pronounced, these types of relationships and their variations have proven less effective. To capture the complex flow dynamics, including flow reversal, an approach via a 1D hydrodynamic model is proposed. Here, model estimation refers to estimating the posterior distribution of the parameters and the structural error model. To set up the model, the cross-sectional geometry, friction coefficient, upstream discharge(s), and downstream water level are necessary. In hydrodynamic modeling, the friction coefficients, often represented through a set of Strickler coefficients, are the main calibration parameters, but manual calibration is difficult due to spatial variations in roughness combined with unsteady flow. Furthermore, understanding and quantifying the uncertainties associated with the data and the model is an important step in the calibration process. Therefore, automatic calibration of the friction coefficients is proposed through Bayesian inference. In terms of numerical tools, the 1D hydrodynamic code used here is Mage, developed by INRAE, which solves the 1D Saint-Venant equations for river flows and transients.
However, the proposed method is not specific to a given simulation code: it can be applied to any usual 1D hydrodynamic code. Bayesian calibration is performed using the software BaM! (Bayesian modeling: https://github.com/BaM-tools), which allows specifying prior information on model parameters (in this case, friction coefficients) to then estimate them with associated uncertainties, using observations themselves with their uncertainties (not only water levels but also gauging campaigns).
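To give a flavor of the Bayesian calibration, here is a minimal sketch assuming a hypothetical rectangular channel under steady flow: the Manning-Strickler relation Q = K·A·R^(2/3)·sqrt(S) stands in for a full Saint-Venant code such as Mage, and a random-walk Metropolis sampler stands in for BaM!; the channel geometry, prior bounds, and noise level are all invented for illustration:

```python
import math, random

random.seed(1)

# Hypothetical steady-flow forward model (Manning-Strickler), standing in
# for a full 1D Saint-Venant code: Q = K * A * R^(2/3) * sqrt(S).
def discharge(K, depth, width=50.0, slope=1e-4):
    A = width * depth                     # cross-sectional area (rectangular)
    R = A / (width + 2 * depth)           # hydraulic radius
    return K * A * R ** (2 / 3) * math.sqrt(slope)

# Synthetic gaugings: true Strickler K = 30, observation noise sd = 5 m^3/s
K_true = 30.0
depths = [1.0, 2.0, 3.0, 4.0]
obs = [discharge(K_true, h) + random.gauss(0, 5) for h in depths]

def log_post(K):
    if not (5.0 < K < 100.0):             # uniform prior on K
        return -math.inf
    return -sum((q - discharge(K, h)) ** 2 for q, h in zip(obs, depths)) / (2 * 5**2)

# Random-walk Metropolis over the friction coefficient
K, samples = 20.0, []
for _ in range(5000):
    Kp = K + random.gauss(0, 2.0)
    if math.log(random.random()) < log_post(Kp) - log_post(K):
        K = Kp
    samples.append(K)

post = samples[1000:]                     # discard burn-in
mean_K = sum(post) / len(post)
print(f"posterior mean K = {mean_K:.1f}")
```

The posterior concentrates near the true coefficient; BaM! performs the same kind of inference with spatially distributed Strickler coefficients, informative priors, and uncertain gaugings.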
Traffic Cameras to detect inland w…
Updated:
January 5, 2024
Inland waterways are critical for freight movement, but limited means exist for monitoring their performance and usage by freight-carrying vessels, e.g., barges. While methods to track vessels, e.g., tug and tow boats, are publicly available through Automatic Identification Systems (AIS), ways to track freight tonnages and commodity flows carried on barges along these critical marine highways are non-existent, especially in real-time settings. This paper develops a method to detect barge traffic on inland waterways using existing traffic cameras with opportune viewing angles. Deep learning models, specifically You Only Look Once (YOLO), Single Shot MultiBox Detector (SSD), and EfficientDet, are employed. The model detects the presence of vessels and/or barges from video and performs a classification (no vessel or barge, vessel without barge, vessel with barge, and barge). A dataset of 331 annotated images was collected from five existing traffic cameras along the Mississippi and Ohio Rivers for model development. YOLOv8 achieves an F1-score of 96%, outperforming the YOLOv5, SSD, and EfficientDet models, which score 86%, 79%, and 77%, respectively. Sensitivity analysis was carried out regarding weather conditions (fog and rain) and location (Mississippi and Ohio Rivers). A background subtraction technique was used to normalize video images across the various locations for the location sensitivity analysis. This model can be used to detect the presence of barges along river segments, which can be used for anonymous bulk commodity tracking and monitoring. Such data is valuable for long-range transportation planning efforts carried out by public transportation agencies, in addition to operational and maintenance planning conducted by federal agencies such as the US Army Corps of Engineers.
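The background subtraction step can be sketched with a simple per-pixel temporal median, shown here on synthetic frames (a real pipeline would operate on actual camera imagery and likely use a more robust background model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a traffic-camera clip: static river background
# plus sensor noise, with a bright "barge" entering in the later frames.
H, W, T = 48, 64, 20
background = rng.uniform(40, 60, size=(H, W))
frames = background + rng.normal(0, 2, size=(T, H, W))
frames[14:, 20:30, 25:45] += 80          # moving object in frames 14..19

# Median-background subtraction: the per-pixel median over time estimates
# the static scene; large deviations flag foreground (vessels/barges).
bg_est = np.median(frames, axis=0)
fg_mask = np.abs(frames - bg_est) > 25   # threshold chosen for this sketch

print("foreground pixels in frame 0:", int(fg_mask[0].sum()))
print("foreground pixels in frame 15:", int(fg_mask[15].sum()))
```

The empty frame yields essentially no foreground pixels, while the frame containing the object lights up only in the object's region, which is what makes imagery from differently exposed camera locations comparable.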
Over 100-fold improvement in the a…
Updated:
September 1, 2024
Measurements of atmosphere-surface exchange are largely limited by the availability of fast-response gas analyzers; this limitation hampers our understanding of the role of terrestrial ecosystems in atmospheric chemistry and global change. Current micrometeorological methods, compatible with slow-response gas analyzers, are difficult to implement, or rely on empirical parameters that introduce large systematic errors. Here, we develop a new micrometeorological method, optimized for slow-response gas analyzers, that directly measures exchange rates of different atmospheric constituents, with minimal requirements. The new method requires only the sampling of air at a constant rate and directing it into one of two reservoirs, depending on the direction of the vertical wind velocity. An integral component of the new technique is an error diffusion algorithm that minimizes the biases in the measured fluxes and achieves direct flux estimates. We demonstrate that the new method provides an unbiased estimate of the flux, with accuracy within 0.1 of the reference eddy covariance flux, and importantly, allows for significant enhancements in the signal-to-noise ratio of measured scalars without compromising accuracy. Our new method provides a simple and reliable way to address complex environmental questions and offers a promising avenue for advancing our understanding of ecological systems and atmospheric chemistry.
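The error-diffusion algorithm is summarized only at a high level above; the following sketch therefore shows just the underlying conditional-sampling idea (relaxed eddy accumulation with the standard empirical coefficient b ≈ 0.56, not the paper's algorithm) on synthetic turbulence data, compared against the eddy-covariance reference:

```python
import math, random

random.seed(2)

# Synthetic turbulence: vertical wind w and a scalar c correlated with w,
# so the true eddy-covariance flux cov(w, c) is positive.
N = 20000
w = [random.gauss(0, 0.5) for _ in range(N)]
c = [400 + 2.0 * wi + random.gauss(0, 1.0) for wi in w]

# Conditional sampling at a constant rate: each sample goes to the "up"
# or "down" reservoir according to the sign of the vertical wind.
up = [ci for wi, ci in zip(w, c) if wi > 0]
down = [ci for wi, ci in zip(w, c) if wi <= 0]

sigma_w = math.sqrt(sum(wi * wi for wi in w) / N)
b = 0.56                                  # empirical REA coefficient
flux_rea = b * sigma_w * (sum(up) / len(up) - sum(down) / len(down))

# Reference eddy-covariance flux for comparison
wbar, cbar = sum(w) / N, sum(c) / N
flux_ec = sum((wi - wbar) * (ci - cbar) for wi, ci in zip(w, c)) / N

print(f"REA flux = {flux_rea:.3f}")
print(f"EC flux  = {flux_ec:.3f}")
```

The reservoir concentration difference recovers the flux to within the accuracy of the empirical coefficient; the method described above replaces that empirical step with an error-diffusion scheme to obtain direct, unbiased estimates.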
Time Travelling Pixels: Bitemporal…
Updated:
December 23, 2023
Change detection, a prominent research area in remote sensing, is pivotal in observing and analyzing surface transformations. Despite significant advancements achieved through deep learning-based methods, executing high-precision change detection in spatio-temporally complex remote sensing scenarios still presents a substantial challenge. The recent emergence of foundation models, with their powerful universality and generalization capabilities, offers potential solutions. However, bridging the gap of data and tasks remains a significant obstacle. In this paper, we introduce Time Travelling Pixels (TTP), a novel approach that integrates the latent knowledge of the SAM foundation model into change detection. This method effectively addresses the domain shift in general knowledge transfer and the challenge of expressing homogeneous and heterogeneous characteristics of multi-temporal images. The state-of-the-art results obtained on the LEVIR-CD dataset underscore the efficacy of TTP. The code is available at \url{https://kychen.me/TTP}.
Empowering health in aging: Innova…
Updated:
December 21, 2023
Addressing the health challenges faced by the aging population, particularly undernutrition, is of paramount importance, given the significant representation of older individuals in society. Undernutrition arises from an imbalance between nutritional intake and energy expenditure, making its diagnosis crucial. Advances in technology have enabled more precise and efficient biomarker measurements, making it easier to detect undernutrition in the elderly. This article introduces an innovative system developed as part of the CART initiative at Toulouse University Hospital in France. This system takes a comprehensive approach to monitoring health and well-being, collecting data that can provide insights, shape health outcomes, and even predict them. A key focus of this system is on identifying nutrition-related behaviors. By integrating quantitative and clinical assessments, which include biannual nutritional evaluations as well as physical and physiological measurements such as mobility and weight, this approach improves the diagnosis and prevention of undernutrition risks. It offers a more holistic perspective aligned with physiological standards. An example is given with the data collection of an elderly person followed at home for 3 months. We believe that this advance could make a significant contribution to the overall improvement of health and well-being, particularly in the elderly population.
Parameterized Decision-making with…
Updated:
December 19, 2023
Autonomous driving is an emerging technology that has advanced rapidly over the last decade. Modern transportation is expected to benefit greatly from a wise decision-making framework for autonomous vehicles, including the improvement of mobility and the minimization of risks and travel time. However, existing methods either ignore the complexity of environments, fitting only straight roads, or ignore the impact on surrounding vehicles during optimization phases, leading to weak environmental adaptability and incomplete optimization objectives. To address these limitations, we propose a parameterized decision-making framework with multi-modal perception based on deep reinforcement learning, called AUTO. We conduct comprehensive perception to capture the state features of various traffic participants around the autonomous vehicle, based on which we design a graph-based model to learn a state representation of the multi-modal semantic features. To distinguish between lane-following and lane-changing, we decompose an action of the autonomous vehicle into a parameterized action structure that first decides whether to change lanes and then computes an exact action to execute. A hybrid reward function takes into account aspects of safety, traffic efficiency, passenger comfort, and impact to guide the framework to generate optimal actions. In addition, we design a regularization term and a multi-worker paradigm to enhance the training. Extensive experiments offer evidence that AUTO can advance the state of the art in terms of both macroscopic and microscopic effectiveness.
Multiphysics-decision tree learnin…
Updated:
December 15, 2023
A novel multiphysics-decision tree learning algorithm is presented for (1) estimating transport properties in the variably saturated subsurface governed by explicitly coupled equations for water, heat, and solute transport; and (2) providing reduced-order simulation of time-dependent pressure head, temperature, and concentration with changing subsurface properties and/or surface boundary conditions. We demonstrate that the proposed algorithm results in about one order of magnitude less error in estimated parameters than traditional multiphysics numerical inversion. We further show that the multiphysics-decision tree learning algorithm reduces the computational burden associated with traditional parameter estimation, with reductions in the number of Jacobian sensitivity calculations by as much as 90% and in the number of iterations required for convergence by up to an order of magnitude. A natural outcome following convergence of the proposed learning algorithm is a reduced-order set of supervised decision tree learning models for predicting the pressure head (Random Forest) and the temperature and concentration (Ensemble Gradient Boosting), given knowledge of time, depth, and the remaining pair of state variables. The supervised reduced-order modeling is extended to unsupervised machine learning for the simultaneous prediction of state variables by training a Self-Organizing Map using the joint multiphysics-decision tree learning property estimates, stochastic boundary conditions, and subsurface state field measurements. The reduced-order machine learning models provide a computationally efficient alternative for studying the effects of changing subsurface water, heat, and solute transport properties and/or surface boundary conditions on coupled subsurface pressure head, temperature, and concentration.
FoMo: Multi-Modal, Multi-Scale and…
Updated:
February 24, 2025
Forests are vital to ecosystems, supporting biodiversity and essential services, but are rapidly changing due to land use and climate change. Understanding and mitigating negative effects requires parsing data on forests at global scale from a broad array of sensory modalities, and using them in diverse forest monitoring applications. Such diversity in data and applications can be effectively addressed through the development of a large, pre-trained foundation model that serves as a versatile base for various downstream tasks. However, remote sensing modalities, which are an excellent fit for several forest management tasks, are particularly challenging considering the variation in environmental conditions, object scales, image acquisition modes, spatio-temporal resolutions, etc. With that in mind, we present the first unified Forest Monitoring Benchmark (FoMo-Bench), carefully constructed to evaluate foundation models with such flexibility. FoMo-Bench consists of 15 diverse datasets encompassing satellite, aerial, and inventory data, covering a variety of geographical regions, and including multispectral, red-green-blue, synthetic aperture radar and LiDAR data with various temporal, spatial and spectral resolutions. FoMo-Bench includes multiple types of forest-monitoring tasks, spanning classification, segmentation, and object detection. To enhance task and geographic diversity in FoMo-Bench, we introduce TalloS, a global dataset combining satellite imagery with ground-based annotations for tree species classification across 1,000+ categories and hierarchical taxonomic levels. Finally, we propose FoMo-Net, a pre-training framework to develop foundation models with the capacity to process any combination of commonly used modalities and spectral bands in remote sensing.
Depth-dependent warming of the Gul…
Updated:
December 13, 2023
The Gulf of Eilat (Gulf of Aqaba) is a semi-enclosed basin situated at the northern end of the Red Sea, renowned for its exceptional marine ecosystem. To evaluate the response of the Gulf to climate variations, we analyzed various factors including temperature down to 700 m, surface air temperature, and heat fluxes. We find that the sea temperature is rising at all depths despite inconclusive trends in local atmospheric variables, including the surface air temperature. The Gulf's sea surface temperature warms at a rate of a few hundredths of a degree Celsius per year, which is comparable to the warming of the global sea surface temperature and the Mediterranean Sea. The warming is linked to a decline in the winter deep-mixing events that occurred more frequently in the past. Based on the analysis of the ocean-atmosphere heat fluxes, we conclude that the lateral advection of heat from the southern part of the Gulf likely leads to an increase in water temperature in the northern part of the Gulf. Our findings suggest that local ocean warming is not necessarily associated with local processes, but rather with the warming of other remote locations.
Statistical model concept to quant…
Updated:
December 13, 2023
Valid mass load predictions of nutrients, in particular nitrogen (N) and phosphorus (P), are needed for the limnological understanding of single lake ecosystems as well as larger river/lake ecosystems. The mass of N and P that enters a lake will determine the ecological state of the lake, and the mass released from the lake will determine the ecological state of downstream ecosystems. Hence, establishing sound quantifications of the external load is crucial and, for example, contributes to the foundation of assessments of necessary management interventions to improve or preserve the ecological integrity of lakes. The external load of N and P is an integral of several pathways, each contributing differently to the total mass load. Around the world, balances of N and P have been derived for decades to support both lake water quality monitoring and research, but it can be difficult, and thus costly, to conduct detailed and sufficiently comprehensive measurement campaigns in all tributaries (surface as well as groundwater) of the watershed, capturing the N and P load including its seasonality and year-to-year change. Thus, load prediction faces a challenge of uncertainty due to unmeasured loads, which can be a consequence of the limited resources available for water flow recordings and concentration measurements in inlets around the lake, or simply of unseen water flow through the lake bottom. The outflow typically occurs through a single river, so the outlet seems easier to gauge than the inlets; however, the outlet may also have unmeasured components in cases where water leaches out through the lake bottom. In this paper, we propose a method that applies incomplete data sets (incomplete in the sense of temporal frequency and percentage of gauged watershed) to generate time series that predict the N and P loads entering and leaving the lake.
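As a minimal numerical illustration of such a load integral (all discharge and concentration values below are invented), the mass load is the time integral of discharge times concentration, with sparse concentration samples interpolated between gaugings:

```python
import math

# Hypothetical month of daily discharge Q [m^3/s] and four N gaugings [mg/L]
days = list(range(30))
Q = [5.0 + 2.0 * math.sin(2 * math.pi * d / 30) for d in days]
sampled = {0: 2.0, 10: 3.0, 20: 2.5, 29: 2.0}

def interp(d):
    """Linear interpolation of concentration between gauging days."""
    ks = sorted(sampled)
    for a, b in zip(ks, ks[1:]):
        if a <= d <= b:
            t = (d - a) / (b - a)
            return (1 - t) * sampled[a] + t * sampled[b]
    return sampled[ks[-1]]

# Daily load in kg/day: Q [m^3/s] * C [mg/L = g/m^3] * 86400 s / 1000 g/kg
load = [Q[d] * interp(d) * 86.4 for d in days]
total = sum(load)
print(f"monthly N load ~ {total:.0f} kg")
```

In practice the unmeasured pathways discussed above mean such an integral is only a partial balance, which is precisely the gap the proposed statistical model concept addresses.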
Spatial Knowledge-Infused Hierarch…
Updated:
December 12, 2023
Deep learning for Earth imagery plays an increasingly important role in geoscience applications such as agriculture, ecology, and natural disaster management. Still, progress is often hindered by the limited training labels. Given Earth imagery with limited training labels, a base deep neural network model, and a spatial knowledge base with label constraints, our problem is to infer the full labels while training the neural network. The problem is challenging due to the sparse and noisy input labels, spatial uncertainty within the label inference process, and high computational costs associated with a large number of sample locations. Existing works on neuro-symbolic models focus on integrating symbolic logic into neural networks (e.g., loss function, model architecture, and training label augmentation), but these methods do not fully address the challenges of spatial data (e.g., spatial uncertainty, the trade-off between spatial granularity and computational costs). To bridge this gap, we propose a novel Spatial Knowledge-Infused Hierarchical Learning (SKI-HL) framework that iteratively infers sample labels within a multi-resolution hierarchy. Our framework consists of a module to selectively infer labels in different resolutions based on spatial uncertainty and a module to train neural network parameters with uncertainty-aware multi-instance learning. Extensive experiments on real-world flood mapping datasets show that the proposed model outperforms several baseline methods. The code is available at \url{https://github.com/ZelinXu2000/SKI-HL}.
Remote Sensing Vision-Language Fou…
Updated:
December 12, 2023
We introduce a method to train vision-language models for remote-sensing images without using any textual annotations. Our key insight is to use co-located internet imagery taken on the ground as an intermediary for connecting remote-sensing images and language. Specifically, we train an image encoder for remote sensing images to align with the image encoder of CLIP using a large amount of paired internet and satellite images. Our unsupervised approach enables the training of a first-of-its-kind large-scale vision language model (VLM) for remote sensing images at two different resolutions. We show that these VLMs enable zero-shot, open-vocabulary image classification, retrieval, segmentation and visual question answering for satellite images. On each of these tasks, our VLM trained without textual annotations outperforms existing VLMs trained with supervision, with gains of up to 20% for classification and 80% for segmentation.
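The alignment objective can be sketched as a standard InfoNCE-style contrastive loss between satellite-encoder features and frozen CLIP features of co-located ground images; this NumPy toy uses random vectors in place of real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embeddings: frozen CLIP features of ground-level photos (targets) and
# satellite-encoder features (here noisy copies, to show the objective only).
B, d = 8, 16
clip_ground = rng.normal(size=(B, d))
sat = clip_ground + 0.1 * rng.normal(size=(B, d))

def normalize(X):
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def info_nce(A, Y, tau=0.07):
    logits = normalize(A) @ normalize(Y).T / tau         # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                  # matched pairs on diagonal

loss_aligned = info_nce(sat, clip_ground)
loss_random = info_nce(rng.normal(size=(B, d)), clip_ground)
print(f"aligned loss: {loss_aligned:.3f}, random loss: {loss_random:.3f}")
```

Minimizing this loss pulls each satellite embedding toward the CLIP embedding of its co-located ground image, which is what lets the satellite encoder inherit CLIP's open-vocabulary language alignment without any textual annotations.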
MaskConver: Revisiting Pure Convol…
Updated:
December 11, 2023
In recent years, transformer-based models have dominated panoptic segmentation, thanks to their strong modeling capabilities and their unified representation of both semantic and instance classes as global binary masks. In this paper, we revisit pure convolutional models and propose a novel panoptic architecture named MaskConver. MaskConver proposes to fully unify the things and stuff representations by predicting their centers. To that end, it creates a lightweight class embedding module that can break ties when multiple centers co-exist at the same location. Furthermore, our study shows that the decoder design is critical in ensuring that the model has sufficient context for accurate detection and segmentation. We introduce a powerful ConvNeXt-UNet decoder that closes the performance gap between convolution- and transformer-based models. With a ResNet50 backbone, our MaskConver achieves 53.6% PQ on the COCO panoptic val set, outperforming the modern convolution-based model Panoptic FCN by 9.3%, as well as transformer-based models such as Mask2Former (+1.7% PQ) and kMaX-DeepLab (+0.6% PQ). Additionally, MaskConver with a MobileNet backbone reaches 37.2% PQ, improving over Panoptic-DeepLab by +6.4% under the same FLOPs/latency constraints. A further optimized version of MaskConver achieves 29.7% PQ while running in real time on mobile devices. The code and model weights will be publicly available.
Better, Not Just More: Data-Centri…
Updated:
March 13, 2025
Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning architectures and models have been proposed, the majority of them have been developed solely on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We argue that a shift from a model-centric view to a complementary data-centric perspective is necessary for further improvements in accuracy, generalization ability, and real impact on end-user applications. Furthermore, considering the entire machine learning cycle, from problem definition to model deployment with feedback, is crucial for enhancing machine learning models that can be reliable in unforeseen situations. This work presents a definition as well as a precise categorization and overview of automated data-centric learning approaches for geospatial data. It highlights the complementary role of data-centric learning with respect to model-centric learning in the larger machine learning deployment cycle. We review papers across the entire geospatial field and categorize them into different groups. A set of representative experiments shows concrete implementation examples. These examples provide concrete steps to act on geospatial data with data-centric machine learning approaches.
A simple stacked ensemble machine …
Updated:
December 4, 2023
New Zealand legislation requires that Regional Councils set limits for water resource usage to manage the effects of abstractions in over-allocated catchments. We propose a simple stacked ensemble machine learning model to predict the probable naturalized hydrology and allocation status across 317 anthropogenically stressed gauged catchments and across 18,612 ungauged river reaches in Otago. The training and testing of ensemble machine learning models provides unbiased results characterized as very good (R2 > 0.8) to extremely good (R2 > 0.9) when predicting naturalized mean annual low flow and mean flow. Statistical 5-fold stacking identifies varying levels of risk for managing water-resource sustainability in over-allocated catchments; for example, at the 5th, 25th, 50th, 75th, and 95th percentiles, the numbers of over-allocated catchments are 73, 57, 44, 23, and 22, respectively. The proposed model can be applied to inform sustainable stream management in other regional catchments across New Zealand and worldwide.
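A stacked ensemble with out-of-fold level-0 predictions can be sketched in a few lines; this NumPy toy uses two least-squares base learners on different predictor subsets and a linear meta-learner, with synthetic data standing in for catchment records (the actual base learners are not specified in the abstract):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic catchment data: two blocks of predictors (e.g. climate vs terrain)
n, p = 200, 6
X = rng.normal(size=(n, p))
beta = np.array([1.5, -1.0, 0.5, 0.8, -0.6, 0.3])
y = X @ beta + rng.normal(0, 0.5, size=n)

def fit_ls(A, b):                        # ordinary least-squares base learner
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Level 0: two base models on different predictor subsets, with 5-fold
# out-of-fold predictions so the meta-learner never sees fitted values.
folds = np.array_split(rng.permutation(n), 5)
oof = np.zeros((n, 2))
subsets = [slice(0, 3), slice(3, 6)]
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    for j, s in enumerate(subsets):
        w = fit_ls(X[train_idx, s], y[train_idx])
        oof[test_idx, j] = X[test_idx, s] @ w

# Level 1: linear meta-learner stacking the base predictions
w_meta = fit_ls(oof, y)
y_hat = oof @ w_meta
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"stacked R^2 = {r2:.3f}")
```

The out-of-fold construction is what keeps the reported skill honest: the meta-learner is trained only on predictions for catchments each base model never saw.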
SICKLE: A Multi-Sensor Satellite I…
Updated:
November 29, 2023
The availability of well-curated datasets has driven the success of Machine Learning (ML) models. Despite greater access to earth observation data in agriculture, there is a scarcity of curated, labelled datasets, which limits their potential for training ML models for remote sensing (RS) in agriculture. To this end, we introduce a first-of-its-kind dataset called SICKLE, which constitutes time-series of multi-resolution imagery from 3 distinct satellites: Landsat-8, Sentinel-1 and Sentinel-2. Our dataset comprises imagery from multi-spectral, thermal and microwave sensors acquired during the January 2018 to March 2021 period. We construct each temporal sequence by considering the cropping practices followed by farmers primarily engaged in paddy cultivation in the Cauvery Delta region of Tamil Nadu, India, and annotate the corresponding imagery with key cropping parameters at multiple resolutions (i.e., 3 m, 10 m and 30 m). Our dataset comprises 2,370 season-wise samples from 388 unique plots, having an average size of 0.38 acres, for classifying 21 crop types across 4 districts in the Delta, which amounts to approximately 209,000 satellite images. Out of the 2,370 samples, 351 paddy samples from 145 plots are annotated with multiple crop parameters, such as the variety of paddy, its growing season, and productivity in terms of per-acre yields. Ours is also among the first studies to consider the growing-season activities pertinent to crop phenology (spanning sowing, transplanting and harvesting dates) as parameters of interest. We benchmark SICKLE on three tasks: crop type prediction, crop phenology prediction (sowing, transplanting, harvesting), and yield prediction.
Improving Self-supervised Molecula…
Updated:
November 29, 2023
Self-supervised learning (SSL) has great potential for molecular representation learning, given the complexity of molecular graphs, the large amounts of unlabelled data available, the considerable cost of obtaining labels experimentally, and the consequently often small training datasets. The importance of the topic is reflected in the variety of paradigms and architectures that have been investigated recently. Yet the differences in performance are often minor and barely understood to date. In this paper, we study SSL based on persistent homology (PH), a mathematical tool for modeling topological features of data that persist across multiple scales. It has several unique features which particularly suit SSL, naturally offering: different views of the data, stability in terms of distance preservation, and the opportunity to flexibly incorporate domain knowledge. We (1) investigate an autoencoder, which shows the general representational power of PH, and (2) propose a contrastive loss that complements existing approaches. We rigorously evaluate our approach for molecular property prediction and demonstrate its particular features in improving the embedding space: after SSL, the representations are better and offer considerably more predictive power than the baselines over different probing tasks; our loss increases baseline performance, sometimes largely; and we often obtain substantial improvements over very small datasets, a common scenario in practice.
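For intuition on persistent homology itself (this illustrates PH, not the paper's SSL architecture), 0-dimensional persistence of a point cloud can be read off the minimum spanning tree: each merge of two connected components kills a feature at the corresponding scale. A minimal sketch on two synthetic clusters:

```python
import math, random

random.seed(0)

# Two well-separated clusters: H0 features are connected components, born at
# scale 0 and dying at the merge scales, i.e. the MST edge lengths
# (computed here with Prim's algorithm).
pts = [(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(10)]
pts += [(random.gauss(3, 0.1), random.gauss(3, 0.1)) for _ in range(10)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

in_tree, deaths = {0}, []
best = {i: dist(pts[0], pts[i]) for i in range(1, len(pts))}
while best:
    j = min(best, key=best.get)
    deaths.append(best.pop(j))          # a component dies at this merge scale
    in_tree.add(j)
    for i in best:
        best[i] = min(best[i], dist(pts[j], pts[i]))

deaths.sort(reverse=True)
print(f"most persistent H0 feature dies at {deaths[0]:.2f}")
```

The one long-lived feature corresponds to the two-cluster structure, while short-lived features are noise; this scale-robustness is what makes PH signatures a stable view of a molecular graph for contrastive SSL.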
Making Self-supervised Learning Ro…
Updated:
November 29, 2023
Self-supervised learning (SSL) has emerged as a powerful technique for learning rich representations from unlabeled data. The data representations are able to capture many underlying attributes of data and can be useful in downstream prediction tasks. In real-world settings, spurious correlations between some attributes (e.g. race, gender and age) and labels for downstream tasks often exist, e.g. cancer is usually more prevalent among elderly patients. In this paper, we investigate SSL in the presence of spurious correlations and show that the SSL training loss can be minimized by capturing only a subset of the conspicuous features relevant to those sensitive attributes, despite the presence of other important predictive features for the downstream tasks. To address this issue, we investigate the learning dynamics of SSL and observe that learning is slower for samples that conflict with such correlations (e.g. elderly patients without cancer). Motivated by these findings, we propose a learning-speed aware SSL (LA-SSL) approach, in which we sample each training example with a probability that is inversely related to its learning speed. We evaluate LA-SSL on three datasets that exhibit spurious correlations between different attributes, demonstrating that it improves the robustness of pretrained representations on downstream classification tasks.
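The learning-speed-aware sampling rule described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the `sampling_weights` helper, the loss-history input, and the speed estimate (average per-epoch loss decrease) are all assumptions.

```python
import numpy as np

def sampling_weights(loss_history, eps=1e-8):
    """Sketch: weight samples inversely to their learning speed.

    loss_history: array of shape (n_samples, n_epochs) of per-sample losses.
    Learning speed is approximated by the average per-epoch loss decrease;
    slow-learning (correlation-conflicting) samples get higher probability.
    """
    speed = np.mean(-np.diff(loss_history, axis=1), axis=1)  # avg loss drop per epoch
    speed = np.clip(speed, 0.0, None)                        # ignore loss increases
    inv = 1.0 / (speed + eps)                                # inverse relation
    return inv / inv.sum()                                   # normalized probabilities

# toy example: sample 0 learns slowly, sample 1 learns fast
hist = np.array([[1.0, 0.95, 0.90],
                 [1.0, 0.50, 0.10]])
p = sampling_weights(hist)   # slow sample 0 gets the larger weight
```

The probabilities can then be passed to any weighted sampler (e.g. a weighted random sampler in the training loop) so that correlation-conflicting samples are seen more often.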
Can SAM recognize crops? Quantifyi…
Updated:
December 4, 2023
Climate change is increasingly disrupting worldwide agriculture, making global food production less reliable. To tackle the growing challenges in feeding the planet, cutting-edge management strategies, such as precision agriculture, empower farmers and decision-makers with rich and actionable information to increase the efficiency and sustainability of their farming practices. Crop-type maps are key information for decision-support tools but are challenging and costly to generate. We investigate the capabilities of Meta AI's Segment Anything Model (SAM) for the crop-type mapping task, acknowledging its recent successes at zero-shot image segmentation. However, SAM is limited to inputs of at most 3 channels, and its zero-shot usage is class-agnostic in nature, which poses unique challenges for using it directly for crop-type mapping. We propose using clustering consensus metrics to assess SAM's zero-shot performance in segmenting satellite imagery and producing crop-type maps. Although direct crop-type mapping is challenging using SAM in the zero-shot setting, experiments reveal SAM's potential for swiftly and accurately outlining fields in satellite images, serving as a foundation for subsequent crop classification. This paper attempts to highlight a use-case of state-of-the-art image segmentation models like SAM for crop-type mapping and related specific needs of the agriculture industry, offering a potential avenue for automatic, efficient, and cost-effective data products for precision agriculture practices.
Satellite-based feature extraction…
Updated:
November 25, 2023
Shellfish production constitutes an important sector for the economy of many Portuguese coastal regions, yet the challenge of shellfish biotoxin contamination poses both public health concerns and significant economic risks. Thus, predicting shellfish contamination levels holds great potential for enhancing production management and safeguarding public health. In our study, we utilize a dataset with years of Sentinel-3 satellite imagery for marine surveillance, along with shellfish biotoxin contamination data from various production areas along Portugal's western coastline, collected by the Portuguese official control programme. Our goal is to evaluate the integration of satellite data in forecasting models for predicting toxin concentrations in shellfish over forecasting horizons of up to four weeks, which implies extracting a small set of useful features and assessing their impact on the predictive models. We framed this challenge as a time-series forecasting problem, leveraging historical contamination levels and satellite images for designated areas. While contamination measurements occurred weekly, satellite images were accessible multiple times per week. Unsupervised feature extraction was performed using autoencoders able to handle non-valid pixels caused by factors like cloud cover, land, or anomalies. Finally, several Artificial Neural Network models were applied to compare univariate (contamination only) and multivariate (contamination and satellite data) time-series forecasting. Our findings show that incorporating these features enhances predictions, especially beyond one week in lagoon production areas (RIAV) and for the 1-week and 2-week horizons in the L5B area (oceanic). The methodology shows the feasibility of integrating information from a high-dimensional data source like remote sensing without compromising the model's predictive ability.
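The handling of non-valid pixels mentioned above can be illustrated with a masked reconstruction loss: errors are accumulated only over valid pixels, so clouds, land, or anomalies do not contaminate the autoencoder objective. This is a hedged sketch of the general idea, with the `masked_mse` name, the array shapes, and the validity-mask convention assumed for illustration.

```python
import numpy as np

def masked_mse(x, x_hat, valid):
    """Reconstruction error computed only over valid pixels.

    x, x_hat : image and its reconstruction (same shape)
    valid    : binary mask, 1 = valid pixel, 0 = cloud/land/anomaly
    """
    valid = valid.astype(float)
    err = valid * (x - x_hat) ** 2                 # zero out invalid pixels
    return float(err.sum() / np.maximum(valid.sum(), 1.0))

# toy check: reconstruction errors on masked-out pixels are ignored
x     = np.array([[1.0, 2.0], [3.0, 4.0]])
x_hat = np.array([[1.0, 2.0], [0.0, 0.0]])         # wrong only where invalid
valid = np.array([[1, 1], [0, 0]])
loss = masked_mse(x, x_hat, valid)
```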
You Only Explain Once
Updated:
November 23, 2023
In this paper, we propose a new black-box explainability algorithm and tool, YO-ReX, for efficient explanation of the outputs of object detectors. The new algorithm computes explanations for all objects detected in the image simultaneously. Hence, compared to the baseline, the new algorithm reduces the number of queries by a factor of 10X for the case of ten detected objects. The speedup increases further with the number of objects. Our experimental results demonstrate that YO-ReX can explain the outputs of YOLO with a negligible overhead over the running time of YOLO. We also demonstrate similar results for explaining SSD and Faster R-CNN. The speedup is achieved by combining aggressive pruning with a causal analysis, which avoids backtracking.
"Energy transfers in surface wave-…
Updated:
November 22, 2023
Ocean surface gravity waves play an important role in air-sea momentum fluxes and upper-ocean mixing, and knowledge of the sea state leads to improved estimates of the ocean energy budget in general circulation models and allows surface wave impacts, such as Langmuir turbulence, to be incorporated. However, including the Stokes drift in phase-averaged equations for the Eulerian mean motion leads to an Eulerian energy budget that is physically difficult to interpret. In this note, we show that a Lagrangian perspective allows for a closed energy budget, in which all terms connecting the different energy compartments correspond to well-known energy transfer terms. We show that the so-called Coriolis-Stokes force does not lead to an energy transfer between surface gravity waves and oceanic mean motions, as previously suggested. In a budget for the Lagrangian mean kinetic energy, the work done by the Coriolis-Stokes force does not contribute, and this budget should be used to estimate the kinetic energy balance in the wave-affected surface mixed layer. The Lagrangian energy budget is used to discuss an energetically consistent framework that can be used to couple a general circulation ocean model to a surface wave model.
Exchanging Dual Encoder-Decoder: A…
Updated:
November 19, 2023
Change detection is a critical task in earth observation applications. Recently, deep learning-based methods have shown promising performance and have been quickly adopted in change detection. However, the widely used multiple encoder and single decoder (MESD) as well as dual encoder-decoder (DED) architectures still struggle to handle change detection effectively. The former suffers from bitemporal feature interference in feature-level fusion, while the latter is inapplicable to intraclass change detection and multiview building change detection. To solve these problems, we propose a new strategy with an exchanging dual encoder-decoder structure for binary change detection with semantic guidance and spatial localization. The proposed strategy solves the problem of bitemporal feature interference in MESD by fusing bitemporal features at the decision level, and the inapplicability of DED by determining changed areas using bitemporal semantic features. We build a binary change detection model based on this strategy, and then validate and compare it with 18 state-of-the-art change detection methods on six datasets in three scenarios, including intraclass change detection datasets (CDD, SYSU), single-view building change detection datasets (WHU, LEVIR-CD, LEVIR-CD+) and a multiview building change detection dataset (NJDS). The experimental results demonstrate that our model achieves superior performance with high efficiency and outperforms all benchmark methods with F1-scores of 97.77%, 83.07%, 94.86%, 92.33%, 91.39% and 74.35% on the CDD, SYSU, WHU, LEVIR-CD, LEVIR-CD+ and NJDS datasets, respectively. The code of this work will be available at https://github.com/NJU-LHRS/official-SGSLN.
Bias A-head? Analyzing Bias in Tra…
Updated:
June 16, 2024
Transformer-based pretrained large language models (PLM) such as BERT and GPT have achieved remarkable success in NLP tasks. However, PLMs are prone to encoding stereotypical biases. Although a burgeoning literature has emerged on stereotypical bias mitigation in PLMs, such as work on debiasing gender and racial stereotyping, how such biases manifest and behave internally within PLMs remains largely unknown. Understanding the internal stereotyping mechanisms may allow better assessment of model fairness and guide the development of effective mitigation strategies. In this work, we focus on attention heads, a major component of the Transformer architecture, and propose a bias analysis framework to explore and identify a small set of biased heads that are found to contribute to a PLM's stereotypical bias. We conduct extensive experiments to validate the existence of these biased heads and to better understand how they behave. We investigate gender and racial bias in the English language in two types of Transformer-based PLMs: the encoder-based BERT model and the decoder-based autoregressive GPT model. Overall, the results shed light on understanding the bias behavior in pretrained language models.
S$^3$AD: Semi-supervised Small App…
Updated:
March 14, 2025
Crop detection is integral for precision agriculture applications such as automated yield estimation or fruit picking. However, crop detection, e.g., apple detection in orchard environments remains challenging due to a lack of large-scale datasets and the small relative size of the crops in the image. In this work, we address these challenges by reformulating the apple detection task in a semi-supervised manner. To this end, we provide the large, high-resolution dataset MAD comprising 105 labeled images with 14,667 annotated apple instances and 4,440 unlabeled images. Utilizing this dataset, we also propose a novel Semi-Supervised Small Apple Detection system S$^3$AD based on contextual attention and selective tiling to improve the challenging detection of small apples, while limiting the computational overhead. We conduct an extensive evaluation on MAD and the MSU dataset, showing that S$^3$AD substantially outperforms strong fully-supervised baselines, including several small object detection systems, by up to $14.9\%$. Additionally, we exploit the detailed annotations of our dataset w.r.t. apple properties to analyze the influence of relative size or level of occlusion on the results of various systems, quantifying current challenges.
Emergent Communication for Rules R…
Updated:
November 8, 2023
Research on emergent communication between deep-learning-based agents has received extensive attention due to its inspiration for linguistics and artificial intelligence. However, previous attempts have centered on emergent communication in perception-oriented environmental settings, which force agents to describe low-level perceptual features within image or symbol contexts. In this work, inspired by the classic human reasoning test (namely Raven's Progressive Matrices), we propose the Reasoning Game, a cognition-oriented environment that encourages agents to reason about and communicate high-level rules, rather than perceived low-level contexts. Moreover, we propose 1) an unbiased dataset (namely rule-RAVEN) as a benchmark to avoid overfitting, and 2) a two-stage curriculum agent training method as a baseline for more stable convergence in the Reasoning Game, where contexts and semantics are bilaterally drifting. Experimental results show that, in the Reasoning Game, a semantically stable and compositional language emerges to solve reasoning problems. The emergent language helps agents apply the extracted rules to generalize to unseen context attributes, and to transfer between different context attributes or even tasks.
DeFault: Deep-learning-based Fault…
Updated:
November 5, 2024
The carbon capture, utilization, and storage (CCUS) framework is an essential component in reducing greenhouse gas emissions, with its success hinging on comprehensive knowledge of subsurface geology and geomechanics. Passive seismic event relocation and fault detection serve as indispensable tools, offering vital insights into subsurface structures and fluid migration pathways. Accurate identification and localization of seismic events, however, face significant challenges, including the necessity for high-quality seismic data and advanced computational methods. To address these challenges, we introduce a novel deep learning method, DeFault, specifically designed for passive seismic source relocation and fault delineation in passive seismic monitoring projects. By leveraging data domain adaptation, DeFault allows us to train a neural network with labeled synthetic data and apply it directly to field data. Using DeFault, the passive seismic sources are automatically clustered based on their recording time and spatial locations, and subsequently, faults and fractures are delineated accordingly. We demonstrate the efficacy of DeFault on a field case study involving CO2-injection-related microseismic data from the Decatur, Illinois area. Our approach accurately and efficiently relocated passive seismic events, identified faults and aided in the prevention of potential geological hazards. Our results highlight the potential of DeFault as a valuable tool for passive seismic monitoring, emphasizing its role in ensuring CCUS project safety. This research bolsters the understanding of subsurface characterization in CCUS, illustrating machine learning's capacity to refine these methods. Ultimately, our work bears significant implications for CCUS technology deployment, an essential strategy in combating climate change.
InSAR-Informed In-Situ Monitoring …
Updated:
February 12, 2024
This work focuses on assessing the fidelity of Interferometric Synthetic Aperture Radar (InSAR) as it relates to subsurface ground motion monitoring, as well as understanding uncertainty in modeling active landslide scarp displacement, for the case study of the in situ monitored El Forn deep-seated landslide in Canillo, Andorra. We used the available Sentinel-1 data on the Alaska Satellite Facility (ASF) Vertex platform to create deformation velocity maps and time series of the El Forn landslide scarp. We compared the performance of InSAR data from the recently launched European Ground Motion Service (EGMS) platform and the ASF Vertex platform in a time-series comparison of displacement in the direction of landslide motion against in situ borehole-based measurements from 2019 to 2021, suggesting that ground motion detected through InSAR can be used in tandem with field monitoring to provide optimal information with minimal in situ deployment. While identification of active landslide scarps may be possible via the EGMS platform, this work is intended to assess InSAR as a monitoring tool. On that basis, geospatial interpolation with statistical analysis was conducted to better understand the number of in situ observations needed to lower the error of a remote-sensing recreation of ground motion over the entirety of a landslide scarp, suggesting that 20 to 25 total observations provide the optimal normalized root mean squared error for an ordinary kriging model of the El Forn landslide scarp.
Feature Guided Masked Autoencoder …
Updated:
October 28, 2023
Self-supervised learning guided by masked image modelling, such as Masked AutoEncoder (MAE), has attracted wide attention for pretraining vision transformers in remote sensing. However, MAE tends to excessively focus on pixel details, thereby limiting the model's capacity for semantic understanding, in particular for noisy SAR images. In this paper, we explore spectral and spatial remote sensing image features as improved MAE-reconstruction targets. We first conduct a study on reconstructing various image features, all performing comparably well or better than raw pixels. Based on such observations, we propose Feature Guided Masked Autoencoder (FG-MAE): reconstructing a combination of Histograms of Oriented Gradients (HOG) and Normalized Difference Indices (NDI) for multispectral images, and reconstructing HOG for SAR images. Experimental results on three downstream tasks illustrate the effectiveness of FG-MAE with a particular boost for SAR imagery. Furthermore, we demonstrate the well-inherited scalability of FG-MAE and release a first series of pretrained vision transformers for medium resolution SAR and multispectral images.
Spatial correlation increase in si…
Updated:
October 27, 2023
The Amazon rainforest (ARF) is threatened by deforestation and climate change, which could trigger a regime shift to a savanna-like state. Previous work suggesting declining resilience in recent decades was based only on local resilience indicators. Moreover, previous results are potentially biased by the employed multi-sensor and optical satellite data and undetected anthropogenic land-use change. Here, we show that the spatial correlation provides a more robust resilience indicator than local estimators and employ it to measure resilience changes in the ARF, based on single-sensor Vegetation Optical Depth data under conservative exclusion of human activity. Our results show an overall loss of resilience until around 2019, which is especially pronounced in the southwestern and northern Amazon for the time period from 2002 to 2011. The demonstrated reliability of spatial correlation in coupled systems suggests that in particular the southwest of the ARF has experienced pronounced resilience loss over the last two decades.
Measuring tropical rainforest resi…
Updated:
October 24, 2023
The Amazon rainforest is considered one of the Earth's tipping elements and may lose stability under ongoing climate change. Recently, a decrease in tropical rainforest resilience has been identified globally from remotely sensed vegetation data. However, the underlying theory assumes a Gaussian distribution of forest disturbances, which is different from most observed forest stressors such as fires, deforestation, or windthrow. Those stressors often occur in power-law-like distributions and can be approximated by $\alpha$-stable L\'evy noise. Here, we show that classical critical slowing down indicators to measure changes in forest resilience are robust under such power-law disturbances. To assess the robustness of critical slowing down indicators, we simulate pulse-like perturbations in an adapted and conceptual model of a tropical rainforest. We find that few missed early warnings and few false alarms are achievable simultaneously if the following steps are carried out carefully: First, the model must be known to resolve the timescales of the perturbation. Second, perturbations need to be filtered according to their absolute temporal autocorrelation. Third, critical slowing down has to be assessed using the non-parametric Kendall-$\tau$ slope. These prerequisites allow for an increase in the sensitivity of early warning signals. Hence, our findings imply improved reliability of the interpretation of empirically estimated rainforest resilience through critical slowing down indicators.
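A generic critical-slowing-down indicator assessed with a Kendall-$\tau$ slope, as mentioned above, can be sketched as follows. This is a hedged illustration of the standard procedure (rolling lag-1 autocorrelation, then a monotonic-trend test), not the paper's pipeline; the function name, window size, and toy time series are assumptions.

```python
import numpy as np
from scipy.stats import kendalltau

def csd_trend(series, window):
    """Rolling lag-1 autocorrelation of `series`, followed by the
    Kendall-tau statistic of its trend over time. A significantly
    positive tau is the classic early-warning signal."""
    ac1 = []
    for i in range(len(series) - window + 1):
        w = series[i:i + window]
        ac1.append(np.corrcoef(w[:-1], w[1:])[0, 1])  # lag-1 autocorrelation
    tau, _ = kendalltau(np.arange(len(ac1)), ac1)     # non-parametric trend
    return tau

# toy AR(1) process whose memory (phi) slowly increases: slowing down
rng = np.random.default_rng(0)
x, out = 0.0, []
for t in range(400):
    phi = 0.1 + 0.8 * t / 400
    x = phi * x + rng.normal()
    out.append(x)
tau = csd_trend(np.array(out), window=100)   # expected to be positive
```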
Super-resolved rainfall prediction…
Updated:
October 24, 2023
Rainfall prediction at the kilometre scale up to a few hours ahead is key for planning and safety, but it is challenging given the complex influence of climate change on cloud processes and the limited skill of weather models at this scale. Following the set-up proposed by the \emph{weather4cast} challenge of NeurIPS, we build a two-step deep-learning solution for predicting rainfall occurrence at the high spatial resolution of ground radar, starting from coarser-resolution weather satellite images. Our approach predicts future satellite images with a physics-aware ConvLSTM network, which are then converted into precipitation maps through a U-Net. We find that our two-step pipeline outperforms the baseline model, and we quantify the benefits of including physical information. We find that local-scale rainfall predictions with good accuracy can be obtained from satellite radiances for up to 4 hours into the future.
Estimation of forest height and bi…
Updated:
November 9, 2023
Mapping forest resources and carbon is important for improving forest management and meeting the objectives of storing carbon and preserving the environment. Spaceborne remote sensing approaches have considerable potential to support forest height monitoring by providing repeated observations at high spatial resolution over large areas. This study uses a machine learning approach that was previously developed to produce local maps of forest parameters (basal area, height, diameter, etc.). The aim of this paper is to present the extension of the approach to much larger scales, such as the French national coverage. We used the GEDI Lidar mission as reference height data, and the satellite images from Sentinel-1, Sentinel-2 and ALOS-2 PALSAR-2 to estimate forest height and produce a map of France for the year 2020. The height map is then derived into volume and aboveground biomass (AGB) using allometric equations. The validation of the height map with local maps from ALS data shows an accuracy close to the state of the art, with a mean absolute error (MAE) of 4.3 m. Validation on inventory plots representative of French forests shows an MAE of 3.7 m for the height. Estimates are slightly better for coniferous than for broadleaved forests. Volume and AGB maps derived from height show MAEs of 93 m${}^3$/ha and 75 tons/ha, respectively. The results aggregated by sylvo-ecoregion and forest type (owner and species) are further improved, with MAEs of 30 m${}^3$/ha and 23 tons/ha. The precision of these maps allows forests to be monitored locally, as well as helping to analyze forest resources and carbon on a territorial scale or for specific types of forests by combining the maps with geolocated information (administrative area, species, type of owner, protected areas, environmental conditions, etc.). The height, volume and AGB maps produced in this study are made freely available.
Multimodal Transformer Using Cross…
Updated:
June 17, 2024
Object detection in Remote Sensing Images (RSI) is a critical task for numerous applications in Earth Observation (EO). Differing from object detection in natural images, object detection in remote sensing images faces the challenges of scarce annotated data and the presence of small objects represented by only a few pixels. Multi-modal fusion has been shown to enhance accuracy by fusing data from multiple modalities such as RGB, infrared (IR), lidar, and synthetic aperture radar (SAR). To this end, fusing representations at the mid or late stage, produced by parallel subnetworks, is the dominant approach, with the disadvantages of computational complexity that grows with the number of modalities and additional engineering obstacles. Using the cross-attention mechanism, we propose a novel multi-modal fusion strategy that maps relationships between different channels at the early stage, enabling the construction of a coherent input by aligning the different modalities. By addressing fusion at the early stage, as opposed to mid- or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques. Additionally, we enhance the Swin Transformer by integrating convolution layers into the feed-forward networks of its non-shifting blocks. This augmentation strengthens the model's capacity to merge separated windows through local attention, thereby improving small object detection. Extensive experiments prove the effectiveness of the proposed multimodal fusion module and the architecture, demonstrating their applicability to object detection in multimodal aerial imagery.
High-Resolution Building and Road …
Updated:
September 18, 2024
Mapping buildings and roads automatically with remote sensing typically requires high-resolution imagery, which is expensive to obtain and often sparsely available. In this work we demonstrate how multiple 10 m resolution Sentinel-2 images can be used to generate 50 cm resolution building and road segmentation masks. This is done by training a `student' model with access to Sentinel-2 images to reproduce the predictions of a `teacher' model which has access to corresponding high-resolution imagery. While the predictions do not have all the fine detail of the teacher model, we find that we are able to retain much of the performance: for building segmentation we achieve 79.0\% mIoU, compared to the high-resolution teacher model accuracy of 85.5\% mIoU. We also describe two related methods that work on Sentinel-2 imagery: one for counting individual buildings which achieves $R^2 = 0.91$ against true counts and one for predicting building height with 1.5 meter mean absolute error. This work opens up new possibilities for using freely available Sentinel-2 imagery for a range of tasks that previously could only be done with high-resolution satellite imagery.
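The teacher-student setup described above can be sketched as a per-pixel distillation loss: the student, given only Sentinel-2 input, is trained to reproduce the teacher's segmentation probabilities. This is a minimal numpy illustration of the general idea, not the authors' training code; the `distillation_loss` name, shapes, and toy values are assumptions.

```python
import numpy as np

def distillation_loss(student_logits, teacher_probs):
    """Per-pixel cross-entropy between the teacher's soft labels and the
    student's softmax predictions, averaged over all pixels."""
    # numerically stable softmax over the class axis
    z = student_logits - student_logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return float(-np.mean(np.sum(teacher_probs * np.log(p + 1e-12), axis=-1)))

# toy check: student logits matching the teacher give a lower loss
teacher = np.array([[[0.9, 0.1], [0.2, 0.8]]])   # (H, W, classes) teacher probs
good = np.log(teacher + 1e-12)                   # softmax(good) == teacher
bad = -good                                      # roughly inverted predictions
```

In an actual pipeline the teacher's probabilities would come from running the high-resolution model on imagery co-registered with the student's Sentinel-2 input.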
To what extent airborne particulat…
Updated:
October 18, 2023
Intensive farming is known to significantly impact air quality, particularly fine particulate matter (PM$_{2.5}$). Understanding their relation in detail is important both for scientific reasons and for policy making. Ammonia emissions convey the impact of farming but are not directly observed; they are computed through emission inventories based on administrative data and provided on a regular spatial grid at daily resolution. In this paper, we aim to validate \textit{lato sensu} the approach mentioned above by considering ammonia concentrations instead of emissions in the Lombardy Region, Italy. While the former are available at only a few monitoring stations around the region, they are direct observations. Hence, we build a model explaining PM$_{2.5}$ based on its precursors, ammonia (NH3) and nitrogen oxides (NOX), and meteorological variables. To do this, we use a seasonal interaction regression model allowing for temporal autocorrelation, correlation between stations, and heteroskedasticity. It is found that the sensitivity of PM$_{2.5}$ to NH3 and NOX depends on season, area, and NOX level. It is recommended that an emission reduction policy focus on the entire manure cycle and not only on spreading practices.
Equirectangular image construction…
Updated:
October 13, 2023
360{\deg} spherical images have the advantage of a wide field of view and are typically projected onto a plane for processing, yielding what is known as an equirectangular image. Object shapes in equirectangular images can be distorted and lack translation invariance. In addition, there are few publicly available labeled datasets of equirectangular images, which presents a challenge for standard CNN models to process equirectangular images effectively. To tackle this problem, we propose a methodology for converting a perspective image into an equirectangular image. The inverse transformations of the spherical center projection and the equidistant cylindrical projection are employed. This enables standard CNNs to learn the distortion features at different positions in the equirectangular image and thereby gain the ability to semantically segment the equirectangular image. The parameter {\phi}, which determines the projection position of the perspective image, has been analyzed using various datasets and models, such as UNet, UNet++, SegNet, PSPNet, and DeepLab v3+. The experiments demonstrate that the optimal value of {\phi} for effective semantic segmentation of equirectangular images by standard CNNs is 6{\pi}/16. Compared with the other three types of methods (supervised learning, unsupervised learning and data augmentation), the method proposed in this paper achieves the best average IoU value of 43.76%. This value is 23.85%, 10.7% and 17.23% higher than those of the other three methods, respectively.
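The equidistant cylindrical geometry underlying the conversion can be sketched as follows: each equirectangular pixel maps linearly to a longitude/latitude pair, which defines a unit view direction on the sphere. This is an illustrative reconstruction of the standard projection, not the paper's implementation; the function name and image size are assumptions.

```python
import numpy as np

def equirect_dirs(h, w):
    """Unit view direction for each pixel of an h-by-w equirectangular image.

    Under the equidistant cylindrical projection, the x axis maps linearly
    to longitude in [-pi, pi) and the y axis to latitude in [-pi/2, pi/2].
    """
    lon = (np.arange(w) + 0.5) / w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(h) + 0.5) / h * np.pi
    lat, lon = np.meshgrid(lat, lon, indexing="ij")
    # spherical-to-Cartesian: y is up, z points to lon = 0
    return np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)

d = equirect_dirs(64, 128)   # (64, 128, 3) array of unit vectors
```

Projecting a perspective image then amounts to intersecting these directions with the perspective camera's image plane (the inverse spherical center projection) and sampling the source pixels.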
Real-Time Event Detection with Ran…
Updated:
October 12, 2023
The petroleum industry is crucial for modern society, but the production process is complex and risky. During production, accidents or failures resulting from undesired production events can cause severe environmental and economic damage. Previous studies have investigated machine learning (ML) methods for undesired event detection. However, the prediction of event probability in real time was insufficiently addressed, even though it is essential for undertaking early intervention when an event is expected to happen. This paper proposes two ML approaches, random forests and temporal convolutional networks, to detect undesired events in real time. Results show that our approaches can effectively classify event types and predict the probability of their appearance, addressing the challenges uncovered in previous studies and providing a more effective solution for failure event management during production.
Hierarchical Mask2Former: Panoptic…
Updated:
October 10, 2023
Advancements in machine vision that enable detailed inferences to be made from images have the potential to transform many sectors including agriculture. Precision agriculture, where data analysis enables interventions to be precisely targeted, has many possible applications. Precision spraying, for example, can limit the application of herbicide only to weeds, or limit the application of fertiliser only to undernourished crops, instead of spraying the entire field. The approach promises to maximise yields, whilst minimising resource use and harms to the surrounding environment. To this end, we propose a hierarchical panoptic segmentation method to simultaneously identify indicators of plant growth and locate weeds within an image. We adapt Mask2Former, a state-of-the-art architecture for panoptic segmentation, to predict crop, weed and leaf masks. We achieve a PQ{\dag} of 75.99. Additionally, we explore approaches to make the architecture more compact and therefore more suitable for time and compute constrained applications. With our more compact architecture, inference is up to 60% faster and the reduction in PQ{\dag} is less than 1%.
The BELSAR dataset: Mono- and bist…
Updated:
September 12, 2023
The BELSAR dataset is a unique collection of high-resolution airborne mono- and bistatic fully-polarimetric synthetic aperture radar (SAR) data in L-band, alongside concurrent measurements of vegetation and soil bio-geophysical variables measured in maize and winter wheat fields during the summer of 2018 in Belgium. This innovative dataset, the collection of which was funded by the European Space Agency (ESA), helps address the lack of publicly accessible experimental datasets combining multistatic SAR and in situ measurements. As such, it offers an opportunity to advance the development of SAR remote sensing science and applications for agricultural monitoring and hydrology. This paper aims to facilitate its adoption and exploration by offering comprehensive documentation and integrating its multiple data sources into a unified, analysis-ready dataset.
Exploring DINO: Emergent Propertie…
Updated:
December 2, 2023
Self-supervised learning (SSL) models have recently demonstrated remarkable performance across various tasks, including image segmentation. This study delves into the emergent characteristics of the Self-Distillation with No Labels (DINO) algorithm and its application to Synthetic Aperture Radar (SAR) imagery. We pre-train a vision transformer (ViT)-based DINO model using unlabeled SAR data, and later fine-tune the model to predict high-resolution land cover maps. We rigorously evaluate the utility of attention maps generated by the ViT backbone and compare them with the model's token embedding space. We observe a small improvement in model performance with pre-training compared to training from scratch and discuss the limitations and opportunities of SSL for remote sensing and land cover segmentation. Beyond small performance increases, we show that ViT attention maps hold great intrinsic value for remote sensing, and could provide useful inputs to other algorithms. With this, our work lays the groundwork for bigger and better SSL models for Earth Observation.
Climate Change and Future Food Sec…
Updated:
November 27, 2023
Agriculture is crucial to sustaining human life and civilization, yet it relies heavily on natural resources. The industry faces new challenges, such as climate change, a growing global population, and new models for managing food security and water resources. Through a machine learning framework, we estimate the future productivity of croplands based on CMIP5 climate projections under a moderate carbon emission scenario. We demonstrate that Vietnam and Thailand are at risk, with 10\% and 14\% drops in rice production, respectively, whereas the Philippines is expected to increase its output by 11\% by 2026 compared with 2018. We urge proactive international collaboration between regions facing cropland gain and degradation to mitigate the impacts of climate change and population growth, reducing our society's vulnerability. Our study provides critical information on the effects of climate change and human activities on land productivity and use that may assist such collaboration.
Wave Measurements using Open Sourc…
Updated:
October 4, 2023
This study reviews the design and signal processing of ship-borne ultrasonic altimeter wave measurements. The system combines a downward-facing ultrasonic altimeter, which captures the sea surface elevation as a time series, with an inertial measurement unit that compensates for the ship's motion. The methodology is cost-effective, open source, and adaptable to various ships and platforms. The system was installed on the barque Statsraad Lehmkuhl and recorded data continuously during the 20-month One Ocean Expedition. Results from a one-month crossing of the Tropical Atlantic are presented here. The one-dimensional wave spectrum and associated wave parameters are obtained from the sea surface elevation time series. The observed significant wave height agrees well with satellite altimetry and a spectral wave model. The agreement between observations and the spectral wave model is better for the mean wave period than for the peak period. We perform Doppler shift corrections to improve wave period estimates by accounting for the speed of the ship relative to the waves. This correction enhances the accuracy of the mean period, but not the peak period. We suggest that the Doppler correction could be improved by complementing the data sources with directional wave measurements from a marine X-band radar.
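The Doppler correction mentioned above rests on the standard deep-water encounter-frequency relation: a platform moving at speed $U$ at angle $\theta$ to the wave propagation direction observes $\omega_e = \omega - kU\cos\theta$ with $k = \omega^2/g$. A minimal sketch of recovering the intrinsic frequency from the measured encounter frequency; the sign convention and quadratic root choice below are the textbook form, not necessarily the paper's exact implementation:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def intrinsic_frequency(omega_e, ship_speed, rel_angle):
    """Recover the intrinsic wave frequency omega (rad/s) from the
    encounter frequency omega_e, assuming deep-water dispersion
    k = omega**2 / G and omega_e = omega - k * U * cos(rel_angle)."""
    a = ship_speed * math.cos(rel_angle) / G
    if abs(a) < 1e-12:
        return omega_e  # stationary platform: no Doppler shift
    disc = 1.0 - 4.0 * a * omega_e
    # choose the root that reduces to omega_e as the ship speed -> 0
    return (1.0 - math.sqrt(disc)) / (2.0 * a)
```

For a ship sailing directly into the waves (`rel_angle = 0`), the recovered intrinsic frequency is lower than the encounter frequency, lengthening the estimated wave period as expected.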
Detach-ROCKET: Sequential feature …
Updated:
June 24, 2024
Time Series Classification (TSC) is essential in fields like medicine, environmental science, and finance, enabling tasks such as disease diagnosis, anomaly detection, and stock price analysis. While machine learning models like Recurrent Neural Networks and InceptionTime are successful in numerous applications, they can face scalability issues due to computational requirements. Recently, ROCKET has emerged as an efficient alternative, achieving state-of-the-art performance and simplifying training by utilizing a large number of randomly generated features from the time series data. However, many of these features are redundant or non-informative, increasing computational load and compromising generalization. Here we introduce Sequential Feature Detachment (SFD) to identify and prune non-essential features in ROCKET-based models, such as ROCKET, MiniRocket, and MultiRocket. SFD estimates feature importance using model coefficients and can handle large feature sets without complex hyperparameter tuning. Testing on the UCR archive shows that SFD can produce models with better test accuracy using only 10\% of the original features. We named these pruned models Detach-ROCKET. We also present an end-to-end procedure for determining an optimal balance between the number of features and model accuracy. On the largest binary UCR dataset, Detach-ROCKET improves test accuracy by 0.6\% while reducing features by 98.9\%. By enabling a significant reduction in model size without sacrificing accuracy, our methodology improves computational efficiency and contributes to model interpretability. We believe that Detach-ROCKET will be a valuable tool for researchers and practitioners working with time series data, who can find a user-friendly implementation of the model at \url{https://github.com/gon-uri/detach_rocket}.
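The pruning idea described above can be sketched in a few lines: fit a linear classifier on the current feature set, score each feature by the magnitude of its coefficient, and detach the lowest-scoring fraction, repeating until the target size is reached. This is a simplified illustration with a closed-form ridge fit and an illustrative step size, not the authors' implementation (see their repository for that):

```python
# Simplified sketch of Sequential Feature Detachment (SFD):
# iteratively drop the features with the smallest ridge coefficients.
import numpy as np

def ridge_coef(X, y, lam=1.0):
    """Closed-form ridge regression coefficients (labels coded +/-1)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def sequential_feature_detachment(X, y, keep_fraction=0.10, step=0.2):
    """Return indices of retained features after pruning down to
    keep_fraction of the original feature set."""
    active = np.arange(X.shape[1])
    target = max(1, int(keep_fraction * X.shape[1]))
    while len(active) > target:
        w = ridge_coef(X[:, active], y)
        n_drop = min(max(1, int(step * len(active))), len(active) - target)
        order = np.argsort(np.abs(w))      # least important first
        active = active[np.sort(order[n_drop:])]
    return active
```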
Coherent Spectral Feature Extracti…
Updated:
January 15, 2025
Hyperspectral data acquired through remote sensing are invaluable for environmental and resource studies. While rich in spectral information, various complexities such as environmental conditions, material properties, and sensor characteristics can cause significant variability even among pixels belonging to the same material class. This variability hampers accurate land-cover classification and analysis. Focusing on the spectral domain, we utilize an autoencoder architecture called the symmetric autoencoder (SymAE), which leverages permutation-invariant representation and stochastic regularization in tandem to disentangle class-invariant 'coherent' features from variability-causing 'nuisance' features on a pixel-by-pixel basis. This disentanglement is achieved through a purely data-driven process, without the need for hand-crafted modeling, noise distribution priors, or reference 'clean signals'. Additionally, SymAE can generate virtual spectra through manipulations in latent space. Using AVIRIS instrument data, we demonstrate these virtual spectra, offering insights into the disentanglement. Extensive experiments across six benchmark hyperspectral datasets show that coherent features extracted by SymAE can be used to achieve state-of-the-art pixel-based classification. Furthermore, we leverage these coherent features to enhance the performance of some leading spectral-spatial HSI classification methods. Our approach shows particular improvement in scenarios where training and test sets are disjoint, a common challenge in real-world applications where existing methods often struggle to maintain high performance.
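The permutation-invariant representation mentioned above can be illustrated with symmetric pooling: encoding each pixel independently and then averaging yields a group-level code that is unchanged under any reordering of the pixels, so it can only carry information shared by the whole group. A toy numpy sketch, with illustrative encoder weights and sizes (not the SymAE architecture itself):

```python
# Toy illustration of a permutation-invariant "coherent" encoding:
# symmetric (mean) pooling makes the code order-independent.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))          # toy per-pixel feature map

def coherent_code(spectra):
    """spectra: (n_pixels, n_bands) array from one material class."""
    features = np.tanh(spectra @ W)   # encode each pixel independently
    return features.mean(axis=0)      # symmetric pooling over pixels

group = rng.normal(size=(5, 16))      # five pixels, 16 bands each
shuffled = group[rng.permutation(5)]
assert np.allclose(coherent_code(group), coherent_code(shuffled))
```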
Counterfactual Conservative Q Lear…
Updated:
September 22, 2023
Offline multi-agent reinforcement learning is challenging due to the coupling of the distribution-shift issue common in the offline setting with the high-dimensionality issue common in the multi-agent setting, which makes the out-of-distribution (OOD) action and value-overestimation phenomena excessively severe. To mitigate this problem, we propose a novel multi-agent offline RL algorithm, named CounterFactual Conservative Q-Learning (CFCQL), to conduct conservative value estimation. Rather than regarding all the agents as a single high-dimensional one and directly applying single-agent methods, CFCQL calculates a conservative regularization for each agent separately in a counterfactual way and then linearly combines them to realize an overall conservative value estimation. We prove that it still enjoys the underestimation property and the performance guarantee of single-agent conservative methods, but the induced regularization and safe policy improvement bound are independent of the agent number, making it theoretically superior to the direct treatment referred to above, especially when the agent number is large. We further conduct experiments on four environments, covering both discrete and continuous action settings, on both existing and our self-constructed datasets, demonstrating that CFCQL outperforms existing methods on most datasets, and by a remarkable margin on some of them.
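The counterfactual construction can be illustrated in the tabular case: instead of penalising a soft maximum over the exponentially large joint action space, each agent's penalty varies only its own action while the other agents keep their dataset actions, and the per-agent penalties are linearly combined. The following toy sketch is illustrative of that structure only, not the CFCQL objective as derived in the paper:

```python
# Toy tabular sketch of a per-agent counterfactual conservative penalty.
import numpy as np

def counterfactual_penalty(Q, data_actions, weights):
    """Q: joint Q-table with one axis per agent.
    data_actions: the joint action observed in the dataset.
    weights: combination weights, one per agent."""
    q_data = Q[tuple(data_actions)]
    total = 0.0
    for i, w in enumerate(weights):
        # counterfactual slice: agent i varies, others fixed to the data
        idx = list(data_actions)
        idx[i] = slice(None)
        q_slice = Q[tuple(idx)]
        # conservative term: soft maximum over agent i's own actions
        lse = np.log(np.exp(q_slice).sum())
        total += w * (lse - q_data)
    return total
```

Each slice has size equal to a single agent's action space, so the cost of the penalty grows linearly, not exponentially, with the number of agents.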
MMST-ViT: Climate Change-aware Cro…
Updated:
September 19, 2023
Precise crop yield prediction provides valuable information for agricultural planning and decision-making processes. However, timely prediction of crop yields remains challenging, as crop growth is sensitive to growing-season weather variation and climate change. In this work, we develop a deep learning-based solution, namely the Multi-Modal Spatial-Temporal Vision Transformer (MMST-ViT), for predicting crop yields at the county level across the United States, by considering the effects of short-term meteorological variations during the growing season and of long-term climate change on crops. Specifically, our MMST-ViT consists of a Multi-Modal Transformer, a Spatial Transformer, and a Temporal Transformer. The Multi-Modal Transformer leverages both visual remote sensing data and short-term meteorological data for modeling the effect of growing-season weather variations on crop growth. The Spatial Transformer learns the high-resolution spatial dependency among counties for accurate agricultural tracking. The Temporal Transformer captures the long-range temporal dependency for learning the impact of long-term climate change on crops. Meanwhile, we also devise a novel multi-modal contrastive learning technique to pre-train our model without extensive human supervision. Hence, our MMST-ViT captures the impacts of both short-term weather variations and long-term climate change on crops by leveraging both satellite images and meteorological data. We have conducted extensive experiments on over 200 counties in the United States, with the experimental results showing that our MMST-ViT outperforms its counterparts under three performance metrics of interest.
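Multi-modal contrastive pre-training of the kind mentioned above typically pulls paired embeddings (here, a county's satellite-image and meteorological representations) together while pushing mismatched pairs apart, via an InfoNCE-style loss. The sketch below is a generic version of that objective; the paper's exact formulation may differ:

```python
# Generic InfoNCE-style contrastive loss between two modalities.
import numpy as np

def info_nce(z_img, z_met, temperature=0.1):
    """z_img, z_met: (n, d) embeddings of n paired samples."""
    z_img = z_img / np.linalg.norm(z_img, axis=1, keepdims=True)
    z_met = z_met / np.linalg.norm(z_met, axis=1, keepdims=True)
    logits = z_img @ z_met.T / temperature   # (n, n) similarities
    # matched pairs sit on the diagonal; each row is a classification
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

When the paired embeddings agree and mismatched pairs are dissimilar, the diagonal dominates each row and the loss approaches zero.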