Projects List

Reversed Attention: On The Gradien…
Updated: December 22, 2024

The success of Transformer-based Language Models (LMs) stems from their attention mechanism. While this mechanism has been extensively studied in explainability research, particularly through the attention values obtained during the forward pass of LMs, the backward pass of attention has been largely overlooked. In this work, we study the mathematics of the backward pass of attention, revealing that it implicitly calculates an attention matrix we refer to as "Reversed Attention". We examine the properties of Reversed Attention and demonstrate its ability to elucidate the models' behavior and edit dynamics. In an experimental setup, we showcase the ability of Reversed Attention to directly alter the forward pass of attention, without modifying the model's weights, using a novel method called "attention patching". In addition to enhancing the comprehension of how LMs configure attention layers during backpropagation, Reversed Attention maps contribute to a more interpretable backward pass.
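The gradient that backpropagation sends through an attention head is itself a token-to-token matrix, which is the kind of object the paper studies. Below is a minimal numpy sketch of one head's forward pass and the gradient with respect to its attention matrix; the paper's exact definition of "Reversed Attention" may differ, and the shapes, random inputs, and upstream gradient `G` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8                          # sequence length, head dimension
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
G = rng.standard_normal((n, d))      # hypothetical upstream gradient dL/dO

# Forward pass: standard scaled dot-product attention.
S = Q @ K.T / np.sqrt(d)
A = np.exp(S - S.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)    # forward attention matrix (n x n)
O = A @ V

# Backward pass: the gradient with respect to the attention matrix is
# itself an n x n token-to-token map, computed implicitly by backprop.
dL_dA = G @ V.T

# Finite-difference check of one entry of dL/dA.
eps = 1e-6
A_pert = A.copy()
A_pert[0, 1] += eps
numeric = (np.sum((A_pert @ V) * G) - np.sum(O * G)) / eps
assert abs(numeric - dL_dA[0, 1]) < 1e-4
```

In a real LM this matrix would be captured with an autograd hook on the attention probabilities rather than computed by hand.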

Categories: cs.CL
EarthDial: Turning Multi-sensory E…
Updated: December 19, 2024

Automated analysis of vast Earth observation data via interactive Vision-Language Models (VLMs) can unlock new opportunities for environmental monitoring, disaster response, and resource management. Existing generic VLMs do not perform well on Remote Sensing data, while the recent Geo-spatial VLMs remain restricted to a fixed resolution and few sensor modalities. In this paper, we introduce EarthDial, a conversational assistant specifically designed for Earth Observation (EO) data, transforming complex, multi-sensory Earth observations into interactive, natural language dialogues. EarthDial supports multi-spectral, multi-temporal, and multi-resolution imagery, enabling a wide range of remote sensing tasks, including classification, detection, captioning, question answering, visual reasoning, and visual grounding. To achieve this, we introduce an extensive instruction tuning dataset comprising over 11.11M instruction pairs covering RGB, Synthetic Aperture Radar (SAR), and multispectral modalities such as Near-Infrared (NIR) and infrared. Furthermore, EarthDial handles bi-temporal and multi-temporal sequence analysis for applications like change detection. Our extensive experimental results on 37 downstream applications demonstrate that EarthDial outperforms existing generic and domain-specific models, achieving better generalization across various EO tasks.

Categories: cs.CV
DroughtSet: Understanding Drought …
Updated: December 19, 2024

Drought is one of the most destructive and costly natural disasters, severely impacting natural resources by depleting water supplies and diminishing agricultural yields. Under climate change, accurately predicting drought is critical for mitigating drought-induced risks. However, the intricate interplay among the physical and biological drivers that regulate droughts limits the predictability and understanding of drought, particularly at a subseasonal to seasonal (S2S) time scale. While deep learning has shown potential for addressing climate forecasting challenges, its application to drought prediction has received relatively little attention. In this work, we propose a new dataset, DroughtSet, which integrates relevant predictive features and three drought indices from multiple remote sensing and reanalysis datasets across the contiguous United States (CONUS). DroughtSet specifically provides the machine learning community with a new real-world dataset to benchmark drought prediction models and, more generally, time-series forecasting methods. Furthermore, we propose a spatial-temporal model, SPDrought, to predict and interpret S2S droughts. Our model learns from the spatial and temporal information of physical and biological features to predict three types of droughts simultaneously. Multiple strategies are employed to quantify the importance of physical and biological features for drought prediction. Our results provide insights for researchers to better understand the predictability and sensitivity of drought to biological and physical conditions. We aim to contribute to the climate field by proposing a new tool to predict and understand the occurrence of droughts and provide the AI community with a new benchmark to study deep learning applications in climate science.

Categories: cs.LG
Nullu: Mitigating Object Hallucina…
Updated: March 17, 2025

Recent studies have shown that large vision-language models (LVLMs) often suffer from the issue of object hallucinations (OH). To mitigate this issue, we introduce an efficient method that edits the model weights based on an unsafe subspace, which we call HalluSpace in this paper. With truthful and hallucinated text prompts accompanying the visual content as inputs, the HalluSpace can be identified by extracting the hallucinated embedding features and removing the truthful representations in LVLMs. By orthogonalizing the model weights, input features will be projected into the null space of the HalluSpace to reduce OH, based on which we name our method Nullu. We reveal that HalluSpaces generally contain prior information in the large language models (LLMs) applied to build LVLMs, which have been shown to be essential causes of OH in previous studies. Therefore, null space projection suppresses the LLMs' priors to filter out the hallucinated features, resulting in contextually accurate outputs. Experiments show that our method can effectively mitigate OH across different LVLM families without extra inference costs and also show strong performance in general LVLM benchmarks. Code is released at https://github.com/Ziwei-Zheng/Nullu.
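The weight edit the abstract describes can be sketched as an orthogonal null-space projection. In the toy numpy example below the "HalluSpace" basis is random stand-in data rather than features extracted from a real LVLM, and the dimensions are illustrative; only the projection algebra matches the described method.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 3        # hidden size, HalluSpace rank (illustrative values)

# Hypothetical "HalluSpace": directions separating hallucinated from
# truthful embedding features.
H = rng.standard_normal((d, k))
U, _, _ = np.linalg.svd(H, full_matrices=False)   # orthonormal basis (d x k)

# Project a weight matrix onto the null space of the HalluSpace:
# W_edit = W (I - U U^T) leaves inputs with no HalluSpace component
# unchanged and removes the HalluSpace component before W acts.
W = rng.standard_normal((d, d))
P = np.eye(d) - U @ U.T
W_edit = W @ P

# Any vector lying inside the HalluSpace is now annihilated.
x_hallu = H @ rng.standard_normal(k)
assert np.allclose(W_edit @ x_hallu, 0.0, atol=1e-8)
```

Because `P` is an orthogonal projector (`P @ P == P`), applying the edit twice changes nothing further, which is why such edits add no inference cost.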

Categories: cs.CV
Learning of Patch-Based Smooth-Plu…
Updated: March 17, 2025

We aim at the solution of inverse problems in imaging, by combining a penalized sparse representation of image patches with an unconstrained smooth one. This allows for a straightforward interpretation of the reconstruction. We formulate the optimization as a bilevel problem. The inner problem deploys classical algorithms while the outer problem optimizes the dictionary and the regularizer parameters through supervised learning. The process is carried out via implicit differentiation and gradient-based optimization. We evaluate our method for denoising, super-resolution, and compressed-sensing magnetic-resonance imaging. We compare it to other classical models as well as deep-learning-based methods and show that it consistently outperforms the former and, in some instances, the latter.
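The inner sparse-coding problem that "deploys classical algorithms" can be illustrated with ISTA-style proximal gradient iterations; the paper's actual inner solver, dictionary, and parameters are not specified here, so everything below (dictionary `D`, patch `p`, penalty `lam`) is a synthetic assumption.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1 -- the workhorse of sparse coding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Toy inner problem: min_c 0.5*||D c - p||^2 + lam*||c||_1, solved by ISTA.
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))        # hypothetical patch dictionary
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
p = rng.standard_normal(16)              # one image patch
lam = 0.1
step = 1.0 / np.linalg.norm(D.T @ D, 2)  # 1/L for the smooth term

c = np.zeros(32)
for _ in range(200):
    c = soft_threshold(c - step * D.T @ (D @ c - p), step * lam)
```

In the bilevel setting, the outer problem would differentiate through (or implicitly around) these iterations to learn `D` and `lam` from supervised pairs.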

Categories: eess.IV cs.CV, and others
Online optimisation for dynamic el…
Updated: March 17, 2025

Online optimisation studies the convergence of optimisation methods as the data embedded in the problem changes. Based on this idea, we propose a primal-dual online method for nonlinear time-discrete inverse problems. We analyse the method through regret theory and demonstrate its performance in real-time monitoring of moving bodies in a fluid with Electrical Impedance Tomography (EIT). To do so, we also prove the second-order differentiability of the Complete Electrode Model (CEM) solution operator on $L^\infty$.
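The online-optimisation idea, taking one solver step per arrival of new data rather than solving each problem to convergence, can be caricatured with plain online gradient steps on a drifting least-squares problem. This is not the paper's primal-dual method or the EIT model; the operator `A`, drift, and step size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, T = 20, 5, 400
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, 2)            # normalise so the step size is safe

x = np.zeros(n)
errs = []
for t in range(T):
    # The ground truth drifts slowly, so the data b_t changes every step.
    x_true = np.cos(0.002 * t * np.arange(1, n + 1))
    b_t = A @ x_true
    x -= 0.5 * A.T @ (A @ x - b_t)   # one optimisation step per datum
    errs.append(np.linalg.norm(x - x_true))
# After a burn-in, the iterate tracks the moving solution; regret theory
# bounds exactly this kind of tracking error.
```

A regret analysis, as in the paper, would bound the accumulated gap between these on-the-fly iterates and the best decision in hindsight.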

Categories: math.OC cs.CV
Artificial Intelligence in Traffic…
Updated: December 16, 2024

Existing research on AI-based traffic management systems, utilizing techniques such as fuzzy logic, reinforcement learning, deep neural networks, and evolutionary algorithms, demonstrates the potential of AI to transform the traffic landscape. This article endeavors to review the topics where AI and traffic management intersect. It comprises areas like AI-powered traffic signal control systems, automatic distance and velocity recognition (for instance, in autonomous vehicles, hereafter AVs), smart parking systems, and Intelligent Traffic Management Systems (ITMS), which use data captured in real-time to keep track of traffic conditions, and traffic-related law enforcement and surveillance using AI. AI applications in traffic management cover a wide range of spheres. The spheres comprise, inter alia, streamlining traffic signal timings, predicting traffic bottlenecks in specific areas, detecting potential accidents and road hazards, managing incidents accurately, advancing public transportation systems, development of innovative driver assistance systems, and minimizing environmental impact through simplified routes and reduced emissions. The benefits of AI in traffic management are also diverse. They comprise improved management of traffic data, sounder route decision automation, easier and speedier identification and resolution of vehicular issues through monitoring the condition of individual vehicles, decreased traffic snarls and mishaps, superior resource utilization, alleviated stress of traffic management manpower, greater on-road safety, and better emergency response time.

Categories: cs.AI
DRUM: Diffusion-based runoff model…
Updated: December 16, 2024

Reliable flood forecasting remains a critical challenge due to persistent underestimation of peak flows and inadequate uncertainty quantification in current approaches. We present DRUM (Diffusion-based Runoff Model), a generative AI solution for probabilistic runoff prediction. DRUM uses an iterative refinement process that generates ensemble runoff estimates from noise, guided by past meteorological conditions, present meteorological forecasts, and static catchment attributes. This framework allows learning complex hydrological behaviors without imposing explicit distributional assumptions, particularly benefiting extreme event prediction and uncertainty quantification. Using data from 531 representative basins across the contiguous United States, DRUM outperforms state-of-the-art deep learning methods in runoff forecasting in terms of both deterministic and probabilistic skill, with particular advantages in extreme flow (0.1%) predictions. DRUM demonstrates superior flood early warning skill across all magnitudes and lead times (1-7 days), achieving F1 scores near 0.4 for extreme events under perfect forecasts and maintaining robust performance with operational forecasts, especially for longer lead times and high-magnitude floods. When applied to climate projections through the 21st century, DRUM reveals increasing flood vulnerability in 47.8-57.1% of basins across emission scenarios, with particularly elevated risks along the West Coast and Southeast regions. These advances demonstrate significant potential for improving both operational flood forecasting and long-term risk assessment in a changing climate.

Categories: physics.geo-ph
A decade of the fast-varying ionos…
Updated: March 10, 2025

The time-varying geomagnetic field is a superposition of contributions from multiple internal and external current systems. Major sources of geomagnetic variations at periods of less than a few years are current systems external to the solid Earth, namely the ionospheric and magnetospheric currents, as well as associated induced currents. The separation of these three sources is mathematically underdetermined using either ground or satellite measurements alone, but becomes tractable when the two datasets are combined. Based on this concept, we developed a new geomagnetic field modelling approach that allows us to simultaneously characterise the mid-latitude ionospheric, magnetospheric and the internal induced magnetic fields using ground and satellite observations for all local times and magnetic conditions, and without prescribing any harmonic behaviour on these current systems in time, as is typical in other models. By applying this new method to a 10-year dataset of ground observatory and multi-satellite measurements from 2014 to 2023, we obtained the time series of the spherical harmonic coefficients of the ionospheric, magnetospheric and induced fields. These new time series allow the study of complex non-periodic dynamics of the external magnetic fields during global geomagnetic storms, as well as periodicities in the magnetospheric coefficients linked to solar activity and periodic ionospheric magnetic fields linked to lunar daily variations. This contributes to a more complete picture of the dynamics of the external currents and magnetosphere-ionosphere interactions, and facilitates more accurate space weather nowcasts and forecasts. Finally, the new approach allows for a better characterisation of internal induced field sources, leading to higher quality electromagnetic transfer functions.
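The key claim, that the separation is underdetermined from either dataset alone but tractable when they are combined, can be shown with a toy linear system. The observation operators below are made up for illustration (real ones would come from spherical harmonic geometry); only the rank argument carries over.

```python
import numpy as np

s_true = np.array([2.0, -1.0, 0.5])   # ionospheric, magnetospheric, induced

# Hypothetical linear observation operators.  Ground observatories sit
# below the ionospheric current sheet and satellites largely above it,
# so the two geometries mix the three sources with different weights.
A_ground = np.array([[1.0, 1.0, 1.0],
                     [1.0, 2.0, 3.0]])
A_sat    = np.array([[1.0, -1.0, 1.0],
                     [0.0,  1.0, 1.0]])

# Each dataset alone is underdetermined (2 equations, 3 unknowns)...
assert np.linalg.matrix_rank(A_ground) < 3
assert np.linalg.matrix_rank(A_sat) < 3

# ...but the stacked system has full column rank, so a joint inversion
# separates the three sources exactly.
A = np.vstack([A_ground, A_sat])
b = A @ s_true
s_est, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(s_est, s_true)
```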

Categories: physics.space-ph physics.geo-ph
Twenty-Year Review of Outdoor Air …
Updated: November 26, 2024

Air quality is a prevalent concern due to the health risks it poses. The state of Utah, USA, has, at times over the last 20 years, experienced some of the worst air quality in the nation. The propensity for Utah to experience elevated concentrations of particulate matter ($\mathrm{PM_{2.5}}$) and ozone ($\mathrm{O_3}$) can, in part, be attributed to its unique geography, which features dry, mountainous terrain. Valleys in Utah create ideal environments for extended cold-pool events. In this review, we summarize air quality research conducted in Utah over the past 20 years (2002-2022) by dividing the state into six regions: Utah Valley, Summit County, Southern Utah (regions south of Utah Valley), Cache Valley, Uinta Basin, and Salt Lake Valley. We review the published literature chronologically and provide a summary for each region, identifying areas where additional research is warranted. We found that research efforts are heavily weighted toward the Uinta Basin and Salt Lake Valley, with the remaining regions collectively accounting for only 20% of studies. We identified the need for more source apportionment studies, speciated volatile organic compound (VOC) analyses, and ozone isopleths. Where ozone isopleths cannot be created, measurements of glyoxal ($\mathrm{CHOCHO}$) and formaldehyde ($\mathrm{HCHO}$) concentrations could serve as cost-effective surrogates to inform ozone mitigation policies.

Categories: physics.ao-ph
Improving Satellite Imagery Maskin…
Updated: December 11, 2024

Many remote sensing applications employ masking of pixels in satellite imagery for subsequent measurements. For example, estimating water quality variables, such as Suspended Sediment Concentration (SSC), requires isolating pixels depicting water bodies unaffected by clouds, their shadows, terrain shadows, and snow and ice formation. A significant bottleneck is the reliance on a variety of data products (e.g., satellite imagery, elevation maps) and a lack of precision in individual steps, which affects estimation accuracy. We propose to improve both the accuracy and computational efficiency of masking by developing a system that predicts all required masks from Harmonized Landsat and Sentinel (HLS) imagery. Our model employs multi-tasking to share computation and enable higher accuracy across tasks. We experiment with recent advances in deep network architectures and show that masking models can benefit from these, especially when combined with pre-training on large satellite imagery datasets. We present a collection of models offering different speed/accuracy trade-offs for masking. MobileNet variants are the fastest, and perform competitively with larger architectures. Transformer-based architectures are the slowest, but benefit the most from pre-training on large satellite imagery datasets. Our models provide a 9% F1 score improvement compared to previous work on water pixel identification. When integrated with an SSC estimation system, our models result in a 30x speedup while reducing estimation error by 2.64 mg/L, allowing for global-scale analysis. We also evaluate our model on a recently proposed cloud and cloud shadow estimation benchmark, where we outperform the current state-of-the-art model by at least 6% in F1 score.

Categories: cs.CV
SenCLIP: Enhancing zero-shot land-…
Updated: December 11, 2024

Pre-trained vision-language models (VLMs), such as CLIP, demonstrate impressive zero-shot classification capabilities with free-form prompts and even show some generalization in specialized domains. However, their performance on satellite imagery is limited due to the underrepresentation of such data in their training sets, which predominantly consist of ground-level images. Existing prompting techniques for satellite imagery are often restricted to generic phrases like "a satellite image of ...", limiting their effectiveness for zero-shot land-use and land-cover (LULC) mapping. To address these challenges, we introduce SenCLIP, which transfers CLIP's representations to Sentinel-2 imagery by leveraging a large dataset of Sentinel-2 images paired with geotagged ground-level photos from across Europe. We evaluate SenCLIP alongside other SOTA remote sensing VLMs on zero-shot LULC mapping tasks using the EuroSAT and BigEarthNet datasets with both aerial and ground-level prompting styles. Our approach, which aligns ground-level representations with satellite imagery, demonstrates significant improvements in classification accuracy across both prompt styles, opening new possibilities for applying free-form textual descriptions in zero-shot LULC mapping.
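CLIP-style zero-shot classification, which SenCLIP builds on, reduces to cosine similarity between an image embedding and one text embedding per class prompt. The sketch below shows only that scoring logic with random stand-in embeddings; a real pipeline would obtain `image_emb` and `prompt_embs` from the model's image and text encoders, and the class names and 512-dimension are assumptions.

```python
import numpy as np

def zero_shot_classify(image_emb, prompt_embs):
    """Pick the class whose prompt embedding has the highest cosine
    similarity to the image embedding (standard CLIP-style scoring)."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    return int(np.argmax(txt @ img))

# Stand-in embeddings; prompts could be aerial-style ("a satellite image
# of a forest") or ground-level ("a photo taken in a forest").
rng = np.random.default_rng(0)
classes = ["forest", "cropland", "urban"]
prompt_embs = rng.standard_normal((3, 512))
image_emb = prompt_embs[1] + 0.1 * rng.standard_normal(512)  # near "cropland"

assert classes[zero_shot_classify(image_emb, prompt_embs)] == "cropland"
```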

Categories: cs.CV
TraSCE: Trajectory Steering for Co…
Updated: March 17, 2025

Recent advancements in text-to-image diffusion models have brought them into the public spotlight, becoming widely accessible and embraced by everyday users. However, these models have been shown to generate harmful content such as not-safe-for-work (NSFW) images. While approaches have been proposed to erase such abstract concepts from the models, jail-breaking techniques have succeeded in bypassing such safety measures. In this paper, we propose TraSCE, an approach to guide the diffusion trajectory away from generating harmful content. Our approach is based on negative prompting, but as we show in this paper, a widely used negative prompting strategy is not a complete solution and can easily be bypassed in some corner cases. To address this issue, we first propose using a specific formulation of negative prompting instead of the widely used one. Furthermore, we introduce a localized loss-based guidance that enhances the modified negative prompting technique by steering the diffusion trajectory. We demonstrate that our proposed method achieves state-of-the-art results on various benchmarks in removing harmful content, including ones proposed by red teams, and erasing artistic styles and objects. Our proposed approach does not require any training, weight modifications, or training data (either image or prompt), making it easier for model owners to erase new concepts.
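The "widely used negative prompting strategy" the abstract critiques is classifier-free guidance with the unconditional branch replaced by a negative-concept prediction. The sketch below shows that baseline formulation only (not TraSCE's modified one); the guidance scale and random noise vectors are illustrative.

```python
import numpy as np

def negative_prompt_guidance(eps_cond, eps_neg, w):
    """Widely used classifier-free guidance with a negative prompt in
    place of the unconditional branch: the predicted noise is pushed
    away from the unwanted concept's direction."""
    return eps_neg + w * (eps_cond - eps_neg)

rng = np.random.default_rng(0)
eps_cond = rng.standard_normal(8)   # noise prediction for the user prompt
eps_neg = rng.standard_normal(8)    # noise prediction for the unwanted concept

out = negative_prompt_guidance(eps_cond, eps_neg, w=7.5)
# With w > 1 the output overshoots eps_cond away from eps_neg, which is
# exactly the behavior jail-breaks exploit in corner cases.
```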

Categories: cs.CV cs.AI cs.LG
BigDocs: An Open Dataset for Train…
Updated: March 17, 2025

Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often constrained by limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance by up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io .

Categories: cs.LG cs.CL
DualPM: Dual Posed-Canonical Point…
Updated: March 17, 2025

The choice of data representation is a key factor in the success of deep learning in geometric tasks. For instance, DUSt3R has recently introduced the concept of viewpoint-invariant point maps, generalizing depth prediction, and showing that one can reduce all the key problems in the 3D reconstruction of static scenes to predicting such point maps. In this paper, we develop an analogous concept for a very different problem, namely, the reconstruction of the 3D shape and pose of deformable objects. To this end, we introduce the Dual Point Maps (DualPM), where a pair of point maps is extracted from the same image, one associating pixels to their 3D locations on the object, and the other to a canonical version of the object at rest pose. We also extend point maps to amodal reconstruction, seeing through self-occlusions to obtain the complete shape of the object. We show that 3D reconstruction and 3D pose estimation reduce to the prediction of the DualPMs. We demonstrate empirically that this representation is a good target for a deep network to predict; specifically, we consider modeling horses, showing that DualPMs can be trained purely on 3D synthetic data, consisting of a single model of a horse, while generalizing very well to real images. With this, we improve on previous methods for the 3D analysis and reconstruction of such objects by a large margin.

Categories: cs.CV
A personalized model and optimizat…
Updated: December 3, 2024

Background and objective: Diabetes is one of the four leading causes of death worldwide, necessitating daily blood glucose monitoring. While sweat offers a promising non-invasive alternative for glucose monitoring, its application remains limited due to the low to moderate correlation between sweat and blood glucose concentrations, which has been obtained until now by assuming a linear relationship. This study proposes a novel model-based strategy to estimate blood glucose concentrations from sweat samples, setting the stage for non-invasive glucose monitoring through sweat-sensing technology. Methods: We first developed a pharmacokinetic glucose transport model that describes the glucose transport from blood to sweat. Secondly, we designed a novel optimization strategy leveraging the proposed model to solve the inverse problem and infer blood glucose levels from measured glucose concentrations in sweat. To this end, the pharmacokinetic model parameters with the highest sensitivity were also optimized so as to achieve a personalized estimation. Our strategy was tested on a dataset composed of 108 samples from healthy volunteers and diabetic patients. Results: Our glucose transport model improves over the state-of-the-art in estimating sweat glucose concentrations from blood levels (higher accuracy, p<0.001). Additionally, our optimization strategy effectively solved the inverse problem, yielding a Pearson correlation coefficient of 0.98 across all 108 data points, with an average root-mean-square-percent-error of 12%+/-8%. This significantly outperforms the best sweat-blood glucose correlation reported in the existing literature (0.75). Conclusion: Our innovative optimization strategy, also leveraging more accurate modeling, shows promising results, paving the way for non-invasive blood glucose monitoring and, possibly, improved diabetes management.
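The two-step structure, a forward transport model from blood to sweat plus an optimization that inverts it, can be sketched with a toy model. The steady-state linear model and the parameters `k_transport` and `k_clear` below are hypothetical stand-ins, not the paper's pharmacokinetic model; only the inverse-problem pattern (fit the blood level whose prediction matches the measurement) is illustrated.

```python
import numpy as np

def sweat_from_blood(c_blood, k_transport=0.02, k_clear=0.5):
    """Toy steady-state transport model (hypothetical parameters):
    sweat glucose as a fraction of blood glucose, set by the balance of
    transport into sweat and clearance/dilution in the gland."""
    return k_transport / k_clear * c_blood

def blood_from_sweat(c_sweat_meas, candidates):
    """Solve the inverse problem by picking the blood level whose
    predicted sweat glucose best matches the measurement."""
    preds = sweat_from_blood(candidates)
    return candidates[np.argmin((preds - c_sweat_meas) ** 2)]

candidates = np.linspace(50.0, 300.0, 2501)      # mg/dL grid, 0.1 steps
c_blood_true = 142.0
c_meas = sweat_from_blood(c_blood_true)

assert abs(blood_from_sweat(c_meas, candidates) - c_blood_true) < 0.1
```

A personalized version, as in the paper, would additionally fit the most sensitive model parameters per subject rather than fixing them globally.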

Categories: q-bio.QM
Advancing global aerosol forecasti…
Updated: December 3, 2024

Aerosol forecasting is essential for air quality warnings, health risk assessment, and climate change mitigation. However, it is more complex than weather forecasting due to the intricate interactions between aerosol physicochemical processes and atmospheric dynamics, resulting in significant uncertainty and high computational costs. Here, we develop an artificial intelligence-driven global aerosol-meteorology forecasting system (AI-GAMFS), which provides reliable 5-day, 3-hourly forecasts of aerosol optical components and surface concentrations at a 0.5° × 0.625° resolution. AI-GAMFS combines Vision Transformer and U-Net in a backbone network, robustly capturing the complex aerosol-meteorology interactions via global attention and spatiotemporal encoding. Trained on 42 years of advanced aerosol reanalysis data and initialized with GEOS Forward Processing (GEOS-FP) analyses, AI-GAMFS delivers operational 5-day forecasts in one minute. It outperforms the Copernicus Atmosphere Monitoring Service (CAMS) global forecasting system, GEOS-FP forecasts, and several regional dust forecasting systems in forecasting most aerosol variables, including aerosol optical depth and dust components. Our results mark a significant step forward in leveraging AI to refine physics-based aerosol forecasting, facilitating more accurate global warnings for aerosol pollution events, such as dust storms and wildfires.

Categories: physics.ao-ph
Deep learning approach for predict…
Updated: December 3, 2024

This paper presents a physics-informed deep learning approach for recovering the replicator equation, allowing accurate forecasting of population dynamics. This methodological innovation allows us to derive governing differential or difference equations for systems that lack explicit mathematical models. We used the SINDy (sparse identification of nonlinear dynamics) framework, introduced by Brunton, Proctor, and Kutz (2016), to obtain the replicator equation, which can significantly advance our understanding of evolutionary biology, economic systems, and social dynamics. By refining predictive models across multiple disciplines, including ecology, social structures, and moral behaviours, our work offers new insights into the complex interplay of variables shaping evolutionary outcomes in dynamic systems.
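The replicator equation itself is compact enough to simulate directly: $\dot{x}_i = x_i\big((Ax)_i - x^\top A x\big)$, where $A$ is the payoff matrix. A minimal numpy sketch (the 2x2 payoff matrix and initial frequencies are illustrative, not taken from the paper):

```python
import numpy as np

def replicator_step(x, A, dt=0.01):
    """One forward-Euler step of the replicator equation
    dx_i/dt = x_i * ((A x)_i - x^T A x)."""
    f = A @ x                    # fitness of each strategy
    phi = x @ f                  # mean population fitness
    return x + dt * x * (f - phi)

# Illustrative 2x2 payoff matrix with a stable interior equilibrium
# at x = (0.5, 0.5), since the payoffs equalize when x1 = x2.
A = np.array([[0.0, 3.0],
              [1.0, 2.0]])
x = np.array([0.9, 0.1])         # initial strategy frequencies

for _ in range(5000):
    x = replicator_step(x, A)
# x has converged to the mixed equilibrium while staying on the simplex.
```

Note that the Euler step preserves the simplex exactly (the increments sum to zero), which is what makes frequencies remain valid probabilities; SINDy would run this process in reverse, recovering the right-hand side from observed trajectories.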

Categories: cs.AI
IQA-Adapter: Exploring Knowledge T…
Updated: March 16, 2025

Diffusion-based models have recently revolutionized image generation, achieving unprecedented levels of fidelity. However, consistent generation of high-quality images remains challenging partly due to the lack of conditioning mechanisms for perceptual quality. In this work, we propose methods to integrate image quality assessment (IQA) models into diffusion-based generators, enabling quality-aware image generation. We show that diffusion models can learn complex qualitative relationships from both IQA models' outputs and internal activations. First, we experiment with gradient-based guidance to optimize image quality directly and show this method has limited generalizability. To address this, we introduce IQA-Adapter, a novel framework that conditions generation on target quality levels by learning the implicit relationship between images and quality scores. When conditioned on high target quality, IQA-Adapter can shift the distribution of generated images towards a higher-quality subdomain, and, inversely, it can be used as a degradation model, generating progressively more distorted images when provided with a lower-quality signal. Under the high-quality condition, IQA-Adapter achieves up to a 10% improvement across multiple objective metrics, as confirmed by a user preference study, while preserving generative diversity and content. Furthermore, we extend IQA-Adapter to a reference-based conditioning scenario, utilizing the rich activation space of IQA models to transfer highly specific, content-agnostic qualitative features between images.

Categories: cs.CV cs.AI
Local and Regional Contributions t…
Updated: December 2, 2024

The Wasatch Front in Utah, USA, is currently a non-attainment area for ozone according to the Environmental Protection Agency's (EPA) National Ambient Air Quality Standards (NAAQS). Nitrogen oxides ($\mathrm{NO_x = NO_2 + NO}$) and volatile organic compounds (VOCs), in the presence of sunlight, lead to ozone formation in the troposphere. When the rate of oxidant production, defined as the sum of $\mathrm{O_3}$ and $\mathrm{NO_2}$, is faster than the rate of $\mathrm{NO_x}$ production, a region is said to be $\mathrm{NO_x}$-limited, and ozone formation will be limited by the concentration of $\mathrm{NO_x}$ species in the region. In the inverse situation, the region is VOC-limited. Knowing whether a region is $\mathrm{NO_x}$-limited or VOC-limited can aid in generating effective mitigation strategies. Understanding the background or regional contributions to ozone in a region, whether from the transport of precursors or of ozone, provides information about the lower limit for ozone concentrations that a region can achieve through regulation of local precursors. In this paper, measured oxidant and $\mathrm{NO_x}$ concentrations are analyzed from 14 counties in the state of Utah to calculate the regional and local contributions to ozone for each region. This analysis is used to determine the nature of the atmosphere in each county by identifying whether the region is VOC- or $\mathrm{NO_x}$-limited. Furthermore, this analysis is performed for each county for the years 2012 and 2022 to assess changes in the oxidative nature and quantify regional and local contributions to ozone over a 10-year period. All studied counties in Utah--except for Washington County--were found to be VOC-limited in 2012. This shifted in 2022, with most counties being either in a transitional state or $\mathrm{NO_x}$-limited.
Local contributions to ozone increased in two major counties, Cache and Salt Lake Counties, but decreased in Carbon, Davis, Duchesne, Uinta, Utah, Washington, and Weber Counties. Generally, the regional contributions to oxidant concentrations decreased across the state. A summertime spike in both regional and local contributions to oxidants was observed. Smoke from wildfires was found to increase regional contributions to oxidants and shift the local regime to be more $\mathrm{NO_x}$-limited.
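The regional/local split described above is commonly estimated by regressing oxidant ($\mathrm{O_x = O_3 + NO_2}$) against $\mathrm{NO_x}$: the intercept is read as the $\mathrm{NO_x}$-independent regional contribution and the slope term as the locally produced part. A sketch on synthetic data (the concentrations and the assumption that this is the paper's exact procedure are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic afternoon observations (ppb): oxidant rises with NOx on top
# of a NOx-independent regional background.
nox = rng.uniform(5.0, 60.0, 200)
regional_true, local_slope_true = 45.0, 0.15
ox = regional_true + local_slope_true * nox + rng.normal(0.0, 1.0, 200)

# Least-squares fit Ox = a + b * NOx: the intercept a estimates the
# regional (background) contribution, and b * NOx the local contribution.
b, a = np.polyfit(nox, ox, 1)
```

Repeating such a fit per county and per year would yield the 2012-vs-2022 comparisons of regional and local contributions the review describes.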

Categories: physics.ao-ph
GuardSplat: Efficient and Robust W…
Updated: March 17, 2025

3D Gaussian Splatting (3DGS) has recently created impressive 3D assets for various applications. However, considering security, capacity, invisibility, and training efficiency, the copyright of 3DGS assets is not well protected, as existing watermarking methods are unsuited for its rendering pipeline. In this paper, we propose GuardSplat, an innovative and efficient framework for watermarking 3DGS assets. Specifically, 1) We propose a CLIP-guided pipeline for optimizing the message decoder with minimal costs. The key objective is to achieve high-accuracy extraction by leveraging CLIP's aligning capability and rich representations, demonstrating exceptional capacity and efficiency. 2) We tailor a Spherical-Harmonic-aware (SH-aware) Message Embedding module for 3DGS, seamlessly embedding messages into the SH features of each 3D Gaussian while preserving the original 3D structure. This enables watermarking 3DGS assets with minimal fidelity trade-offs and prevents malicious users from removing the watermarks from the model files, meeting the demands for invisibility and security. 3) We present an Anti-distortion Message Extraction module to improve robustness against various distortions. Experiments demonstrate that GuardSplat outperforms state-of-the-art methods and achieves fast optimization. The project page is at https://narcissusex.github.io/GuardSplat, and code is at https://github.com/NarcissusEx/GuardSplat.

Read More cs.CV cs.CR
Population dynamics in the global …
Updated:
February 11, 2025
0
0
External Public

Coral reefs are crucial to marine biodiversity and rely on a delicate symbiotic relationship between corals and zooxanthellae algae. Water temperature variations, however, disrupt this association, leading to coral bleaching events that severely affect marine ecosystems. This study presents a mathematical model for the population dynamics of coral and symbiont species considering the coral symbiont network and recurrent warming events. The model incorporates thermal tolerances of species and coupled growth dynamics (between corals and symbionts) to investigate how network structure and thermal tolerance influence the species' growth. Using real data from different ocean regions, results reveal that network connectivity plays a significant role in population growth after successive warming events, with generalist species demonstrating greater growth across all regions analyzed. The comparatively higher correlation between node degree and final population also emphasizes the impact of ecological network structure on species growth, offering valuable insights into coral reef population dynamics under climate change. This research highlights the need to consider network structure beyond species' thermal tolerances when evaluating the ecological responses of corals to environmental changes.
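
The abstract does not reproduce the model's equations; one generic way to write thermally modulated, network-coupled growth of a coral population $C_i$ and its symbionts $S_j$ (our illustrative sketch, not necessarily the authors' exact formulation) is:

```latex
% Illustrative sketch only: logistic growth of coral i, modulated by a
% thermal-tolerance response f_i(T) and coupled to symbiont abundances
% S_j through the bipartite network adjacency A_{ij}.
\frac{\mathrm{d}C_i}{\mathrm{d}t}
  = r_i\, f_i(T)\, C_i \left(1 - \frac{C_i}{K_i}\right)
  + \sum_j \alpha_{ij} A_{ij}\, C_i\, S_j
```

In a form like this, generalist corals (large node degree in $A_{ij}$) draw on more symbiont partners, consistent with the higher post-warming growth the study reports for well-connected species.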

Read More q-bio.PE
Assessing the potential of state-o…
Updated:
November 28, 2024
29
0
External Public

The growing adoption of machine learning (ML) in modelling atmospheric and oceanic processes offers a promising alternative to traditional numerical methods. It is essential to benchmark the performance of both ML and physics-informed ML (PINN) models to evaluate their predictive skill, particularly for short- to medium-term forecasting. In this study, we utilize gridded sea surface temperature (SST) data and six atmospheric predictors (cloud cover, relative humidity, solar radiation, surface pressure, u-component of velocity, and v-component of velocity) to capture both spatial and temporal patterns in SST predictions.

Read More physics.ao-ph
Can bidirectional encoder become t…
Updated:
November 27, 2024
63
0
External Public

Over the past few decades, Artificial Intelligence (AI) has progressed from the initial machine learning stage to the deep learning stage, and now to the stage of foundational models. Foundational models are characterized by pre-training, transfer learning, and self-supervised learning, and pre-trained models can be fine-tuned and applied to various downstream tasks. Under this framework, models such as Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT) have greatly advanced the development of natural language processing (NLP), in particular spawning many models based on BERT. By using a masked language model, BERT broke through the limitation of purely one-way language modeling in pre-training: it captures bidirectional context to predict the masked words in a sequence, which improves the model's feature extraction ability. This makes such models very useful for downstream tasks, especially specialized applications, since a bidirectional encoder can better understand domain knowledge. We therefore aim to explain how this technology has evolved and improved model performance across natural language processing tasks in the era of foundational models, and to reveal its importance for capturing context information and improving downstream performance. This article analyzes one-way models based on GPT and bidirectional models based on BERT, compares their differences in terms of modeling objectives, and briefly reviews BERT together with the improvements made by several BERT-based models, comparing their performance on the Stanford Question Answering Dataset (SQuAD) and the General Language Understanding Evaluation (GLUE) benchmark.
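
The masked-language-model objective described above can be illustrated with a minimal, self-contained sketch (plain Python, not BERT's actual implementation; the masking rate and `[MASK]` symbol follow common practice):

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Corrupt a token sequence for masked-language-model pre-training:
    each token is independently replaced by [MASK] with probability
    mask_prob, and the originals are kept as prediction targets that a
    bidirectional encoder would recover from surrounding context."""
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[i] = tok       # position -> word to predict
        else:
            corrupted.append(tok)
    return corrupted, targets

corrupted, targets = mask_tokens("the model predicts the missing words".split(),
                                 mask_prob=0.3)
```

(BERT's actual recipe additionally leaves some selected tokens unchanged or replaces them with random words.)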

Read More cs.CL
Video-Guided Foley Sound Generatio…
Updated:
March 17, 2025
96
3
External Public

Generating sound effects for videos often requires creating artistic sound effects that diverge significantly from real-life sources and flexible control in the sound design. To address this problem, we introduce MultiFoley, a model designed for video-guided sound generation that supports multimodal conditioning through text, audio, and video. Given a silent video and a text prompt, MultiFoley allows users to create clean sounds (e.g., skateboard wheels spinning without wind noise) or more whimsical sounds (e.g., making a lion's roar sound like a cat's meow). MultiFoley also allows users to choose reference audio from sound effects (SFX) libraries or partial videos for conditioning. A key novelty of our model lies in its joint training on both internet video datasets with low-quality audio and professional SFX recordings, enabling high-quality, full-bandwidth (48kHz) audio generation. Through automated evaluations and human studies, we demonstrate that MultiFoley successfully generates synchronized high-quality sounds across varied conditional inputs and outperforms existing methods. Please see our project page for video results: https://ificl.github.io/MultiFoley/

Read More cs.CV cs.MM More categories
Utilizing Low-Cost Sensors to Moni…
Updated:
November 26, 2024
32
4
External Public

Air quality has important climate and health effects. There is a need, therefore, to monitor air quality both indoors and outdoors. Methods of measuring air quality should be cost-effective if they are to be used widely, and one such method is low-cost sensors (LCS). This study reports on the use of LCSs in Ulaanbaatar, Mongolia, to measure $\mathrm{PM_{2.5}}$ concentrations inside yurts or "gers." Some of these gers were part of a non-government agency (NGO) initiative to improve the insulating properties of these housing structures. The goal of the NGO was to decrease particulate emissions inside the gers; a secondary result was to lower the use of coal and other biomass material. LCSs were installed in gers heated primarily by coal, and interior air quality was measured. Gers that were modified by increasing their insulating capacities showed a 17.5% reduction in $\mathrm{PM_{2.5}}$ concentrations, but these concentrations remained higher than levels recommended by health organizations. Gers that were insulated and used a combination of both coal and electricity showed a 19.1% reduction in $\mathrm{PM_{2.5}}$ concentrations. Insulated gers that used electricity for both heating and cooking showed a 48% reduction in $\mathrm{PM_{2.5}}$, though concentrations were still 6.4 times higher than those recommended by the World Health Organization (WHO). Nighttime and daytime trends followed similar patterns in $\mathrm{PM_{2.5}}$ concentrations with slight variations. It was found that, at nighttime, the outside $\mathrm{PM_{2.5}}$ concentrations were generally higher than the inside concentrations of the gers in this study. This suggests that $\mathrm{PM_{2.5}}$ would flow into the gers whenever the doors were opened, causing spikes in $\mathrm{PM_{2.5}}$ concentrations.

Read More physics.ao-ph
ΩSFormer: Dual-Modal Ω-like Super-…
Updated:
November 26, 2024
7
0
External Public

Terraced fields are a significant engineering practice for soil and water conservation (SWC), and extracting them from remotely sensed imagery is the foundation for monitoring and evaluating SWC. This study is the first to propose a novel dual-modal Ω-like super-resolution Transformer network for intelligent terraced field vector extraction (TFVE), offering the following advantages: (1) reducing the edge segmentation error of a conventional multi-scale downsampling encoder by fusing original high-resolution features with downsampled features at each encoder step and leveraging a multi-head attention mechanism; (2) improving the accuracy of TFVE through an Ω-like network structure that fully integrates rich high-level features from both spectral and terrain data to form cross-scale super-resolution features; (3) validating an optimal fusion scheme for cross-modal and cross-scale (i.e., inconsistent spatial resolution between remotely sensed imagery and DEM) super-resolution feature extraction; (4) mitigating uncertainty between segmentation edge pixels with a coarse-to-fine, spatial topological semantic relationship optimization (STSRO) segmentation strategy; (5) leveraging a contour vibration neural network to continuously optimize parameters and iteratively vectorize terraced fields from the semantic segmentation results. Moreover, a DMRVD for deep-learning-based TFVE was created for the first time, covering nine study areas in four provinces of China with a total area of 22,441 square kilometers. To assess the performance of ΩSFormer, classic and SOTA networks were compared: its mIOU improves by 0.165, 0.297, and 0.128 over the best single-modal remotely-sensed-imagery, single-modal DEM, and dual-modal results, respectively.

Read More cs.CV
Classifier-Free Guidance inside th…
Updated:
March 17, 2025
0
0
External Public

Diffusion models are prone to exactly reproduce images from the training data. This exact reproduction of the training data is concerning as it can lead to copyright infringement and/or leakage of privacy-sensitive information. In this paper, we present a novel perspective on the memorization phenomenon and propose a simple yet effective approach to mitigate it. We argue that memorization occurs because of an attraction basin in the denoising process which steers the diffusion trajectory towards a memorized image. However, this can be mitigated by steering the diffusion trajectory away from the attraction basin: classifier-free guidance is withheld until an ideal transition point, and applied only from that point onward. This leads to the generation of non-memorized images that are high in image quality and well-aligned with the conditioning mechanism. To further improve on this, we present a new guidance technique, opposite guidance, that escapes the attraction basin sooner in the denoising process. We demonstrate the existence of attraction basins in various scenarios in which memorization occurs, and we show that our proposed approach successfully mitigates memorization.
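
A toy sketch of the delayed-guidance idea (our reading of the abstract, not the authors' code): the standard classifier-free guidance combination is applied only from a transition step onward, with the weight before that step left at zero (or, in one plausible reading of "opposite guidance", set negative):

```python
def cfg_noise(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance combination of the
    unconditional and conditional noise predictions."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def guidance_schedule(steps, transition_step, w=7.5, early_w=0.0):
    """Per-step guidance weights for delayed guidance: before
    `transition_step` the weight is `early_w` (0 disables guidance;
    a negative value sketches 'opposite guidance'), and from the
    transition point onward the full weight `w` applies."""
    return [early_w if t < transition_step else w for t in range(steps)]

sched = guidance_schedule(10, 4)
```

Each denoising step would then call `cfg_noise` with the weight drawn from `sched` (step count, transition point, and weight values here are arbitrary placeholders).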

Read More cs.CV cs.AI cs.LG
StructFormer: Document Structure-b…
Updated:
November 25, 2024
17
0
External Public

Most state-of-the-art techniques for Language Models (LMs) today rely on transformer-based architectures and their ubiquitous attention mechanism. However, the exponential growth in computational requirements with longer input sequences confines Transformers to handling short passages. Recent efforts have aimed to address this limitation by introducing selective attention mechanisms, notably local and global attention. While sparse attention mechanisms, akin to full attention in being Turing-complete, have been theoretically established, their practical impact on pre-training remains unexplored. This study focuses on empirically assessing the influence of global attention on BERT pre-training. The primary steps involve creating an extensive corpus of structure-aware text through arXiv data, alongside a text-only counterpart. We carry out pre-training on these two datasets, investigate shifts in attention patterns, and assess their implications for downstream tasks. Our analysis underscores the significance of incorporating document structure into LMs, demonstrating their capacity to excel in more abstract tasks, such as document understanding.

Read More cs.CL
GeoFormer: A Multi-Polygon Segment…
Updated:
November 25, 2024
44
0
External Public

In remote sensing there is a common need to learn scale-invariant shapes of objects such as buildings. Prior work relies on tweaking multiple loss functions to convert segmentation maps into the final scale-invariant representation, necessitating arduous design and optimization. To this end we introduce GeoFormer, a novel architecture that remedies these challenges by learning to generate multipolygons end-to-end. By modeling keypoints as spatially dependent tokens in an auto-regressive manner, GeoFormer outperforms existing works in delineating building objects from satellite imagery. We evaluate the robustness of GeoFormer against former methods through a variety of parameter ablations and highlight the advantages of optimizing a single likelihood function. Our study presents the first successful application of auto-regressive transformer models to multi-polygon prediction in remote sensing, suggesting a promising methodological alternative for building vectorization.

Read More cs.CV
Towards Context-Rich Automated Bio…
Updated:
November 21, 2024
38
0
External Public

Camera traps offer enormous new opportunities in ecological studies, but current automated image analysis methods often lack the contextual richness needed to support impactful conservation outcomes. Here we present an integrated approach that combines deep learning-based vision and language models to improve ecological reporting using data from camera traps. We introduce a two-stage system: YOLOv10-X to localise and classify species (mammals and birds) within images, and a Phi-3.5-vision-instruct model that reads the YOLOv10-X bounding-box labels to identify species, overcoming YOLO's limitations with hard-to-classify objects in images. Additionally, Phi-3.5 detects broader variables, such as vegetation type and time of day, providing rich ecological and environmental context to YOLO's species detection output. When combined, this output is processed by the model's natural language system to answer complex queries, and retrieval-augmented generation (RAG) is employed to enrich responses with external information, like species weight and IUCN status (information that cannot be obtained through direct visual analysis). This information is used to automatically generate structured reports, providing biodiversity stakeholders with deeper insights into, for example, species abundance, distribution, animal behaviour, and habitat selection. By delivering contextually rich narratives that aid wildlife management decisions, our approach not only reduces manual effort but also supports timely decision-making in conservation, potentially shifting efforts from reactive to proactive management.

Read More cs.CV cs.AI
Neural machine translation of seis…
Updated:
November 20, 2024
0
0
External Public

Effective structural assessment of urban infrastructure is essential for sustainable land use and resilience to climate change and natural hazards. Seismic wave methods are widely applied in these areas for subsurface characterization and monitoring, yet they often rely on time-consuming inversion techniques that fall short in delivering comprehensive geological, hydrogeological, and geomechanical descriptions. Here, we explore the effectiveness of a passive seismic approach coupled with artificial intelligence (AI) for monitoring geological structures and hydrogeological conditions in the context of sinkhole hazard assessment. We introduce a deterministic petrophysical inversion technique based on a language model that decodes seismic wave velocity measurements to infer soil petrophysical and mechanical parameters as textual descriptions. Results successfully delineate 3D subsurface structures with their respective soil nature and mechanical characteristics, while accurately predicting daily water table levels. Validation demonstrates high accuracy, with a normalized root mean square error of 8%, closely rivaling conventional stochastic seismic inversion methods, while delivering broader insights into subsurface conditions 2,000 times faster. These findings underscore the potential of advanced AI techniques to significantly enhance subsurface characterization across diverse scales, supporting decision-making for natural hazard mitigation.

Read More physics.geo-ph
Fast MRI for All: Bridging Equity …
Updated:
March 13, 2025
70
0
External Public

Physics-driven deep learning (PD-DL) approaches have become popular for improved reconstruction of fast magnetic resonance imaging (MRI) scans. Though PD-DL offers higher acceleration rates than existing clinical fast MRI techniques, their use has been limited outside specialized MRI centers. A key challenge is generalization to underrepresented pathologies or populations, noted in multiple studies, with fine-tuning on target populations suggested for improvement. However, current approaches for PD-DL training require access to raw k-space measurements, which is typically only available at specialized MRI centers that have research agreements for such data access. This is especially an issue for rural and underserved areas, where commercial MRI scanners only provide access to a final reconstructed image. To tackle these challenges, we propose Compressibility-inspired Unsupervised Learning via Parallel Imaging Fidelity (CUPID) for high-quality PD-DL training using only routine clinical reconstructed images exported from an MRI scanner. CUPID evaluates output quality with a compressibility-based approach while ensuring that the output stays consistent with the clinical parallel imaging reconstruction through well-designed perturbations. Our results show CUPID achieves similar quality to established PD-DL training that requires k-space data while outperforming compressed sensing (CS) and diffusion-based generative methods. We further demonstrate its effectiveness in a zero-shot training setup for retrospectively and prospectively sub-sampled acquisitions, attesting to its minimal training burden. As an approach that radically deviates from existing strategies, CUPID presents an opportunity to provide equitable access to fast MRI for underserved populations in an attempt to reduce the inequalities associated with this expensive imaging modality.

Read More eess.IV cs.AI More categories
First observations of a geomagneti…
Updated:
February 14, 2025
82
0
External Public

Forecasting the geomagnetic effects of solar coronal mass ejections (CMEs) is currently an unsolved problem. CMEs, responsible for the largest values of the north-south component of the interplanetary magnetic field, are the key driver of intense and extreme geomagnetic activity. Observations of southward interplanetary magnetic fields are currently only accessible directly through in situ measurements by spacecraft in the solar wind. On 10-12 May 2024, the strongest geomagnetic storm since 2003 took place, caused by five interacting CMEs. We clarify the relationship between the CMEs, their solar source regions, and the resulting signatures at the Sun-Earth L1 point observed by the ACE spacecraft at 1.00 AU. The STEREO-A spacecraft was situated at 0.956 AU and 12.6° west of Earth during the event, serving as a fortuitous sub-L1 monitor providing interplanetary magnetic field measurements of the solar wind. We demonstrate an extension of the prediction lead time, as the shock was observed 2.57 hours earlier at STEREO-A than at L1, consistent with the measured shock speed at L1, 710 km/s, and the radial distance of 0.04 AU. By deriving the geomagnetic indices based on the STEREO-A beacon data, we show that the strength of the geomagnetic storm would have been forecast reasonably well, with a modeled minimum SYM-H of -478.5 nT underestimating the observed minimum by only 8%. Our study sets an unprecedented benchmark for future mission design using upstream monitoring for space weather prediction.
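
The quoted lead time can be sanity-checked with a simple ballistic estimate (our back-of-the-envelope calculation, not the paper's analysis): a shock travelling at the measured 710 km/s crosses the 0.04 AU radial separation in roughly 2.3 hours, the same order as the 2.57-hour lead observed.

```python
AU_KM = 1.495979e8  # one astronomical unit in kilometres

def ballistic_lead_time_hours(separation_au, speed_km_s):
    """Travel time of a disturbance over a radial separation,
    assuming constant speed (a ballistic approximation)."""
    return separation_au * AU_KM / speed_km_s / 3600.0

lead_h = ballistic_lead_time_hours(0.04, 710.0)  # ~2.3 hours
```

The small remaining gap is expected, since the true lead time also depends on shock geometry and STEREO-A's longitudinal offset from Earth.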

Read More physics.space-ph
Near-real-time design of experimen…
Updated:
November 17, 2024
0
0
External Public

Monitoring the seismic activity of volcanoes is crucial for hazard assessment and eruption forecasting. The layout of each seismic network determines the information content of recorded data about volcanic earthquakes, and experimental design methods optimise sensor locations to maximise that information. We provide a code package that implements Bayesian experimental design to optimise seismometer networks to locate seismicity at any volcano, together with a practical guide that makes this easily and rapidly implementable by any volcano seismologist. This work is the first to optimise travel-time, amplitude and array source location methods simultaneously, making it suitable for a wide range of volcano monitoring scenarios. The code package is designed to be straightforward to use and adaptable to many settings, and it links automatically to existing global databases of topography and volcano properties to allow rapid deployment. Any user should be able to obtain an initial design within minutes using a combination of generic and volcano-specific information to guide the design process, and to refine the design for their specific scenario within hours if more specific prior information is available.
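
Bayesian experimental design of this kind typically maximises the expected information gain of a candidate sensor layout $d$ about the source parameters $\theta$; in its standard generic form (the textbook criterion, not the package's specific implementation):

```latex
% Expected information gain of design d: the expected KL divergence
% from prior to posterior over hypothetical data y.
\mathrm{EIG}(d) = \mathbb{E}_{y \sim p(y \mid d)}
  \left[ D_{\mathrm{KL}}\!\left( p(\theta \mid y, d) \,\Vert\, p(\theta) \right) \right]
```

The optimal network is the design $d$ maximising this quantity over candidate seismometer locations.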

Read More physics.geo-ph
Satellite monitoring of long perio…
Updated:
December 4, 2024
63
2
External Public

Satellite magnetic field observations have the potential to provide valuable information on dynamics, heat content and salinity throughout the ocean. Here we present the expected spatio-temporal characteristics of the ocean-induced magnetic field at satellite altitude on periods of months to decades. We compare these to the characteristics of other sources of Earth's magnetic field, and discuss whether it is feasible for the ocean-induced magnetic field to be retrieved and routinely monitored from space. We focus on large length scales (spherical harmonic degrees up to 30) and periods from one month up to five years. To characterize the expected ocean signal we make use of advanced numerical simulations taking high resolution oceanographic inputs and solve the magnetic induction equation in 3D including galvanic coupling and self induction effects. We find the time-varying ocean-induced signal dominates over the primary source of the internal field, the core dynamo, at high spherical harmonic degree with the cross-over taking place at degree 15 to 20 depending on the considered period. The ionospheric and magnetospheric fields (including their Earth induced counterparts) have most power on periods shorter than one month and are expected to be mostly zonal in magnetic coordinates at satellite altitude. Based on these findings we discuss future prospects for isolating and monitoring long period ocean induced magnetic field variations using data collected by present and upcoming magnetic survey satellites.
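
For reference, the magnetic induction equation the simulations solve takes, for uniform conductivity, the standard form (with $\mathbf{u}$ the ocean flow velocity, $\mathbf{B}$ the magnetic field, and $\eta = 1/(\mu_0 \sigma)$ the magnetic diffusivity):

```latex
\frac{\partial \mathbf{B}}{\partial t}
  = \nabla \times (\mathbf{u} \times \mathbf{B})
  + \eta\, \nabla^{2} \mathbf{B}
```

As the abstract notes, the 3D simulations additionally account for galvanic coupling and self-induction effects beyond this simplest form.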

Read More physics.geo-ph
Differentiable Land Model Reveals …
Updated:
November 14, 2024
0
0
External Public

Accurate modeling of terrestrial carbon and water exchange requires robust ecological parameters that capture vegetation responses and adaptations to the local environment. The current generation of land models use Plant Functional Types (PFTs) to discretize vegetation functional diversity, but these coarse categorizations often overlook fine-scale variations shaped by local climate, soil, and forest age factors. The lack of governing equations for plant adaptation demands a paradigm shift in how we integrate diverse Earth observations to uncover ecological functional dependence on changing environments. To address this challenge, we developed DifferLand, a differentiable, hybrid physics and machine learning model that infers the spatial distributions of ecological parameters and their relationships with environmental factors constrained by satellite and in-situ observations. Our model unifies top-down and bottom-up observational constraints with process-based knowledge to generate a global analysis of ecological functions and their adaptation to environmental gradients. We found PFTs account for less than half of the explainable spatial parameter variations controlling carbon fluxes and vegetation states. The remaining parameter variability is largely driven by local climate and forest demography factors, and the learned environment-parameter relationships lead to enhanced spatial generalization at unseen locations. DifferLand identified growing season length, leaf economics, and agricultural intensity as the three orthogonal spatial gradients underlying parameter variations. Our novel framework can lead to new insights on global carbon cycling by learning directly from data and expanding our understanding of local responses of ecosystems to environmental drivers.

Read More physics.geo-ph
Training Neural Networks as Recogn…
Updated:
March 17, 2025
56
2
External Public

Characterizing the computational power of neural network architectures in terms of formal language theory remains a crucial line of research, as it describes lower and upper bounds on the reasoning capabilities of modern AI. However, when empirically testing these bounds, existing work often leaves a discrepancy between experiments and the formal claims they are meant to support. The problem is that formal language theory pertains specifically to recognizers: machines that receive a string as input and classify whether it belongs to a language. On the other hand, it is common instead to evaluate language models on proxy tasks, e.g., language modeling or sequence-to-sequence transduction, that are similar in only an informal sense to the underlying theory. We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings, using a general method that can be applied to a wide variety of languages. As part of this, we extend an algorithm recently proposed by Snæbjarnarson et al. (2025) for efficient length-controlled sampling of strings from regular languages. We provide results on a variety of languages across the Chomsky hierarchy for three neural architectures: a simple RNN, an LSTM, and a causally-masked transformer. We find that the RNN and LSTM often outperform the transformer, and that auxiliary training objectives such as language modeling can help, although no single objective uniformly improves performance across languages and architectures. Our contributions will facilitate theoretically sound empirical testing of language recognition claims in future work. We have released our datasets as a benchmark called FLaRe (Formal Language Recognition), along with our code.
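
The distinction between training a recognizer and a language model can be made concrete with a toy dataset builder (our illustration, not the Snæbjarnarson et al. sampling algorithm): strings of a fixed length over a small alphabet, each labeled by membership in a simple regular language.

```python
import random

def in_language(s):
    """Membership test for a toy regular language:
    strings with an even number of 'a's."""
    return s.count("a") % 2 == 0

def sample_dataset(n_strings, length, seed=0):
    """Length-controlled sampling: every string has exactly `length`
    symbols, and each carries a binary membership label -- the
    supervision a recognizer needs, as opposed to the next-token
    targets a language model is trained on."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_strings):
        s = "".join(rng.choice("ab") for _ in range(length))
        data.append((s, in_language(s)))
    return data

data = sample_dataset(8, 5)
```

A recognizer (RNN, LSTM, or transformer with a classification head) would then be trained to map each string to its label.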

Read More cs.CL cs.LG
Recursive Interferometric Surface-…
Updated:
February 24, 2025
20
0
External Public

High-resolution seismic reflections are essential for imaging and monitoring applications. In seismic land surveys using sources and receivers at the surface, surface waves often dominate, masking the reflections. In this study, we demonstrate the efficacy of a two-step procedure to suppress surface waves in an active-source reflection seismic dataset. First, we apply seismic interferometry (SI) by cross-correlation, turning receivers into virtual sources to estimate the dominant surface waves. Then, we perform adaptive subtraction to minimise the difference between the surface waves in the original data and the result of SI. We propose a new approach where the initial suppression results are used for further iterations, followed by adaptive subtraction. This technique aims to enhance the efficacy of data-driven surface-wave suppression through an iterative process. We use a 2D seismic reflection dataset from Scheemda, situated in the Groningen province of the Netherlands, to illustrate the technique's efficiency. A comparison between the data after recursive interferometric surface-wave suppression and the original data across time and frequency-wavenumber domains shows significant suppression of the surface waves, enhancing visualization of the reflections for following subsurface imaging and monitoring studies.
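
The second step, adaptive subtraction, reduces in its simplest single-coefficient form to a least-squares match of the surface-wave estimate to the data (a sketch of the idea; practical implementations fit short filters per trace window):

```python
import numpy as np

def adaptive_subtract(data, model):
    """Subtract a least-squares-scaled surface-wave estimate `model`
    from `data`: a = <d, m> / <m, m> minimises ||d - a*m||^2, the
    simplest (single-coefficient) form of adaptive subtraction."""
    a = np.dot(data, model) / np.dot(model, model)
    return data - a * model

# synthetic trace: a reflection event plus a scaled surface-wave estimate
t = np.linspace(0.0, 1.0, 200)
reflection = np.exp(-((t - 0.5) ** 2) / 0.001)
surface_wave = np.sin(2 * np.pi * 8 * t)
trace = reflection + 3.0 * surface_wave
residual = adaptive_subtract(trace, surface_wave)  # ~= reflection
```

In the recursive scheme described above, the residual would be fed back as input to the next interferometric estimation pass.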

Read More physics.geo-ph
Advanced computer vision for extra…
Updated:
March 17, 2025
118
0
External Public

This paper presents a framework for extracting georeferenced vehicle trajectories from high-altitude drone imagery, addressing key challenges in urban traffic monitoring and the limitations of traditional ground-based systems. Our approach integrates several novel contributions, including a tailored object detector optimized for high-altitude bird's-eye view perspectives, a unique track stabilization method that uses detected vehicle bounding boxes as exclusion masks during image registration, and an orthophoto and master frame-based georeferencing strategy that enhances consistent alignment across multiple drone viewpoints. Additionally, our framework features robust vehicle dimension estimation and detailed road segmentation, enabling comprehensive traffic analysis. Conducted in the Songdo International Business District, South Korea, the study utilized a multi-drone experiment covering 20 intersections, capturing approximately 12TB of 4K video data over four days. The framework produced two high-quality datasets: the Songdo Traffic dataset, comprising approximately 700,000 unique vehicle trajectories, and the Songdo Vision dataset, containing over 5,000 human-annotated images with about 300,000 vehicle instances in four classes. Comparisons with high-precision sensor data from an instrumented probe vehicle highlight the accuracy and consistency of our extraction pipeline in dense urban environments. The public release of Songdo Traffic and Songdo Vision, and the complete source code for the extraction pipeline, establishes new benchmarks in data quality, reproducibility, and scalability in traffic research. Results demonstrate the potential of integrating drone technology with advanced computer vision for precise and cost-effective urban traffic monitoring, providing valuable resources for developing intelligent transportation systems and enhancing traffic management strategies.

Read More cs.CV cs.AI cs.LG
Optimizing Multi-Scale Representat…
Updated:
March 15, 2025
0
0
External Public

Earth Observation (EO) data are increasingly used in policy analysis by enabling granular estimation of conditional average treatment effects (CATE). However, a challenge in EO-based causal inference is determining the scale of the input satellite imagery -- balancing the trade-off between capturing fine-grained individual heterogeneity in smaller images and broader contextual information in larger ones. This paper introduces Multi-Scale Representation Concatenation, a set of composable procedures that transform arbitrary single-scale EO-based CATE estimation algorithms into multi-scale ones. We benchmark the performance of Multi-Scale Representation Concatenation on a CATE estimation pipeline that combines Vision Transformer (ViT) models (which encode images) with Causal Forests (CFs) to obtain CATE estimates from those encodings. We first perform simulation studies where the causal mechanism is known, showing that our multi-scale approach captures information relevant to effect heterogeneity that single-scale ViT models fail to capture as measured by $R^2$. We then apply the multi-scale method to two randomized controlled trials (RCTs) conducted in Peru and Uganda using Landsat satellite imagery. As we do not have access to ground truth CATEs in the RCT analysis, the Rank Average Treatment Effect Ratio (RATE Ratio) measure is employed to assess performance. Results indicate that Multi-Scale Representation Concatenation improves the performance of deep learning models in EO-based CATE estimation without the complexity of designing new multi-scale architectures for a specific use case. The application of Multi-Scale Representation Concatenation could have meaningful policy benefits -- e.g., potentially increasing the impact of poverty alleviation programs without additional resource expenditure.
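
In its simplest form, Multi-Scale Representation Concatenation stacks the embeddings a single-scale encoder produces at several image scales (sketch below; the `encode` stand-in, the scale values, and the 4-dimensional embedding are our assumptions, with a ViT playing the encoder role in the paper):

```python
import numpy as np

def encode(image, scale):
    """Stand-in for a single-scale image encoder: returns a
    deterministic pseudo-embedding so the sketch is self-contained."""
    rng = np.random.default_rng(scale)
    return rng.standard_normal(4)

def multi_scale_features(image, scales=(64, 256)):
    """Multi-Scale Representation Concatenation: encode the same
    location at several scales and concatenate the embeddings before
    handing them to the downstream CATE estimator (e.g. a Causal Forest)."""
    return np.concatenate([encode(image, s) for s in scales])

feats = multi_scale_features(None)  # 2 scales x 4 dims -> shape (8,)
```

The appeal is that any single-scale pipeline can be upgraded this way without designing a new multi-scale architecture.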

Read More stat.ML cs.LG I.4.7; I.4.9
A Theoretical Characterization of …
Updated:
February 1, 2025
0
0
External Public

Data augmentations play an important role in the recent success of Self-Supervised Learning (SSL). While commonly viewed as encoding invariances into the learned representations, this interpretation overlooks the impact of the pretraining architecture and suggests that SSL would require diverse augmentations which resemble the data to work well. However, these assumptions do not align with empirical evidence, encouraging further theoretical understanding to guide the principled design of augmentations in new domains. To this end, we use kernel theory to derive analytical expressions for data augmentations that achieve desired target representations after pretraining. We consider two popular non-contrastive losses, VICReg and Barlow Twins, and provide an algorithm to construct such augmentations. Our analysis shows that augmentations need not be similar to the data to learn useful representations, nor be diverse, and that the architecture has a significant impact on the optimal augmentations.

Read More cs.LG
The 10 October 2024 geomagnetic st…
Updated:
December 19, 2024
61
2
External Public

In this short communication, we qualitatively analyze possible effects of the 10 October 2024 geomagnetic storm on accelerating the reentry of a Starlink satellite from very low-Earth orbit (VLEO). The storm took place near the maximum of solar cycle (SC) 25, which has proven more intense than SC24. Based on preliminary geomagnetic indices, the 10 October 2024 and 10 May 2024 storms were the most intense events since the well-known Halloween storms of October/November 2003. By looking at a preliminary version of the Dst index and at altitudes and velocities extracted from two-line element (TLE) data of the Starlink-1089 (SL-1089) satellite, we observe a possible connection between the storm main phase onset and a sharp decay of SL-1089. The satellite was predicted to reenter on 22 October, but it reentered on 12 October, 10 days ahead of schedule. The sharp altitude decay of SL-1089 revealed by TLE data coincides with the storm main phase onset. We compare the de-orbiting altitudes of three other satellites during different geomagnetic conditions and observe that the day difference between actual and predicted reentries increases for periods with higher geomagnetic activity. Therefore, we call for future research to establish the eventual causal relationship between storm occurrence and satellite orbital decay. As predicted by previous works, SC25 is already producing extreme geomagnetic storms with unprecedented satellite orbital drag effects and consequences for current megaconstellations in VLEO.
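Altitudes like those discussed above can be recovered from the TLE mean motion via Kepler's third law. The sketch below is a standard two-body approximation (constants and names are ours, not from the paper) that ignores drag and Earth oblateness, so it is only indicative.

```python
import math

MU_EARTH = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
R_EARTH = 6.371e6           # mean Earth radius, m
SECONDS_PER_DAY = 86400.0

def altitude_from_mean_motion(revs_per_day):
    """Approximate orbital altitude (m) from a TLE mean motion.

    Kepler's third law gives the semi-major axis a = (mu / n^2)^(1/3)
    for mean motion n in rad/s; subtracting Earth's radius yields a
    rough altitude for a near-circular orbit.
    """
    n = revs_per_day * 2.0 * math.pi / SECONDS_PER_DAY  # rad/s
    semi_major_axis = (MU_EARTH / n ** 2) ** (1.0 / 3.0)
    return semi_major_axis - R_EARTH
```

As drag lowers an orbit, the mean motion in successive TLEs rises and this altitude estimate falls, which is the decay signature discussed for SL-1089.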

Read More physics.space-ph
LAM-YOLO: Drones-based Small Objec…
Updated:
November 1, 2024
58
0
External Public

Drone-based target detection presents inherent challenges, such as the high density and overlap of targets in drone-based images, as well as the blurriness of targets under varying lighting conditions, which complicates identification. Traditional methods often struggle to recognize numerous densely packed small targets against complex backgrounds. To address these challenges, we propose LAM-YOLO, an object detection model specifically designed for drone-based imagery. First, we introduce a light-occlusion attention mechanism to enhance the visibility of small targets under different lighting conditions. Meanwhile, we incorporate Involution modules to improve interaction among feature layers. Second, we utilize an improved SIB-IoU as the regression loss function to accelerate model convergence and enhance localization accuracy. Finally, we implement a novel detection strategy that introduces two auxiliary detection heads for identifying smaller-scale targets. Our quantitative results demonstrate that LAM-YOLO outperforms methods such as Faster R-CNN, YOLOv9, and YOLOv10 in terms of mAP@0.5 and mAP@0.5:0.95 on the VisDrone2019 public dataset. Compared to the original YOLOv8, the average precision increases by 7.1%. Additionally, the proposed SIB-IoU loss function converges faster during training and achieves higher average precision than the traditional loss function.
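The abstract does not give the SIB-IoU formula, so the sketch below shows only the plain IoU quantity that such regression losses refine, with loss = 1 - IoU for axis-aligned boxes. This is a common baseline, not the paper's loss; SIB-IoU adds further penalty terms on top of it.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred, target):
    """Basic IoU regression loss; IoU variants add extra penalties."""
    return 1.0 - iou(pred, target)
```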

Read More cs.CV
GPT or BERT: why not both?
Updated:
December 29, 2024
0
0
External Public

We present a simple way to merge masked language modeling with causal language modeling. This hybrid training objective results in a model that combines the strengths of both modeling paradigms within a single transformer stack: GPT-BERT can be transparently used like any standard causal or masked language model. We test the pretraining process that enables this flexible behavior on the BabyLM Challenge 2024. The results show that the hybrid pretraining outperforms masked-only or causal-only models. We openly release the models, training corpora and code.
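As an illustrative sketch of how the two objectives can share one data pipeline (the token handling here is our assumption; GPT-BERT's exact recipe is in the released code): causal examples pair each token with its successor, while masked examples replace a fraction of tokens and predict only those positions.

```python
import random

IGNORE = -100  # target positions that do not contribute to the loss

def make_causal_example(tokens):
    """Next-token prediction: inputs are tokens[:-1], targets tokens[1:]."""
    return tokens[:-1], tokens[1:]

def make_masked_example(tokens, mask_id, rng, mask_prob=0.15):
    """Masked prediction: each token is masked with probability
    mask_prob; only masked positions carry a real target."""
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(mask_id)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(IGNORE)
    return inputs, targets
```

A single transformer stack can then be trained on a mixture of both example types, with the attention mask (causal vs. bidirectional) chosen per batch.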

Read More cs.CL
NeFF-BioNet: Crop Biomass Predicti…
Updated:
October 30, 2024
59
0
External Public

Crop biomass offers crucial insights into plant health and yield, making it essential for crop science, farming systems, and agricultural research. However, current measurement methods, which are labor-intensive, destructive, and imprecise, hinder large-scale quantification of this trait. To address this limitation, we present a biomass prediction network (BioNet), designed for adaptation across different data modalities, including point clouds and drone imagery. Our BioNet, utilizing a sparse 3D convolutional neural network (CNN) and a transformer-based prediction module, processes point clouds and other 3D data representations to predict biomass. To further extend BioNet for drone imagery, we integrate a neural feature field (NeFF) module, enabling 3D structure reconstruction and the transformation of 2D semantic features from vision foundation models into the corresponding 3D surfaces. For the point cloud modality, BioNet demonstrates superior performance on two public datasets, with an approximate 6.1% relative improvement (RI) over the state-of-the-art. In the RGB image modality, the combination of BioNet and NeFF achieves a 7.9% RI. Additionally, the NeFF-based approach utilizes inexpensive, portable drone-mounted cameras, providing a scalable solution for large field applications.

Read More cs.CV
Show Me What and Where has Changed…
Updated:
November 13, 2024
46
1
External Public

Remote sensing change detection aims to perceive changes occurring on the Earth's surface from remote sensing data in different periods, and feed these changes back to humans. However, most existing methods only focus on detecting change regions, lacking the capability to interact with users to identify changes that the users expect. In this paper, we introduce a new task named Change Detection Question Answering and Grounding (CDQAG), which extends the traditional change detection task by providing interpretable textual answers and intuitive visual evidence. To this end, we construct the first CDQAG benchmark dataset, termed QAG-360K, comprising over 360K triplets of questions, textual answers, and corresponding high-quality visual masks. It encompasses 10 essential land-cover categories and 8 comprehensive question types, which provides a valuable and diverse dataset for remote sensing applications. Furthermore, we present VisTA, a simple yet effective baseline method that unifies the tasks of question answering and grounding by delivering both visual and textual answers. Our method achieves state-of-the-art results on both the classic change detection-based visual question answering (CDVQA) and the proposed CDQAG datasets. Extensive qualitative and quantitative experimental results provide useful insights for developing better CDQAG models, and we hope that our work can inspire further research in this important yet underexplored research field. The proposed benchmark dataset and method are available at https://github.com/like413/VisTA.

Read More cs.CV
A little less conversation, a litt…
Updated:
January 3, 2025
0
0
External Public

As general-purpose tools, Large Language Models (LLMs) must often reason about everyday physical environments. In a question-and-answer capacity, understanding the interactions of physical objects may be necessary to give appropriate responses. Moreover, LLMs are increasingly used as reasoning engines in agentic systems, designing and controlling their action sequences. The vast majority of research has tackled this issue using static benchmarks, comprised of text or image-based questions about the physical world. However, these benchmarks do not capture the complexity and nuance of real-life physical processes. Here we advocate for a second, relatively unexplored, approach: 'embodying' the LLMs by granting them control of an agent within a 3D environment. We present the first embodied and cognitively meaningful evaluation of physical common-sense reasoning in LLMs. Our framework allows direct comparison of LLMs with other embodied agents, such as those based on Deep Reinforcement Learning, and human and non-human animals. We employ the Animal-AI (AAI) environment, a simulated 3D virtual laboratory, to study physical common-sense reasoning in LLMs. For this, we use the AAI Testbed, a suite of experiments that replicate laboratory studies with non-human animals, to study physical reasoning capabilities including distance estimation, tracking out-of-sight objects, and tool use. We demonstrate that state-of-the-art multi-modal models with no finetuning can complete this style of task, allowing meaningful comparison to the entrants of the 2019 Animal-AI Olympics competition and to human children. Our results show that LLMs are currently outperformed by human children on these tasks. We argue that this approach allows the study of physical reasoning using ecologically valid experiments drawn directly from cognitive science, improving the predictability and reliability of LLMs.

Read More cs.AI
Bio-optical characterization using…
Updated:
October 30, 2024
0
0
External Public

In ocean colour remote sensing, radiance at the sensor level can be modeled using molecular scattering and particle scattering based on existing mathematical models and gaseous absorption in the atmosphere. The modulation of the light field by optical constituents within the seawater results in the spectral variation of water-leaving radiances, which can be related to phytoplankton pigment concentration, total suspended matter, vertical diffuse attenuation coefficients, etc. Atmospheric correction works very well over the open ocean using NIR channels of ocean colour sensors to retrieve geophysical products with reasonable accuracy, while it fails over sediment-laden and/or optically complex waters. To resolve this issue, a combination of SWIR or NIR-SWIR channels is configured in some ocean colour sensors such as Sentinel-3 OLCI and EOS-06 OCM. Ocean Colour Monitor (OCM)-3 on board EOS-06 was launched on Nov 26, 2022. It has 13 bands in the VNIR range (400-1010 nm) with a ~1500 km swath for ocean colour monitoring. The Arabian Sea near the Gujarat coast was chosen as our study site to showcase the geophysical products derived using OCM-3 onboard EOS-06.

Read More physics.ao-ph
Community search signatures as fou…
Updated:
October 30, 2024
28
1
External Public

Aggregated relative search frequencies offer a unique composite signal reflecting people's habits, concerns, interests, intents, and general information needs, which are not found in other readily available datasets. Temporal search trends have been successfully used in time series modeling across a variety of domains such as infectious diseases, unemployment rates, and retail sales. However, most existing applications require curating specialized datasets of individual keywords, queries, or query clusters, and the search data need to be temporally aligned with the outcome variable of interest. We propose a novel approach for generating an aggregated and anonymized representation of search interest as foundation features at the community level for geospatial modeling. We benchmark these features using spatial datasets across multiple domains. In zip codes with a population greater than 3000 that cover over 95% of the contiguous US population, our models for predicting missing values in a 20% set of holdout counties achieve an average $R^2$ score of 0.74 across 21 health variables, and 0.80 across 6 demographic and environmental variables. Our results demonstrate that these search features can be used for spatial predictions without strict temporal alignment, and that the resulting models outperform spatial interpolation and state-of-the-art methods using satellite imagery features.
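The $R^2$ scores reported above are the usual coefficient of determination; a minimal stand-alone version (not the paper's evaluation code) is:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot
```

A model that predicts the holdout mean everywhere scores 0, so values like 0.74 indicate the search features explain most of the held-out spatial variance.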

Read More cs.LG