260317 arxiv collection
A Generative Model of Conspicuous Consumption and Status Signaling
Status signaling drives human behavior and the allocation of scarce resources such as mating opportunities, yet the generative mechanisms governing how specific goods, signals, or behaviors acquire prestige remain a puzzle. Classical frameworks, such as Costly Signaling Theory, treat preferences as fixed and struggle to explain how semiotic meaning changes based on context or drifts dynamically over time, occasionally reaching tipping points. In this work, we propose a computational theory of st
Source: https://arxiv.org/abs/2603.13220v1
Automatic In-Domain Exemplar Construction and LLM-Based Refinement of Multi-LLM
Query expansion with large language models is promising but often relies on hand-crafted prompts, manually chosen exemplars, or a single LLM, making it non-scalable and sensitive to domain shift. We present an automated, domain-adaptive QE framework that builds in-domain exemplar pools by harvesting pseudo-relevant passages using a BM25-MonoT5 pipeline. A training-free cluster-based strategy selects diverse demonstrations, yielding strong and stable in-context QE without supervision. To further
Source: https://arxiv.org/abs/2602.08917v2
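The training-free selection step is concrete enough to sketch. Below is a minimal illustration, assuming the BM25-MonoT5 stage has already produced pseudo-relevant passages and their embeddings; the function name, the k-means choice, and the nearest-to-centroid rule are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of training-free, cluster-based exemplar selection:
# cluster the harvested passages and keep one representative per cluster
# so the in-context demonstrations stay diverse. (Illustrative only.)
import numpy as np
from sklearn.cluster import KMeans

def select_diverse_exemplars(passages, embeddings, k=8):
    """passages: list of strings; embeddings: (n, d) array."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    exemplars = []
    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[idx] - km.cluster_centers_[c], axis=1)
        exemplars.append(passages[idx[np.argmin(dists)]])  # nearest to centroid
    return exemplars
```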
Neuron-Aware Data Selection In Instruction Tuning For Large Language Models
Instruction Tuning (IT) has proven to be an effective approach for unlocking the powerful capabilities of large language models (LLMs). Recent studies indicate that excessive IT data can degrade LLM performance, while carefully selecting a small subset of high-quality IT data can significantly enhance their capabilities. Therefore, identifying the most effective data subset from an IT dataset for developing either specific or general abilities in LLMs has become a critical challenge.
Source: https://arxiv.org/abs/2603.13201v1
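The excerpt cuts off before the selection criterion, so the following is only one plausible reading of "neuron-aware": score each instruction example by the neurons it strongly activates and greedily pick a subset that covers diverse neurons. Everything here (the binarization rule, the greedy coverage objective) is a hypothetical sketch, not the paper's method.

```python
# Hypothetical neuron-coverage selection: greedily pick examples whose
# strongly-activated neurons are not yet covered by the chosen subset.
import numpy as np

def greedy_neuron_coverage(activations, budget, q=0.9):
    """activations: (n_examples, n_neurons) mean activation magnitudes."""
    active = activations > np.quantile(activations, q)  # binarize "strong"
    covered = np.zeros(active.shape[1], dtype=bool)
    chosen = []
    for _ in range(budget):
        gains = (active & ~covered).sum(axis=1)  # newly covered neurons
        best = int(np.argmax(gains))
        chosen.append(best)
        covered |= active[best]
    return chosen
```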
Superficial Safety Alignment Hypothesis
As large language models (LLMs) are increasingly integrated into various applications, ensuring they generate safe responses is a pressing need. Previous studies on alignment have largely focused on general instruction-following but have often overlooked the distinct properties of safety alignment, such as the brittleness of safety mechanisms. To bridge this gap, we propose the Superficial Safety Alignment Hypothesis (SSAH), which posits that safety alignment teaches an otherwise
Source: https://arxiv.org/abs/2410.10862v3
Large language models show fragile cognitive reasoning about human emotions
Affective computing seeks to support the holistic development of artificial intelligence by enabling machines to engage with human emotion. Recent foundation models, particularly large language models (LLMs), have been trained and evaluated on emotion-related tasks, typically using supervised learning with discrete emotion labels. Such evaluations largely focus on surface phenomena, such as recognizing expressed or evoked emotions, leaving open whether these systems reason about emotion in cogni
Source: https://arxiv.org/abs/2508.05880v2
From Experiments to Expertise: Scientific Knowledge Consolidation for AI-Driven
While large language models (LLMs) have transformed AI agents into proficient executors of computational materials science, performing a hundred simulations does not make a researcher. What distinguishes research from routine execution is the progressive accumulation of knowledge -- learning which approaches fail, recognizing patterns across systems, and applying understanding to new problems. However, the prevailing paradigm in AI-driven computational science treats each execution in isolation,
Source: https://arxiv.org/abs/2603.13191v1
LLM Constitutional Multi-Agent Governance
Large Language Models (LLMs) can generate persuasive influence strategies that shift cooperative behavior in multi-agent populations, but a critical question remains: does the resulting cooperation reflect genuine prosocial alignment, or does it mask erosion of agent autonomy, epistemic integrity, and distributional fairness? We introduce Constitutional Multi-Agent Governance (CMAG), a two-stage framework that interposes between an LLM policy compiler and a networked agent population, combining
Source: https://arxiv.org/abs/2603.13189v1
Semantic Invariance in Agentic AI
Large Language Models (LLMs) increasingly serve as autonomous reasoning agents in decision support, scientific problem-solving, and multi-agent coordination systems. However, deploying LLM agents in consequential applications requires assurance that their reasoning remains stable under semantically equivalent input variations, a property we term semantic invariance. Standard benchmark evaluations, which assess accuracy on fixed, canonical problem formulations, fail to capture this critical reliab
Source: https://arxiv.org/abs/2603.13173v1
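The property itself is easy to operationalize. A minimal sketch, assuming a hypothetical `ask(prompt)` callable for the agent and a list of semantically equivalent rephrasings of one problem; the majority-agreement score is an illustrative metric, not necessarily the paper's.

```python
# Invariance score = fraction of paraphrases that yield the modal answer;
# 1.0 means the agent's output is fully stable under rephrasing.
from collections import Counter

def semantic_invariance(ask, paraphrases):
    answers = [ask(p) for p in paraphrases]  # one answer per phrasing
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / len(answers)
```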
Developing and evaluating a chatbot to support maternal health care
The ability to provide trustworthy maternal health information using phone-based chatbots can have a significant impact, particularly in low-resource settings where users have low health literacy and limited access to care. However, deploying such systems is technically challenging: user queries are short, underspecified, and code-mixed across languages, answers require regional context-specific grounding, and partial or missing symptom context makes safe routing decisions difficult. We presen
Source: https://arxiv.org/abs/2603.13168v1
ESG-Bench: Benchmarking Long-Context ESG Reports for Hallucination Mitigation
As corporate responsibility increasingly incorporates environmental, social, and governance (ESG) criteria, ESG reporting is becoming a legal requirement in many regions and a key channel for documenting sustainability practices and assessing firms' long-term and ethical performance. However, the length and complexity of ESG disclosures make them difficult to interpret and their analysis hard to automate reliably. To support scalable and trustworthy analysis, this paper introduces ESG-Bench, a benchmark d
Source: https://arxiv.org/abs/2603.13154v1
RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation
The pursuit of robot generalists, agents capable of performing diverse tasks across diverse environments, demands rigorous and scalable evaluation. Yet real-world testing of robot policies remains fundamentally constrained: it is labor-intensive, slow, unsafe at scale, and difficult to reproduce. As policies expand in scope and complexity, these barriers only intensify, since defining "success" in robotics often hinges on nuanced human judgments of execution quality. We introduce RobotArena Infi
Source: https://arxiv.org/abs/2510.23571v2
Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis an
Open-world embodied agents must solve long-horizon tasks where the main bottleneck is not single-step planning quality but how interaction experience is organized and evolved. To this end, we present Steve-Evolving, a non-parametric self-evolving framework that tightly couples fine-grained execution diagnosis with dual-track knowledge distillation in a closed loop. The method follows three phases: Experience Anchoring, Experience Distillation, and Knowledge-Driven Closed-Loop Control. In detail,
Source: https://arxiv.org/abs/2603.13131v1
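The three named phases suggest a loop shape, sketched below with hypothetical callables (`agent.run`, `diagnose`, `distill`); the excerpt does not specify what each phase computes, so this is only a structural skeleton.

```python
# Skeleton of a non-parametric self-evolving loop: act with current
# knowledge, diagnose each trace, distill the diagnoses back into the
# knowledge store, and reuse it in the next round.
def self_evolve(agent, tasks, knowledge, diagnose, distill, n_rounds=10):
    for _ in range(n_rounds):
        traces = [agent.run(t, knowledge) for t in tasks]
        anchored = [diagnose(tr) for tr in traces]  # Experience Anchoring
        knowledge = distill(knowledge, anchored)    # Experience Distillation
    return knowledge  # closed loop: knowledge drives the next round's control
```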
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Mon
Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms. Yet principled methods to detect and quantify such behaviours are lacking. Classical definitions of steganography, and detection methods based on them, require a known reference distribution of non-steganographic signals. For the case of steganographic reasoning in LLMs, knowing such a reference distribution is not feasible; this renders these ap
Source: https://arxiv.org/abs/2602.23163v2
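For reference, the "classical definitions" the abstract alludes to typically follow Cachin's information-theoretic formulation: a stegosystem is $\varepsilon$-secure against a passive adversary when the stegotext distribution stays within $\varepsilon$ of the known covertext (reference) distribution in KL divergence:

$$
D_{\mathrm{KL}}\!\left(P_{\text{cover}} \,\|\, P_{\text{stego}}\right)
= \sum_{x} P_{\text{cover}}(x)\,\log\frac{P_{\text{cover}}(x)}{P_{\text{stego}}(x)}
\;\le\; \varepsilon,
$$

with $\varepsilon = 0$ giving perfect security. The abstract's point is that $P_{\text{cover}}$ is unknown for LLM reasoning traces, so this test cannot be applied directly.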
Dynamic Aware: Adaptive Multi-Mode Out-of-Distribution Detection for Trajectory
Trajectory prediction is central to the safe and seamless operation of autonomous vehicles (AVs). In deployment, however, prediction models inevitably face distribution shifts between training data and real-world conditions, where rare or underrepresented traffic scenarios induce out-of-distribution (OOD) cases. While most prior OOD detection research in AVs has concentrated on computer vision tasks such as object detection and segmentation, trajectory-level OOD detection remains largely underex
Source: https://arxiv.org/abs/2509.13577v2
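The excerpt ends before the method, but a common trajectory-level recipe makes the idea concrete: score each scene by the predictor's best-mode error and flag scores above a threshold fit on in-distribution validation data. The functions below are a hypothetical sketch, not the paper's detector.

```python
# Best-of-K mode error as an OOD score, with an adaptive quantile threshold.
import numpy as np

def min_mode_ade(pred_modes, gt):
    """pred_modes: (K, T, 2) candidate futures; gt: (T, 2) ground truth."""
    errs = np.linalg.norm(pred_modes - gt[None], axis=-1).mean(axis=-1)
    return errs.min()  # average displacement error of the best mode

def fit_threshold(val_scores, q=0.99):
    return np.quantile(val_scores, q)  # data-driven cutoff from ID scenes

def is_ood(score, threshold):
    return score > threshold
```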
RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline
If we cannot inspect the training data of a large language model (LLM), how can we ever know what it has seen? We believe the most compelling evidence arises when the model itself freely reproduces the target content. As such, we propose RECAP, an agentic pipeline designed to elicit and verify memorized training data from LLM outputs. At the heart of RECAP is a feedback-driven loop, where an initial extraction attempt is evaluated by a secondary language model, which compares the output against
Source: https://arxiv.org/abs/2510.25941v3
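The feedback-driven loop has a simple shape, sketched here with hypothetical `extract`, `judge`, and `refine` callables standing in for the target LLM, the secondary judge model, and prompt revision; details beyond the loop structure are assumptions.

```python
# Sketch of an extract-judge-refine loop: keep revising the elicitation
# prompt until the judge verifies the output against the reference text.
def recap_loop(prompt, reference, extract, judge, refine, max_iters=5):
    for _ in range(max_iters):
        output = extract(prompt)             # attempt to elicit the content
        feedback = judge(output, reference)  # secondary LM compares to target
        if feedback["match"]:
            return output                    # verified reproduction
        prompt = refine(prompt, feedback)    # revise prompt and retry
    return None                              # no verified match within budget
```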
DriveMind: A Dual Visual Language Model-based Reinforcement Learning Framework f
End-to-end autonomous driving systems map sensor data directly to control commands, but remain opaque, lack interpretability, and offer no formal safety guarantees. While recent vision-language-guided reinforcement learning (RL) methods introduce semantic feedback, they often rely on static prompts and fixed objectives, limiting adaptability to dynamic driving scenes. We present DriveMind, a unified semantic reward framework that integrates: (i) a contrastive Vision-Language Model (VLM) encoder
Source: https://arxiv.org/abs/2506.00819v2
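A contrastive VLM reward of the kind named in (i) can be sketched with an off-the-shelf CLIP-style encoder; the `open_clip` checkpoint, the two prompts, and the goal-minus-hazard scoring are illustrative assumptions, not DriveMind's actual design.

```python
# Semantic reward from image-text cosine similarity: reward alignment with
# a goal prompt, penalize alignment with a hazard prompt.
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def semantic_reward(frame, goal_text, hazard_text):
    """frame: a PIL image of the current driving scene."""
    with torch.no_grad():
        img = model.encode_image(preprocess(frame).unsqueeze(0))
        txt = model.encode_text(tokenizer([goal_text, hazard_text]))
        img = img / img.norm(dim=-1, keepdim=True)
        txt = txt / txt.norm(dim=-1, keepdim=True)
        sims = img @ txt.T                    # cosine similarity to each prompt
    return (sims[0, 0] - sims[0, 1]).item()  # goal similarity minus hazard
```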
Representation Learning for Spatiotemporal Physical Systems
Machine learning approaches to spatiotemporal physical systems have primarily focused on next-frame prediction, with the goal of learning an accurate emulator for the system's evolution in time. However, these emulators are computationally expensive to train and are subject to performance pitfalls, such as compounding errors during autoregressive rollout. In this work, we take a different perspective and look at scientific tasks further downstream of predicting the next frame, such as estimation
Source: https://arxiv.org/abs/2603.13227v1
Visual-ERM: Reward Modeling for Visual Equivalence
Vision-to-code tasks require models to reconstruct structured visual inputs, such as charts, tables, and SVGs, into executable or structured representations with high visual fidelity. While recent Large Vision Language Models (LVLMs) achieve strong results via supervised fine-tuning, reinforcement learning remains challenging due to misaligned reward signals. Existing rewards either rely on textual rules or coarse visual embedding similarity, both of which fail to capture fine-grained visual dis
Source: https://arxiv.org/abs/2603.13224v1
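To see why pixel-level feedback differs from coarse embedding similarity, here is a render-and-compare reward sketch; `render_svg` is a hypothetical rasterizer, and the paper's reward model is learned rather than this fixed metric.

```python
# Execute the generated code, rasterize it, and reward pixel agreement;
# unrenderable or mis-sized outputs earn zero reward.
import numpy as np

def visual_reward(pred_code, target_img, render_svg):
    try:
        pred_img = render_svg(pred_code)  # rasterize generated SVG/chart code
    except Exception:
        return 0.0
    if pred_img.shape != target_img.shape:
        return 0.0
    err = np.abs(pred_img.astype(float) - target_img.astype(float)).mean()
    return float(np.exp(-err / 255.0))    # squash MAE into (0, 1]
```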
Neural-Quantum-States Impurity Solver for Quantum Embedding Problems
Neural quantum states (NQS) have emerged as a promising approach to solve second-quantized Hamiltonians, because of their scalability and flexibility. In this work, we design and benchmark an NQS impurity solver for the quantum embedding (QE) methods, focusing on the ghost Gutzwiller Approximation (gGA) framework. We introduce a graph transformer-based NQS framework able to represent arbitrarily connected impurity orbitals of the embedding Hamiltonian (EH) and develop an error control mechanism
Source: https://arxiv.org/abs/2509.12431v2
PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization
Recent progress in text-conditioned human motion generation has been largely driven by diffusion models trained on large-scale human motion data. Building on this progress, recent methods attempt to transfer such models to character animation and real robot control by applying a Whole-Body Controller (WBC) that converts diffusion-generated motions into executable trajectories. While WBC trajectories become compliant with physics, they may exhibit substantial deviations from the original motion. To a
Source: https://arxiv.org/abs/2603.13228v1
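Preference optimization here presumably builds on the standard DPO objective, applied to motion pairs where the physics-compliant (WBC-executable) sample is preferred. A minimal sketch of that loss, with illustrative variable names:

```python
# Standard DPO loss: push the policy's log-prob margin between the
# preferred (w) and rejected (l) motion above the reference model's margin.
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Log-probs of each motion under the trained policy and a frozen
    reference model; all inputs are tensors of shape (batch,)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()
```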
Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
Prior approaches for membership privacy preservation usually update or retrain all weights in neural networks, which is costly and can lead to unnecessary utility loss or even more serious misalignment in predictions between training data and non-training data. In this work, we make three observations: i) privacy vulnerability exists in a very small fraction of weights; ii) however, most of those weights also critically impact utility performance; iii) the importance of weights stems from their
Source: https://arxiv.org/abs/2603.13186v1
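The excerpt is cut off before the weight-importance criterion, so the following is only a hypothetical proxy for locating privacy-critical weights: rank each weight by the saliency of the member/held-out loss gap. None of this is confirmed by the abstract.

```python
# Hypothetical per-weight vulnerability proxy: |gradient x weight| of the
# membership loss gap, computed once over the whole model.
import torch

def vulnerability_scores(model, loss_member, loss_holdout):
    gap = loss_member - loss_holdout  # membership signal (scalar tensor)
    params = list(model.parameters())
    grads = torch.autograd.grad(gap, params)
    return [(g * p).abs() for g, p in zip(grads, params)]
```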
MXNorm: Reusing MXFP block scales for efficient tensor normalisation
Matrix multiplication performance has long been the major bottleneck to scaling deep learning workloads, which has stimulated the design of new accelerators that use increasingly low-precision number formats. However, improvements in matrix multiplication performance have far outstripped improvements in performance on reductions and elementwise computations, which are still being performed in higher precision. In this work, we propose MXNorm, a drop-in replacement for RMSNorm that estimates the
Source: https://arxiv.org/abs/2603.13180v1
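RMSNorm divides by the root-mean-square of the activations; the proposal, as far as the excerpt goes, is to estimate that RMS from the per-block scales MX formats already store, avoiding a high-precision reduction over every element. The sketch below contrasts reference RMSNorm with one such block-scale estimate; how MXNorm actually forms its estimate is cut off above, so the approximation rule here is an assumption.

```python
# Reference RMSNorm vs. a coarse RMS estimate built only from per-block
# max-magnitude scales (MX-style blocks of 32 elements; vector length
# divisible by the block size assumed for simplicity).
import numpy as np

def rmsnorm(x, g, eps=1e-6):
    return g * x / np.sqrt(np.mean(x * x) + eps)

def block_scale_rms_estimate(x, block=32):
    scales = np.abs(x.reshape(-1, block)).max(axis=1)  # one scale per block
    return np.sqrt(np.mean(scales ** 2))               # coarse RMS proxy
```

Note that a max-based scale overestimates the true RMS, so a correction factor would be needed in practice.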
Clustering Astronomical Orbital Synthetic Data Using Advanced Feature Extraction
The dynamics of Saturn's satellite system offer a rich framework for studying orbital stability and resonance interactions. Traditional methods for analysing such systems, including Fourier analysis and stability metrics, struggle with the scale and complexity of modern datasets. This study introduces a machine learning-based pipeline for clustering approximately 22,300 simulated satellite orbits, addressing these challenges with advanced feature extraction and dimensionality reduction technique
Source: https://arxiv.org/abs/2603.13177v1
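The pipeline shape (features, then dimensionality reduction, then clustering) is generic enough to sketch; the feature choices and algorithm picks below (standardization, PCA, k-means) are illustrative, not necessarily the paper's.

```python
# Generic orbit-clustering pipeline: standardize per-orbit summary
# features, reduce dimensionality, then cluster.
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def cluster_orbits(features, n_components=10, n_clusters=8):
    """features: (n_orbits, n_features), e.g. dominant frequencies,
    eccentricity statistics, and stability indicators per orbit."""
    z = StandardScaler().fit_transform(features)
    reduced = PCA(n_components=n_components).fit_transform(z)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(reduced)
    return labels, reduced
```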
Related notes
- [[260324_arxiv]]
- [[260314_arxiv]] — similar keywords
- [[260318_arxiv]] — similar keywords
- [[260316_x]] — similar keywords