260317 arxiv collection
A Generative Model of Conspicuous Consumption and Status Signaling
Status signaling drives human behavior and the allocation of scarce resources such as mating opportunities, yet the generative mechanisms governing how specific goods, signals, or behaviors acquire prestige remain a puzzle. Classical frameworks, such as Costly Signaling Theory, treat preferences as fixed and struggle to explain how semiotic meaning changes based on context or drifts dynamically over time, occasionally reaching tipping points. In this work, we propose a computational theory of st
Source: https://arxiv.org/abs/2603.13220v1
Automatic In-Domain Exemplar Construction and LLM-Based Refinement of Multi-LLM
Query expansion with large language models is promising but often relies on hand-crafted prompts, manually chosen exemplars, or a single LLM, making it non-scalable and sensitive to domain shift. We present an automated, domain-adaptive QE framework that builds in-domain exemplar pools by harvesting pseudo-relevant passages using a BM25-MonoT5 pipeline. A training-free cluster-based strategy selects diverse demonstrations, yielding strong and stable in-context QE without supervision. To further
Source: https://arxiv.org/abs/2602.08917v2
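The training-free selection step is concrete enough to sketch. Below is a minimal illustration, assuming the BM25-MonoT5 stage has already produced pseudo-relevant passages and their embeddings; the function name, the k-means choice, and the nearest-to-centroid rule are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of training-free, cluster-based exemplar selection:
# cluster the harvested passages and keep one representative per cluster
# so the in-context demonstrations stay diverse. (Illustrative only.)
import numpy as np
from sklearn.cluster import KMeans

def select_diverse_exemplars(passages, embeddings, k=8):
    """passages: list of strings; embeddings: (n, d) array."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    exemplars = []
    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[idx] - km.cluster_centers_[c], axis=1)
        exemplars.append(passages[idx[np.argmin(dists)]])  # nearest to centroid
    return exemplars
```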
Neuron-Aware Data Selection In Instruction Tuning For Large Language Models
Instruction Tuning (IT) has proven to be an effective approach for unlocking the powerful capabilities of large language models (LLMs). Recent studies indicate that excessive IT data can degrade LLM performance, while carefully selecting a small subset of high-quality IT data can significantly enhance their capabilities. Therefore, identifying the most effective data subset from an IT dataset for developing either specific or general abilities in LLMs has become a critical challenge.
Source: https://arxiv.org/abs/2603.13201v1
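The excerpt cuts off before the selection criterion, so the following is only one plausible reading of "neuron-aware": score each instruction example by the neurons it strongly activates and greedily pick a subset that covers diverse neurons. Everything here (the binarization rule, the greedy coverage objective) is a hypothetical sketch, not the paper's method.

```python
# Hypothetical neuron-coverage selection: greedily pick examples whose
# strongly-activated neurons are not yet covered by the chosen subset.
import numpy as np

def greedy_neuron_coverage(activations, budget, q=0.9):
    """activations: (n_examples, n_neurons) mean activation magnitudes."""
    active = activations > np.quantile(activations, q)  # binarize "strong"
    covered = np.zeros(active.shape[1], dtype=bool)
    chosen = []
    for _ in range(budget):
        gains = (active & ~covered).sum(axis=1)  # newly covered neurons
        best = int(np.argmax(gains))
        chosen.append(best)
        covered |= active[best]
    return chosen
```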
Superficial Safety Alignment Hypothesis
As large language models (LLMs) are increasingly integrated into various applications, ensuring they generate safe responses is a pressing need. Previous studies on alignment have largely focused on general instruction-following but have often overlooked the distinct properties of safety alignment, such as the brittleness of safety mechanisms. To bridge this gap, we propose the Superficial Safety Alignment Hypothesis (SSAH), which posits that safety alignment teaches an otherwise
Source: https://arxiv.org/abs/2410.10862v3
Large language models show fragile cognitive reasoning about human emotions
Affective computing seeks to support the holistic development of artificial intelligence by enabling machines to engage with human emotion. Recent foundation models, particularly large language models (LLMs), have been trained and evaluated on emotion-related tasks, typically using supervised learning with discrete emotion labels. Such evaluations largely focus on surface phenomena, such as recognizing expressed or evoked emotions, leaving open whether these systems reason about emotion in cogni
Source: https://arxiv.org/abs/2508.05880v2
From Experiments to Expertise: Scientific Knowledge Consolidation for AI-Driven
While large language models (LLMs) have transformed AI agents into proficient executors of computational materials science, performing a hundred simulations does not make a researcher. What distinguishes research from routine execution is the progressive accumulation of knowledge -- learning which approaches fail, recognizing patterns across systems, and applying understanding to new problems. However, the prevailing paradigm in AI-driven computational science treats each execution in isolation,
Source: https://arxiv.org/abs/2603.13191v1
LLM Constitutional Multi-Agent Governance
Large Language Models (LLMs) can generate persuasive influence strategies that shift cooperative behavior in multi-agent populations, but a critical question remains: does the resulting cooperation reflect genuine prosocial alignment, or does it mask erosion of agent autonomy, epistemic integrity, and distributional fairness? We introduce Constitutional Multi-Agent Governance (CMAG), a two-stage framework that interposes between an LLM policy compiler and a networked agent population, combining
Source: https://arxiv.org/abs/2603.13189v1
Semantic Invariance in Agentic AI
Large Language Models (LLMs) increasingly serve as autonomous reasoning agents in decision support, scientific problem-solving, and multi-agent coordination systems. However, deploying LLM agents in consequential applications requires assurance that their reasoning remains stable under semantically equivalent input variations, a property we term semantic invariance. Standard benchmark evaluations, which assess accuracy on fixed, canonical problem formulations, fail to capture this critical reliab
Source: https://arxiv.org/abs/2603.13173v1
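The property itself is easy to operationalize. A minimal sketch, assuming a hypothetical `ask(prompt)` callable for the agent and a list of semantically equivalent rephrasings of one problem; the majority-agreement score is an illustrative metric, not necessarily the paper's.

```python
# Invariance score = fraction of paraphrases that yield the modal answer;
# 1.0 means the agent's output is fully stable under rephrasing.
from collections import Counter

def semantic_invariance(ask, paraphrases):
    answers = [ask(p) for p in paraphrases]  # one answer per phrasing
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / len(answers)
```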
Developing and evaluating a chatbot to support maternal health care
The ability to provide trustworthy maternal health information using phone-based chatbots can have a significant impact, particularly in low-resource settings where users have low health literacy and limited access to care. However, deploying such systems is technically challenging: user queries are short, underspecified, and code-mixed across languages, answers require regional context-specific grounding, and partial or missing symptom context makes safe routing decisions difficult. We presen
Source: https://arxiv.org/abs/2603.13168v1
ESG-Bench: Benchmarking Long-Context ESG Reports for Hallucination Mitigation
As corporate responsibility increasingly incorporates environmental, social, and governance (ESG) criteria, ESG reporting is becoming a legal requirement in many regions and a key channel for documenting sustainability practices and assessing firms' long-term and ethical performance. However, the length and complexity of ESG disclosures make them difficult to interpret and their analysis hard to automate reliably. To support scalable and trustworthy analysis, this paper introduces ESG-Bench, a benchmark d
Source: https://arxiv.org/abs/2603.13154v1
RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation
The pursuit of robot generalists, agents capable of performing diverse tasks across diverse environments, demands rigorous and scalable evaluation. Yet real-world testing of robot policies remains fundamentally constrained: it is labor-intensive, slow, unsafe at scale, and difficult to reproduce. As policies expand in scope and complexity, these barriers only intensify, since defining "success" in robotics often hinges on nuanced human judgments of execution quality. We introduce RobotArena Infi
Source: https://arxiv.org/abs/2510.23571v2
Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis an
Open-world embodied agents must solve long-horizon tasks where the main bottleneck is not single-step planning quality but how interaction experience is organized and evolved. To this end, we present Steve-Evolving, a non-parametric self-evolving framework that tightly couples fine-grained execution diagnosis with dual-track knowledge distillation in a closed loop. The method follows three phases: Experience Anchoring, Experience Distillation, and Knowledge-Driven Closed-Loop Control. In detail,
Source: https://arxiv.org/abs/2603.13131v1
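The three named phases suggest a loop shape, sketched below with hypothetical callables (`agent.run`, `diagnose`, `distill`); the excerpt does not specify what each phase computes, so this is only a structural skeleton.

```python
# Skeleton of a non-parametric self-evolving loop: act with current
# knowledge, diagnose each trace, distill the diagnoses back into the
# knowledge store, and reuse it in the next round.
def self_evolve(agent, tasks, knowledge, diagnose, distill, n_rounds=10):
    for _ in range(n_rounds):
        traces = [agent.run(t, knowledge) for t in tasks]
        anchored = [diagnose(tr) for tr in traces]  # Experience Anchoring
        knowledge = distill(knowledge, anchored)    # Experience Distillation
    return knowledge  # closed loop: knowledge drives the next round's control
```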
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Mon
Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms. Yet principled methods to detect and quantify such behaviours are lacking. Classical definitions of steganography, and detection methods based on them, require a known reference distribution of non-steganographic signals. For the case of steganographic reasoning in LLMs, knowing such a reference distribution is not feasible; this renders these ap
Source: https://arxiv.org/abs/2602.23163v2
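For reference, the "classical definitions" the abstract alludes to typically follow Cachin's information-theoretic formulation: a stegosystem is $\varepsilon$-secure against a passive adversary when the stegotext distribution stays within $\varepsilon$ of the known covertext (reference) distribution in KL divergence:

$$
D_{\mathrm{KL}}\!\left(P_{\text{cover}} \,\|\, P_{\text{stego}}\right)
= \sum_{x} P_{\text{cover}}(x)\,\log\frac{P_{\text{cover}}(x)}{P_{\text{stego}}(x)}
\;\le\; \varepsilon,
$$

with $\varepsilon = 0$ giving perfect security. The abstract's point is that $P_{\text{cover}}$ is unknown for LLM reasoning traces, so this test cannot be applied directly.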
Dynamic Aware: Adaptive Multi-Mode Out-of-Distribution Detection for Trajectory
Trajectory prediction is central to the safe and seamless operation of autonomous vehicles (AVs). In deployment, however, prediction models inevitably face distribution shifts between training data and real-world conditions, where rare or underrepresented traffic scenarios induce out-of-distribution (OOD) cases. While most prior OOD detection research in AVs has concentrated on computer vision tasks such as object detection and segmentation, trajectory-level OOD detection remains largely underex
Source: https://arxiv.org/abs/2509.13577v2
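The excerpt ends before the method, but a common trajectory-level recipe makes the idea concrete: score each scene by the predictor's best-mode error and flag scores above a threshold fit on in-distribution validation data. The functions below are a hypothetical sketch, not the paper's detector.

```python
# Best-of-K mode error as an OOD score, with an adaptive quantile threshold.
import numpy as np

def min_mode_ade(pred_modes, gt):
    """pred_modes: (K, T, 2) candidate futures; gt: (T, 2) ground truth."""
    errs = np.linalg.norm(pred_modes - gt[None], axis=-1).mean(axis=-1)
    return errs.min()  # average displacement error of the best mode

def fit_threshold(val_scores, q=0.99):
    return np.quantile(val_scores, q)  # data-driven cutoff from ID scenes

def is_ood(score, threshold):
    return score > threshold
```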
RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline
If we cannot inspect the training data of a large language model (LLM), how can we ever know what it has seen? We believe the most compelling evidence arises when the model itself freely reproduces the target content. As such, we propose RECAP, an agentic pipeline designed to elicit and verify memorized training data from LLM outputs. At the heart of RECAP is a feedback-driven loop, where an initial extraction attempt is evaluated by a secondary language model, which compares the output against
Source: https://arxiv.org/abs/2510.25941v3
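The feedback-driven loop has a simple shape, sketched here with hypothetical `extract`, `judge`, and `refine` callables standing in for the target LLM, the secondary judge model, and prompt revision; details beyond the loop structure are assumptions.

```python
# Sketch of an extract-judge-refine loop: keep revising the elicitation
# prompt until the judge verifies the output against the reference text.
def recap_loop(prompt, reference, extract, judge, refine, max_iters=5):
    for _ in range(max_iters):
        output = extract(prompt)             # attempt to elicit the content
        feedback = judge(output, reference)  # secondary LM compares to target
        if feedback["match"]:
            return output                    # verified reproduction
        prompt = refine(prompt, feedback)    # revise prompt and retry
    return None                              # no verified match within budget
```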
DriveMind: A Dual Visual Language Model-based Reinforcement Learning Framework f
End-to-end autonomous driving systems map sensor data directly to control commands, but remain opaque, lack interpretability, and offer no formal safety guarantees. While recent vision-language-guided reinforcement learning (RL) methods introduce semantic feedback, they often rely on static prompts and fixed objectives, limiting adaptability to dynamic driving scenes. We present DriveMind, a unified semantic reward framework that integrates: (i) a contrastive Vision-Language Model (VLM) encoder
Source: https://arxiv.org/abs/2506.00819v2
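A contrastive VLM reward of the kind named in (i) can be sketched with an off-the-shelf CLIP-style encoder; the `open_clip` checkpoint, the two prompts, and the goal-minus-hazard scoring are illustrative assumptions, not DriveMind's actual design.

```python
# Semantic reward from image-text cosine similarity: reward alignment with
# a goal prompt, penalize alignment with a hazard prompt.
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def semantic_reward(frame, goal_text, hazard_text):
    """frame: a PIL image of the current driving scene."""
    with torch.no_grad():
        img = model.encode_image(preprocess(frame).unsqueeze(0))
        txt = model.encode_text(tokenizer([goal_text, hazard_text]))
        img = img / img.norm(dim=-1, keepdim=True)
        txt = txt / txt.norm(dim=-1, keepdim=True)
        sims = img @ txt.T                    # cosine similarity to each prompt
    return (sims[0, 0] - sims[0, 1]).item()  # goal similarity minus hazard
```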
Representation Learning for Spatiotemporal Physical Systems
Machine learning approaches to spatiotemporal physical systems have primarily focused on next-frame prediction, with the goal of learning an accurate emulator for the system's evolution in time. However, these emulators are computationally expensive to train and are subject to performance pitfalls, such as compounding errors during autoregressive rollout. In this work, we take a different perspective and look at scientific tasks further downstream of predicting the next frame, such as estimation
Source: https://arxiv.org/abs/2603.13227v1
Visual-ERM: Reward Modeling for Visual Equivalence
Vision-to-code tasks require models to reconstruct structured visual inputs, such as charts, tables, and SVGs, into executable or structured representations with high visual fidelity. While recent Large Vision Language Models (LVLMs) achieve strong results via supervised fine-tuning, reinforcement learning remains challenging due to misaligned reward signals. Existing rewards either rely on textual rules or coarse visual embedding similarity, both of which fail to capture fine-grained visual dis
Source: https://arxiv.org/abs/2603.13224v1
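To see why pixel-level feedback differs from coarse embedding similarity, here is a render-and-compare reward sketch; `render_svg` is a hypothetical rasterizer, and the paper's reward model is learned rather than this fixed metric.

```python
# Execute the generated code, rasterize it, and reward pixel agreement;
# unrenderable or mis-sized outputs earn zero reward.
import numpy as np

def visual_reward(pred_code, target_img, render_svg):
    try:
        pred_img = render_svg(pred_code)  # rasterize generated SVG/chart code
    except Exception:
        return 0.0
    if pred_img.shape != target_img.shape:
        return 0.0
    err = np.abs(pred_img.astype(float) - target_img.astype(float)).mean()
    return float(np.exp(-err / 255.0))    # squash MAE into (0, 1]
```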
Neural-Quantum-States Impurity Solver for Quantum Embedding Problems
Neural quantum states (NQS) have emerged as a promising approach to solve second-quantized Hamiltonians, because of their scalability and flexibility. In this work, we design and benchmark an NQS impurity solver for the quantum embedding (QE) methods, focusing on the ghost Gutzwiller Approximation (gGA) framework. We introduce a graph transformer-based NQS framework able to represent arbitrarily connected impurity orbitals of the embedding Hamiltonian (EH) and develop an error control mechanism
Source: https://arxiv.org/abs/2509.12431v2
PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization
Recent progress in text-conditioned human motion generation has been largely driven by diffusion models trained on large-scale human motion data. Building on this progress, recent methods attempt to transfer such models to character animation and real robot control by applying a Whole-Body Controller (WBC) that converts diffusion-generated motions into executable trajectories. While WBC trajectories become compliant with physics, they may exhibit substantial deviations from the original motion. To a
Source: https://arxiv.org/abs/2603.13228v1
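Preference optimization here presumably builds on the standard DPO objective, applied to motion pairs where the physics-compliant (WBC-executable) sample is preferred. A minimal sketch of that loss, with illustrative variable names:

```python
# Standard DPO loss: push the policy's log-prob margin between the
# preferred (w) and rejected (l) motion above the reference model's margin.
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Log-probs of each motion under the trained policy and a frozen
    reference model; all inputs are tensors of shape (batch,)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()
```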
Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
Prior approaches for membership privacy preservation usually update or retrain all weights in neural networks, which is costly and can lead to unnecessary utility loss or even more serious misalignment in predictions between training data and non-training data. In this work, we make three observations: i) privacy vulnerability exists in a very small fraction of weights; ii) however, most of those weights also critically impact utility performance; iii) the importance of weights stems from their
Source: https://arxiv.org/abs/2603.13186v1
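The excerpt is cut off before the weight-importance criterion, so the following is only a hypothetical proxy for locating privacy-critical weights: rank each weight by the saliency of the member/held-out loss gap. None of this is confirmed by the abstract.

```python
# Hypothetical per-weight vulnerability proxy: |gradient x weight| of the
# membership loss gap, computed once over the whole model.
import torch

def vulnerability_scores(model, loss_member, loss_holdout):
    gap = loss_member - loss_holdout  # membership signal (scalar tensor)
    params = list(model.parameters())
    grads = torch.autograd.grad(gap, params)
    return [(g * p).abs() for g, p in zip(grads, params)]
```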
MXNorm: Reusing MXFP block scales for efficient tensor normalisation
Matrix multiplication performance has long been the major bottleneck to scaling deep learning workloads, which has stimulated the design of new accelerators that use increasingly low-precision number formats. However, improvements in matrix multiplication performance have far outstripped improvements in performance on reductions and elementwise computations, which are still being performed in higher precision. In this work, we propose MXNorm, a drop-in replacement for RMSNorm that estimates the
Source: https://arxiv.org/abs/2603.13180v1
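RMSNorm divides by the root-mean-square of the activations; the proposal, as far as the excerpt goes, is to estimate that RMS from the per-block scales MX formats already store, avoiding a high-precision reduction over every element. The sketch below contrasts reference RMSNorm with one such block-scale estimate; how MXNorm actually forms its estimate is cut off above, so the approximation rule here is an assumption.

```python
# Reference RMSNorm vs. a coarse RMS estimate built only from per-block
# max-magnitude scales (MX-style blocks of 32 elements; vector length
# divisible by the block size assumed for simplicity).
import numpy as np

def rmsnorm(x, g, eps=1e-6):
    return g * x / np.sqrt(np.mean(x * x) + eps)

def block_scale_rms_estimate(x, block=32):
    scales = np.abs(x.reshape(-1, block)).max(axis=1)  # one scale per block
    return np.sqrt(np.mean(scales ** 2))               # coarse RMS proxy
```

Note that a max-based scale overestimates the true RMS, so a correction factor would be needed in practice.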
Clustering Astronomical Orbital Synthetic Data Using Advanced Feature Extraction
The dynamics of Saturn's satellite system offer a rich framework for studying orbital stability and resonance interactions. Traditional methods for analysing such systems, including Fourier analysis and stability metrics, struggle with the scale and complexity of modern datasets. This study introduces a machine learning-based pipeline for clustering approximately 22,300 simulated satellite orbits, addressing these challenges with advanced feature extraction and dimensionality reduction technique
Source: https://arxiv.org/abs/2603.13177v1
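The pipeline shape (features, then dimensionality reduction, then clustering) is generic enough to sketch; the feature choices and algorithm picks below (standardization, PCA, k-means) are illustrative, not necessarily the paper's.

```python
# Generic orbit-clustering pipeline: standardize per-orbit summary
# features, reduce dimensionality, then cluster.
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def cluster_orbits(features, n_components=10, n_clusters=8):
    """features: (n_orbits, n_features), e.g. dominant frequencies,
    eccentricity statistics, and stability indicators per orbit."""
    z = StandardScaler().fit_transform(features)
    reduced = PCA(n_components=n_components).fit_transform(z)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(reduced)
    return labels, reduced
```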
Related notes
- [[260324_arxiv]]
- [[260314_arxiv]] — similar keywords
- [[260318_arxiv]] — similar keywords
- [[260316_x]] — similar keywords