virtual-insanity

260319 arxiv (31 papers)

seedling aggregate 2026-03-19

260319 arxiv collection

Efficient Reasoning on the Edge

Original

Large language models (LLMs) with chain-of-thought reasoning achieve state-of-the-art performance across complex problem-solving tasks, but their verbose reasoning traces and large context requirements make them impractical for edge deployment. These challenges include high token generation costs, large KV-cache footprints, and inefficiencies when distilling reasoning capabilities into smaller models for mobile devices. Existing approaches often rely on distilling reasoning traces from larger mo

Source: https://arxiv.org/abs/2603.16867v1


Chronos: Temporal-Aware Conversational Agents with Structured Event Retrieval fo

Original

Recent advances in Large Language Models (LLMs) have enabled conversational AI agents to engage in extended multi-turn interactions spanning weeks or months. However, existing memory systems struggle to reason over temporally grounded facts and preferences that evolve across months of interaction and lack effective retrieval strategies for multi-hop, time-sensitive queries over long dialogue histories. We introduce Chronos, a novel temporal-aware memory framework that decomposes raw dialogue int

Source: https://arxiv.org/abs/2603.16862v1


Mediocrity is the key for LLM as a Judge Anchor Selection

Original

The "LLM-as-a-judge" paradigm has become a standard method for evaluating open-ended generation. To address the quadratic scalability costs of pairwise comparisons, popular benchmarks like Arena-Hard and AlpacaEval compare all models against a single anchor. However, despite its widespread use, the impact of anchor selection on the reliability of the results remains largely unexplored. In this work, we systematically investigate the effect of anchor selection by evaluating 22 different anchors
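
The scalability argument is easy to make concrete: judging every unordered pair of n models costs n(n-1)/2 comparisons, while anchor-based benchmarking costs n-1. A minimal sketch (the model names and anchor choice are hypothetical, not from the paper):

```python
from itertools import combinations

models = [f"model_{i:02d}" for i in range(22)]   # 22 candidates, as in the paper
anchor = models[0]                               # hypothetical anchor choice

# Full pairwise evaluation: every unordered pair must be judged.
pairwise = len(list(combinations(models, 2)))    # n * (n - 1) / 2

# Anchor-based evaluation (Arena-Hard / AlpacaEval style): each model
# is judged only against the single fixed anchor.
anchored = sum(1 for m in models if m != anchor)

print(pairwise, anchored)   # 231 21
```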

Source: https://arxiv.org/abs/2603.16848v1


Dynamic Meta-Layer Aggregation for Byzantine-Robust Federated Learning

Original

Federated Learning (FL) is increasingly applied in sectors like healthcare, finance, and IoT, enabling collaborative model training while safeguarding user privacy. However, FL systems are susceptible to Byzantine adversaries that inject malicious updates, which can severely compromise global model performance. Existing defenses tend to focus on specific attack types and fail against untargeted strategies, such as multi-label flipping or combinations of noise and backdoor patterns. To overcome t
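
The excerpt does not detail the dynamic meta-layer aggregator itself, but the kind of baseline such defenses are measured against is easy to illustrate: coordinate-wise median aggregation, a classic Byzantine-robust rule. A toy sketch with made-up client updates (not the paper's method):

```python
import statistics

def coordinate_median(updates):
    """Aggregate client updates by the per-coordinate median, a classic
    Byzantine-robust baseline (illustration only, not the paper's method)."""
    return [statistics.median(coords) for coords in zip(*updates)]

honest = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9]]
malicious = [[100.0, -100.0]]   # one Byzantine client injects a huge update

mean = [sum(c) / len(c) for c in zip(*(honest + malicious))]
robust = coordinate_median(honest + malicious)
print(mean)     # the plain average is dragged far off by the attacker
print(robust)   # the median stays near the honest consensus
```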

Source: https://arxiv.org/abs/2603.16846v1


Learning to Present: Inverse Specification Rewards for Agentic Slide Generation

Original

Automated presentation generation remains a challenging task requiring coherent content creation, visual design, and audience-aware communication. This work proposes an OpenEnv-compatible reinforcement learning environment where LLM agents learn to research topics, plan content, and generate professional HTML slide presentations through tool use. We introduce a multi-component reward system combining structural validation, render quality assessment, LLM-based aesthetic scoring, content quality m

Source: https://arxiv.org/abs/2603.16839v1


Prompt Programming for Cultural Bias and Alignment of Large Language Models

Original

Culture shapes reasoning, values, prioritization, and strategic decision-making, yet large language models (LLMs) often exhibit cultural biases that misalign with target populations. As LLMs are increasingly used for strategic decision-making, policy support, and document engineering tasks such as summarization, categorization, and compliance-oriented auditing, improving cultural alignment is important for ensuring that downstream analyses and recommendations reflect target-population value prof

Source: https://arxiv.org/abs/2603.16827v1


Exploring Collatz Dynamics with Human-LLM Collaboration

Original

We develop a quantitative framework for the Collatz conjecture through a human-LLM collaboration, combining exact arithmetic structure, cycle-level probabilistic laws, and a conditional convergence reduction. The central quantitative result is the Per-Orbit Gain Rate theorem, which proves R <= 0.0893 < epsilon = 2 - log_2 3 ~= 0.415, leaving a safety margin of at least 4.65x. A robustness corollary shows that exact equidistribution is unnecessary: it suffices that sum_K delta_K < 0.557. This pro
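
The quoted constants are easy to sanity-check numerically (this verifies only the arithmetic in the abstract, not the theorem):

```python
import math

# Threshold from the abstract: epsilon = 2 - log2(3)
eps = 2 - math.log2(3)      # ~= 0.41504
R_bound = 0.0893            # per-orbit gain rate bound claimed in the paper

# Safety margin between the proved bound and the threshold
margin = eps / R_bound
print(round(eps, 5), round(margin, 2))   # 0.41504 4.65
```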

Source: https://arxiv.org/abs/2603.11066v2


Is Conformal Factuality for RAG-based LLMs Robust? Novel Metrics and Systematic

Original

Large language models (LLMs) frequently hallucinate, limiting their reliability in knowledge-intensive applications. Retrieval-augmented generation (RAG) and conformal factuality have emerged as potential ways to address this limitation. While RAG aims to ground responses in retrieved evidence, it provides no statistical guarantee that the final output is correct. Conformal factuality filtering offers distribution-free statistical reliability by scoring and filtering atomic claims using a thresh
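
A minimal sketch of the filtering step, using the generic split-conformal quantile recipe (the scores, claims, and miscoverage level are invented for illustration; the paper's exact scoring rule may differ):

```python
import math

def conformal_threshold(false_claim_scores, alpha=0.1):
    """Threshold tau such that, under exchangeability, a new FALSE claim
    scores above tau with probability at most alpha (split-conformal quantile)."""
    n = len(false_claim_scores)
    k = math.ceil((n + 1) * (1 - alpha))        # conformal quantile index
    return sorted(false_claim_scores)[min(k, n) - 1]

# Toy calibration scores of claims known to be wrong (hypothetical numbers)
cal_false = [0.12, 0.35, 0.28, 0.44, 0.19, 0.51, 0.07, 0.31, 0.22, 0.40]
tau = conformal_threshold(cal_false, alpha=0.2)

# Filter new atomic claims by their scores: keep only those above tau
claims = {"claim A": 0.91, "claim B": 0.30, "claim C": 0.77}
kept = [c for c, s in claims.items() if s > tau]
print(tau, kept)
```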

Source: https://arxiv.org/abs/2603.16817v1


MetaCrit: A Critical Thinking Framework for Self-Regulated LLM Reasoning

Original

Large language models (LLMs) fail on over one-third of multi-hop questions with counterfactual premises and remain vulnerable to adversarial prompts that trigger biased or factually incorrect responses, which exposes a fundamental deficit in self-regulated reasoning. We propose MetaCrit, a multi-agent framework grounded in Nelson and Narens' metacognitive regulation theory. MetaCrit decomposes reasoning regulation into four agents: object-level generation, a monitoring agent that

Source: https://arxiv.org/abs/2507.15015v3


Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls

Original

Test-time adaptation enables large language models (LLMs) to modify their behavior at inference without updating model parameters. A common approach is many-shot prompting, where large numbers of in-context learning (ICL) examples are injected as an input-space test-time update. Although performance can improve as more demonstrations are added, the reliability and limits of this update mechanism remain poorly understood, particularly for open-source models. We present an empirical study of many-

Source: https://arxiv.org/abs/2603.05829v3


Internalizing Agency from Reflective Experience

Original

Large language models are increasingly deployed as autonomous agents that must plan, act, and recover from mistakes through long-horizon interaction with environments that provide rich feedback. However, prevailing outcome-driven post-training methods (e.g., RL with verifiable rewards) primarily optimize final success signals, leaving rich environment feedback underutilized. Consequently, they often lead to distribution sharpening: the policy becomes better at reproducing a narrow set of already

Source: https://arxiv.org/abs/2603.16843v1


Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning

Original

Stochastic resetting, where a dynamical process is intermittently returned to a fixed reference state, has emerged as a powerful mechanism for optimizing first-passage properties. Existing theory largely treats static, non-learning processes. Here we ask how stochastic resetting interacts with reinforcement learning, where the underlying dynamics adapt through experience. In tabular grid environments, we find that resetting accelerates policy convergence even when it does not reduce the search t
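
As a toy illustration of the mechanism (not the paper's environments), here is tabular Q-learning on a 1-D chain where the process is intermittently teleported back to the start state; the resets alter the dynamics, yet the learned greedy policy still converges to "move right":

```python
import random

random.seed(0)
N, GOAL = 5, 4          # 1-D chain: states 0..4, reward only at state 4
RESET_P = 0.05          # stochastic resetting: teleport back to the start state
ALPHA, GAMMA = 0.5, 0.9

Q = [[0.0, 0.0] for _ in range(N)]   # Q[state][action], actions: 0=left, 1=right

for _ in range(400):
    s = 0
    for _ in range(60):
        if random.random() < RESET_P:   # intermittent return to the reference state
            s = 0
        a = random.randrange(2)         # uniform exploration (off-policy Q-learning)
        s2 = min(max(s + (1 if a else -1), 0), N - 1)
        r = 1.0 if s2 == GOAL else 0.0
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
        if s == GOAL:
            break

policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(N)]
print(policy[:GOAL])    # greedy action before the goal; 1 means "move right"
```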

Source: https://arxiv.org/abs/2603.16842v1


Model Medicine: A Clinical Framework for Understanding, Diagnosing, and Treating

Original

Model Medicine is the science of understanding, diagnosing, treating, and preventing disorders in AI models, grounded in the principle that AI models -- like biological organisms -- have internal structures, dynamic processes, heritable traits, observable symptoms, classifiable conditions, and treatable states. This paper introduces Model Medicine as a research program, bridging the gap between current AI interpretability research (anatomical observation) and the systematic clinical practice tha

Source: https://arxiv.org/abs/2603.04722v2


SpokenUS: A Spoken User Simulator for Task-Oriented Dialogue

Original

Robust task-oriented spoken dialogue agents require exposure to the full diversity of how people interact through speech. Building spoken user simulators that address this requires large-scale spoken task-oriented dialogue (TOD) data encompassing spoken user behaviors, yet existing datasets are limited in scale and domain coverage, with no systematic pipeline for augmenting them. To address this, we introduce SpokenTOD, a spoken TOD dataset of 52,390 dialogues and 1,034 hours of speech

Source: https://arxiv.org/abs/2603.16783v1


Anticipatory Planning for Multimodal AI Agents

Original

Recent advances in multimodal agents have improved computer-use interaction and tool-usage, yet most existing systems remain reactive, optimizing actions in isolation without reasoning about future states or long-term goals. This limits planning coherence and prevents agents from reliably solving high-level, multi-step tasks. We introduce TraceR1, a two-stage reinforcement learning framework that explicitly trains anticipatory reasoning by forecasting short-horizon trajectories before execution.

Source: https://arxiv.org/abs/2603.16777v1


Ontological foundations for contrastive explanatory narration of robot plans

Original

Mutual understanding of artificial agents' decisions is key to ensuring a trustworthy and successful human-robot interaction. Hence, robots are expected to make reasonable decisions and communicate them to humans when needed. In this article, the focus is on an approach to modeling and reasoning about the comparison of two competing plans, so that robots can later explain the divergent result. First, a novel ontological model is proposed to formalize and reason about the differences between comp

Source: https://arxiv.org/abs/2509.22493v2


Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios

Original

We evaluate the autonomous cyber-attack capabilities of frontier AI models on two purpose-built cyber ranges-a 32-step corporate network attack and a 7-step industrial control system attack-that require chaining heterogeneous capabilities across extended action sequences. By comparing seven models released over an eighteen-month period (August 2024 to February 2026) at varying inference-time compute budgets, we observe two capability trends. First, model performance scales log-linearly with infe

Source: https://arxiv.org/abs/2603.11214v3


Long-Horizon Traffic Forecasting via Incident-Aware Conformal Spatio-Temporal Tr

Original

Reliable multi-horizon traffic forecasting is challenging because network conditions are stochastic, incident disruptions are intermittent, and effective spatial dependencies vary across time-of-day patterns. This study is conducted on the Ohio Department of Transportation (ODOT) traffic count data and corresponding ODOT crash records. This work utilizes a Spatio-Temporal Transformer (STT) model with Adaptive Conformal Prediction (ACP) to produce multi-horizon forecasts with calibrated uncertain
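
The excerpt does not spell out the ACP variant; the standard adaptive conformal update (Gibbs and Candès style) adjusts the miscoverage level online after each hit or miss, which is what keeps interval widths calibrated under distribution shift and incident disruptions. A minimal sketch with hypothetical coverage outcomes:

```python
def aci_step(alpha_t, miss, target_alpha=0.1, gamma=0.02):
    """One Adaptive Conformal Prediction update: after a coverage miss the
    working miscoverage level drops (intervals widen); after a cover it rises."""
    err = 1.0 if miss else 0.0
    return alpha_t + gamma * (target_alpha - err)

alpha = 0.1                                    # target 90% coverage
history = [False, False, True, False, True]    # hypothetical misses over 5 steps
for miss in history:
    alpha = aci_step(alpha, miss)
print(round(alpha, 4))   # 0.07 -> wider intervals after two misses
```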

Source: https://arxiv.org/abs/2603.16857v1


Online Experiential Learning for Language Models

Original

The prevailing paradigm for improving large language models relies on offline training with human annotations or simulated environments, leaving the rich experience accumulated during real-world deployment entirely unexploited. We propose Online Experiential Learning (OEL), a framework that enables language models to continuously improve from their own deployment experience. OEL operates in two stages: first, transferable experiential knowledge is extracted and accumulated from interaction traje

Source: https://arxiv.org/abs/2603.16856v1


GIST: Gauge-Invariant Spectral Transformers for Scalable Graph Neural Operators

Original

Adapting transformer positional encoding to meshes and graph-structured data presents significant computational challenges: exact spectral methods require cubic-complexity eigendecomposition and can inadvertently break gauge invariance through numerical solver artifacts, while efficient approximate methods sacrifice gauge symmetry by design. Both failure modes cause catastrophic generalization in inductive learning, where models trained with one set of numerical choices fail when encountering di
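
The gauge problem being referenced is concrete even in a toy setting: eigenvectors of a graph Laplacian are determined only up to sign (and up to rotation within repeated eigenvalues), so a numerical solver may legitimately return different encodings across runs. A small illustration (not the paper's construction):

```python
import numpy as np

# Laplacian of a 5-node path graph; its eigenvectors are a standard
# spectral positional encoding for meshes and graphs.
A = np.diag(np.ones(4), 1)
A = A + A.T
Lap = np.diag(A.sum(axis=1)) - A

w, V = np.linalg.eigh(Lap)   # exact spectral method: O(n^3) eigendecomposition
v = V[:, 1]                  # first nontrivial eigenvector (Fiedler vector)

# Gauge ambiguity: v and -v are both valid unit eigenvectors for the same
# eigenvalue, and a solver may return either. Encodings that read off the raw
# sign therefore differ between solvers unless made gauge-invariant.
print(np.allclose(Lap @ v, w[1] * v), np.allclose(Lap @ (-v), w[1] * (-v)))
```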

Source: https://arxiv.org/abs/2603.16849v1


Conditional Distributional Treatment Effects: Doubly Robust Estimation and Testi

Original

Beyond conditional average treatment effects, treatments may impact the entire outcome distribution in covariate-dependent ways, for example, by altering the variance or tail risks for specific subpopulations. We propose a novel estimand to capture such conditional distributional treatment effects, and develop a doubly robust estimator that is minimax optimal in the local asymptotic sense. Using this, we develop a test for the global homogeneity of conditional potential outcome distributions tha

Source: https://arxiv.org/abs/2603.16829v1


SHAMISA: SHAped Modeling of Implicit Structural Associations for Self-supervised

Original

No-Reference Image Quality Assessment (NR-IQA) aims to estimate perceptual quality without access to a reference image of pristine quality. Learning an NR-IQA model faces a fundamental bottleneck: its need for a large number of costly human perceptual labels. We propose SHAMISA, a non-contrastive self-supervised framework that learns from unlabeled distorted images by leveraging explicitly structured relational supervision. Unlike prior methods that impose rigid, binary similarity constraints, S

Source: https://arxiv.org/abs/2603.13669v2


Demystifying Video Reasoning

Original

Recent advances in video generation have revealed an unexpected phenomenon: diffusion-based video models exhibit non-trivial reasoning capabilities. Prior work attributes this to a Chain-of-Frames (CoF) mechanism, where reasoning is assumed to unfold sequentially across video frames. In this work, we challenge this assumption and uncover a fundamentally different mechanism. We show that reasoning in video models instead primarily emerges along the diffusion denoising steps. Through qualitative a

Source: https://arxiv.org/abs/2603.16870v1


ManiTwin: Scaling Data-Generation-Ready Digital Object Dataset to 100K

Original

Learning in simulation provides a useful foundation for scaling robotic manipulation capabilities. However, this paradigm often suffers from a lack of data-generation-ready digital assets, in both scale and diversity. In this work, we present ManiTwin, an automated and efficient pipeline for generating data-generation-ready digital object twins. Our pipeline transforms a single image into simulation-ready and semantically annotated 3D asset, enabling large-scale robotic manipulation data generat

Source: https://arxiv.org/abs/2603.16866v1


SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation

Original

Video Super-Resolution (VSR) aims to restore high-quality video frames from low-resolution (LR) estimates, yet most existing VSR approaches behave like black boxes at inference time: users cannot reliably correct unexpected artifacts, but instead can only accept whatever the model produces. In this paper, we propose a novel interactive VSR framework dubbed SparkVSR that makes sparse keyframes a simple and expressive control signal. Specifically, users can first super-resolve or optionally a smal

Source: https://arxiv.org/abs/2603.16864v1


SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

Original

Omni-modal large language models (OLMs) redefine human-machine interaction by natively integrating audio, vision, and text. However, existing OLM benchmarks remain anchored to static, accuracy-centric tasks, leaving a critical gap in assessing social interactivity, the fundamental capacity to navigate dynamic cues in natural dialogues. To this end, we propose SocialOmni, a comprehensive benchmark that operationalizes the evaluation of this conversational interactivity across three core dimension

Source: https://arxiv.org/abs/2603.16859v1


SOMA: Unifying Parametric Human Body Models

Original

Parametric human body models are foundational to human reconstruction, animation, and simulation, yet they remain mutually incompatible: SMPL, SMPL-X, MHR, Anny, and related models each diverge in mesh topology, skeletal structure, shape parameterization, and unit convention, making it impractical to exploit their complementary strengths within a single pipeline. We present SOMA, a unified body layer that bridges these heterogeneous representations through three abstraction layers. Mesh topology

Source: https://arxiv.org/abs/2603.16858v1


Unifying Optimization and Dynamics to Parallelize Sequential Computation: A Guid

Original

Massively parallel hardware (GPUs) and long sequence data have made parallel algorithms essential for machine learning at scale. Yet dynamical systems, like recurrent neural networks and Markov chain Monte Carlo, were thought to suffer from sequential bottlenecks. Recent work showed that dynamical systems can in fact be parallelized across the sequence length by reframing their evaluation as a system of nonlinear equations, which can be solved with Newton's method using a parallel associative sc
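
The reframing can be demonstrated with plain fixed-point (Jacobi) sweeps in place of Newton's method: treat the whole trajectory as the unknown of the system x_t = f(x_{t-1}) and relax every equation simultaneously, so each sweep is one parallel map over t. A toy sketch (the recurrence f is invented for illustration):

```python
import math

def f(x):                      # a contractive nonlinear recurrence
    return 0.5 * math.tanh(x) + 0.1

T, x0 = 8, 0.0

# Sequential evaluation: inherently one step after another.
seq = [x0]
for _ in range(T):
    seq.append(f(seq[-1]))

# Parallel reframing: solve x_t - f(x_{t-1}) = 0 for all t at once with
# Jacobi sweeps. Each sweep updates every t independently (one parallel map
# on a GPU); Newton's method with a parallel associative scan, as in the
# paper, plays the same role but converges in fewer iterations.
x = [x0] * (T + 1)
for _ in range(30):            # after T sweeps the iteration is already exact
    x = [x0] + [f(x[t - 1]) for t in range(1, T + 1)]

err = max(abs(a - b) for a, b in zip(seq, x))
print(err < 1e-9)   # True: the parallel iteration recovers the sequential answer
```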

Source: https://arxiv.org/abs/2603.16850v1


Transit Network Design with Two-Level Demand Uncertainties: A Machine Learning a

Original

Transit Network Design is a well-studied problem in the field of transportation, typically addressed by solving optimization models under fixed demand assumptions. Considering the limitations of these assumptions, this paper proposes a new framework, namely the Two-Level Rider Choice Transit Network Design (2LRC-TND), that leverages machine learning and contextual stochastic optimization (CSO) through constraint programming (CP) to incorporate two layers of demand uncertainties into the network

Source: https://arxiv.org/abs/2603.00010v2


Thin Keys, Full Values: Reducing KV Cache via Low-Dimensional Attention Selectio

Original

Standard transformer attention uses identical dimensionality for queries, keys, and values, yet these components serve different roles: queries and keys produce scalar attention weights (selection), while values carry rich representations (value transfer). We show that selection requires only O(log N) dimensions to distinguish among N relevant token categories (e.g., syntactic roles, semantic clusters, positional patterns) -- far fewer than value transfer needs. We introduce factore
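
A minimal sketch of the factorization described: queries and keys live in a small selection dimension while values keep the full width, shrinking both the attention-score cost and the per-token KV-cache (the dimensions and the name `d_sel` are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_sel, d_val = 16, 4, 64   # selection dim << value dim (thin keys, full values)

Q = rng.standard_normal((N, d_sel))   # low-dimensional queries
K = rng.standard_normal((N, d_sel))   # low-dimensional keys
V = rng.standard_normal((N, d_val))   # full-dimensional values

# Selection: scores cost O(N^2 * d_sel) instead of O(N^2 * d_val)
scores = Q @ K.T / np.sqrt(d_sel)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Value transfer keeps the full representation width
out = weights @ V
print(out.shape)   # (16, 64)
# KV-cache per token: d_sel + d_val floats instead of 2 * d_val
```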

Source: https://arxiv.org/abs/2603.04427v2


MessyKitchens: Contact-rich object-level 3D scene reconstruction

Original

Monocular 3D scene reconstruction has recently seen significant progress. Powered by modern neural architectures and large-scale data, recent methods achieve high performance in depth estimation from a single image. Meanwhile, reconstructing and decomposing common scenes into individual 3D objects remains a hard challenge due to the large variety of objects, frequent occlusions and complex object relations. Notably, beyond shape and pose estimation of individual objects, applications in robo

Source: https://arxiv.org/abs/2603.16868v1


Related notes

  • [[260324_arxiv]]
  • [[260323_arxiv]]