
260321 arxiv (28 papers)

evergreen aggregate 2026-03-21

260321 arxiv collection

FinTradeBench: A Financial Reasoning Benchmark for LLMs

Original text

FinTradeBench: A Financial Reasoning Benchmark for LLMs

Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with the advancement of Large Language Models (LLMs), financial analysts have begun to use them for financial decision-making tasks. However, existing financial question answering benchmarks for testing these models primarily focus on company balance sheet data a

Source: https://arxiv.org/abs/2603.19225v1


F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual Wor

Original text

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual Wor

We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques,

Source: https://arxiv.org/abs/2603.19223v1
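The abstract above mentions matryoshka learning, which trains embeddings so that nested prefixes of the full vector remain usable on their own. A minimal inference-side sketch of the truncate-and-renormalize step (illustrative only; the dimension choices and function name are assumptions, not the paper's code):

```python
import numpy as np

def matryoshka_truncate(emb, dims=(64, 128, 256)):
    """Cut an embedding to nested prefix sizes and L2-renormalize each one,
    the inference-time counterpart of matryoshka-style training."""
    out = {}
    for d in dims:
        v = emb[:d]
        out[d] = v / np.linalg.norm(v)
    return out

rng = np.random.default_rng(0)
e = rng.normal(size=512)
views = matryoshka_truncate(e)
# every truncated view is a unit vector usable as a cheaper embedding
assert all(np.isclose(np.linalg.norm(v), 1.0) for v in views.values())
```

In a matryoshka-trained model the shorter prefixes lose little retrieval quality, which is what makes the 8-size family practical to serve at different cost points.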


Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Polic

Original text

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Polic

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICP

Source: https://arxiv.org/abs/2603.19220v1


DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative L

Original text

DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative L

Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional structure of parts. While recent part-aware approaches introduce decomposition, they remain largely geometry-focused, lacking semantic grounding and failing to model how parts align with textual descriptions or their inter-part relations. We propose DreamPartGen, a framework for semantically grounded, p

Source: https://arxiv.org/abs/2603.19216v1


Enhancing Lexicon-Based Text Embeddings with Large Language Models

Original text

Enhancing Lexicon-Based Text Embeddings with Large Language Models

Recent large language models (LLMs) have demonstrated exceptional performance on general-purpose text embedding tasks. While dense embeddings have dominated related research, we introduce the first lexicon-based embeddings (LENS) leveraging LLMs that achieve competitive performance on these tasks. LENS consolidates the vocabulary space through token embedding clustering to handle the issue of token redundancy in LLM vocabularies. To further improve performance, we investigate bidirectional atten

Source: https://arxiv.org/abs/2501.09749v2
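The vocabulary consolidation via token embedding clustering that LENS describes can be illustrated with a toy k-means over the embedding matrix: each token maps to a cluster id, and the cluster ids form a smaller lexicon dimension. A hedged sketch of the general idea only, not the paper's pipeline:

```python
import numpy as np

def cluster_vocab(token_emb, k, iters=20, seed=0):
    """Toy k-means over token embeddings: maps each vocabulary token to one
    of k cluster ids, shrinking the lexicon dimension (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    centers = token_emb[rng.choice(len(token_emb), k, replace=False)]
    for _ in range(iters):
        # squared distances token x center, then hard assignment
        d = ((token_emb[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            pts = token_emb[assign == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return assign

rng = np.random.default_rng(1)
emb = rng.normal(size=(100, 8))   # 100-token vocab, 8-dim embeddings (toy)
assign = cluster_vocab(emb, k=10)
assert assign.shape == (100,) and assign.max() < 10
```

Redundant tokens (casing or whitespace variants, for instance) tend to land in the same cluster, which is how clustering addresses the token-redundancy issue the abstract raises.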


How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic

Original text

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic

Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remains unclear. We study this gap by comparing different LLMs under two text-only and one audio-grounded setting: (1) direct probing on AKB-2000, a curated benchmark testing the breadth and depth of auditory knowledge; (2) cascade evaluation, where LLMs reason over

Source: https://arxiv.org/abs/2603.19195v1


OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

Original text

OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Themis, a scalable and accurate multi-agent critic framework. Unlike a single judge, OS-Themis decomposes trajectories into verifiable milestones to isolate critical evidence for decision making and emp

Source: https://arxiv.org/abs/2603.19191v1


iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

Original text

iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model th

Source: https://arxiv.org/abs/2511.08905v3


This looks like what? Challenges and Future Research Directions for Part-Prototy

Original text

This looks like what? Challenges and Future Research Directions for Part-Prototy

The growing interest in eXplainable Artificial Intelligence (XAI) has stimulated research on models with built-in interpretability, among which part-prototype models are particularly prominent. Part-Prototype Models (PPMs) classify inputs by comparing them to learned prototypes and provide human-understandable explanations of the form "this looks like that". Despite this intrinsic interpretability, PPMs have not yet emerged as a competitive alternative to post-hoc explanation methods. This surve

Source: https://arxiv.org/abs/2502.09340v2


Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

Original text

Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at the behavioral level and may lack explicit architectural mechanisms for enforcing reasoning process integrity. This paper proposes the Box Maze framework, a conceptual process-control architecture tha

Source: https://arxiv.org/abs/2603.19182v1


NavTrust: Benchmarking Trustworthiness for Embodied Navigation

Original text

NavTrust: Benchmarking Trustworthiness for Embodied Navigation

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world settings. To address this gap, we present NavTrust, a unified benchmark that systematically corrupts input mo

Source: https://arxiv.org/abs/2603.19229v1


DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction an

Original text

DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction an

With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed for monocular and 2D scenes, leading to inefficiency and inter-view inconsistency when applied to high-resolution multi-view driving scenes. To address this, we propose DriveTok, an efficient 3D driving scene tokenizer for unified multi-view reconstruction and

Source: https://arxiv.org/abs/2603.19219v1


WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Ste

Original text

WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Ste

Communication topology is a critical factor in the utility and safety of LLM-based multi-agent systems (LLM-MAS), making it a high-value intellectual property (IP) whose confidentiality remains insufficiently studied. Existing topology inference attempts rely on impractical assumptions, including control over the administrative agent and direct identity queries via jailbreaks, which are easily defeated by basic keyword-based defenses. As a result, prior analyses fail to capture the real-world th

Source: https://arxiv.org/abs/2603.11132v2


SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Ha

Original text

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Ha

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward

Source: https://arxiv.org/abs/2603.19173v1


DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Effici

Original text

DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Effici

Despite the computational efficiency of MoE models, the excessive memory footprint and I/O overhead inherent in multi-expert architectures pose formidable challenges for real-time inference on resource-constrained edge platforms. While existing static methods struggle with a rigid latency-accuracy trade-off, we observe that expert importance is highly skewed and depth-dependent. Motivated by these insights, we propose DyMoE, a dynamic mixed-precision quantization framework designed for high-perf

Source: https://arxiv.org/abs/2603.19172v1
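The observation that expert importance is highly skewed suggests a precision allocation that spends bits where they matter. A toy greedy allocator under a total bit budget, to make the idea concrete (the policy, bit choices, and budget are assumptions, not DyMoE's actual method):

```python
def allocate_bits(importance, budget_bits, choices=(8, 4, 2)):
    """Toy mixed-precision allocator: start every expert at the lowest
    precision, then greedily upgrade the most important experts while the
    total bit budget allows. Illustrative sketch only."""
    n = len(importance)
    bits = [choices[-1]] * n          # everyone starts at the cheapest width
    used = sum(bits)
    # visit experts from most to least important
    for i in sorted(range(n), key=lambda i: -importance[i]):
        for b in choices:             # highest precision first
            if b > bits[i] and used - bits[i] + b <= budget_bits:
                used += b - bits[i]
                bits[i] = b
                break
    return bits

# skewed importance: expert 0 dominates, so it gets the widest format
print(allocate_bits([0.9, 0.05, 0.05], budget_bits=14))  # -> [8, 4, 2]
```

A depth-dependent variant would simply run this per layer with layer-specific importance scores, matching the skew the abstract describes.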


ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angio

Original text

ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angio

Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADNE, a two-stage framework coupling preference-aligned perception with RL-based diagnostic reasoning for topologically coherent stenosis detection. The perception module employs DPO to fine-tune the Sa2VA vision-language foundation model using Betti number constraints as preference signals, aligning t

Source: https://arxiv.org/abs/2603.19169v1


Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Langua

Original text

Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Langua

Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge" requires grounding semantic references, spatial relations, and metric constraints within a 3D scene. While recent vision language models (VLMs) demonstrate strong semantic grounding capabilities, they are not explicitly designed to reason about metric constraints in physically defined spaces. In th

Source: https://arxiv.org/abs/2603.19166v1


2-D Directed Formation Control Based on Bipolar Coordinates

Original text

2-D Directed Formation Control Based on Bipolar Coordinates

This work proposes a novel 2-D formation control scheme for acyclic triangulated directed graphs (a class of minimally acyclic persistent graphs) based on bipolar coordinates with (almost) global convergence to the desired shape. Prescribed performance control is employed to devise a decentralized control law that avoids singularities and introduces robustness against external disturbances while ensuring predefined transient and steady-state performance for the closed-loop system. Furthermore, i

Source: https://arxiv.org/abs/2108.00916v4
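Bipolar coordinates describe a point in the plane by the log distance ratio and the subtended angle with respect to two foci (here, two neighbor agents). A small sketch of the coordinate computation alone, not the paper's control law:

```python
import math

def bipolar_coords(p, f1, f2):
    """Bipolar coordinates of point p w.r.t. foci f1, f2:
    tau = ln(d1/d2) (log distance ratio), sigma = angle f1-p-f2.
    Illustrates the coordinate system used by the scheme."""
    d1 = math.dist(p, f1)
    d2 = math.dist(p, f2)
    tau = math.log(d1 / d2)
    # angle at p subtended by the two foci
    v1 = (f1[0] - p[0], f1[1] - p[1])
    v2 = (f2[0] - p[0], f2[1] - p[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    sigma = math.acos(dot / (d1 * d2))
    return tau, sigma

# a point equidistant from the foci has tau = 0; here sigma = pi/2
tau, sigma = bipolar_coords((0.0, 1.0), (-1.0, 0.0), (1.0, 0.0))
assert abs(tau) < 1e-12 and abs(sigma - math.pi / 2) < 1e-12
```

Because tau and sigma depend only on distances and one relative angle, each follower can evaluate them from local measurements, which is what makes a decentralized law in these coordinates natural.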


Improving RCT-Based Treatment Effect Estimation Under Covariate Mismatch via Cal

Original text

Improving RCT-Based Treatment Effect Estimation Under Covariate Mismatch via Cal

Randomized controlled trials (RCTs) are the gold standard for estimating heterogeneous treatment effects, yet they are often underpowered for detecting effect heterogeneity. Large observational studies (OS) can supplement RCTs for conditional average treatment effect (CATE) estimation, but a key barrier is covariate mismatch: the two sources measure different, only partially overlapping, covariates. We propose CALM (Calibrated ALignment under covariate Mismatch), which bypasses imputation by lea

Source: https://arxiv.org/abs/2603.19186v1


Spectrally-Guided Diffusion Noise Schedules

Original text

Spectrally-Guided Diffusion Noise Schedules

Denoising diffusion models are widely used for high-quality image and video generation. Their performance depends on noise schedules, which define the distribution of noise levels applied during training and the sequence of noise levels traversed during sampling. Noise schedules are typically handcrafted and require manual tuning across different resolutions. In this work, we propose a principled way to design per-instance noise schedules for pixel diffusion, based on the image's spectral proper

Source: https://arxiv.org/abs/2603.19222v1
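A per-instance spectral statistic such as the radially averaged power spectrum is the kind of quantity a noise schedule could be conditioned on. The sketch below computes that statistic; the paper's actual mapping from spectrum to noise levels is not reproduced here:

```python
import numpy as np

def radial_power_spectrum(img, nbins=16):
    """Radially averaged power spectrum of a 2D image: FFT power,
    binned by distance from the frequency origin (sketch only)."""
    F = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(F) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2)
    bins = np.linspace(0, r.max(), nbins + 1)
    idx = np.clip(np.digitize(r.ravel(), bins) - 1, 0, nbins - 1)
    spec = np.bincount(idx, weights=power.ravel(), minlength=nbins)
    counts = np.bincount(idx, minlength=nbins)
    return spec / np.maximum(counts, 1)

rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32))
spec = radial_power_spectrum(img)
assert spec.shape == (16,) and np.all(spec >= 0)
```

Intuitively, an image whose energy sits at low frequencies survives more added noise before its structure is destroyed, so its spectrum carries exactly the information a per-instance schedule needs.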


Online Learning and Equilibrium Computation with Ranking Feedback

Original text

Online Learning and Equilibrium Computation with Ranking Feedback

Online learning in arbitrary, and possibly adversarial, environments has been extensively studied in sequential decision-making, and it is closely connected to equilibrium computation in game theory. Most existing online learning algorithms rely on numeric utility feedback from the environment, which may be unavailable in human-in-the-loop applications and/or may be restricted by privacy concerns. In this paper, we study an online learning model in which the learner only observes a

Source: https://arxiv.org/abs/2603.19221v1
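Ranking-only feedback can be made concrete with a toy learner that converts each round's ranking of actions into Borda scores and plays the current leader. This is an illustration of the feedback model only, not the paper's algorithm, and it carries no guarantees:

```python
import numpy as np

def borda_ftl(rank_feedback, n_actions, T):
    """Toy 'follow the leader' from ranking-only feedback: each round the
    environment returns a full ranking of the actions (best first); the
    learner accumulates Borda scores and plays the current top scorer."""
    scores = np.zeros(n_actions)
    plays = []
    for t in range(T):
        plays.append(int(scores.argmax()))
        ranking = rank_feedback(t)             # e.g. [1, 0, 2]: action 1 best
        for pos, a in enumerate(ranking):
            scores[a] += n_actions - 1 - pos   # Borda points by position
    return plays

# stationary environment where action 1 is always ranked best:
# after one round of feedback the learner locks onto it
plays = borda_ftl(lambda t: [1, 0, 2], n_actions=3, T=10)
assert plays[-1] == 1
```

The point of the construction is that the learner never sees a numeric utility, only an ordering, which matches the privacy- or human-feedback-constrained settings the abstract motivates.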


$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal E

Original text

$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal E

Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three special types--those whose $R$-equivalence he could not bound by proving the universal (admissible) equivalence is trivial. We consider all surfaces $V$ currently known to have non-trivial universal equivalence. Beyond being intractable to Swinnerton-Dyer's approach, we observe that if these surfaces also ha

Source: https://arxiv.org/abs/2603.19215v1


Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encode

Original text

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encode

Large vision--language models (VLMs) often use a frozen vision backbone, whose image features are mapped into a large language model through a lightweight connector. While transformer-based encoders are the standard visual backbone, we ask whether state space model (SSM) vision backbones can be a strong alternative. We systematically evaluate SSM vision backbones for VLMs in a controlled setting. Under matched ImageNet-1K initialization, the SSM backbone achieves the strongest overall performanc

Source: https://arxiv.org/abs/2603.19209v1


The Exponentially Weighted Signature

Original text

The Exponentially Weighted Signature

The signature is a canonical representation of a multidimensional path over an interval. However, it treats all historical information uniformly, offering no intrinsic mechanism for contextualising the relevance of the past. To address this, we introduce the Exponentially Weighted Signature (EWS), generalising the Exponentially Fading Memory (EFM) signature from diagonal to general bounded linear operators. These operators enable cross-channel coupling at the level of temporal weighting together

Source: https://arxiv.org/abs/2603.19198v1
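At level 1, the exponentially-fading-memory idea weights path increments by exp(-lam * (T - t)), so recent increments count more than old ones. A diagonal-weight special-case sketch on a discretized path (the paper's generalization to bounded linear operators is not shown):

```python
import numpy as np

def efm_level1(path, lam):
    """Level-1 exponentially-fading-memory signature of a discretized path
    on [0, 1]: each increment is weighted by exp(-lam * (1 - t)) at the
    increment's endpoint. lam = 0 recovers the ordinary level-1 signature."""
    t = np.linspace(0.0, 1.0, len(path))
    dx = np.diff(path, axis=0)
    w = np.exp(-lam * (1.0 - t[1:]))
    return (w[:, None] * dx).sum(axis=0)

path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
# lam = 0: ordinary signature, i.e. the total increment of the path
assert np.allclose(efm_level1(path, 0.0), [1.0, 1.0])
# lam > 0: the early x-increment is discounted more than the late y-increment
s = efm_level1(path, 2.0)
assert s[0] < s[1]
```

Replacing the scalar lam with a matrix acting on the increment is what introduces the cross-channel coupling in the temporal weighting that the abstract highlights.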


Score Reversal Is Not Free for Quantum Diffusion Models

Original text

Score Reversal Is Not Free for Quantum Diffusion Models

Classical reverse diffusion is generated by changing the drift at fixed noise. We show that the quantum version of this principle obeys an exact law with a sharp phase boundary. For Gaussian pure-loss dynamics, the canonical model of continuous-variable decoherence, we prove that the unrestricted instantaneous reverse optimum exhibits a noiseless-to-noisy transition: below a critical squeezing-to-thermal ratio, reversal can be noiseless; above it, complete positivity forces irreducible reverse n

Source: https://arxiv.org/abs/2603.06488v3


The Convergence Frontier: Integrating Machine Learning and High Performance Quan

Original text

The Convergence Frontier: Integrating Machine Learning and High Performance Quan

Integrating quantum mechanics into drug discovery marks a decisive shift from empirical trial-and-error toward quantitative precision. However, the prohibitive cost of ab initio molecular dynamics has historically forced a compromise between chemical accuracy and computational scalability. This paper identifies the convergence of High-Performance Computing (HPC), Machine Learning (ML), and Quantum Computing (QC) as the definitive solution to this bottleneck. While ML foundation models, such as F

Source: https://arxiv.org/abs/2603.17790v2


Robustness, Cost, and Attack-Surface Concentration in Phishing Detection

Original text

Robustness, Cost, and Attack-Surface Concentration in Phishing Detection

Phishing detectors built on engineered website features attain near-perfect accuracy under i.i.d. evaluation, yet deployment security depends on robustness to post-deployment feature manipulation. We study this gap through a cost-aware evasion framework that models discrete, monotone feature edits under explicit attacker budgets. Three diagnostics are introduced: minimal evasion cost (MEC), the evasion survival rate $S(B)$, and the robustness concentration index (RCI). On the UCI Phishing Web

Source: https://arxiv.org/abs/2603.19204v1
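Minimal evasion cost (MEC) under discrete feature edits can be brute-forced on toy inputs: try every subset of allowed edits, keep the cheapest one that flips the detector. The detector, feature set, and edit costs below are hypothetical, for illustration only:

```python
from itertools import product

def minimal_evasion_cost(clf, x, edits, budget):
    """Brute-force MEC: cheapest combination of discrete feature edits
    within `budget` that makes clf label the sample benign (0), or None.
    `edits` maps feature index -> (new_value, cost)."""
    best = None
    feats = list(edits)
    for mask in product([0, 1], repeat=len(feats)):
        cost = sum(edits[f][1] for f, m in zip(feats, mask) if m)
        if cost > budget or (best is not None and cost >= best):
            continue
        z = list(x)
        for f, m in zip(feats, mask):
            if m:
                z[f] = edits[f][0]
        if clf(z) == 0:          # evasion succeeds
            best = cost
    return best

# toy linear detector: flags phishing iff at least two features fire
clf = lambda z: int(sum(z) >= 2)
x = [1, 1, 0]                        # currently detected as phishing
edits = {0: (0, 3.0), 1: (0, 1.0)}   # zeroing feature 1 is the cheap edit
print(minimal_evasion_cost(clf, x, edits, budget=5.0))  # -> 1.0
```

The survival rate $S(B)$ then follows as the fraction of detected samples whose MEC exceeds the budget B, which is why the two diagnostics are reported together.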


Verifiable Semantics for Agent-to-Agent Communication

Original text

Verifiable Semantics for Agent-to-Agent Communication

Multiagent AI systems require consistent communication, but we lack methods to verify that agents share the same understanding of the terms used. Natural language is interpretable but vulnerable to semantic drift, while learned protocols are efficient but opaque. We propose a certification protocol based on the stimulus-meaning model, where agents are tested on shared observable events and terms are certified if empirical disagreement falls below a statistical threshold. In this protocol, agents

Source: https://arxiv.org/abs/2602.16424v2


FinTradeBench: A Financial Reasoning Benchmark for LLMs

원문

FinTradeBench: A Financial Reasoning Benchmark for LLMs

Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with the advancement of Large Language Models (LLMs), financial analysts have begun to use them for financial decision-making tasks. However, existing financial question answering benchmarks for testing these models primarily focus on company balance sheet data a

출처: https://arxiv.org/abs/2603.19225v1


F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual Wor

원문

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual Wor

We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques,

출처: https://arxiv.org/abs/2603.19223v1


Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Polic

원문

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Polic

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICP

출처: https://arxiv.org/abs/2603.19220v1


DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative L

원문

DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative L

Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional structure of parts. While recent part-aware approaches introduce decomposition, they remain largely geometry-focused, lacking semantic grounding and failing to model how parts align with textual descriptions or their inter-part relations. We propose DreamPartGen, a framework for semantically grounded, p

출처: https://arxiv.org/abs/2603.19216v1


Enhancing Lexicon-Based Text Embeddings with Large Language Models

원문

Enhancing Lexicon-Based Text Embeddings with Large Language Models

Recent large language models (LLMs) have demonstrated exceptional performance on general-purpose text embedding tasks. While dense embeddings have dominated related research, we introduce the first lexicon-based embeddings (LENS) leveraging LLMs that achieve competitive performance on these tasks. LENS consolidates the vocabulary space through token embedding clustering to handle the issue of token redundancy in LLM vocabularies. To further improve performance, we investigate bidirectional atten

출처: https://arxiv.org/abs/2501.09749v2


How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic

원문

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic

Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remains unclear. We study this gap by comparing different LLMs under two text-only and one audio-grounded setting: (1) direct probing on AKB-2000, a curated benchmark testing the breadth and depth of auditory knowledge; (2) cascade evaluation, where LLMs reason over

출처: https://arxiv.org/abs/2603.19195v1


OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

원문

OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Themis, a scalable and accurate multi-agent critic framework. Unlike a single judge, OS-Themis decomposes trajectories into verifiable milestones to isolate critical evidence for decision making and emp

출처: https://arxiv.org/abs/2603.19191v1


iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

원문

iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model th

출처: https://arxiv.org/abs/2511.08905v3


This looks like what? Challenges and Future Research Directions for Part-Prototy

원문

This looks like what? Challenges and Future Research Directions for Part-Prototy

The growing interest in eXplainable Artificial Intelligence (XAI) has stimulated research on models with built-in interpretability, among which part-prototype models are particularly prominent. Part-Prototype Models (PPMs) classify inputs by comparing them to learned prototypes and provide human-understandable explanations of the form "this looks like that". Despite this intrinsic interpretability, PPMs have not yet emerged as a competitive alternative to post-hoc explanation methods. This surve

출처: https://arxiv.org/abs/2502.09340v2


Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

원문

Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at the behavioral level and may lack explicit architectural mechanisms for enforcing reasoning process integrity. This paper proposes the Box Maze framework, a conceptual process-control architecture tha

출처: https://arxiv.org/abs/2603.19182v1


NavTrust: Benchmarking Trustworthiness for Embodied Navigation

원문

NavTrust: Benchmarking Trustworthiness for Embodied Navigation

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world settings. To address this gap, we present NavTrust, a unified benchmark that systematically corrupts input mo

출처: https://arxiv.org/abs/2603.19229v1


DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction an

원문

DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction an

With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed for monocular and 2D scenes, leading to inefficiency and inter-view inconsistency when applied to high-resolution multi-view driving scenes. To address this, we propose DriveTok, an efficient 3D driving scene tokenizer for unified multi-view reconstruction and

출처: https://arxiv.org/abs/2603.19219v1


WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Ste

원문

WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Ste

Communication topology is a critical factor in the utility and safety of LLM-based multi-agent systems (LLM-MAS), making it a high-value intellectual property (IP) whose confidentiality remains insufficiently studied. Existing topology inference attempts rely on impractical assumptions, including control over the administrative agent and direct identity queries via jailbreaks, which are easily defeated by basic keyword-based defenses. As a result, prior analyses fail to capture the real-world th

출처: https://arxiv.org/abs/2603.11132v2


SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Ha

원문

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Ha

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward

출처: https://arxiv.org/abs/2603.19173v1


DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Effici

원문

DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Effici

Despite the computational efficiency of MoE models, the excessive memory footprint and I/O overhead inherent in multi-expert architectures pose formidable challenges for real-time inference on resource-constrained edge platforms. While existing static methods struggle with a rigid latency-accuracy trade-off, we observe that expert importance is highly skewed and depth-dependent. Motivated by these insights, we propose DyMoE, a dynamic mixed-precision quantization framework designed for high-perf

출처: https://arxiv.org/abs/2603.19172v1


ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angio

원문

ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angio

Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADNE, a two-stage framework coupling preference-aligned perception with RL-based diagnostic reasoning for topologically coherent stenosis detection. The perception module employs DPO to fine-tune the Sa2VA vision-language foundation model using Betti number constraints as preference signals, aligning t

출처: https://arxiv.org/abs/2603.19169v1


Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Langua

원문

Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Langua

Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge" requires grounding semantic references, spatial relations, and metric constraints within a 3D scene. While recent vision language models (VLMs) demonstrate strong semantic grounding capabilities, they are not explicitly designed to reason about metric constraints in physically defined spaces. In th

출처: https://arxiv.org/abs/2603.19166v1


2-D Directed Formation Control Based on Bipolar Coordinates

원문

2-D Directed Formation Control Based on Bipolar Coordinates

This work proposes a novel 2-D formation control scheme for acyclic triangulated directed graphs (a class of minimally acyclic persistent graphs) based on bipolar coordinates with (almost) global convergence to the desired shape. Prescribed performance control is employed to devise a decentralized control law that avoids singularities and introduces robustness against external disturbances while ensuring predefined transient and steady-state performance for the closed-loop system. Furthermore, i

출처: https://arxiv.org/abs/2108.00916v4


Improving RCT-Based Treatment Effect Estimation Under Covariate Mismatch via Cal

Original

Improving RCT-Based Treatment Effect Estimation Under Covariate Mismatch via Cal

Randomized controlled trials (RCTs) are the gold standard for estimating heterogeneous treatment effects, yet they are often underpowered for detecting effect heterogeneity. Large observational studies (OS) can supplement RCTs for conditional average treatment effect (CATE) estimation, but a key barrier is covariate mismatch: the two sources measure different, only partially overlapping, covariates. We propose CALM (Calibrated ALignment under covariate Mismatch), which bypasses imputation by lea

Source: https://arxiv.org/abs/2603.19186v1


Spectrally-Guided Diffusion Noise Schedules

Original

Spectrally-Guided Diffusion Noise Schedules

Denoising diffusion models are widely used for high-quality image and video generation. Their performance depends on noise schedules, which define the distribution of noise levels applied during training and the sequence of noise levels traversed during sampling. Noise schedules are typically handcrafted and require manual tuning across different resolutions. In this work, we propose a principled way to design per-instance noise schedules for pixel diffusion, based on the image's spectral proper

Source: https://arxiv.org/abs/2603.19222v1
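For context on what "handcrafted" means here: a common baseline is the cosine schedule, whose cumulative signal level $\bar\alpha(t)$ is fixed once per model regardless of image content. A minimal sketch of that baseline (the per-instance spectral construction itself is not reproduced in this snippet):

```python
import math

def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal level alpha_bar(t) of the common cosine schedule,
    a handcrafted, content-independent baseline of the kind the paper
    replaces with per-instance, spectrum-derived schedules."""
    f = lambda u: math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)  # normalized so alpha_bar(0) = 1

# alpha_bar decays monotonically from 1 toward 0 as noise is added
schedule = [cosine_alpha_bar(t, 1000) for t in range(0, 1001, 250)]
print([round(a, 3) for a in schedule])
```

A spectrally guided variant would instead pick the noise levels per image, e.g. as a function of how the image's power is distributed across frequencies.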


Online Learning and Equilibrium Computation with Ranking Feedback

Original

Online Learning and Equilibrium Computation with Ranking Feedback

Online learning in arbitrary, and possibly adversarial, environments has been extensively studied in sequential decision-making, and it is closely connected to equilibrium computation in game theory. Most existing online learning algorithms rely on *numeric* utility feedback from the environment, which may be unavailable in human-in-the-loop applications and/or may be restricted by privacy concerns. In this paper, we study an online learning model in which the learner only observes a

Source: https://arxiv.org/abs/2603.19221v1
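To make the feedback model concrete: the learner never sees utility values, only an ordering of actions. A toy sketch in which a Borda-style count recovers a dominant action from rankings alone (this illustrates the feedback restriction, not the paper's actual algorithm):

```python
import random

def borda_learner(rank_oracle, n_actions, rounds, seed=0):
    """Toy learner under ranking feedback: each round it receives only a
    ranking of all actions (best first), never numeric utilities, and
    accumulates Borda scores. Illustrative, not the paper's method."""
    rng = random.Random(seed)
    scores = [0] * n_actions
    for _ in range(rounds):
        ranking = rank_oracle(rng)               # permutation, best action first
        for pos, a in enumerate(ranking):
            scores[a] += n_actions - 1 - pos     # Borda points: top gets most
    return max(range(n_actions), key=lambda a: scores[a])

# Hypothetical oracle: action 2 ranks on top 90% of the time, rest in noisy order.
def oracle(rng):
    rest = [0, 1, 3]
    rng.shuffle(rest)
    return [2] + rest if rng.random() < 0.9 else rest + [2]

print(borda_learner(oracle, 4, 200))  # recovers the dominant action, 2
```

The interesting regime in the paper is adversarial rather than stochastic, where such naive counting is no longer enough and regret guarantees are the object of study.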


$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal E

Original

$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal E

Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three special types--those whose $R$-equivalence he could not bound by proving the universal (admissible) equivalence is trivial. We consider all surfaces $V$ currently known to have non-trivial universal equivalence. Beyond being intractable to Swinnerton-Dyer's approach, we observe that if these surfaces also ha

Source: https://arxiv.org/abs/2603.19215v1


Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encode

Original

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encode

Large vision--language models (VLMs) often use a frozen vision backbone, whose image features are mapped into a large language model through a lightweight connector. While transformer-based encoders are the standard visual backbone, we ask whether state space model (SSM) vision backbones can be a strong alternative. We systematically evaluate SSM vision backbones for VLMs in a controlled setting. Under matched ImageNet-1K initialization, the SSM backbone achieves the strongest overall performanc

Source: https://arxiv.org/abs/2603.19209v1


The Exponentially Weighted Signature

Original

The Exponentially Weighted Signature

The signature is a canonical representation of a multidimensional path over an interval. However, it treats all historical information uniformly, offering no intrinsic mechanism for contextualising the relevance of the past. To address this, we introduce the Exponentially Weighted Signature (EWS), generalising the Exponentially Fading Memory (EFM) signature from diagonal to general bounded linear operators. These operators enable cross-channel coupling at the level of temporal weighting together

Source: https://arxiv.org/abs/2603.19198v1
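For reference, the object being generalized: the ordinary signature collects iterated integrals of the path, weighting all of history equally. A plausible rendering of the abstract's construction (the exact placement of the weight in the paper may differ):

```latex
% Level-n coordinate of the ordinary signature of a path X : [s,t] \to \mathbb{R}^d:
S(X)^{(i_1,\dots,i_n)}_{s,t}
  = \int_{s < u_1 < \cdots < u_n < t}
    \mathrm{d}X^{i_1}_{u_1} \cdots \mathrm{d}X^{i_n}_{u_n}.
% An exponentially fading memory discounts old increments with a scalar rate
% \lambda > 0, e.g. at level one: \int_s^t e^{-\lambda (t-u)} \, \mathrm{d}X_u.
% The EWS replaces \lambda \, \mathrm{Id} by a bounded linear operator \Lambda,
% so that e^{-\Lambda (t-u)} can mix channels while it discounts time.
```

A diagonal $\Lambda$ recovers the EFM signature with one fading rate per channel; off-diagonal entries are what give the cross-channel coupling the abstract highlights.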


Score Reversal Is Not Free for Quantum Diffusion Models

Original

Score Reversal Is Not Free for Quantum Diffusion Models

Classical reverse diffusion is generated by changing the drift at fixed noise. We show that the quantum version of this principle obeys an exact law with a sharp phase boundary. For Gaussian pure-loss dynamics, the canonical model of continuous-variable decoherence, we prove that the unrestricted instantaneous reverse optimum exhibits a noiseless-to-noisy transition: below a critical squeezing-to-thermal ratio, reversal can be noiseless; above it, complete positivity forces irreducible reverse n

Source: https://arxiv.org/abs/2603.06488v3


The Convergence Frontier: Integrating Machine Learning and High Performance Quan

Original

The Convergence Frontier: Integrating Machine Learning and High Performance Quan

Integrating quantum mechanics into drug discovery marks a decisive shift from empirical trial-and-error toward quantitative precision. However, the prohibitive cost of ab initio molecular dynamics has historically forced a compromise between chemical accuracy and computational scalability. This paper identifies the convergence of High-Performance Computing (HPC), Machine Learning (ML), and Quantum Computing (QC) as the definitive solution to this bottleneck. While ML foundation models, such as F

Source: https://arxiv.org/abs/2603.17790v2


Robustness, Cost, and Attack-Surface Concentration in Phishing Detection

Original

Robustness, Cost, and Attack-Surface Concentration in Phishing Detection

Phishing detectors built on engineered website features attain near-perfect accuracy under i.i.d. evaluation, yet deployment security depends on robustness to post-deployment feature manipulation. We study this gap through a cost-aware evasion framework that models discrete, monotone feature edits under explicit attacker budgets. Three diagnostics are introduced: minimal evasion cost (MEC), the evasion survival rate $S(B)$, and the robustness concentration index (RCI). On the UCI Phishing Web

Source: https://arxiv.org/abs/2603.19204v1
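The two quantitative diagnostics can be made concrete on a toy linear detector: MEC is the cheapest set of monotone feature edits that flips the decision, and $S(B)$ is the fraction of samples whose MEC exceeds the budget $B$. A brute-force sketch (the detector, costs, and edit model here are illustrative; the paper's are richer):

```python
from itertools import combinations

def minimal_evasion_cost(x, weights, bias, costs):
    """Brute-force MEC for a toy linear detector: score > 0 means phishing.
    Each binary feature can be monotonically flipped 1 -> 0 at a known cost."""
    idx = [i for i, v in enumerate(x) if v == 1]
    best = None
    for r in range(len(idx) + 1):
        for subset in combinations(idx, r):
            z = list(x)
            for i in subset:
                z[i] = 0
            score = sum(w * v for w, v in zip(weights, z)) + bias
            if score <= 0:                       # detector now says benign
                c = sum(costs[i] for i in subset)
                best = c if best is None else min(best, c)
    return best                                  # None if no edit set evades

def survival_rate(samples, weights, bias, costs, budget):
    """S(B): fraction of samples whose evasion cost exceeds the budget."""
    mecs = [minimal_evasion_cost(x, weights, bias, costs) for x in samples]
    return sum(1 for m in mecs if m is None or m > budget) / len(samples)

w, b, c = [3.0, 1.0, 1.0], -3.5, [5.0, 1.0, 1.0]
print(minimal_evasion_cost([1, 1, 1], w, b, c))                    # → 2.0
print(survival_rate([[1, 1, 1], [1, 0, 1]], w, b, c, budget=1.5))  # → 0.5
```

The RCI would then summarize how concentrated these cheap evasions are on a few features, which is where the "attack-surface concentration" in the title comes from.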


Verifiable Semantics for Agent-to-Agent Communication

Original

Verifiable Semantics for Agent-to-Agent Communication

Multiagent AI systems require consistent communication, but we lack methods to verify that agents share the same understanding of the terms used. Natural language is interpretable but vulnerable to semantic drift, while learned protocols are efficient but opaque. We propose a certification protocol based on the stimulus-meaning model, where agents are tested on shared observable events and terms are certified if empirical disagreement falls below a statistical threshold. In this protocol, agents

Source: https://arxiv.org/abs/2602.16424v2
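The certification rule — certify a term when empirical disagreement falls below a statistical threshold — can be sketched with a generic one-sided Hoeffding bound on the disagreement probability (the paper's exact test statistic is not given in this snippet and may differ):

```python
import math

def certify_term(disagreements, trials, threshold, delta=0.05):
    """Certify a shared term if the (1 - delta) one-sided Hoeffding upper
    bound on the agents' disagreement probability is below the threshold.
    A generic statistical sketch, not the paper's exact protocol."""
    p_hat = disagreements / trials
    ucb = p_hat + math.sqrt(math.log(1 / delta) / (2 * trials))
    return ucb < threshold, round(ucb, 4)

# Two agents label 500 shared observable events for one term; 6 disagreements.
print(certify_term(6, 500, threshold=0.10))   # → (True, 0.0667)
print(certify_term(60, 500, threshold=0.10))  # high disagreement: not certified
```

The upper confidence bound, rather than the raw rate, is what makes the certificate conservative: a term only passes when the sample is large enough to rule out hidden disagreement at the stated confidence.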


Related notes

  • [[Coherent]]
  • [[NVIDIA]]
  • [[INDEX]]
  • [[260314_arxiv]] — similar keywords
  • [[260319_arxiv]] — similar keywords
  • [[260318_arxiv]] — similar keywords
  • [[260317_arxiv]] — similar keywords
  • [[260316_x]] — similar keywords

FinTradeBench: A Financial Reasoning Benchmark for LLMs

원문

FinTradeBench: A Financial Reasoning Benchmark for LLMs

Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with the advancement of Large Language Models (LLMs), financial analysts have begun to use them for financial decision-making tasks. However, existing financial question answering benchmarks for testing these models primarily focus on company balance sheet data a

출처: https://arxiv.org/abs/2603.19225v1


F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual Wor

원문

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual Wor

We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques,

출처: https://arxiv.org/abs/2603.19223v1


Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Polic

원문

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Polic

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICP

출처: https://arxiv.org/abs/2603.19220v1


DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative L

원문

DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative L

Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional structure of parts. While recent part-aware approaches introduce decomposition, they remain largely geometry-focused, lacking semantic grounding and failing to model how parts align with textual descriptions or their inter-part relations. We propose DreamPartGen, a framework for semantically grounded, p

출처: https://arxiv.org/abs/2603.19216v1


Enhancing Lexicon-Based Text Embeddings with Large Language Models

원문

Enhancing Lexicon-Based Text Embeddings with Large Language Models

Recent large language models (LLMs) have demonstrated exceptional performance on general-purpose text embedding tasks. While dense embeddings have dominated related research, we introduce the first lexicon-based embeddings (LENS) leveraging LLMs that achieve competitive performance on these tasks. LENS consolidates the vocabulary space through token embedding clustering to handle the issue of token redundancy in LLM vocabularies. To further improve performance, we investigate bidirectional atten

출처: https://arxiv.org/abs/2501.09749v2


How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic

원문

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic

Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remains unclear. We study this gap by comparing different LLMs under two text-only and one audio-grounded setting: (1) direct probing on AKB-2000, a curated benchmark testing the breadth and depth of auditory knowledge; (2) cascade evaluation, where LLMs reason over

출처: https://arxiv.org/abs/2603.19195v1


OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

원문

OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Themis, a scalable and accurate multi-agent critic framework. Unlike a single judge, OS-Themis decomposes trajectories into verifiable milestones to isolate critical evidence for decision making and emp

출처: https://arxiv.org/abs/2603.19191v1


iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

원문

iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model th

출처: https://arxiv.org/abs/2511.08905v3


This looks like what? Challenges and Future Research Directions for Part-Prototy

원문

This looks like what? Challenges and Future Research Directions for Part-Prototy

The growing interest in eXplainable Artificial Intelligence (XAI) has stimulated research on models with built-in interpretability, among which part-prototype models are particularly prominent. Part-Prototype Models (PPMs) classify inputs by comparing them to learned prototypes and provide human-understandable explanations of the form "this looks like that". Despite this intrinsic interpretability, PPMs have not yet emerged as a competitive alternative to post-hoc explanation methods. This surve

출처: https://arxiv.org/abs/2502.09340v2


Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

원문

Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at the behavioral level and may lack explicit architectural mechanisms for enforcing reasoning process integrity. This paper proposes the Box Maze framework, a conceptual process-control architecture tha

출처: https://arxiv.org/abs/2603.19182v1


NavTrust: Benchmarking Trustworthiness for Embodied Navigation

원문

NavTrust: Benchmarking Trustworthiness for Embodied Navigation

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world settings. To address this gap, we present NavTrust, a unified benchmark that systematically corrupts input mo

출처: https://arxiv.org/abs/2603.19229v1


DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction an

원문

DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction an

With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed for monocular and 2D scenes, leading to inefficiency and inter-view inconsistency when applied to high-resolution multi-view driving scenes. To address this, we propose DriveTok, an efficient 3D driving scene tokenizer for unified multi-view reconstruction and

출처: https://arxiv.org/abs/2603.19219v1


WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Ste

원문

WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Ste

Communication topology is a critical factor in the utility and safety of LLM-based multi-agent systems (LLM-MAS), making it a high-value intellectual property (IP) whose confidentiality remains insufficiently studied. Existing topology inference attempts rely on impractical assumptions, including control over the administrative agent and direct identity queries via jailbreaks, which are easily defeated by basic keyword-based defenses. As a result, prior analyses fail to capture the real-world th

출처: https://arxiv.org/abs/2603.11132v2


SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Ha

원문

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Ha

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward

출처: https://arxiv.org/abs/2603.19173v1


DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Effici

원문

DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Effici

Despite the computational efficiency of MoE models, the excessive memory footprint and I/O overhead inherent in multi-expert architectures pose formidable challenges for real-time inference on resource-constrained edge platforms. While existing static methods struggle with a rigid latency-accuracy trade-off, we observe that expert importance is highly skewed and depth-dependent. Motivated by these insights, we propose DyMoE, a dynamic mixed-precision quantization framework designed for high-perf

출처: https://arxiv.org/abs/2603.19172v1


ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angio

원문

ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angio

Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADNE, a two-stage framework coupling preference-aligned perception with RL-based diagnostic reasoning for topologically coherent stenosis detection. The perception module employs DPO to fine-tune the Sa2VA vision-language foundation model using Betti number constraints as preference signals, aligning t

출처: https://arxiv.org/abs/2603.19169v1


Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Langua

원문

Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Langua

Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge" requires grounding semantic references, spatial relations, and metric constraints within a 3D scene. While recent vision language models (VLMs) demonstrate strong semantic grounding capabilities, they are not explicitly designed to reason about metric constraints in physically defined spaces. In th

출처: https://arxiv.org/abs/2603.19166v1


2-D Directed Formation Control Based on Bipolar Coordinates

원문

2-D Directed Formation Control Based on Bipolar Coordinates

This work proposes a novel 2-D formation control scheme for acyclic triangulated directed graphs (a class of minimally acyclic persistent graphs) based on bipolar coordinates with (almost) global convergence to the desired shape. Prescribed performance control is employed to devise a decentralized control law that avoids singularities and introduces robustness against external disturbances while ensuring predefined transient and steady-state performance for the closed-loop system. Furthermore, i

출처: https://arxiv.org/abs/2108.00916v4


Improving RCT-Based Treatment Effect Estimation Under Covariate Mismatch via Cal

원문

Improving RCT-Based Treatment Effect Estimation Under Covariate Mismatch via Cal

Randomized controlled trials (RCTs) are the gold standard for estimating heterogeneous treatment effects, yet they are often underpowered for detecting effect heterogeneity. Large observational studies (OS) can supplement RCTs for conditional average treatment effect (CATE) estimation, but a key barrier is covariate mismatch: the two sources measure different, only partially overlapping, covariates. We propose CALM (Calibrated ALignment under covariate Mismatch), which bypasses imputation by lea

출처: https://arxiv.org/abs/2603.19186v1


Spectrally-Guided Diffusion Noise Schedules

원문

Spectrally-Guided Diffusion Noise Schedules

Denoising diffusion models are widely used for high-quality image and video generation. Their performance depends on noise schedules, which define the distribution of noise levels applied during training and the sequence of noise levels traversed during sampling. Noise schedules are typically handcrafted and require manual tuning across different resolutions. In this work, we propose a principled way to design per-instance noise schedules for pixel diffusion, based on the image's spectral proper

출처: https://arxiv.org/abs/2603.19222v1


Online Learning and Equilibrium Computation with Ranking Feedback

원문

Online Learning and Equilibrium Computation with Ranking Feedback

Online learning in arbitrary, and possibly adversarial, environments has been extensively studied in sequential decision-making, and it is closely connected to equilibrium computation in game theory. Most existing online learning algorithms rely on \emph{numeric} utility feedback from the environment, which may be unavailable in human-in-the-loop applications and/or may be restricted by privacy concerns. In this paper, we study an online learning model in which the learner only observes a \emph{

출처: https://arxiv.org/abs/2603.19221v1


$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal E

원문

$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal E

Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three special types--those whose $R$-equivalence he could not bound by proving the universal (admissible) equivalence is trivial. We consider all surfaces $V$ currently known to have non-trivial universal equivalence. Beyond being intractable to Swinnerton-Dyer's approach, we observe that if these surfaces also ha

출처: https://arxiv.org/abs/2603.19215v1


Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encode

원문

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encode

Large vision--language models (VLMs) often use a frozen vision backbone, whose image features are mapped into a large language model through a lightweight connector. While transformer-based encoders are the standard visual backbone, we ask whether state space model (SSM) vision backbones can be a strong alternative. We systematically evaluate SSM vision backbones for VLMs in a controlled setting. Under matched ImageNet-1K initialization, the SSM backbone achieves the strongest overall performanc

출처: https://arxiv.org/abs/2603.19209v1


The Exponentially Weighted Signature

원문

The Exponentially Weighted Signature

The signature is a canonical representation of a multidimensional path over an interval. However, it treats all historical information uniformly, offering no intrinsic mechanism for contextualising the relevance of the past. To address this, we introduce the Exponentially Weighted Signature (EWS), generalising the Exponentially Fading Memory (EFM) signature from diagonal to general bounded linear operators. These operators enable cross-channel coupling at the level of temporal weighting together

출처: https://arxiv.org/abs/2603.19198v1


Score Reversal Is Not Free for Quantum Diffusion Models

원문

Score Reversal Is Not Free for Quantum Diffusion Models

Classical reverse diffusion is generated by changing the drift at fixed noise. We show that the quantum version of this principle obeys an exact law with a sharp phase boundary. For Gaussian pure-loss dynamics, the canonical model of continuous-variable decoherence, we prove that the unrestricted instantaneous reverse optimum exhibits a noiseless-to-noisy transition: below a critical squeezing-to-thermal ratio, reversal can be noiseless; above it, complete positivity forces irreducible reverse n

출처: https://arxiv.org/abs/2603.06488v3


The Convergence Frontier: Integrating Machine Learning and High Performance Quan

원문

The Convergence Frontier: Integrating Machine Learning and High Performance Quan

Integrating quantum mechanics into drug discovery marks a decisive shift from empirical trial-and-error toward quantitative precision. However, the prohibitive cost of ab initio molecular dynamics has historically forced a compromise between chemical accuracy and computational scalability. This paper identifies the convergence of High-Performance Computing (HPC), Machine Learning (ML), and Quantum Computing (QC) as the definitive solution to this bottleneck. While ML foundation models, such as F

출처: https://arxiv.org/abs/2603.17790v2


Robustness, Cost, and Attack-Surface Concentration in Phishing Detection

원문

Robustness, Cost, and Attack-Surface Concentration in Phishing Detection

Phishing detectors built on engineered website features attain near-perfect accuracy under i.i.d.\ evaluation, yet deployment security depends on robustness to post-deployment feature manipulation. We study this gap through a cost-aware evasion framework that models discrete, monotone feature edits under explicit attacker budgets. Three diagnostics are introduced: minimal evasion cost (MEC), the evasion survival rate $S(B)$, and the robustness concentration index (RCI). On the UCI Phishing Web

출처: https://arxiv.org/abs/2603.19204v1


Verifiable Semantics for Agent-to-Agent Communication

원문

Verifiable Semantics for Agent-to-Agent Communication

Multiagent AI systems require consistent communication, but we lack methods to verify that agents share the same understanding of the terms used. Natural language is interpretable but vulnerable to semantic drift, while learned protocols are efficient but opaque. We propose a certification protocol based on the stimulus-meaning model, where agents are tested on shared observable events and terms are certified if empirical disagreement falls below a statistical threshold. In this protocol, agents

출처: https://arxiv.org/abs/2602.16424v2


FinTradeBench: A Financial Reasoning Benchmark for LLMs

원문

FinTradeBench: A Financial Reasoning Benchmark for LLMs

Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with the advancement of Large Language Models (LLMs), financial analysts have begun to use them for financial decision-making tasks. However, existing financial question answering benchmarks for testing these models primarily focus on company balance sheet data a

출처: https://arxiv.org/abs/2603.19225v1


F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual Wor

원문

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual Wor

We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques,

출처: https://arxiv.org/abs/2603.19223v1


Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Polic

원문

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Polic

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICP

출처: https://arxiv.org/abs/2603.19220v1


DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative L

원문

DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative L

Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional structure of parts. While recent part-aware approaches introduce decomposition, they remain largely geometry-focused, lacking semantic grounding and failing to model how parts align with textual descriptions or their inter-part relations. We propose DreamPartGen, a framework for semantically grounded, p

출처: https://arxiv.org/abs/2603.19216v1


Enhancing Lexicon-Based Text Embeddings with Large Language Models

원문

Enhancing Lexicon-Based Text Embeddings with Large Language Models

Recent large language models (LLMs) have demonstrated exceptional performance on general-purpose text embedding tasks. While dense embeddings have dominated related research, we introduce the first lexicon-based embeddings (LENS) leveraging LLMs that achieve competitive performance on these tasks. LENS consolidates the vocabulary space through token embedding clustering to handle the issue of token redundancy in LLM vocabularies. To further improve performance, we investigate bidirectional atten

출처: https://arxiv.org/abs/2501.09749v2


How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic

원문

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic

Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remains unclear. We study this gap by comparing different LLMs under two text-only and one audio-grounded setting: (1) direct probing on AKB-2000, a curated benchmark testing the breadth and depth of auditory knowledge; (2) cascade evaluation, where LLMs reason over

출처: https://arxiv.org/abs/2603.19195v1


OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

원문

OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Themis, a scalable and accurate multi-agent critic framework. Unlike a single judge, OS-Themis decomposes trajectories into verifiable milestones to isolate critical evidence for decision making and emp

출처: https://arxiv.org/abs/2603.19191v1


iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

원문

iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model th

출처: https://arxiv.org/abs/2511.08905v3


This looks like what? Challenges and Future Research Directions for Part-Prototy

The growing interest in eXplainable Artificial Intelligence (XAI) has stimulated research on models with built-in interpretability, among which part-prototype models are particularly prominent. Part-Prototype Models (PPMs) classify inputs by comparing them to learned prototypes and provide human-understandable explanations of the form "this looks like that". Despite this intrinsic interpretability, PPMs have not yet emerged as a competitive alternative to post-hoc explanation methods. This surve
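The "this looks like that" scoring rule can be sketched in a few lines of NumPy. This is not this survey's code; it is a minimal illustration of the ProtoPNet-style recipe with hypothetical shapes (49 patch features of dimension 64, 3 classes with 5 prototypes each) and random data in place of a real backbone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a frozen backbone yields a 7x7 grid of 64-d patch features.
n_patches, d = 49, 64
n_classes, protos_per_class = 3, 5
n_protos = n_classes * protos_per_class

features = rng.normal(size=(n_patches, d))    # patch embeddings of one image
prototypes = rng.normal(size=(n_protos, d))   # learned part prototypes, grouped by class

# Each prototype activates on its best-matching patch ("this looks like that"):
# squared distance to every patch, then a max-pooled log-similarity.
dists = ((features[None, :, :] - prototypes[:, None, :]) ** 2).sum(-1)  # (P, N)
min_d = dists.min(axis=1)                                               # (P,)
sims = np.log((min_d + 1.0) / (min_d + 1e-4))

# Class scores pool the evidence of each class's own prototypes.
class_scores = sims.reshape(n_classes, protos_per_class).sum(axis=1)
pred = int(class_scores.argmax())
print(pred)
```

The explanation falls out for free: the winning class's most active prototype, plus the patch it matched, is the "that" shown to the user.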

Source: https://arxiv.org/abs/2502.09340v2


Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at the behavioral level and may lack explicit architectural mechanisms for enforcing reasoning process integrity. This paper proposes the Box Maze framework, a conceptual process-control architecture tha

Source: https://arxiv.org/abs/2603.19182v1


NavTrust: Benchmarking Trustworthiness for Embodied Navigation

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world settings. To address this gap, we present NavTrust, a unified benchmark that systematically corrupts input mo

Source: https://arxiv.org/abs/2603.19229v1


DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction an

With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed for monocular and 2D scenes, leading to inefficiency and inter-view inconsistency when applied to high-resolution multi-view driving scenes. To address this, we propose DriveTok, an efficient 3D driving scene tokenizer for unified multi-view reconstruction and

Source: https://arxiv.org/abs/2603.19219v1


WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Ste

Communication topology is a critical factor in the utility and safety of LLM-based multi-agent systems (LLM-MAS), making it a high-value intellectual property (IP) whose confidentiality remains insufficiently studied. Existing topology inference attempts rely on impractical assumptions, including control over the administrative agent and direct identity queries via jailbreaks, which are easily defeated by basic keyword-based defenses. As a result, prior analyses fail to capture the real-world th

Source: https://arxiv.org/abs/2603.11132v2


SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Ha

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward
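The core idea of scoring "proximity to hardware-efficient execution" rather than raw speedup can be illustrated with a roofline-style speed-of-light fraction. This is a generic sketch, not the benchmark's scoring code; all peak rates below are placeholders, not Blackwell specifications:

```python
def sol_time(flops, bytes_moved, peak_flops, peak_bw):
    """Roofline lower bound on runtime: a kernel can finish no faster than
    the slower of its compute-bound and memory-bound limits."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

def sol_fraction(measured_s, flops, bytes_moved, peak_flops, peak_bw):
    """Fraction of speed-of-light achieved; 1.0 means hardware-efficient."""
    return sol_time(flops, bytes_moved, peak_flops, peak_bw) / measured_s

# Example: a memory-bound elementwise kernel (placeholder numbers).
frac = sol_fraction(
    measured_s=2.0e-4,
    flops=1e9, bytes_moved=1.2e8,     # reads + writes
    peak_flops=1e15, peak_bw=1e12,    # hypothetical peak rates
)
print(f"{frac:.2f}")  # -> 0.60
```

Under this metric a 2x speedup over a slow baseline can still score poorly if the kernel remains far from the roofline bound, which is exactly the gap the benchmark targets.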

Source: https://arxiv.org/abs/2603.19173v1


DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Effici

Despite the computational efficiency of MoE models, the excessive memory footprint and I/O overhead inherent in multi-expert architectures pose formidable challenges for real-time inference on resource-constrained edge platforms. While existing static methods struggle with a rigid latency-accuracy trade-off, we observe that expert importance is highly skewed and depth-dependent. Motivated by these insights, we propose DyMoE, a dynamic mixed-precision quantization framework designed for high-perf
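The observation that expert importance is highly skewed suggests a simple greedy bit-allocation rule: spend the high-precision budget on the few dominant experts. The sketch below is an assumption-laden illustration of that idea, not DyMoE's actual algorithm (bit widths, budget, and importance scores are all hypothetical):

```python
def allocate_bits(importance, avg_bits=3.0, hi=4, lo=2):
    """Greedy mixed-precision assignment: keep the most important experts
    at `hi` bits and quantize the rest to `lo`, subject to an average
    per-expert bit budget of `avg_bits`."""
    n = len(importance)
    n_hi = int(round(n * (avg_bits - lo) / (hi - lo)))  # experts that fit at hi bits
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    bits = [lo] * n
    for i in order[:n_hi]:
        bits[i] = hi
    return bits

# Skewed importance typical of MoE routers: a few experts dominate.
imp = [0.40, 0.25, 0.12, 0.08, 0.06, 0.04, 0.03, 0.02]
print(allocate_bits(imp))  # -> [4, 4, 4, 4, 2, 2, 2, 2]
```

Because importance is also depth-dependent in the paper's analysis, one would run this allocation per layer with layer-specific budgets rather than globally.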

Source: https://arxiv.org/abs/2603.19172v1


ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angio

Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADNE, a two-stage framework coupling preference-aligned perception with RL-based diagnostic reasoning for topologically coherent stenosis detection. The perception module employs DPO to fine-tune the Sa2VA vision-language foundation model using Betti number constraints as preference signals, aligning t
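The topological quantity behind the preference signal, Betti-0, is just the number of connected components of the predicted mask, which is why a fragmented vessel tree is penalized even when pixel accuracy is high. A minimal flood-fill computation (4-connectivity, standard library only; not the paper's implementation):

```python
from collections import deque

def betti_0(mask):
    """Number of connected components (Betti-0) of a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                comps += 1
                q = deque([(i, j)])
                seen[i][j] = True
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return comps

# A fragmented vessel (two pieces) vs. a topologically coherent one:
fragmented = [[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1]]
coherent   = [[1, 1, 1, 1], [0, 0, 0, 1], [0, 0, 0, 1]]
print(betti_0(fragmented), betti_0(coherent))  # -> 2 1
```

A DPO-style preference pair would then favor the prediction whose Betti-0 matches the ground-truth vessel tree (typically one component per tree).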

Source: https://arxiv.org/abs/2603.19169v1


Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Langua

Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge" requires grounding semantic references, spatial relations, and metric constraints within a 3D scene. While recent vision language models (VLMs) demonstrate strong semantic grounding capabilities, they are not explicitly designed to reason about metric constraints in physically defined spaces. In th
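The metric-grounding step for a command like "go two meters to the right of the fridge" reduces, once the anchor is localized, to resolving a spatial relation into an offset vector. A deterministic sketch of that final step (the paper's approach is probabilistic and multi-agent; the function name and frame conventions here are hypothetical):

```python
import numpy as np

def ground_offset(anchor_xyz, facing_xyz, relation, distance):
    """Resolve 'DISTANCE meters RELATION of ANCHOR' into a 3-D goal point.
    `facing_xyz` is the robot's viewing direction toward the anchor;
    'right'/'left' are computed in the horizontal plane from that view."""
    up = np.array([0.0, 0.0, 1.0])
    forward = np.asarray(facing_xyz, dtype=float)
    forward[2] = 0.0
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)  # right-hand side of the viewing direction
    offsets = {"right": right, "left": -right,
               "front": -forward, "behind": forward}
    return np.asarray(anchor_xyz, dtype=float) + distance * offsets[relation]

# "go two meters to the right of the fridge": fridge at (4, 2, 0),
# robot looking along +y toward it.
goal = ground_offset([4.0, 2.0, 0.0], [0.0, 1.0, 0.0], "right", 2.0)
print(goal)
```

A probabilistic grounding would replace the single `goal` point with a distribution over anchor poses and relation frames, which is where the VLM's semantic uncertainty enters.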

Source: https://arxiv.org/abs/2603.19166v1


2-D Directed Formation Control Based on Bipolar Coordinates

This work proposes a novel 2-D formation control scheme for acyclic triangulated directed graphs (a class of minimally acyclic persistent graphs) based on bipolar coordinates with (almost) global convergence to the desired shape. Prescribed performance control is employed to devise a decentralized control law that avoids singularities and introduces robustness against external disturbances while ensuring predefined transient and steady-state performance for the closed-loop system. Furthermore, i

Source: https://arxiv.org/abs/2108.00916v4


Improving RCT-Based Treatment Effect Estimation Under Covariate Mismatch via Cal

Randomized controlled trials (RCTs) are the gold standard for estimating heterogeneous treatment effects, yet they are often underpowered for detecting effect heterogeneity. Large observational studies (OS) can supplement RCTs for conditional average treatment effect (CATE) estimation, but a key barrier is covariate mismatch: the two sources measure different, only partially overlapping, covariates. We propose CALM (Calibrated ALignment under covariate Mismatch), which bypasses imputation by lea

Source: https://arxiv.org/abs/2603.19186v1


Spectrally-Guided Diffusion Noise Schedules

Denoising diffusion models are widely used for high-quality image and video generation. Their performance depends on noise schedules, which define the distribution of noise levels applied during training and the sequence of noise levels traversed during sampling. Noise schedules are typically handcrafted and require manual tuning across different resolutions. In this work, we propose a principled way to design per-instance noise schedules for pixel diffusion, based on the image's spectral proper
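The "spectral properties" entering such a schedule are typically the image's radially averaged power spectrum: a frequency band is effectively destroyed once the noise power exceeds its signal power. The sketch below computes that spectrum and derives a crude per-instance set of noise levels from it; it is an illustration of the principle, not the paper's schedule:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.normal(size=(64, 64))  # stand-in for a zero-mean image

# Radially averaged power spectrum.
F = np.fft.fftshift(np.fft.fft2(img))
power = np.abs(F) ** 2 / img.size
yy, xx = np.indices(img.shape)
r = np.hypot(yy - 32, xx - 32).astype(int)
counts = np.bincount(r.ravel())
radial = np.bincount(r.ravel(), weights=power.ravel()) / np.maximum(counts, 1)

# Pick noise levels so each successive step drowns one more frequency band:
# sigma^2 matched to band powers, strongest band last to be destroyed.
sigmas = np.sqrt(np.sort(radial)[::-1])
```

Sampling would then traverse `sigmas` from largest to smallest, so the reverse process reconstructs coarse (high-power) structure before fine detail, without hand-tuning per resolution.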

Source: https://arxiv.org/abs/2603.19222v1


Online Learning and Equilibrium Computation with Ranking Feedback

Online learning in arbitrary, and possibly adversarial, environments has been extensively studied in sequential decision-making, and it is closely connected to equilibrium computation in game theory. Most existing online learning algorithms rely on \emph{numeric} utility feedback from the environment, which may be unavailable in human-in-the-loop applications and/or may be restricted by privacy concerns. In this paper, we study an online learning model in which the learner only observes a \emph{

Source: https://arxiv.org/abs/2603.19221v1


$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal E

Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three special types--those whose $R$-equivalence he could not bound by proving the universal (admissible) equivalence is trivial. We consider all surfaces $V$ currently known to have non-trivial universal equivalence. Beyond being intractable to Swinnerton-Dyer's approach, we observe that if these surfaces also ha

Source: https://arxiv.org/abs/2603.19215v1


Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encode

Large vision--language models (VLMs) often use a frozen vision backbone, whose image features are mapped into a large language model through a lightweight connector. While transformer-based encoders are the standard visual backbone, we ask whether state space model (SSM) vision backbones can be a strong alternative. We systematically evaluate SSM vision backbones for VLMs in a controlled setting. Under matched ImageNet-1K initialization, the SSM backbone achieves the strongest overall performanc

Source: https://arxiv.org/abs/2603.19209v1


The Exponentially Weighted Signature

The signature is a canonical representation of a multidimensional path over an interval. However, it treats all historical information uniformly, offering no intrinsic mechanism for contextualising the relevance of the past. To address this, we introduce the Exponentially Weighted Signature (EWS), generalising the Exponentially Fading Memory (EFM) signature from diagonal to general bounded linear operators. These operators enable cross-channel coupling at the level of temporal weighting together
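In the diagonal special case the EWS reduces to the exponentially-fading-memory signature, whose first level weights each path increment by how recent it is. A discretized sketch of that level-1 term (the paper's general construction replaces the scalar decay `exp(-lam*s)` with an operator exponential, enabling cross-channel coupling; this code covers only the diagonal case):

```python
import numpy as np

def efm_level1(path, times, lam):
    """Discretised first level of the exponentially-fading-memory signature:
    S = sum_k exp(-lam * (T - t_{k+1})) * (x_{k+1} - x_k), channel-wise."""
    x = np.asarray(path, dtype=float)   # shape (n, d)
    t = np.asarray(times, dtype=float)  # shape (n,)
    dx = np.diff(x, axis=0)             # path increments
    w = np.exp(-lam * (t[-1] - t[1:]))[:, None]  # recency weights
    return (w * dx).sum(axis=0)

# lam = 0 recovers the ordinary level-1 signature x_T - x_0:
path = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]
times = [0.0, 0.5, 1.0]
print(efm_level1(path, times, 0.0))  # equals x_T - x_0 = [3, 1]
```

With `lam > 0` the first increment is attenuated relative to the second, which is the "contextualising the relevance of the past" that the plain signature lacks.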

Source: https://arxiv.org/abs/2603.19198v1


Score Reversal Is Not Free for Quantum Diffusion Models

Classical reverse diffusion is generated by changing the drift at fixed noise. We show that the quantum version of this principle obeys an exact law with a sharp phase boundary. For Gaussian pure-loss dynamics, the canonical model of continuous-variable decoherence, we prove that the unrestricted instantaneous reverse optimum exhibits a noiseless-to-noisy transition: below a critical squeezing-to-thermal ratio, reversal can be noiseless; above it, complete positivity forces irreducible reverse n

Source: https://arxiv.org/abs/2603.06488v3


The Convergence Frontier: Integrating Machine Learning and High Performance Quan

Integrating quantum mechanics into drug discovery marks a decisive shift from empirical trial-and-error toward quantitative precision. However, the prohibitive cost of ab initio molecular dynamics has historically forced a compromise between chemical accuracy and computational scalability. This paper identifies the convergence of High-Performance Computing (HPC), Machine Learning (ML), and Quantum Computing (QC) as the definitive solution to this bottleneck. While ML foundation models, such as F

Source: https://arxiv.org/abs/2603.17790v2


Robustness, Cost, and Attack-Surface Concentration in Phishing Detection

Phishing detectors built on engineered website features attain near-perfect accuracy under i.i.d.\ evaluation, yet deployment security depends on robustness to post-deployment feature manipulation. We study this gap through a cost-aware evasion framework that models discrete, monotone feature edits under explicit attacker budgets. Three diagnostics are introduced: minimal evasion cost (MEC), the evasion survival rate $S(B)$, and the robustness concentration index (RCI). On the UCI Phishing Web
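The minimal evasion cost (MEC) diagnostic can be made concrete on a toy linear detector: the cheapest set of feature edits that flips the verdict to benign. The sketch below brute-forces binary flips and ignores the monotonicity constraint for brevity; detector weights and attacker costs are hypothetical, not from the UCI dataset:

```python
from itertools import combinations

# Toy linear phishing detector over binary features (hypothetical weights).
weights = {"has_ip_url": 2.0, "long_url": 1.0, "https": -1.5, "dns_age": -2.0}
bias = -0.5
costs = {"has_ip_url": 3, "long_url": 1, "https": 2, "dns_age": 5}  # attacker edit costs

def is_phishing(x):
    return sum(weights[f] * x[f] for f in weights) + bias > 0

def minimal_evasion_cost(x):
    """MEC: cheapest set of single-feature flips that makes the detector
    output 'benign' (exhaustive search over edit subsets)."""
    feats = list(weights)
    best = None
    for k in range(len(feats) + 1):
        for subset in combinations(feats, k):
            y = dict(x)
            for f in subset:
                y[f] = 1 - y[f]
            if not is_phishing(y):
                c = sum(costs[f] for f in subset)
                best = c if best is None else min(best, c)
    return best

sample = {"has_ip_url": 1, "long_url": 1, "https": 0, "dns_age": 0}
print(minimal_evasion_cost(sample))  # -> 3 (flip long_url + https)
```

Sweeping a budget B over such samples yields the survival rate S(B), and comparing how MEC mass concentrates on a few features gives the intuition behind the robustness concentration index.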

Source: https://arxiv.org/abs/2603.19204v1


Verifiable Semantics for Agent-to-Agent Communication

Multiagent AI systems require consistent communication, but we lack methods to verify that agents share the same understanding of the terms used. Natural language is interpretable but vulnerable to semantic drift, while learned protocols are efficient but opaque. We propose a certification protocol based on the stimulus-meaning model, where agents are tested on shared observable events and terms are certified if empirical disagreement falls below a statistical threshold. In this protocol, agents
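The certification test described above can be sketched directly: compare two agents' labels on shared stimuli and certify the term only if the empirical disagreement, inflated by a confidence margin, stays below the tolerance. The Hoeffding-style margin here is one natural choice of statistical threshold, not necessarily the paper's:

```python
import math

def certify(labels_a, labels_b, eps=0.1, delta=0.05):
    """Certify a term if the empirical disagreement rate between two agents,
    plus a Hoeffding confidence margin at level delta, is at most eps."""
    n = len(labels_a)
    disagree = sum(a != b for a, b in zip(labels_a, labels_b)) / n
    margin = math.sqrt(math.log(1 / delta) / (2 * n))
    return disagree + margin <= eps

# 400 shared observable events; the agents disagree on 4 uses of "cup".
a = [1] * 400
b = [1] * 396 + [0] * 4
print(certify(a, b))  # -> True
```

The margin shrinks as 1/sqrt(n), so rare terms need more shared stimuli before they can be certified, which is the expected behavior for a drift-resistant protocol.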

Source: https://arxiv.org/abs/2602.16424v2