260321 arXiv Collection
FinTradeBench: A Financial Reasoning Benchmark for LLMs
Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with the advancement of Large Language Models (LLMs), financial analysts have begun to use them for financial decision-making tasks. However, existing financial question answering benchmarks for testing these models primarily focus on company balance sheet data a
Source: https://arxiv.org/abs/2603.19225v1
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual Wor
We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques,
Source: https://arxiv.org/abs/2603.19223v1
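The matryoshka learning mentioned in the abstract trains nested prefixes of the embedding so that shorter truncations stay usable. A minimal consumer-side sketch in plain NumPy (the 1024/256 dimensions are illustrative, not F2LLM-v2's actual sizes):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates of a matryoshka-style embedding
    and re-normalize, so the shorter prefix is still a unit vector."""
    prefix = vec[:dim]
    return prefix / np.linalg.norm(prefix)

# Demo: one 1024-d unit embedding, truncated to a nested prefix size.
rng = np.random.default_rng(0)
full = rng.standard_normal(1024)
full /= np.linalg.norm(full)
short = truncate_embedding(full, 256)
```

Cosine similarity on such prefixes approximates full-dimensional similarity only when the model was trained with a matryoshka objective; without it, truncation degrades quality sharply.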
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Polic
We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICP
Source: https://arxiv.org/abs/2603.19220v1
DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative L
Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional structure of parts. While recent part-aware approaches introduce decomposition, they remain largely geometry-focused, lacking semantic grounding and failing to model how parts align with textual descriptions or their inter-part relations. We propose DreamPartGen, a framework for semantically grounded, p
Source: https://arxiv.org/abs/2603.19216v1
Enhancing Lexicon-Based Text Embeddings with Large Language Models
Recent large language models (LLMs) have demonstrated exceptional performance on general-purpose text embedding tasks. While dense embeddings have dominated related research, we introduce the first lexicon-based embeddings (LENS) leveraging LLMs that achieve competitive performance on these tasks. LENS consolidates the vocabulary space through token embedding clustering to handle the issue of token redundancy in LLM vocabularies. To further improve performance, we investigate bidirectional atten
Source: https://arxiv.org/abs/2501.09749v2
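The token-embedding clustering the abstract describes can be pictured with a toy k-means pass over a vocabulary matrix; this is a generic sketch, not LENS's exact recipe (cluster count, distance metric, and sizes here are assumptions):

```python
import numpy as np

def cluster_token_embeddings(emb: np.ndarray, k: int, iters: int = 20,
                             seed: int = 0) -> np.ndarray:
    """Toy k-means over a (vocab_size, dim) token-embedding matrix.
    Tokens assigned to the same cluster are treated as redundant and
    can share one lexicon dimension."""
    rng = np.random.default_rng(seed)
    centers = emb[rng.choice(len(emb), size=k, replace=False)]
    for _ in range(iters):
        # squared L2 distance of every token to every center
        d2 = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        for c in range(k):
            members = emb[assign == c]
            if len(members):  # keep empty clusters at their old center
                centers[c] = members.mean(axis=0)
    return assign

vocab = np.random.default_rng(1).standard_normal((500, 32))
labels = cluster_token_embeddings(vocab, k=8)
```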
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic
Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remains unclear. We study this gap by comparing different LLMs under two text-only and one audio-grounded setting: (1) direct probing on AKB-2000, a curated benchmark testing the breadth and depth of auditory knowledge; (2) cascade evaluation, where LLMs reason over
Source: https://arxiv.org/abs/2603.19195v1
OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards
Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Themis, a scalable and accurate multi-agent critic framework. Unlike a single judge, OS-Themis decomposes trajectories into verifiable milestones to isolate critical evidence for decision making and emp
Source: https://arxiv.org/abs/2603.19191v1
iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification
Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model th
Source: https://arxiv.org/abs/2511.08905v3
This looks like what? Challenges and Future Research Directions for Part-Prototy
The growing interest in eXplainable Artificial Intelligence (XAI) has stimulated research on models with built-in interpretability, among which part-prototype models are particularly prominent. Part-Prototype Models (PPMs) classify inputs by comparing them to learned prototypes and provide human-understandable explanations of the form "this looks like that". Despite this intrinsic interpretability, PPMs have not yet emerged as a competitive alternative to post-hoc explanation methods. This surve
Source: https://arxiv.org/abs/2502.09340v2
Box Maze: A Process-Control Architecture for Reliable LLM Reasoning
Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at the behavioral level and may lack explicit architectural mechanisms for enforcing reasoning process integrity. This paper proposes the Box Maze framework, a conceptual process-control architecture tha
Source: https://arxiv.org/abs/2603.19182v1
NavTrust: Benchmarking Trustworthiness for Embodied Navigation
There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world settings. To address this gap, we present NavTrust, a unified benchmark that systematically corrupts input mo
Source: https://arxiv.org/abs/2603.19229v1
DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction an
With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed for monocular and 2D scenes, leading to inefficiency and inter-view inconsistency when applied to high-resolution multi-view driving scenes. To address this, we propose DriveTok, an efficient 3D driving scene tokenizer for unified multi-view reconstruction and
Source: https://arxiv.org/abs/2603.19219v1
WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Ste
Communication topology is a critical factor in the utility and safety of LLM-based multi-agent systems (LLM-MAS), making it a high-value intellectual property (IP) whose confidentiality remains insufficiently studied. Existing topology inference attempts rely on impractical assumptions, including control over the administrative agent and direct identity queries via jailbreaks, which are easily defeated by basic keyword-based defenses. As a result, prior analyses fail to capture the real-world th
Source: https://arxiv.org/abs/2603.11132v2
SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Ha
As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward
Source: https://arxiv.org/abs/2603.19173v1
DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Effici
Despite the computational efficiency of MoE models, the excessive memory footprint and I/O overhead inherent in multi-expert architectures pose formidable challenges for real-time inference on resource-constrained edge platforms. While existing static methods struggle with a rigid latency-accuracy trade-off, we observe that expert importance is highly skewed and depth-dependent. Motivated by these insights, we propose DyMoE, a dynamic mixed-precision quantization framework designed for high-perf
Source: https://arxiv.org/abs/2603.19172v1
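The abstract's observation that expert importance is skewed suggests spending a memory budget unevenly across experts. A greedy sketch of such an assignment, with invented importance scores and bit-width choices (a hypothetical stand-in for DyMoE's actual policy):

```python
def assign_expert_bits(importance, budget_bits, choices=(2, 4, 8)):
    """Greedy mixed-precision assignment: start every expert at the
    lowest bit-width, then spend the remaining budget upgrading the
    most important experts first."""
    n = len(importance)
    bits = [min(choices)] * n
    spent = sum(bits)
    order = sorted(range(n), key=lambda i: -importance[i])
    for level in sorted(choices)[1:]:
        for i in order:
            step = level - bits[i]
            if bits[i] < level and spent + step <= budget_bits:
                bits[i] = level
                spent += step
    return bits

# Skewed importance: the dominant experts get the wider bit-widths.
plan = assign_expert_bits([0.50, 0.25, 0.15, 0.10], budget_bits=14)
# → [4, 4, 4, 2]
```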
ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angio
Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADNE, a two-stage framework coupling preference-aligned perception with RL-based diagnostic reasoning for topologically coherent stenosis detection. The perception module employs DPO to fine-tune the Sa2VA vision-language foundation model using Betti number constraints as preference signals, aligning t
Source: https://arxiv.org/abs/2603.19169v1
Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Langua
Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge" requires grounding semantic references, spatial relations, and metric constraints within a 3D scene. While recent vision language models (VLMs) demonstrate strong semantic grounding capabilities, they are not explicitly designed to reason about metric constraints in physically defined spaces. In th
Source: https://arxiv.org/abs/2603.19166v1
2-D Directed Formation Control Based on Bipolar Coordinates
This work proposes a novel 2-D formation control scheme for acyclic triangulated directed graphs (a class of minimally acyclic persistent graphs) based on bipolar coordinates with (almost) global convergence to the desired shape. Prescribed performance control is employed to devise a decentralized control law that avoids singularities and introduces robustness against external disturbances while ensuring predefined transient and steady-state performance for the closed-loop system. Furthermore, i
Source: https://arxiv.org/abs/2108.00916v4
Improving RCT-Based Treatment Effect Estimation Under Covariate Mismatch via Cal
Randomized controlled trials (RCTs) are the gold standard for estimating heterogeneous treatment effects, yet they are often underpowered for detecting effect heterogeneity. Large observational studies (OS) can supplement RCTs for conditional average treatment effect (CATE) estimation, but a key barrier is covariate mismatch: the two sources measure different, only partially overlapping, covariates. We propose CALM (Calibrated ALignment under covariate Mismatch), which bypasses imputation by lea
Source: https://arxiv.org/abs/2603.19186v1
Spectrally-Guided Diffusion Noise Schedules
Denoising diffusion models are widely used for high-quality image and video generation. Their performance depends on noise schedules, which define the distribution of noise levels applied during training and the sequence of noise levels traversed during sampling. Noise schedules are typically handcrafted and require manual tuning across different resolutions. In this work, we propose a principled way to design per-instance noise schedules for pixel diffusion, based on the image's spectral proper
Source: https://arxiv.org/abs/2603.19222v1
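The per-instance spectral property such a schedule could consult is, for example, the radially averaged power spectrum of the image. A generic sketch of that summary statistic (the binning and its use as a schedule input are this sketch's assumptions, not the paper's method):

```python
import numpy as np

def radial_power_spectrum(img: np.ndarray, nbins: int = 16) -> np.ndarray:
    """Radially averaged power spectrum of a grayscale image: how much
    energy the image carries at each spatial-frequency band, which a
    spectrally guided schedule could use to place noise levels."""
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)          # radial frequency
    bins = np.linspace(0.0, r.max() + 1e-9, nbins + 1)
    idx = np.clip(np.digitize(r.ravel(), bins) - 1, 0, nbins - 1)
    spec = np.bincount(idx, weights=power.ravel(), minlength=nbins)
    counts = np.bincount(idx, minlength=nbins)
    return spec / np.maximum(counts, 1)

img = np.random.default_rng(0).standard_normal((64, 64))
spectrum = radial_power_spectrum(img)
```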
Online Learning and Equilibrium Computation with Ranking Feedback
Online learning in arbitrary, and possibly adversarial, environments has been extensively studied in sequential decision-making, and it is closely connected to equilibrium computation in game theory. Most existing online learning algorithms rely on \emph{numeric} utility feedback from the environment, which may be unavailable in human-in-the-loop applications and/or may be restricted by privacy concerns. In this paper, we study an online learning model in which the learner only observes a \emph{
Source: https://arxiv.org/abs/2603.19221v1
$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal E
Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three special types--those whose $R$-equivalence he could not bound by proving the universal (admissible) equivalence is trivial. We consider all surfaces $V$ currently known to have non-trivial universal equivalence. Beyond being intractable to Swinnerton-Dyer's approach, we observe that if these surfaces also ha
Source: https://arxiv.org/abs/2603.19215v1
Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encode
Large vision--language models (VLMs) often use a frozen vision backbone, whose image features are mapped into a large language model through a lightweight connector. While transformer-based encoders are the standard visual backbone, we ask whether state space model (SSM) vision backbones can be a strong alternative. We systematically evaluate SSM vision backbones for VLMs in a controlled setting. Under matched ImageNet-1K initialization, the SSM backbone achieves the strongest overall performanc
Source: https://arxiv.org/abs/2603.19209v1
The Exponentially Weighted Signature
The signature is a canonical representation of a multidimensional path over an interval. However, it treats all historical information uniformly, offering no intrinsic mechanism for contextualising the relevance of the past. To address this, we introduce the Exponentially Weighted Signature (EWS), generalising the Exponentially Fading Memory (EFM) signature from diagonal to general bounded linear operators. These operators enable cross-channel coupling at the level of temporal weighting together
Source: https://arxiv.org/abs/2603.19198v1
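One way to picture the fading-memory weighting the abstract generalizes: in the diagonal (EFM) case, the level-one signature term of a path $X$ becomes an exponentially discounted increment integral, and the notation below is this note's assumption rather than the paper's,

$$S^{(1)}_{\lambda}(X)_t = \int_0^t e^{-\lambda (t-s)}\, \mathrm{d}X_s,$$

so replacing the scalar $\lambda$ with a bounded linear operator $\Lambda$, giving $\int_0^t e^{-\Lambda (t-s)}\, \mathrm{d}X_s$, lets the operator exponential mix channels while discounting, which is the cross-channel coupling the abstract refers to.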
Score Reversal Is Not Free for Quantum Diffusion Models
Classical reverse diffusion is generated by changing the drift at fixed noise. We show that the quantum version of this principle obeys an exact law with a sharp phase boundary. For Gaussian pure-loss dynamics, the canonical model of continuous-variable decoherence, we prove that the unrestricted instantaneous reverse optimum exhibits a noiseless-to-noisy transition: below a critical squeezing-to-thermal ratio, reversal can be noiseless; above it, complete positivity forces irreducible reverse n
Source: https://arxiv.org/abs/2603.06488v3
The Convergence Frontier: Integrating Machine Learning and High Performance Quan
Integrating quantum mechanics into drug discovery marks a decisive shift from empirical trial-and-error toward quantitative precision. However, the prohibitive cost of ab initio molecular dynamics has historically forced a compromise between chemical accuracy and computational scalability. This paper identifies the convergence of High-Performance Computing (HPC), Machine Learning (ML), and Quantum Computing (QC) as the definitive solution to this bottleneck. While ML foundation models, such as F
Source: https://arxiv.org/abs/2603.17790v2
Robustness, Cost, and Attack-Surface Concentration in Phishing Detection
Phishing detectors built on engineered website features attain near-perfect accuracy under i.i.d.\ evaluation, yet deployment security depends on robustness to post-deployment feature manipulation. We study this gap through a cost-aware evasion framework that models discrete, monotone feature edits under explicit attacker budgets. Three diagnostics are introduced: minimal evasion cost (MEC), the evasion survival rate $S(B)$, and the robustness concentration index (RCI). On the UCI Phishing Web
Source: https://arxiv.org/abs/2603.19204v1
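The minimal evasion cost diagnostic can be made concrete with a brute-force search over discrete feature edits against a toy linear detector; the detector weights, edit costs, and feature space here are invented for illustration, not taken from the paper:

```python
from itertools import combinations

def minimal_evasion_cost(x, weights, bias, edits):
    """Brute-force MEC: cheapest subset of feature edits that flips a
    linear detector from phishing (score > 0) to benign (score <= 0).
    `edits` maps feature index -> (new_value, cost)."""
    def score(v):
        return sum(w * xi for w, xi in zip(weights, v)) + bias

    best = None
    idxs = list(edits)
    for r in range(1, len(idxs) + 1):
        for subset in combinations(idxs, r):
            v = list(x)
            cost = 0
            for i in subset:
                new_val, c = edits[i]
                v[i] = new_val
                cost += c
            if score(v) <= 0 and (best is None or cost < best):
                best = cost
    return best  # None: no evasion within the editable features

x = [1, 1, 1]                  # binary phishing indicators, all firing
w, b = [2.0, 1.5, 1.0], -2.0   # detector: score(x) = w . x + b
edits = {0: (0, 3), 1: (0, 1), 2: (0, 1)}  # feature 0 is costly to hide
mec = minimal_evasion_cost(x, w, b, edits)
# → 2 (flip features 1 and 2; the expensive feature 0 stays)
```

Real feature spaces are far larger, so practical MEC computation needs search rather than subset enumeration, but the quantity being measured is the same.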
Verifiable Semantics for Agent-to-Agent Communication
Multiagent AI systems require consistent communication, but we lack methods to verify that agents share the same understanding of the terms used. Natural language is interpretable but vulnerable to semantic drift, while learned protocols are efficient but opaque. We propose a certification protocol based on the stimulus-meaning model, where agents are tested on shared observable events and terms are certified if empirical disagreement falls below a statistical threshold. In this protocol, agents
Source: https://arxiv.org/abs/2602.16424v2
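One plausible instantiation of "certified if empirical disagreement falls below a statistical threshold" is a one-sided Hoeffding test over shared probe events; the threshold, confidence level, and counts below are assumptions of this sketch, not the paper's protocol:

```python
import math

def certify_term(disagreements: int, trials: int, tau: float,
                 delta: float = 0.05) -> bool:
    """Certify a shared term if the empirical disagreement rate plus a
    one-sided Hoeffding margin stays below the threshold tau, so the
    true disagreement rate exceeds tau with probability at most delta."""
    rate = disagreements / trials
    margin = math.sqrt(math.log(1.0 / delta) / (2 * trials))
    return rate + margin < tau

# 2 disagreements in 400 probes clears a 10% threshold; 80 does not.
ok = certify_term(2, 400, tau=0.10)    # → True
bad = certify_term(80, 400, tau=0.10)  # → False
```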
FinTradeBench: A Financial Reasoning Benchmark for LLMs
FinTradeBench: A Financial Reasoning Benchmark for LLMs
Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with the advancement of Large Language Models (LLMs), financial analysts have begun to use them for financial decision-making tasks. However, existing financial question answering benchmarks for testing these models primarily focus on company balance sheet data a
출처: https://arxiv.org/abs/2603.19225v1
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual Wor
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual Wor
We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques,
출처: https://arxiv.org/abs/2603.19223v1
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Polic
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Polic
We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICP
출처: https://arxiv.org/abs/2603.19220v1
DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative L
DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative L
Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional structure of parts. While recent part-aware approaches introduce decomposition, they remain largely geometry-focused, lacking semantic grounding and failing to model how parts align with textual descriptions or their inter-part relations. We propose DreamPartGen, a framework for semantically grounded, p
출처: https://arxiv.org/abs/2603.19216v1
Enhancing Lexicon-Based Text Embeddings with Large Language Models
Enhancing Lexicon-Based Text Embeddings with Large Language Models
Recent large language models (LLMs) have demonstrated exceptional performance on general-purpose text embedding tasks. While dense embeddings have dominated related research, we introduce the first lexicon-based embeddings (LENS) leveraging LLMs that achieve competitive performance on these tasks. LENS consolidates the vocabulary space through token embedding clustering to handle the issue of token redundancy in LLM vocabularies. To further improve performance, we investigate bidirectional atten
출처: https://arxiv.org/abs/2501.09749v2
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic
Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remains unclear. We study this gap by comparing different LLMs under two text-only and one audio-grounded setting: (1) direct probing on AKB-2000, a curated benchmark testing the breadth and depth of auditory knowledge; (2) cascade evaluation, where LLMs reason over
출처: https://arxiv.org/abs/2603.19195v1
OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards
OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards
Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Themis, a scalable and accurate multi-agent critic framework. Unlike a single judge, OS-Themis decomposes trajectories into verifiable milestones to isolate critical evidence for decision making and emp
출처: https://arxiv.org/abs/2603.19191v1
iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification
iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification
Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model th
출처: https://arxiv.org/abs/2511.08905v3
This looks like what? Challenges and Future Research Directions for Part-Prototy
This looks like what? Challenges and Future Research Directions for Part-Prototy
The growing interest in eXplainable Artificial Intelligence (XAI) has stimulated research on models with built-in interpretability, among which part-prototype models are particularly prominent. Part-Prototype Models (PPMs) classify inputs by comparing them to learned prototypes and provide human-understandable explanations of the form "this looks like that". Despite this intrinsic interpretability, PPMs have not yet emerged as a competitive alternative to post-hoc explanation methods. This surve
출처: https://arxiv.org/abs/2502.09340v2
Box Maze: A Process-Control Architecture for Reliable LLM Reasoning
Box Maze: A Process-Control Architecture for Reliable LLM Reasoning
Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at the behavioral level and may lack explicit architectural mechanisms for enforcing reasoning process integrity. This paper proposes the Box Maze framework, a conceptual process-control architecture tha
출처: https://arxiv.org/abs/2603.19182v1
NavTrust: Benchmarking Trustworthiness for Embodied Navigation
NavTrust: Benchmarking Trustworthiness for Embodied Navigation
There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world settings. To address this gap, we present NavTrust, a unified benchmark that systematically corrupts input mo
출처: https://arxiv.org/abs/2603.19229v1
DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction an
DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction an
With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed for monocular and 2D scenes, leading to inefficiency and inter-view inconsistency when applied to high-resolution multi-view driving scenes. To address this, we propose DriveTok, an efficient 3D driving scene tokenizer for unified multi-view reconstruction and
출처: https://arxiv.org/abs/2603.19219v1
WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Ste
WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Ste
Communication topology is a critical factor in the utility and safety of LLM-based multi-agent systems (LLM-MAS), making it a high-value intellectual property (IP) whose confidentiality remains insufficiently studied. Existing topology inference attempts rely on impractical assumptions, including control over the administrative agent and direct identity queries via jailbreaks, which are easily defeated by basic keyword-based defenses. As a result, prior analyses fail to capture the real-world th
출처: https://arxiv.org/abs/2603.11132v2
SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Ha
SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Ha
As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward
출처: https://arxiv.org/abs/2603.19173v1
DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Effici
DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Effici
Despite the computational efficiency of MoE models, the excessive memory footprint and I/O overhead inherent in multi-expert architectures pose formidable challenges for real-time inference on resource-constrained edge platforms. While existing static methods struggle with a rigid latency-accuracy trade-off, we observe that expert importance is highly skewed and depth-dependent. Motivated by these insights, we propose DyMoE, a dynamic mixed-precision quantization framework designed for high-perf
출처: https://arxiv.org/abs/2603.19172v1
ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angio
ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angio
Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADNE, a two-stage framework coupling preference-aligned perception with RL-based diagnostic reasoning for topologically coherent stenosis detection. The perception module employs DPO to fine-tune the Sa2VA vision-language foundation model using Betti number constraints as preference signals, aligning t
출처: https://arxiv.org/abs/2603.19169v1
Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Langua
Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Langua
Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge" requires grounding semantic references, spatial relations, and metric constraints within a 3D scene. While recent vision language models (VLMs) demonstrate strong semantic grounding capabilities, they are not explicitly designed to reason about metric constraints in physically defined spaces. In th
출처: https://arxiv.org/abs/2603.19166v1
2-D Directed Formation Control Based on Bipolar Coordinates
2-D Directed Formation Control Based on Bipolar Coordinates
This work proposes a novel 2-D formation control scheme for acyclic triangulated directed graphs (a class of minimally acyclic persistent graphs) based on bipolar coordinates with (almost) global convergence to the desired shape. Prescribed performance control is employed to devise a decentralized control law that avoids singularities and introduces robustness against external disturbances while ensuring predefined transient and steady-state performance for the closed-loop system. Furthermore, i
출처: https://arxiv.org/abs/2108.00916v4
Improving RCT-Based Treatment Effect Estimation Under Covariate Mismatch via Cal
Improving RCT-Based Treatment Effect Estimation Under Covariate Mismatch via Cal
Randomized controlled trials (RCTs) are the gold standard for estimating heterogeneous treatment effects, yet they are often underpowered for detecting effect heterogeneity. Large observational studies (OS) can supplement RCTs for conditional average treatment effect (CATE) estimation, but a key barrier is covariate mismatch: the two sources measure different, only partially overlapping, covariates. We propose CALM (Calibrated ALignment under covariate Mismatch), which bypasses imputation by lea
Source: https://arxiv.org/abs/2603.19186v1
Spectrally-Guided Diffusion Noise Schedules
Denoising diffusion models are widely used for high-quality image and video generation. Their performance depends on noise schedules, which define the distribution of noise levels applied during training and the sequence of noise levels traversed during sampling. Noise schedules are typically handcrafted and require manual tuning across different resolutions. In this work, we propose a principled way to design per-instance noise schedules for pixel diffusion, based on the image's spectral proper
Source: https://arxiv.org/abs/2603.19222v1
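As a toy illustration of per-instance scheduling, not the paper's construction: one could derive a per-image shift for a standard cosine log-SNR schedule from a crude spectral statistic. The neighbour-difference proxy for high-frequency content and the additive-shift form below are our assumptions.

```python
import math

# Hedged sketch (assumed form, not the paper's algorithm): images with less
# high-frequency energy get a smaller shift of the noise schedule.
def spectral_shift(image, base=1.0):
    """image: 2-D list of floats. Returns an assumed log-SNR shift."""
    # High-frequency proxy: mean absolute difference between horizontal neighbours.
    diffs, n = 0.0, 0
    for row in image:
        for j in range(len(row) - 1):
            diffs += abs(row[j + 1] - row[j])
            n += 1
    hf = diffs / n
    return base * math.log(1.0 + hf)  # perfectly flat image -> shift 0

def shifted_cosine_logsnr(t, shift):
    """Cosine log-SNR schedule, t in (0, 1), with an additive per-image shift."""
    return -2.0 * math.log(math.tan(math.pi * t / 2.0)) + shift
```

A flat image yields `spectral_shift(...) == 0.0`, leaving the base cosine schedule unchanged; a textured image raises the shift, which under this assumed convention moves the schedule toward lower noise at a given `t`.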
Online Learning and Equilibrium Computation with Ranking Feedback
Online learning in arbitrary, and possibly adversarial, environments has been extensively studied in sequential decision-making, and it is closely connected to equilibrium computation in game theory. Most existing online learning algorithms rely on \emph{numeric} utility feedback from the environment, which may be unavailable in human-in-the-loop applications and/or may be restricted by privacy concerns. In this paper, we study an online learning model in which the learner only observes a \emph{
Source: https://arxiv.org/abs/2603.19221v1
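A minimal sketch of learning from ranking-only feedback, assuming a Hedge-style learner that substitutes each action's normalized rank position for the unobservable numeric loss (a surrogate of our choosing, not the paper's algorithm):

```python
import math
import random

# Hedged sketch: an exponential-weights learner that never sees numeric
# utilities, only a ranking of its actions each round.
class RankingHedge:
    def __init__(self, n_actions, eta=0.1):
        self.w = [1.0] * n_actions
        self.eta = eta

    def play(self):
        """Sample an action proportionally to the current weights."""
        total = sum(self.w)
        r, acc = random.random() * total, 0.0
        for i, wi in enumerate(self.w):
            acc += wi
            if r <= acc:
                return i
        return len(self.w) - 1

    def update(self, ranking):
        """ranking: list of action indices, best first (the only feedback)."""
        n = len(ranking)
        for pos, a in enumerate(ranking):
            loss = pos / (n - 1)  # surrogate loss in [0, 1] from rank position
            self.w[a] *= math.exp(-self.eta * loss)
```

Fed the same ranking repeatedly, the top-ranked action's weight dominates, so play concentrates on it; the regret guarantees the paper studies would concern how this surrogate relates to the true utilities.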
$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal E
Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three special types--those whose $R$-equivalence he could not bound by proving the universal (admissible) equivalence is trivial. We consider all surfaces $V$ currently known to have non-trivial universal equivalence. Beyond being intractable to Swinnerton-Dyer's approach, we observe that if these surfaces also ha
Source: https://arxiv.org/abs/2603.19215v1
Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encode
Large vision--language models (VLMs) often use a frozen vision backbone, whose image features are mapped into a large language model through a lightweight connector. While transformer-based encoders are the standard visual backbone, we ask whether state space model (SSM) vision backbones can be a strong alternative. We systematically evaluate SSM vision backbones for VLMs in a controlled setting. Under matched ImageNet-1K initialization, the SSM backbone achieves the strongest overall performanc
Source: https://arxiv.org/abs/2603.19209v1
The Exponentially Weighted Signature
The signature is a canonical representation of a multidimensional path over an interval. However, it treats all historical information uniformly, offering no intrinsic mechanism for contextualising the relevance of the past. To address this, we introduce the Exponentially Weighted Signature (EWS), generalising the Exponentially Fading Memory (EFM) signature from diagonal to general bounded linear operators. These operators enable cross-channel coupling at the level of temporal weighting together
Source: https://arxiv.org/abs/2603.19198v1
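A discrete toy version of the weighting idea, under our own assumptions rather than the paper's definitions: weight each increment of a 2-D path by a matrix exponential of a decay operator, so that a non-diagonal operator couples the channels through the temporal weighting (a diagonal operator recovers EFM-style per-channel fading).

```python
# Hedged sketch (assumed discretization, not the paper's definition): a
# level-1 exponentially weighted signature term for a 2-D path.
def mat_exp_2x2(A):
    """exp(A) for a 2x2 matrix via truncated power series (small ||A||)."""
    I = [[1.0, 0.0], [0.0, 1.0]]
    term = [row[:] for row in I]
    out = [row[:] for row in I]
    for k in range(1, 12):
        term = [[sum(term[i][m] * A[m][j] for m in range(2)) / k
                 for j in range(2)] for i in range(2)]
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out

def ews_level1(path, times, A):
    """Sum of exp(-A * (T - t_k)) applied to each increment of the path."""
    T = times[-1]
    out = [0.0, 0.0]
    for k in range(1, len(path)):
        dx = [path[k][0] - path[k - 1][0], path[k][1] - path[k - 1][1]]
        W = mat_exp_2x2([[-A[i][j] * (T - times[k]) for j in range(2)]
                         for i in range(2)])
        out = [out[i] + sum(W[i][j] * dx[j] for j in range(2))
               for i in range(2)]
    return out
```

With the zero operator this reduces to the ordinary level-1 signature (the total increment); a nonzero operator discounts older increments, and off-diagonal entries mix one channel's history into the other's weight.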
Score Reversal Is Not Free for Quantum Diffusion Models
Classical reverse diffusion is generated by changing the drift at fixed noise. We show that the quantum version of this principle obeys an exact law with a sharp phase boundary. For Gaussian pure-loss dynamics, the canonical model of continuous-variable decoherence, we prove that the unrestricted instantaneous reverse optimum exhibits a noiseless-to-noisy transition: below a critical squeezing-to-thermal ratio, reversal can be noiseless; above it, complete positivity forces irreducible reverse n
Source: https://arxiv.org/abs/2603.06488v3
The Convergence Frontier: Integrating Machine Learning and High Performance Quan
Integrating quantum mechanics into drug discovery marks a decisive shift from empirical trial-and-error toward quantitative precision. However, the prohibitive cost of ab initio molecular dynamics has historically forced a compromise between chemical accuracy and computational scalability. This paper identifies the convergence of High-Performance Computing (HPC), Machine Learning (ML), and Quantum Computing (QC) as the definitive solution to this bottleneck. While ML foundation models, such as F
Source: https://arxiv.org/abs/2603.17790v2
Robustness, Cost, and Attack-Surface Concentration in Phishing Detection
Phishing detectors built on engineered website features attain near-perfect accuracy under i.i.d. evaluation, yet deployment security depends on robustness to post-deployment feature manipulation. We study this gap through a cost-aware evasion framework that models discrete, monotone feature edits under explicit attacker budgets. Three diagnostics are introduced: minimal evasion cost (MEC), the evasion survival rate $S(B)$, and the robustness concentration index (RCI). On the UCI Phishing Web
Source: https://arxiv.org/abs/2603.19204v1
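The minimal evasion cost (MEC) diagnostic named in the abstract can be illustrated with a brute-force toy search; the classifier, edit set, and costs below are hypothetical, not from the paper.

```python
from itertools import combinations

# Hedged sketch: MEC = cheapest set of discrete feature edits that flips a
# detector's "phishing" verdict, searched under an explicit attacker budget.
def minimal_evasion_cost(x, classifier, edits, budget):
    """edits: list of (feature_index, new_value, cost). Returns MEC or None."""
    best = None
    for r in range(1, len(edits) + 1):
        for combo in combinations(edits, r):
            cost = sum(c for _, _, c in combo)
            if cost > budget or (best is not None and cost >= best):
                continue
            y = list(x)
            for idx, val, _ in combo:
                y[idx] = val
            if classifier(y) == 0:  # verdict flipped to benign
                best = cost
    return best

# Toy detector: flags phishing when at least two suspicious features are set.
clf = lambda v: int(sum(v) >= 2)
x = [1, 1, 0]                        # currently classified as phishing
edits = [(0, 0, 3.0), (1, 0, 1.0)]   # zero out a feature, at a cost
print(minimal_evasion_cost(x, clf, edits, budget=5.0))  # 1.0
```

The survival rate $S(B)$ would then be the fraction of samples whose MEC exceeds budget $B$; `None` (no feasible evasion under budget) counts as surviving.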
Verifiable Semantics for Agent-to-Agent Communication
Multiagent AI systems require consistent communication, but we lack methods to verify that agents share the same understanding of the terms used. Natural language is interpretable but vulnerable to semantic drift, while learned protocols are efficient but opaque. We propose a certification protocol based on the stimulus-meaning model, where agents are tested on shared observable events and terms are certified if empirical disagreement falls below a statistical threshold. In this protocol, agents
Source: https://arxiv.org/abs/2602.16424v2
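A minimal sketch of such a certification test, assuming a Hoeffding-style confidence margin on the empirical disagreement rate (the exact statistical threshold used in the paper is not given in the excerpt, so the margin here is our assumption):

```python
import math

# Hedged sketch: two agents label the same observed events with a term; the
# term is certified if the empirical disagreement rate plus a Hoeffding
# confidence margin falls below a tolerance threshold.
def certify_term(labels_a, labels_b, tol=0.1, delta=0.05):
    n = len(labels_a)
    disagree = sum(a != b for a, b in zip(labels_a, labels_b)) / n
    margin = math.sqrt(math.log(1 / delta) / (2 * n))  # Hoeffding bound
    return disagree + margin <= tol

a = [1] * 200
b = [1] * 198 + [0, 0]     # 1% empirical disagreement over shared events
print(certify_term(a, b))  # -> True
```

With few samples the margin dominates, so a term can only be certified after enough shared observations, which matches the protocol's empirical flavor.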
Related notes
- [[Coherent]]
- [[NVIDIA]]
- [[INDEX]]
- [[260314_arxiv]] — similar keywords
- [[260319_arxiv]] — similar keywords
- [[260318_arxiv]] — similar keywords
- [[260317_arxiv]] — similar keywords
- [[260316_x]] — similar keywords
FinTradeBench: A Financial Reasoning Benchmark for LLMs
FinTradeBench: A Financial Reasoning Benchmark for LLMs
Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with the advancement of Large Language Models (LLMs), financial analysts have begun to use them for financial decision-making tasks. However, existing financial question answering benchmarks for testing these models primarily focus on company balance sheet data a
출처: https://arxiv.org/abs/2603.19225v1
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual Wor
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual Wor
We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques,
출처: https://arxiv.org/abs/2603.19223v1
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Polic
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Polic
We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICP
출처: https://arxiv.org/abs/2603.19220v1
DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative L
DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative L
Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional structure of parts. While recent part-aware approaches introduce decomposition, they remain largely geometry-focused, lacking semantic grounding and failing to model how parts align with textual descriptions or their inter-part relations. We propose DreamPartGen, a framework for semantically grounded, p
출처: https://arxiv.org/abs/2603.19216v1
Enhancing Lexicon-Based Text Embeddings with Large Language Models
Enhancing Lexicon-Based Text Embeddings with Large Language Models
Recent large language models (LLMs) have demonstrated exceptional performance on general-purpose text embedding tasks. While dense embeddings have dominated related research, we introduce the first lexicon-based embeddings (LENS) leveraging LLMs that achieve competitive performance on these tasks. LENS consolidates the vocabulary space through token embedding clustering to handle the issue of token redundancy in LLM vocabularies. To further improve performance, we investigate bidirectional atten
출처: https://arxiv.org/abs/2501.09749v2
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic
Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remains unclear. We study this gap by comparing different LLMs under two text-only and one audio-grounded setting: (1) direct probing on AKB-2000, a curated benchmark testing the breadth and depth of auditory knowledge; (2) cascade evaluation, where LLMs reason over
출처: https://arxiv.org/abs/2603.19195v1
OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards
OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards
Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Themis, a scalable and accurate multi-agent critic framework. Unlike a single judge, OS-Themis decomposes trajectories into verifiable milestones to isolate critical evidence for decision making and emp
출처: https://arxiv.org/abs/2603.19191v1
iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification
iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification
Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model th
출처: https://arxiv.org/abs/2511.08905v3
This looks like what? Challenges and Future Research Directions for Part-Prototy
This looks like what? Challenges and Future Research Directions for Part-Prototy
The growing interest in eXplainable Artificial Intelligence (XAI) has stimulated research on models with built-in interpretability, among which part-prototype models are particularly prominent. Part-Prototype Models (PPMs) classify inputs by comparing them to learned prototypes and provide human-understandable explanations of the form "this looks like that". Despite this intrinsic interpretability, PPMs have not yet emerged as a competitive alternative to post-hoc explanation methods. This surve
출처: https://arxiv.org/abs/2502.09340v2
Box Maze: A Process-Control Architecture for Reliable LLM Reasoning
Box Maze: A Process-Control Architecture for Reliable LLM Reasoning
Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at the behavioral level and may lack explicit architectural mechanisms for enforcing reasoning process integrity. This paper proposes the Box Maze framework, a conceptual process-control architecture tha
Source: https://arxiv.org/abs/2603.19182v1
NavTrust: Benchmarking Trustworthiness for Embodied Navigation
There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world settings. To address this gap, we present NavTrust, a unified benchmark that systematically corrupts input mo
Source: https://arxiv.org/abs/2603.19229v1
DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction an
With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed for monocular and 2D scenes, leading to inefficiency and inter-view inconsistency when applied to high-resolution multi-view driving scenes. To address this, we propose DriveTok, an efficient 3D driving scene tokenizer for unified multi-view reconstruction and
Source: https://arxiv.org/abs/2603.19219v1
WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Ste
Communication topology is a critical factor in the utility and safety of LLM-based multi-agent systems (LLM-MAS), making it a high-value intellectual property (IP) whose confidentiality remains insufficiently studied. Existing topology inference attempts rely on impractical assumptions, including control over the administrative agent and direct identity queries via jailbreaks, which are easily defeated by basic keyword-based defenses. As a result, prior analyses fail to capture the real-world th
Source: https://arxiv.org/abs/2603.11132v2
SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Ha
As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward
Source: https://arxiv.org/abs/2603.19173v1
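As a concrete illustration of the "speed-of-light" framing, a measured kernel time can be compared against a roofline-style hardware lower bound. The benchmark's actual methodology is not given in this excerpt; the function names and the peak-hardware numbers in the usage line are illustrative assumptions:

```python
def speed_of_light_time(bytes_moved, flops, peak_bw_gbs, peak_tflops):
    """Lower-bound kernel time (seconds) from a simple roofline model:
    a kernel can finish no faster than its memory traffic or its
    arithmetic allows, whichever dominates."""
    t_mem = bytes_moved / (peak_bw_gbs * 1e9)
    t_math = flops / (peak_tflops * 1e12)
    return max(t_mem, t_math)

def sol_efficiency(measured_time_s, bytes_moved, flops, peak_bw_gbs, peak_tflops):
    """Fraction of speed-of-light achieved (1.0 = hardware-efficient)."""
    return speed_of_light_time(bytes_moved, flops, peak_bw_gbs, peak_tflops) / measured_time_s
```

For a memory-bound elementwise add moving 1.2 GB at a hypothetical 3000 GB/s peak, `sol_efficiency(5e-4, 1.2e9, 1e8, 3000, 60)` reports 0.8, i.e. 80% of the roofline bound; rewarding this ratio rather than speedup over a software baseline is the distinction the abstract draws.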
DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Effici
Despite the computational efficiency of MoE models, the excessive memory footprint and I/O overhead inherent in multi-expert architectures pose formidable challenges for real-time inference on resource-constrained edge platforms. While existing static methods struggle with a rigid latency-accuracy trade-off, we observe that expert importance is highly skewed and depth-dependent. Motivated by these insights, we propose DyMoE, a dynamic mixed-precision quantization framework designed for high-perf
Source: https://arxiv.org/abs/2603.19172v1
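One way skewed expert importance can drive mixed-precision decisions is a greedy bit-width allocation under a total memory budget. This is a minimal illustrative sketch, not DyMoE's actual algorithm; `allocate_bits`, the width set, and the budget units (bits per weight, summed over experts) are all assumptions:

```python
def allocate_bits(importance, budget_bits, widths=(2, 4, 8)):
    """Greedy mixed-precision allocation: start every expert at the
    lowest bit-width, then promote experts in descending importance
    while the total bit budget allows it."""
    n = len(importance)
    alloc = [widths[0]] * n
    order = sorted(range(n), key=lambda i: -importance[i])
    used = widths[0] * n
    for step in range(1, len(widths)):
        for i in order:
            extra = widths[step] - alloc[i]
            if alloc[i] == widths[step - 1] and used + extra <= budget_bits:
                alloc[i] = widths[step]
                used += extra
    return alloc
```

With importance scores `[0.7, 0.1, 0.15, 0.05]` and a budget of 20, the single dominant expert is promoted to 8 bits while the rest stay at 4, matching the abstract's observation that importance is highly skewed.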
ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angio
Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADNE, a two-stage framework coupling preference-aligned perception with RL-based diagnostic reasoning for topologically coherent stenosis detection. The perception module employs DPO to fine-tune the Sa2VA vision-language foundation model using Betti number constraints as preference signals, aligning t
Source: https://arxiv.org/abs/2603.19169v1
Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Langua
Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge" requires grounding semantic references, spatial relations, and metric constraints within a 3D scene. While recent vision language models (VLMs) demonstrate strong semantic grounding capabilities, they are not explicitly designed to reason about metric constraints in physically defined spaces. In th
Source: https://arxiv.org/abs/2603.19166v1
2-D Directed Formation Control Based on Bipolar Coordinates
This work proposes a novel 2-D formation control scheme for acyclic triangulated directed graphs (a class of minimally acyclic persistent graphs) based on bipolar coordinates with (almost) global convergence to the desired shape. Prescribed performance control is employed to devise a decentralized control law that avoids singularities and introduces robustness against external disturbances while ensuring predefined transient and steady-state performance for the closed-loop system. Furthermore, i
Source: https://arxiv.org/abs/2108.00916v4
Improving RCT-Based Treatment Effect Estimation Under Covariate Mismatch via Cal
Randomized controlled trials (RCTs) are the gold standard for estimating heterogeneous treatment effects, yet they are often underpowered for detecting effect heterogeneity. Large observational studies (OS) can supplement RCTs for conditional average treatment effect (CATE) estimation, but a key barrier is covariate mismatch: the two sources measure different, only partially overlapping, covariates. We propose CALM (Calibrated ALignment under covariate Mismatch), which bypasses imputation by lea
Source: https://arxiv.org/abs/2603.19186v1
Spectrally-Guided Diffusion Noise Schedules
Denoising diffusion models are widely used for high-quality image and video generation. Their performance depends on noise schedules, which define the distribution of noise levels applied during training and the sequence of noise levels traversed during sampling. Noise schedules are typically handcrafted and require manual tuning across different resolutions. In this work, we propose a principled way to design per-instance noise schedules for pixel diffusion, based on the image's spectral proper
Source: https://arxiv.org/abs/2603.19222v1
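A per-instance schedule keyed to "the image's spectral properties" presupposes some spectral summary of the image. A common such summary is the radially averaged power spectrum, sketched below; how the paper maps this summary onto a noise schedule is not specified in the excerpt:

```python
import numpy as np

def radial_power_spectrum(img):
    """Radially averaged power spectrum of a 2-D image: a compact
    summary of how much energy lives at each spatial frequency."""
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)
```

A constant image concentrates all power at frequency zero, while natural images show the familiar power-law decay; a spectrum-aware schedule can use exactly this decay to decide how noise levels should be distributed per instance and per resolution.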
Online Learning and Equilibrium Computation with Ranking Feedback
Online learning in arbitrary, and possibly adversarial, environments has been extensively studied in sequential decision-making, and it is closely connected to equilibrium computation in game theory. Most existing online learning algorithms rely on \emph{numeric} utility feedback from the environment, which may be unavailable in human-in-the-loop applications and/or may be restricted by privacy concerns. In this paper, we study an online learning model in which the learner only observes a \emph{
Source: https://arxiv.org/abs/2603.19221v1
$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal E
Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three special types--those whose $R$-equivalence he could not bound by proving the universal (admissible) equivalence is trivial. We consider all surfaces $V$ currently known to have non-trivial universal equivalence. Beyond being intractable to Swinnerton-Dyer's approach, we observe that if these surfaces also ha
Source: https://arxiv.org/abs/2603.19215v1
Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encode
Large vision--language models (VLMs) often use a frozen vision backbone, whose image features are mapped into a large language model through a lightweight connector. While transformer-based encoders are the standard visual backbone, we ask whether state space model (SSM) vision backbones can be a strong alternative. We systematically evaluate SSM vision backbones for VLMs in a controlled setting. Under matched ImageNet-1K initialization, the SSM backbone achieves the strongest overall performanc
Source: https://arxiv.org/abs/2603.19209v1
The Exponentially Weighted Signature
The signature is a canonical representation of a multidimensional path over an interval. However, it treats all historical information uniformly, offering no intrinsic mechanism for contextualising the relevance of the past. To address this, we introduce the Exponentially Weighted Signature (EWS), generalising the Exponentially Fading Memory (EFM) signature from diagonal to general bounded linear operators. These operators enable cross-channel coupling at the level of temporal weighting together
Source: https://arxiv.org/abs/2603.19198v1
Score Reversal Is Not Free for Quantum Diffusion Models
Classical reverse diffusion is generated by changing the drift at fixed noise. We show that the quantum version of this principle obeys an exact law with a sharp phase boundary. For Gaussian pure-loss dynamics, the canonical model of continuous-variable decoherence, we prove that the unrestricted instantaneous reverse optimum exhibits a noiseless-to-noisy transition: below a critical squeezing-to-thermal ratio, reversal can be noiseless; above it, complete positivity forces irreducible reverse n
Source: https://arxiv.org/abs/2603.06488v3
The Convergence Frontier: Integrating Machine Learning and High Performance Quan
Integrating quantum mechanics into drug discovery marks a decisive shift from empirical trial-and-error toward quantitative precision. However, the prohibitive cost of ab initio molecular dynamics has historically forced a compromise between chemical accuracy and computational scalability. This paper identifies the convergence of High-Performance Computing (HPC), Machine Learning (ML), and Quantum Computing (QC) as the definitive solution to this bottleneck. While ML foundation models, such as F
Source: https://arxiv.org/abs/2603.17790v2
Robustness, Cost, and Attack-Surface Concentration in Phishing Detection
Phishing detectors built on engineered website features attain near-perfect accuracy under i.i.d.\ evaluation, yet deployment security depends on robustness to post-deployment feature manipulation. We study this gap through a cost-aware evasion framework that models discrete, monotone feature edits under explicit attacker budgets. Three diagnostics are introduced: minimal evasion cost (MEC), the evasion survival rate $S(B)$, and the robustness concentration index (RCI). On the UCI Phishing Web
Source: https://arxiv.org/abs/2603.19204v1
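The abstract defines minimal evasion cost (MEC) and the survival rate S(B); a brute-force instantiation over a small discrete edit set makes the two diagnostics concrete. The search procedure, edit encoding, and classifier below are illustrative assumptions, not the paper's method:

```python
from itertools import combinations

def minimal_evasion_cost(x, classifier, edits):
    """Minimal Evasion Cost: cheapest set of discrete feature edits
    that flips the detector's decision on sample x (inf if none does).
    `edits` is a list of (feature_index, new_value, cost) triples;
    brute-force over subsets, fine for small edit sets."""
    best = float("inf")
    for k in range(len(edits) + 1):
        for subset in combinations(edits, k):
            x2 = list(x)
            for i, v, _ in subset:
                x2[i] = v
            if classifier(x2) == 0:  # 0 = classified benign
                best = min(best, sum(c for _, _, c in subset))
    return best

def survival_rate(samples, classifier, edits, budget):
    """S(B): fraction of phishing samples that cannot be evaded
    within attacker budget B."""
    costs = [minimal_evasion_cost(x, classifier, edits) for x in samples]
    return sum(c > budget for c in costs) / len(samples)
```

With a toy detector flagging samples where two feature scores sum to at least 2, one sample can be evaded for cost 1 and another only for cost 3, so S(2) = 0.5: half the attack surface survives a budget of 2.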
Verifiable Semantics for Agent-to-Agent Communication
Multiagent AI systems require consistent communication, but we lack methods to verify that agents share the same understanding of the terms used. Natural language is interpretable but vulnerable to semantic drift, while learned protocols are efficient but opaque. We propose a certification protocol based on the stimulus-meaning model, where agents are tested on shared observable events and terms are certified if empirical disagreement falls below a statistical threshold. In this protocol, agents
Source: https://arxiv.org/abs/2602.16424v2
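The certification rule described above can be sketched with a standard concentration bound: test both agents on the same observable events and certify the term when an upper confidence bound on their disagreement probability falls below a threshold. The abstract only says the threshold is statistical; the choice of a Hoeffding bound and all names here are assumptions:

```python
import math

def certify_term(labels_a, labels_b, epsilon, delta=0.05):
    """Certify a shared term if a Hoeffding upper confidence bound on
    the two agents' disagreement probability is below epsilon.
    labels_a, labels_b: the agents' judgments on the same events."""
    n = len(labels_a)
    disagreements = sum(a != b for a, b in zip(labels_a, labels_b))
    p_hat = disagreements / n
    bound = p_hat + math.sqrt(math.log(1 / delta) / (2 * n))
    return bound < epsilon
```

For 1000 shared events with zero observed disagreements, the bound at delta = 0.05 is about 0.039, so the term certifies at epsilon = 0.05 but not at epsilon = 0.03: even perfect empirical agreement cannot certify an arbitrarily tight threshold from finite evidence.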