Large Language Models (LLM)

The latest news, technical breakthroughs, and industry applications of large language models such as GPT, Claude, Llama, and Gemini.

HardcoreLogic: Challenging Large Reasoning Models with Long-tail Logic Puzzle Games
LLM

arXiv:2510.12563v3 Announce Type: replace Abstract: Large Reasoning Models (LRMs) have demonstrated impressive performan...

The Information-Theoretic Imperative: Compression and the Epistemic Foundations of Intelligence
LLM

arXiv:2510.25883v2 Announce Type: replace Abstract: Why do brains and deep networks converge on similar representations?...

Analyzing and Improving Fast Sampling of Text-to-Image Diffusion Models
LLM

arXiv:2603.00763v1 Announce Type: new Abstract: Text-to-image diffusion models have achieved unprecedented success but s...

DAG-Math: Graph-of-Thought Guided Mathematical Reasoning in LLMs
LLM

arXiv:2510.19842v2 Announce Type: replace Abstract: Large Language Models (LLMs) demonstrate strong performance on mathe...

ScholarEval: Research Idea Evaluation Grounded in Literature
LLM

arXiv:2510.16234v2 Announce Type: replace Abstract: As AI tools become increasingly common for research ideation, robust...

OpenAutoNLU: Open Source AutoML Library for NLU
LLM

arXiv:2603.01824v1 Announce Type: new Abstract: OpenAutoNLU is an open-source automated machine learning library for nat...

Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning
LLM

arXiv:2510.04284v3 Announce Type: replace Abstract: The professionalism of a human doctor in outpatient service depends ...

DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
LLM

arXiv:2510.04474v2 Announce Type: replace Abstract: Recent large reasoning models (LRMs) driven by reinforcement learnin...

A Representation-Consistent Gated Recurrent Framework for Robust Medical Time-Series Classification
LLM

arXiv:2603.00067v1 Announce Type: new Abstract: Medical time-series data are characterized by irregular sampling, high n...

Diversity over Uniformity: Rethinking Representation in Generated Image Detection
LLM

arXiv:2603.00717v1 Announce Type: new Abstract: With the rapid advancement of generative models, generated image detecti...

BornoViT: A Novel Efficient Vision Transformer for Bengali Handwritten Basic Characters Classification
LLM

arXiv:2603.00755v1 Announce Type: new Abstract: Handwritten character classification in the Bengali script is a signific...

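8点1氪 | Coconut Palm Group (椰树集团) summoned by regulators over another suggestive-marketing controversy; one cinema trials a "right to regret a viewing," refunding 40% if you leave a bad film within 20 minutes; China-Europe airfares surge
LLM

Today's top stories: Iran says the Strait of Hormuz is closed: "not a single drop of oil will flow out"; Lei Jun: Xiaomi robots are already interning at its car factory, and large numbers of humanoid robots will enter factories over the next 5 years; Amazon data center in the UAE struck and catches fire; several gold shops suspend sales of investment gold bars; Bank of Beijing's precious-metals business hits a bug; BOSS Zhipin says the rumored Iran...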

Semantic Novelty Trajectories in 80,000 Books: A Cross-Corpus Embedding Analysis
LLM

arXiv:2603.01791v1 Announce Type: new Abstract: I apply Schmidhuber's compression progress theory of interestingness at ...

FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning
LLM

arXiv:2510.04040v2 Announce Type: replace Abstract: Large language models (LLMs) increasingly rely on Chain-of-Thought (...

Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
LLM

arXiv:2510.01367v4 Announce Type: replace Abstract: Reward hacking, where a reasoning model exploits loopholes in a rewa...

A Reconstruction System for Industrial Pipeline Inner Walls Using Panoramic Image Stitching with Endoscopic Imaging
LLM

arXiv:2603.00714v1 Announce Type: new Abstract: Visual analysis and reconstruction of pipeline inner walls remain challe...

Understanding the Role of Training Data in Test-Time Scaling
LLM

arXiv:2510.03605v2 Announce Type: replace Abstract: Test-time scaling improves the reasoning capabilities of large langu...

nchellwig at SemEval-2026 Task 3: Self-Consistent Structured Generation (SCSG) for Dimensional Aspect-Based Sentiment Analysis using Large Language Models
LLM

arXiv:2603.01788v1 Announce Type: new Abstract: We present Self-Consistent Structured Generation (SCSG) for Dimensional ...

Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
LLM

arXiv:2509.24156v2 Announce Type: replace Abstract: Large reasoning models (LRMs) exhibit unprecedented capabilities in ...

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
LLM

arXiv:2509.24393v2 Announce Type: replace Abstract: Although Large Reasoning Models (LRMs) have progressed in solving co...

BiJEPA: Bi-directional Joint Embedding Predictive Architecture for Symmetric Representation Learning
LLM

arXiv:2603.00049v1 Announce Type: new Abstract: Self-Supervised Learning (SSL) has shifted from pixel-level reconstructi...

Towards Khmer Scene Document Layout Detection
LLM

arXiv:2603.00707v1 Announce Type: new Abstract: While document layout analysis for Latin scripts has advanced significan...

Towards Universal Khmer Text Recognition
LLM

arXiv:2603.00702v1 Announce Type: new Abstract: Khmer is a low-resource language characterized by a complex script, pres...

ViTSP: A Vision Language Models Guided Framework for Solving Large-Scale Traveling Salesman Problems
LLM

arXiv:2509.23465v2 Announce Type: replace Abstract: Solving the Traveling Salesman Problem (TSP) is NP-hard yet fundamen...

LLM-as-an-Annotator: Training Lightweight Models with LLM-Annotated Examples for Aspect Sentiment Tuple Prediction
LLM

arXiv:2603.01778v1 Announce Type: new Abstract: Training models for Aspect-Based Sentiment Analysis (ABSA) tasks require...

Transit Network Design with Two-Level Demand Uncertainties: A Machine Learning and Contextual Stochastic Optimization Framework
LLM

arXiv:2603.00010v1 Announce Type: new Abstract: Transit Network Design is a well-studied problem in the field of transpo...

From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents
LLM

arXiv:2509.23415v2 Announce Type: replace Abstract: Despite the impressive performance of LLM-powered agents, their adop...

BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving
LLM

arXiv:2509.23589v3 Announce Type: replace Abstract: Diffusion-based planners have shown strong potential for autonomous ...

Bilinear representation mitigates reversal curse and enables consistent model editing
LLM

arXiv:2509.21993v3 Announce Type: replace Abstract: The reversal curse--a language model's inability to infer an unseen ...

Beyond the Résumé: A Rubric-Aware Automatic Interview System for Information Elicitation
LLM

arXiv:2603.01775v1 Announce Type: new Abstract: Effective hiring is integral to the success of an organisation, but it i...

SCOUT: Fast Spectral CT Imaging in Ultra LOw-data Regimes via PseUdo-label GeneraTion
LLM

arXiv:2603.00687v1 Announce Type: new Abstract: Noise and artifacts during computed tomography (CT) scans are a fundamen...

LLMs can unmask pseudonymous users at scale with surprising accuracy
LLM

Burner accounts on social media sites can increasingly be analyzed to identify the pseudonymous users who post to them u...

TokenSplat: Token-aligned 3D Gaussian Splatting for Feed-forward Pose-free Reconstruction
LLM

arXiv:2603.00697v1 Announce Type: new Abstract: We present TokenSplat, a feed-forward framework for joint 3D Gaussian re...

STMI: Segmentation-Guided Token Modulation with Cross-Modal Hypergraph Interaction for Multi-Modal Object Re-Identification
LLM

arXiv:2603.00695v1 Announce Type: new Abstract: Multi-modal object Re-Identification (ReID) aims to exploit complementar...

Who Gets Cited Most? Benchmarking Long-Context Numerical Reasoning on Scientific Articles
LLM

arXiv:2509.21028v3 Announce Type: replace Abstract: We introduce SciTrek, a diagnostic question-answering benchmark desi...

AnnoABSA: A Web-Based Annotation Tool for Aspect-Based Sentiment Analysis with Retrieval-Augmented Suggestions
LLM

arXiv:2603.01773v1 Announce Type: new Abstract: We introduce AnnoABSA, the first web-based annotation tool to support th...
