


EMNLP 2025: Suzhou, China
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng:

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, EMNLP 2025, Suzhou, China, November 4-9, 2025. Association for Computational Linguistics 2025, ISBN 979-8-89176-332-6

- Dominic Petrak, Thy Thy Tran, Iryna Gurevych:
Towards Automated Error Discovery: A Study in Conversational AI. 1-23

- Mohsinul Kabir, Ajwad Abrar, Sophia Ananiadou:
Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs. 24-51

- Donya Rooein, Vilém Zouhar, Debora Nozza, Dirk Hovy:
Biased Tales: Cultural and Topic Bias in Generating Children's Stories. 52-72

- Donghyun Kim, Sriram Ravula, Taemin Ha, Alex Dimakis, Daehyeok Kim, Aditya Akella:
Large Language Models as Realistic Microservice Trace Generators. 73-91

- David Beauchemin, Michelle Albert-Rochette, Richard Khoury, Pierre-Luc Déziel:
JUDGEBERT: Assessing Legal Meaning Preservation Between Sentences. 92-118

- David Beauchemin, Richard Khoury:
QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments. 119-130

- Siqi Shen, Mehar Singh, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Rada Mihalcea:
Revisiting LLM Value Probing Strategies: Are They Robust and Expressive? 131-145

- Kian Ahrabian, Pegah Jandaghi, Negar Mokhberian, Sai Praneeth Karimireddy, Jay Pujara:
A Systematic Analysis of Base Model Choice for Reward Modeling. 146-164

- Branislav Pecher, Ivan Srba, Mária Bieliková:
Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance. 165-184

- Melanie Subbiah, Akankshya Mishra, Grace Kim, Liyan Tang, Greg Durrett, Kathleen McKeown:
Is the Top Still Spinning? Evaluating Subjectivity in Narrative Understanding. 185-203

- Jakub Macina, Nico Daheim, Ido Hakimi, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan:
MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors. 204-221

- Haishuo Fang, Xiaodan Zhu, Iryna Gurevych:
Preemptive Detection and Correction of Misaligned Actions in LLM Agents. 222-244

- Simon Münker:
Fingerprinting LLMs through Survey Item Factor Correlation: A Case Study on Humor Style Questionnaire. 245-258

- Tianlu Zheng, Yifan Zhang, Xiang An, Ziyong Feng, Kaicheng Yang, Qichuan Ding:
Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval. 259-271

- David Dinucu-Jianu, Jakub Macina, Nico Daheim, Ido Hakimi, Iryna Gurevych, Mrinmaya Sachan:
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning. 272-292

- Yuhang Tian, Dandan Song, Zhijing Wu, Pan Yang, Changzhi Zhou, Jun Yang, Hao Wang, Huipeng Ma, Chenhao Li, Luan Zhang:
CompKBQA: Component-wise Task Decomposition for Knowledge Base Question Answering. 293-309

- Yang Zhao, Yixin Wang, Mingzhang Yin:
Permutative Preference Alignment from Listwise Ranking of Human Judgments. 310-334

- Junyu Cheng, Chang Pan, Shuangyin Li:
ToneCraft: Cantonese Lyrics Generation with Harmony of Tones and Pitches. 335-353

- Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue, Flora D. Salim:
SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition. 354-379

- Tuan-Luc Huynh, Thuy-Trang Vu, Weiqing Wang, Trung Le, Dragan Gasevic, Yuan-Fang Li, Thanh-Toan Do:
MixLoRA-DSI: Dynamically Expandable Mixture-of-LoRA Experts for Rehearsal-Free Generative Retrieval over Dynamic Corpora. 380-396

- Patrick Giedemann, Pius von Däniken, Jan Milan Deriu, Álvaro Rodrigo, Anselmo Peñas, Mark Cieliebak:
ViClaim: A Multilingual Multilabel Dataset for Automatic Claim Detection in Videos. 397-413

- Yuxiang Zheng, Dayuan Fu, Xiangkun Hu, Xiaojie Cai, Lyumanshan Ye, Pengrui Lu, Pengfei Liu:
DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments. 414-431

- Enjun Du, Siyi Liu, Yongqi Zhang:
Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning. 432-453

- Zhaodan Zhang, Jin Zhang, Hui Xu, Jiafeng Guo, Xueqi Cheng:
MPRF: Interpretable Stance Detection through Multi-Path Reasoning Framework. 454-470

- Junjie Ye, Yuming Yang, Yang Nan, Shuo Li, Qi Zhang, Tao Gui, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan:
Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels. 471-513

- Jingyu Wei, Bo Liu, Tianjiao Wan, Baoyun Peng, Xingkong Ma, Mengmeng Guo:
JI2S: Joint Influence-Aware Instruction Data Selection for Efficient Fine-Tuning. 514-527

- Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Chiyu Ma, Zhongyu Ouyang, Peijun Qing, Soroush Vosoughi, Jiang Gui:
SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models. 528-540

- Xiangchen Wang, Jinrui Zhang, Teng Wang, Haigang Zhang, Feng Zheng:
Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors. 541-558

- Xuanliang Zhang, Dingzirui Wang, Keyan Xu, Qingfu Zhu, Wanxiang Che:
RoT: Enhancing Table Reasoning with Iterative Row-Wise Traversals. 559-579

- ZhaoDan Zhang, Jin Zhang, Xueqi Cheng, Hui Xu:
T-MAD: Target-driven Multimodal Alignment for Stance Detection. 580-595

- Kun Peng, Cong Cao, Hao Peng, Guanlin Wu, Zhifeng Hao, Lei Jiang, Yanbing Liu, Philip S. Yu:
Emotion Transfer with Enhanced Prototype for Unseen Emotion Recognition in Conversation. 596-608

- Ruoxi Cheng, Yizhong Ding, Shuirong Cao, Ranjie Duan, Xiaoshuang Jia, Shaowei Yuan, Simeng Qin, Zhiqiang Wang, Xiaojun Jia:
PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization. 609-628

- Yilong Xu, Jinhua Gao, Xiaoming Yu, Yuanhai Xue, Baolong Bi, Huawei Shen, Xueqi Cheng:
Training a Utility-based Retriever Through Shared Context Attribution for Retrieval-Augmented Language Models. 629-648

- Kaiyue Feng, Siyue Zhang, Bingsen Chen, Yilun Zhao, Chen Zhao:
SportReason: Evaluating Retrieval-Augmented Reasoning across Tables and Text for Sports Question Answering. 649-662

- Junsheng Huang, Zhitao He, Yuchen Huang, Sandeep Polisetty, Qingyun Wang, Yi R. Fung:
MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness. 663-676

- Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, Yulan He:
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation. 677-693

- Chenxing Wei, Mingwen Ou, Ying He, Yao Shu, Fei Yu:
PAFT: Prompt-Agnostic Fine-Tuning. 694-717

- Linger Deng, Linghao Zhu, Yuliang Liu, Yu Wang, Qunyi Xie, Jingjing Wu, Gang Zhang, Yingying Zhu, Xiang Bai:
Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning. 718-735

- Yanshu Li, Jianjiang Yang, Tian Yun, Pinyuan Feng, Jinfa Huang, Ruixiang Tang:
TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration. 736-763

- Tianxin Xie, Yan Rong, Pengfei Zhang, Wenwu Wang, Li Liu:
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey. 764-791

- Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng:
Automating Steering for Safe Multimodal Large Language Models. 792-814

- Yilin Jiang, Mingzi Zhang, Sheng Jin, Zengyi Yu, Xiangjie Kong, Binghao Tu:
EMNLP: Educator-role Moral and Normative Large Language Models Profiling. 815-843

- Bohao Chu, Meijie Li, Sameh Frihat, Chengyu Gu, Georg Lodde, Elisabeth Livingstone, Norbert Fuhr:
TracSum: A New Benchmark for Aspect-Based Summarization with Sentence-Level Traceability in Medical Domain. 844-864

- Wenbin Hu, Haoran Li, Huihao Jing, Qi Hu, Ziqian Zeng, Sirui Han, Heli Xu, Tianshu Chu, Peizhao Hu, Yangqiu Song:
Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning. 865-883

- Liqiang Ming, Sheng-hua Zhong, Yuncong Li:
Towards General-Domain Word Sense Disambiguation: Distilling Large Language Model into Compact Disambiguator. 884-897

- Hongyuan Lu, Zixuan Li, Zefan Zhang, Wai Lam:
SLoW: Select Low-frequency Words! Automatic Dictionary Selection for Translation on Large Language Models. 898-913

- Haoyi Wu, Zhihao Teng, Kewei Tu:
Parallel Continuous Chain-of-Thought with Jacobi Iteration. 914-926

- Yuhang Chen, Zhen Tan, Tianlong Chen:
EQA-RM: A Generative Embodied Reward Model with Test-time Scaling. 927-945

- Yongkang Chen, Xiaohu Du, Xiaotian Zou, Chongyang Zhao, Huan Deng, Hu Li, Xiaohui Kuang:
Refusal-Aware Red Teaming: Exposing Inconsistency in Safety Evaluations. 946-955

- Zekun Xi, Wenbiao Yin, Jizhan Fang, Jialong Wu, Runnan Fang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang:
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking. 956-976

- Yihan Wang, Peiyu Liu, Xin Yang:
LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL. 977-991

- Yihong Liu, Runsheng Chen, Lea Hirlimann, Ahmad Dawar Hakimi, Mingyang Wang, Amir Hossein Kargaran, Sascha Rothe, François Yvon, Hinrich Schütze:
On Relation-Specific Neurons in Large Language Models. 992-1022

- Hengyu An, Jinghuai Zhang, Tianyu Du, Chunyi Zhou, Qingming Li, Tao Lin, Shouling Ji:
IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents. 1023-1039

- Xingjian Diao, Weiyi Wu, Keyi Kong, Peijun Qing, Xinwen Xu, Ming Cheng, Soroush Vosoughi, Jiang Gui:
ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering. 1040-1057

- Yuanyang Yin, Yaqi Zhao, Yajie Zhang, Yuanxing Zhang, Ke Lin, Jiahao Wang, Xin Tao, Pengfei Wan, Wentao Zhang, Feng Zhao:
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs. 1058-1070

- George Arthur Baker, Mario Sanz-Guerrero, Katharina von der Wense:
Molecular String Representation Preferences in Pretrained LLMs: A Comparative Study in Zero- & Few-Shot Molecular Property Prediction. 1071-1085

- Ming Wang, Miao Zhang, Xuebo Liu, Liqiang Nie:
Weight-Aware Activation Sparsity with Constrained Bayesian Optimization Scheduling for Large Language Models. 1086-1098

- Ziming You, Yumiao Zhang, Dexuan Xu, Yiwei Lou, Yandong Yan, Wei Wang, Huamin Zhang, Yu Huang:
DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation. 1099-1123

- Yang Du, Zhuoran Lin, Kaiqiang Song, Biao Wang, Zhicheng Zheng, Tiezheng Ge, Bo Zheng, Qin Jin:
VC4VG: Optimizing Video Captions for Text-to-Video Generation. 1124-1138

- Alireza Salemi, Hamed Zamani:
LaMP-QA: A Benchmark for Personalized Long-form Question Answering. 1139-1159

- Yubo Zhu, Dongrui Liu, Zecheng Lin, Wei Tong, Sheng Zhong, Jing Shao:
The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations. 1160-1176

- Huihao Jing, Haoran Li, Wenbin Hu, Qi Hu, Heli Xu, Tianshu Chu, Peizhao Hu, Yangqiu Song:
MCIP: Protecting MCP Safety via Model Contextual Integrity Protocol. 1177-1194

- Wenyu Tao, Xiaofen Xing, Zeliang Li, Xiangmin Xu:
SAKI-RAG: Mitigating Context Fragmentation in Long-Document RAG via Sentence-level Attention Knowledge Integration. 1195-1213

- Yuchen Ji, Bo Xu, Jie Shi, Jiaqing Liang, Deqing Yang, Yu Mao, Hai Chen, Yanghua Xiao:
Skeletons Matter: Dynamic Data Augmentation for Text-to-Query. 1214-1236

- Cheng Shen, Yew-Soon Ong, Joey Tianyi Zhou:
CondenseLM: LLMs-driven Text Dataset Condensation via Reward Matching. 1237-1252

- Gueter Josmy Faure, Min-Hung Chen, Jia-Fong Yeh, Ying Cheng, Hung-Ting Su, Yung-Hao Tang, Shang-Hong Lai, Winston H. Hsu:
MovieCORE: COgnitive REasoning in Movies. 1253-1272

- Yuhao Chen, Yuanjie Lyu, Shuochen Liu, Chao Zhang, Junhui Lv, Tong Xu:
Think Wider, Detect Sharper: Reinforced Reference Coverage for Document-Level Self-Contradiction Detection. 1273-1288

- Arijit Maji, Raghvendra Kumar, Akash Ghosh, Anushka, Nemil Shah, Abhilekh Borah, Vanshika Shah, Nishant Mishra, Sriparna Saha:
DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture. 1289-1313

- Changbing Yang, Franklin Ma, Freda Shi, Jian Zhu:
LingGym: How Far Are LLMs from Thinking Like Field Linguists? 1314-1340

- Haijian Ma, Daizong Liu, Xiaowen Cai, Pan Zhou, Yulai Xie:
Learning from Few Samples: A Novel Approach for High-Quality Malcode Generation. 1341-1358

- Sarfaroz Yunusov, Kaige Chen, Kazi Nishat Anwar, Ali Emami:
Personality Matters: User Traits Predict LLM Preferences in Multi-Turn Collaborative Tasks. 1359-1372

- Yiming Jia, Jiachen Li, Xiang Yue, Bo Li, Ping Nie, Kai Zou, Wenhu Chen:
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search. 1373-1393

- Qingcheng Zeng, Weihao Xuan, Leyang Cui, Rob Voigt:
Thinking Out Loud: Do Reasoning Models Know When They're Right? 1394-1407

- Weihao Xuan, Qingcheng Zeng, Heli Qi, Junjue Wang, Naoto Yokoya:
Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models. 1408-1450

- Mengqi Liao, Xiangyu Xi, Ruinian Chen, Jia Leng, Yangen Hu, Ke Zeng, Shuai Liu, Huaiyu Wan:
Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs. 1451-1463

- Ingroj Shrestha, Padmini Srinivasan:
LLM Bias Detection and Mitigation through the Lens of Desired Distributions. 1464-1480

- Teng Lin, Yuyu Luo, Honglin Zhang, Jicheng Zhang, Chunlin Liu, Kaishun Wu, Nan Tang:
MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering. 1481-1494

- Yifei Wang, Feng Xiong, Yong Wang, Linjing Li, Xiangxiang Chu, Daniel Dajun Zeng:
POSITION BIAS MITIGATES POSITION BIAS: Mitigate Position Bias Through Inter-Position Knowledge Distillation. 1495-1512

- Weihao Xuan, Rui Yang, Heli Qi, Qingcheng Zeng, Yunze Xiao, Aosong Feng, Dairui Liu, Yun Xing, Junjue Wang, Fan Gao, Jinghui Lu, Yuang Jiang, Huitao Li, Xin Li, Kunyu Yu, Ruihai Dong, Shangding Gu, Yuekang Li, Xiaofei Xie, Felix Juefei-Xu, Foutse Khomh, Osamu Yoshie, Qingyu Chen, Douglas Teodoro, Nan Liu, Randy Goebel, Lei Ma, Edison Marrese-Taylor, Shijian Lu, Yusuke Iwasawa, Yutaka Matsuo, Irene Li:
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation. 1513-1532

- Weiming Zhang, Qingyao Li, Xinyi Dai, Jizheng Chen, Kounianhua Du, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang:
NL-Debugging: Exploiting Natural Language as an Intermediate Representation for Code Debugging. 1533-1549

- Bryan Chen Zhengyu Tan, Daniel Wai Kit Chin, Zhengyuan Liu, Nancy F. Chen, Roy Ka-Wei Lee:
Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD. 1550-1575

- Yuan Liu, Zhongyin Zhao, Le Tian, Haicheng Wang, Xubing Ye, Yangxiu You, Zilin Yu, Chuhan Wu, Zhou Xiao, Yang Yu, Jie Zhou:
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion. 1576-1601

- Xuemei Tang, Xufeng Duan, Zhenguang G. Cai:
Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition. 1602-1617

- Nafiseh Nikeghbal, Amir Hossein Kargaran, Jana Diesner:
CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs. 1618-1639

- Huan Xu, Zequn Li, Wen Tang, Jian Jun Zhang:
From Schema to State: Zero-Shot Scheme-Only Dialogue State Tracking via Diverse Synthetic Dialogue and Step-by-Step Distillation. 1640-1652

- Zhi-Yuan Chen, Hao Wang, Xinyu Zhang, Enrui Hu, Yankai Lin:
Beyond the Surface: Measuring Self-Preference in LLM Judgments. 1653-1672

- Dong Shu, Xuansheng Wu, Haiyan Zhao, Mengnan Du, Ninghao Liu:
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders. 1673-1682

- Hengran Zhang, Minghao Tang, Keping Bi, Jiafeng Guo, Shihao Liu, Daiting Shi, Dawei Yin, Xueqi Cheng:
Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation. 1683-1702

- Ege Yigit Çelik, Selma Tekir:
CiteBART: Learning to Generate Citations for Local Citation Recommendation. 1703-1719

- Lan Zhang, Marco Valentino, André Freitas:
Autoformalization in the Wild: Assessing LLMs on Real-World Mathematical Definitions. 1720-1738

- Caleb Ziems, William Barr Held, Jane Yu, Amir Goldberg, David Grusky, Diyi Yang:
Culture Cartography: Mapping the Landscape of Cultural Knowledge. 1739-1757

- Gregory Polyakov, Christian Hepting, Carsten Eickhoff, Seyed Ali Bahrainian:
Interpretability Analysis of Arithmetic In-Context Learning in Large Language Models. 1758-1777

- Yao Zhang, Chenyang Lin, Shijie Tang, Haokun Chen, Shijie Zhou, Yunpu Ma, Volker Tresp:
SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence. 1778-1818

- Nikta Gohari Sadr, Sahar Heidariasl, Karine Megerdoomian, Laleh Seyyed-Kalantari, Ali Emami:
We Politely Insist: Your LLM Must Learn the Persian Art of Taarof. 1819-1838

- Dustin Wright, Zain Muhammad Mujahid, Lu Wang, Isabelle Augenstein, David Jurgens:
Unstructured Evidence Attribution for Long Context Query Focused Summarization. 1839-1867

- Subrata Biswas, Mohammad Nur Hossain Khan, Bashima Islam:
RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language. 1868-1894

- Mingyuan Wu, Jize Jiang, Haozhen Zheng, Meitang Li, Zhaoheng Li, Beitong Tian, Bo Chen, Yongjoo Park, Minjia Zhang, ChengXiang Zhai, Klara Nahrstedt:
Cache-of-Thought: Master-Apprentice Framework for Cost-Effective Vision Language Model Reasoning. 1895-1909

- Xuyang Liu, Yiyu Wang, Junpeng Ma, Linfeng Zhang:
Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models. 1910-1924

- Shwai He, Tao Ge, Guoheng Sun, Bowei Tian, Xiaoyang Wang, Dong Yu:
Router-Tuning: A Simple and Effective Approach for Dynamic Depth. 1925-1938

- Zixuan Weng, Xiaolong Jin, Jinyuan Jia, Xiangyu Zhang:
Foot-In-The-Door: A Multi-turn Jailbreak for LLMs. 1939-1950

- Yuan Yuan, Muyu He, Muhammad Adil Shahid, Ziyang Li, Jiani Huang, Li Zhang:
TurnaboutLLM: A Deductive Reasoning Benchmark from Detective Games. 1951-1965

- Minghui Li, Hao Zhang, Yechao Zhang, Wei Wan, Shengshan Hu, Pei Xiaobing, Jing Wang:
Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling. 1966-1978

- Peifeng Wang, Austin Xu, Yilun Zhou, Caiming Xiong, Shafiq Joty:
Direct Judgement Preference Optimization. 1979-2009

- Xilong Wang, John Bloch, Zedian Shao, Yuepeng Hu, Shuyan Zhou, Neil Zhenqiang Gong:
WebInject: Prompt Injection Attack to Web Agents. 2010-2030

- Tian Lan, Jiang Li, Yemin Wang, Xu Liu, Xiangdong Su, Guanglai Gao:
F²Bench: An Open-ended Fairness Evaluation Benchmark for LLMs with Factuality Considerations. 2031-2046

- Taylor Sorensen, Pushkar Mishra, Roma Patel, Michael Henry Tessler, Michiel A. Bakker, Georgina Evans, Iason Gabriel, Noah D. Goodman, Verena Rieser:
Value Profiles for Encoding Human Variation. 2047-2095

- Lucius E. J. Bynum, Kyunghyun Cho:
Language Models as Causal Effect Generators. 2096-2115

- Joshua Rozner, Leonie Weissweiler, Kyle Mahowald, Cory Shain:
Constructions are Revealed in Word Distributions. 2116-2138

- Yilun Yang, Yekun Chai:
CodeMixBench: Evaluating Code-Mixing Capabilities of LLMs Across 18 Languages. 2139-2169

- Jiyue Jiang, Yitao Xu, Zikang Wang, Yihan Ye, Yanruisheng Shao, Yuheng Shan, Jiuming Wang, Xiaodan Fan, Jiao Yuan, Yu Li:
RBPtool: A Deep Language Model Framework for Multi-Resolution RBP-RNA Binding Prediction and RNA Molecule Design. 2170-2185

- Yiran Yang, Haifeng Sun, Jingyu Wang, Qi Qi, Zirui Zhuang, Huazheng Wang, Pengfei Ren, Jing Wang, Jianxin Liao:
Unveiling Internal Reasoning Modes in LLMs: A Deep Dive into Latent Reasoning vs. Factual Shortcuts with Attribute Rate Ratio. 2186-2206

- Zirui He, Mingyu Jin, Bo Shen, Ali Payani, Yongfeng Zhang, Mengnan Du:
SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models. 2207-2236

- Joshua Rozner, Leonie Weissweiler, Cory Shain:
BabyLM's First Constructions: Causal interventions provide a signal of learning. 2237-2249

- Itay Nakash, George Kour, Koren Lazar, Matan Vetzler, Guy Uziel, Ateret Anaby-Tavor:
Effective Red-Teaming of Policy-Adherent Agents. 2250-2268

- Zongxi Li, Yang Li, Haoran Xie, S. Joe Qin:
CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering. 2269-2288

- Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yu Su, Haofei Yu, Jiaxuan You:
SafeScientist: Enhancing AI Scientist Safety for Risk-Aware Scientific Discovery. 2289-2317

- Adrian Benton, Alexander Gutkin, Christo Kirov, Brian Roark:
Improving Informally Romanized Language Identification. 2318-2336

- Ivan Kobyzev, Abbas Ghaddar, Dingtao Hu, Boxing Chen:
Integral Transformer: Denoising Attention, Not Too Much Not Too Little. 2337-2354

- Yicheng Fu, Zhemin Huang, Liuxin Yang, Yumeng Lu, Zhongdongming Dai:
CHENGYU-BENCH: Benchmarking Large Language Models for Chinese Idiom Understanding and Use. 2355-2366

- Divyanshu Aggarwal, Ashutosh Sathe, Sunayana Sitaram:
Improving Cross Lingual Transfer by Pretraining with Active Forgetting. 2367-2378

- Shuo Xing, Peiran Li, Yuping Wang, Ruizheng Bai, Yueqi Wang, Chan-Wei Hu, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu:
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization. 2379-2397

- Crystal Qian, Aaron T. Parisi, Clémentine Bouleau, Vivian Tsai, Maël Lebreton, Lucas Dixon:
To Mask or to Mirror: Human-AI Alignment in Collective Reasoning. 2398-2423

- Krishna C. Puvvada, Faisal Ladhak, Santiago Akle Serano, Cheng-Ping Hsieh, Shantanu Acharya, Somshubra Majumdar, Fei Jia, Samuel Kriman, Simeng Sun, Dima Rekesh, Boris Ginsburg:
SWAN: An Efficient and Scalable Approach for Long-Context Language Modeling. 2424-2438

- Melissa Roemmele, John Joon Young Chung, Taewook Kim, Yuqian Sun, Alex Calderwood, Max Kreminski:
LLMs Behind the Scenes: Enabling Narrative Scene Illustration. 2439-2457

- Le Zhang, Bo Wang, Xipeng Qiu, Siva Reddy, Aishwarya Agrawal:
REARANK: Reasoning Re-ranking Agent via Reinforcement Learning. 2458-2471

- Marcus Ma, Georgios Chochlakis, Niyantha Maruthu Pandiyan, Jesse Thomason, Shrikanth Narayanan:
Large Language Models Do Multi-Label Classification Differently. 2472-2495

- Lester James Validad Miranda, Elyanah Aco, Conner G. Manuel, Jan Christian Blaise Cruz, Joseph Marvin Imperial:
FilBench: Can LLMs Understand and Generate Filipino? 2496-2529

- ChengYan Wu, Bolei Ma, Yihong Liu, Zheyu Zhang, Ningyuan Deng, Yanshu Li, Baolan Chen, Yi Zhang, Yun Xue, Barbara Plank:
M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis. 2530-2557

- Alexandr Nesterov, Andrey Sakhovskiy, Ivan Sviridov, Airat Valiev, Vladimir Makharev, Petr Anokhin, Galina Zubkova, Elena Tutubalina:
RuCCoD: Towards Automated ICD Coding in Russian. 2558-2585

- Dayu Yang, Tianyang Liu, Daoan Zhang, Antoine Simoulin, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Xin Qian, Grey Yang, Jiebo Luo, Julian J. McAuley:
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs. 2586-2616

- Pin-Jie Lin, Rishab Balasubramanian, Fengyuan Liu, Nikhil Kandpal, Tu Vu:
Efficient Model Development through Fine-tuning Transfer. 2617-2636

- Mingyang Wang, Lukas Lange, Heike Adel, Yunpu Ma, Jannik Strötgen, Hinrich Schütze:
Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes. 2637-2665

- Yuhan Liu, Michael JQ Zhang, Eunsol Choi:
User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal. 2666-2681

- Yu-Wen Chen, Melody Ma, Julia Hirschberg:
Read to Hear: A Zero-Shot Pronunciation Assessment Using Textual Descriptions and LLMs. 2682-2694

- Sanchit Sinha, Guangzhi Xiong, Aidong Zhang:
COCO-Tree: Compositional Hierarchical Concept Trees for Enhanced Reasoning in Vision-Language Models. 2695-2711

- Tong Bao, Mir Tafseer Nayeem, Davood Rafiei, Chengzhi Zhang:
SurveyGen: Quality-Aware Scientific Survey Generation with Large Language Models. 2712-2736

- Zhisheng Zheng, Puyuan Peng, Anuj Diwan, Cong Phuoc Huynh, Xiaohang Sun, Zhu Liu, Vimal Bhat, David Harwath:
VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing. 2737-2756

- Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, Zhen Tan, Amrita Bhattacharjee, Yuxuan Jiang, Canyu Chen, Tianhao Wu, Kai Shu, Lu Cheng, Huan Liu:
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge. 2757-2791

- Iustin Sirbu, Robert-Adrian Popovici, Cornelia Caragea, Stefan Trausan-Matu, Traian Rebedea:
MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification. 2792-2808

- Prakamya Mishra, Jiang Liu, Jialian Wu, Xiaodong Yu, Zicheng Liu, Emad Barsoum:
TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games. 2809-2831

- Zhenyu Lei, Zhen Tan, Song Wang, Yaochen Zhu, Zihan Chen, Yushun Dong, Jundong Li:
Learning from Diverse Reasoning Paths with Routing and Collaboration. 2832-2845

- Jiayuan Zhu, Jiazhen Pan, Yuyuan Liu, Fenglin Liu, Junde Wu:
Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning. 2846-2857

- Shrey Pandit, Jiawei Xu, Junyuan Hong, Zhangyang Wang, Tianlong Chen, Kaidi Xu, Ying Ding:
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models. 2858-2873

- Jonathan Ivey, Susan Gauch, David Jurgens:
NUTMEG: Separating Signal From Noise in Annotator Disagreement. 2874-2887

- Abhilekh Borah, Chhavi Sharma, Danush Khanna, Utkarsh Bhatt, Gurpreet Singh, Hasnat Md Abdullah, Raghav Kaushik Ravi, Vinija Jain, Jyoti Patel, Shubham Singh, Vasu Sharma, Arpita Vats, Rahul Raja, Aman Chadha, Amitava Das:
Alignment Quality Index (AQI) : Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations. 2888-2947

- Hayoung Jung, Shravika Mittal, Ananya Aatreya, Navreet Kaur, Munmun De Choudhury, Tanushree Mitra:
MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform. 2948-2982

- Rimon Melamed, Lucas H. McCabe, H. Howie Huang:
Demystifying optimized prompts in language models. 2983-2999

- Cihan Xiao, Matthew Wiesner, Debashish Chakraborty, Reno Kriz, Keith Cunningham, Kenton Murray, Kevin Duh, Luis Tavarez-Arce, Paul McNamee, Sanjeev Khudanpur:
Whisper-UT: A Unified Translation Framework for Speech and Text. 3000-3016

- Yubo Wang, Ping Nie, Kai Zou, Lijun Wu, Wenhu Chen:
Unleashing the Reasoning Potential of LLMs by Critique Fine-Tuning on One Problem. 3017-3027

- Hongxiang Zhang, Hao Chen, Muhao Chen, Tianyi Zhang:
Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation. 3028-3046

- Tianhao Zhang, Zhecheng Sheng, Zhexiao Lin, Chen Jiang, Dongyeop Kang:
BBScoreV2: Learning Time-Evolution and Latent Alignment from Stochastic Representation. 3047-3061

- Yu Xia, Yiran Shen, Junda Wu, Tong Yu, Sungchul Kim, Ryan A. Rossi, Lina Yao, Julian J. McAuley:
SAND: Boosting LLM Agents with Self-Taught Action Deliberation. 3062-3077

- Lingyao Li, Dawei Li, Zhenhui Ou, Xiaoran Xu, Jingxiao Liu, Zihui Ma, Runlong Yu, Min Deng:
LLMs as World Models: Data-Driven and Human-Centered Pre-Event Simulation for Disaster Impact Assessment. 3078-3096

- Hua Shen, Nicholas Clark, Tanu Mitra:
Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values? 3097-3118

- Jiazheng Li, Yuxiang Zhou, Junru Lu, Gladys Tyen, Lin Gui, Cesare Aloisi, Yulan He:
Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time. 3119-3140

- Sania Waheed, Na Min An:
Image Embedding Sampling Method for Diverse Captioning. 3141-3157

- Huihan Li, You Chen, Siyuan Wang, Yixin He, Ninareh Mehrabi, Rahul Gupta, Xiang Ren:
Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time. 3158-3180

- Jiarui Yao, Ruida Wang, Tong Zhang:
FANS: Formal Answer Selection for LLM Natural Language Math Reasoning Using Lean4. 3181-3200

- Gagan Bhatia, Maxime Peyrard, Wei Zhao:
Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning. 3201-3219

- Jianyou Wang, Weili Cao, Longtian Bao, Youze Zheng, Gil Pasternak, Kaicheng Wang, Xiaoyue Wang, Ramamohan Paturi, Leon Bergen:
Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark. 3220-3248

- Boyu Guan, Chuang Han, Yining Zhang, Yupu Liang, Zhiyang Zhang, Yang Zhao, Chengqing Zong:
SHIFT: Selected Helpful Informative Frame for Video-guided Machine Translation. 3249-3267

- Bohan Lyu, Siqiao Huang, Zichen Liang, Qian Sun, Jiaming Zhang:
Surge: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors. 3268-3308

- Carlos Mullov, Alexander Waibel:
Few-Shot Learning Translation from New Languages. 3309-3330

- Yunze Xiao, Lynnette Hui Xian Ng, Jiarui Liu, Mona T. Diab:
Humanizing Machines: Rethinking LLM Anthropomorphism Through a Multi-Level Framework of Design. 3331-3350

- Heming Xia, Chak Tou Leong, Wenjie Wang, Yongqi Li, Wenjie Li:
TokenSkip: Controllable Chain-of-Thought Compression in LLMs. 3351-3363

- Tu Anh Dinh, Jan Niehues:
Are Generative Models Underconfident? Better Quality Estimation with Boosted Model Probability. 3364-3382

- Zhaofeng Wu, Michihiro Yasunaga, Andrew Cohen, Yoon Kim, Asli Celikyilmaz, Marjan Ghazvininejad:
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs. 3383-3409

- Ting-Yun Chang, Muru Zhang, Jesse Thomason, Robin Jia:
Why Do Some Inputs Break Low-Bit LLM Quantization? 3410-3429 - Keisuke Kamahori, Jungo Kasai, Noriyuki Kojima, Baris Kasikci:

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation. 3430-3442 - Hao Nan Sheng

, Zhi-Yong Wang
, Hing Cheung So, Mingrui Yang:
AROMA: Autonomous Rank-one Matrix Adaptation. 3443-3459 - Ziyang Ma, Qingyue Yuan, Zhenglin Wang, Deyu Zhou:

Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens. 3460-3477 - Qibin Li, Zhen Xu, Shengyuan Bai, Nianmin Yao, Kaili Sun, Bowen Wu, Ying Li, Baoxun Wang:

Anchoring-Guidance Fine-Tuning (AnGFT): Elevating Professional Response Quality in Role-Playing Conversational Agents. 3478-3496 - Yuhang He, Yash Jain, Xubo Liu, Andrew Markham, Vibhav Vineet:

RiTTA: Modeling Event Relations in Text-to-Audio Generation. 3497-3511 - Xiaofeng Zhang, Yihao Quan, Chen Shen, Chaochen Gu, Xiaosong Yuan, Shaotian Yan, Jiawei Cao, Hao Cheng, Kaijie Wu, Jieping Ye:

Shallow Focus, Deep Fixes: Enhancing Shallow Layers Vision Attention Sinks to Alleviate Hallucination in LVLMs. 3512-3534 - Peerat Limkonchotiwat, Pume Tuchinda, Lalita Lowphansirikul, Surapon Nonesung, Panuthep Tasawong, Alham Fikri Aji, Can Udomcharoenchaikit, Sarana Nutanong:

WangchanThaiInstruct: An instruction-following Dataset for Culture-Aware, Multitask, and Multi-domain Evaluation in Thai. 3535-3558 - Zhengyi Zhao, Shubo Zhang, Yuxi Zhang, Yanxi Zhao, Yifan Zhang, Zezhong Wang, Huimin Wang, Yutian Zhao, Bin Liang, Yefeng Zheng, Binyang Li, Kam-Fai Wong, Xian Wu:

MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models. 3559-3582 - Dongning Rao, Rongchu Zhou, Peng Chen, Zhihua Jiang:

A Comprehensive Literary Chinese Reading Comprehension Dataset with an Evidence Curation Based Solution. 3583-3603 - Jie Shi, Xi Cao, Bo Xu, Jiaqing Liang, Yanghua Xiao, Jia Chen, Peng Wang, Wei Wang:

Dialect-SQL: An Adaptive Framework for Bridging the Dialect Gap in Text-to-SQL. 3604-3619 - Yixuan Tang

, Yi Yang:
FinMTEB: Finance Massive Text Embedding Benchmark. 3620-3638 - Anuj Diwan, Zhisheng Zheng, David Harwath, Eunsol Choi:

Scaling Rich Style-Prompted Text-to-Speech Datasets. 3639-3659 - Mahammed Kamruzzaman, Gene Louis Kim:

Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs. 3660-3678 - Jianxing Yu, Zihao Gou, Chen Li, Zhisheng Wang, Peiji Yang, Wenqing Chen, Jian Yin:

Eliciting Implicit Acoustic Styles from Open-domain Instructions to Facilitate Fine-grained Controllable Generation of Speech. 3679-3695 - Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu:

OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models. 3696-3715 - Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li:

AdaptThink: Reasoning Models Can Learn When to Think. 3716-3730 - Zhengyi Zhao, Shubo Zhang, Zezhong Wang, Huimin Wang, Yutian Zhao, Bin Liang, Yefeng Zheng, Binyang Li, Kam-Fai Wong, Xian Wu:

T2: An Adaptive Test-Time Scaling Strategy for Contextual Question Answering. 3731-3756 - Yang Wu, Ruijia Wang, Jie Wu:

Non-Existent Relationship: Fact-Aware Multi-Level Machine-Generated Text Detection. 3757-3768 - Ziwei Ji, Lei Yu, Yeskendir Koishekenov, Yejin Bang, Anthony Hartshorn, Alan Schelten, Cheng Zhang, Pascale Fung, Nicola Cancedda:

Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations. 3769-3793 - Huanghai Liu, Quzhe Huang, Qingjing Chen, Yiran Hu, Jiayu Ma, Yun Liu, Weixing Shen, Yansong Feng:

JUREX-4E: Juridical Expert-Annotated Four-Element Knowledge Base for Legal Reasoning. 3794-3814 - Vinay Samuel, Harshita Diddee, Yiming Zhang, Daphne Ippolito:

CIE: Controlling Language Model Text Generations Using Continuous Signals. 3815-3825 - Xi Wang, Songlei Jian, Shasha Li, Xiaopeng Li, Bin Ji, Ma Jun, Xiaodong Liu, Jing Wang, Jianfeng Zhang, Jie Yu, Feilong Bao, Wangbaosheng:

Stand on The Shoulders of Giants: Building JailExpert from Previous Attack Experience. 3826-3843 - Boyu Mi, Hanqing Wang, Tai Wang, Yilun Chen, Jiangmiao Pang:

Language-to-Space Programming for Training-Free 3D Visual Grounding. 3844-3864 - Wanlong Liu, Junying Chen, Ke Ji, Li Zhou, Wenyu Chen, Benyou Wang:

RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions. 3865-3888 - Yilong Lai, Jialong Wu, Zhenglin Wang, Deyu Zhou:

AdaRewriter: Unleashing the Power of Prompting-based Conversational Query Reformulation via Test-Time Adaptation. 3889-3905 - Xudong Lu, Haohao Gao, Renshou Wu, Shuai Ren, Xiaoxin Chen, Hongsheng Li, Fangyuan Li:

SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant? 3906-3931 - Tan Yue, Rui Mao, Zilong Song, Zonghai Hu, Dongyan Zhao:

F2TEval: Human-Aligned Multi-Dimensional Evaluation for Figure-to-Text Task. 3932-3948 - Qiyuan Chen, Hongsen Huang, Qian Shao, Jiahe Chen, Jintai Chen, Hongxia Xu, Renjie Hua, Ren Chuan, Jian Wu:

Icon2: Aligning Large Language Models Using Self-Synthetic Preference Data via Inherent Regulation. 3949-3968 - Ming Dong, Jinkui Zhang, Bolong Zheng, Xinhui Tu, Po Hu, Tingting He:

DSCD: Large Language Model Detoxification with Self-Constrained Decoding. 3969-3984 - Jue Zhang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang:

From Reasoning to Answer: Empirical, Attention-Based and Mechanistic Insights into Distilled DeepSeek R1 Models. 3985-4002 - Songbo Hu, Ivan Vulic, Anna Korhonen:

Quantifying Language Disparities in Multilingual Large Language Models. 4003-4018 - Jihyung Lee, Daehui Kim, Seonjeong Hwang, Hyounghun Kim, Gary Lee:

KoBLEX: Open Legal Question Answering with Multi-hop Reasoning. 4019-4053 - Bichen Wang, Yuzhe Zi, Yixin Sun, Hao Yang, Yanyan Zhao, Bing Qin:

End-to-End Learnable Psychiatric Scale Guided Risky Post Screening for Depression Detection on Social Media. 4054-4066 - Xinjie Zhao, Fan Gao, Xingyu Song, Yingjian Chen, Rui Yang, Yanran Fu, Yuyang Wang, Yusuke Iwasawa, Yutaka Matsuo, Irene Li:

ReAgent: Reversible Multi-Agent Reasoning for Knowledge-Enhanced Multi-Hop QA. 4067-4089 - Peter A. Jansen, Samiah Hassan, Ruoyao Wang:

Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science. 4090-4102 - Jiale Kang, Ziyin Yue, Qingyu Yin, Rui Jiang, Weile Li, Zening Lu, Zhouran Ji:

ModRWKV: Transformer Multimodality in Linear Time. 4103-4115 - Jiaao Yu, Yijing Lin, Zhipeng Gao, Xuesong Qiu, Lanlan Rui:

Multimedia Event Extraction with LLM Knowledge Editing. 4116-4124 - Shuo Wang, Renhao Li, Xi Chen, Yulin Yuan, Min Yang, Derek F. Wong:

Exploring the Impact of Personality Traits on LLM Bias and Toxicity. 4125-4143 - Chenyuan He, Yuxiang Jia, Fei Gao, Senbin Zhu, Hongde Liu, Hongying Zan, Min Peng:

Task-aware Contrastive Mixture of Experts for Quadruple Extraction in Conversations with Code-like Replies and Non-opinion Detection. 4144-4159 - Dianqing Liu, Yi Liu, Guoqing Jin, Zhendong Mao:

Mitigating Biases in Language Models via Bias Unlearning. 4160-4178 - Jing Xiong, Jianghan Shen, Fanghua Ye, Chaofan Tao, Zhongwei Wan, Jianqiao Lu, Xun Wu, Chuanyang Zheng, Zhijiang Guo, Min Yang, Lingpeng Kong, Ngai Wong:

UNComp: Can Matrix Entropy Uncover Sparsity? - A Compressor Design from an Uncertainty-Aware Perspective. 4179-4199 - Haiquan Qiu, You Wu, Dong Li, Jianmin Guo, Quanming Yao:

Superpose Task-specific Features for Model Merging. 4200-4214 - Suifeng Zhao, Zhuoran Jin, Sujian Li, Jun Gao:

FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain. 4215-4249 - Qinzhuo Wu, Pengzhi Gao, Wei Liu, Jian Luan:

BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism. 4250-4272 - Siyue Zhang, Yilun Zhao, Liyuan Geng, Arman Cohan, Anh Tuan Luu, Chen Zhao:

Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective. 4273-4303 - Heng Wang, Yotaro Shimose, Shingo Takamatsu:

BannerAgency: Advertising Banner Design with Multimodal LLM Agents. 4304-4329 - Weijie Shi, Jipeng Zhang, Yaguang Wu, Jingzhi Fang, Shibo Zhang, Yao Zhao, Hao Chen, Ruiyuan Zhang, Yue Cui, Jia Zhu, Sirui Han, Jiajie Xu, Xiaofang Zhou:

DIDS: Domain Impact-aware Data Sampling for Large Language Model Training. 4330-4350 - Chang Su, Dengliang Shi, Siyuan Huang, Jintao Du, Changhua Meng, Yu Cheng, Weiqiang Wang, Zhouhan Lin:

Training LLMs to be Better Text Embedders through Bidirectional Reconstruction. 4351-4369 - Shaomu Tan, Christof Monz:

ReMedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling. 4370-4387 - Zhiyuan Peng, Xin Yin, Rui Qian, Peiqin Lin, Yongkang Liu, Hao Zhang, Chenhao Ying, Yuan Luo:

SolEval: Benchmarking Large Language Models for Repository-level Solidity Smart Contract Generation. 4388-4411 - Nathan Roll, Calbert Graham, Yuka Tatsumi, Kim Tien Nguyen, Meghan Sumner, Dan Jurafsky:

In-Context Learning Boosts Speech Recognition via Human-like Adaptation to Speakers and Language Varieties. 4412-4426 - Changsheng Wang, Chongyu Fan, Yihua Zhang, Jinghan Jia, Dennis Wei, Parikshit Ram, Nathalie Baracaldo, Sijia Liu:

Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills. 4427-4443 - Yijun Shen, Delong Chen, Fan Liu, Xingyu Wang, Chuanyi Zhang, Liang Yao, Yuhui Zheng:

Chain-of-Talkers (CoTalk): Fast Human Annotation of Dense Image Captions. 4444-4464 - Hao Sun, Zile Qiao, Bo Wang, Guoxin Chen, Yingyan Hou, Yong Jiang, Pengjun Xie, Fei Huang, Yan Zhang:

DecoupleSearch: Decouple Planning and Search via Hierarchical Reward Modeling. 4465-4478 - Jianwei Wang, Chengming Shi, Junyao Yang, Haoran Li, Qianli Ma, Huiping Zhuang, Cen Chen, Ziqian Zeng:

RewardDS: Privacy-Preserving Fine-Tuning for Large Language Models via Reward Driven Data Synthesis. 4479-4500 - Haorui Wang, Zheng Wang, Yuxuan Zhang, Bo Wang, Bin Wu:

Synergizing Multimodal Temporal Knowledge Graphs and Large Language Models for Social Relation Recognition. 4501-4520 - Chaeeun Kim, Jinu Lee, Wonseok Hwang:

LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation. 4521-4554 - Jingxuan Wei, Nan Xu, Junnan Zhu, Haoyanni, Gaowei Wu, Qi Chen, Bihui Yu, Lei Wang:

ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering. 4555-4569 - Di Zhao, Longhui Ma, Siwei Wang, Miao Wang, Zhao Lv:

COLA: Collaborative Multi-Agent Framework with Dynamic Task Scheduling for GUI Automation. 4570-4593 - Jiguo Liu, Chao Liu, Meimei Li, Nan Li, Shihao Gao, Dali Zhu:

DASA-Trans-STM: Adaptive Efficient Transformer for Short Text Matching using Data Augmentation and Semantic Awareness. 4594-4610 - Avinash Madasu, Vasudev Lal, Phillip Howard:

Pruning the Paradox: How CLIP's Most Informative Heads Enhance Performance While Amplifying Bias. 4611-4626 - Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Mingsong Yan, Zi Yang, Paul D. Hovland, Bogdan Nicolae, Franck Cappello, Sui Tang, Zheng Zhang:

CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation. 4627-4645 - Ziwen Chen, Xiaoyuan Zhang, Ming Zhu:

TS-CLIP: Time Series Understanding by CLIP. 4646-4664 - Yangyang Xu, Jinpeng Hu, Zhuoer Zhao, Zhangling Duan, Xiao Sun, Xun Yang:

MultiAgentESC: A LLM-based Multi-Agent Collaboration Framework for Emotional Support Conversation. 4665-4681 - Yilin Wang, Heng Wang, Yuyang Bai, Minnan Luo:

Continuously Steering LLMs Sensitivity to Contextual Knowledge with Proxy Models. 4682-4698 - Yun-Shiuan Chuang, Sameer Narendran, Nikunj Harlalka, Alexander Cheung, Sizhe Gao, Siddharth Suresh, Junjie Hu, Timothy T. Rogers:

Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding. 4699-4713 - Jun-Yu Ma, Tianqing Fang, Zhisong Zhang, Hongming Zhang, Haitao Mi, Dong Yu:

Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation. 4714-4720 - Zhongyi Ye, Weitai Zhang, Xinyuan Zhou, Yongxin Zhu, Ninghui Rao, Enhong Chen:

Scalable Data Synthesis through Human-like Cognitive Imitation and Data Recombination. 4721-4735 - Jianan Wang, Bin Li, Jingtao Qi, Xueying Wang, Fu Li, Lihanxun Li:

BeSimulator: A Large Language Model Powered Text-based Behavior Simulator. 4736-4754 - Hexiang Tan, Fei Sun, Sha Liu, Du Su, Qi Cao, Xin Chen, Jingang Wang, Xunliang Cai, Yuanzhuo Wang, Huawei Shen, Xueqi Cheng:

Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs. 4755-4765 - Zhanming Shen, Tianqi Xu, Hao Wang, Jian Li, Miao Pan:

pFedGPT: Hierarchically Optimizing LoRA Aggregation Weights for Personalized Federated GPT Models. 4766-4778 - Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu:

QSpec: Speculative Decoding with Complementary Quantization Schemes. 4779-4795 - Zetong Li, Qinliang Su, Minhua Huang, Yin Yang:

Co-Evolving LLMs and Embedding Models via Density-Guided Preference Optimization for Text Clustering. 4796-4808 - Yidan Zhang, Yu Wan, Boyi Deng, Baosong Yang, Haoran Wei, Fei Huang, Bowen Yu, Dayiheng Liu, Junyang Lin, Fei Huang, Jingren Zhou:

P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs. 4809-4836 - Yutao Zhu, Jiajie Jin, Hongjin Qian, Zheng Liu, Zhicheng Dou, Ji-Rong Wen:

Single LLM, Multiple Roles: A Unified Retrieval-Augmented Generation Framework Using Role-Specific Token Optimization. 4837-4856 - Zezhong Jin, Shubhang Desai, Xu Chen, Biyi Fang, Zhuoyi Huang, Zhe Li, Chong-Xin Gan, Xiao Tu, Man-Wai Mak, Yan Lu, Shujie Liu:

TrInk: Ink Generation with Transformer Network. 4857-4864 - Xiaoyi Bao, Zhongqing Wang, Jinghang Gu, Chu-Ren Huang:

CalligraphicOCR for Chinese Calligraphy Recognition. 4865-4877 - Cheng Wang, Gelei Deng, Xianglin Yang, Han Qiu, Tianwei Zhang:

When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models. 4878-4888 - Pingyi Hu, Xiaofan Bai, Xiaojing Ma, Chaoxiang He, Dongmei Zhang, Bin Benjamin Zhu:

RESF: Regularized-Entropy-Sensitive Fingerprinting for Black-Box Tamper Detection of Large Language Models. 4889-4903 - Zhaomin Wu, Jizhou Guo, Junyi Hou, Bingsheng He, Lixin Fan, Qiang Yang:

Model-based Large Language Model Customization as Service. 4904-4921 - Haochen Sun, Shuwen Zhang, Lujie Niu, Lei Ren, Hao Xu, Hao Fu, Fangkun Zhao, Caixia Yuan, Xiaojie Wang:

Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents. 4922-4951 - Yao Chen, Jiawei Sheng, Wenyuan Zhang, Tingwen Liu:

Improving Reasoning Capabilities in Small Models through Mixture-of-layers Distillation with Stepwise Attention on Key Information. 4952-4971 - Renjie Luo, Jiaxi Li, Chen Huang, Wei Lu:

Through the Valley: Path to Effective Long CoT Training for Small Language Models. 4972-4992 - Jiahui Li, Lin Li, Tai-Wei Chang, Kun Kuang, Long Chen, Jun Zhou, Cheng Yang:

RED: Unleashing Token-Level Rewards from Holistic Feedback via Reward Redistribution. 4993-5022 - Peng Ding, Wen Sun, Dailin Li, Wei Zou, Jiaming Wang, Jiajun Chen, Shujian Huang:

SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models. 5023-5037 - Zizhen Li, Chuanhao Li, Yibin Wang, Qi Chen, Diping Song, Yukang Feng, Jianwen Sun, Jiaxin Ai, Fanrui Zhang, Mingzhu Sun, Kaipeng Zhang:

InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles. 5038-5076 - Zekun Moore Wang, King Zhu, Chunpu Xu, Wangchunshu Zhou, Jiaheng Liu, Yibo Zhang, Jessie Jiashuo Wang, Ning Shi, Siyu Li, Yizhi Li, Haoran Que, Zhaoxiang Zhang, Yuanxing Zhang, Ge Zhang, Ke Xu, Jie Fu, Wenhao Huang:

MIO: A Foundation Model on Multimodal Tokens. 5077-5099 - Nan Jiang, Ziming Wu, De-Chuan Zhan, Fuming Lai, Shaobing Lian:

DART: Distilling Autoregressive Reasoning to Silent Thought. 5100-5108 - Qi Zhang, Shouqing Yang, Lirong Gao, Hao Chen, Xiaomeng Hu, Jinglei Chen, Jiexiang Wang, Sheng Guo, Bo Zheng, Haobo Wang, Junbo Zhao:

LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization. 5109-5122 - Zhanming Shen, Hao Chen, Yulei Tang, Shaolin Zhu, Wentao Ye, Xiaomeng Hu, Haobo Wang, Gang Chen, Junbo Zhao:

CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle Consistency. 5123-5137 - Grace LeFevre, Qingcheng Zeng, Adam Leif, Jason Jewell, Denis Peskoff, Rob Voigt:

Good Intentions Beyond ACL: Who Does NLP for Social Good, and Where? 5138-5150 - Zhihan Guo, Jiele Wu, Wenqian Cui, Yifei Zhang, Minda Hu, Yufei Wang, Irwin King:

From General Reward to Targeted Reward: Improving Open-ended Long-context Generation Models. 5151-5166 - Xinyue Lou, You Li, Jinan Xu, Xiangyu Shi, Chi Chen, Kaiyu Huang:

Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model. 5167-5186 - Bajian Xiang, Shuaijiang Zhao, Tingwei Guo, Wei Zou:

Understanding the Modality Gap: An Empirical Study on the Speech-Text Alignment Mechanism of Large Speech Language Models. 5187-5202 - Yifan Liu, Wenkuan Zhao, Shanshan Zhong, Jinghui Qin, Mingfu Liang, Zhongzhan Huang, Wushao Wen:

AssoCiAm: A Benchmark for Evaluating Association Thinking while Circumventing Ambiguity. 5203-5219 - Zexuan Li, Hongliang Dai, Piji Li:

M-BRe: Discovering Training Samples for Relation Extraction from Unlabeled Texts with Large Language Models. 5220-5238 - Sangyeon Yoon, Wonje Jeung, Albert No:

R-TOFU: Unlearning in Large Reasoning Models. 5239-5258 - Zequn Xie, Chuxin Wang, Yeqiang Wang, Sihang Cai, Shulei Wang, Tao Jin:

Chat-Driven Text Generation and Interaction for Person Retrieval. 5259-5270 - Yuxuan Li, Hirokazu Shirado:

Spontaneous Giving and Calculated Greed in Language Models. 5271-5286 - Lei Jiang, Desheng Wu, Xiaolong Zheng:

SenDetEX: Sentence-Level AI-Generated Text Detection for Human-AI Hybrid Content via Style and Context Fusion. 5287-5302 - Mo Zhiqiang, Yang Hua, Jiahui Li, Yuan Liu, Shawn Wong, Jianmin Huang:

Judge and Improve: Towards a Better Reasoning of Knowledge Graphs with Large Language Models. 5303-5320 - Zhuo Li, Yuhao Du, Xiaoqi Jiao, Steven Y. Guo, Yuege Feng, Xiang Wan, Anningzhe Gao, Jinpeng Hu:

Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm. 5321-5340 - Jiajun Zhou, Yifan Yang, Kai Zhen, Ziyue Liu, Yequan Zhao, Ershad Banijamali, Athanasios Mouchtaris, Ngai Wong, Zheng Zhang:

QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models. 5341-5359 - Yingfa Chen, Yutong Wu, Chenyang Song, Zhen Leng Thai, Xingyu Shen, Xu Han, Zhiyuan Liu, Maosong Sun:

Cost-Optimal Grouped-Query Attention for Long-Context Modeling. 5360-5376 - Zhongyi Zhou, Yichen Zhu, Minjie Zhu, Junjie Wen, Ning Liu, Zhiyuan Xu, Weibin Meng, Yaxin Peng, Chaomin Shen, Feifei Feng, Yi Xu:

ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model. 5377-5395 - Ziyi Guan, Jason Chun Lok Li, Zhijian Hou, Pingping Zhang, Donglai Xu, Yuzhi Zhao, Mengyang Wu, Jinpeng Chen, Thanh-Toan Nguyen, Pengfei Xian, Wenao Ma, Shengchao Qin, Graziano Chesi, Ngai Wong:

KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation. 5396-5405 - Jihai Zhang, Xiaoye Qu, Tong Zhu, Yu Cheng:

CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling. 5406-5419 - Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, Zhicheng Dou:

Search-o1: Agentic Search-Enhanced Large Reasoning Models. 5420-5438 - Shenghan Wu, Yimo Zhu, Wynne Hsu, Mong-Li Lee, Yang Deng:

From Personas to Talks: Revisiting the Impact of Personas on LLM-Synthesized Emotional Support Conversations. 5439-5453 - Shuodi Liu, Yingzhuo Liu, Zi Wang, Yusheng Wang, Huijia Wu, Liuyu Xiang, Zhaofeng He:

Select-Then-Decompose: From Empirical Analysis to Adaptive Selection Strategy for Task Decomposition in Large Language Models. 5454-5477 - Junchen Ding, Jiahao Zhang, Yi Liu, Ziqi Ding, Gelei Deng, Yuekang Li:

TombRaider: Entering the Vault of History to Jailbreak Large Language Models. 5478-5493 - Danny Wang, Ruihong Qiu, Guangdong Bai, Zi Huang:

Text Meets Topology: Rethinking Out-of-distribution Detection in Text-Rich Networks. 5494-5523 - Zhuo Li, Yuege Feng, Dandan Guo, Jinpeng Hu, Anningzhe Gao, Xiang Wan:

APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport. 5524-5538 - Feng Xiong, Hongling Xu, Yifei Wang, Runxi Cheng, Yong Wang, Xiangxiang Chu:

HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation. 5539-5555 - Wonje Jeung, Sangyeon Yoon, Albert No:

SEPS: A Separability Measure for Robust Unlearning in LLMs. 5556-5587 - Zehong Yan, Peng Qi, Wynne Hsu, Mong-Li Lee:

TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection. 5588-5604 - Justin Xu, Yiming Li, Zizheng Zhang, Augustine Yui Hei Luk, Mayank Jobanputra, Samarth Oza, Ashley Murray, Meghana Reddy Kasula, Andrew Parker, David W. Eyre:

Tree-of-Quote Prompting Improves Factuality and Attribution in Multi-Hop and Medical Reasoning. 5605-5622 - Yichuan Ma, Yunfan Shao, Peiji Li, Demin Song, Qipeng Guo, Linyang Li, Xipeng Qiu, Kai Chen:

UnitCoder: Scalable Code Synthesis from Pre-training Corpora. 5623-5641 - Jixiao Zhang, Chunsheng Zuo:

GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models. 5642-5654 - Peichao Lai, Jiaxin Gan, Feiyang Ye, Wentao Zhang, Fangcheng Fu, Yilei Wang, Bin Cui:

Improving Low-Resource Sequence Labeling with Knowledge Fusion and Contextual Label Explanations. 5655-5674 - Congchi Yin, Qian Yu, Zhiwei Fang, Changping Peng, Piji Li:

Rethinking Cross-Subject Data Splitting for Brain-to-Text Decoding. 5675-5689 - Dongjun Jang, Youngchae Ahn, Hyopil Shin:

RCScore: Quantifying Response Consistency in Large Language Models. 5690-5708 - Hui Li, Ante Wang, Kunquan Li, Zhihao Wang, Liang Zhang, Delai Qiu, Qingsong Liu, Jinsong Su:

A Multi-Agent Framework with Automated Decision Rule Optimization for Cross-Domain Misinformation Detection. 5709-5725 - Shuting Wang, Jiejun Tan, Zhicheng Dou, Ji-Rong Wen:

OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain. 5726-5751 - Xiaopeng Ke, Hexuan Deng, Xuebo Liu, Jun Rao, Zhenxi Song, Jun Yu, Min Zhang:

AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs. 5752-5785 - Junxi Wu, Jinpeng Wang, Zheng Liu, Bin Chen, Dongjian Hu, Hao Wu, Shu-Tao Xia:

MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds. 5786-5805 - Lin Lu, Zhigang Zuo, Ziji Sheng, Pan Zhou:

Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model Merging. 5806-5825 - Xi Chen, Shuo Wang:

Pragmatic Inference Chain (PIC) Improving LLMs' Reasoning of Authentic Implicit Toxic Language. 5826-5841 - Wang Cai, Hsiu-Yuan Huang, Zhixiang Wang, Yunfang Wu:

Beyond Demonstrations: Dynamic Vector Construction from Latent Representations. 5842-5857 - Ying Zhao, Yuanzhao Guo, Xuemeng Weng, Yuan Tian, Wei Wang, Yi Chang:

Detoxifying Large Language Models via the Diversity of Toxic Samples. 5858-5871 - Yanxu Ji, Jinzhong Ning, Yi-Jia Zhang, Zhi Liu, Hongfei Lin:

LLM-Driven Implicit Target Augmentation and Fine-Grained Contextual Modeling for Zero-Shot and Few-Shot Stance Detection. 5872-5884 - Mengze Hong, Wailing Ng, Chen Jason Zhang, Yuanfeng Song, Di Jiang:

Dial-In LLM: Human-Aligned LLM-in-the-loop Intent Clustering for Customer Service Dialogues. 5885-5900 - Xiangchi Yuan, Chunhui Zhang, Zheyuan Liu, Dachuan Shi, Leyan Pan, Soroush Vosoughi, Wenke Lee:

Superficial Self-Improved Reasoners Benefit from Model Merging. 5901-5921 - Wenqiao Zhu, Ji Liu, Rongjunchen Zhang, Haipang Wu, Yulun Zhang:

CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning. 5922-5937 - Mengze Hong, Wailing Ng, Chen Jason Zhang, Di Jiang:

QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation. 5938-5953 - Naen Xu, Jinghuai Zhang, Changjiang Li, Zhi Chen, Chunyi Zhou, Qingming Li, Tianyu Du, Shouling Ji:

VideoEraser: Concept Erasure in Text-to-Video Diffusion Models. 5954-5983 - Xinyu Zhang, Lingling Zhang, Yanrui Wu, Muye Huang, Wenjun Wu, Bo Li, Shaowei Wang, Basura Fernando, Jun Liu:

Diagram-Driven Course Questions Generation. 5984-5999 - Yuanyuan He, Yongsen Pan, Wei Li, Jiali You, Jiawen Deng, Fuji Ren:

ECC: An Emotion-Cause Conversation Dataset for Empathy Response. 6000-6017 - Zijian Wang, Chang Xu:

ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing Representations. 6018-6039 - Jinwang Song, Hongying Zan, Kunli Zhang, Lingling Mu, Yingjie Han, Haobo Hua, Min Peng:

JOLT-SQL: Joint Loss Tuning of Text-to-SQL with Confusion-aware Noisy Schema Sampling. 6040-6053 - Zhibo Man, Yuanmeng Chen, Yujie Zhang, Jinan Xu:

DMDTEval: An Evaluation and Analysis of LLMs on Disambiguation in Multi-domain Translation. 6054-6071 - David Wadden, Kejian Shi, Jacob Morrison, Alan Li, Aakanksha Naik, Shruti Singh, Nitzan Barzilay, Kyle Lo, Tom Hope, Luca Soldaini, Shannon Zejiang Shen, Doug Downey, Hannaneh Hajishirzi, Arman Cohan:

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature. 6072-6109 - Xinkui Lin, Yuhui Zhang, Yongxiu Xu, Kun Huang, Hongzhang Mu, Yubin Wang, Gaopeng Gou, Li Qian, Li Peng, Wei Liu, Jian Luan, Hongbo Xu:

MAKAR: a Multi-Agent framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition. 6110-6130 - Bingrui Sima, Linhua Cong, Wenxuan Wang, Kun He:

VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models. 6131-6144 - Kohei Tsuji, Tatsuya Hiraoka, Yuchang Cheng, Eiji Aramaki, Tomoya Iwakura:

Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors. 6145-6163 - Shuo Yan, Ruochen Li, Ziming Luo, Zimu Wang, Daoyang Li, Liqiang Jing, Kaiyu He, Peilin Wu, Juntong Ni, George Michalopoulos, Yue Zhang, Ziyang Zhang, Mian Zhang, Zhiyu Chen, Xinya Du:

LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research. 6164-6186 - Jinlin Wang, Yulong Ji, Hongyu Yang:

RAV: Retrieval-Augmented Voting for Tactile Descriptions Without Training. 6187-6194 - Takashi Wada, Yuki Hirakawa, Ryotaro Shimizu, Takahiro Kawashima, Yuki Saito:

Static Word Embeddings for Sentence Semantic Representation. 6195-6211 - Jingjin Wang, Jiawei Han:

PropRAG: Guiding Retrieval with Beam Search over Proposition Paths. 6212-6227 - Jun Yan, Wenjie Jacky Mo, Xiang Ren, Robin Jia:

Rethinking Backdoor Detection Evaluation for Language Models. 6228-6239 - Pingzhi Li, Prateek Yadav, Jaehong Yoon, Jie Peng, Yi-Lin Sung, Mohit Bansal, Tianlong Chen:

Glider: Global and Local Instruction-Driven Expert Router. 6240-6301 - Zhengdong Yang, Zhen Wan, Sheng Li, Chao-Han Huck Yang, Chenhui Chu:

CoVoGER: A Multilingual Multitask Benchmark for Speech-to-text Generative Error Correction with Large Language Models. 6302-6314 - Jinman Zhao, Xueyan Zhang, Jiaru Li, Jingcheng Niu, Yulan Hu, Erxue Min, Gerald Penn:

Tiny Budgets, Big Gains: Parameter Placement Strategy in Parameter Super-Efficient Fine-Tuning. 6315-6333 - Junkai Liu, Yujie Tong, Hui Huang, Bowen Zheng, Yiran Hu, Peicheng Wu, Chuan Xiao, Makoto Onizuka, Muyun Yang, Shuyuan Zheng:

Legal Fact Prediction: The Missing Piece in Legal Judgment Prediction. 6334-6349 - Xu Zhang, Xunjian Yin, Dinghao Jing, Huixuan Zhang, Xinyu Hu, Xiaojun Wan:

DAMON: A Dialogue-Aware MCTS Framework for Jailbreaking Large Language Models. 6350-6366 - Qihan Wang, Shidong Pan, Tal Linzen, Emily Black:

Multilingual Prompting for Improving LLM Generation Diversity. 6367-6389 - Genglin Liu, Vivian T. Le, Salman Rahman, Elisa Kreiss, Marzyeh Ghassemi, Saadia Gabriel:

MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations. 6390-6417 - Wenzhi Wang, Paul Reisert, Shoichi Naito, Naoya Inoue, Machi Shimmei, Surawat Pothong, Jungmin Choi, Kentaro Inui:

Identification of Multiple Logical Interpretations in Counter-Arguments. 6418-6433 - Peng Wang, Biyu Zhou, Xuehai Tang, Jizhong Han, Songlin Hu:

LyapLock: Bounded Knowledge Preservation in Sequential Large Language Model Editing. 6434-6459 - Mengyu Bu, Shaolei Zhang, Zhongjun He, Hua Wu, Yang Feng:

AlignX: Advancing Multilingual Large Language Models with Multilingual Representation Alignment. 6460-6489 - Gangwei Jiang, Yahui Liu, Zhaoyi Li, Wei Bi, Fuzheng Zhang, Linqi Song, Ying Wei, Defu Lian:

What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning. 6490-6514 - Yiding Wang, Fanxu Meng, Xuefeng Zhang, Fan Jiang, Pingzhi Tang, Muhan Zhang:

HD-PiSSA: High-Rank Distributed Orthogonal Adaptation. 6515-6528 - Runyu Peng, Yunhua Zhou, Kai Lv, Yang Gao, Qipeng Guo, Xipeng Qiu:

Firewall Routing: Blocking Leads to Better Hybrid Inference for LLMs. 6529-6554 - Chengyu Jiao, Shuhao Chen, Yu Zhang:

SPE Attention: Making Attention Equivariant to Semantic-Preserving Permutation for Code Processing. 6555-6568 - Yudong Yang, Jimin Zhuang, Guangzhi Sun, Changli Tang, Yixuan Li, Peihan Li, Yifan Jiang, Wei Li, Zejun Ma, Chao Zhang:

Audio-centric Video Understanding Benchmark without Text Shortcut. 6569-6587 - Songshuo Lu, Hua Wang, Yutian Rong, Zhi Chen, Yaohua Tang:

TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text. 6588-6601 - Haozhan Shen, Kangjia Zhao, Tiancheng Zhao, Ruochen Xu, Zilun Zhang, Mingwei Zhu, Jianwei Yin:

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration. 6602-6618 - Enci Zhang, Xingang Yan, Wei Lin, Tianxiang Zhang, Qianchun Lu:

Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation. 6619-6633 - Keer Lu, Keshi Zhao, Zhuoran Zhang, Zheng Liang, Bin Cui, Tengjiao Wang, Wentao Zhang:

VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs. 6634-6658 - Hengxing Cai, Jinhan Dong, Jingjun Tan, Jingcheng Deng, Sihang Li, Zhifeng Gao, Haidong Wang, Zicheng Su, Agachai Sumalee, Renxin Zhong:

FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models. 6659-6676 - Haoran Chen, Junyan Lin, Xinghao Chen, Yue Fan, Jianfeng Dong, Xin Jin, Hui Su, Jinlan Fu, Xiaoyu Shen:

Multimodal Language Models See Better When They Look Shallower. 6677-6695 - Xujia Wang, Yunjia Qi, Bin Xu:

LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization. 6696-6715 - Tianle Gu, Zongqi Wang, Kexin Huang, Yuanqi Yao, Xiangliang Zhang, Yujiu Yang, Xiuying Chen:

Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking. 6716-6733 - Bufan Gao, Elisa Kreiss:

Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases. 6734-6750 - Jikai Wang, Zhenxu Tian, Juntao Li, Qingrong Xia, Xinyu Duan, Zhe-Feng Wang, Baoxing Huai, Min Zhang:

Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification. 6751-6763 - Haoqin Tu, Weitao Feng, Hardy Chen, Hui Liu, Xianfeng Tang, Cihang Xie:

ViLBench: A Suite for Vision-Language Process Reward Modeling. 6764-6779 - Hwan Chang, Yumin Kim, Yonghyun Jun, Hwanhee Lee:

Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering. 6780-6800 - Wei Shi, Sihang Li, Tao Liang, Mingyang Wan, Guojun Ma, Xiang Wang, Xiangnan He:

Route Sparse Autoencoder to Interpret Large Language Models. 6801-6815 - Qizhen Zhang, Prajjwal Bhargava, Chloe Bi, Chris X. Cai, Jakob Nicolaus Foerster, Jeremy Fu, Punit Singh Koura, Ruan Silva, Sheng Shen, Emily Dinan, Suchin Gururangan, Mike Lewis:

BTS: Harmonizing Specialized Experts into a Generalist LLM. 6816-6834 - Anant Khandelwal, Manish Gupta, Puneet Agrawal:

CoCoA: Confidence- and Context-Aware Adaptive Decoding for Resolving Knowledge Conflicts in Large Language Models. 6835-6855 - Huixuan Zhang, Xiaojun Wan:

R-Bind: Unified Enhancement of Attribute and Relation Binding in Text-to-Image Diffusion Models. 6856-6870 - Zinan Tang, Xin Gao, Qizhi Pei, Zhuoshi Pan, Mengzhang Cai, Jiang Wu, Conghui He, Lijun Wu:

Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning. 6871-6891 - Wei Liu, Nai Ding:

Information Integration in Large Language Models is Gated by Linguistic Structural Markers. 6892-6904 - Chengfeng Zhao, Shizhu He, Shanshan Jiang, Bin Dong, Jun Zhao, Kang Liu:

Why and How LLMs Benefit from Knowledge Introspection in Commonsense Reasoning. 6905-6920 - Jing He, Mingyang Lv, Qing Shi, Gong Cheng:

GraDaSE: Graph-Based Dataset Search with Examples. 6921-6932 - Youwon Jang, Woo Suk Choi, Minjoon Jung, Min Su Lee, Byoung-Tak Zhang:

Confidence-guided Refinement Reasoning for Zero-shot Question Answering. 6933-6950 - Yiqi Li, Yusheng Liao, Zhe Chen, Yanfeng Wang, Yu Wang:

DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction. 6951-6966 - Zhenhua Xu, Xixiang Zhao, Xubin Yue, Shengwei Tian, Changting Lin, Meng Han:

CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation Backdoor. 6967-6989 - Yikuan Xia, Jiazun Chen, Sujian Li, Jun Gao:

Realistic Training Data Generation and Rule Enhanced Decoding in LLM for NameGuess. 6990-7007 - Zhenhua Xu, Meng Han, Wenpeng Xing:

EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint. 7008-7031 - Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Erxue Min, Sophia Ananiadou:

Selective Preference Optimization via Token-Level Reward Function Estimation. 7032-7056 - Seonil Son, Ju-Min Oh, Heegon Jin, Cheolhun Jang, Jeongbeom Jeong, Kuntae Kim:

Arena-lite: Efficient and Reliable Large Language Model Evaluation via Tournament-Based Direct Comparisons. 7057-7075 - Ruiyi Yan, Yugo Murawaki:

Addressing Tokenization Inconsistency in Steganography and Watermarking Based on Large Language Models. 7076-7098 - Minghua He, Yue Chen, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang:

ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation. 7099-7125 - Junnan Zhu, Jingyi Wang, Bohan Yu, Xiaoyu Wu, Junbo Li, Lei Wang, Nan Xu:

TableEval: A Real-World Benchmark for Complex, Multilingual, and Multi-Structured Table Question Answering. 7126-7146 - Jinyang Zhang, Kexin Yang, Yu Wan, Muyang Ye, Baosong Yang, Fei Huang, Junyang Lin, Dayiheng Liu:

NOVA-63: Native Omni-lingual Versatile Assessments of 63 Disciplines. 7147-7189 - Zihan Wang, Zihan Liang, Zhou Shao, Yufei Ma, Huangyu Dai, Ben Chen, Lingtao Mao, Chenyi Lei, Yuqing Ding, Han Li:

InfoGain-RAG: Boosting Retrieval-Augmented Generation through Document Information Gain-based Reranking and Filtering. 7190-7204 - Yicheng Ji, Jun Zhang, Heming Xia, Jinpeng Chen, Lidan Shou, Gang Chen, Huan Li:

SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning. 7205-7219 - Muhammad Dehan Al Kautsar, Lucky Susanto, Derry Tanti Wijaya, Fajri Koto:

What Do Indonesians Really Need from Language Technology? A Nationwide Survey. 7220-7245 - Yimu Wang, Mozhgan Nasr Azadani, Sean Sedwards, Krzysztof Czarnecki:

LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts. 7246-7261 - Wessel Poelman, Thomas Bauwens, Miryam de Lhoneux:

Confounding Factors in Relating Model Performance to Morphology. 7262-7287 - Hongyan Chang, Ali Shahin Shamsabadi, Kleomenis Katevas, Hamed Haddadi, Reza Shokri:

Context-Aware Membership Inference Attacks against Pre-trained Large Language Models. 7288-7310 - Gustave Cortal, Alain Finkel:

Formalizing Style in Personal Narratives. 7311-7326 - Yulin Chen, Haoran Li, Yuexin Li, Yue Liu, Yangqiu Song, Bryan Hooi:

TopicAttack: An Indirect Prompt Injection Attack via Topic Transition. 7327-7345 - Gianluca Sperduti, Dong Nguyen:

PSET: a Phonetics-Semantics Evaluation Testbed. 7346-7356 - Yingli Shen, Wen Lai, Shuo Wang, Ge Gao, Kangyang Luo, Alexander Fraser, Maosong Sun:

From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora. 7357-7379 - Shuzheng Si, Haozhe Zhao, Gang Chen, Yunshui Li, Kangyang Luo, Chuancheng Lv, Kaikai An, Fanchao Qi, Baobao Chang, Maosong Sun:

GATEAU: Selecting Influential Samples for Long Context Alignment. 7380-7411 - Wangyi Jiang, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun:

Teach Small Models to Reason by Curriculum Distillation. 7412-7422 - Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, Xiangzhong Fang:

Enhancing Reasoning Abilities of Small LLMs with Cognitive Alignment. 7423-7438 - Wei Liu, Siya Qi, Xinyu Wang, Chen Qian, Yali Du, Yulan He:

NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning. 7439-7458 - Lena Sophia Bolliger, Lena Ann Jäger:

Genre Matters: How Text Types Interact with Decoding Strategies and Lexical Predictors in Shaping Reading Behavior. 7459-7476 - Aziguli Wulamu, Kaiyuan Gong, Lyu Zhengyu, Yu Han, Zhihong Zhu, Bowen Xing:

RTE-GMoE: A Model-agnostic Approach for Relation Triplet Extraction via Graph-based Mixture-of-Expert Mutual Learning. 7477-7488 - Kyeongman Park, Nakyeong Yang, Kyomin Jung:

Avoidance Decoding for Diverse Multi-Branch Story Generation. 7489-7505 - Weiqiu You, Anton Xue, Shreya Havaldar, Delip Rao, Helen Jin, Chris Callison-Burch, Eric Wong:

Probabilistic Soundness Guarantees in LLM Reasoning Chains. 7506-7525 - Heng-Da Xu, Xian-Ling Mao, Fanshu Sun, Tian-Yi Che, Cheng-Xin Xin, Heyan Huang:

SQLWOZ: A Realistic Task-Oriented Dialogue Dataset with SQL-Based Dialogue State Representation for Complex User Requirements. 7526-7551 - Yuxin Gou, Xiaoning Dong, Qin Li, Shishen Gu, Richang Hong, Wenbo Hu:

SURE: Safety Understanding and Reasoning Enhancement for Multimodal Large Language Models. 7552-7593 - Minh-Phuc Truong, Hai An Vu, Tu Vu, Nguyen Thi Ngoc Diep, Linh Van Ngo, Thien Huu Nguyen, Trung Le:

EMO: Embedding Model Distillation via Intra-Model Relation and Optimal Transport Alignments. 7594-7606 - Kun Li, Lai Man Po, Hongzheng Yang, Xuyuan Xu, Kangcheng Liu, Yuzhi Zhao:

AesBiasBench: Evaluating Bias and Alignment in Multimodal Language Models for Personalized Image Aesthetic Assessment. 7607-7620 - Anum Afzal, Florian Matthes, Alexander R. Fabbri:

DA-Pred: Performance Prediction for Text Summarization under Domain-Shift and Instruct-Tuning. 7621-7632 - Jielong Tang, Yang Yang, Jianxing Yu, Zhen-Xing Wang, Haoyuan Liang, Liang Yao, Jian Yin:

UnCo: Uncertainty-Driven Collaborative Framework of Large and Small Models for Grounded Multimodal NER. 7633-7651 - Yi Sun, Han Wang, Jiaqiang Li, Jiacheng Liu, Xiangyu Li, Hao Wen, Yizhen Yuan, Huiwen Zheng, Yan Liang, Yuanchun Li, Yunxin Liu:

An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint. 7652-7671 - Songze Li, Zhiqiang Liu, Zhengke Gui, Huajun Chen, Wen Zhang:

Enrich-on-Graph: Query-Graph Alignment for Complex Reasoning with LLM Enriching. 7672-7692 - Yuanjun Feng, Vivek Choudhary, Yash Raj Shrestha:

Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-Making. 7693-7706 - Johannes Moll, Louisa Fay, Asfandyar Azhar, Sophie Ostmeier, Sergios Gatidis, Tim C. Lueth, Curtis Langlotz, Jean-Benoit Delbrouck:

Structuring Radiology Reports: Challenging LLMs with Lightweight Models. 7707-7724 - Yunuo Liu, Dawei Zhu, Zena Al-Khalili, Dai Cheng, Yanjun Chen, Dietrich Klakow, Wei Zhang, Xiaoyu Shen:

PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks. 7725-7734 - Yuebin Xu, Zhiyi Chen, Zeyi Wen:

EcoTune: Token-Efficient Multi-Fidelity Hyperparameter Optimization for Large Language Model Inference. 7735-7745 - Xia Du, Shuhan Sun, Pengyuan Liu, Dong Yu:

Investigating Value-Reasoning Reliability in Small Large Language Models. 7746-7786 - Zahra Dehghanighobadi, Asja Fischer, Muhammad Bilal Zafar:

Can LLMs Explain Themselves Counterfactually? 7787-7815 - Chuanyang Zheng, Yihang Gao, Guoxuan Chen, Han Shi, Jing Xiong, Xiaozhe Ren, Chao Huang, Zhenguo Li, Yu Li:

Self-Adjust Softmax. 7816-7836 - Shaoqing Lin, Chong Teng, Fei Li, Donghong Ji, Lizhen Qu, Zhuang Li:

DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement. 7837-7862 - Ernesto Luis Estevanell-Valladares, Suilan Estevez-Velarde, Yoan Gutiérrez, Andrés Montoyo, Ruslan Mitkov:

XAutoLM: Efficient Fine-Tuning of Language Models via Meta-Learning and AutoML. 7863-7880 - Roman Vashurin, Maiya Goloburda, Preslav Nakov, Maxim Panov:

UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models. 7881-7908 - Zhepei Wei, Wenlin Yao, Yao Liu, Weizhi Zhang, Qin Lu, Liang Qiu, Changlong Yu, Puyang Xu, Chao Zhang, Bing Yin, Hyokun Yun, Lihong Li:

WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning. 7909-7928 - Tobias Domhan, Dawei Zhu:

Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models. 7929-7947 - Petros Raptopoulos, Giorgos Filandrianos, Maria Lymperaiou, Giorgos Stamou:

PAKTON: A Multi-Agent Framework for Question Answering in Long Legal Agreements. 7948-7984 - Xu Sun, Lionel Delphin-Poulat, Christèle Tarnec, Anastasia Shimorina:

PoSum-Bench: Benchmarking Position Bias in LLM-based Conversational Summarization. 7985-8009 - Ziqing Qiao, Yongheng Deng, Jiali Zeng, Dong Wang, Lai Wei, Guanbo Wang, Fandong Meng, Jie Zhou, Ju Ren, Yaoxue Zhang:

ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning. 8010-8029 - Hao Li, Lijun Li, Zhenghao Lu, Xianyi Wei, Rui Li, Jing Shao, Lei Sha:

Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment. 8030-8050 - Yuxia Gong, Shuguo Hu, Huaiwen Zhang:

Cross-domain Rumor Detection via Test-Time Adaptation and Large Language Models. 8051-8066 - Chun Hu, Junhui He, Shangyu Wu, Yuxin He, Chun Jason Xue, Qingan Li:

MLWQ: Efficient Small Language Model Deployment via Multi-Level Weight Quantization. 8067-8077 - Seongryong Jung, Suwan Yoon, DongGeon Kim, Hwanhee Lee:

ToDi: Token-wise Distillation via Fine-Grained Divergence Control. 8078-8091 - Qingyao Li, Wei Xia, Xinyi Dai, Kounianhua Du, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang:

RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation. 8092-8110 - Yucheng Sun, Alessandro Stolfo, Mrinmaya Sachan:

Probing for Arithmetic Errors in Language Models. 8111-8128 - Minda Hu, Qiyuan Zhang, Yufei Wang, Bowei He, Hongru Wang, Jingyan Zhou, Liangyou Li, Yasheng Wang, Chen Ma, Irwin King:

NILE: Internal Consistency Alignment in Large Language Models. 8129-8147 - Rong Ma, Lei Wang, Yating Yang, Bo Ma, Rui Dong, Fengyi Yang, Ahtamjan Ahmat, Kaiwen Lu, Xinyue Wang:

Mining the Past with Dual Criteria: Integrating Three types of Historical Information for Context-aware Event Forecasting. 8148-8163 - Andrei Catalin Coman, Ionut-Teodor Sorodoc, Leonardo F. R. Ribeiro, Bill Byrne, James Henderson, Adrià de Gispert:

RAGferee: Building Contextual Reward Models for Retrieval-Augmented Generation. 8164-8211 - Minh Duc Bui, Carolin Holtermann, Valentin Hofmann, Anne Lauscher, Katharina von der Wense:

Large Language Models Discriminate Against Speakers of German Dialects. 8212-8240 - Yini Wang, Xian Zhou, Shengan Zheng, Linpeng Huang, Zhunchen Luo, Wei Luo, Xiaoying Bai:

Uncovering Argumentative Flow: A Question-Focus Discourse Structuring Framework. 8241-8259 - Tarun Tater, Diego Frassinelli, Sabine Schulte im Walde:

AbsVis - Benchmarking How Humans and Vision-Language Models "See" Abstract Concepts in Images. 8260-8281 - Tatiana Anikina, Ján Cegin, Jakub Simko, Simon Ostermann:

A Rigorous Evaluation of LLM Data Generation Strategies for Low-Resource Languages. 8282-8303 - Houxing Ren, Zimu Lu, Weikang Shi, Haotian Hou, Yunqiao Yang, Ke Wang, Aojun Zhou, Junting Pan, Mingjie Zhan, Hongsheng Li:

Alignment with Fill-In-the-Middle for Enhancing Code Generation. 8304-8320 - Hanbo Huang, Yihan Li, Bowen Jiang, Bo Jiang, Lin Liu, Zhuotao Liu, Ruoyu Sun, Shiyu Liang:

A Middle Path for On-Premises LLM Deployment: Preserving Privacy Without Sacrificing Model Confidentiality. 8321-8359 - Jonghyun Hong, Sungyoon Lee:

Variance Sensitivity Induces Attention Entropy Collapse and Instability in Transformers. 8360-8378 - Min Hyuk Kim, Changheon Kim, Seok Bong Yoo:

X-FLoRA: Cross-modal Federated Learning with Modality-expert LoRA for Medical VQA. 8379-8397 - Ahmet Yavuz Uluslu, Tannon Kew, Tilia Ellendorff, Gerold Schneider, Rico Sennrich:

Robust Native Language Identification through Agentic Decomposition. 8398-8414 - Jiawei Chen, Xinyan Guan, Qianhao Yuan, Guozhao Mo, Weixiang Zhou, Yaojie Lu, Hongyu Lin, Ben He, Le Sun, Xianpei Han:

ConsistentChat: Building Skeleton-Guided Consistent Multi-Turn Dialogues for Large Language Models from Scratch. 8415-8441 - Yizheng Sun, Hao Li, Chang Xu, Hongpeng Zhou, Chenghua Lin, Riza Batista-Navarro, Jingyuan Sun:

Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study. 8442-8456 - Nisrine Rair, Alban Goupil, Valeriu Vrabie, Emmanuel Chochoy:

When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and Ambiguity. 8457-8480 - Yingming Wang, Pepa Atanasova:

Self-Critique and Refinement for Faithful Natural Language Explanations. 8481-8507 - Arghodeep Nandi, Megha Sundriyal, Euna Mehnaz Khan, Jikai Sun, Emily K. Vraga, Jaideep Srivastava, Tanmoy Chakraborty:

The Psychology of Falsehood: A Human-Centric Survey of Misinformation Detection. 8508-8525 - Xinhao Huang, Zhibo Ren, Yipeng Yu, Ying Zhou, Zulong Chen, Zeyi Wen:

SEAL: Structure and Element Aware Learning Improves Long Structured Document Retrieval. 8526-8536 - Yu Zhang, Dong Guo, Fang Wu, Guoliang Zhu, Dian Ding, Yiming Zhang:

AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity. 8537-8549 - Michael Sejr Schlichtkrull:

Attacks by Content: Automated Fact-checking is an AI Security Issue. 8550-8565 - Yuezhang Peng, Yuxin Liu, Fei Wen, Xie Chen:

MUZO: Leveraging Multiple Queries and Momentum for Zeroth-Order Fine-Tuning of Large Language Models. 8566-8584 - Hao Fang, Jiawei Kong, Tianqu Zhuang, Yixiang Qiu, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Yaowei Wang, Min Zhang:

Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors. 8585-8602 - Sergey Pletenev, Maria Marina, Nikolay Ivanov, Daria Galimzianova, Nikita Krayko, Mikhail Salnikov, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii:

Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA. 8603-8620 - Alina Klerings, Jannik Brinkmann, Daniel Ruffinelli, Simone Paolo Ponzetto:

Steering Language Models in Multi-Token Generation: A Case Study on Tense and Aspect. 8621-8639 - Navve Wasserman, Oliver Heinimann, Yuval Golbari, Tal Zimbalist, Eli Schwartz, Michal Irani:

DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers. 8640-8658 - Yupei Du, Philipp Mondorf, Silvia Casola, Yuekun Yao, Robert Litschko, Barbara Plank:

Reason to Rote: Rethinking Memorization in Reasoning. 8659-8679 - Kazuki Matsuda, Yuiga Wada, Shinnosuke Hirano, Seitaro Otsuki, Komei Sugiura:

VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions. 8680-8696 - Maria Marina, Nikolay Ivanov, Sergey Pletenev, Mikhail Salnikov, Daria Galimzianova, Nikita Krayko, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii:

LLM-Independent Adaptive RAG: Let the Question Speak for Itself. 8697-8709 - Hongyi Luo, Qing Cheng, Daniel Matos, Hari Krishna Gadi, Yanfeng Zhang, Lu Liu, Yongliang Wang, Niclas Zeller, Daniel Cremers, Liqiu Meng:

TurnBack: A Geospatial Route Cognition Benchmark for Large Language Models through Reverse Route. 8710-8729 - Yuqicheng Zhu, Jingcheng Wu, Yizhen Wang, Hongkuan Zhou, Jiaoyan Chen, Evgeny Kharlamov, Steffen Staab:

Certainty in Uncertainty: Reasoning over Uncertain Knowledge Graphs with Statistical Guarantees. 8730-8752 - Shengxiang Gao, Jey Han Lau, Jianzhong Qi:

Beyond Seen Data: Improving KBQA Generalization Through Schema-Guided Logical Form Generation. 8753-8772 - Yan Li, Tianyi Zhang, Zechuan Li, Caren Han:

A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation. 8773-8793 - Yilun Liu, Minggui He, Feiyu Yao, Yuhe Ji, Shimin Tao, Jingzhou Du, Justin Li, Jian Gao, Zhang Li, Hao Yang, Boxing Chen, Osamu Yoshie:

Taming Text-to-Image Synthesis for Novices: User-centric Prompt Generation via Multi-turn Guidance. 8794-8811 - Dong Nguyen, Esther Ploeger:

We Need to Measure Data Diversity in NLP - Better and Broader. 8812-8821 - Lei Yu, Jingcheng Niu, Zining Zhu, Xi Chen, Gerald Penn:

Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity. 8822-8837 - Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares:

Hierarchical Bracketing Encodings Work for Dependency Graphs. 8838-8851 - Zhenqi Jia, Rui Liu, Berrak Sisman, Haizhou Li:

Multimodal Fine-grained Context Interaction Graph Modeling for Conversational Speech Synthesis. 8852-8858 - Mehdi Ali, Manuel Brack, Max Lübbering, Elias Wendt, Abbas Goher Khan, Richard Rutmann, Alex Jude, Maurice Kraus, Alexander Arno Weber, Felix Stollenwerk, David Kaczér, Florian Mai, Lucie Flek, Rafet Sifa, Nicolas Flores-Herr, Joachim Köhler, Patrick Schramowski, Michael Fromm, Kristian Kersting:

Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models. 8859-8898 - Hyukhun Koh, Minha Jhang, Dohyung Kim, Sangmook Lee, Kyomin Jung:

Conditional [MASK] Discrete Diffusion Language Model. 8899-8923 - Yogesh Kumar:

Language-Guided Temporal Token Pruning for Efficient VideoLLM Processing. 8924-8931 - Anda Cheng, Wei Huang, Yinggui Wang:

A Fully Probabilistic Perspective on Large Language Model Unlearning: Evaluation and Optimization. 8932-8943 - Xinyu Liu, Bei Li, Jiahao Liu, Junhao Ruan, Kechen Jiao, Hongyin Tang, Jingang Wang, Tong Xiao, JingBo Zhu:

IIET: Efficient Numerical Transformer via Implicit Iterative Euler Method. 8944-8958 - Tianqing Fang, Hongming Zhang, Zhisong Zhang, Kaixin Ma, Wenhao Yu, Haitao Mi, Dong Yu:

WebEvolver: Enhancing Web Agent Self-Improvement with Co-evolving World Model. 8959-8975 - Stephen Meisenbacher, Maulik Chevli, Florian Matthes:

Leveraging Semantic Triples for Private Document Generation with Local Differential Privacy Guarantees. 8976-8992 - Yiheng Jing, Mingming Zhang, Yong Zhuang, Jiacheng Guo, Juan Wang, Xiaoyang Xu, Wenzhe Yi, Keyan Guo, Hongxin Hu:

HVGuard: Utilizing Multimodal Large Language Models for Hateful Video Detection. 8993-9006 - Yijiong Yu, Wei Wang, Ran Chen, Ji Pei:

Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence. 9007-9014 - Wenxin Tang, Jingyu Xiao, Wenxuan Jiang, Xi Xiao, Yuhang Wang, Xuxin Tang, Qing Li, Yuehe Ma, Junliang Liu, Shisong Tang, Michael R. Lyu:

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design. 9015-9039 - Hongyao Tu, Liang Zhang, Yujie Lin, Xin Lin, Haibo Zhang, Long Zhang, Jinsong Su:

LLM-OREF: An Open Relation Extraction Framework Based on Large Language Models. 9040-9052 - Jian Li, Shenglin Yin, Yujia Zhang, Alan Zhao, Xi Chen, Xiaohui Zhou, Pengfei Xu:

Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization. 9053-9063 - Leonardo Ranaldi, Federico Ranaldi, Fabio Massimo Zanzotto, Barry Haddow, Alexandra Birch:

Improving Multilingual Retrieval-Augmented Language Models through Dialectic Reasoning Argumentations. 9064-9085 - Jiajun Chen, Yik-Cheung Tam:

Predicate-Guided Generation for Mathematical Reasoning. 9086-9099 - Raphael Gruber, Abdelrahman Abdallah, Michael Färber, Adam Jatowt:

ComplexTempQA: A 100m Dataset for Complex Temporal Question Answering. 9100-9112 - Qiuchen Wang, Ruixue Ding, Zehui Chen, Weiqi Wu, Shihang Wang, Pengjun Xie, Feng Zhao:

ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents. 9113-9134 - Muhammad Falensi Azmi, Muhammad Dehan Al Kautsar, Alfan Farizki Wicaksono, Fajri Koto:

IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages. 9135-9166 - Harsh Vishwakarma, Ankush Agarwal, Ojas Patil, Chaitanya Devaguptapu, Mahesh Chandran:

Can LLMs Help You at Work? A Sandbox for Evaluating LLM Agents in Enterprise Environments. 9167-9201 - Viacheslav Sinii, Alexey Gorbatovski, Artem Cherepanov, Boris Shaposhnikov, Nikita Balagansky, Daniil Gavrilov:

Steering LLM Reasoning Through Bias-Only Adaptation. 9202-9211 - Zuojin Tang, Bin Hu, Chenyang Zhao, De Ma, Gang Pan, Bin Liu:

VLASCD: A Visual Language Action Model for Simultaneous Chatting and Decision Making. 9212-9232 - Yew Ken Chia, Liying Cheng, Hou Pong Chan, Maojia Song, Chaoqun Liu, Mahani Aljunied, Soujanya Poria, Lidong Bing:

M-LongDoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework. 9233-9250 - Pu Jian, Junhong Wu, Wei Sun, Chen Wang, Shuo Ren, Jiajun Zhang:

Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models. 9251-9270 - Youquan Li, Miao Zheng, Fan Yang, Guosheng Dong, Bin Cui, Weipeng Chen, Zenan Zhou, Wentao Zhang:

FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback. 9271-9291 - Fabian Karl, Ansgar Scherp:

HYDRA: A Multi-Head Encoder-only Architecture for Hierarchical Text Classification. 9292-9303 - Pengyu Zeng, Jun Yin, Miao Zhang, Yuqin Dai, Jizhizi Li, ZhanXiang Jin, Shuai Lu:

CARD: Cross-modal Agent Framework for Generative and Editable Residential Design. 9304-9319 - Jusheng Zhang, Yijia Fan, Kaitong Cai, Zimeng Huang, Xiaofei Sun, Jian Wang, Chengpei Tang, Keze Wang:

DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off. 9320-9340 - Thibaut Thonet, Germán Kruszewski, Jos Rozen, Pierre Erbacher, Marc Dymetman:

FaST: Feature-aware Sampling and Tuning for Personalized Preference Alignment with Limited Data. 9341-9370 - Brian S. Lin, Jiaxin Yuan, Zihan Zhou, Shouli Wang, Shuo Wang, Cunliang Kong, Qi Shi, Yuxuan Li, Liner Yang, Zhiyuan Liu, Maosong Sun:

On LLM-Based Scientific Inductive Reasoning Beyond Equations. 9371-9394 - Xiaofu Chen, Israfel Salazar, Yova Kementchedjhieva:

SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation. 9395-9407 - Yuxuan Hu, Jihao Liu, Ke Wang, Jinliang Zheng, Weikang Shi, Manyuan Zhang, Qi Dou, Rui Liu, Aojun Zhou, Hongsheng Li:

LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding. 9408-9421 - Anmol Mekala, Anirudh Atmakuru, Yixiao Song, Marzena Karpinska, Mohit Iyyer:

Does quantization affect models' performance on long-context tasks? 9422-9470 - Tianbo Wang, Yuqing Ma, Kewei Liao, Chengzhao Yang, Zhange Zhang, Jiakai Wang, Xianglong Liu:

Token-Aware Editing of Internal Activations for Large Language Model Alignment. 9471-9509 - Dawid Jan Kopiczko, Tijmen Blankevoort, Yuki M. Asano:

Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs. 9510-9536 - Md. Mehrab Tanjim, Yeonjun In, Xiang Chen, Victor S. Bursztyn, Ryan A. Rossi, Sungchul Kim, Guang-Jie Ren, Vaishnavi Muppala, Shun Jiang, Yongsung Kim, Chanyoung Park:

Disambiguation in Conversational Question Answering in the Era of LLMs and Agents: A Survey. 9537-9550 - Xueguan Zhao, Wenpeng Lu, Chaoqun Zheng, Weiyu Zhang, Jiasheng Si, Deyu Zhou:

Plan Dynamically, Express Rhetorically: A Debate-Driven Rhetorical Framework for Argumentative Writing. 9551-9573 - Kechen Jiao, Zhirui Fang, Jiahao Liu, Bei Li, Qifan Wang, Xinyu Liu, Junhao Ruan, Zhongjian Qiao, Yifan Zhu, Yaxin Xu, Jingang Wang, Xiu Li:

TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making. 9574-9588 - Yifan Xia, Guorui Chen, Wenqian Yu, Zhijiang Li, Philip Torr, Jindong Gu:

Reimagining Safety Alignment with An Image. 9589-9603 - Siva Rajesh Kasa, Karan Gupta, Sumegh Roychowdhury, Ashutosh Kumar, Yaswanth Biruduraju, Santhosh Kumar Kasa, Nikhil Priyatam Pattisapu, Arindam Bhattacharya, Shailendra Agarwal, Vijay Huddar:

Generative or Discriminative? Revisiting Text Classification in the Era of Transformers. 9604-9626 - Ziqi Miao, Yi Ding, Lijun Li, Jing Shao:

Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection. 9627-9644 - Alessio Cocchieri, Luca Ragazzi, Giuseppe Tagliavini, Lorenzo Tordi, Antonella Carbonaro, Gianluca Moro:

Can Large Language Models Win the International Mathematical Games? 9645-9671 - Jian Yang, Jiaxi Yang, Wei Zhang, Ke Jin, Yibo Miao, Lei Zhang, Liqun Yang, Zeyu Cui, Yichang Zhang, Zhoujun Li, Binyuan Hui, Junyang Lin:

CodeArena: Evaluating and Aligning CodeLLMs on Human Preference. 9672-9683 - Yuekun Yao, Yupei Du, Dawei Zhu, Michael Hahn, Alexander Koller:

Language models can learn implicit multi-hop reasoning, but only if they have lots of training data. 9684-9702 - Joseph Marvin Imperial, Abdullah Barayan, Regina Stodden, Rodrigo Wilkens, Ricardo Muñoz Sánchez, Lingyun Gao, Melissa Torgbi, Dawn Knight, Gail Forey, Reka R. Jablonkai, Ekaterina Kochmar, Robert Reynolds, Eugénio Ribeiro, Horacio Saggion, Elena Volodina, Sowmya Vajjala, Thomas François, Fernando Alva-Manchego, Harish Tayyar Madabushi:

UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment. 9703-9755 - Jiawei Guo, Feifei Zhai, Pu Jian, Qianrun Wei, Yu Zhou:

CROP: Contextual Region-Oriented Visual Token Pruning. 9756-9772 - Andrew Piper, Robert Budac:

CR4-NarrEmote: An Open Vocabulary Dataset of Narrative Emotions Derived Using Citizen Science. 9773-9784 - Haoqi Yang, Yao Yao, Zuchao Li, Baoyuan Qi, Liu Guoming, Hai Zhao:

XQuant: Achieving Ultra-Low Bit KV Cache Quantization with Cross-Layer Compression. 9785-9800 - Yueyang Cang, Yuhang Liu, Xiaoteng Zhang, Erlu Zhao, Li Shi:

DINT Transformer. 9801-9809 - Zhiyu Cao, Peifeng Li, Qiaoming Zhu:

ICR: Iterative Clarification and Rewriting for Conversational Search. 9810-9824 - Tong Zhang, Kuofeng Gao, Jiawang Bai, Leo Yu Zhang, Xin Yin, Zonghui Wang, Shouling Ji, Wenzhi Chen:

Pre-training CLIP against Data Poisoning with Optimal Transport-based Matching and Alignment. 9825-9838 - Weicong Qin, Yi Xu, Weijie Yu, Teng Shi, Chenglei Shen, Ming He, Jianping Fan, Xiao Zhang, Jun Xu:

Similarity = Value? Consultation Value-Assessment and Alignment for Personalized Search. 9839-9852 - Zhaoyan Gong, Juan Li, Zhiqiang Liu, Lei Liang, Huajun Chen, Wen Zhang:

RTQA : Recursive Thinking for Complex Temporal Knowledge Graph Question Answering with Large Language Models. 9853-9870 - Yao Wang, Di Liang, Minlong Peng:

Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance. 9871-9885 - Xiaonan Wang, Bo Shao, Hansaem Kim:

AI Knows Where You Are: Exposure, Bias, and Inference in Multimodal Geolocation with KoreaGEO. 9886-9903 - Kairong Han, Wenshuo Zhao, Ziyu Zhao, Ye Jun Jian, Lujia Pan, Kun Kuang:

CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language Models. 9904-9921 - Zhaoheng Huang, Yutao Zhu, Ji-Rong Wen, Zhicheng Dou:

Enhancing LLM Text Detection with Retrieved Contexts and Logits Distribution Consistency. 9922-9934 - Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, Yonatan Belinkov:

Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps. 9935-9960 - Zichen Wen, Yifeng Gao, Shaobo Wang, Junyuan Zhang, Qintong Zhang, Weijia Li, Conghui He, Linfeng Zhang:

Stop Looking for "Important Tokens" in Multimodal Language Models: Duplication Matters More. 9961-9980 - Yuchen Deng, Shichen Fan, Naibo Wang, Xinkui Zhao, See-Kiong Ng:

AgentPro: Enhancing LLM Agents with Automated Process Supervision. 9981-10006 - Lorenzo Molfetta, Giacomo Frisoni, Nicolò Monaldini, Gianluca Moro:

PORTS: Preference-Optimized Retrievers for Tool Selection with Large Language Models. 10007-10030 - Xin Song, Liu Haiyan, Haiyang Wang, Ye Wang, Kai Chen, Bin Zhou:

MusKGC: A Flexible Multi-source Knowledge Enhancement Framework for Open-World Knowledge Graph Completion. 10031-10049 - Kai Tang, Rui Wang, Renyu Zhu, Minmin Lin, Xiao Ding, Tangjie Lv, Changjie Fan, Runze Wu, Haobo Wang:

Towards Transferable Personality Representation Learning based on Triplet Comparisons and Its Applications. 10050-10066 - Hao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari:

Reshaping Representation Space to Balance the Safety and Over-rejection in Large Audio Language Models. 10067-10079 - Simin Chen, Yiming Chen, Zexin Li, Yifan Jiang, Zhongwei Wan, Yixin He, Dezhi Ran, Tianle Gu, Haizhou Li, Tao Xie, Baishakhi Ray:

Benchmarking Large Language Models Under Data Contamination: A Survey from Static to Dynamic Evaluation. 10080-10098 - Tiansheng Hu, Tongyan Hu, Liuyang Bai, Yilun Zhao, Arman Cohan, Chen Zhao:

FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain. 10099-10128 - Yangqin Jiang, Xubin Ren, Lianghao Xia, Da Luo, Kangyi Lin, Chao Huang:

RecGPT: A Foundation Model for Sequential Recommendation. 10129-10143 - Chih-Kai Yang, Neo S. Ho, Hung-yi Lee:

Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey. 10144-10170 - Nikita Balagansky, Yaroslav Aksenov, Daniil Laptev, Vadim Kurochkin, Gleb Gerasimov, Nikita Koriagin, Daniil Gavrilov:

Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy. 10171-10179 - Taiming Lu, Philipp Koehn:

Learn and Unlearn: Addressing Misinformation in Multilingual LLMs. 10180-10195 - Dulhan Jayalath, James Bradley Wendt, Nicholas Monath, Sandeep Tata, Beliz Gunel:

PRISM: Efficient Long-Range Reasoning With Short-Context LLMs. 10196-10218 - Yichen Tang, Weihang Su, Yujia Zhou, Yiqun Liu, Min Zhang, Shaoping Ma, Qingyao Ai:

Augmenting Multi-Agent Communication with State Delta Trajectory. 10219-10240 - Dana Arad, Aaron Mueller, Yonatan Belinkov:

SAEs Are Good for Steering - If You Select the Right Features. 10241-10259 - Kyohoon Jin, Juhwan Choi, Jungmin Yun, Junho Lee, Soojin Jang, YoungBin Kim:

CoBA: Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples. 10260-10278 - Milad Alshomary, Nikhil Reddy Varimalla, Vishal Anand, Smaranda Muresan, Kathleen McKeown:

Layered Insights: Generalizable Analysis of Human Authorial Style by Leveraging All Transformer Layers. 10279-10292 - Yingming Zheng, Hanqi Li, Kai Yu, Lu Chen:

When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models. 10293-10308 - Hellina Hailu Nigatu, Atnafu Lambebo Tonja, Henok Biadglign Ademtew, Hizkiel Mitiku Alemayehu, Negasi Haile Abadi, Tadesse Destaw Belay, Seid Muhie Yimam:

A Case Against Implicit Standards: Homophone Normalization in Machine Translation for Languages that use the Ge'ez Script. 10309-10320 - Syeda Jannatus Saba, Steven Skiena:

Evaluating Language Translation Models by Playing Telephone. 10321-10336 - Shuo Yang, Zheyu Zhang, Bardh Prenkaj, Gjergji Kasneci:

Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs. 10337-10358 - Lars Benedikt Kaesberg, Jan Philip Wahle, Terry Ruas, Bela Gipp:

SPaRC: A Spatial Pathfinding Reasoning Challenge. 10359-10390 - Yao-Ching Yu, Tsun-Han Chiang, Cheng-Wei Tsai, Chien-Ming Huang, Wen-Kwang Tsao:

Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training. 10391-10413 - Yuhang Chen, Zhen Tan, Ajay Kumar Jaiswal, Huaizhi Qu, Xinyu Zhao, Qi Lin, Yu Cheng, Andrew Kwong, Zhichao Cao, Tianlong Chen:

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework. 10414-10424 - Wei Jie Yeo, Ranjan Satapathy, Erik Cambria:

Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models. 10425-10447 - Reza Khanmohammadi, Erfan Miahi, Mehrsa Mardikoraem, Simerjot Kaur, Ivan Brugere, Charese Smiley, Kundan Thind, Mohammad M. Ghassemi:

Calibrating LLM Confidence by Probing Perturbed Representation Stability. 10448-10514 - Yuanzhe Shen, Yide Liu, Zisu Huang, Ruicheng Yin, Xiaoqing Zheng, Xuanjing Huang:

SATER: A Self-Aware and Token-Efficient Approach to Routing and Cascading. 10515-10529 - Rui Ha, Chaozhuo Li, Rui Pu, Litian Zhang, Xi Zhang, Sen Su:

DSG-MCTS: A Dynamic Strategy-Guided Monte Carlo Tree Search for Diversified Reasoning in Large Language Models. 10530-10544 - Juntae Lee, Jihwan Bang, Seunghan Yang, Simyung Chang:

CIFLEX: Contextual Instruction Flow for Sub-task Execution in Multi-Turn Interactions with a Single On-Device LLM. 10545-10559 - Zhuo Liu, Ding Yu, Hangfeng He:

On the Role of Model Prior in Real-World Inductive Reasoning. 10560-10583 - Hellina Hailu Nigatu, Nikita Mehandru, Negasi Haile Abadi, Blen Gebremeskel, Ahmed Alaa, Monojit Choudhury:

Viability of Machine Translation for Healthcare in Low-Resourced Languages. 10584-10598 - Yilun Qiu, Tianhao Shi, Xiaoyan Zhao, Fengbin Zhu, Yang Zhang, Fuli Feng:

Latent Inter-User Difference Modeling for LLM Personalization. 10599-10617 - Kangyu Qiao, Shaolei Zhang, Yang Feng:

IG-Pruning: Input-Guided Block Pruning for Large Language Models. 10618-10629 - Momoka Furuhashi, Kouta Nakayama, Takashi Kodama, Saku Sugawara:

Are Checklists Really Useful for Automatic Evaluation of Generative Tasks? 10630-10653 - Kirill Semenov, Rico Sennrich:

Measuring the Effect of Disfluency in Multilingual Knowledge Probing Benchmarks. 10654-10672 - Changyue Wang, Weihang Su, Qingyao Ai, Yichen Tang, Yiqun Liu:

Knowledge Editing through Chain-of-Thought. 10673-10693 - Qian Dong, Jia Chen, Qingyao Ai, Hongning Wang, Haitao Li, Yi Wu, Yao Hu, Yiqun Liu, Shaoping Ma:

SelfRACG: Enabling LLMs to Self-Express and Retrieve for Code Generation. 10694-10705 - Yufei Wang, Adriana Kovashka:

Probing Logical Reasoning of MLLMs in Scientific Diagrams. 10706-10718 - Huishuai Zhang, Bohan Wang, Luoxin Chen:

AdamS: Momentum Itself Can Be A Normalizer for LLM Pretraining and Post-training. 10719-10738 - Feiyang Kang, Newsha Ardalani, Michael Kuchnik, Youssef Emad, Mostafa Elhoushi, Shubhabrata Sengupta, Shang-wen Li, Ramya Raghavendra, Ruoxi Jia, Carole-Jean Wu:

Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls. 10739-10758 - Yumeng Shi, Quanyu Long, Wenya Wang:

Static or Dynamic: Towards Query-Adaptive Token Selection for Video Question Answering. 10759-10771 - Zonghai Yao, Michael Sun, Won Seok Jang, Sunjae Kwon, Soie Kwon, Hong Yu:

DischargeSim: A Simulation Benchmark for Educational Doctor-Patient Communication at Discharge. 10772-10798 - Monjoy Narayan Choudhury, Junling Wang, Yifan Hou, Mrinmaya Sachan:

Can Vision-Language Models Solve Visual Math Equations? 10799-10808 - Benlu Wang, Iris Xia, Yifan Zhang, Junda Wang, Feiyun Ouyang, Shuo Han, Arman Cohan, Hong Yu, Zonghai Yao:

From Scores to Steps: Diagnosing and Improving LLM Performance in Evidence-Based Medical Calculations. 10809-10833 - Yi Sui, Chaozhuo Li, Chen Zhang, Dawei Song, Qiuchi Li:

Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge. 10834-10858 - Ziliang Qiu, Renfen Hu:

Deep Associations, High Creativity: A Simple yet Effective Metric for Evaluating Large Language Models. 10859-10872 - Advit Deepak, Megan Mou, Jing Huang, Diyi Yang:

Identifying Unlearned Data in LLMs via Membership Inference Attacks. 10873-10892 - Zihao Li, Xu Wang, Yuzhe Yang, Ziyu Yao, Haoyi Xiong, Mengnan Du:

Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models. 10893-10913 - KV Aditya Srivatsa, Kaushal Kumar Maurya, Ekaterina Kochmar:

LLMs cannot spot math errors, even when allowed to peek into the solution. 10914-10928 - Haoyu Huang, Chong Chen, Zeang Sheng, Yang Li, Wentao Zhang:

Can LLMs be Good Graph Judge for Knowledge Graph Construction? 10929-10948 - Zhi Zhang, Yixian Shen, Congfeng Cao, Ekaterina Shutova:

NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning. 10949-10966 - Abdellah El Mekki, Houdaifa Atou, Omer Nacar, Shady Shehata, Muhammad Abdul-Mageed:

NileChat: Towards Linguistically Diverse and Culturally Aware LLMs for Local Communities. 10967-10991 - Yuan Gao, Weiwei Sun:

A Computational Simulation of Language Production in First Language Acquisition. 10992-11006 - Danna Zheng, Mirella Lapata, Jeff Z. Pan:

Long-Form Information Alignment Evaluation Beyond Atomic Facts. 11007-11027 - AbdelRahim A. Elmadany, Sang Yun Kwon, Hawau Olamide Toyin, Alcides Alcoba Inciarte, Hanan Aldarmaki, Muhammad Abdul-Mageed:

Voice of a Continent: Mapping Africa's Speech Technology Frontier. 11028-11050 - Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma:

Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains. 11051-11079 - Bo Chen, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Jiahao Zhang:

Circuit Complexity Bounds for RoPE-based Transformer Architecture. 11080-11097 - Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma:

Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments. 11098-11126 - Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang:

Towards Infinite-Long Prefix in Transformer. 11127-11191 - Zixian Ma, Jianguo Zhang, Zhiwei Liu, Jieyu Zhang, Juntao Tan, Manli Shu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Caiming Xiong, Ranjay Krishna, Silvio Savarese:

LATTE: Learning to Think with Vision Specialists. 11192-11229 - Xianren Zhang, Hui Liu, Delvin Ce Zhang, Xianfeng Tang, Qi He, Dongwon Lee, Suhang Wang:

SUA: Stealthy Multimodal Large Language Model Unlearning Attack. 11230-11243 - Hongbo Liu, Jia Xu:

ResFormer: All-Time Reservoir Memory for Long Sequence Classification. 11244-11256 - Zeping Yu, Yonatan Belinkov, Sophia Ananiadou:

Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models. 11257-11272 - Enora Rice, Katharina von der Wense, Alexis Palmer:

Interdisciplinary Research in Conversation: A Case Study in Computational Morphology for Language Documentation. 11273-11285 - Huanxin Sheng, Xinyi Liu, Hangfeng He, Jieyu Zhao, Jian Kang:

Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction. 11286-11328 - Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang:

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time. 11329-11354 - Miao Zhou, Lina Yang, Thomas Wu, Dongnan Yang, Xinru Zhang:

Dual-Path Dynamic Fusion with Learnable Query for Multimodal Sentiment Analysis. 11355-11365 - Yunzhi Yao, Jizhan Fang, Jia-Chen Gu, Ningyu Zhang, Shumin Deng, Huajun Chen, Nanyun Peng:

CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners. 11366-11382 - Yuheng Wu, Jianwen Xie, Denghui Zhang, Zhaozhuo Xu:

DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic. 11383-11397 - Yangyifan Xu, Shuo Ren, Jiajun Zhang:

Collaborative Beam Search: Enhancing LLM Reasoning via Collective Consensus. 11398-11410 - Keane Ong, Rui Mao, Deeksha Varshney, Paul Pu Liang, Erik Cambria, Gianmarco Mengaldo:

Deriving Strategic Market Insights with Large Language Models: A Benchmark for Forward Counterfactual Generation. 11411-11434 - Zhuohang Li, Chao Yan, Nicholas J. Jackson, Wendi Cui, Bo Li, Jiaxin Zhang, Bradley A. Malin:

Towards Statistical Factuality Guarantee for Large Vision-Language Models. 11435-11456 - Guangzhi Sun, Potsawee Manakul, Xiao Zhan, Mark J. F. Gales:

Unlearning vs. Obfuscation: Are We Truly Removing Knowledge? 11457-11467 - Bolian Li, Yanran Wu, Xinyu Luo, Ruqi Zhang:

Reward-Shifted Speculative Sampling Is An Efficient Test-Time Weak-to-Strong Aligner. 11468-11478 - Ruiyu Xiao, Lei Wu, Yuanxing Liu, Weinan Zhang, Ting Liu:

Stimulate the Critical Thinking of LLMs via Debiasing Discussion. 11479-11492 - Xintong Li, Jalend Bantupalli, Ria Dharmani, Yuwei Zhang, Jingbo Shang:

Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning. 11493-11506 - Ozan Irsoy, Pengxiang Cheng, Jennifer L. Chen, Daniel Preotiuc-Pietro, Shiyue Zhang, Duccio Pappadopulo:

Improving Instruct Models for Free: A Study on Partial Adaptation. 11507-11521 - Xintong Li, Junda Wu, Tong Yu, Rui Wang, Yu Wang, Xiang Chen, Jiuxiang Gu, Lina Yao, Julian J. McAuley, Jingbo Shang:

CoMMIT: Coordinated Multimodal Instruction Tuning. 11522-11536 - Tianhao Wu, Weizhe Yuan, Olga Golovneva, Jing Xu, Yuandong Tian, Jiantao Jiao, Jason E. Weston, Sainbayar Sukhbaatar:

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge. 11537-11554 - Song Wang, Zhen Tan, Zihan Chen, Shuang Zhou, Tianlong Chen, Jundong Li:

AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction. 11555-11567 

