33rd MM 2025: Dublin, Ireland
- Cathal Gurrin, Klaus Schoeffmann, Min Zhang, Luca Rossetto, Stevan Rudinac, Duc-Tien Dang-Nguyen, Wen-Huang Cheng, Phoebe Chen, Jenny Benois-Pineau:
  Proceedings of the 33rd ACM International Conference on Multimedia, MM 2025, Dublin, Ireland, October 27-31, 2025. ACM 2025, ISBN 979-8-4007-2035-2
Keynote Talks
- Shalini De Mello:
  AI-Mediated Human Interaction. 1
- Tat-Seng Chua:
  Next Phase of Research on Multimodal Foundation Models: From Alignments to Content Generation and Quality Assessment. 2
- Steve Hodges:
  SenseCam and Isotyping: The Challenges and Benefits of Working with New Hardware. 3-4
Content: Media Interpretation
- Haolun Li, Weihuang Liu, Jiateng Liu, Zhenhua Tang, Chi-Man Pun, Qiguang Miao, Feng Xu, Hao Gao:
  MotionRefineNet: Fine-Grained Pose Sequence Smoothing and Refinement. 5-14
- Mo Yang, Luo Chen, Jiali Zhou:
  Change-UP: Advancing Visualization and Inference Capability for Multi-level Remote Sensing Change Interpretation. 15-24
- Yuxiang Zhao, Wei Huang, Haipeng Zeng, Huan Zhao, Yujie Song:
  Cross Time Domain Intention Interaction for Conditional Trajectory Prediction. 25-33
- Ye-Chan Kim, SeungJu Cha, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim:
  SIDA: Synthetic Image Driven Zero-shot Domain Adaptation. 34-42
- Han Hu, Wenli Du, Bing Wang:
  Efficient Video Anomaly Detection via Scene-Dependent Memory Assisted Inter-Frame RGB Difference Reconstruction. 43-51
- Hyungjun Doh, Dong In Lee, Seunggeun Chi, Pin-Hao Huang, Kwonjoon Lee, Sangpil Kim, Karthik Ramani:
  Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction. 52-61
- Guoyi Li, Die Hu, Haozhe Li, Qirui Tang, Xiaomeng Fu, Yulei Wu, Xiaodan Zhang, Honglei Lyu:
  Zero-Shot Multimodal Fact-Checking with Conceptual Reasoning. 62-71
- Junyu Zhou, Yuyang Huang, Wenrui Dai, Junni Zou, Ziyang Zheng, Nuowen Kan, Chenglin Li, Hongkai Xiong:
  3DGabSplat: 3D Gabor Splatting for Frequency-adaptive Radiance Field Rendering. 72-81
- Songze Li, Yunfei Guo, Shen Chen, Bin Li, Kaiqing Lin, Changsheng Chen, Haodong Li, Taiping Yao, Shouhong Ding:
  DITL2: Dual-Stage Invariance Transfer Learning for Generalizable Document Image Tampering Localization. 82-91
- Rouqi Zhang, Chengdi Lu, Hancheng Lu, Yang Cao, Tiesong Zhao:
  RobustVisH: Robust Visual-Haptic Cross-Modal Recognition under Transmission Interference. 92-100
- Zhangchi Hu, Peixi Wu, Jie Chen, Huyue Zhu, Yijun Wang, Yansong Peng, Hebei Li, Xiaoyan Sun:
  Dome-DETR: DETR with Density-Oriented Feature-Query Manipulation for Efficient Tiny Object Detection. 101-110
- Xiaojian Lin, Wenxin Zhang, Yuchu Jiang, Wangyu Wu, Yiran Guo, Kangxu Wang, Zongzheng Zhang, Guijin Wang, Lei Jin, Hao Zhao:
  Butter: Frequency Consistency and Hierarchical Fusion for Autonomous Driving Object Detection. 111-120
- Xinkui Lin, Yongxiu Xu, Minghao Tang, Shilong Zhang, Hongbo Xu, Hao Xu, Yubin Wang:
  REMOTE: A Unified Multimodal Relation Extraction Framework with Multilevel Optimal Transport and Mixture-of-Experts. 121-130
- Xiaoran Xu, Jiangang Yang, Wenyue Chong, Wenhui Shi, Shichu Sun, Jing Xing, Jian Liu:
  Boosting Single-Domain Generalized Object Detection via Vision-Language Knowledge Interaction. 131-140
- Shaohua Liu, Ning Gao, Zuoya Gu, Hongkun Dou, Yue Deng, Hongjue Li:
  Spatiotemporal Degradation-Aware 3D Gaussian Splatting for Realistic Underwater Scene Reconstruction. 141-150
- Tianyi Ma, Maoying Qiao:
  EBaR: Efficient Buffer and Resetting for Single-Sample Continual Test-Time Adaptation. 151-160
- Wenzhe He, Xiaojun Chen, Wentang Chen, Hongyu Wang, Ying Liu, Ruihui Li:
  RWKV-PCSSC: Exploring RWKV Model for Point Cloud Semantic Scene Completion. 161-170
- Ruian He, Zixian Zhang, Ri Cheng, Weimin Tan, Bo Yan:
  Efficient Trajectory Space-Time Super-Resolution for Fast Live-cell Imaging. 171-179
- Hongzhao Li, Hualei Wan, Liangzhi Zhang, Mingyuan Jiu, Shupan Li, Mingliang Xu, Muhammad Haris Khan:
  Towards Robust Multimodal Domain Generalization via Modality-Domain Joint Adversarial Training. 180-188
- Hongda Qin, Xiao Lu, Zhiyong Wei, Ningjiang Chen:
  Object-Preserving Counterfactual Diffusion Augmentation for Single-Domain Generalized Object Detection. 189-198
- Yidong Chen, Qi Li, Yuyang Yang, Wen Li, Sheng Ao, Cheng Wang:
  Unleashing the Power of Data Generation in One-Pass Outdoor LiDAR Localization. 199-208
- Wenli Zheng, Huiyuan Fu, Xicong Wang, Hao Kang, Chuanming Wang, Jin Liu, Zekai Xu, Heng Zhang, Huadong Ma:
  EvRAW: Event-guided Structural and Color Modeling for RAW-to-sRGB Image Reconstruction. 209-218
- Zhaoxi Mu, Rilin Chen, Andong Li, Meng Yu, Xinyu Yang, Dong Yu:
  From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models. 219-228
- Jin Han, Yixin Yang, Zhan Zhan, Boxin Shi, Imari Sato:
  EDeF-Net: Spatio-temporal Association Network for Flicker Removal in Event Streams. 229-237
- Jinxiang Lai, Wenlong Wu, Jiawei Zhan, Jian Li, Bin-Bin Gao, Jun Liu, Jie Zhang, Song Guo:
  BoxSeg: Quality-Aware and Peer-Assisted Learning for Box-supervised Instance Segmentation. 238-246
- Jiaxu Li, Rui Li, Jianyu Qi, Songning Lai, Linpu Lv, Kejia Fan, Jianheng Tang, Yutao Yue, Dongzhan Zhou, Yunhuai Liu, Huiping Zhuang:
  CFSSeg: Closed-Form Solution for Class-Incremental Semantic Segmentation of 2D Images and 3D Point Clouds. 247-256
- Trong-Thang Pham, Anh Nguyen, Zhigang Deng, Carol C. Wu, Hien Nguyen, Ngan Le:
  Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis. 257-266
- Mingliang Zhai, Yiheng Wang, Haidong Hu, Chi-Man Pun, Hao Gao:
  FGRFlow: Learning Fine-Grained Rigidity Scene Flow from 4D Radar Point Cloud. 267-276
- Xiaoyu Zhang, Zhifeng Bao, Hai Dong, Ziwei Wang, Jiajun Liu:
  Querying Autonomous Vehicle Point Clouds: Enhanced by 3D Object Counting with CounterNet. 277-285
- Guiping Cao, Xiangyuan Lan, Wenjian Huang, Jianguo Zhang, Dongmei Jiang, Yaowei Wang:
  DS-Det: Single-Query Paradigm and Attention Disentangled Learning for Flexible Object Detection. 286-295
- Zhen Wang, Dongyuan Li, Yaozu Wu, Peide Zhu, Shiyin Tan, Renhe Jiang:
  Video-based Transparent Object Segmentation via Temporal Feature Aggregation. 296-304
- Haosheng Cai, Yang Xue:
  G2LFormer: Global-to-Local Query Enhancement for Robust Table Structure Recognition. 305-314
- Xinyi Hu, Yuran Wang, Ruixu Zhang, Yue Li, Wenxuan Liu, Zheng Wang:
  SPAN: Continuous Modeling of Suspicion Progression for Temporal Intention Localization. 315-323
- Tianyi Zhang, Qinglong Lin, Yang Hu, Pengming Feng, Rubo Zhang:
  Edge-aware Affinity Enhancement for Image Manipulation Localization. 324-332
- Kanglin Qu, Pan Gao, Qun Dai, Yuanhao Sun:
  HydraMamba: Multi-Head State Space Model for Global Point Cloud Learning. 333-342
- Runmin Cong, Zongji Yu, Hao Fang, Haoyan Sun, Sam Kwong:
  UIS-Mamba: Exploring Mamba for Underwater Instance Segmentation via Dynamic Tree Scan and Hidden State Weaken. 343-352
- Kuo Shi, Jie Lu, Shanshan Ye, Guangquan Zhang, Zhen Fang:
  MiraGe: Multimodal Discriminative Representation Learning for Generalizable AI-Generated Image Detection. 353-361
- Runtian Yuan, Mohan Chen, Jilan Xu, Ling Zhou, Qingqiu Li, Yuejie Zhang, Rui Feng, Tao Zhang, Shang Gao:
  Text-Promptable Propagation for Referring Medical Image Sequence Segmentation. 362-371
- Dunwei Tu, Huiyu Yi, Yuchi Wang, Baile Xu, Jian Zhao, Furao Shen:
  Multiple Queries with Multiple Keys: A Precise Prompt Matching Paradigm for Prompt-based Continual Learning. 372-381
- Zihou Zhang, Hao Li, Zhengwei Yang, Zechao Hu, Liang Li, Zheng Wang:
  From Language to Instance: Generative Visual Prompting for Zero-shot Camouflaged Object Detection. 382-391
- Chen Cai, Tianyi Liu, Jianjun Gao, Wenyang Liu, Kejun Wu, Ruoyu Wang, Yi Wang, Soo Chin Liew:
  From Semantics, Scene to Instance-awareness: Distilling Foundation Model for Open-vocabulary Grounded Situation Recognition. 392-401
- Hanyu Guo, Suzhou Que, Junlong Gao, Hanzi Wang:
  TFPA: Text Features Guided Dynamic Parameter Adjustment for Few Shot Action Recognition. 402-411
- Jitong Liao, Yulu Gao, Shaofei Huang, Jialin Gao, Jie Lei, Ronghua Liang, Si Liu:
  DOMR: Establishing Cross-View Segmentation via Dense Object Matching. 412-421
- Yue Guo, Haoxiang Liao, Haibin Ling, Bingyao Huang:
  NeuroPump: Simultaneous Geometric and Color Rectification for Underwater Images. 422-431
- Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Lei Liang, Wen Zhang, Huajun Chen:
  Client-Server Co-design with Multi-modal Codebooks Makes Better and Faster Federate Knowledge Sharing. 432-440
- Bo Wang, Jin Liu, Huiyuan Fu, Xin Wang, Heng Zhang, Huadong Ma:
  Severe Light, Textureless Sight: A Benchmark for Extreme Exposure Correction. 441-449
- Zhicheng Lian, Lizhi Wang, Hua Huang:
  APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech. 450-459
- Zhaoyu Chen, Qian Huang, Xing Li, Yunfei Zhang, Shihao Han, Ge Gao, Yirui Wu, Xin Li, Ziyang Yin:
  Geo-CF2Net: Geometry-Prior Cross-Frequency Interactive Fusion Network for 3D Human Action Recognition. 460-469
- Naisong Luo, Yuan Wang, Yuwen Pan, Rui Sun:
  Focus on the Object: Gradient-based Feature Modulation for Camouflaged Object Segmentation. 470-478
- Liuyi Li, Feng Shi, Jian Wang, Jinjing Zhu, Wenze Shao:
  An Event-tailored State-Space Based Model for Pedestrian Detection. 479-488
- Zhihong Zheng, Yang Cao, Junlong Gao, Hanzi Wang:
  OV-VOD: Open-Vocabulary Video Object Detection. 489-498
- Yin Wang, Zixuan Wang, Hao Lu, Zhen Qin, Hailiang Zhao, Guanjie Cheng, Xin Du, Ge Su, Li Kuang, Mengchu Zhou, Shuiguang Deng:
  SeMi: When Imbalanced Semi-Supervised Learning Meets Mining Hard Examples. 499-507
- Kuiye Ding, Fanda Fan, Yao Wang, Ruijie Jian, Xiaorui Wang, Luqi Gong, Yishan Jiang, Chunjie Luo, Jianfeng Zhan:
  DualSG: A Dual-Stream Explicit Semantic-Guided Multivariate Time Series Forecasting Framework. 508-517
- Quanmin Liang, Jinyi Lu, Qiang Li, Shuai Liu, Zhihao Zhao, Yinzheng Zhao, Wei Zhang, Kai Huang, Yonghong Tian:
  ESOD: Event-Based Small Object Detection. 518-527
- Michael Kohl, Tobias Wursthorn, Christof Weiß:
  Cross-Modal Metrics for Capturing Correspondences Between Music Audio and Stage Lighting Signals. 528-534
- Yingbing Liu, Fei Ma, Yanan Wu, Xinxin Zuo, Fan Zhang, Yang Wang:
  Collaborative Cloud-edge Generalized Category Discovery. 535-543
- Ping Li, Chenhao Ping, Wenxiao Wang, Mingli Song:
  Sample-level Adaptive Knowledge Distillation for Action Recognition. 544-552
- Jiale Yu, Baopeng Zhang, Zhu Teng, Jianping Fan:
  OV-DAVEL: Towards Open-Vocabulary Dense Audio-Visual Event Localization in Untrimmed Videos. 553-562
- Jie Fu, Bingkun Bao:
  Retaining Temporal Semantics and Relation Topologies for Continual Weakly-Supervised Audio-Visual Video Parsing. 563-572
- Xiaofeng Liu, Guanchen Meng, Chongyang Feng, Risheng Liu, Zhongxuan Luo, Xin Fan:
  TNT-GS: Truncated and Tailored Gaussian Splatting. 573-581
- Pengfei Cai, Yan Song, Qing Gu, Nan Jiang, Haoyu Song, Ian McLoughlin:
  Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries. 582-591
- Zhaolin Cai, Fan Li, Ziwei Zheng, Yanjun Qin:
  HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs. 592-601
- Guanchun Wang, Xiangrong Zhang, Yifei Zhang, Zelin Peng, Tianyang Zhang, Xu Tang, Licheng Jiao:
  ACMamba: Fast Unsupervised Anomaly Detection via An Asymmetrical Consensus State Space Model. 602-611
- Jian Zhou, Yingjie Xie, Cunhang Fan, Huabin Wang, Zhao Lv, Liang Tao:
  DHGCN: Dual HyperGraph Convolutional Network for EEG-Based Auditory Attention Detection. 612-620
- Peiqi Jiang, Bohan Lei, Yuhao Sun, Lingyun Yu, Zhineng Chen, Hongtao Xie, Yongdong Zhang:
  Proactive Deepfake Detection via Self-Verifiable Semantic Watermarking. 621-630
- Yuzhen Li, Yuehui Han, Jianjun Qian, Jian Yang:
  Self-Supervised Vision Graph Neural Networks Based on Contrastive Learning. 631-640
- Luosheng Xu, Dalin Zhang, Zhaohui Song:
  Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection. 641-649
- Chenglong Sun, Shijie Pang, Yuzheng Wang, Lizhe Qi:
  RWKV3D: An RWKV-Based Model with Multiple Training Strategies for Point Cloud Analysis. 650-659
- Jinghan Liu, Xingmei Wang, Jiaxiang Meng:
  Adaspeaker: Learning Discriminative Speaker Representations with Gradient-Aware Adaptive Scaling. 660-668
- Wenpeng Lang, Saihui Hou, Yongzhen Huang:
  Beyond Sparse Keypoints: Dense Pose Modeling for Robust Gait Recognition. 669-678
- Jinwen Wang, Youfang Lin, Xiaobo Hu, Siyu Yang, Sheng Han, Shuo Wang, Kai Lv:
  From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training. 679-688
- Yaoxun Xu, Hangting Chen, Jianwei Yu, Wei Tan, Shun Lei, Zhiwei Lin, Rongzhi Gu, Zhiyong Wu:
  MuCodec: Ultra Low-Bitrate Music Codec for Music Generation. 689-698
- Chi Huang, Qi Zhang, Qian Zhang, Nan Li, Yipu Gong, Xiaowei Wang, Wei Feng:
  TriGS: Tri-consistency 3D Gaussian Splatting from Sparse and Unposed Views. 699-708
- Xuedong He, Huiying Xu, Xinzhong Zhu, Hongbo Li:
  High-Performance Discriminative Tracking with Spatio-Temporal Template Fusion. 709-718
- Jingdong Zhang, Hanrong Ye, Xin Li, Wenping Wang, Dan Xu:
  Multi-Task Label Discovery via Hierarchical Task Tokens for Partially Annotated Dense Predictions. 719-728
- Jiaxi Wang, Yaosen Min, Xun Zhu, Miao Li, Ji Wu:
  MIPS: A Multimodal Infinite Polymer Sequence Pre-training Framework for Polymer Property Prediction. 729-738
- Yuxuan Zhang, Bo Wang, Yu Du, Yangfu Zhu, Haorui Wang, Guangyao Su, Tao Zhou, Bin Wu:
  Cause and Effect: Video Social Relationship Recognition from Causal Perspective. 739-747
- Mashiro Toyooka, Kiyoharu Aizawa, Yoko Yamakata:
  A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task. 748-756
- Guitao Xu, Ziqi Yi, Peirong Zhang, Jiahuan Cao, Shihang Wu, Lianwen Jin:
  From Pixels to Semantics: A Novel MLLM-Driven Approach for Explainable Tampered Text Detection. 757-766
- Yifan Wang, Yuntai Ding, Yiyang Gu, Ziyue Qiao, Chong Chen, Xian-Sheng Hua, Ming Zhang, Wei Ju:
  Deep Graph Clustering with Disentangled Representation Learning. 767-776
- Han Li, Shaofei Huang, Longfei Xu, Yulu Gao, Beipeng Mu, Si Liu:
  RATopo: Improving Lane Topology Reasoning via Redundancy Assignment. 777-786
- Sensen Wang, Yuehu Liu, Chi Zhang:
  BiOMamba: Mamba-based Forward-Then-Backward Temporal Modeling for Online Action Detection and Anticipation. 787-795
- Xiangyu Zheng, Songcheng He, Wanyun Li, Xiaoqiang Li, Wei Zhang:
  Shallow Features Matter: Hierarchical Memory with Heterogeneous Interaction for Unsupervised Video Object Segmentation. 796-805
- Xiaobo Liu, Henglu Wei, Chuxi Yang, Wei Yu, Xudong Zhao, Xiangyang Ji:
  Camera-Specific Imaging Simulation for Raw Domain Image Super Resolution. 806-815
- Zongsheng Cao, Yangfan He, Anran Liu, Jun Xie, Zhepeng Wang, Feng Chen:
  PurifyGen: A Risk-Discrimination and Semantic-Purification Model for Safe Text-to-Image Generation. 816-825
- Haonan Cheng, Junwei Zhang, Hengyan Huang, Long Ye:
  FG-Midiformer: A Symbolic Music Understanding Model towards Fine-Grained Learning of Multi-Attributes. 826-835
- Yiran Meng, Junhong Ye, Wei Zhou, Guanghui Yue, Xudong Mao, Ruomei Wang, Baoquan Zhao:
  VideoForest: Person-Anchored Hierarchical Reasoning for Cross-Video Question Answering. 836-845
- Guorui Song, Guocun Wang, Zhe Huang, Jing Lin, Xuefei Zhe, Jian Li, Haoqian Wang:
  Towards Fine-Grained Human Motion Video Captioning. 846-855
Content: Multimodal Fusion
- Junpu Zhang, Shengju Yu, Suyuan Liu, Siwei Wang, Miaomiao Li, Xinwang Liu, En Zhu, Kunlun He:
  Learning the Anchors with Similar Distributions to Original Data for Multi-view Clustering. 857-866
- Fengshun Wang, Qiurui Wang, Peilin Zhao:
  Learning Long-Range Action Representation by Two-Stream Mamba Pyramid Network for Figure Skating Assessment. 867-875
- Yan Zhang, Gangyan Zeng, Daiqing Wu, Huawen Shen, Binbin Li, Yu Zhou, Can Ma, Xiaojun Bi:
  Gather and Trace: Rethinking Video TextVQA from an Instance-oriented Perspective. 876-885
- Hui Zhang, Yiteng Xu, Yonglin Tian, Yidong Li, Tiago H. Falk, Fei-Yue Wang:
  Selective Shift: Towards Personalized Domain Adaptation in Multi-Agent Collaborative Perception. 886-895
- Mingqian Ji, Jian Yang, Shanshan Zhang:
  Enhancing Pseudo-Boxes via Data-Level LiDAR-Camera Fusion for Unsupervised 3D Object Detection. 896-904
- Gaoxiang Cong, Liang Li, Jiadong Pan, Zhedong Zhang, Amin Beheshti, Anton van den Hengel, Yuankai Qi, Qingming Huang:
  FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing. 905-914
- Wenhui Wu, Guanqi Wen, Le Ou-Yang, Ran Wang, Sam Kwong:
  DUIMC: Deep Unbalanced Incomplete Multi-View Clustering via Graph Constrained Imputation and Contrastive Learning. 915-924
- Hao Wang, Xiaobao Wei, Xiaoan Zhang, Jianing Li, Chengyu Bai, Ying Li, Ming Lu, Wenzhao Zheng, Shanghang Zhang:
  EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler. 925-934
- Zhongfan Sun, Kan Guo, Yongli Hu, Daxin Tian, Qingqing Gao, Jiapu Wang, Junbin Gao, Yanfeng Sun, Baocai Yin:
  Large-Small Model Synergy with Multimodal Fine-Grained Heuristics for Knowledge-Based Visual Question Answering. 935-944
- Peng Chen, Xiaobao Wei, Qingpo Wuwu, Xinyi Wang, Xingyu Xiao, Ming Lu:
  MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussians. 945-954
- Peiyuan Jiang, Yao Liu, Qiao Liu, Zongshun Zhang, Jiaye Yang, Lu Liu, Daibing Yao:
  DRKF: Decoupled Representations with Knowledge Fusion for Multimodal Emotion Recognition. 955-964
- Tao Ling, Siping Shi, Dan Wang:
  Accelerating Long Video Understanding via Compressed Scene Graph-Enabled Chain-of-Thought. 965-974
- Tong Chen, Bowen Du, Jiejie Zhao, Hanyang Xia, Haiquan Wang, Jiakai Wang:
  BadMDA: Towards Backdoor Injection during Domain Adaptation to Collapse Multi-Agent Perception. 975-983
- Chen Gao, Youfang Lin, Wenbin Wang, Shuo Zhang:
  Epipolar Consistency-based Network for Structure-Aware LF Semantic Segmentation. 984-992
- Jia-Xuan Jiang, Jiashuai Liu, Hongtao Wu, Yifeng Wu, Zhong Wang, Qi Bi, Yefeng Zheng:
  Single Domain Generalization for Multimodal Cross-Cancer Prognosis via Dirac Rebalancer and Distribution Entanglement. 993-1002
- Yi Liu, Xinyi Liu, Yi Wan, Panwang Xia, Qiong Wu, Yongjun Zhang:
  StereoINR: Cross-View Geometry Consistent Stereo Super Resolution with Implicit Neural Representation. 1003-1012
- Lanhu Wu, Zilin Gao, Hao Fei, Mong-Li Lee, Wynne Hsu:
  LEAF-Mamba: Local Emphatic and Adaptive Fusion State Space Model for RGB-D Salient Object Detection. 1013-1022
- Min Li, Jinghui He, Jiachen Li, Delong Han, Jin Wan, Gang Li:
  HGCF: Hierarchical Geometry-Color Fusion for Multimodal Industrial Anomaly Detection. 1023-1031
- Qiyuan Zhu, Lujun Li, Dezhi Li, Jiacheng Liu, Pengyu Cheng, Yucheng Xu, Sirui Han, Yike Guo:
  Outlier-Aware Model Merging for Efficient Multitask Inference. 1032-1041
- Zhenyang Liu, Sixiao Zheng, Siyu Chen, Cairong Zhao, Longfei Liang, Xiangyang Xue, Yanwei Fu:
  A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding. 1042-1051
- Jinbao Wei, Yuhang Chen, Zhijie Wang, Gang Yang, Shimin Tao, Jian Gao, Aiping Liu, Xun Chen:
  Rethinking Diffusion Bridge Model with Dual Alignments for Medical Image Synthesis. 1052-1061
- Haichuan Fang, Haoran Zhang, Yulin Du, Qiang Guo, Zhen Tian, Youwei Wang, Yangdong Ye:
  CDIB: Consistency Discovery-guided Information Bottleneck for Multi-modal Knowledge Graph Reasoning. 1062-1071
- Yalan Qin, Nan Pu, Hanzhou Wu, Zhaoxin Fan:
  Flexible Multi-view Clustering with Dynamic Views Generation. 1072-1081
- Zheng Guan, Xue Wang, Wenhua Qian, Peng Liu, Runzhuo Ma:
  Residual Prior-driven Frequency-aware Network for Image Fusion. 1082-1091
- Mulin Chen, Bocheng Wang, Jiaxin Zhong, Zongcheng Miao, Xuelong Li:
  Clustering-Oriented Generative Attribute Graph Imputation. 1092-1101
- Taichun Zhou, Zhibin Dong, Siwei Wang, Ke Liang, Miaomiao Li, Xinwang Liu, En Zhu, Xiangjun Dong:
  DPFMVC: Dynamic Progressive Fusion for Multi-view Clustering. 1102-1111
- Runlin Yu, Yipu Gong, Wenrui Li, Aiwen Sun, Mengren Zheng:
  Discrepancy-Aware Attention Network for Enhanced Audio-Visual Generalized Zero-Shot Learning. 1112-1121
- Ziming Quan, Penglei Wang, Danyang Wu, Jin Xu:
  Unsupervised Cross-view Message Passing Method for Multi-view Graph Clustering. 1122-1131
- Mingrui Li, Dong Li, Sijia Hu, Kangxu Wang, Zhenjun Zhao, Hongyu Wang:
  SLAM-X: Generalizable Dynamic Removal for NeRF and Gaussian Splatting SLAM. 1132-1140
- Jinjia Peng, Tianhang Cheng, Guangqi Jiang, Huibing Wang:
  Prior-oriented Anchor Learning with Coalesced Semantics for Multi-View Clustering. 1141-1150
- Hao Wang, Hanxiao Li, Li Xu:
  CrosST: Cross Swin 4D Transformer for Multi-Modal Alzheimer's Detection. 1151-1160
- Binbin Zheng, Aiqiu Wu, Kai Fan, Ao Li, Minghui Wang:
  Domain-Specific Interactive Prompting for Generalized Nuclei Classification. 1161-1170
- Shaochen Zhang, Zekun Qi, Runpei Dong, Xiuxiu Bai, Xing Wei:
  Positional Prompt Tuning for Efficient 3D Representation Learning. 1171-1180
- Zhicheng Dong, Xiaodong Yue, Yufei Chen, Yuxian Zhou:
  Trusted Open-World Multi-View Classification with Dynamic Opinion Aggregation. 1181-1189
- Zihan Wang, Yunhang Shen, Yuan Fang, Zuwei Long, Ke Li, Xing Sun, Jiao Xie, Shaohui Lin:
  Towards Universal Perception through Language-Guided Open-World Object Detection. 1190-1199
- Junyu Chen, Jiawei Peng, Yuan Sun, Jian Dai, Xingfeng Li, Zhenwen Ren:
  Scalable Unpaired Multi-View Clustering via Anchor-Driven High-Throughput Encoding. 1200-1209
- Zeyan Li, Cankun Guo, Yin Tang:
  Modal Symbiosis: Variational Alignment Unveils New Horizons in Multimodal Representation Learning. 1210-1219
- Zihan Fang, Zhiyong Xu, Lan Du, Shide Du, Zhiling Cai, Shiping Wang:
  Enhancing Multi-view Open-set Learning via Ambiguity Uncertainty Calibration and View-wise Debiasing. 1220-1228
- Zhangyong Tang, Tianyang Xu, Xuefeng Zhu, Chunyang Cheng, Tao Zhou, Xiaojun Wu, Josef Kittler:
  Serial Over Parallel: Learning Continual Unification for Multi-Modal Visual Object Tracking and Benchmarking. 1229-1238
- Weiqi Liu, Yongshan Zhang, Xinxin Wang, Lefei Zhang:
  Deep Multi-Level Contrastive Clustering for Multi-Modal Remote Sensing Images. 1239-1247
- Jiaqi Cui, Yilun Li, Xi Wu, Jiliu Zhou, Yan Wang:
  PREMISE: Individual Preference-aware Multi-modal Cooperation for Survival Prediction. 1248-1257
- Jiaxing Qi, Yifan Xu, Zhifei Yang, Ruifei Ma, Chao Zhang, Kuifei Yu:
  BridgeGLM: Bridging Graph and Language Spaces for Domain Generalization. 1258-1267
- Yating Liu, Yang Zou, Xingyuan Li, Xingyue Zhu, Kaiqi Han, Zhiying Jiang, Long Mau, Jinyuan Liu:
  Toward a Training-Free Plug-and-Play Refinement Framework for Infrared and Visible Image Registration and Fusion. 1268-1277
- Cai Xu, Ziqi Wen, Jie Zhao, Wanqing Zhao, Jinlong Yu, Haishun Chen, Ziyu Guan, Wei Zhao:
  Beyond Equal Views: Strength-Adaptive Evidential Multi-View Learning. 1278-1287
- Yoorhim Cho, Hongyeob Kim, Semin Kim, Youjia Zhang, Yunseok Choi, Sungeun Hong:
  RA-Touch: Retrieval-Augmented Touch Understanding with Enriched Visual Data. 1288-1297
- Xinlei Yu, Changmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge:
  CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation. 1298-1307
- Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li:
  StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation. 1308-1317
- Liang Zhao, Shubin Ma, Bo Xu, Qingchen Zhang:
  Dual-Learning based Penalized Multi-Align Clustering for Multi-View Incomplete and Disorderly Data. 1318-1326
- Jialei Cui, Jianwei Du, Yanzhe Li, Lei Gao, Hui Jiang, Chenfu Bao:
  HAMLET-FFD: Hierarchical Adaptive Multi-modal Learning Embeddings Transformation for Face Forgery Detection. 1327-1336
- Disen Hu, Xun Jiang, Zhe Sun, Hao Yang, Chong Peng, Peng Yan, Heng Tao Shen, Xing Xu:
  Geometric Gradient Divergence Modulation for Imbalanced Multimodal Learning. 1337-1345
- Xuanming Jiang, Baoyi An, Zhengwei Zou, Dingyu Nie, Jialie Shen, Xueming Qian, Guoshuai Zhao:
Ear with Eye: Lightweight Multimodal Audio-Visual Network Inspired by Bionic Structures. 1346-1355 - Chengzhou Li

, Xiaokang Liu
, Qi Jia
, Jinyuan Liu
, Zhiying Jiang
, Longhan Feng
, Yu Liu
, Zhongxuan Luo
, Xin Fan
:
Physics-Guided Sonar Image Fine-grained Recognition under Scarce Annotations. 1356-1365 - Mianzimei Yang

, Zhipeng Zhou
, Jin Zhang
, Yuanhao Pu
, Hong Xie
, Defu Lian
:
Conflict-Buffering Optimization by Symmetry Teleportation for Deep Long-Tailed Recognition. 1366-1375 - Jiahao Wang

, Fang Liu
, Licheng Jiao
, Hao Wang
, Shuo Li
, Lingling Li
, Puhua Chen
, Xu Liu
, Xinyi Wang
:
FA3T: Feature-Aware Adversarial Attacks for Multi-modal Tracking. 1376-1385 - Zhiwei Zhang

, Ruikai Xu
, Weijian Zhang
, Zhizhong Zhang
, Xin Tan
, Jingyu Gong
, Yuan Xie
, Lizhuang Ma
:
PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion. 1386-1394 - Siyuan Zhang

, Xiaoping Wang
, Jiang Li
, Weibin Feng
, Xin Zhan
, Hongzhi Huang
:
HAFUNet: A Hierarchical Attention Fusion Network for Monocular Depth Estimation Integrating Event and Frame Data. 1395-1403 - Ronghui Li

, Lingxiao Han
, Shi Shu
, Yueyao Liu
, Yukang Lin
, Yue Ma
, Jie Guo
, Ziwei Liu
, Xiu Li
:
A Motion is Worth a Hybrid Sentence: Taming Language Model for Unified Motion Generation by Fine-grained Planning. 1404-1413 - Hongyu Jiang

, Yuxin Huo
, Sirou Sheng
, Hong Tao
, Chenping Hou
:
Scalable One-step Unaligned Multi-view Clustering via Joint High-Order Correlation Learning. 1414-1422 - Xiangping Zheng

, Xuan Feng
, Bo Wu
, Bin Ren
, Wei Li
, Xiuxin Hao
, Xun Liang
, Bin Tang
, Zhiwen Yu
:
Breaking Semantic Barriers: A Zero-Shot Generalized Framework for Graph Anomaly Detection. 1423-1432 - Mi Zheng

, Guanglei Yang
, Zitong Huang
, Zhenhua Guo
, Kevin Han
, Wangmeng Zuo
:
Segmenting Objectiveness and Task-awareness Unknown Region for Autonomous Driving. 1433-1442 - Yuhao Wang

, Lingjuan Miao
, Zhiqiang Zhou
, Lei Zhang
, Yajun Qiao
:
Infrared and Visible Image Fusion with Language-Driven Loss in CLIP Embedding Space. 1443-1451 - Min Dang

, Gang Liu
, Jingqi Zhao
, Adams Wai-Kin Kong
, Nan Luo
, Di Wang
:
DDFD: Diffusion-Based Denoising Fusion for Object Detection in Infrared-Visible Images. 1452-1461 - Jiahuan Long

, Wen Yao
, Tingsong Jiang
, Jiacheng Hou
, Shuai Jia
, Junqi Wu
, Xiaoya Zhang
, Xiaohu Zheng
, Chao Ma
:
CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors. 1462-1470 - Peirong Zhang

, Kai Ding
, Lianwen Jin
:
Capturing More: Learning Multi-Domain Representations for Robust Online Handwriting Verification. 1471-1479 - Zhenxi Wang

, Zongyao Yin
, Yujie Hou
, Xianchuan Yu
:
Robust Multi-view Clustering via Pseudo Label Guided Universum Learning. 1480-1489 - Yao Zhang

, Ping Huang
, Rui Zhang
:
Multimodal Dual Population Evolutionary Reinforcement Learning. 1490-1499 - Bo Xu

, Jie Wei
, Hongya Wang
, Ming Du
, Hui Song
, Yanghua Xiao
:
Bridging the Unseen Gap: Label-Enhanced Information Bottleneck Distillation for Multimodal Named Entity Recognition. 1500-1509 - Mingle Zhou

, Jiahui Liu
, Jin Wan
, Gang Li
, Min Li
:
Exploring Multimodal Prompts For Unsupervised Continuous Anomaly Detection. 1510-1519 - Hongming Wang

, Yifeng Wu
, Huimin Huang
, Hongtao Wu
, Jiaxuan Jiang
, Xiaodong Zhang
, Hao Zheng
, Yawen Huang
, Xian Wu
, Yefeng Zheng
, Jinping Xu
, Jing Cheng
:
BrainSegDMIF: A Dynamic Fusion-enhanced SAM for Brain Lesion Segmentation. 1520-1529 - Tairan Huang

, Yili Wang
, Qiutong Li
, Changlong He
, Jianliang Gao
:
Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection. 1530-1538 - Naichuan Zheng

, Yuchen Du
, Hailun Xia
, Zeyu Liang
:
Signal-SGN: A Spiking Graph Convolutional Network for Skeleton Action Recognition via Learning Temporal-Frequency Dynamics. 1539-1548 - Yang Zhou

, Jin Wang
, Yuxiao Zhang
, Kaixiang Huang
, Guodong Lu
, Jingru Yang
, Shengfeng He
:
Art4Math: Handwritten Mathematical Expression Recognition via Multimodal Sketch Grounding. 1549-1558 - Feiyu Peng

, Chaobo He
, Junwei Cheng
, Huijuan Hu
, Wenkai Zhang
, Youda Mo
:
Frequency-refined Graph Convolution Network with Cross-modal Wavelet Denoising for Recommendation. 1559-1568 - Chuan Zeng

, Zhao Zhang
, Wei Huang
, Lei Zhang
, Le Yi
, Kefu Zhao
:
DC2-SR: A Dual-Consistency Guided Curriculum Learning method for Thick-Slice Fetal MRI Super-Resolution. 1569-1578 - An Xiang

, Zixuan Huang
, Xitong Gao
, Kejiang Ye
, Cheng-zhong Xu
:
BridgeNet: A Unified Multimodal Framework for Bridging 2D and 3D Industrial Anomaly Detection. 1579-1587 - Hui Li

, Pengfei Yang
, Juanyang Chen
, Le Dong
, Yanxin Chen
, Quan Wang
:
MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation. 1588-1597 - Shifeng Bao

, Zhe Xue
, Qi Chen
, Shilong Ou
, Amin Beheshti
, Quan Z. Sheng
, Anton van den Hengel
, Yuankai Qi
:
CausalMVC: Causal Content-Style Representation Learning for Deep Multi-View Clustering. 1598-1606 - Wei Li

, Junwei Zhu
, Honghui Xu
, Jiawei Jiang
, Jianwei Zheng
:
SpecSolver: Solving Spatial-Spectral Fusion via Semantic Transformer. 1607-1616 - Junwei Zhu

, Wei Li
, Honghui Xu
, Jiawei Jiang
, Zhi Liu
, Jianwei Zheng
:
Arbitrary-scale Fusion Neural Operator. 1617-1626 - Zhongyun Bao

, Gang Fu
, Jianchi Sun
, Jing Zhou
, Ziqi Yu
, Chunxia Xiao
:
I 2HDiffuser: Image Illumination Harmonization Meets the Diffusion Model. 1627-1636 - Weitai Kang

, Luowei Zhou
, Junyi Wu
, Changchang Sun
, Yan Yan
:
Visual Grounding with Attention-Driven Constraint Balancing. 1637-1645 - Pengfei Ren

, Jingyu Wang, Haifeng Sun
, Qi Qi
, Jing Wang, Jianxin Liao:
Rule Meets Learning: Confidence-Aware Multi-View Fusion for Self-Supervised 3D Hand Pose Estimation. 1646-1655 - Bingfeng Liu

, Songwei Pei
, Shuhuai Wang
, Wenzheng Yang
, Qian Li
, Shangguang Wang
:
Prior-Constrained Relevant Feature driven Image Fusion with Hybrid Feature via Mode Decomposition. 1656-1665 - Yue Zhu

, Haiwen Diao
, Shang Gao
, Jiazuo Yu
, Jiawen Zhu
, Yunzhi Zhuge
, Shuai Hao
, Xu Jia
, Lu Zhang
, Ying Zhang
, Huchuan Lu
:
Regularizing Subspace Redundancy of Low-Rank Adaptation. 1666-1675 - Jintian Ji

, Songhe Feng
:
Anchors Bring Stability and Efficiency: Fast Tensorial Multi-view Clustering on Shuffled Datasets. 1676-1685 - Ziyu Wang

, Yiming Du
, Rui Ning
, Lusi Li
:
Energy-based Deep Incomplete Multi-View Clustering. 1686-1694 - Kai Zhu

, Jun Yin
:
Neighbor Contrastive Learning with Weakened Consensus Graph for Deep Multi-View Clustering. 1695-1703 - Hankun Liu

, Yujian Zhao
, Guanglin Niu
:
Try Harder: Hard Sample Generation and Learning for Cloth-Changing Person Re-ID. 1704-1713 - Shide Du

, Chunming Wu
, Zihan Fang
, Wendi Zhao
, Yilin Wu
, Changwei Wang
, Shiping Wang
:
LargeMvC-Net: Anchor-based Deep Unfolding Network for Large-scale Multi-view Clustering. 1714-1723 - Quangui He

, Jiahui Qu
, Wenqian Dong
, Song Xiao
, Qinghao Gao
:
Cycle-Consistent Mamba-Based Registration-Fusion Joint Network for Unregistered Hyperspectral Image Super-Resolution. 1724-1733
- Liyuan Cao, Zihang Guo, Huaiwen Zhang:
Event Consistency-aware Robust Fake News Detection. 1734-1743
- Qi Peng, Jialin Cui, Jiayuan Xie, Yi Cai, Qing Li:
Tree-of-Reasoning: Towards Complex Medical Diagnosis via Multi-Agent Reasoning with Evidence Tree. 1744-1753
- Mengzhen Wang, Xunbin Huang, Jiayuan Xie, Shukai Ma, Jiale Men, Dayong Liang, Yi Cai:
From Model Diagram to Code: A Benchmark Dataset and Multi-Agent Framework. 1754-1763
- Ziqiang Shi, Rujie Liu, Jun Takahashi, Shan Jiang:
TrueCount: Improving Open-World Object Counting with Visual-Language Models and Dynamic Multi-Modal Inputs. 1764-1773
- Hong Gao, Xiangkai Xu, Tianqi Zhu, Xiugang Dong, Yiming Bao, Min-Ling Zhang:
Radar-Mamba: 4D Millimeter-Wave Point Cloud Enhancement via State Space Models. 1774-1782
- Jiangyong Yu, Sifan Zhou, Dawei Yang, Shuoyu Li, Shuo Wang, Xing Hu, Chen Xu, Zukang Xu, Changyong Shu, Zhihang Yuan:
MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Static Quantization. 1783-1792
- Peican Zhu, Yubo Jing, Le Cheng, Keke Tang, Yangming Guo:
KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection. 1793-1801
- Runqi Wang, Caoyuan Ma, Jian Zhao, Hanrui Xu, Dongfang Sun, Haoyang Chen, Lin Xiong, Zheng Wang, Xuelong Li:
Leader is Guided: Interactive Motion Generation via Lead-Follow Paradigm and Trajectory Guidance. 1802-1811
- Xuesong Li, Jinguang Tong, Jie Hong, Vivien Rolland, Lars Petersson:
DGNS: Deformable Gaussian Splatting and Dynamic Neural Surface for Monocular Dynamic 3D Reconstruction. 1812-1821
- Pingting Hao, Huijie Zhang, Yongshan Zhang:
Tensor-based Opposing yet Complementary Learning for Multi-view Multi-label Feature Selection. 1822-1831
- Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Mengfei Shi, Xia Xie, Shengyong Chen:
LIDAR: Lightweight Adaptive Cue-Aware Fusion Vision Mamba for Multimodal Segmentation of Structural Cracks. 1832-1841
- Mufan Liu, Wu Ran, Zhiquan He, Zuojie Xie, Hong Lu, Peirong Ma:
Implicit Retinex Decomposition with Chromaticity Disentanglement for Low-Light Image Enhancement. 1842-1851
- Chenbo Zhang, Bing Huangfu, Hongxu Ma, Jihong Guan, Shuigeng Zhou:
Multi-modal Prototype Guided Few-shot Object Detection. 1852-1861
- Qiyin Zhong, Xianglin Qiu, Xiaolei Wang, Zhen Zhang, Gang Liu, Jimin Xiao:
FAMRD: Frequency-Aware Multimodal Reverse Distillation for Industrial Anomaly Detection. 1862-1871
- Lei Xie, Junxiong Huang, Yuanjing Feng, Qingrun Zeng:
Tractography-Guided Dual-Label Collaborative Learning for Multi-Modal Cranial Nerves Parcellation. 1872-1879
- Guoqiang Liang, Chuan Qin, De Cheng, Shizhou Zhang, Yanning Zhang:
Boosting Multi-Modal Alignment: Geometric Feature Separation for Class Incremental Learning. 1880-1889
- Xueheng Li, Xuanhua He, Tao Hu, Jie Zhang, Man Zhou, Chengjun Xie, Yingying Wang, Bo Huang:
Freq-RWKV: Granularity-Aware Spatial-Frequency Synergy via Dual-Domain Recurrent Scanning for Pan-sharpening. 1890-1899
- Lingren Wang, Wenxuan Tu, Jieren Cheng, Jianan Wang, Xiangyan Tang, Chenchen Wang:
Discovering Maximum Frequency Consensus: Lightweight Federated Learning for Medical Image Segmentation. 1900-1909
- Nan Gao, Junchao Zhu, Yilong Zhang, Ronghua Liang, Guodao Sun, Peng Chen:
Dual Teacher with Dempster-Shafer Guidance for Decision Making in Semi-Supervised Small Object Detection. 1910-1919
- Nan Ma, Beining Sun, Yiheng Han, Genbao Xu:
Kinematic Enhanced Hypergraph Convolutional Network for Skeleton-based Human Action Recognition with LLM Training Guides. 1920-1928
- Yufei Zhang, Yicheng Xu, Hongxin Wei, Zhiping Lin, Xiaofeng Zou, Cen Chen, Huiping Zhuang:
Analytic Continual Test-Time Adaptation for Multi-Modality Corruption. 1929-1937
- Pengfei Gu, Hongxiao Wang, Yejia Zhang, Huimin Li, Chaoli Wang, Danny Chen:
TopoImages: Incorporating Local Topology Encoding into Deep Learning Models for Medical Image Classification. 1938-1947
- Dawei Lin, Meng Yuan, Ziming Wang, Tieru Wu, Yuanning Liu:
FreeCAD: A Multimodal Framework for 3D CAD Model Generation from Free-Form Prompts. 1948-1956
- Renjie Lin, Jiacheng Li, Shide Du, Shiping Wang, Le Zhang:
OIMGC-Net: Optimization-inspired Interpretable Multi-view Graph Clustering Network. 1957-1966
- Qi Shen, Junchang Xin, Bing Tian Dai, Shudi Zhang, Xinyao Liu, Zhiqiong Wang:
ElaSleepNet: Exploring an Elastic Multimodal Neural Network for Sleep Staging via Temporal and Contextual Consistency Learning. 1967-1976
- Zeyu Zhu, Ke Liang, Lingyuan Meng, Xingchen Hu, Xinwang Liu, Wanwei Liu, Kunlun He:
SALVG: Latent Variable Gene Augmented Graph Learning for Multi-View Clustering in Spatial Transcriptomics. 1977-1986
- Lamei Di, Bin Zhang, Yiming Wang, Wenxia Zhang:
Frequency Meets Semantics: Text-Visual Fusion with Directional Spectral Enhancement for Salient Object Detection in Optical Remote Sensing Images. 1987-1996
- Miaosen Luo, Yuncheng Jiang, Sijie Mai:
Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis. 1997-2006
- Zeyu Xia, Canqun Yang, Haoang Chi, Tao Tang, Weiming Xiang, Yingbo Cui:
MMF-SV: A Multi-Modal Feature Fusion-Based Structural Variant Caller. 2007-2015
- Ziang Li, Chengxiang Si, Zhenyu Cheng:
Zero in on the Target: A Composite Robust Model for Retrieving Information in Traffic Data to Discover Network Attacks. 2016-2025
- Long Chen, De Cheng, Shizhou Zhang, Yinghui Xing, Di Xu, Yanning Zhang:
Amplitude-aware Domain Style Replay for Lifelong Person Re-identification. 2026-2035
- Jie Qin, Wei Yang, Yan Su, Yiran Zhu, Weizhen Li, Yunyue Pan, Chengchang Pan, Honggang Qi:
HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional Reconstruction. 2036-2043
- Zhaochen Guo, Zhixiang Shen, Xuanting Xie, Liangjian Wen, Zhao Kang:
Disentangling Homophily and Heterophily in Multimodal Graph Clustering. 2044-2053
- Zhishuo Zhao, Yi Lin, Dongyue Guo, Junyu Fan:
AV-RISE: Hierarchical Cross-Modal Denoising for Learning Robust Audio-Visual Speech Representation. 2054-2063
- Jiahao Zhang, Wenzhe Yin, Shujian Yu:
Cross-Modal Retrieval with Cauchy-Schwarz Divergence. 2064-2073
- Xinbo Geng, Fan Shi, Xu Cheng, Chen Jia, Meng Zhao, Shengyong Chen:
LFMamba: Focal Stack-aware State Space Modeling for Light Field Salient Object Detection. 2074-2083
- Xiaodi Xu, Lijie Li, Ye Wang, Tao Ren, Tian Qiao:
WFF: Wavelet-based Information Fusion for Multimodal Knowledge Graph Link Prediction. 2084-2093
- Xuyao Liu, Jiahui Qu, Wenqian Dong:
Breaking the Spatial-Temporal Consistency Constraint: Towards Reference-Based Hyperspectral Image Super-Resolution. 2094-2103
- Yifan Liu, Yu Fang, Zhouhan Lin:
Visual-informed Silent Video Identity Conversion. 2104-2112
- Zebing Yao, Hao Fu, Yuanhang Yang, Guanghua Gu:
Dynamic Optimization Noisy Cross-Modal Hashing. 2113-2121
- Yuhang Lan, Shilin Xu, Chao Su, Run Ye, Dezhong Peng, Yuan Sun:
Multi-view Hashing Classification. 2122-2130
- Jielong Lu, Zhihao Wu, Jiajun Yu, Qianqian Shen, Jiajun Bu, Haishuai Wang:
Where Views Meet Curves: Virtual Anchors for Hyperbolic Multi-View Graph Diffusion. 2131-2140
- Jun Yang, Maoyu Mao:
DiffuSeg: Diffusion-Enhanced Cross-Modal Semantic Segmentation for RGB-D. 2141-2149
- Haochen Yang, Lei Li, Jiacheng Guo, Baolu Li, Minghai Qin, Hongkai Yu, Tianyun Zhang:
DA3D: Domain-Aware Dynamic Adaptation for All-Weather Multimodal 3D Detection. 2150-2158
- Wentao Wu, Xiao Wang, Chenglong Li, Bo Jiang, Jin Tang, Bin Luo, Qi Liu:
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework. 2159-2168
- Yichen Bao, Yuxuan Liu, Yu Duan, Jing Li, Quanxue Gao:
Multi-view Clustering Based on Probabilistic Tensor Regression. 2169-2177
- Xingchen Li, Wuyang Zhang, Guoliang You, Xiaomeng Chu, Wenhao Yu, Yifan Duan, Yuxuan Xiao, Yanyong Zhang:
CalibWorkflow: A General MLLM-Guided Workflow for Centimeter-Level Cross-Sensor Calibration. 2178-2187
- Yongzheng Liu, Siru Zhong, Gefeng Luo, Weilin Ruan, Yuxuan Liang:
Towards Multi-Scenario Forecasting of Building Electricity Loads with Multimodal Data. 2188-2196
- Yan Chen, Bingbing Jiang, Peng Zhou, Lei Duan, Yuhua Qian, Liang Du:
Balanced Multiple Kernel Clustering with Discrete Partition Entropy Auto Regularization. 2197-2206
- Jiale Zou, Yan Chen, Bingbing Jiang, Peng Zhou, Liang Du, Lei Duan, Yuhua Qian:
Robust Tensor Learning with Graph Diffusion for Scalable Multi-view Graph Clustering. 2207-2215
- Linxin Xiao, Xin Wang, Zeyang Zhang, Yang Yao, Wenwu Zhu:
DyNAS-DDI: Dynamic Pairwise Architecture Search for Generalizable Drug-Drug Interaction LLM. 2216-2225
- Jianxiang Xie, Yao Wu, Yachao Zhang, Xiaopei Zhang, Yuan Xie, Yanyun Qu:
PLATO-TTA: Prototype-Guided Pseudo-Labeling and Adaptive Tuning for Multi-Modal Test-Time Adaptation of 3D Segmentation. 2226-2234
- Shilin Liu, Kyohei Kamikawa, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama:
Context-aware Image-to-Music Generation via Bridging Modalities through Musical Captions. 2235-2243
- Yan Li, Xingchen Hu, Jiyuan Liu, Zhong Liu:
Federated Incomplete Multi-view Clustering with Individual Structure Preservation and Central Representation Tensorization. 2244-2253
- Hanghui Guo, Weijie Shi, Mengze Li, Juncheng Li, Hao Chen, Yue Cui, Jiajie Xu, Jia Zhu, Jiawei Shen, Zhangze Chen, Sirui Han:
Consistent and Invariant Generalization Learning for Short-video Misinformation Detection. 2254-2263
- Ruilin Yao, Yi Rong, Tianyu Zou, Bo Zhang, Jian Li, Shengwu Xiong, Shili Xiong:
MAP: Parameter-Efficient Tuning for Referring Expression Comprehension via Multi-Modal Adaptive Positional Encoding. 2264-2273
- Hongyang Lin, Kuixiang Shao, Peijun Xu, Zhuoyang Bu, Yuyang Jiao, Ziyuan Tang, Chenxi Xiao, Jingyi Yu:
HandCraft: Tactile-Informed Hand-Object Dynamics Capture and Realistic Rendering. 2274-2283
- Linxuan Luo, Pan Mu, Cong Bai:
Physics-Coupled Frequency Dynamic Adaptation Network for Domain Generalized Underwater Object Detection. 2284-2293
- Yanfeng Liu, Lefei Zhang:
Multimodal Decomposed Distillation with Instance Alignment and Uncertainty Compensation for Thermal Object Detection. 2294-2303
- Rui Wang, Yuxuan Liu, Guangyu Yang, Quanxue Gao, Cheng Deng:
Bi-Orthogonal Non-negative Tensor tri-Factorization for Tensorized Label Learning. 2304-2312
- Xin Peng, Bowen Liu, Renxiang Guan, Wenxuan Tu:
Multi-view Graph Clustering with Dual Structure Awareness for Remote Sensing Data. 2313-2322
- Mingliang Yan, Yanhua Yu, Ruochi Zhang, Zhiyuan Liu, Ruicheng Zhang, Yimeng Ren, Kangkang Lu, Zhiyong Huang, Feng Luo, Zhen Cai:
DeepMolTex: Deep Alignment of Molecular Graphs with Large Language Models via Mixture of Modality Experts. 2323-2332
- Xinzhu Li, Juepeng Zheng, Yikun Chen, Xudong Mao, Guanghui Yue, Wei Zhou, Chenlei Lv, Ruomei Wang, Fan Zhou, Baoquan Zhao:
DepthGait: Multi-Scale Cross-Level Feature Fusion of RGB-Derived Depth and Silhouette Sequences for Robust Gait Recognition. 2333-2341
- Tianming Xu, Tiantian Guo, Youdan Feng, Zihan Chen, Qiaoyi Xue, Lingzhi Hu, Yuhang Shi:
Anatomical Region-Guided 3D PET/MR Tumor Segmentation via Medical Record. 2342-2351
- Rongqiang Fang, Yongqi Sun, Jidong Yuan, Hongbo Cao, Jinkun Dong:
A Language-Assisted Semantic-Aware Disentangled Method for Link Prediction on Heterogeneous Graphs. 2352-2361
- Guimin Hu, Yi Xin, Lijie Hu, Zhihong Zhu, Hasti Seifi:
PgM: Partitioner Guided Modal Learning Framework. 2362-2371
- Kaixiang Wang, Xiaojian Ding, Wanqi Yang, Ming Yang:
Label-Semantics-Guided Multi-View Multi-Label Learning via High-Order Semantic Fusion. 2372-2380
- Chenyang Zhou, Monghjaya Ha, Chao Tang, Licheng Wu:
UniMTR: Unified Recognition of Dual-style Traditional Mongolian Scripts via Contrastive Representation Alignment. 2381-2389
- Mingyang Yu, Xiahui Guo, Peng Chen, Zhenkai Li, Yang Shu:
Towards Measuring and Modeling Geometric Structures in Time Series Forecasting via Image Modality. 2390-2398
- Shu-Xun Yang, Xian-Ling Mao, Heyan Huang:
ESTJ: Enhancing Structured Tendency Judgment in Hybrid-Modal Table Understanding. 2399-2408
- Maoxun Yuan, Bo Cui, Tianyi Zhao, Jiayi Wang, Shan Fu, Xue Yang, Xingxing Wei:
UniRGB-IR: A Unified Framework for Visible-Infrared Semantic Tasks via Adapter Tuning. 2409-2418
- Nokap Tony Park:
M2PE-Diff: Music-to-Pose Encoder for Dance Video Generation Leveraging Latent Diffusion Framework. 2419-2428
- Xiaorui Ding, Huan Ma, Changqing Zhang:
A Theoretical Proof of Dynamic Multimodal Fusion Exacerbates Modality Greedy. 2429-2436
- Yiming Xu, Jiarun Chen, Zhen Peng, Zihan Chen, Qika Lin, Lan Ma, Bin Shi, Bo Dong:
Court of LLMs: Evidence-Augmented Generation via Multi-LLM Collaboration for Text-Attributed Graph Anomaly Detection. 2437-2446
- Shanghui Deng, Xiao Zheng, Chang Tang, Kun Sun, Yuanyuan Liu, Xinwang Liu:
Find True Collaborators: Banzhaf Index-based Cross View Alignment for Partially View-aligned Clustering. 2447-2456
- Wenlan Chen, Lu Gao, Cheng Liang, Fei Guo:
Deep Variational Incomplete Multi-View Clustering with Information-Theoretic Guidance. 2457-2466
- Jieyi Ge, Zhaodong Sun, Wei Peng, Chenhang Ying, Yuwei Chen, Kui Ren, Xiaobai Li:
Evidential Remote Physiological Measurement via Uncertainty-aware Fusion of Video and RF. 2467-2475
- Fujian Ren, Wenlan Chen, Lu Gao, Fei Guo, Cheng Liang:
Dual-Level Distribution Alignment for Deep Incomplete Multi-View Clustering. 2476-2485
- Guoyi Li, Die Hu, Xiaomeng Fu, Qirui Tang, Yulei Wu, Xiaodan Zhang, Honglei Lyu:
Entity Graph Alignment and Visual Reasoning for Multimodal Fake News Detection. 2486-2495
- Peng Zhao, Zhiguang Cao, Di Wang, Wen Song, Wei Pang, You Zhou, Yuan Jiang:
Visual-Enhanced Multimodal Framework for Flexible Job Shop Scheduling Problem. 2496-2505
- Yu Zhao, Ying Zhang, Xuhui Sui, Baohang Zhou, Haoze Zhu, Jeff Z. Pan, Xiaojie Yuan:
Dark Side of Modalities: Reinforced Multimodal Distillation for Multimodal Knowledge Graph Reasoning. 2506-2515
- Jianting Tang, Yubo Wang, Haoyu Cao, Linli Xu:
CROP: Integrating Topological and Spatial Structures via Cross-View Prefixes for Molecular LLMs. 2516-2525
- Guyue Jin, Tianming Zhao, Jiacan Yan, Tian Tian:
Contextually-Guided State Space Fusion for Misaligned Multi-Spectral Object Detection. 2526-2535
- Libin Liu, Shen Chen, Sen Jia, Jingzhe Shi, Can Jin, Zongkai Wu, Jenq-Neng Hwang, Lei Li:
Graph Canvas for Controllable 3D Scene Generation. 2536-2545
- Berta Céspedes-Sarrias, Carlos Collado-Capell, Pablo Rodenas-Ruiz, Olena Hrynenko, Andrea Cavallaro:
MM-HSD: Multi-Modal Hate Speech Detection in Videos. 2546-2555
Content: Vision and Language
- Yijie Yang, Lianyong Qi, Weiming Liu, Fan Wang, Jing Du, Yuwen Liu, Xiaolong Xu, Qiang Ni, Wanchun Dou, Xiaokang Zhou:
Joint Test-time Adaptation with Refined Pseudo-labels and Latent Score Matching. 2556-2565
- Hua Wang, Hong Liu, Jiale Ren, Mingxin Tan, Zhongzien Jiang:
CLIP-6D: Empowering CLIP as a Zero-Shot 6D Pose Estimator Through Generalizable Object-Specific Representations. 2566-2575
- Ruipu Wu, Yige Zhang, Jinyu Chen, Linjiang Huang, Shifeng Zhang, Xu Zhou, Liang Wang, Si Liu:
AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation. 2576-2585
- Yihua Shao, Haojin He, Sijie Li, Siyu Chen, Xinwei Long, Fanhu Zeng, Yuxuan Fan, Muyang Zhang, Ziyang Yan, Ao Ma, Xiaochen Wang, Hao Tang, Yan Wang, Shuyan Li:
EventVAD: Training-Free Event-Aware Video Anomaly Detection. 2586-2595
- Qiuyu Liang, Yongqiang Zhang:
SAM based Region-Word Clustering and Inference Score Adjusting for Open-Vocabulary Object Detection. 2596-2605
- Xiao Liang, Jiawei Hu, Di Wang, Zhi Ma, Lin Zhao, Ronghan Li, Bo Wan, Quan Wang:
CheXPO: Preference Optimization for Chest X-ray VLMs with Counterfactual Rationale. 2606-2615
- Qian Sun, Chengzhuo Lu, Wenyu Chen, Wenjie Wei, Jingya Wang, Jieyuan Zhang, Xiaoli Liu, Yalan Ye, Yang Yang, Malu Zhang:
Temporal-coded Spiking Transformer. 2616-2624
- Yuwu Lu, Haoyu Huang, Xue Hu:
Domain-aware Visual Context Prompt for Multi-Source Domain Adaptation. 2625-2633
- Xingke Song, Jianxu Shangguan, Yiran Li, Jialu Zhang, Jianfeng Ren, Ruibin Bai, Xin Chen, Xudong Jiang:
CEARI: Co-Evolutionary Agents for Reassembling and Inpainting Puzzles with Gaps and Missing Pieces. 2634-2642
- Xiaoyu Chen, Yigang Cen, Wanru Xu, Yue Zhang, Yi Jin, Yidong Li, Linna Zhang:
Hierarchical Meta-prototypes Network for Few-shot Action Recognition. 2643-2652
- Kyungjune Lee, Seongjean Kim, Hoseok Tong, Hyucksang Lee, Seongmin Lee, Weisi Lin, Ping An, Sanghoon Lee:
Domain Crossover Non-Rigid Registration for 3D Human Meshes. 2653-2662
- Jingyao Wang, Yiming Chen, Lingyu Si, Changwen Zheng:
Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection. 2663-2672
- Yuxing Liu, Ji Zhang, Xuchuan Zhou, Jingzhong Xiao, Huimin Yang, Jiaxin Zhong:
OoDDINO: A Multi-level Framework for Anomaly Segmentation on Complex Road Scenes. 2673-2682
- Si-Woo Kim, MinJu Jeon, Ye-Chan Kim, Soeun Lee, Taewhan Kim, Dong-Jin Kim:
SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning. 2683-2692
- Xun Zhu, Fanbin Mo, Zheng Zhang, Jiaxi Wang, Yiming Shi, Ming Wu, Chuang Zhang, Miao Li, Ji Wu:
Enhancing Multi-task Learning Capability of Medical Generalist Foundation Model via Image-centric Multi-annotation Data. 2693-2702
- Linpu He, Yanan Li, Bingze Li, Elvis Han Cui, Donghui Wang:
DSS-Prompt: Dynamic-Static Synergistic Prompting for Few-Shot Class-Incremental Learning. 2703-2712
- Yian Li, Wentao Tian, Yang Jiao, Tianwen Qian, Na Zhao, Bin Zhu, Jingjing Chen, Yu-Gang Jiang:
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning. 2713-2722
- Yifei Deng, Chenglong Li, Futian Wang, Jin Tang:
Learning Hierarchical Cross-modal Association with Intra-modal Context for Text-Image Person Retrieval. 2723-2731
- Xiubo Liang, Hongzhi Wang, Zigen Li, Jinxing Han, Yu Zhao, Weidong Geng:
SGM-Transformer: Rethinking Gradient Information Loss and Compensation in Spiking Neural Networks. 2732-2741
- Qinyue Tong, Ziqian Lu, Jun Liu, Yangming Zheng, Zhe-Ming Lu:
MediSee: Reasoning-Based Pixel-Level Perception in Medical Images. 2742-2751
- Shuyong Gao, Qianyu Guo, Yu'ang Feng, Chunyuan Chen, Xujun Wei, Yan Wang, Wenqiang Zhang:
Progressive Representation Learning for Weakly-Supervised Camouflaged Object Detection. 2752-2761
- Huaihai Lyu, Chaofan Chen, Yuheng Ji, Changsheng Xu:
EgoPrompt: Prompt Learning for Egocentric Action Recognition. 2762-2770
- Yuwu Lu, Chunzhi Liu, Yihan Yang:
CWCP: Generalizing Virtual Reality to Real World with Contextual-Weather Correlation Pairing for Deraining and Desnowing. 2771-2780
- Pei Liu, Xin Liu, Ruoyu Yao, Junming Liu, Siyuan Meng, Ding Wang, Jun Ma:
HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation. 2781-2790
- Yan Zhang, Shiwen He, Lin Yuan, Jiaxu Leng, Xinbo Gao:
DichotomyIR: Universal Image Reconstruction via Dichotomy Classification and Uncertainty Elimination. 2791-2800
- Francesco Tonini, Lorenzo Vaquero, Alessandro Conti, Cigdem Beyan, Elisa Ricci:
Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection. 2801-2810
- Zhilin Huang, Chujun Qin, Yifei Xing, Wenming Yang:
Enhanced Motion-aware Latent Diffusion Models for Video Frame Interpolation. 2811-2820
- Zeming Wei, Junyi Lin, Yang Liu, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin:
3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians. 2821-2830
- Huy Le, Nhat Chung, Tung Kieu, Anh Nguyen, Ngan Le:
BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element Guidance. 2831-2840
- Jianghang Lin, Yue Hu, Jiangtao Shen, Yunhang Shen, Liujuan Cao, Shengchuan Zhang, Rongrong Ji:
What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation. 2841-2850
- Zhengyang Liang, Meiyu Liang, Wei Huang, Yawen Li, Wu Liu, Yingxia Shao, Kangkang Lu:
Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Retrieval. 2851-2859
- Tiancheng Gu, Kaicheng Yang, Ziyong Feng, Xingjun Wang, Yanzhao Zhang, Dingkun Long, Yingda Chen, Weidong Cai, Jiankang Deng:
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs. 2860-2869
- Lin Peng, Cong Wan, Shaokun Wang, Xiang Song, Yuhang He, Yihong Gong:
CIA: Class- and Instance-aware Adaptation for Vision-Language Models. 2870-2879
- Xi Xiao, Yunbei Zhang, Xingjian Li, Tianyang Wang, Xiao Wang, Yuxiang Wei, Jihun Hamm, Min Xu:
Visual Instance-aware Prompt Tuning. 2880-2889
- Yuliang Chen, Xi Lin, Chao Sang, Xiu Su:
DualFPT: Handling Data Heterogeneity in Federated Prompt Tuning from both Generalized and Personalized Perspective. 2890-2899
- Lingbo Zhang, Bingqian Sun, Linghan Cai, Yifeng Wang, Ye Zhang, Songhan Jiang, Kai Zhang, Yongbing Zhang:
Counting by Points: Density-Guided Weakly-Supervised Nuclei Segmentation in Histopathological Images. 2900-2908
- Haodong Chen, Haojian Huang, Xinxiang Yin, Dian Shao:
FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts Reasoning. 2909-2918
- Shaowu Xu, Xibin Jia, Junyu Gao, Qianmei Sun, Jing Chang, Chao Fan:
Cross-Modal Dual-Causal Learning for Long-Term Action Recognition. 2919-2928
- Jiahao Li, Yang Lu, Yachao Zhang, Fangyong Wang, Yuan Xie, Yanyun Qu:
Novel Category Discovery with X-Agent Attention for Open-Vocabulary Semantic Segmentation. 2929-2938
- Jingyuan Fang, Yang Ning, Xiushan Nie, Xinfeng Liu, Zhiyong Cheng:
VLHP: Learning Discriminative Vision-Language Hybrid Prototypes for Weakly Supervised Semantic Segmentation. 2939-2948
- Xin Li, Mingming Gong, Yunfei Wu, Jianxin Dai, Antai Guo, Xinghua Jiang, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun:
DREAM: Document Reconstruction via End-to-end Autoregressive Model. 2949-2957
- Longzhen Yang, Zhangkai Ni, Ying Wen, Yihang Liu, Lianghua He, Heng Tao Shen:
Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation. 2958-2967
- Wenxuan Yang, Qingqv Wei, Chenxi Ma, Weimin Tan, Bo Yan:
Scaling Laws for Data-Efficient Visual Transfer Learning. 2968-2976
- Pengcheng Zheng, Kecheng Chen, Jiaxin Huang, Bohao Chen, Ju Liu, Yazhou Ren, Xiaorong Pu:
Lightweight Medical Image Restoration via Integrating Reliable Lesion-Semantic Driven Prior. 2977-2986
- Kun-Hsiang Lin, Yu-Wen Tseng, Kang-Yang Huang, Jhih-Ciang Wu, Wen-Huang Cheng:
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing. 2987-2996
- Kai Niu, Liucun Shi, Ke Han, Qinzi Zhao, Yue Wu, Yanning Zhang:
Test-Time Adaptation for Text-Based Person Search. 2997-3006
- Si Chen, Yujia Chen, Xiaotian Yin, Xin Liu, Huakai Lai, Tianzhu Zhang:
PAF: Prototype Adaptive Fusion for Test-Time Adaptation of Vision-Language Models. 3007-3016
- Chunyan She, Fujun Han, Chengyu Fang, Shukai Duan, Lidan Wang:
Exploring Fourier Prior and Event Collaboration for Low-Light Image Enhancement. 3017-3026
- Liang Yao, Fan Liu, Delong Chen, Chuanyi Zhang, Yijun Wang, Ziyun Chen, Wei Xu, Shimin Di, Yuhui Zheng:
RemoteSAM: Towards Segment Anything for Earth Observation. 3027-3036
- Jiawei Ge, Xinyu Zhang, Jiuxin Cao, Xuelin Zhu, Weijia Liu, Qingqing Gao, Biwei Cao, Kun Wang, Chang Liu, Bo Liu, Chen Feng, Ioannis Patras:
Gen4Track: A Tuning-free Data Augmentation Framework via Self-correcting Diffusion Model for Vision-Language Tracking. 3037-3046
- Kangjie Chen, BingQuan Dai, Minghan Qin, Dongbin Zhang, Peihao Li, Yingshuang Zou, Haoqian Wang:
SLGaussian: Fast Language Gaussian Splatting in Sparse Views. 3047-3056
- Jo-Ku Cheng, Zeren Zhang, Ran Chen, Jingyang Deng, Ziran Qin, Jinwen Ma:
GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions. 3057-3066
- Hang Xiong, Runmin Cong, Jinpeng Chen, Chen Zhang, Feng Li, Huihui Bai, Sam Kwong:
MM-Prompt: Multi-modality and Multi-granularity Prompts for Few-Shot Segmentation. 3067-3075
- Jiawei Gu, Ziyue Qiao, Zechao Li:
Activation Shape Matters: OOD Detection with Norm-Entropy Fusion. 3076-3084
- Xinchen Ye, Aokai Zhang, Rui Xu:
Semantics-Driven Contrastive Learning for Real-World Depth Super Resolution. 3085-3093
- Jiawen Lin, Shiran Bian, Yihang Zhu, Wenbin Tan, Yachao Zhang, Yuan Xie, Yanyun Qu:
SeqVLM: Proposal-Guided Multi-View Sequences Reasoning via VLM for Zero-Shot 3D Visual Grounding. 3094-3103
- Yucheng Shu, Yaohui Wang, Lihong Qiao, Feiyan Li, Bin Xiao, Weisheng Li, Xinbo Gao:
The Overlooked Matters: Revisiting Background, Prototype, and Activation in Few-Shot Medical Image Segmentation. 3104-3113
- Jiaxin Peng, Siwang Zhou, Chengqing Li, Yucheng Li, Dunyun Chen:
Mitigating Delivery Artifacts in Real-World Video Super-Resolution. 3114-3123
- Wei Chen, Jianwei Niu, Xuefeng Liu, Xinghao Wu:
Decoupling Dense Video Captioning via Task-specific Prompts. 3124-3132
- Yongxin Li, Ying Cheng, Yaning Pan, Wen He, Qing Wang, Rui Feng, Xiaobo Zhang:
Semantic-Aware Hard Negative Mining for Medical Vision-Language Contrastive Pretraining. 3133-3142
- Jiale Li, Mingrui Wu, Zixiang Jin, Hao Chen, Jiayi Ji, Xiaoshuai Sun, Liujuan Cao, Rongrong Ji:
MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models. 3143-3152
- Hezhao Liu, Yang Lu, Mengke Li, Yiqun Zhang, Shreyank N. Gowda, Chen Gong, Hanzi Wang:
FATE: A Prompt-Tuning-Based Semi-Supervised Learning Framework for Extremely Limited Labeled Data. 3153-3162
- Wangsheng He, Wanru Xu, Ping Guo, Zhenjiang Miao, Yi Tian:
InstructStep: Fine-Grained Localization of Step Content and Relation in Instructional Video. 3163-3172
- Jiaqi Xu, Cuiling Lan, Yan Lu:
Deciphering Functions of Neurons in Vision-Language Models. 3173-3181
- Kamakshya Prasad Nayak, Kamalakar Vijay Thakare, Ashesh Xalxo, Lalit Lohani, Debi Prosad Dogra:
Can Person-Level Attributes Improve Group Re-Identification? 3182-3191
- Changshuo Wang, Shuting He, Xiang Fang, Fangzhe Nan, Prayag Tiwari:
Seeing the Overlooked: Bio-Visual Inspired Weak Saliency Feedback Transformer for Person Re-identification. 3192-3201
- Weihuang Lin, Yiwei Ma, Xiaoshuai Sun, Shuting He, Jiayi Ji, Liujuan Cao, Rongrong Ji:
HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation. 3202-3211
- Da Zhang, Feiyu Wang, Bingyu Li, Zhiyuan Zhao, Junyu Gao, Xuelong Li:
KAID: Knowledge-Aware Interactive Distillation for Vision-Language Models. 3212-3221
- Xiao Hu, Heiko Neumann, Jochen Lang:
A Filtering Framework for Semi-online Referring Video Object Segmentation. 3222-3231
- Ruiqi Dong, Wenjing Pang, Chenjie Pan, Hengyang Lu, Chenyou Fan:
StoryCrafter: Instance-Aligned Multi-Character Storytelling with Diffusion Policy Learning. 3232-3241
- Xiaohan Yu, Zicheng Pan, Yang Zhao, Qin Zhang, Yongsheng Gao:
Contrastive Lie Algebra Learning for Ultra-Fine-Grained Visual Categorization. 3242-3250
- Xiaoxing Hu, Kaicheng Yang, Jun Wang, Haoran Xu, Ziyong Feng, Yupei Wang:
Decoupled Global-Local Alignment for Improving Compositional Understanding. 3251-3260
- Jingxing Guo, Guilian Chen, Yimu Sun, Huisi Wu, Jing Qin:
EchoVim: Making Vision Mamba Docile for Echocardiography Video Segmentation via Dynamic Interaction and Semantic Token-attentive Refinement. 3261-3269
- Haifeng Zhao, Shuo Xu, Leilei Ma, Yufei Zhang, Lei Wang, Dengdi Sun:
Towards Space and Semantics: Object-Purified Representation Learning for Multi-Label Image Classification. 3270-3279
- Junyu Gao, Xuan Yao, Yong Rui, Changsheng Xu:
Building Embodied EvoAgent: A Brain-inspired Paradigm for Bridging Multimodal Large Models and World Models. 3280-3289
- Chen Feng, Nicu Sebe, Georgios Tzimiropoulos, Miguel R. D. Rodrigues, Ioannis Patras:
Unveiling Open-set Noise: Theoretical Insights into Label Noise. 3290-3299
- Zhongrui Gui, Junyu Xie, Tengda Han, Weidi Xie, Andrew Zisserman:
Character-Centric Understanding of Animated Movies. 3300-3309
- Ziyun Dai, Xiaoqiang Li, Shaohua Zhang, Yuanchen Wu, Jide Li:
See Different, Think Better: Visual Variations Mitigating Hallucinations in LVLMs. 3310-3319
- Cheng Ye, Weidong Chen, Peipei Song, Xinyan Liu, Lei Zhang, Zhendong Mao:
Multi-round Mutual Emotion-Cause Pair Extraction for Emotion-Attributed Video Captioning. 3320-3329
- Wenhao Zheng, Chenwei Sun, Wenbo Zhang, Jiancheng Lv, Xianggen Liu:
Target-Guided Bayesian Flow Networks for Quantitatively Constrained CAD Generation. 3330-3339
- Zhiyu Ye, Guowen Li, Haoyuan Liang, Zixi Wang, Shilei Cao, Yushan Lai, Juepeng Zheng:
Quantifying Samples with Invariance for Source-Free Class Incremental Domain Adaptation. 3340-3349
- Shuai Huang, Yongxiong Wang, Huan Luo, Haodong Jing, Chendong Qin, Jingqun Tang:
MINDEV: Multi-modal Integrated Diffusion Framework for Video Reconstruction from EEG Signals. 3350-3359
- Zhijie Rao, Jingcai Guo:
Balancing Cross-Modal Attention for Generalized Zero-Shot Learning. 3360-3369
- Zhenxuan Fang, Shuaibo Wang, Weisheng Dong, Junwei Xu, Fangfang Wu, Xin Li, Guangming Shi:
Beyond Visual Quality: Fidelity-Oriented Diffusion Model for Real-world Image Super-Resolution. 3370-3379
- Peng Ying, Zhongnian Li, Meng Wei, Xinzheng Xu:
Reversible Privacy Preserving on Vision-Language Models via Adversarial Multimodal Key. 3380-3389
- Taras Kucherenko, Derek Peristy, Judith Bütepage:
Evaluating the Evaluators: Towards Human-aligned Metrics for Missing Markers Reconstruction. 3390-3398
- Changho Choi, Youngwoo Shin, Gyojin Han, Dong-Jae Lee, Junmo Kim:
B4DL: A Benchmark for 4D LiDAR LLM in Spatio-Temporal Understanding. 3399-3407
- Fenghe Tang, Bingkun Nian, Jianrui Ding, Wenxin Ma, Quan Quan, Chengqi Dong, Jie Yang, Wei Liu, S. Kevin Zhou:
Mobile U-ViT: Revisiting large kernel and U-shaped ViT for efficient medical image segmentation. 3408-3417
- Ling You, Wenxuan Huang, Xinni Xie, Xiangyi Wei, Bangyan Li, Shaohui Lin, Yang Li, Changbo Wang:
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation. 3418-3427
- Bowen Guo, Shiwei Gan, Yafeng Yin, Xiao Liu, Zhiwei Jiang, Shunmei Meng:
Sentence-level Segmentation for Long Sign Language Videos with Captions. 3428-3437
- Jiayi Zou, Chaofan Chen, Bing-Kun Bao, Changsheng Xu:
DMC3: Dual-Modal Counterfactual Contrastive Construction for Egocentric Video Question Answering. 3438-3447
- Penglei Sun, Yaoxian Song, Xiangru Zhu, Xiang Liu, Qiang Wang, Yue Liu, Changqun Xia, Tiefeng Li, Yang Yang, Xiaowen Chu:
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning. 3448-3457
- Yuzhen Niu, Siling Chen, Yuzhong Chen, Fusheng Li, Rui Xu, Hui Da:
CoFiVLA: Synergistic Coarse-Fine Vision-Language Alignment for Image Aesthetic Assessment. 3458-3467
- Duolin Wang, Guanyu Xing, Yanli Liu:
FlowTrack: Integrating Adjacent-Frame Motion Tracking and Adaptive Prediction for Robust Semi-Supervised VOS. 3468-3476
- Lin Zhang, Yi Tian, Xiyun Wang, Wanru Xu, Yi Jin, Yaping Huang:
Differential Contrastive Training for Gaze Estimation. 3477-3486
- Tiancheng Gu, Kaicheng Yang, Chaoyi Zhang, Yin Xie, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai, Jiankang Deng:
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm. 3487-3496 - Yanting Pei

, Fan Yang
:
Adaptive Neighbors and Uncertainty Estimation for Source-Free Unsupervised Domain Adaptation with Noisy Labels. 3497-3506 - Bingshuai Liu

, Ante Wang
, Zijun Min
, Chenyang Lyu
, Longyue Wang
, Zhihao Wang
, Xu Han
, Peng Li
, Jinsong Su
:
EditEval: Towards Comprehensive and Automatic Evaluation for Text-guided Video Editing. 3507-3516 - Rui Chen

, Lei Sun
, Jing Tang
, Geng Li
, Xiangxiang Chu
:
FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos. 3517-3526 - Zizhi Chen

, Xinyu Zhang
, Minghao Han
, Yizhou Liu
, Ziyun Qian
, Weifeng Zhang
, Xukun Zhang
, Jingwei Wei
, Lihua Zhang
:
VLM-based Prompts as the Optimal Assistant for Unpaired Histopathology Virtual Staining. 3527-3536 - Zihao Mo

, Junye Chen
, Chaowei Fang
, Guanbin Li
:
PatchWiper: Leveraging Dynamic Patch-Wise Parameters for Real-World Visible Watermark Removal. 3537-3545 - Xueyu Yuan

, Jiarui Zhang
, Jiangqi Song
, Liu Liu
, Li Zhang
, Dan Guo
, Richang Hong
, Meng Wang
:
DFGAP: Towards Depth-Free Cross-Category GAParts Perception via Uncertainty-Quantified Modeling. 3546-3554 - Yudong Zhang

, Ruobing Xie
, Xingwu Sun
, Yiqing Huang
, Jiansheng Chen
, Zhanhui Kang
, Di Wang
, Yu Wang
:
DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models. 3555-3564 - Wenjie Zhu

, Yabin Zhang
, Xin Jin
, Wenjun Zeng
, Lei Zhang
:
Knowledge Regularized Negative Feature Tuning of Vision-Language Models for Out-of-Distribution Detection. 3565-3574 - Ji Ma

, Wei Suo
, Peng Wang
, Yanning Zhang
:
Short-LVLM: Compressing and Accelerating Large Vision-Language Models by Pruning Redundant Layers. 3575-3584 - Xudong Wang

, Lei Tan
, Pingyang Dai
, Liujuan Cao
, Rongrong Ji
:
GPT-ReID: Learning Fine-grained Representation with GPT for Text-based Person Retrieval. 3585-3594 - Runze Zhao

, Fuqing Zhu
, Jizhong Han
, Songlin Hu
:
Visual Perception Uncertainty Learning for Hallucination Detection in Large Vision-Language Models. 3595-3604 - Lei Liu

, Xiangdong Su
, Guanglai Gao
:
Fourier Self-Adaptation for Transferring General Pretrained Models to Specific Domains. 3605-3614 - Yiying Yang

, Fukun Yin
, Jiayuan Fan
, Wanzhang Li
, Xin Chen
, Gang Yu
:
Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE. 3615-3624 - Gefan Ye

, Lin Li
, Kexin Li
, Jun Xiao
, Long Chen
:
Zero-shot Compositional Action Recognition with Neural Logic Constraints. 3625-3634 - Yijun Wang

, Siying Wu
, Lubin Gan
, Zheyu Zhang
, Jing Zhang
, Zhangchi Hu
, Huyue Zhu
, Peixi Wu
, Xiaoyan Sun
:
MeDKCoOp: Dual Knowledge-guided Graph Prompt Learning for Biomedical Vision-Language Models. 3635-3644 - Jianhui Wang

, Yangfan He
, Yan Zhong
, Xinyuan Song
, Jiayi Su
, Yuheng Feng
, Ruoyu Wang
, Hongyang He
, Wenyu Zhu
, Xinhang Yuan
, Miao Zhang
, Keqin Li
, Jiaqi Chen
, Tianyu Shi
, Xueqian Wang
:
Twin Co-Adaptive Dialogue for Progressive Image Generation. 3645-3653 - Jiayuan Rao

, Zifeng Li
, Haoning Wu
, Ya Zhang
, Yanfeng Wang
, Weidi Xie
:
Multi-Agent System for Comprehensive Soccer Understanding. 3654-3663 - Yuguang Zhang

, Qihang Fan
, Huaibo Huang
:
Vision Transformer with Sparse Scan Prior. 3664-3672 - Shaohui Dai

, Yansong Qu
, Zheyan Li
, Xinyang Li
, Shengchuan Zhang
, Liujuan Cao
:
Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs. 3673-3682 - Qinchen Wu

, Difei Gao
, Qinghong Lin
, Zhuoyu Wu
, Mike Zheng Shou
:
GUI-Narrator: Detecting and Captioning Computer GUI Actions. 3683-3692 - Liangyu Fu

, Junbo Wang
, Yuke Li
, Qiangguo Jin
, Hongsong Wang
, Jing Ya
, Linjiang Huang
, Liang Yao
, Jiangbin Zheng
, Xuecheng Wu
, Zhiyong Wang
:
DSACap: Enhancing Visual-Semantic Alignment with Diffusion-based Framework for Image Captioning. 3693-3701 - Meng Wei

, Zhongnian Li
, Peng Ying
, Xinzheng Xu
:
Seeing the Undefined: Chain-of-Action for Generative Semantic Labels. 3702-3711 - Zikang Liu

, Kun Zhou
, Wayne Xin Zhao
, Dawei Gao
, Yaliang Li
, Ji-Rong Wen
:
Less is More: High-value Data Selection for Visual Instruction Tuning. 3712-3721 - Mengzu Liu

, Junwei Xu
, Tao Huang
, Fangfang Wu
, Le Dong
, Xin Li
, Weisheng Dong
:
Exploring Global Correlations via Polarity Memory for Multispectral Demosaicing. 3722-3730 - Zhaofeng Shi

, Heqian Qiu
, Lanxiao Wang
, Qingbo Wu
, Fanman Meng
, Hongliang Li
:
Unsupervised Ego- and Exo-centric Dense Procedural Activity Captioning via Gaze Consensus Adaptation. 3731-3740 - Chao Yin

, Hao Li
, Kequan Yang
, Jide Li
, Pinpin Zhu
, Xiaoqiang Li
:
Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation. 3741-3750 - Shanding Diao

, Yang Zhao
, Yuan Chen
, Zhao Zhang
, Wei Jia
, Ronggang Wang
:
Multi-Layer Gaussian Splatting for Single-Image Feed-Forward Spatial Scene Reconstruction. 3751-3759 - Yang Ren

, Hai Jiang
, Wei Li
, Menglong Yang
, Heng Zhang
, Zehua Sheng
, Qingsheng Ye
, Shuaicheng Liu
:
Learning Arbitrary-Scale RAW Image Downscaling with Wavelet-based Recurrent Reconstruction. 3760-3768 - Wenqi Zeng

, Yuqi Sun
, Chenxi Ma
, Weimin Tan
, Bo Yan
:
MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks. 3769-3778 - Zelei Wu

, Xulun Ye
, Jieyu Zhao
:
Clustering-Based Tail-class Mitigation for New-class Discovery. 3779-3787 - Siqi Song

, Limin Yu
, Jimin Xiao
:
SDP: Spectral-Decomposed Prompting for Continual Learning. 3788-3797 - Shubo Liu

, Hongsheng Zhang
, Qian Qiao
, Qi Wu
, Peng Wang
:
VLN-ChEnv: Vision-language Navigation in Changeable Environments. 3798-3807 - Kedong Xiu

, Sai Qian Zhang
:
CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models. 3808-3816 - Fan Yang

, Ling Deng
, Zhiyong Gan
, Qisheng He
, Yuanbo Fang
, Xiangmin Xu
, Shuangping Huang
, Tianshui Chen
:
Optimal Feature Embedding for Document Large Visual Language Model. 3817-3826 - Lin Li

, Guikun Chen
, Zhen Wang
, Jun Xiao
, Long Chen
:
Compositional Zero-shot Learning via Progressive Language-based Observations. 3827-3836 - Weimin Cheng

, Zhenyu Wang
, Tao Huang
, Fangfang Wu
, Weisheng Dong
:
Pushing the Limit of Binarized Neural Network for Image Super Resolution with Smooth Information Transmission. 3837-3846 - Xiang Ma

, Litian Xu
, Lexin Fang
, Caiming Zhang
, Lizhen Cui
:
Reliable Cross-modal Alignment via Prototype Iterative Construction. 3847-3855 - Ran Chen

, Taiyi Su
, Hanli Wang
:
WaveCL: Wavelet Calibration Learning for Referring Video Object Segmentation. 3856-3864 - Jingxing Guo

, Guilian Chen
, Yimu Sun
, Huisi Wu
, Jing Qin
:
Hierarchical Spatiotemporal Context Aggregation and Speckle-aware Deformable Convolution for Echocardiography Video Segmentation. 3865-3874 - Junkang Liu

, Fanhua Shang
, Yuxuan Tian
, Hongying Liu
, Yuanyuan Liu
:
Consistency of Local and Global Flatness for Federated Learning. 3875-3883 - Yangxu Yin

, Honglong Chen
, Yudong Gao
, Peng Sun
, Liantao Wu
, Zhe Li
, Weifeng Liu
:
FFCBA: Feature-based Full-target Clean-label Backdoor Attacks. 3884-3892 - Sijing Li

, Tianwei Lin
, Lingshuai Lin
, Wenqiao Zhang
, Jiang Liu
, Xiaoda Yang
, Juncheng Li
, Yucheng He
, Xiaohui Song
, Jun Xiao
, Yueting Zhuang
, Beng Chin Ooi
:
EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model. 3893-3902 - Changtao Miao

, Qi Chu
, Tao Gong
, Zhentao Tan
, Zhenchao Jin
, Wanyi Zhuang
, Man Luo
, Honggang Hu
, Nenghai Yu
:
Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization. 3903-3912 - Shanshan Li

, Jiawei Hou
, Da Huang
, Yanwei Fu
, Xiangyang Xue
:
Ali-UI: Enhancing Complex Vision-Language Navigation with Alignment of Unified Map and Instruction Parsing. 3913-3922 - Ziming Zhao

, Zhaoxuan Li
, Tingting Li
, Fan Zhang
:
Stealthy-AE: Generating Stealthy Adversarial Examples through Online Social Networks. 3923-3931 - Hanning Chen

, Yang Ni
, Wenjun Huang
, Hyunwoo Oh
, Yezi Liu
, Tamoghno Das
, Mohsen Imani
:
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation. 3932-3941 - Yonghyeon Jo

, Janghyun Kim
, Jinsun Park
:
BAC-GCN: Background-Aware CLIP-GCN Framework for Unsupervised Multi-Label Classification. 3942-3951 - Dingwei Zhang

, Dong Zhang
, Jinhui Tang
:
Mitigating Query Selection Bias in Referring Video Object Segmentation. 3952-3961 - Xiangyu Shan

, Heng Song
, Junwu Zhu
:
DFCNet: Dual-Factor Compensatory Clustering Network for Modality-Imbalanced Generalized Zero-Shot Learning. 3962-3971 - Zhiyuan Fan

, Keyi Liang
:
Video-to-Image Affordance Grounding via Visual Conceptual Learning. 3972-3980 - Qiyan Zhao

, Xiaofeng Zhang
, Yiheng Li
, Yun Xing
, Xiaosong Yuan
, Feilong Tang
, Sinan Fan
, Xuhang Chen
, Da-Han Wang
, Xu-Yao Zhang
:
MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models. 3981-3990 - Dexuan Xu

, Yanyuan Chen
, Yu Huang
, Shihao E
, Yiwei Lou
, Yongzhi Cao
, Hanpin Wang
, Meikang Qiu
:
Medical Vision-Language Pre-training with Multimodal Variational Masked Autoencoder for Robust Medical VQA. 3991-4000 - Yili Li

, Gang Xiong
, Gaopeng Gou
, Xiangyan Qu
, Jiamin Zhuang
, Zhen Li
, Junzheng Shi
:
T2VParser: Adaptive Decomposition Tokens for Partial Alignment in Text to Video Retrieval. 4001-4009 - Yizhi Hu

, Zezhao Tian
, Xingqun Qi
, Chen Su
, Bingkun Yang
, Junhui Yin
, Muyi Sun
, Man Zhang
, Zhenan Sun
:
ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension. 4010-4019 - Xiaoqin Wang

, Xianxu Hou
, Meidan Ding
, Junliang Chen
, Kaijun Deng
, Jinheng Xie
, Linlin Shen
:
DisFaceRep: Representation Disentanglement for Co-occurring Facial Components in Weakly Supervised Face Parsing. 4020-4029 - Zhenni Yu

, Li Zhao
, Guobao Xiao
, Xiaoqin Zhang
:
SAM-TTT: Segment Anything Model via Reverse Parameter Configuration and Test-Time Training for Camouflaged Object Detection. 4030-4038 - Jing Ma

, Haochen Sun
, Zeyuan Zang
, Fangxiang Feng
, Caixia Yuan
, Lei Ren
, Huixing Jiang
, Wei Chen
, Xiaojie Wang
:
VL-DynaRefine: A Vision-Language Dynamic Refinement Approach for Visual Reasoning. 4039-4047 - Jiao Chen

, Jiayi He
, Fangfang Chen
, Zuohong Lv
, Jianhua Tang
:
Forward-Only Continual Learning. 4048-4057 - Jiahua Bao

, Siyao Cheng
, Jiaxing Du
, Changjiang He
, Zeming Lang
, Hao Zhang
, Jie Liu
:
BOLT: Fewer Tokens but More Performance Retention for Efficient Vision-Language Models Inference. 4058-4067 - Ziqi Yuan

, Jun Li
, Yanghao Li
, Yuxiang Huang
, Chi Chen
, Shuo Wang
, Zhinan Gou
:
CITR: Efficient Long Video Understanding Needs Causal Importance. 4068-4076 - Qi Li

, Yucan Zhou
, Jiang Zhou
, XingYou Yang
, Xiaoyan Gu
:
Diverse and Public Features Cooperation via Gradient Rectification for Federated Prompt Learning. 4077-4086 - Shilei Wang

, Gong Cheng
, Pujian Lai
, Dong Gao
, Junwei Han
:
Multi-State Tracker: Enhancing Efficient Object Tracking via Multi-State Specialization and Interaction. 4087-4096 - Xinyu Zhang

, Lingling Zhang
, Yanrui Wu
, Muye Huang
, Jun Liu
:
Cognitive Predictive Coding Network: Rethinking the Generalization in Raven's Progressive Matrices. 4097-4106 - Xiaoxuan Mu

, Haoyu Tang
, Han Jiang
, Tianyuan Liang
, Qinghai Zheng
, Jihua Zhu
:
FACE: A Dual-Template and Adaptive Curriculum Framework for Unsupervised Text-Based Person Search. 4107-4116 - Xinyu Huang

, Yi-Jie Huang
, Youcai Zhang
, Weiwei Tian
, Rui Feng
, Yuejie Zhang
, Yanchun Xie
, Yaqian Li
, Lei Zhang
:
Open-Set Image Tagging with Multi-Grained Text Supervision. 4117-4126 - Zhihao Wang

, Shiyu Liu
, Zhiwei He
, Kangjie Zheng
, Liangying Shao
, Junfeng Yao
, Jinsong Su
:
Gloss Matters: Unlocking the Potential of Non-Autoregressive Sign Language Translation. 4127-4136 - Jiye Xie

, Yifei Gao
, Liangliang You
, Xiang Xu
, Haoran Xu
, Zhiqiang Kou
, Kexue Fu
, Youyang Qu
, Wenjie Yang
, Jianwei Guo
, Weiliang Meng
, Longxiang Gao
, Haoran Yang
, Changwei Wang
, Yu Zhang
:
Collaboration Wins More: Dual-Modal Collaborative Attention Reinforcement for Mitigating Large Vision Language Models Hallucination. 4137-4146 - Xinzhe Xia

, Weiguang Zhao
, Yuyao Yan
, Guanyu Yang
, Rui Zhang
, Kaizhu Huang
, Xi Yang
:
Towards Training-Free Open-World Classification with 3D Generative Models. 4147-4155 - Mingyu Fu

, Wei Suo
, Ji Ma
, Lin Yuanbo Wu
, Peng Wang
, Yanning Zhang
:
Mitigating Information Loss under High Pruning Rates for Efficient Large Vision Language Models. 4156-4165 - Zijun Xu

, Jiahao Guo
, Chunjie Zhang
, Zhongyuan Wang
, Chunxia Xiao
, Chao Liang
:
Quantum Interference-Inspired Who-What-Where Composite-Semantics Instance Search for Story Videos. 4166-4174 - Lihong Qiao

, Shiyi Gao
, Yucheng Shu
, Bin Xiao
, Weisheng Li
, Xinbo Gao
:
Pathology-Aware Reconstruction with Discriminative Knowledge Boosting Alignment for Che-Xray Vision-Language Pre-training. 4175-4184 - Rongzhen Zhao

, Yi Zhao
, Juho Kannala
, Joni Pajarinen
:
Slot Attention with Re-Initialization and Self-Distillation. 4185-4192 - Qucheng Peng

, Chen Bai
, Guoxiang Zhang
, Bo Xu
, Xiaotong Liu
, Xiaoyin Zheng
, Chen Chen
, Cheng Lu
:
NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving. 4193-4202 - Zizhuo Li

, Chunbao Su
, Fan Fan
, Jun Huang
, Jiayi Ma
:
CorrNeXt: Making the ConvNet-Style Correspondence Pruner Stronger for Two-View Geometry. 4203-4212 - Jinxu Zhang

, Qiyuan Fan
, Yongqi Yu
, Yu Zhang
:
DREAM: Integrating Hierarchical Multimodal Retrieval with Multi-page Multimodal Language Model for Documents VQA. 4213-4221 - Junyi Wang

, Yue Qi
:
Visual Localization using Hybrid Feature Grid and Learned Weighted Global Point Cloud. 4222-4231 - Yifan Zhang

, Yang Shi
, Weichen Yu
, Qingsong Wen
, Xue Wang
, Wenjing Yang
, Zhang Zhang
, Liang Wang
, Rong Jin
:
Debiasing Multimodal Large Language Models via Penalization of Language Priors. 4232-4241 - Xiaolei Bo

, Feiyang Yang
, Feilong Xu
, Xiaoli Zhang
:
Cross-Counter-Repeat Attention for Enhanced Understanding of Visual Semantics in Radiology Report Generation. 4242-4250 - Jiacheng Ruan

, Zongyun Zhang
, Jingsheng Gao
, Wenzhen Yuan
, Ting Liu
, Yuzhuo Fu
:
MPI-CD: Multi-Path Information Contrastive Decoding for Mitigating Hallucinations in Large Vision-Language Models. 4251-4260 - Hao Sun

, Fenggen Yu
, Huiyao Xu
, Tao Zhang
, Changqing Zou
:
LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis. 4261-4270 - Hongchen Wei

, Zhenzhong Chen
:
RealVG: Unleashing MLLMs for Training-Free Spatio-Temporal Video Grounding in the Wild. 4271-4280 - Hongchen Wei

, Zhenzhong Chen
:
Visual Context Window Extension: A New Perspective for Long Video Understanding. 4281-4289 - Yu Liu

, Kun Sun
, Chang Tang
, Yuhua Qian
, Xin Li
:
TPDepth: Leveraging Text Prompts with ControlNet to Boost Diffusion-based Depth Estimation. 4290-4299 - Yingxin Lai

, Hongyang Wang
, Jing Yang
, Xiangui Kang
, Bin Li
, Linlin Shen
, Zitong Yu
:
GM-DF: Generalized Multi-Scenario Deepfake Detection. 4300-4309 - Kun Zhai

, Siheng Chen
, Xingjun Ma
, Yu-Gang Jiang
:
FedAPT: Federated Adversarial Prompt Tuning for Vision-Language Models. 4310-4318 - Jie Wan

, Jianhao Fu
, Ziqi Yang
, Kui Ren
:
BTUAP: Boosting the Transferability of Universal Adversarial Perturbations in the Black-box Setting under various data dependencies. 4319-4328 - Hui Wu

, Haoquan Zhai
, Yuchen Li
, Hengyi Cai
, Peirong Zhang
, Yidan Zhang
, Lei Wang
, Chunle Wang
, Yingyan Hou
, Shuaiqiang Wang
, Dawei Yin
:
MARA: A Multimodal Adaptive Retrieval-Augmented Framework for Document Question Answering. 4329-4338 - Bocheng Pan

, Hailong Shi
, Xingyu Gao
:
DR-VQA: Decompose-then-Reconstruct for Visual Question Answering in BLV Assistance. 4339-4348 - Wei Jia

, Li Jin
, Kaiwen Wei
, Yuying Shang
, Nayu Liu
, Zhicong Lu
, Qing Liu
, Linhao Zhang
, Jiang Zhong
, Yanfeng Hu
:
U-MERE: Unconstrained Multimodal Entity and Relation Extraction with Collaborative Modeling and Order-Sensitive Optimization. 4349-4358 - Luyao Ren

, Wenxin Yu
, Zhiqiang Zhang
, Chang Liu
:
EMIFS: Efficient Multi-scale Information Fusion Self-supervision for Medical Image Segmentation. 4359-4368 - Chenxi Zhang

, Qing Zhang
, Jiayun Wu
, Youwei Pang
:
CGCOD: Class-Guided Camouflaged Object Detection. 4369-4377 - Wenzheng Yang

, Songwei Pei
, Bingfeng Liu
, Qian Li
, Shangguang Wang
:
OGDepth: Leveraging Object Guidance in Diffusion Models for Enhanced Monocular Depth Estimation. 4378-4387 - Xueyi Zhang

, Peiyin Zhu
, Yuan Liao
, Xiyu Wang
, Mingrui Lao
, Siqi Cai
, Yanming Guo
, Haizhou Li
:
TrustCLIP: Learning from Noisy Labels via Semantic Label Verification and Trust-aligned Gradient Projection. 4388-4397 - Yikun Ji

, Yan Hong
, Jiahui Zhan
, Haoxing Chen
, Jun Lan
, Huijia Zhu
, Weiqiang Wang
, Liqing Zhang
, Jianfu Zhang
:
Towards Explainable Fake Image Detection with Multi-Modal Large Language Models. 4398-4407 - Xiaodong Wang

, Hongmin Hu
, Fei Yan
, Junwen Lu
, Zhiqiang Zeng
, Weidong Hong
, Zhedong Zheng
:
UniAD: Integrating Geometric and Semantic Cues for Unified Anomaly Detection. 4408-4417 - Runwei Situ

, Yi Cai
, Yong Xu
, Jiexin Wang
:
Ground and Reconstruct: Entity-Region Bidirectional Alignment Pre-Training for Low-Resource GMNER. 4418-4426 - Yongquan Xue

, Zhaoru Guo
, Zhaozhao Su
, Chong Peng
, Jun Feng
, Pan Zhou
, Marcin Pietron
, Xiyuan Wang
, Liejun Wang
, Panpan Zheng
:
Rodecon-net: Medical Image Segmentation via Robust Decoupling and Contrast-enhanced Fusion. 4427-4435 - Wenxi Huang

, Xiaojun Chen
, Qin Zhang
, Ting Wan
, Ziqi Liu
, Liangjie Zhang
:
MRBench: A Multi-Image Reasoning Benchmark with Adaptive Knowledge Retrieval. 4436-4445 - Xuanliu Zhu

, Yiqiao Chai
, Runnan Li
, Mingying Lan
, Li Gao
:
CrossMind-VL: Multi-Subject Mind-to-Video Decoding with Multimodal LLM Semantic Grounding. 4446-4454 - Jiaqing Fan

, Hanwen Qian
, Mengjuan Jiang
, Fanzhang Li
:
PeriodVOS: Learning Periodic Patterns for Unsupervised Video Object Segmentation via Adaptive Contextual Coupling. 4455-4463 - Xiangzhao Hao

, Kuan Zhu
, Hongyu Guo
, Haiyun Guo
, Ning Jiang
, Quan Lu
, Ming Tang
, Jinqiao Wang
:
Referring Expression Instance Retrieval and A Strong End-to-End Baseline. 4464-4473 - Lifeng Lin

, Rongfeng Lu
, Quan Chen
, Haofan Ren
, Ming Lu
, Yaoqi Sun
, Chenggang Yan
, Anke Xue
:
VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number Control. 4474-4483 - Sidun Liu

, Wenyu Li
, Peng Qiao
, Yong Dou
:
Regist3R: Incremental Registration with Stereo Foundation Model. 4484-4493 - Zichi Liu

, Yinggui Wang
, Tao Wei
, Chao Ma
:
AnchorSync: Global Consistency Optimization for Long Video Editing. 4494-4503 - Hongxu Ma

, Chenbo Zhang
, Lu Zhang
, Jiaogen Zhou
, Jihong Guan
, Shuigeng Zhou
:
Fine-grained Zero-Shot Object Detection. 4504-4513 - Hongxu Ma

, Guanshuo Wang
, Fufu Yu
, Qiong Jia
, Shouhong Ding
:
MS-DETR: Towards Effective Video Moment Retrieval and Highlight Detection by Joint Motion-Semantic Learning. 4514-4523 - Hao Ruan

, Jinliang Lin
, Yingxin Lai
, Zhiming Luo
, Shaozi Li
:
HCCM: Hierarchical Cross-Granularity Contrastive and Matching Learning for Natural Language-Guided Drones. 4524-4533 - Yun Li

, Lina Yao
, Zhe Liu
:
Compositional Zero-Shot Learning with Contextualized Cues and Adaptive Contrastive Training. 4534-4541 - Zhuming Wang

, Yihao Zheng
, Jiarui Li
, Yaofei Wu
, Yan Huang
, Zun Li
, Lifang Wu
, Liang Wang
:
VicKAM: Visual Conceptual Knowledge Guided Action Map for Weakly Supervised Group Activity Recognition. 4542-4551 - Yuzhen Li

, Min Liu
, Yuan Bian
, Xueping Wang
, Zhaoyang Li
, Gen Li
, Yaonan Wang
:
Dual Enhancement on 3D Vision-Language Perception for Monocular 3D Visual Grounding. 4552-4561 - Yiliang Zhu

, Dayan Wu
, Qinghang Su
, Zexian Yang
, Zheng Lin
, Weiping Wang
:
Mitigating the Evolving Semantic Entanglement in Continual Learning of Vision-Language Models. 4562-4570 - Xiongwei Dang

, Wenxuan Liu
, Xian Zhong
, Zheng Wang
:
SegTraj: A Segmented-Trajectory-Aware Spatio-Temporal Graph Convolutional Network for Social Group Detection. 4571-4579 - Sifan Zuo

, Youfa Liu
, Bo Du
:
CSDN: CLIP-Driven Similarity-Aligned Distillation Network for Weakly-Supervised Object Localization. 4580-4589 - Dirui Xie

, Xiaofang Hu
, Zihan Wei
, Zhengqiqi Yang
, Yanlian Jiang
, Yue Zhou
:
Learning Structural Priors via Laplacian RWKV Diffusion with Light-Effect Dataset for Nighttime Visibility Enhancement. 4590-4599 - Biao Chen

, Kunbin He
, Zhikun Zheng
, Mengmeng Jing
, Lin Zuo
:
Chain-of-Thought Guided Semantic Debiasing for Low-Shot Vision-Language Tasks. 4600-4609 - Shengli Zhou

, Yang Liu
, Feng Zheng
:
Learn 3D VQA Better with Active Selection and Reannotation. 4610-4618 - Kun Ding

, Ying Wang
, Shiming Xiang
:
EvoVLMA: Evolutionary Vision-Language Model Adaptation. 4619-4628 - Yang Liu

, Zhiyong Zhang
:
DSP: Dense-Sparse Parallel Networks for Self-supervised 3D Multi-person Pose Estimation from Multiple Views. 4629-4638 - Meng Chu

, Yicong Li
, Tat-Seng Chua
:
GraphVideoAgent: Enhancing Long-form Video Understanding with Entity Relation Graphs. 4639-4648 - Hancong Wang

, Yue Yu
, Hairong Zheng
, Tong Zhang
:
Test-Time Adaptation of Medical Vision-Language Models with Mixture of Modality Experts. 4649-4658 - Zixuan Wan

, Jiqing Zhang
, Yushan Wang
, Hu Lin
, Yafei Wang
, Zetian Mi
, Xin Yang
, Xianping Fu
, Huibing Wang
:
Eye-based Emotion Recognition via Event-Driven Sparse Transformers. 4659-4668 - Guoxin Zhang

, Zhonghong Ou
, Kaiwen Xue
, Jiangfeng Sun
, Yifan Zhu
, Siyuan Yao
, Yiran Shen
, Meina Song
:
DGFSD: Bridging the Gap between Dense and Sparse for Fully Sparse 3D Object Detection. 4669-4678 - Benlong Wu

, Yuang Qi
, Xiuwei Shang
, Weiming Zhang
, Nenghai Yu
, Kejiang Chen
:
MMPro: A Decoupled Perception-Thinking-Execution Framework for Secure GUI Agent. 4679-4687 - Shengqian Zhu

, Chengrong Yu
, Wenbo Qi
, Jiafei Wu
, Ying Song
, Guangjun Li
, Zhang Yi
, Xiaogang Xu
, Junjie Hu
:
PRIME: Prototype-Driven Class Incremental Learning for Medical Image Segmentation. 4688-4697 - Qile Su

, Shoutai Zhu
, Shuai Zhang
, Baoyu Liang
, Chao Tong
:
EventFormer: A Node-graph Hierarchical Attention Transformer for Action-centric Video Event Prediction. 4698-4707 - Haijing Liu

, Tao Pu
, Hefeng Wu
, Keze Wang
, Liang Lin
:
DART: Dual Adaptive Refinement Transfer for Open-Vocabulary Multi-Label Recognition. 4708-4717 - Mahiro Ukai

, Shuhei Kurita
, Nakamasa Inoue
:
STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models. 4718-4727 - Shiying Lin

, Rong Hu
, Zuoyong Li
, Qinghua Lin
, Jiawei Wu
, Changqing Zhang
:
Gradient-Aware Revitalization of Non-Effective Samples in Medical Image Segmentation. 4728-4737 - Chang Su

, Beihong Jin
, Fusang Zhang
, Siheng Li
, Zhi Wang
:
Self-Supervised Human Mesh Recovery from Partial Point Cloud via a Self-Improving Loop. 4738-4747 - Ruoxuan Li

, Xiangyu Wu
, Yang Yang
:
Noise Self-Correction via Relation Propagation for Robust Cross-Modal Retrieval. 4748-4757 - Yangyang Xu

, Xi Ye
, Duo Su
:
Multi-Task Dense Prediction Fine-Tuning with Mixture of Fine-Grained Experts. 4758-4767 - Siran Peng

, Tianshuo Zhang
, Li Gao
, Xiangyu Zhu
, Haoyuan Zhang
, Kai Pang
, Zhen Lei
:
WMamba: Wavelet-based Mamba for Face Forgery Detection. 4768-4777 - Nanxing Hu

, Xiaoyue Duan
, Jinchao Zhang
, Guoliang Kang
:
Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models. 4778-4787 - Yiwen Liang

, Hui Chen
, Yizhe Xiong
, Zihan Zhou
, Mengyao Lyu
, Zijia Lin
, Shuaicheng Niu
, Sicheng Zhao
, Jungong Han
, Guiguang Ding
:
Advancing Reliable Test-Time Adaptation of Vision-Language Models under Visual Variations. 4788-4797 - Chunpeng Wang

, Wenlong Ma
, Li Zou
, Zhiqiu Xia
, Qi Li
, Bin Ma
, Yunan Liu
:
Toward Robust Deepfake Detection: A Proactive Method Based on Watermarking and Knowledge Distillation. 4798-4807 - Futa Waseda

, Saku Sugawara
, Isao Echizen
:
Quality Text, Robust Vision: The Role of Language in Enhancing Visual Robustness of Vision-Language Models. 4808-4816 - Zhenghao Liu

, Xingsheng Zhu
, Tianshuo Zhou
, Xinyi Zhang
, Xiaoyuan Yi
, Yukun Yan
, Ge Yu
, Maosong Sun
:
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts. 4817-4826 - Garry Yang

, Zizhe Chen
, Man Hon Wong
, Haoyu Lei
, Yongqiang Chen
, Zhenguo Li
, Kaiwen Zhou
, James Cheng
:
MESH - Understanding Videos Like Human: Measuring Hallucinations in Large Video Models. 4827-4836 - Jiawei Zheng

, Feiyan Liu
, Xiaoli Wang
:
Seeing Through Ambiguity: Effective Video-guided Machine Translation via Chaotic Fusion and Causally Aligned Spatio-temporal Attention. 4837-4845 - Qingqing Fang

, Wenxi Lv
, Qinliang Su
:
AF-CLIP: Zero-Shot Anomaly Detection via Anomaly-Focused CLIP Adaptation. 4846-4855 - Qiqi Zhan

, Shiwei Li
, Qingjie Liu
, Yunhong Wang
:
AttriPrompt: Dynamic Prompt Composition Learning for CLIP. 4856-4865 - Rui Pan

, Ruiying Lu
:
SP-Mamba: Spatial-Perception State Space Model for Unsupervised Medical Anomaly Detection. 4866-4874 - Feiran Liu

, Yuzhe Zhang
, Xinyi Huang
, Yinan Peng
, Xinfeng Li
, Lixu Wang
, Yutong Shen
, Ranjie Duan
, Simeng Qin
, Xiaojun Jia
, Qingsong Wen
, Wei Dong
:
The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework. 4875-4883 - Xianrun Xu

, Baoyao Yang
, Wanyun Li
, Jingsong Lin
, Yufei Xu
:
Simple but Effective: Sub-Volume Contrastive Learning for Class-Imbalanced Semi-Supervised 3D Medical Image Segmentation. 4884-4893 - Junlin Fang

, Wenya Wang
, Lingli Zhang
, Fengmao Lv
:
Why is a Bird's Caption a Good Demonstration? Towards Effective Multimodal In-Context Learning without Dedicated Data. 4894-4903 - Xide Xu

, Sandesh Kamath
, Muhammad Atif Butt
, Bogdan Raducanu
:
An h-space Based Adversarial Attack for Protection Against Few-shot Personalization. 4904-4913 - Yiqing Hao

, Yangru Huang
, Yi Jin
, Tao Wang
, Yidong Li
, Yigang Cen
:
Tree of Prompts: Aligning Hierarchical Visual Prior for Continual Generalized Category Discovery. 4914-4922 - Wenxiang Liu

, Yongkang Liu
, Weiliang Meng
, Gaoqi He
, Jianhua Li
:
D3L: Curvature-Constrained Denoising Diffusion Model for 3D Lane Detection. 4923-4931 - Bingcai Wei

, Hui Liu
, Chuang Qian
, Zijian Li
, Wangyu Wu
, Zijie Meng
:
Robust Single Image Sand Removal by Leveraging Uncertainty-aware SAM Priors and Prompt Learning with Refined Perceptual Loss. 4932-4941 - Ziyan Liu

, Junwen Li
, Kaiwen Li
, Tong Ruan
, Chao Wang
, Xinyan He
, Zongyu Wang
, Xuezhi Cao
, Jingping Liu
:
I2CR: Intra- and Inter-modal Collaborative Reflections for Multimodal Entity Linking. 4942-4951 - Changzhou Li

, Xinyu Yang
, Weiguo Yang
, Xinyi Li
:
VaF-LangSplat: Voxel-Aware Fusion Language Gaussian Splatting. 4952-4961 - Chang Huang

, Jiahang Cao
, Jun Ma
, Kieren Yu
, Cong Li
, Huayong Yang
, Kaishun Wu
:
DACA-Net: A Degradation-Aware Conditional Diffusion Network for Underwater Image Enhancement. 4962-4971 - Muzhi Dai

, Jiashuo Sun
, Zhiyuan Zhao
, Shixuan Liu
, Rui Li
, Junyu Gao
, Xuelong Li
:
From Captions to Rewards (CaReVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models. 4972-4981 - Yimou Guo

, Yaochen Li
, Jingze Liu
, Jiahui Feng
, Haoyi Lou
, Zhimin Chen
, Yuan Gao
, Yuanqi Su
:
Image Captioning with Multimodal Guidance and Search Space Optimization. 4982-4991 - Yongqi Li

, Lu Yang
, Jian Wang
, Runyang You
, Wenjie Li
, Liqiang Nie
:
Towards Harmless Multimodal Assistants with Blind Preference Optimization. 4992-5000 - Donglu Yang

, Liang Zhang
, Zihao Yue
, Liangyu Chen
, Yichen Xu
, Wenxuan Wang
, Qin Jin
:
ChartM3: Benchmarking Chart Editing with Multimodal Instructions. 5001-5009 - Hao Cheng

, Erjia Xiao
, Jiayan Yang
, Jinhao Duan
, Yichi Wang
, Jiahang Cao
, Qiang Zhang
, Le Yang
, Kaidi Xu
, Jindong Gu
, Renjing Xu
:
Transfer Attack for Bad and Good: Explain and Boost Adversarial Transferability across Multimodal Large Language Models. 5010-5019 - Shibo Sun

, Xue Li
, Donglin Di
, Mingjie Wei
, Lanshun Nie
, Weinan Zhang
, Dechen Zhan
, Yang Song
, Lei Fan
:
LLaPa: A Vision-Language Model Framework for Counterfactual-Aware Procedural Planning. 5020-5029 - Guoxin Zang

, Xue Li
, Donglin Di
, Lanshun Nie
, Dechen Zhan
, Yang Song
, Lei Fan
:
SAGE: A Visual Language Model for Anomaly Detection via Fact Enhancement and Entropy-aware Alignment. 5030-5039 - Zhipeng Tang

, Sha Zhang
, Jiajun Deng
, Chenjie Wang
, Guoliang You
, Yuting Huang
, Xinrui Lin
, Yanyong Zhang
:
VLMPlanner: Integrating Visual Language Models with Motion Planning. 5040-5049 - Zhiqing Cui

, Jiahao Yuan
, Hanqing Wang
, Yanshu Li
, Chenxu Du
, Zhenglong Ding
:
Draw with Thought: Unleashing Multimodal Reasoning for Scientific Diagram Generation. 5050-5059 - Jinghan Yang

, Zhenbo Xu
, Dehua Ma
, Liu Liu
, Fei Liu
, Gong Huang
, Zhaofeng He
:
RecipeRAG: Advancing Recipe Generation with Reinforced Retrieval Augmented Generation. 5060-5069 - Shiqi Zhang

, Sha Zhang
, Jiajun Deng
, Yedong Shen
, Mingxiao Ma
, Yanyong Zhang
:
PGOV3D: Open-Vocabulary 3D Semantic Segmentation with Partial-to-Global Curriculum. 5070-5079 - Xinyao Li

, Dan Zhang
, Zhekai Du
, Lei Zhu
, Zhi Chen
, Jingjing Li
:
PatAug: Augmentation of Augmentation for Test-Time Adaptation. 5080-5089 - Xueqi Ma

, Yanbei Jiang
, Sarah M. Erfani
, James Bailey
, Weifeng Liu
, Krista A. Ehinger
, Jey Han Lau
:
Reasoning Like Experts: Leveraging Multimodal Large Language Models for Drawing-based Psychoanalysis. 5090-5099 - Yidan Wang

, Chenyi Zhuang
, Wutao Liu
, Pan Gao
, Nicu Sebe
:
AlignCAT: Visual-Linguistic Alignment of Category and Attribute for Weakly Supervised Visual Grounding. 5100-5109 - YongXiang Hua

, Haoyu Cao
, Zhou Tao
, Bocheng Li
, Zihao Wu
, Chaohu Liu
, Linli Xu
:
Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts. 5110-5119 - Yuxin Xie

, Dongyue Chen
, Yue Zhu
, Tong Jia
, Shizhuo Deng
:
Noise-Aware Decoding with Salient Region Enhancing for Zero-Shot Image Captioning. 5120-5129 - Huiyi Chen

, Jiawei Peng
, Kaihua Tang
, Xin Geng
, Xu Yang
:
Enhancing Multimodal In-Context Learning for Image Classification through Coreset Optimization. 5130-5139 - Bin Kang, Bin Chen, Junjie Wang, Yulin Li, Junzhi Zhao, Junle Wang, Zhuotao Tian:

CalibCLIP: Contextual Calibration of Dominant Semantics for Text-Driven Image Retrieval. 5140-5149 - Jianming Liu

, Wenlong Qiu
, Haitao Wei
:
Textual and Visual Guided Task Adaptation for Source-Free Cross-Domain Few-Shot Segmentation. 5150-5159 - Gang Pan

, Hongen Liu
, Di Sun
:
Formula Spotting Based on Synergy Perception and Representation Mining. 5160-5168 - Hang Yang

, Le Hui
, Jianjun Qian
, Jian Yang
, Yigong Zhang
, Jin Xie
:
Cross-View Geometric Collaboration for Generalizable Sparse View Neural Surface Reconstruction. 5169-5177 - Wenju Sun

, Qingyong Li
, Wen Wang
, Yangliao Geng
, Boyang Li
:
Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts. 5178-5187 - Yiyan Ji

, Haoran Chen
, Qiguang Chen
, Chengyue Wu
, Libo Qin
, Wanxiang Che
:
MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models. 5188-5197 - Changsheng Gao

, Zijie Liu
, Li Li
, Dong Liu
, Xiaoyan Sun
, Weisi Lin
:
DT-UFC: Universal Large Model Feature Coding via Peaky-to-Balanced Distribution Transformation. 5198-5207 - Xinshu Li

, Ruoyu Wang
, Erdun Gao
, Mingming Gong
, Lina Yao
:
Causality-aligned Prompt Learning via Diffusion-based Counterfactual Generation. 5208-5217 - Sujuan Hou

, Zhihui Feng
, Hao Xiong
, Weiqing Min
, Peng Li
, Shuqiang Jiang
:
DSDGF-Nutri: A Decoupled Self-Distillation Network with Gating Fusion For Food Nutritional Assessment. 5218-5227 - Yixin Xu

, Hao Wu
, Jingzhou Zhu
, Fengyuan Xu
, Sheng Zhong
:
PriCAF: Privacy-Preserving Contribution Assessment in Federated Learning Before Model Training. 5228-5236 - Yuehao Huang

, Liang Liu
, Shuangming Lei
, Yukai Ma
, Hao Su
, Jianbiao Mei
, Pengxiang Zhao
, Yaqing Gu
, Yong Liu
, Jiajun Lv
:
CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking. 5237-5246 - Yafei Zhang

, Yongle Shang
, Huafeng Li
:
Dual-Granularity Cross-Modal Identity Association for Weakly-Supervised Text-to-Person Image Matching. 5247-5256 - Fan Li

, Zanyi Wang
, Zeyi Huang
, Guang Dai
, Jingdong Wang
, Mengmeng Wang
:
TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP. 5257-5266 - Yongheng Zhang

, Xu Liu
, Ruihan Tao
, Qiguang Chen
, Hao Fei
, Wanxiang Che
, Libo Qin
:
ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models. 5267-5276 - Gang Pan

, Liming Pan
, Hongze Mi
, Rongyu Xiong
,