7th PRCV 2024: Urumqi, China - Part V
- Zhouchen Lin, Ming-Ming Cheng, Ran He, Kurban Ubul, Wushouer Silamu, Hongbin Zha, Jie Zhou, Cheng-Lin Liu: Pattern Recognition and Computer Vision - 7th Chinese Conference, PRCV 2024, Urumqi, China, October 18-20, 2024, Proceedings, Part V. Lecture Notes in Computer Science 15035, Springer 2025, ISBN 978-981-97-8619-0
Multi-modal Information Processing
- Zehui Wang, Zhihan Zhang, Hongtao Wang: A Multi-modal Framework with Contrastive Learning and Sequential Encoding for Enhanced Sleep Stage Detection. 3-17
- Anran Wu, Shuwen Yang, Yujia Xia, Xingjiao Wu, Tianlong Ma, Liang He: Charting the Uncharted: Building and Analyzing a Multifaceted Chart Question Answering Dataset for Complex Logical Reasoning Process. 18-33
- Yaokun Zhong, Tianming Liang, Jian-Fang Hu: Time-Frequency Mutual Learning for Moment Retrieval and Highlight Detection. 34-48
- Yanyu Qi, Ruohao Guo, Zhenbo Li, Dantong Niu, Liao Qu: Masked Visual Pre-training for RGB-D and RGB-T Salient Object Detection. 49-66
- Xian Qu, Yingyi Yang, Xiaoming Mai: Cascade Coarse-to-Fine Point-Query Transformer for RGB-T Crowd Counting. 67-83
- Jiaqi Hu, Jiedong Zhuang, Xiaoyu Liang, Dayong Wang, Lu Yu, Haoji Hu: Perceptual Image Compression with Text-Guided Multi-level Fusion. 84-97
- Haiwen Zhang, Zixi Yang, Yuanzhi Liu, Xinran Wang, Zheqi He, Kongming Liang, Zhanyu Ma: Evaluating Attribute Comprehension in Large Vision-Language Models. 98-113
- Yihang Meng, Hao Cheng, Zihua Wang, Hongyuan Zhu, Xiuxian Lao, Yu Zhang: Efficient Multi-modal Human-Centric Contrastive Pre-training with a Pseudo Body-Structured Prior. 114-128
- Yusong Hu, Yuting Gao, Zihan Xu, Ke Li, Xialei Liu: A3R: Vision Language Pre-training by Attentive Alignment and Attentive Reconstruction. 129-142
- Yuxuan Wang, Tianwei Cao, Kongming Liang, Zhongjiang He, Hao Sun, Yongxiang Li, Zhanyu Ma: Mixture-of-Hand-Experts: Repainting the Deformed Hand Images Generated by Diffusion Models. 143-157
- Xi Yu, Wenti Huang, Jun Long: ConD2: Contrastive Decomposition Distilling for Multimodal Sentiment Analysis. 158-172
- Ruihao Zhang, Jinsong Geng, Cenyu Liu, Wei Zhang, Zunlei Feng, Liang Xue, Yijun Bei: Multi-layer Tuning CLIP for Few-Shot Image Classification. 173-186
- Shezheng Song, Shasha Li, Jie Yu, Shan Zhao, Xiaopeng Li, Jun Ma, Xiaodong Liu, Zhuo Li, Xiaoguang Mao: DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model. 187-200
- Zebao Zhang, Shuang Yang, Haiwei Pan: Text-Dominant Interactive Attention for Cross-Modal Sentiment Analysis. 201-215
- Yuqiu Kong, Junhua Liu, Cuili Yao: Dual Context Perception Transformer for Referring Image Segmentation. 216-230
- Min Luo, Boda Lin, Binghao Tang, Haolong Yan, Si Li: ELEMO: Elements Focused Emotion Recognition for Sticker Images. 231-245
- Lin Cao, Wenwen Sun, Yanan Guo, Shoujing Wang, Boqian Lv: Cross-Modal Dual Matching and Comparison for Text-to-Image Person Re-identification. 246-259
- Turghun Tayir, Lin Li, Mieradilijiang Maimaiti, Yusnur Muhtar: Low-Resource Machine Translation with Different Granularity Image Features. 260-273
- Weinan Guan, Wei Wang, Bo Peng, Jing Dong, Tieniu Tan: ST-SBV: Spatial-Temporal Self-Blended Videos for Deepfake Detection. 274-288
- Zichun Wang, Xu Cheng: Learning a Robust Synthetic Modality with Dual-Level Alignment for Visible-Infrared Person Re-identification. 289-303
- Ruitao Pu, Dezhong Peng, Fujun Hua: Deep Noisy Multi-label Learning for Robust Cross-Modal Retrieval. 304-317
- Weitao Song, Weiran Chen, Jialiang Xu, Yi Ji, Ying Li, Chunping Liu: Uncertainty-Aware with Negative Samples for Video-Text Retrieval. 318-332
- Suyan Cheng, Feifei Zhang, Haoliang Zhou, Changsheng Xu: Multi-modal Knowledge-Enhanced Fine-Grained Image Classification. 333-346
- Jiaxi Wang, Wenhui Hu, Xueyang Liu, Beihu Wu, Yuting Qiu, Yingying Cai: Bridging Modality Gap for Visual Grounding with Effecitve Cross-Modal Distillation. 347-363
- Chunlan Zhan, Wenhua Qian, Peng Liu: EGSRNet: Emotion-Label Guiding and Similarity Reasoning Network for Multimodal Sentiment Analysis. 364-378
- Min Zhu, Guanming Liu, Zhihua Wei: VL-MPFT: Multitask Parameter-Efficient Fine-Tuning for Visual-Language Pre-trained Models via Task-Adaptive Masking. 379-394
- Zhuzhu Zhang, Xian Fu, Tianrui Wu, Yu Sun, Ningning Zhang, Hui Zhang: A Multimodal Fake News Detection Model Leveraging Image Frequency and Spatial Domain Analysis with Deep Dynamic Trade-Off Fusion. 395-409
- Min Zheng, Chunpeng Wu, Yue Wang, Weiwei Liu, Qinghe Ye, Ke Chang, Cuncun Shi, Fei Zhou: Efficiency-Aware Fine-Grained Vision-Language Retrieval via a Global-Contextual Autoencoder. 410-423
- Xiaorui Shi: Towards Making the Most of Knowledge Across Languages for Multimodal Cross-Lingual Summarization. 424-438
- Zhengqing Gao, Xiang Ao, Xu-Yao Zhang, Cheng-Lin Liu: Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning. 439-452
- Tianci Xun, Zhong Zheng, Yulin He, Wei Chen, Weiwei Zheng: Unleashing the Class-Incremental Learning Potential of Foundation Models by Virtual Feature Generation and Replay. 453-467
- Jiaxuan Li, Likun Huang, Chuanhu Zhu, Song Zhang, Qiang Li: Multimodal Feature Hierarchical Fusion for Text-Image Person Re-identification. 468-481
- Xiaoyu Liang, Jiayuan Yu, Lianrui Mu, Jiedong Zhuang, Jiaqi Hu, Yuchen Yang, Jiangnan Ye, Lu Lu, Jian Chen, Haoji Hu: Mitigating Hallucination in Visual-Language Models via Re-balancing Contrastive Decoding. 482-496
- Shanshan Chen, Dan Xu, Kangjian He: Multimodal Medical Image Registration Using Optimized Phase Consistency Within Joint Frequency-Space Domain. 497-510
- Weichen Huang, Xinyue Ju, You Zhou, Yipeng Xu, Gang Yang: Two Semantic Information Extension Enhancement Methods For Zero-Shot Learning. 511-525
- Yihan Zhao, Wei Xi, Gairui Bai, Xinhui Liu, Jizhong Zhao: Robust Contrastive Learning Against Audio-Visual Noisy Correspondence. 526-540
- Xiaofan Wang, Xiuhong Li, Zhe Li, Chenyu Zhou, Fan Chen, Dan Yang: Enhancing Cross-Modal Alignment in Multimodal Sentiment Analysis via Prompt Learning. 541-554
- Zirui Shang, Shuo Yang, Xinxiao Wu: Efficient Language-Driven Action Localization by Feature Aggregation and Prediction Adjustment. 555-568
- Ran Xu: Greedy Fusion Oriented Representations for Multimodal Sentiment Analysis. 569-581
- Zhiyun Chen, Qing Zhang, Jie Liu, Yufei Wang, Haocheng Lv, LanXuan Wang, Jianyong Duan, Mingying Xv, Hao Wang: Counterfactual Multimodal Fact-Checking Method Based on Causal Intervention. 582-595
- Min Li, Feng Li, Enguang Zuo, Xiaoyi Lv, Chen Chen, Cheng Chen: Rethinking the Necessity of Learnable Modal Alignment for Medical Image Fusion. 596-610
- Yanting Zhang, Jingyi Guo, Cairong Yan, Zhijun Fang: Taming Diffusion for Fashion Clothing Generation with Versatile Condition. 611-625