


Haoyu Cao 0001
Person information
- affiliation: Tencent YouTu Lab, Hefei, China
Other persons with the same name
- Haoyu Cao 0002: Harbin Institute of Technology, Shenzhen, China
- Haoyu Cao 0003: Nanjing Normal University, School of Geography, China
- Haoyu Cao 0004: University of Science and Technology of China, State Key Laboratory of Cognitive Intelligence, Hefei, China
2020 – today
- 2025
[i17]Chaoyou Fu, Haojia Lin, Xiong Wang, Yifan Zhang, Yunhang Shen, Xiaoyu Liu, Haoyu Cao, Zuwei Long, Heting Gao, Ke Li, Long Ma, Xiawu Zheng, Rongrong Ji, Xing Sun, Caifeng Shan, Ran He:
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction. CoRR abs/2501.01957 (2025)
[i16]Yunhang Shen, Chaoyou Fu, Shaoqi Dong, Xiong Wang, Yifan Zhang, Peixian Chen, Mengdan Zhang, Haoyu Cao, Ke Li, Xiawu Zheng, Yan Zhang, Yiyi Zhou, Ran He, Caifeng Shan, Rongrong Ji, Xing Sun:
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy. CoRR abs/2502.05177 (2025)
[i15]Zuwei Long, Yunhang Shen, Chaoyou Fu, Heting Gao, Lijiang Li, Peixian Chen, Mengdan Zhang, Hang Shao, Jian Li, Jinlong Peng, Haoyu Cao, Ke Li, Rongrong Ji, Xing Sun:
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model. CoRR abs/2505.03739 (2025)
[i14]Zhehan Kan, Yanlin Liu, Kun Yin, Xinghua Jiang, Xin Li, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun, Qingmin Liao, Wenming Yang:
TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs. CoRR abs/2505.20777 (2025)
[i13]Xin Li, Mingming Gong, Yunfei Wu, Jianxin Dai, Antai Guo, Xinghua Jiang, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun:
DREAM: Document Reconstruction via End-to-end Autoregressive Model. CoRR abs/2507.05805 (2025)
[i12]Shaoqi Dong, Chaoyou Fu, Haihan Gao, Yifan Zhang, Chi Yan, Chu Wu, Xiaoyu Liu, Yunhang Shen, Jing Huo, Deqiang Jiang, Haoyu Cao, Yang Gao, Xing Sun, Ran He, Caifeng Shan:
VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation. CoRR abs/2510.09607 (2025)
[i11]YongXiang Hua, Haoyu Cao, Zhou Tao, Bocheng Li, Zihao Wu, Chaohu Liu, Linli Xu:
Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts. CoRR abs/2510.16448 (2025)
[i10]Xiaoyu Liu, Chaoyou Fu, Chi Yan, Chu Wu, Haihan Gao, Yifan Zhang, Shaoqi Dong, Chen Qian, Bin Luo, Xiuyong Yang, Guanwu Li, Yusheng Cai, Yunhang Shen, Deqiang Jiang, Haoyu Cao, Xing Sun, Caifeng Shan, Ran He:
VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting. CoRR abs/2510.21817 (2025)
- 2024
[j2]Mao Zhang, Tie Zhang, Yifei Cheng, Changcun Bao, Haoyu Cao, Deqiang Jiang, Linli Xu:
Communication-efficient clustered federated learning via model distance. Mach. Learn. 113(6): 3869-3888 (2024)
[j1]Wenwen Yu, Yuliang Liu, Xingkui Zhu, Haoyu Cao, Xing Sun, Xiang Bai:
Turning a CLIP Model Into a Scene Text Spotter. IEEE Trans. Pattern Anal. Mach. Intell. 46(9): 6040-6054 (2024)
[c10]Haoqiu Yan, Yongxin Zhu, Kai Zheng, Bing Liu, Haoyu Cao, Deqiang Jiang, Linli Xu:
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction. ACL (1) 2024: 15009-15022
[c9]Bocheng Li, Zhujin Gao, Yongxin Zhu, Kun Yin, Haoyu Cao, Deqiang Jiang, Linli Xu:
Few-shot Temporal Pruning Accelerates Diffusion Models for Text Generation. LREC/COLING 2024: 7259-7269
[c8]Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu:
HRVDA: High-Resolution Visual Document Assistant. CVPR 2024: 15534-15545
[c7]Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun:
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models. CVPR 2024: 15546-15555
[c6]Yubo Wang, Chaohu Liu, Yanqiu Qu, Haoyu Cao, Deqiang Jiang, Linli Xu:
Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models. ACM Multimedia 2024: 1072-1081
[i9]Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun:
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models. CoRR abs/2402.19014 (2024)
[i8]Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu:
HRVDA: High-Resolution Visual Document Assistant. CoRR abs/2404.06918 (2024)
[i7]Haoqiu Yan, Yongxin Zhu, Kai Zheng, Bing Liu, Haoyu Cao, Deqiang Jiang, Linli Xu:
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction. CoRR abs/2406.12707 (2024)
[i6]Yubo Wang, Chaohu Liu, Yanqiu Qu, Haoyu Cao, Deqiang Jiang, Linli Xu:
Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models. CoRR abs/2410.06699 (2024)
- 2023
[c5]Haoyu Cao, Changcun Bao, Chaohu Liu, Huang Chen, Kun Yin, Hao Liu, Yinsong Liu, Deqiang Jiang, Xing Sun:
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration. ICCV 2023: 19460-19470
[c4]Wenwen Yu, Chengquan Zhang, Haoyu Cao, Wei Hua, Bohan Li, Huang Chen, Mingyu Liu, Mingrui Chen, Jianfeng Kuang, Mengjun Cheng, Yuning Du, Shikun Feng, Xiaoguang Hu, Pengyuan Lyu, Kun Yao, Yuechen Yu, Yuliang Liu, Wanxiang Che, Errui Ding, Cheng-Lin Liu, Jiebo Luo, Shuicheng Yan, Min Zhang, Dimosthenis Karatzas, Xing Sun, Jingdong Wang, Xiang Bai:
ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images. ICDAR (2) 2023: 536-552
[i5]Wenwen Yu, Chengquan Zhang, Haoyu Cao, Wei Hua, Bohan Li, Huang Chen, Mingyu Liu, Mingrui Chen, Jianfeng Kuang, Mengjun Cheng, Yuning Du, Shikun Feng, Xiaoguang Hu, Pengyuan Lyu, Kun Yao, Yuechen Yu, Yuliang Liu, Wanxiang Che, Errui Ding, Cheng-Lin Liu, Jiebo Luo, Shuicheng Yan, Min Zhang, Dimosthenis Karatzas, Xing Sun, Jingdong Wang, Xiang Bai:
ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images. CoRR abs/2306.03287 (2023)
[i4]Wenwen Yu, Yuliang Liu, Xingkui Zhu, Haoyu Cao, Xing Sun, Xiang Bai:
Turning a CLIP Model into a Scene Text Spotter. CoRR abs/2308.10408 (2023)
[i3]Haoyu Cao, Changcun Bao, Chaohu Liu, Huang Chen, Kun Yin, Hao Liu, Yinsong Liu, Deqiang Jiang, Xing Sun:
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration. CoRR abs/2309.01131 (2023)
- 2022
[c3]Haoyu Cao, Xin Li, Jiefeng Ma, Deqiang Jiang, Antai Guo, Yiqing Hu, Hao Liu, Yinsong Liu, Bo Ren:
Query-driven Generative Network for Document Information Extraction in the Wild. ACM Multimedia 2022: 4261-4271
[c2]Xin Li, Yan Zheng, Yiqing Hu, Haoyu Cao, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Bo Ren:
Relational Representation Learning in Visually-Rich Documents. ACM Multimedia 2022: 4614-4624
[c1]Haoyu Cao, Jiefeng Ma, Antai Guo, Yiqing Hu, Hao Liu, Deqiang Jiang, Yinsong Liu, Bo Ren:
GMN: Generative Multi-modal Network for Practical Document Information Extraction. NAACL-HLT 2022: 3768-3778
[i2]Xin Li, Yan Zheng, Yiqing Hu, Haoyu Cao, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Bo Ren:
Relational Representation Learning in Visually-Rich Documents. CoRR abs/2205.02411 (2022)
[i1]Haoyu Cao, Jiefeng Ma, Antai Guo, Yiqing Hu, Hao Liu, Deqiang Jiang, Yinsong Liu, Bo Ren:
GMN: Generative Multi-modal Network for Practical Document Information Extraction. CoRR abs/2207.04713 (2022)
last updated on 2025-11-17 23:53 CET by the dblp team
all metadata released as open data under CC0 1.0 license







