28th ACM Multimedia 2020: Virtual Event (Seattle, WA), USA
- Chang Wen Chen, Rita Cucchiara, Xian-Sheng Hua, Guo-Jun Qi, Elisa Ricci, Zhengyou Zhang, Roger Zimmermann:
MM '20: The 28th ACM International Conference on Multimedia, Virtual Event / Seattle, WA, USA, October 12-16, 2020. ACM 2020, ISBN 978-1-4503-7988-5
Oral Session A1: Deep Learning for Multimedia
- Jin Wang, Chen Wang, Qingming Huang, Yunhui Shi, Jian-Feng Cai, Qing Zhu, Baocai Yin:
Image Inpainting Based on Multi-frequency Probabilistic Inference Model. 1-9
- Jianzhe Lin, Lichao Mou, Tianze Yu, Xiaoxiang Zhu, Z. Jane Wang:
Dual Adversarial Network for Unsupervised Ground/Satellite-to-Aerial Scene Adaptation. 10-18
- Yadan Luo, Zi Huang, Zijian Wang, Zheng Zhang, Mahsa Baktashmotlagh:
Adversarial Bipartite Graph Learning for Video Domain Adaptation. 19-27
- Peng Wang, Dongyang Liu, Hui Li, Qi Wu:
Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge. 28-36
- Weijiang Yu, Jian Liang, Lu Li, Nong Xiao:
Single Image De-noising via Staged Memory Network. 37-45
- Xuanchi Ren, Haoran Li, Zijian Huang, Qifeng Chen:
Self-supervised Dance Video Synthesis Conditioned on Music. 46-54
Oral Session B1: Deep Learning for Multimedia
- Fanfan Ye, Shiliang Pu, Qiaoyong Zhong, Chao Li, Di Xie, Huiming Tang:
Dynamic GCN: Context-enriched Topology Learning for Skeleton-based Action Recognition. 55-63
- Peike Li, Yunchao Wei, Yi Yang:
Meta Parsing Networks: Towards Generalized Few-shot Scene Parsing with Adaptive Metric Learning. 64-72
- Wei Li, Zhenting Wang, Xiao Wu, Ji Zhang, Qiang Peng, Hongliang Li:
CODAN: Counting-driven Attention Network for Vehicle Detection in Congested Scenes. 73-82
- Jingkang Yang, Weirong Chen, Litong Feng, Xiaopeng Yan, Huabin Zheng, Wayne Zhang:
Webly Supervised Image Classification with Metadata: Automatic Noisy Label Correction via Visual-Semantic Graph. 83-91
- Zeren Sun, Xian-Sheng Hua, Yazhou Yao, Xiu-Shen Wei, Guosheng Hu, Jian Zhang:
CRSSC: Salvage Reusable Samples from Noisy Data for Robust Learning. 92-101
- Jen-Chun Lin, Wen-Li Wei, Yen-Yu Lin, Tyng-Luh Liu, Hong-Yuan Mark Liao:
Learning From Music to Visual Storytelling of Shots: A Deep Interactive Learning Mechanism. 102-110
Oral Session C1: Deep Learning for Multimedia
- Fangfang Wang, Yifeng Chen, Fei Wu, Xi Li:
TextRay: Contour-based Geometric Modeling for Arbitrary-shaped Scene Text Detection. 111-119
- Peng Lu, Jiahui Liu, Xujun Peng, Xiaojie Wang:
Weakly Supervised Real-time Image Cropping based on Aesthetic Distributions. 120-128
- Yuting Liu, Zheng Wang, Miaojing Shi, Shin'ichi Satoh, Qijun Zhao, Hongyu Yang:
Towards Unsupervised Crowd Counting via Regression-Detection Bi-knowledge Transfer. 129-137
- Yanlu Wei, Renshuai Tao, Zhangjie Wu, Yuqing Ma, Libo Zhang, Xianglong Liu:
Occluded Prohibited Items Detection: An X-ray Security Inspection Benchmark and De-occlusion Attention Module. 138-146
- Hsuan-Kai Kao, Li Su:
Temporally Guided Music-to-Body-Movement Generation. 147-155
- Yixiong Zou, Shanghang Zhang, Ke Chen, Yonghong Tian, Yaowei Wang, José M. F. Moura:
Compositional Few-Shot Recognition with Primitive Discovery and Enhancing. 156-164
Oral Session D1: Deep Learning for Multimedia
- Chen Gao, Si Liu, Defa Zhu, Quan Liu, Jie Cao, Haoqian He, Ran He, Shuicheng Yan:
InteractGAN: Learning to Generate Human-Object Interaction. 165-173
- Shijie Wang, Zhihui Wang, Haojie Li, Wanli Ouyang:
Category-specific Semantic Coherency Learning for Fine-grained Image Recognition. 174-183
- Che Sun, Yunde Jia, Yao Hu, Yuwei Wu:
Scene-Aware Context Reasoning for Unsupervised Abnormal Event Detection in Videos. 184-192
- Jing Jin, Junhui Hou, Jie Chen, Sam Kwong, Jingyi Yu:
Light Field Super-resolution via Attention-Guided Fusion of Hybrid Lenses. 193-201
- Wei-Cheng Lai, Zi-Xiang Xia, Hao-Siang Lin, Lien-Feng Hsu, Hong-Han Shuai, I-Hong Jhuo, Wen-Huang Cheng:
Trajectory Prediction in Heterogeneous Environment via Attended Ecology Embedding. 202-210
- Liang Sun, Xiang Guan, Yang Yang, Lei Zhang:
Text-Embedded Bilinear Model for Fine-Grained Visual Recognition. 211-219
Oral Session E1: Deep Learning for Multimedia
- Zhiheng Ma, Xing Wei, Xiaopeng Hong, Yihong Gong:
Learning Scales from Points: A Scale-aware Probabilistic Model for Crowd Counting. 220-228
- Bi Li, Chengquan Zhang, Zhibin Hong, Xu Tang, Jingtuo Liu, Junyu Han, Errui Ding, Wenyu Liu:
Learning Global Structure Consistency for Robust Object Tracking. 229-237
- Xinke Li, Chongshou Li, Zekun Tong, Andrew Lim, Junsong Yuan, Yuwei Wu, Jing Tang, Raymond Huang:
Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene. 238-246
- Jun-Hyuk Kim, Soobeom Jang, Jun-Ho Choi, Jong-Seok Lee:
Instability of Successive Deep Image Compression. 247-255
- Akash Gupta, Abhishek Aich, Amit K. Roy-Chowdhury:
ALANET: Adaptive Latent Attention Network for Joint Video Deblurring and Interpolation. 256-264
- Shaotian Yan, Chen Shen, Zhongming Jin, Jianqiang Huang, Rongxin Jiang, Yaowu Chen, Xian-Sheng Hua:
PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation. 265-273
Oral Session F1: Deep Learning for Multimedia
- Peixi Peng, Yonghong Tian, Yangru Huang, Xiangqian Wang, Huilong An:
Discriminative Spatial Feature Learning for Person Re-Identification. 274-283
- Xiangping Wu, Qingcai Chen, Wei Li, Yulun Xiao, Baotian Hu:
AdaHGNN: Adaptive Hypergraph Neural Networks for Multi-Label Image Classification. 284-293
- Dawei Zhang, Zhonglong Zheng, Minglu Li, Xiaowei He, Tianxiang Wang, Liyuan Chen, Riheng Jia, Feilong Lin:
Reinforced Similarity Learning: Siamese Relation Networks for Robust Object Tracking. 294-303
- Ruoxi Deng, Shengjun Liu:
Deep Structural Contour Detection. 304-312
- Saurabh Sahu, Palash Goyal, Shalini Ghosh, Chul Lee:
Cross-modal Non-linear Guided Attention and Temporal Coherence in Multi-modal Deep Video Models. 313-321
- Zhenhuan Liu, Jincan Deng, Liang Li, Shaofei Cai, Qianqian Xu, Shuhui Wang, Qingming Huang:
IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning. 322-330
Oral Session G1: Deep Learning for Multimedia
- Xin Wang, Wei Huang, Qi Liu, Yu Yin, Zhenya Huang, Le Wu, Jianhui Ma, Xue Wang:
Fine-Grained Similarity Measurement between Educational Videos and Exercises. 331-339
- Mengli Cheng, Minghui Qiu, Xing Shi, Jun Huang, Wei Lin:
One-shot Text Field labeling using Attention and Belief Propagation for Structure Information Extraction. 340-348
- Yunzhuo Liu, Bo Jiang, Tian Guo, Ramesh K. Sitaraman, Don Towsley, Xinbing Wang:
Grad: Learning for Overhead-aware Adaptive Video Streaming with Scalable Video Coding. 349-357
- Yat Hong Lam, Alireza Zare, Francesco Cricri, Jani Lainema, Miska M. Hannuksela:
Efficient Adaptation of Neural Network Filter for Video Compression. 358-366
- Naoki Kimura, Keisuke Shiro, Yota Takakura, Hiromi Nakamura, Jun Rekimoto:
SonoSpace: Visual Feedback of Timbre with Unsupervised Learning. 367-374
- Bo Pang, Deming Zhai, Junjun Jiang, Xianming Liu:
Single Image Deraining via Scale-space Invariant Attention Neural Network. 375-383
Oral Session H1: Emerging Multimedia Applications
- Kaihao Zhang, Wenhan Luo, Björn Stenger, Wenqi Ren, Lin Ma, Hongdong Li:
Every Moment Matters: Detail-Aware Networks to Bring a Blurry Image Alive. 384-392
- Weiqing Min, Linhu Liu, Zhiling Wang, Zhengdong Luo, Xiaoming Wei, Xiaolin Wei, Shuqiang Jiang:
ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network. 393-401
- Tianyu Zhang, Weiqing Min, Ying Zhu, Yong Rui, Shuqiang Jiang:
An Egocentric Action Anticipation Framework via Fusing Intuition and Analysis. 402-410
- Diangang Li, Jianquan Liu, Shoji Nishimura, Yuka Hayashi, Jun Suzuki, Yihong Gong:
Multi-Person Action Recognition in Microwave Sensors. 411-420
- Qi Jia, Xin Fan, Meiyu Yu, Yuqing Liu, Dingrong Wang, Longin Jan Latecki:
Coupling Deep Textural and Shape Features for Sketch Recognition. 421-429
- Huaizheng Zhang, Yong Luo, Qiming Ai, Yonggang Wen, Han Hu:
Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning. 430-438
Oral Session A2: Emerging Multimedia Applications
- Komal Chugh, Parul Gupta, Abhinav Dhall, Ramanathan Subramanian:
Not made for each other- Audio-Visual Dissonance-based Deepfake Detection and Localization. 439-447
- Kai Cheng, Xin Liu, Yiu-ming Cheung, Rui Wang, Xing Xu, Bineng Zhong:
Hearing like Seeing: Improving Voice-Face Interactions and Associations via Adversarial Deep Semantic Matching Network. 448-455
- Ramit Sawhney, Puneet Mathur, Ayush Mangal, Piyush Khanna, Rajiv Ratn Shah, Roger Zimmermann:
Multimodal Multi-Task Financial Risk Forecasting. 456-465
- Jiahang Wang, Tong Sha, Wei Zhang, Zhoujun Li, Tao Mei:
Down to the Last Detail: Virtual Try-on with Fine-grained Details. 466-474
- Yifeng Zhou, Xing Xu, Fumin Shen, Lianli Gao, Huimin Lu, Heng Tao Shen:
Temporal Denoising Mask Synthesis Network for Learning Blind Video Temporal Consistency. 475-483
- K. R. Prajwal, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C. V. Jawahar:
A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild. 484-492
Oral Session B2: Emotional and Social Signals in Multimedia
- Guangyao Shen, Xin Wang, Xuguang Duan, Hongzhi Li, Wenwu Zhu:
MEmoR: A Dataset for Multimodal Emotion Reasoning in Videos. 493-502
- Dong Zhang, Weisheng Zhang, Shoushan Li, Qiaoming Zhu, Guodong Zhou:
Modeling both Intra- and Inter-modal Influence for Real-Time Emotion Detection in Conversations. 503-511
- Xincheng Ju, Dong Zhang, Junhui Li, Guodong Zhou:
Transformer-based Label Set Generation for Multi-modal Multi-label Emotion Detection. 512-520
- Kaicheng Yang, Hua Xu, Kai Gao:
CM-BERT: Cross-Modal BERT for Text-Audio Sentiment Analysis. 521-528
- Xingkun Zuo, Jiyi Li, Qili Zhou, Jianjun Li, Xiaoyang Mao:
AffectI: A Game for Diverse, Reliable, and Efficient Affective Image Annotation. 529-537
- Shi Yin, Shangfei Wang, Xiaoping Chen, Enhong Chen, Cong Liang:
Attentive One-Dimensional Heatmap Regression for Facial Landmark Detection and Tracking. 538-546
Oral Session C2: Media Interpretation
- Xiaobin Liu, Shiliang Zhang:
Domain Adaptive Person Re-Identification via Coupling Optimization. 547-555
- Peipei Li, Yinglu Liu, Hailin Shi, Xiang Wu, Yibo Hu, Ran He, Zhenan Sun:
Dual-Structure Disentangling Variational Generation for Data-Limited Face Parsing. 556-564
- Chunhui Zhang, Shiming Ge, Kangkai Zhang, Dan Zeng:
Accurate UAV Tracking with Distance-Injected Overlap Maximization. 565-573
- Hongru Liang, Wenqiang Lei, Paul Yaozhu Chan, Zhenglu Yang, Maosong Sun, Tat-Seng Chua:
PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music. 574-582
- Guang Yu, Siqi Wang, Zhiping Cai, En Zhu, Chuanfu Xu, Jianping Yin, Marius Kloft:
Cloze Test Helps: Effective Video Anomaly Detection via Learning to Complete Video Events. 583-591
- Qian Bao, Wu Liu, Jun Hong, Lingyu Duan, Tao Mei:
Pose-native Network Architecture Search for Multi-person Human Pose Estimation. 592-600
Oral Session D2: Media Interpretation
- Xiruo Shi, Liutong Xu, Pengfei Wang, Yuanyuan Gao, Haifang Jian, Wu Liu:
Beyond the Attention: Distinguish the Discriminative and Confusable Features For Fine-grained Image Classification. 601-609
- Hao Tang, Zechao Li, Zhimao Peng, Jinhui Tang:
BlockMix: Meta Regularization and Self-Calibrated Inference for Metric-Based Meta-Learning. 610-618
- Dechao Meng, Liang Li, Shuhui Wang, Xingyu Gao, Zheng-Jun Zha, Qingming Huang:
Fine-grained Feature Alignment with Part Perspective Transformation for Vehicle ReID. 619-627
- Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Qiang Liu, Xiaojun Hu:
Compact Bilinear Augmented Query Structured Attention for Sport Highlights Classification. 628-636
- Jiacheng Li, Zhiwei Xiong, Dong Liu, Xuejin Chen, Zheng-Jun Zha:
Semantic Image Analogy with a Conditional Single-Image GAN. 637-645
- Yangchun Zhu, Zheng-Jun Zha, Tianzhu Zhang, Jiawei Liu, Jiebo Luo:
A Structured Graph Attention Network for Vehicle Re-Identification. 646-654
Oral Session E2: Media Interpretation
- Baoyu Fan, Li Wang, Runze Zhang, Zhenhua Guo, Yaqian Zhao, Rengang Li, Weifeng Gong:
Contextual Multi-Scale Feature Learning for Person Re-Identification. 655-663
- Zeyu Xiao, Zhiwei Xiong, Xueyang Fu, Dong Liu, Zheng-Jun Zha:
Space-Time Video Super-Resolution Using Temporal Profiles. 664-672
- Boqiang Xu, Lingxiao He, Xingyu Liao, Wu Liu, Zhenan Sun, Tao Mei:
Black Re-ID: A Head-shoulder Descriptor for the Challenging Problem of Person Re-Identification. 673-681
- Haoran Lv, Qin Yang, Chenglin Li, Wenrui Dai, Junni Zou, Hongkai Xiong:
SalGCN: Saliency Prediction for 360-Degree Images Based on Spherical Graph Convolutional Networks. 682-690
- Sai Praneeth Reddy Sunkesula, Rishabh Dabral, Ganesh Ramakrishnan:
LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos. 691-699
- Zhengqing Fang, Kun Kuang, Yuxiao Lin, Fei Wu, Yu-Feng Yao:
Concept-based Explanation for Fine-grained Images and Its Application in Infectious Keratitis Classification. 700-708
Oral Session F2: Mobile Multimedia & Multimedia HCI and Quality of Experience
- Yuanqiang Cai, Dawei Du, Libo Zhang, Longyin Wen, Weiqiang Wang, Yanjun Wu, Siwei Lyu:
Guided Attention Network for Object Detection and Counting on Drones. 709-717
- Jingchen Sun, Jiming Chen, Tao Chen, Jiayuan Fan, Shibo He:
PIDNet: An Efficient Network for Dynamic Pedestrian Intrusion Detection. 718-726
- Xing Cai, Lanqing Zhang, Chengyuan Li, Ge Li, Thomas H. Li:
VONAS: Network Design in Visual Odometry using Neural Architecture Search. 727-735
- Wenbo Zheng, Lan Yan, Fei-Yue Wang, Chao Gou:
Learning from the Past: Meta-Continual Learning with Knowledge Embedding for Jointly Sketch, Cartoon, and Caricature Face Recognition. 736-743
- Zijie Ye, Haozhe Wu, Jia Jia, Yaohua Bu, Wei Chen, Fanbo Meng, Yanfeng Wang:
ChoreoNet: Towards Music to Dance Synthesis with Choreographic Action Unit. 744-752
- Qiushi Li, Wenwu Zhu, Chao Wu, Xinglin Pan, Fan Yang, Yuezhi Zhou, Yaoxue Zhang:
InvisibleFL: Federated Learning over Non-Informative Intermediate Updates against Multimedia Privacy Leakages. 753-762
- Shu Zhao, Dayan Wu, Wanqian Zhang, Yu Zhou, Bo Li, Weiping Wang:
Asymmetric Deep Hashing for Efficient Hash Code Compression. 763-771
Oral Session G2: Multimedia HCI and Quality of Experience
- Yuen-Jen Lin, Hsuan-Kai Kao, Yih-Chih Tseng, Ming Tsai, Li Su:
A Human-Computer Duet System for Music Performance. 772-780
- Yujia Wang, Sifan Hou, Bing Ning, Wei Liang:
Photo Stand-Out: Photography with Virtual Character. 781-788
- Dingquan Li, Tingting Jiang, Ming Jiang:
Norm-in-Norm Loss with Faster Convergence and Better Performance for Image Quality Assessment. 789-797
- Munan Xu, Jia-Xing Zhong, Yurui Ren, Shan Liu, Ge Li:
Context-aware Attention Network for Predicting Image Aesthetic Subjectivity. 798-806
- Nikolas Wehner, Michael Seufert, Sebastian Egger-Lampl, Bruno Gardlo, Pedro Casas, Raimund Schatz:
Scoring High: Analysis and Prediction of Viewer Behavior and Engagement in the Context of 2018 FIFA WC Live Streaming. 807-815
- Jingwen Hou, Sheng Yang, Weisi Lin:
Object-level Attention for Aesthetic Rating Distribution Prediction. 816-824
- Zhaohui Zhang, Haichao Zhu, Qian Zhang:
ARSketch: Sketch-Based User Interface for Augmented Reality Glasses. 825-833
Oral Session H2: Multimedia HCI and Quality of Experience & Multimedia Search and Recommendation
- Pengfei Chen, Leida Li, Lei Ma, Jinjian Wu, Guangming Shi:
RIRNet: Recurrent-In-Recurrent Network for Video Quality Assessment. 834-842
- Yiru Wang, Shen Huang, Gongfu Li, Qiang Deng, Dongliang Liao, Pengda Si, Yujiu Yang, Jin Xu:
Cognitive Representation Learning of Self-Media Online Article Quality. 843-851
- Jakub Nawala, Lucjan Janowski, Bogdan Cmiel, Krzysztof Rusek:
Describing Subjective Experiment Consistency by p-Value P-P Plot. 852-861
- Leonardo Galteri, Marco Bertini, Lorenzo Seidenari, Tiberio Uricchio, Alberto Del Bimbo:
Increasing Video Perceptual Quality with GANs and Semantic Coding. 862-870
- Yongxin Wang, Xin Luo, Xin-Shun Xu:
Label Embedding Online Hashing for Cross-Modal Retrieval. 871-879
- Zhaopeng Li, Qianqian Xu, Yangbangyan Jiang, Xiaochun Cao, Qingming Huang:
Quaternion-Based Knowledge Graph Network for Recommendation. 880-888
Oral Session A3: Multimedia Search and Recommendation
- Yongguo Ling, Zhun Zhong, Zhiming Luo, Paolo Rota, Shaozi Li, Nicu Sebe:
Class-Aware Modality Mix and Center-Guided Metric Learning for Visible-Thermal Person Re-Identification. 889-897
- Da Cao, Yawen Zeng, Xiaochi Wei, Liqiang Nie, Richang Hong, Zheng Qin:
Adversarial Video Moment Retrieval by Jointly Modeling Ranking and Localization. 898-906
- Xinchen Liu, Wu Liu, Jinkai Zheng, Chenggang Yan, Tao Mei:
Beyond the Parts: Learning Multi-view Cross-part Correlation for Vehicle Re-identification. 907-915
- Lu Jin, Zechao Li, Yonghua Pan, Jinhui Tang:
Weakly-Supervised Image Hashing through Masked Visual-Semantic Graph-based Reasoning. 916-924
- Heyu Zhou, Weizhi Nie, Dan Song, Nian Hu, Xuanya Li, An-An Liu:
Semantic Consistency Guided Instance Feature Alignment for 2D Image-Based 3D Shape Retrieval. 925-933
- Niluthpol Chowdhury Mithun, Karan Sikka, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar:
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization. 934-954
Oral Session B3: Multimedia Systems and Middleware & Media Transport and Delivery
- Weiming Zhuang, Yonggang Wen, Xuesen Zhang, Xin Gan, Daiying Yin, Dongzhan Zhou, Shuai Zhang, Shuai Yi:
Performance Optimization of Federated Person Re-identification via Benchmark Analysis. 955-963
- Hung-Min Hsu, Yizhou Wang, Jenq-Neng Hwang:
Traffic-Aware Multi-Camera Tracking of Vehicles Based on ReID and Camera Link Model. 964-972
- Jie Wu, Tianshui Chen, Lishan Huang, Hefeng Wu, Guanbin Li, Ling Tian, Liang Lin:
Active Object Search. 973-981
- Jun Yi, Md Reazul Islam, Shivang Aggarwal, Dimitrios Koutsonikolas, Y. Charlie Hu, Zhisheng Yan:
An Analysis of Delay in Live 360° Video Streaming Systems. 982-990
- Yuhang Li, Xuejin Chen, Binxin Yang, Zihan Chen, Zhihua Cheng, Zheng-Jun Zha:
DeepFacePencil: Creating Face Images from Freehand Sketches. 991-999
- Peilin Chen, Wenhan Yang, Long Sun, Shiqi Wang:
When Bitstream Prior Meets Deep Prior: Compressed Video Super-resolution with Learning from Decoding. 1000-1008
- Gang Yan, Jian Li:
RL-Bélády: A Unified Learning Framework for Content Caching. 1009-1017
Oral Session C3: Multimodal Analysis and Description & Summarization, Analytics, and Storytelling
- Zhizhong Han, Chao Chen, Yu-Shen Liu, Matthias Zwicker:
ShapeCaptioner: Generative Caption Network for 3D Shapes by Learning a Mapping from Parts Detected in Multiple Views to Sentences. 1018-1027