


7th MIPR 2024: San Jose, CA, USA
- 7th IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2024, San Jose, CA, USA, August 7-9, 2024. IEEE 2024, ISBN 979-8-3503-5142-2

- Tsung-Shan Yang, Yun-Cheng Wang, Chengwei Wei, C.-C. Jay Kuo:

GHOI: A Green Human-Object-Interaction Detector. 1-7 - Mei Qiu, Wei Lin, Lauren Ann Christopher, Stanley Y. P. Chien, Yaobin Chen, Shu Hu:

Real-Time Lane-Wise Traffic Monitoring in Optimal ROIs. 8-14 - Mehmet Akif Özkanoglu, Ali C. Begen, Sedat Ozer:

SkyDataNet: An Object Detection Algorithm with 2D Gaussian Loss for UAV-Based Aerial Images. 21-27 - Yao-Hui Su, Ming-Der Shieh, Chia-Chi Tsai:

Target-Aware Siamese Networks Based on Masked Attention Mechanism for Visual Object Tracking. 28-34 - Raju Shrestha, Hanne Korneliussen:

A Framework for Generating Images and Hashtags for Social Media Posts for Artificial Influencers. 42-48 - Ning Xu, Serhad Doken:

Automatic Visual Citation Generation for Text-to-Image Generation. 49-54 - Ryan Metcalfe, Garth Long, Charlie L. Wang, Iole Moccagatta:

Enhancing Local LLM Performance Through Heterogeneous Multi-Device Computing. 55-60 - Zhenfei Zhang, Tsung-Wei Huang, Guan-Ming Su, Ming-Ching Chang, Xin Li:

Text-Driven Synchronized Diffusion Video and Audio Talking Head Generation. 61-67 - Haohong Wang, Daniel Smith, Malgorzata Kudelska:

10x Future of Filmmaking Empowered by AIGC. 68-74 - Daniel Kienzle, Marco Kantonis, Robin Schön, Rainer Lienhart:

Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation. 75-81 - Haiyi Li, Xuejing Lei, Xinyu Wang, C.-C. Jay Kuo:

Green Image Label Transfer. 82-87 - Fei Zhao, Jiawen Chen, Bin Huang, Chengcui Zhang, Gary Warner, Rushi Chen, Shaorou Tang, Yuanfei Ma, Zixi Nan:

GenCheck: A LoRA-Adapted Multimodal Large Language Model for Check Analysis. 88-94 - Yuwei Chen, Ming-Ching Chang, Xin Li:

Leveraging Semantic Segmentation for Image Manipulation Detection and Localization. 95-101 - Avinash Anand, Raj Jaiswal, Abhishek Dharmadhikari, Atharva Marathe, Harsh Popat, Harshil Mital, Ashwin R. Nair, Kritarth Prasad, Sidharth Kumar, Astha Verma, Rajiv Ratn Shah, Roger Zimmermann:

GeoVQA: A Comprehensive Multimodal Geometry Dataset for Secondary Education. 102-108 - Debaleen Das Spandan, Razib Iqbal:

ProxeGraph: Scene Graph Generation Utilizing Proxemics for Smart Homes. 109-115 - Junwen Chen, Yingcheng Wang, Keiji Yanai:

HOI as Embeddings: Advancements of Model Representation Capability in Human-Object Interaction Detection. 116-122 - Sheng-Jhou Lu, Hung-Wei Lee, Yu-Ming Han, Ji-Min Zhou, Ying Liu, Huang-Chia Shih:

Lightweight Schemes Fusion for Heatmap-based Human Pose Estimation. 123-126 - Michael R. Smith, Renee Gooding, Jonathan Bisila, Christina L. Ting:

Anomaly Detection in Video Using Compression. 127-133 - Kratika Bhagtani, Amit Kumar Singh Yadav, Paolo Bestagini, Edward J. Delp:

SSLCT: A Convolutional Transformer for Synthetic Speech Localization. 134-140 - Chun-Han Cheng, Ting-Yu Wei, Homer H. Chen:

Playlist Continuation of Cold-Start Songs. 141-147 - Hung-Jui Guo, Balakrishnan Prabhakaran:

Improved Standard-Based Motion Parallax Measurement in Mixed Reality. 148-154 - Kunal Sawarkar, Abhilasha Mangal, Shivam Raj Solanki:

Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers. 155-161 - Jacob Edward Galajda, Kien A. Hua:

Automated Thematic Composer Classification Using Segment Retrieval. 162-168 - Wen-Shiang Li, Yao-Cheng Lu, Wen-Kai Hsiao, Yu-Yao Tseng, Ming-Hung Wang:

DRM-SN: Detecting Reused Multimedia Content on Social Networks. 169-175 - Xiang Fang, Arvind Easwaran, Blaise Genest:

Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection. 176-182 - Chenhan Fu, Guoming Wang, Rongxing Lu, Siliang Tang:

FastLearn: A Rapid Learning Agent for Chat Models to Acquire Latest Knowledge. 183-189 - Ryan Tan, Thanh Hong-Phuoc, Lei Gao, Randy Tan, Sagarjit Aujla, Adel Mohamed, Ling Guan, Karthikeyan Umapathy, Naimul Mefraz Khan:

Enhancement of Neonatal Lung Pathology Classification Using Multi-view Feature Representation. 190-195 - Junyu Chen, Jie An, Hanjia Lyu, Christopher Kanan, Jiebo Luo:

Holistic Visual-Textual Sentiment Analysis with Prior Models. 196-202 - Jashia Mitayeegiri, Shaohua Dong, Chenxi Qiu, Qing Yang, Xinrong Li, Heng Fan, Yan Huang:

Radio Map Estimation (RME) with Deep Progressive Network. 203-206 - Soheil Hor, Mostafa El-Khamy, Yanlin Zhou, Amin Arbabian, SukHwan Lim:

CM-ASAP: Cross-Modality Adaptive Sensing and Perception for Efficient Hand Gesture Recognition. 207-213 - Yu-Szu Wei, Yuan-Chun Sun, Shin-Yi Zheng, Hsun-Fu Hsu, Chun-Ying Huang, Cheng-Hsin Hsu:

Mitigating Privacy Threats Without Degrading Visual Quality of VR Applications: Using Re-Identification Attack as a Case Study. 214-220 - Omeed Ashtiani, Meghana Spurthi Maadugundu, Minhas Kamal, Balakrishnan Prabhakaran:

Device-Agnostic Remote Range-of-motion Assessment using Data Abstraction. 221-226 - Franz Louis Cesista, Rui Aguiar, Jason Kim, Paolo Acilo:

Retrieval Augmented Structured Generation: Business Document Information Extraction as Tool Use. 227-230 - Charlie Hsu, Yuan-Chun Sun, Kuan-Yu Lee, Chun-Ying Huang:

Will Neural 3D Object Representations be the Silver Bullet for Improving VR Experience in HMDs? 231-234 - Vijay John, Yasutomo Kawanishi:

Frame-Level Latent Embedding Using Weak Labels for Multi-View Action Recognition. 235-238 - Muhammad Arslan, Muhammad Mubeen, Arslan Akram, Saadullah Farooq Abbasi, Muhammad Salman Ali, Muhammad Usman Tariq:

A Deep Features Based Approach Using Modified ResNet50 and Gradient Boosting for Visual Sentiments Classification. 239-242 - Yang Xing, Peixi Liao, Reem AwdhE Alasleh, Vissuta Khampatee, Farshid Alizadeh-Shabdiz:

Dental X-ray Segmentation and Auto Implant Design Based on Convolutional Neural Network. 243-246 - Jie Cai, Yuan Lin, Jiang Li, Jiaming Ding, Ling Ouyang, Chiu Man Ho, Zibo Meng:

Joint HDR Denoising and Fusion on Mobile Devices. 247-252 - Rex Liu, Xin Liu:

MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning. 253-259 - Siddhant Garg, Lijun Zhang, Hui Guan:

Structured Pruning for Multi-Task Deep Neural Networks. 260-266 - Ting Yu Tsai, Li Lin, Shu Hu, Ming-Ching Chang, Hongtu Zhu, Xin Wang:

UU-Mamba: Uncertainty-aware U-Mamba for Cardiac Image Segmentation. 267-273 - Mohammad Abu-Shaira, Weishi Shi:

Unveiling Statistical Significance of Online Regression Over Multiple Datasets. 274-279 - Minghao Li, Junjie Qiu, Weishi Shi:

Macro-AUC-Driven Active Learning Strategy for Multi-Label Classification Enhancement. 280-286 - Dae Yeol Lee, Geonsun Lee, Guan-Ming Su:

Viewing Comfort Enhancement on Head-Mounted Displays Using Stereo Disparity Control. 287-293 - Avinash Anand, Avni Mittal, Laavanaya Dhawan, Juhi Krishnamurthy, Mahisha Ramesh, Naman Lal, Astha Verma, Pijush Bhuyan, Himani, Rajiv Ratn Shah, Roger Zimmermann, Shin'ichi Satoh:

ExCEDA: Unlocking Attention Paradigms in Extended Duration E-Classrooms by Leveraging Attention-Mechanism Models. 301-307 - Avinash Anand, Sarthak Jain, Shashank Sharma, Akhil P. Dominic, Aman Gupta, Astha Verma, Raj Jaiswal, Naman Lal, Rajiv Ratn Shah, Roger Zimmermann:

Pulse of the Crowd: Quantifying Crowd Energy through Audio and Video Analysis. 308-314 - Yiwei Han, Kaiyi Qi, Jiebo Luo:

Plastic Surgery Image Classification and Generation. 315-320 - Rui Deng, Tianpei Gu:

CU-Mamba: Selective State Space Models with Channel Learning for Image Restoration. 328-334 - Chih-Chung Hsu, Wei-Hao Huang, Wen-Hai Tseng, Ming-Hsuan Wu, Ren-Jung Xu, Chia-Ming Lee:

OmniDet: Omnidirectional Object Detection via Fisheye Camera Adaptation. 335-341 - Most Husne Jahan, Abdelhak Bentaleb:

GESA: Exploring Loss-based Adversarial Attacks in Volumetric Media Streaming. 342-348 - Omkar N. Kulkarni, Aryan Mishra, Shashank Arora, Vivek K. Singh, Pradeep K. Atrey:

LivePics-24: A Multi-person, Multi-camera, Multi-settings Live Photos Dataset. 349-354 - Abhineet Kumar Pandey, Ming-Ching Chang, Xin Li:

TextSleuth: A New Dataset and Baseline for Scene Text Manipulation Detection. 362-368 - Md. Atik Ahamed, Qiang Shawn Cheng:

MambaTab: A Plug-and-Play Model for Learning Tabular Data. 369-375 - Bishwa Karki, Chun-Hua Tsai, Pei-Chi Huang, Xin Zhong:

Deep Learning-based Text-in-Image Watermarking. 376-382 - Haoran Tong, Xu Cui, Laiyun Qing:

Single-frame Supervised Action Temporal Localization Based on Multi-view Contrastive Learning. 383-389 - Hadi Hadizadeh, S. Faegheh Yeganli, Bahador Rashidi, Ivan V. Bajic:

Mutual Information Analysis in Multimodal Learning Systems. 390-395 - Ling Guan, Lei Gao, Kai Liu, Zheng Guo:

Mathematics-Inspired Learning: A Green Learning Model with Interpretable Properties. 396-402 - Tejas Duseja, K. M. Annervaz, Jeevithiesh Duggani, Shyam Zacharia, Michael Free, Ambedkar Dukkipati:

Learning to Switch off, Switch on, and Integrate Modalities in Large Pre-trained Transformers. 403-409 - Wala Elsharif, Marco Agus, Mahmood Alzubaidi, James She:

Cultural Relevance Index: Measuring Cultural Relevance in AI-Generated Images. 410-416 - Fei Zhao, Chengcui Zhang:

Parameter-Efficient Adaptation of Foundation Models for Damaged Building Assessment. 417-422 - Shengtai Ju, Amy R. Reibman:

Exploring the Impact of Hand Pose and Shadow on Hand-Washing Action Recognition. 423-429 - Prasun Datta, Chau-Wai Wong, Min Wu:

Enabling Paper-Based Surface Authentication via Digital Twin and Experimental Verification. 430-438 - Yan Ju, Chengzhe Sun, Shan Jia, Shuwei Hou, Zhaofeng Si, Soumyya Kanti Datta, Lipeng Ke, Riky Zhou, Anita Nikolich, Siwei Lyu:

DeepFake-o-meter v2.0: An Open Platform for DeepFake Detection. 439-445 - Vikram Patil, Sharmilee Rajkumar Rajan, Pradeep K. Atrey:

GeoSecure-B: A Method for Secure Bearing Calculation. 446-451 - Narendra Kumar, Gaurav Bhatnagar:

Clearing Text Images: A Non-blind Deblurring with Convex Total Variation Regularization Model. 452-457 - Craig Rainey, Min Chen:

Algorithmic Stock Trading Strategies. 458-464 - Xiaoqiong Liu, Yunhe Feng, Shu Hu, Xiaohui Yuan, Heng Fan:

Benchmarking the Robustness of UAV Tracking Against Common Corruptions. 465-470 - Gowtham Medisetti, Zacchaeus Compson, Heng Fan, Huaxiao Yang, Yunhe Feng:

LitAI: Enhancing Multimodal Literature Understanding and Mining with Generative AI. 471-476 - Beitong Tian, Mingyuan Wu, Ruixiao Zhang, Haozhen Zheng, Bo Chen, Yaohui Wang, Shiv Trivedi, Shanbo Zhang, Robert Bruce Kaufman, Leah Espenhahn, Gianni Pezzarossi, Mauro Sardela, John Dallesasse, Klara Nahrstedt:

GaugeTracker: AI - Powered Cost-Effective Analog Gauge Monitoring System. 477-483 - Md. Abdullah Al Forhad, Weishi Shi:

Balancing Explanations and Adaptation in Offline Continual Learning Systems Using Active Augmented Reply. 484-490 - Shijun Liang, Dongdong Fu:

Controllable Universal Edge-Preserving Image Filtering. 491-494 - Nguyen Gia Bach, Chanh Minh Tran, Eiji Kamioka, Phan Xuan Tan:

Attenuation-Aware Weighted Optical Flow with Medium Transmission Map for Learning-Based Visual Odometry in Underwater Terrain. 495-498 - Shanker Ram, Sambhu Ganesan, Yajat Nagaraj Kiran:

Harmful Brain Activity Classification of Spectrograms with Transfer Deep Learning. 499-502 - Vadim Abronin, Aleksei Naumov, Denis Mazur, Dmitriy Bystrov, Katerina Tsarova, Artem Melnikov, Sergey Dolgov, Reuben Brasher, Michael Perelshtein:

TQCompressor: Improving Tensor Decomposition Methods in Neural Networks Via Permutations. 503-506 - Fei Zhao, Chengcui Zhang, Maya Shah, Nitesh Saxena:

BubbleSig: Same-Hand Ballot Stuffing Detection. 507-510 - Sushmita Chandel, Preeti Dwivedi, Gaurav Bhatnagar, Marcin Kowalski:

Towards a Novel Blob Detection Approach for Concealed Object Detection in Passive Terahertz Imaging. 511-514 - Andrea Caruso, Giovanni Schembra:

A VR 360°-Video Encoding Framework with Differentiated Tile Compression Based on Digital-Twin Technology. 515-521 - Katsuaki Nakano, Michael Zuzak, Cory E. Merkel, Alexander C. Loui:

Trustworthy and Robust Machine Learning for Multimedia: Challenges and Perspectives. 522-528 - Mohit Prabhushankar, Ghassan AlRegib:

Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks. 529-535 - Zachary McBride Lazri, Dae Yeol Lee, Guan-Ming Su:

A Framework for Single-View Multi-Plane Image Inpainting. 536-541 - Qingyang Zhou, Jiawei Yu, Shan Liu, C.-C. Jay Kuo:

GPSR: A Green Point Cloud Surface Reconstruction Method. 542-548 - Edward Y. Chang:

Behavioral Emotion Analysis Model for Large Language Models. 549-556 - Chih-Chung Hsu, Chia-Ming Lee:

MISS: Memory-efficient Instance Segmentation for Sport-Scenes with Visual Inductive Priors. 557-561 - Ming-Wen Kuan, Wei-Yang Lin, Chia-Ling Tsai, Shih-Jen Chen, Paisan Ruamviboonsuk, Dong-Jie Jiang:

Simultaneous Classification and Segmentation of Subretinal Lesions on ICGA Images. 562-565 - Chen-Wei Wang, Hwai-Jung Hsu:

Automatic Clipping and Text Logging for Baseball Game Videos Using Deep Learning. 566-571 - Alnur Alimanov, Md Baharul Islam:

Advancing Retinal Image Segmentation: A Denoising Diffusion Probabilistic Model Perspective. 572-578 - Kaixuan Li, Wei-bang Chen, Yongjin Lu, Xiaoliang Wang, He Gao:

Automated Recognition of Optic Disc and Blood Vessels in Diabetic Fundoscopy Images Using Real-Time Image Analysis. 579-585 - Li Lin, Yamini Sri Krubha, Zhenhuan Yang, Cheng Ren, Thuc Duy Le, Irene Amerini, Xin Wang, Shu Hu:

Robust COVID-19 Detection in CT Images with CLIP. 586-592 - Aparna Tiwari, Hitika Tiwari, K. S. Venkatesh, Anuj Kumar Sharma:

Enhancing Video Stability with Object-Centric Stabilization. 593-599 - Mohamed Benkedadra, Dany Rimez, Tiffanie Godelaine, Natarajan Chidambaram, Hamed Razavi Khosroshahi, Horacio Tellez, Matei Mancas, Benoît Macq, Sidi Ahmed Mahmoudi:

CIA: Controllable Image Augmentation Framework Based on Stable Diffusion. 600-606 - Li Lin, Sarah Papabathini, Xin Wang, Shu Hu:

Robust Light-Weight Facial Affective Behavior Recognition with CLIP. 607-611 - Dingzong Zhang, Khushi Jain, Priyanka Singh:

Guarding Against ChatGPT Threats: Identifying and Addressing Vulnerabilities. 612-615 - Fayadh Alenezi:

Advection-Diffusion for Feature-based Cancer Diagnosis. 616-621 - Quoc Hoan Vu, Priyanka Singh:

Exploiting Correlation Between Facial Action Units for Detecting Deepfake Videos. 622-625 - Luoxu Jin, Hiroshi Watanabe:

Perceptual Image Compression via Stable Diffusion at Low Bitrate. 626-629 - Benny Jörg Stein, Niklas Beck, Daniel Becker, Dennis Wegener:

Building a Generative AI Showroom for Foundation Models with Different Modalities. 630-633 - Omkar N. Kulkarni, Thomas Lloyd-Jones, My Tran, Gregory Vincent, Vivek K. Singh, Pradeep K. Atrey:

Where You Look Matters in Group Photos: A Demo of GARGI iOS App. 634-637 - Dominic Baker, Wei-bang Chen, He Gao:

Early Alzheimer's Detection: The Promise of AI-Powered MRI Analysis. 638-641 - He Gao, Wei-Bang Chen:

ProSchedule: A Comprehensive Mobile Solution for Seamless Academic Scheduling. 642-645 - Hieu Hanh Le, Yuki Yasumitsu, Ryosuke Matsuo, Tomoyoshi Yamazaki, Haruo Yokota:

A Clustering-based Sequence Variants Analysis Method for Electronic Medical Records of Multimedical Institutions. 653-659 - Khushi Jain, Priyanka Singh, Xue Li:

Privacy-Preserving Disease Prediction with Secure Data Deduplication on Untrusted Cloud Servers. 660-666 - Chih-Yuan Li, Jun-Ting Wu, Chan Hsu, Ming-Yen Lin, Yihuang Kang:

Understanding eGFR Trajectories and Kidney Function Decline via Large Multimodal Models. 667-673 - Sukhan Lee, Soojin Lee, Yaejin Lee:

Self-Monitoring the Mental-Health State of a Focused Population with Multiple Self-Questionnaires and Sentiment Descriptions. 674-680 - Nisha Daga, George Kodimattam Joseph:

Big Data and Bigger Dilemmas: Ethical Concerns of Data in Healthcare. 681-684 - Vishakha Pareek, Shreyansh Sharma, Vibhor Singh, Shashwat Singh:

Patient 3D Data Visualisation with AR-based Interactive Technology for Brain MRI. 685-690
