- ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models
- Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
- The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-To-Video Generation
- SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
- Towards Smart Point-and-Shoot Photography
- MMRL: Multi-Modal Representation Learning for Vision-Language Models
- NoT: Federated Unlearning via Weight Negation
- TAROT: Towards Essentially Domain-Invariant Robustness with Theoretical Justification
- T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
- MetricGrids: Arbitrary Nonlinear Approximation with Elementary Metric Grids based Implicit Neural Representation
- KAC: Kolmogorov-Arnold Classifier for Continual Learning
- DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching
- Rethinking Reconstruction and Denoising in the Dark: New Perspective, General Architecture and Beyond
- Wearable System for Elderly People Monitoring in Multi-Resident Scenario
- Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer
- Style Quantization for Data-Efficient GAN Training
- The design of a multi-parameter intelligent monitoring system for marine ranching based on cloud platforms
- RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations
- Intelligent Coordination System for Autonomous Domestic Heating: An AI-Driven Test-Bench
- SEFR: A Mashup Recommendation Approach for Crossover Service Convergence
- Beam Squint Effect: a Friend or a Foe in Physical Layer Authentication for RIS-assisted Systems?
- Enhancing Graph Transformer Training through Adaptive Graph Parallelism
- Edge-Computing Framework for Human-Robot Collaboration in Industry 5.0: Enhancing Operator Well-Being and Efficiency in Manufacturing
- A Novel Assessment and Optimization Method of 6G Distributed Network Topology Resilience Based on Groupwise Collaborative Algorithm
- Energy-Efficient Aerial Base Station Enabled MBSFN: A Multi-Agent Reinforcement Learning Approach
- Unsupervised Transfer NLOS Identification Model Based on Adversarial Discriminative Domain Adaptation in UWB Positioning
- Segmenting Maxillofacial Structures in CBCT Volumes
- StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation
- Improving energy efficiency of HPC applications using unbalanced GPU power capping
- Global-Edge Dual-Path Semantic Image Segmentation for Transparent Objects
- A Network of Influence: Agency and Stakeholder Relationships in Sustainability Strategy Implementation
- Analysis of Project Management Models: An Investigation of Structure-, Process- and Function-Oriented Elements for the Tailoring of Project Design
- PreciseCam: Precise Camera Control for Text-to-Image Generation
- A Hybrid Technique for Detecting Cyber Threats Through Network Traffic Analysis
- A Sophisticated Authentication Coupled with A Multi Modal Security Integration and Role based Access Control for Secure File Management
- RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation
- Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation
- PMA: Towards Parameter-Efficient Point Cloud Understanding via Point Mamba Adapter
- Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
- DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation
- Research on Power-Computing Coordinated Scheduling Based on Dual Demand Coupling and Multi-Agent Learning
- Design and Verification of SiC Amplifiers for Extreme Temperature Applications Based on ANN Modeling of 4H-SiC MOSFETs
- BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
- PAVE: Patching and Adapting Video Large Language Models
- Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
- GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
- β-FFT: Nonlinear Interpolation and Differentiated Training Strategies for Semi-Supervised Medical Image Segmentation
- AI-based Device for Fall Impact Reduction in Elder People
- Microwave Based Non-Invasive Blood Glucose Sensors: Key Design Parameters and Case-Informed Evaluation
- Following Is All You Need: Robot Crowd Navigation Using People As Planners
- TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing
- A Hybrid CNN-LSTM-Transformer Model for IoT Networks Anomaly Detection
- A Reduced Switch Series Topology 15-level and 27-level Multilevel Inverter
- Assessing the Environmental Impact of IoT Devices - Hotspots and Guidelines for a Better Understanding
- Multi-party Collaborative Attention Control for Image Customization
- Olympus: A Universal Task Router for Computer Vision Tasks
- Universal Actions for Enhanced Embodied Foundation Models
- Enhanced YOLOv5 for Human Hand Recognition
- Multivariate Template Attack against NTT-based Polynomial Multiplication of Dilithium
- Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis
- PolarFree: Polarization-based Reflection-Free Imaging
- Identity-Preserving Text-To-Video Generation by Frequency Decomposition
- Zero-Shot RGB-D Point Cloud Registration with Pre-Trained Large Vision Model
- Lightweight Cryptography and IDS for Edge Networks
- Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning
- Motion Prompting: Controlling Video Generation with Motion Trajectories
- FIFA: Fine-grained Inter-frame Attention for Driver’s Video Gaze Estimation
- Data Analysis of Lightning Activity in High-Precision Lightning Monitoring System of Yunnan Distribution Network
- High-Resolution Raman-OTDR Distributed Temperature Sensors Based on Fast-Non Local Means Denoising Algorithm
- Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy
- Anatomical Consistency and Adaptive Prior-informed Transformation for Multi-contrast MR Image Synthesis via Diffusion Model
- Detection of Proton Irradiation Damage in 4H-SiC Schottky Diodes via Electrically Detected Magnetic Resonance and Near-Zero-Field Magnetoresistance
- Boosting UNet Performance Via VGG-Based Encoder for Medical Image Segmentation
- Parallel Processing for Distributed Machine Learning: A Taxonomy of Techniques and Associated Security Risks
- Towards Generalizable Scene Change Detection
- DynScene: Scalable Generation of Dynamic Robotic Manipulation Scenes for Embodied AI
- Research on Energy Control Strategies for a Type Ship Microgrid
- Evaluating Vision-Language Models as Evaluators in Path Planning
- HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition
- Collaborative Localization with Multiple Intelligent Reflecting Surfaces: Framework and Algorithm
- VideoGigaGAN: Towards Detail-rich Video Super-Resolution
- Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
- Multi-objective energy management of virtual power plant with electric vehicle parking lots, and carbon capture and storage facility
- Multi-Scale Citrus Chlorophyll Transfer Learning Based on Convolutional Neural Network
- Handling Spatial-Temporal Data Heterogeneity for Federated Continual Learning via Tail Anchor
- Switching Frequency Prediction Control for FB/HB Morphed LLC DC-DC Converter with Online-Offline Balance
- Few-Shot Implicit Function Generation via Equivariance
- Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
- Damage Mechanism Analysis of ZnO Varistor Under Multi-Pulse Lightning Current Experiment
- Accurate Differential Operators for Hybrid Neural Fields
- Zero-Shot Styled Text Image Generation, but Make It Autoregressive
- EduPar 2025 Posters
- Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption
- Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
- EASEMVC: Efficient Dual Selection Mechanism for Deep Multi-View Clustering
- VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging
- Deep Learning-Driven Vulnerability Detection Models for Software Security
- Diagnosis of Retinal Disorder by using Deep Learning Algorithm
- Chebyshev Attention Depth Permutation Texture Network with Latent Texture Attribute Loss
- CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation