- MIMO: A medical vision language model with visual referring multimodal input and pixel grounding multimodal output
- Flash3D: Super-scaling Point Transformers through Joint Hardware-Geometry Locality
- Heartify: A Federated Learning-Based Application for Early Heart Disease Detection
- Paint by Inpaint: Learning to Add Image Objects by Removing Them First
- Automated Parking Trajectory Generation Using Deep Reinforcement Learning
- Energy Flexibility Optimization in Industry: A Hybrid Approach with Synthetic Data Evaluation
- Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
- Joint Scheduling of Causal Prompts and Tasks for Multi-Task Learning
- Series Active Filtering Technique for L-type Single-Phase Bridge Inverter
- MEGA: Masked Generative Autoencoder for Human Mesh Recovery
- Research on Information Fusion Algorithm for Wireless Sensor Homogeneous Networks
- Dual Exposure Stereo for Extended Dynamic Range 3D Imaging
- One-Step Event-Driven High-Speed Autofocus
- Spectral Informed Mamba for Robust Point Cloud Processing
- Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
- Three-level Boost integrated Five-level Active Neutral Point Clamped Inverter for improved DC-link utilisation
- Multi-Sensor System for Optimum Irrigation and Plant Disease Detection Using Multilayer Perceptron Model on Mango Plant
- GenManip: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
- Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval
- Rapid Random Packing of Poly-disperse Spheres using Adam Stochastic Optimization
- A Time-Domain Integration Comparison Scheme With Noise Immunity for Wake-Up Receivers
- GenFusion: Closing the Loop between Reconstruction and Generation via Videos
- What Makes a Good Dataset for Knowledge Distillation?
- VBSF: Vulnerability Behavior Scanning Framework for Intelligent Autonomous Transport Systems
- Precision Medicine in Cardiology: ML-based Heart Disease Prediction
- DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction
- A Systematic Literature Review of Innovations, Challenges, and Future Directions in Telemonitoring and Wearable Health Technologies
- Subnet-Aware Dynamic Supernet Training for Neural Architecture Search
- 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
- PCM: Picard Consistency Model for Fast Parallel Sampling of Diffusion Models
- A Modified Carrier-Based PWM with High DC Voltage Utilization for Three-Level Inverters with Unbalanced Neutral-Point Voltage
- Focal Split: Untethered Snapshot Depth from Differential Defocus
- Foveated Instance Segmentation
- Soil Image De-Noising using Hyper Wavelet Double Window Median Filter (HWDWM)
- Adaptive Neural Optimal Backstepping Control for Heterogeneous Multi-Agent Systems With Non-Cooperative Target via Identifier-Critic-Actor Algorithm
- Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models
- MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots
- Query Efficient Black-Box Visual Prompting with Subspace Learning
- Dense Match Summarization for Faster Two-view Estimation
- Dual Active Bridge-Based Isolated Single-phase Single-Stage DC-link AC/AC Converter with Minimum Charge Storage Requirement
- R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization
- V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection
- DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
- ISL-based Multi-Satellite Collaborative Computation Offloading and Resource Allocation in ISTN
- FIction: 4D Future Interaction Prediction from Video
- Power Factor Correction in Medium Voltage CSR-CSI-fed Drives with VSI-assisted SCR Commutation
- Magma: A Foundation Model for Multimodal AI Agents
- Research on the Architecture of Knowledge-Based Intelligent Command and Control System
- InsTaG: Learning Personalized 3D Talking Head from Few-Second Video
- Camera resection from known line pencils and a radially distorted scanline
- ZeroVO: Visual Odometry with Minimal Assumptions
- An Effective Action Recognition Method Based on Image Coding and a Dual-Channel Fusion Network
- Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
- Decision SpikeFormer: Spike-Driven Transformer for Decision Making
- Fault Cause Analysis and Treatment of VxWorks System Computer Interface Not Updating
- FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing
- DyCON: Dynamic Uncertainty-aware Consistency and Contrastive Learning for Semi-supervised Medical Image Segmentation
- ABC-Former: Auxiliary Bimodal Cross-domain Transformer with Interactive Channel Attention for White Balance
- Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
- Remote System for Monitoring Failures in the Cubicle Type High Voltage Receiving Equipment
- ShowMak3r: Compositional TV Show Reconstruction
- Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection
- 3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement
- Design Optimization of a 3kW Bi-Directional Dual Active Bridge Converter for Battery Energy Storage Application
- Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
- An Autonomous Driving Vehicle-Road Collaborative Heterogeneous Fusion Localization Method Based on the Epipolar Plane Model and Graph Optimization
- Implicit Correspondence Learning for Image-to-Point Cloud Registration
- ReWind: Understanding Long Videos with Instructed Learnable Memory
- Enhancing EV Charging Infrastructure with Vanadium Redox Flow Batteries: A Comprehensive Study of Design and Implementation
- SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer
- Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
- DucDiff: Dual-consistent Diffusion for Uncertainty-aware Information Diffusion Prediction
- SemiDAViL: Semi-supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation
- UniAlign: Scaling Multimodal Alignment within One Unified Model
- UBiGTLoc: A Unified BiLSTM-Graph Transformer Localization Framework for IoT Sensor Networks
- A U-Net Framework with Dice Loss for High-Precision Retinal Vessel Segmentation
- FineCaption: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
- Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations
- EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
- MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
- Efficient ANN-Guided Distillation: Aligning Rate-based Features of Spiking Neural Networks through Hybrid Block-wise Replacement
- Performance and Portability in Multi-GPU Branch-and-Bound: Chapel Versus CUDA and HIP for Tree-Based Optimization
- Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection
- On the Capacity of an Asynchronous MIMO System with Oversampling Reception
- Neural Hierarchical Decomposition for Single Image Plant Modeling
- Develop a Versatile ECM Framework Capable of Accurately Representing Multiple Cell Types
- Glossy Object Reconstruction with Cost-effective Polarized Acquisition
- FLAVC: Learned Video Compression with Feature Level Attention
- MangaNinja: Line Art Colorization with Precise Reference Following
- LM-Offload: Performance Model-Guided Generative Inference of Large Language Models with Parallelism Control
- Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
- Predictive Analytics in Endodontics: Machine Learning Approaches for Treatment Success and Failure Prediction
- K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
- OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
- A Hubness Perspective on Representation Learning for Graph-Based Multi-View Clustering
- AvatarArtist: Open-Domain 4D Avatarization
- Guiding Human-Object Interactions with Rich Geometry and Relations
- Efficient Visual State Space Model for Image Deblurring
- Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
- 3D-SLNR: A Super Lightweight Neural Representation for Large-scale 3D Mapping