- Channel Consistency Prior and Self-Reconstruction Strategy Based Unsupervised Image Deraining
- DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds
- MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation
- Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
- Improving the accuracy of FEM simulations of time-domain inductive sensors through separation of secondary field effects
- Biomechanical Effects of Arm Endpoint Stiffness During Three-Dimensional Isometric Force Maintenance
- TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting
- A Unidirectional Current-Fed DC-DC Converter with Low Cost and High Efficiency for EV Systems
- dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis
- CareerAlly: An Intelligent NLP-Driven Chatbot
- Motion Planning and Tracking MPC for Multiagent Systems: A Dynamic Affine Formation Approach
- Autoregressive Distillation of Diffusion Transformers
- AI and Digital Twin Applications in Building Energy Management: A State-of-the-Art Review
- Active Neutral Point Clamped Voltage Source Inverter fed Five Phase Induction Motor Drive Using Carrier Based PWM
- Physics-informed Anomaly Detection for Unmanned Aerial Vehicles
- Improving Object Counting Accuracy with Adaptive CNN Models and Meta-Level Routing
- Emotion Detection using Voice Analysis Utilising EfficientNet and BiLSTM
- HandOS: 3D Hand Reconstruction in One Stage
- Research on Lightweight Adaptive Median Image Filtering Based on Row-Column Separation Strategy
- DiSRT-In-Bed: Diffusion-Based Sim-To-Real Transfer Framework for In-Bed Human Mesh Recovery
- Embodied Scene Understanding for Vision Language Models via MetaVQA
- LUCAS: Layered Universal Codec Avatars
- Towards a Fast and Generalizable Neural Inference Scheme for Tabular Data
- FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation
- LoCoRe: Image Re-Ranking with Long-Context Sequence Modeling
- AKiRa: Augmentation Kit on Rays for optical video generation
- MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
- IoT based Prior Detection and Alert System for Landslide
- Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
- Video Summarization using 3D CNNs: A Convolutional Approach to Spatial-Temporal Feature Extraction
- Towards Generalizable Trajectory Prediction using Dual-Level Representation Learning and Adaptive Prompting
- Model Predictive Control based Adaptive Phase Shift Modulation for Neutral Point Clamped Dual Active Bridge Converter System
- SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
- LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
- Mimir: Improving Video Diffusion Models for Precise Text Understanding
- Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
- Enhancing Healthcare Data Integrity and Access Control Using Blockchain and Industry 5.0
- Triple Switch Flexible Step-Up Converter for Fuel Cell Electric Vehicle
- V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
- Event fields: Capturing light fields at high speed, resolution, and dynamic range
- DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
- Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment
- SnapGen: Taming High-Resolution Text-To-Image Models for Mobile Devices with Efficient Architectures and Training
- Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
- Adaptive Rectangular Convolution for Remote Sensing Pansharpening
- GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction
- Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues
- Towards Tactile Communication of English Language: A Visual Handbook Enhances Letter Learning
- MambaOut: Do We Really Need Mamba for Vision?
- FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
- TransPixeler: Advancing Text-to-Video Generation with Transparency
- Video Summarization with Large Language Models
- Positive2Negative: Breaking the Information-Lossy Barrier in Self-Supervised Single Image Denoising
- Adaptive Mixture-of-Experts Distillation for Cross-Satellite Generalizable Incremental Remote Sensing Scene Classification
- Rethinking Correspondence-based Category-Level Object Pose Estimation
- Disentangling Safe and Unsafe Image Corruptions via Anisotropy and Locality
- Robust Multi-Object 4D Generation for In-the-wild Videos
- Three/Single-Phase Switchable DAB Matrix Converter and Active Power Decoupling Method with Center-Tapped Transformer
- MoACNN-XGNet: Interpretable Multi-Omics Convolutional Network for Breast Cancer Subtyping and Prognostic Genes Identification
- Pathology-Guided AI System for Accurate Segmentation and Diagnosis of Cervical Spondylosis
- EchoMatch: Partial-to-Partial Shape Matching via Correspondence Reflection
- Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance
- UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models
- CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
- EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
- Study on Throughput Testing and Optimization of Ceph File System for ARM Platforms
- Language-Guided Image Tokenization for Generation
- Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models
- RAD: Region-Aware Diffusion Models for Image Inpainting
- Energy-Efficient Embedded Camera Trap with Sensor Fusion Triggering for Predator Monitoring in Grazing Areas
- A 20-Year Retrospective on Power and Thermal Modeling and Management
- One2Any: One-Reference 6D Pose Estimation for Any Object
- DiffFNO: Diffusion Fourier Neural Operator
- DynaMoDe-NeRF: Motion-aware Deblurring Neural Radiance Field for Dynamic Scenes
- A Comprehensive Approach of LCL Filter Design for High Switching Frequency Inverters Tied to Weak Grid
- The Design and Implementation of an Open Laboratory Reservation Mini-Program Based on PHP
- A Structured Tool Landscape for Data-Driven Product Management
- A Study on Predicting Ship Hull Structural Responses in Collisions Based on Machine Learning
- CUBO-to-QUBO Conversion: Reducing Cubic Formulations to Quadratic Formulations
- Design Method of Solder Joint in Surge Protective Device by Simulation
- Research and Design of Triple Modular Redundancy Technology for Processors Oriented to RISC-V Architecture
- Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
- RelationField: Relate Anything in Radiance Fields
- 3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations
- Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
- Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
- Parallel Sequence Modeling via Generalized Spatial Propagation Network
- Learning with Dynamics: Autonomous Regulation of UAV Based Communication Networks with Dynamic UAV Crew
- Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
- ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
- D³-Human: Dynamic Disentangled Digital Human from Monocular Video
- The RaspGrade Dataset: Towards Automatic Raspberry Ripeness Grading with Deep Learning
- Spatio-Temporal Graph Neural Network for Fault Diagnosis Modeling of Industrial Robot
- MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
- Tiled Diffusion
- Action Detail Matters: Refining Video Recognition with Local Action Queries
- Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
- MOSFET Noise Source-Based Closed-Form Solution for Mixed-Mode Conducted Emission EMI Noise in a Single-phase PFC
- Identifying Semantic Component for Robust Molecular Property Prediction
- Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos