- Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
- A Multi-Mode Sub-Synchronous Damping Controller for Renewable-Thermal-Bundled Power Transmission by LCC-HVDC System
- HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
- AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation
- Efficient Detection of Relaxed Maximal Cliques in Large-Scale IoT Networks
- Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting
- Neural Inverse Rendering from Propagating Light
- Image Captioning with Multi-Scale Dilated Attention Mechanism
- Design of Transformer Turn-ratio for Maximizing the ZVS Region of Dual Active Bridge Converter
- A Single-Stage Admittance Control Network Based Misalignment Tolerant Inductive Power Transfer System for EV Application
- Question-Aware Gaussian Experts for Audio-Visual Question Answering
- CrossOver: 3D Scene Cross-Modal Alignment
- MambaIC: State Space Models for High-Performance Learned Image Compression
- A Cross-Residual BP Decoding Algorithm for Dual-Polar JSCC
- SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis
- Towards Explicit Geometry-Reflectance Collaboration for Generalized LiDAR Segmentation in Adverse Weather
- Deep Fair Multi-View Clustering with Attention KAN
- HyperLoRA: Parameter-Efficient Adaptive Generation for Portrait Synthesis
- Foundations of the Theory of Performance-Based Ranking
- Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
- An Improved RFR Method for Enhancing Large-Signal Stability of Grid-Following Inverter Under Weak and Faulty Grid Conditions
- Dynamic Estimation of Mental Workload and Operator Accuracy for Time-Constrained Binary Classification Tasks
- SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images
- IndoorGS: Geometric Cues Guided Gaussian Splatting for Indoor Scene Reconstruction
- A Comparative Analysis of Challenges in Wireless Sensor Networks using Machine Learning Algorithms
- Towards Scalable Human-aligned Benchmark for Text-guided Image Editing
- DNF: Unconditional 4D Generation with Dictionary-based Neural Fields
- Statistical Model Limitations of Ground Flash Density for Lightning Risk Assessment
- 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
- Continuous 3D Perception Model with Persistent State
- Make them Socialites: Supporting Social Entrepreneurs
- Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
- Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
- Pose-Guided Temporal Enhancement for Robust Low-Resolution Hand Reconstruction
- NTFR: A Network Traffic Feature Reduction Method Based on Relational Analysis
- Automated Calculation of Algorithm Statement Execution Frequency Based on Abstract Syntax Tree
- Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
- SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models
- TCFG: Tangential Damping Classifier-free Guidance
- TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception
- MotionBench: Benchmarking and Improving Fine-Grained Video Motion Understanding for Vision Language Models
- AD-LDB: A Modality-Incomplete Learning Model for Alzheimer's Disease Diagnosis
- UAV-Relay-Aided Secure Maritime Networks Coexisting with Satellite Networks: Robust Beamforming and Trajectory Optimization
- GenVDM: Generating Vector Displacement Maps From a Single Image
- Statistical Characteristics of Cloud-to-Ground Lightning Across Different Regions in China
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
- Methodology for GPU Frequency Switching Latency Measurement
- Multi-Modal Aerial-Ground Cross-View Place Recognition with Neural ODEs
- Parameterized Programming and Simulation Processing of Cam Curve in Ship Valve System
- Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
- Remaining Useful Life Prediction of Bearings under Complex Operating Conditions: A DAENet-XLSTM Transfer Learning Model
- High-Voltage Pulse Modulator for S-Band Klystron in IR-FEL RF System
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
- CoLLM: A Large Language Model for Composed Image Retrieval
- CGMatch: A Different Perspective of Semi-supervised Learning
- Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
- A Recursive Approach to Representation in Hilbert Spaces of Increasing Dimension: Applications to Quantum-centric HPC tool development
- Everything to the Synthetic: Diffusion-Driven Test-Time Adaptation via Synthetic-Domain Alignment
- Research on Topological Layout Algorithm of Mine Ventilation Network Diagram
- RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark
- ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts
- Cancer Survival Prognosis From Whole Slide Images Using Hopfield Network
- Three-view Focal Length Recovery From Homographies
- Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
- SEEN-DA: SEmantic ENtropy guided Domain-aware Attention for Domain Adaptive Object Detection
- Design Optimization of Synchronous Reluctance Motor for Electric Two Wheeler Application
- MaDCoW: Marginal Distortion Correction for Wide-Angle Photography with Arbitrary Objects
- SOAP: Vision-Centric 3D Semantic Scene Completion with Scene-Adaptive Decoder and Occluded Region-Aware View Projection
- An Intelligent Prediction Method for Safety Margins of Flexible Thermal Power Units Based on PipeLine Creep Life Damage
- Enhancing Scene Coordinate Regression with Efficient Keypoint Detection and Sequential Information
- RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
- From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
- Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
- Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time
- VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding
- Bridging Viewpoint Gaps: Geometric Reasoning Boosts Semantic Correspondence
- ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
- Interleaved-Modal Chain-of-Thought
- ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
- Shading Meets Motion: Self-supervised Indoor 3D Reconstruction Via Simultaneous Shape-from-Shading and Structure-from-Motion
- Methodology for Business Value Analysis of Innovative IT in a Business Sector. The Case of the Material Supply Chain
- The Application Progress of Power Batteries in New Energy Ships
- COFFEE: Mitigating Hallucination in LVLMs via COllaborative Filtering for Enhanced Eyes
- LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model
- Dual Prompting Image Restoration with Diffusion Transformers
- LogiCzsl: Exploring Logic-induced Representation for Compositional Zero-shot Learning
- VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
- Research on the MC/DC Test on Civil Aircraft Software Robust Requirements Based on DO-178C
- Quantization without Tears
- Artificial Intelligence Adoption, Enterprise Capabilities and Performance
- Thin-Shell-SfT: Fine-Grained Monocular Non-rigid 3D Surface Tracking with Neural Deformation Fields
- A mixed-precision quantum-classical algorithm for solving linear systems
- Learning Class Prototypes for Unified Sparse-Supervised 3D Object Detection
- Design and Analysis of a PSFB Current Doubler for VRFB: Impact of Magnetic Components and Snubber Circuit Requirements
- SEC-Prompt: SEmantic Complementary Prompting for Few-Shot Class-Incremental Learning
- Differentially-Fed Harmonic RFID Tag for Multi-Tag Detection
- Object-aware Sound Source Localization via Audio-Visual Scene Understanding
- A Low-Profile Shared-Aperture Antenna Using Electromagnetic Transparent Structure and AMC
- Analytical Study on Fault-Tolerant Control of Five-Phase Induction Motor Drive
- AIpparel: A Multimodal Foundation Model for Digital Garments