Zendy | AI-Powered Research Library

AI Assistant Blog

Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
A Multi-Mode Sub-Synchronous Damping Controller for Renewable-Thermal-Bundled Power Transmission by LCC-HVDC System
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation
Efficient Detection of Relaxed Maximal Cliques in Large-Scale IoT Networks
Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting
Neural Inverse Rendering from Propagating Light
Image Captioning with Multi-Scale Dilated Attention Mechanism
Design of Transformer Turn-ratio for Maximizing the ZVS Region of Dual Active Bridge Converter
A Single-Stage Admittance Control Network Based Misalignment Tolerant Inductive Power Transfer System for EV Application
Question-Aware Gaussian Experts for Audio-Visual Question Answering
CrossOver: 3D Scene Cross-Modal Alignment
MambaIC: State Space Models for High-Performance Learned Image Compression
A Cross-Residual BP Decoding Algorithm for Dual-Polar JSCC
SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis
Towards Explicit Geometry-Reflectance Collaboration for Generalized LiDAR Segmentation in Adverse Weather
Deep Fair Multi-View Clustering with Attention KAN
HyperLoRA: Parameter-Efficient Adaptive Generation for Portrait Synthesis
Foundations of the Theory of Performance-Based Ranking
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
An Improved RFR Method for Enhancing Large-Signal Stability of Grid-Following Inverter Under Weak and Faulty Grid Conditions
Dynamic Estimation of Mental Workload and Operator Accuracy for Time-Constrained Binary Classification Tasks
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images
IndoorGS: Geometric Cues Guided Gaussian Splatting for Indoor Scene Reconstruction
A Comparative Analysis of Challenges in Wireless Sensor Networks using Machine Learning Algorithms
Towards Scalable Human-aligned Benchmark for Text-guided Image Editing
DNF: Unconditional 4D Generation with Dictionary-based Neural Fields
Statistical Model Limitations of Ground Flash Density for Lightning Risk Assessment
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Continuous 3D Perception Model with Persistent State
Make them Socialites: Supporting Social Entrepreneurs
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Pose-Guided Temporal Enhancement for Robust Low-Resolution Hand Reconstruction
NTFR: A Network Traffic Feature Reduction Method Based on Relational Analysis
Automated Calculation of Algorithm Statement Execution Frequency Based on Abstract Syntax Tree
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models
TCFG: Tangential Damping Classifier-free Guidance
TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception
MotionBench: Benchmarking and Improving Fine-Grained Video Motion Understanding for Vision Language Models
AD-LDB: A Modality-Incomplete Learning Model for Alzheimer's Disease Diagnosis
UAV-Relay-Aided Secure Maritime Networks Coexisting with Satellite Networks: Robust Beamforming and Trajectory Optimization
GenVDM: Generating Vector Displacement Maps From a Single Image
Statistical Characteristics of Cloud-to-Ground Lightning Across Different Regions in China
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
Methodology for GPU Frequency Switching Latency Measurement
Multi-Modal Aerial-Ground Cross-View Place Recognition with Neural ODEs
Parameterized Programming and Simulation Processing of Cam Curve in Ship Valve System
Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
Remaining Useful Life Prediction of Bearings under Complex Operating Conditions: A DAENet-XLSTM Transfer Learning Model
High-Voltage Pulse Modulator for S-Band Klystron in IR-FEL RF System
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
CoLLM: A Large Language Model for Composed Image Retrieval
CGMatch: A Different Perspective of Semi-supervised Learning
Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
A Recursive Approach to Representation in Hilbert Spaces of Increasing Dimension: Applications to Quantum-centric HPC tool development
Everything to the Synthetic: Diffusion-Driven Test-Time Adaptation via Synthetic-Domain Alignment
Research on Topological Layout Algorithm of Mine Ventilation Network Diagram
RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark
ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts
Cancer Survival Prognosis From Whole Slide Images Using Hopfield Network
Three-view Focal Length Recovery From Homographies
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
SEEN-DA: SEmantic ENtropy guided Domain-aware Attention for Domain Adaptive Object Detection
Design Optimization of Synchronous Reluctance Motor for Electric Two Wheeler Application
MaDCoW: Marginal Distortion Correction for Wide-Angle Photography with Arbitrary Objects
SOAP: Vision-Centric 3D Semantic Scene Completion with Scene-Adaptive Decoder and Occluded Region-Aware View Projection
An Intelligent Prediction Method for Safety Margins of Flexible Thermal Power Units Based on PipeLine Creep Life Damage
Enhancing Scene Coordinate Regression with Efficient Keypoint Detection and Sequential Information
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time
VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding
Bridging Viewpoint Gaps: Geometric Reasoning Boosts Semantic Correspondence
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
Interleaved-Modal Chain-of-Thought
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
Shading Meets Motion: Self-supervised Indoor 3D Reconstruction Via Simultaneous Shape-from-Shading and Structure-from-Motion
Methodology for Business Value Analysis of Innovative IT in a Business Sector. The Case of the Material Supply Chain
The Application Progress of Power Batteries in New Energy Ships
COFFEE: Mitigating Hallucination in LVLMs via COllaborative Filtering for Enhanced Eyes
LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model
Dual Prompting Image Restoration with Diffusion Transformers
LogiCzsl: Exploring Logic-induced Representation for Compositional Zero-shot Learning
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
Research on the MC/DC Test on Civil Aircraft Software Robust Requirements Based on DO-178C
Quantization without Tears
Artificial Intelligence Adoption, Enterprise Capabilities and Performance
Thin-Shell-SfT: Fine-Grained Monocular Non-rigid 3D Surface Tracking with Neural Deformation Fields
A mixed-precision quantum-classical algorithm for solving linear systems
Learning Class Prototypes for Unified Sparse-Supervised 3D Object Detection
Design and Analysis of a PSFB Current Doubler for VRFB: Impact of Magnetic Components and Snubber Circuit Requirements
SEC-Prompt: SEmantic Complementary Prompting for Few-Shot Class-Incremental Learning
Differentially-Fed Harmonic RFID Tag for Multi-Tag Detection
Object-aware Sound Source Localization via Audio-Visual Scene Understanding
A Low-Profile Shared-Aperture Antenna Using Electromagnetic Transparent Structure and AMC
Analytical Study on Fault-Tolerant Control of Five-Phase Induction Motor Drive
AIpparel: A Multimodal Foundation Model for Digital Garments