- Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM
- Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification
- Security of Dynamically Reconfigurable RISC-V Systems: I/O Attack Focus
- Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable
- Overcoming Shortcut Problem in VLM for Robust Out-of-Distribution Detection
- VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis
- Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
- INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
- ADU: Adaptive Detection of Unknown Categories in Black-Box Domain Adaptation
- Video Language Model Pretraining with Spatio-temporal Masking
- Domain Adaptive Diabetic Retinopathy Grading with Model Absence and Flowing Data
- Energy Efficient Scheduling of AI/ML Workloads on Multi-Instance GPUs with Dynamic Repartitioning
- Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
- Low Latency Depth of Field Fusion System and Method Employing FPGA for Autonomous Driving
- Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal
- AI-Enhanced Detection of Dynamic Structural Changes in Inflammatory Protein Interfaces: A Case Study of CD11b/Mac-1 Interactions
- VisionArena: 230K Real World User-VLM Conversations with Preference Labels
- SCSGuardian: A Practical Hardware Defense against Speculative Cache Side-Channel Attacks
- Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
- A Method for Evaluating a Series Hybrid System Using a DC-Input Direct Electric-Power Converter (D-EPC) in Mode Driving with a Virtual Vehicle Model
- GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
- Input Series Output Parallel Connection based Fault Tolerant LV Power Supply in Automotive Applications
- Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment
- DiffCAM: Data-Driven Saliency Maps by Capturing Feature Differences
- HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion
- ML Enabled Parallel R-C Sensor for Level and Electrical Conductivity Measurement
- MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
- FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning
- MDCT-DPANet: Dual-Path Attention Network for Multi-Channel Speech Separation
- Foggy Target Detection Algorithm Based on CBAM-FE and SPD-Conv
- Relationship Between GDT Follow-Current Phenomenon and Active Gases
- BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions
- Goku: Flow Based Video Generative Foundation Models
- Control of Utility Interfaced PEM Fuel Cell, Solar Energy Conversion System and Battery Storage
- CADDreamer: CAD Object Generation from Single-view Images
- HC-PCL: A Hierarchical Cross-Camera Prototypical Contrastive Learning Framework for Unsupervised Object Re-Identification
- Parallel Scan on Ascend AI Accelerators
- Dynamic Integration of Task-Specific Adapters for Class Incremental Learning
- Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World
- Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation
- AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning
- A Switchable Transmissive-Reflective Metasurface Unit for Full-Space Continuous Phase Modulation
- Spec-YOLO: An Efficient Deep Network for Spectrogram-Based Signals Identification
- MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning
- Z-Magic: Zero-shot Multiple Attributes Guided Image Creator
- Secure Audio Processing: Facial Encryption with Speech Transcription and Translation
- FoundationStereo: Zero-Shot Stereo Matching
- A4A: Adapter for Adapter Transfer via All-for-All Mapping for Cross-Architecture Models
- Decoupled Motion Expression Video Segmentation
- Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
- Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation
- VideoGuide: Improving Video Diffusion Models without Training Through a Teacher’s Guide
- Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model
- A Self-Resonant Boost Converter IC Supporting Multimode Operation for Wide-Range TEG Energy Harvesting in 28-nm CMOS
- Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement
- Accelerating Triangle Counting with Real Processing-in-Memory Systems
- Design and Performance Analysis of Planar Antenna for Ground Penetrating Radar Applications
- Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection
- Enhancing Time-Domain Shielding Effectiveness of Cables Using Metal-Coated Aramid-Fiber Composites
- Leveraging L-Moments to Characterize Traffic Behavior in 4G and 5G Networks
- MotionPro: A Precise Motion Controller for Image-to-Video Generation
- GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
- Matrix-Free Shared Intrinsics Bundle Adjustment
- EVOS: Efficient Implicit Neural Training via EVOlutionary Selector
- Research and Implementation of an Automatic Secondary Security Verification Method for System Operation Permissions
- HaaS - A Platform for Password Cracking in Distributed Heterogeneous Systems
- DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion
- Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
- h-Edit: Effective and Flexible Diffusion-Based Editing via Doob’s h-Transform
- ROLL: Robust Noisy Pseudo-label Learning for Multi-View Clustering with Noisy Correspondence
- VeriDebug: A Unified LLM for Verilog Debugging via Contrastive Embedding and Guided Correction
- Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems
- Synchronization and Pinning Control on Circulating Directed Hypergraphs
- ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
- Stability Assessment of a Weak Island System Connected to Two HVDC Links
- Optimization of Fiber Attenuation Prediction Based on GA-CNN-BiLSTM-Attention
- CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
- Open Set Label Shift with Test Time Out-of-Distribution Reference
- Unconditionally Stable Leapfrog Complying Divergence Implicit FDTD Method with Lumped Elements
- DPCT: Efficient High-Resolution Depth Prediction via Cross-Covariance Attention Transformers
- Context-Aware Multimodal Pretraining
- SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
- MaSS13K: A Matting-level Semantic Segmentation Benchmark
- Adaptive Protein Design Protocols and Middleware
- Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation
- Efficient GPU Memory Resource Scheduling Algorithm for Vehicle Detection Tasks in High Concurrent Scenarios
- A Learning Algorithm Based on Similarity Identification and Knowledge Transfer for Dynamic Multi-Objective Optimization
- Frequency-Domain Analysis of Contaminant Effects on Leakage Current and Harmonic Distortion for Transmission Line Diagnostics
- Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
- SketchVideo: Sketch-based Video Generation and Editing
- MotiF: Making Text Count in Image Animation with Motion Focal Loss
- Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models
- Robust Message Embedding via Attention Flow-Based Steganography
- DA-Distill: Dual-Alignment Distillation for Multimodal Knowledge Transfer on Edge Devices
- Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction
- GliaNet: Adaptive Neural Network Structure Learning with Glia-Driven
- Concept Lancet: Image Editing with Compositional Representation Transplant
- High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight
- Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References
- VSNet: Focusing on the Linguistic Characteristics of Sign Language