- Single Receiver Positioning Method Based on TDOA/AOA
- Dark Noise Diffusion: Noise Synthesis for Low-Light Image Denoising
- MADDPG-Based Collaborative Anti-Jamming Strategy for Joint Frequency-Power Allocation in Networked Radars
- ROLL: Robust Noisy Pseudo-label Learning for Multi-View Clustering with Noisy Correspondence
- SketchVideo: Sketch-based Video Generation and Editing
- Revisiting and Extending the Estimation of Parasitic Capacitance in Inductors
- HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion
- MotiF: Making Text Count in Image Animation with Motion Focal Loss
- A Tale of Two Classes: Adapting Supervised Contrastive Learning to Binary Imbalanced Datasets
- Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
- Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models
- Robust Message Embedding via Attention Flow-Based Steganography
- Toward Efficient Asynchronous Single-Source Shortest Path
- Steady Progress Beats Stagnation: Mutual Aid of Foundation and Conventional Models in Mixed Domain Semi-Supervised Medical Image Segmentation
- Linear Attention Modeling for Learned Image Compression
- Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
- CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation
- RFSFeat: Advanced Feature Extraction and Recognition for Zhuang Brocade
- Condensing Action Segmentation Datasets via Generative Network Inversion
- ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
- Toward Performance Prediction in Large-Scale Systems through Temporal System and Application Log Analysis
- DA-Distill: Dual-Alignment Distillation for Multimodal Knowledge Transfer on Edge Devices
- Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction
- GliaNet: Adaptive Neural Network Structure Learning with Glia-Driven
- Concept Lancet: Image Editing with Compositional Representation Transplant
- Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?
- A Universal Quantum Phase Slip Logic Gate for Implementing Basic Boolean Functions
- ViiNeuS: Volumetric Initialization for Implicit Neural Surface reconstruction of urban scenes with limited image overlap
- MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities
- CCIFE: Channel-Resilient Ensemble Adversarial Attack Against DNN-Based Modulation Classifiers
- MetaCast: Generalizing HPC Application Runtime Prediction
- High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight
- Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
- The Datasets Crawling Based on Search Engine in Minor Fields AI Application
- EventGPT: Event Stream Understanding with Multimodal Large Language Models
- FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
- Can Vision Feel Touch? Tactile-aware Visual Grasping for Transparent Objects
- SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
- DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
- EgoLife: Towards Egocentric Life Assistant
- Optimal Design of Planar Transformer for DAB Converters Based on Model-Free Reinforcement Learning
- Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
- VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis
- OPTICAL: Leveraging Optimal Transport for Contribution Allocation in Dataset Distillation
- Diffusion-based Realistic Listening Head Generation via Hybrid Motion Modeling
- Development of a Microwave Interferometer System for Corona Discharge Mapping
- BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
- EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation
- Event-Based Adaptive Fault-Tolerant Control for Nonlinear Cyber-Physical Systems via Intermittent Available Signals
- SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
- Power Balancing Controller Design of Multiple Resonant Converters by PFM and PWM Methods
- Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving
- Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning
- Active Non-Line-Of-Sight Imaging Based on Fusion of Physical Prior and Deep Learning
- Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses
- Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images
- SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
- Preconditioners for the Stochastic Training of Neural Fields
- Impact of Image Resolution on Controlling Drones Using Remote VR Headset Visualization and a Cloud Architecture
- TinyFusion: Diffusion Transformers Learned Shallow
- A Self-healing Electrical Impedance Tomography Sensor for the Selective Localization of Compression and Damage Based on a Diels-Alder Conductive Composite
- Shape Evaluation Test of Diverter Strip for Lightning Protection of Wind Turbines
- DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes
- Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
- MDCT-DPANet: Dual-Path Attention Network for Multi-Channel Speech Separation
- Relationship Between GDT Follow-Current Phenomenon and Active Gases
- CrossSDF: 3D Reconstruction of Thin Structures From Cross-Sections
- Adaptive GPU Resource Allocation in Online Scenarios
- Wind Turbine Drivetrain Fault Diagnosis Based On Tower-Base Vibration Measurements: A Sensitivity
- Design of Intelligent Home Environment Monitoring System Based on Deep Learning
- BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions
- Control of Utility Interfaced PEM Fuel Cell, Solar Energy Conversion System and Battery Storage
- CNN-Transformer Feature Aggregation for Underwater Self-Supervised Multi-Frame Monocular Depth Estimation
- Novel Implementation Method of Selective Harmonic Elimination on a Low Cost FPGA Controller for V2G Application
- Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
- Geometric Knowledge-Guided Localized Global Distribution Alignment for Federated Learning
- PrEditor3D: Fast and Precise 3D Shape Editing
- Sketchy Bounding-Box Supervision for 3D Instance Segmentation
- Multimodal Sentiment Analysis Based on Multi-View Attention and Cross-Modal Contrastive Learning
- EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
- GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
- ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
- HC-PCL: A Hierarchical Cross-Camera Prototypical Contrastive Learning Framework for Unsupervised Object Re-Identification
- Optimization of Secure Offloading Data for Space-Air-Ground Integrated Networks Oriented to Mobile Edge Computing
- Keyframe-Guided Creative Video Inpainting
- Lightweight Hybrid Attention Network for Edge Deployment in Crop Disease Recognition
- Uplink Spectral Efficiency Performance of Cell-Free RAN System Under Imperfect CSI
- Cross-Sector, Efficient, Trusted Data Sharing in Dataspaces
- CaMuViD: Calibration-Free Multi-View Detection
- ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
- Text-Driven Fashion Image Editing with Compositional Concept Learning and Counterfactual Abduction
- Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping
- Image Referenced Sketch Colorization Based on Animation Creation Workflow
- DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation
- The Impact Label Noise and Choice of Threshold has on Cross-Entropy and Soft-Dice in Image Segmentation
- Fuzzy Multimodal Learning for Trusted Cross-modal Retrieval
- MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning
- DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels
- Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
- Servitization in the B2B Manufacturing Context: A Practice-Based Research Agenda