- Construction of an Information Service Platform for Overseas Chinese Affairs with Digital Humanities
- Balanced Rate-Distortion Optimization in Learned Image Compression
- CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
- CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation
- A Lightweight and High-Precision Target Detection Algorithm for Unmanned Surface Vehicles
- A Study on EMI Analysis and Mitigation in Three-Phase DAB Converters for Electric Vehicle Applications
- EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
- SeCap: Self-Calibrating and Adaptive Prompts for Cross-View Person Re-Identification in Aerial-Ground Networks
- Exploring the Generalizability of Geomagnetic Navigation: A Deep Reinforcement Learning Approach with Policy Distillation
- Enhanced Sugarcane Leaf Disease Prediction Using Deep Learning Models
- UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion
- DiC: Rethinking Conv3x3 Designs in Diffusion Models
- Analytical Modeling and Loss Estimation of Triple Active Bridge Converters
- An Orthogonal Quad-Beam Scanning Antenna Using 1-Bit Dielectric Modulation in Plasmonic Metamaterial Transmission Line for Traffic Monitoring Applications
- Robust Out-of-Distribution Detection Based on Effective Points Select
- Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation
- Learning to Filter Outlier Edges in Global SfM
- TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion
- DreamText: High Fidelity Scene Text Synthesis
- Design, Analysis and Operation of a Long-primary Short-secondary TF-DSLIM
- Application of Data-Driven Method in Fault Prediction of Intelligent Operation and Maintenance System of Photovoltaic Power Station
- Optimizing Monolithic CFET Middle-of-Line Contact Architectures at A10 Node: A DTCO Simulation Study
- ONDA-Pose: Occlusion-Aware Neural Domain Adaptation for Self-Supervised 6D Object Pose Estimation
- ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models
- SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
- Improving Editability in Image Generation with Layer-wise Memory
- A Physics-Informed Blur Learning Framework for Imaging Systems
- Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
- A Design Methodology for a Partial Power PSFB DC-DC Converter for Battery Charging
- FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts
- Extreme Rotation Estimation in the Wild
- Towards Interpretable Energy Estimation for Edge AI Applications
- Digital Transformation in Small and Medium Size Enterprises in Germany - A Use Case. Digital Twins for Virtual Commissioning
- Effect of Winding Design on the Energy Efficiency of Pole-Changing Induction Motors
- Complex Valued Linear Discriminant Analysis on mmWave Radar Face Signatures for Task-Oriented Semantic Communication
- Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering
- DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting
- Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
- SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
- JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems
- LP-Diff: Towards Improved Restoration of Real-World Degraded License Plate
- FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video
- Controllable Human Image Generation with Personalized Multi-Garments
- JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
- Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds
- Associative Transformer
- Leveraging Convolutional Neural Networks for Accurate Skin Cancer Classification
- Think, Prune, Train: Can Small Models Teach Themselves to Reason?
- Functionality understanding and segmentation in 3D scenes
- Flexible Frame Selection for Efficient Video Reasoning
- Subspace and DOA estimation under coarse quantization
- Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking
- Real-Time Avocado Plant Health and Disease Detection Using UAV Imagery with Faster R-CNN Algorithm
- Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control
- ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary
- StarVector: Generating Scalable Vector Graphics Code from Images and Text
- A Unified Latent Schrödinger Bridge Diffusion Model for Unsupervised Anomaly Detection and Localization
- Named Entity Recognition for Smart City Data Streams: Enhancing Visualization and Interaction
- Visual Lexicon: Rich Image Features in Language Space
- Simpler Diffusion: 1.5 FID on ImageNet512 with pixel-space diffusion
- Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation
- The Method for Steel Surface Defect Detection and Classification Based on the Improved YOLOv7
- DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition
- I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models
- 3D Student Splatting and Scooping
- HuPerFlow: A Comprehensive Benchmark for Human vs. Machine Motion Estimation Comparison
- PICO: Reconstructing 3D People In Contact with Objects
- JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba
- Pose Priors from Language Models
- Analysis of encrypted wireless traffic for identification of IoT devices
- EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting
- Integrating Advanced Feature Extraction with Deep Learning Models for Accurate Forecasting of Peak Load Demand and Solar Power Generation
- 3D transcranial Dynamic Ultrasound Localization Microscopy in the mouse brain using a Row-Column Array
- A Minimal Model for Emergent Collective Behaviors in Autonomous Robotic Multi-Agent Systems
- Unsupervised SAR Image Change Detection via Structure Feature-based Self-Representation Learning
- Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency
- Dynamic configuration of Kubernetes containers resources with SLA classes
- HandJoKe: Joint-Guided Keypoint Denoising Transformer for Depth-based 3D Hand Pose Estimation
- A Compact Actuator for Lower-Limb Exoskeletons With High Torque Density and High Backdrivability
- Failure Detection in De-energized GaN-HEMT Switching Cells using Gate Driver-Induced Residual Voltage
- PhyS-EdiT: Physics-aware Semantic Image Editing with Text Description
- Logits DeConfusion with CLIP for Few-Shot Learning
- Dynamic Group Normalization: Spatio-Temporal Adaptation to Evolving Data Statistics
- Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning
- Differentiable Inverse Rendering with Interpretable Basis BRDFs
- D³CTTA: Domain-Dependent Decorrelation for Continual Test-Time Adaption of 3D LiDAR Segmentation
- Multi-UAV Multi-Task Path Planning Based on DDE-SA Algorithm
- Link-based Contrastive Learning for One-Shot Unsupervised Domain Adaptation
- VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
- Enhancing Dance-To-Music Generation via Negative Conditioning Latent Diffusion Model
- Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting
- Hash3D: Training-free Acceleration for 3D Generation
- Battery Integrated 1-phase DC-AC Inverter for Peak Load Shaving Application
- Capacitated vehicle routing model with time limit for waste collection and recycling in a university campus
- Bearing Remaining Useful Life Prediction Based on CICAE and ResConv1D-LSTM
- CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
- FiRe: Fixed-points of Restoration Priors for Solving Inverse Problems
- NSD-Imagery: A benchmark dataset for extending fMRI vision decoding methods to mental imagery
- VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning
- A Novel Start-up Methodology for GaN HEMT-Based Ripple Power Compensation Integrated Totem-Pole PFC Converters