- Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering
- DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting
- Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
- SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
- JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems
- LP-Diff: Towards Improved Restoration of Real-World Degraded License Plate
- FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video
- Controllable Human Image Generation with Personalized Multi-Garments
- JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
- Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds
- Associative Transformer
- Leveraging Convolutional Neural Networks for Accurate Skin Cancer Classification
- Think, Prune, Train: Can Small Models Teach Themselves to Reason?
- Functionality understanding and segmentation in 3D scenes
- Flexible Frame Selection for Efficient Video Reasoning
- Subspace and DOA estimation under coarse quantization
- Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking
- Real-Time Avocado Plant Health and Disease Detection Using UAV Imagery with Faster R-CNN Algorithm
- Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control
- ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary
- StarVector: Generating Scalable Vector Graphics Code from Images and Text
- A Unified Latent Schrödinger Bridge Diffusion Model for Unsupervised Anomaly Detection and Localization
- Named Entity Recognition for Smart City Data Streams: Enhancing Visualization and Interaction
- Visual Lexicon: Rich Image Features in Language Space
- Simpler Diffusion: 1.5 FID on ImageNet512 with pixel-space diffusion
- Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation
- The Method for Steel Surface Defect Detection and Classification Based on the Improved YOLOv7
- DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition
- I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models
- 3D Student Splatting and Scooping
- HuPerFlow: A Comprehensive Benchmark for Human vs. Machine Motion Estimation Comparison
- PICO: Reconstructing 3D People In Contact with Objects
- JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba
- Pose Priors from Language Models
- Analysis of encrypted wireless traffic for identification of IoT devices
- EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting
- Integrating Advanced Feature Extraction with Deep Learning Models for Accurate Forecasting of Peak Load Demand and Solar Power Generation
- 3D transcranial Dynamic Ultrasound Localization Microscopy in the mouse brain using a Row-Column Array
- A Minimal Model for Emergent Collective Behaviors in Autonomous Robotic Multi-Agent Systems
- Unsupervised SAR Image Change Detection via Structure Feature-based Self-Representation Learning
- Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency
- Dynamic configuration of Kubernetes containers resources with SLA classes
- HandJoKe: Joint-Guided Keypoint Denoising Transformer for Depth-based 3D Hand Pose Estimation
- A Compact Actuator for Lower-Limb Exoskeletons With High Torque Density and High Backdrivability
- Failure Detection in De-energized GaN-HEMT Switching Cells using Gate Driver-Induced Residual Voltage
- PhyS-EdiT: Physics-aware Semantic Image Editing with Text Description
- Logits DeConfusion with CLIP for Few-Shot Learning
- Dynamic Group Normalization: Spatio-Temporal Adaptation to Evolving Data Statistics
- Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning
- Differentiable Inverse Rendering with Interpretable Basis BRDFs
- D³CTTA: Domain-Dependent Decorrelation for Continual Test-Time Adaption of 3D LiDAR Segmentation
- Multi-UAV Multi-Task Path Planning Based on DDE-SA Algorithm
- Link-based Contrastive Learning for One-Shot Unsupervised Domain Adaptation
- VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
- Enhancing Dance-To-Music Generation via Negative Conditioning Latent Diffusion Model
- Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting
- Hash3D: Training-free Acceleration for 3D Generation
- Battery Integrated 1-phase DC-AC Inverter for Peak Load Shaving Application
- Capacitated vehicle routing model with time limit for waste collection and recycling in a university campus
- Bearing Remaining Useful Life Prediction Based on CICAE and ResConv1D-LSTM
- CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
- FiRe: Fixed-points of Restoration Priors for Solving Inverse Problems
- NSD-Imagery: A benchmark dataset for extending fMRI vision decoding methods to mental imagery
- VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning
- A Novel Start-up Methodology for GaN HEMT-Based Ripple Power Compensation Integrated Totem-Pole PFC Converters
- Boost Your Human Image Generation Model via Direct Preference Optimization
- ProHOC: Probabilistic Hierarchical Out-of-Distribution Classification via Multi-Depth Networks
- Steepest Descent Density Control for Compact 3D Gaussian Splatting
- RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds
- Layer-Interaction DeepONet for modeling ultrafast nonlinear dynamics in optical fibers
- Agentic AI for Microservices: Autonomous Optimization of High-Volume Financial Transactions in Cloud Native Environments
- Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
- Quaffure: Real-Time Quasi-Static Neural Hair Simulation
- NADER: Neural Architecture Design via Multi-Agent Collaboration
- CleanDIFT: Diffusion Features without Noise
- A Prevention and Control Method of Bird Harm Based on Bird's Nest Refined Detection and Bird Harm Level Assessment
- Model Predictive Control Strategy for Optimal Operation of Dual Active Bridge Converter Based Battery Energy Storage System
- Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
- Rate Splitting Multiple Access for RIS-aided URLLC MIMO Broadcast Channels
- Impact of Tower Grounding Resistance on Insulator Flashover Risk in 500 kV HVDC Transmission Systems Under Lightning Strikes
- FluxSpace: Disentangled Semantic Editing in Rectified Flow Models
- End-to-End HOI Reconstruction Transformer with Graph-based Encoding
- A Unified Model for Compressed Sensing MRI Across Undersampling Patterns
- Filtering-Based Segmentation of Overlapping Food Items: A Case Study in Automated Chicken Breast Handling
- FSboard: Over 3 million characters of ASL fingerspelling collected via smartphones
- NOMA-ISAC Empowered UAV Networks: Joint Clustering and Power Allocation Optimization
- Joint Optimization of RIS-User Association and Beamforming Design for Multi-RIS-Assisted Multi-User Systems
- Empowering Vector Graphics with Consistently Arbitrary Viewing and View-dependent Visibility
- Generative Sparse-View Gaussian Splatting
- A New Paradigm for Hospital Data: Integrating Human-Centered Feedback to Foster Ecosystem Transformation
- Data Transfer Schemes in the High-Level Communication Library LAIK
- Research on Ship Traffic Flow Analysis Based on Big Data Technology
- Beyond Human Perception: Understanding Multi-Object World from Monocular View
- Generative Modeling of Class Probability for Multi-Modal Representation Learning
- Integrated Techniques to Support Decision-making in Production Planning and Control
- Design and Deployment of a Remaining Useful Life Estimation Algorithm of Power Switches in a Cloud Computing Environment
- Machine Learning-aided Sensing in Private mmWave Networks for Industrial Applications
- Omni-ID: Holistic Identity Representation Designed for Generative Tasks
- Dual Focus-Attention Transformer for Robust Point Cloud Registration
- Enhanced then Progressive Fusion with View Graph for Multi-View Clustering