- Toward Efficient Power Scene Detection via Topology-Preserved Knowledge Distillation
- Test-time augmentation improves efficiency in conformal prediction
- AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
- HumanMM: Global Human Motion Recovery from Multi-shot Videos
- Improving the Training of Data-Efficient GANs via Quality Aware Dynamic Discriminator Rejection Sampling
- EdgeMovingNet: Edge-preserving Point Cloud Reconstruction via Joint Geometry Features
- Application of Optimization Methods to obtain Switching Angles for Selective Harmonic Minimization Pulse Amplitude Modulation (SHMPAM) Technique for 3-Phase Seven Level CHB Multilevel Inverter
- Relative Representations of Latent Spaces enable Efficient Semantic Channel Equalization
- The Importance and Impact of Adaptability for the Success of Manufacturing Companies and Production-Related Service Providers in a Rapidly Changing World
- Few-shot Personalized Scanpath Prediction
- Incomplete Multi-modal Brain Tumor Segmentation via Learnable Sorting State Space Model
- SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts
- Conductive Noise Modeling using GA Parameter Fitting and Effective Validation of Noise Reduction Filter
- Foggy Weather Scene Object Detection Algorithm: FREFog-Yolov8s
- DEEP: Edge-Based Dataflow Processing with Hybrid Docker Hub and Regional Registries
- Observer-based dynamic event-triggered resilient control for heterogeneous multi-agent systems under DoS attacks
- PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting
- Video Depth without Video Models
- M-LLM Based Video Frame Selection for Efficient Video Understanding
- DL2G: Degradation-guided Local-to-Global Restoration for Eyeglass Reflection Removal
- 6G Infrastructures for Edge AI: An Analytical Perspective
- A Bi-Level Multi-Objective System for Renewable Energy Self-Consumption: A Resident-Aware Approach to Leveraging Energy Flexibility
- Simulation Research on Lightning Withstand Level of $500 \text{kV} / \pm 800 \text{kV}$ Hybrid Tower Large Crossing Transmission Line Considering Leader Current
- Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge
- Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding
- RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability
- Analyzing 16,193 LLM Papers for Fun and Profits
- FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video
- AI-Driven Stress Detection Systems Tailored for IT Industry Challenges
- Co-designing a Variable Reluctance Energy Harvester and Power Management System for Smart Bearing Applications
- Lightweight Semantic Segmentation of Road Cracks Based on Improved DeepLabV3+
- Observability and Incident Response in Managed Serverless Environments Using Ontology-Based Log Monitoring
- TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting
- FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis
- RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance
- Spectral Informed Mamba for Robust Point Cloud Processing
- Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
- A Multiple Access Channel Game with Trade-Off between SINR and Energy Saving
- Resistance Switching Properties of Stoichiometric and Nitrogen Implanted Silicon Nitride Nanolayers on N and P-Type Si Substrates
- ORIDa: Object-centric Real-world Image Composition Dataset
- ICE: Intrinsic Concept Extraction from a Single Image via Diffusion Models
- HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving
- Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body
- A TBD Fuzzy C-Means Clustering Algorithm Based on Quadratic Polynomial for Cardiac Image Segmentation
- Reasoning Mamba: Hypergraph-Guided Region Relation Calculating for Weakly Supervised Affordance Grounding
- ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
- SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing
- DefMamba: Deformable Visual State Space Model
- TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
- Single Domain Generalization for Few-Shot Counting via Universal Representation Matching
- Predicting ocular diseases using SqueezeNet as feature maps with Convolutional Neural Networks
- Point Cloud Upsampling Using Conditional Diffusion Module with Adaptive Noise Suppression
- APT: Adaptive Personalized Training for Diffusion Models with Limited Data
- 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes
- GenManip: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
- A Novel Torque Sensing Approach to Eliminate Stiction in Haptic Devices with Hybrid Motor/Brake Actuation
- Invisible Backdoor Attack against Self-supervised Learning
- Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
- Real-Time Environmental Monitoring using ESP-NOW-based Wireless Sensor Network for Sustainable Agriculture
- High-Performance Computing for Graph AI: A Top-Down Perspective
- 3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement
- SACB-Net: Spatial-Awareness Convolutions for Medical Image Registration
- FSHNet: Fully Sparse Hybrid Network for 3D Object Detection
- HistoFS: Non-IID Histopathologic Whole Slide Image Classification via Federated Style Transfer with RoI-Preserving
- Enhancing EV Charging Infrastructure with Vanadium Redox Flow Batteries: A Comprehensive Study of Design and Implementation
- Dangerous Scenarios Accelerated Search Method for Automated Driving System Based on Parallel Architecture
- VEU-Bench: Towards Comprehensive Understanding of Video Editing
- A U-Net Framework with Dice Loss for High-Precision Retinal Vessel Segmentation
- MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
- NTR-Gaussian: Nighttime Dynamic Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics
- Performance and Portability in Multi-GPU Branch-and-Bound: Chapel Versus CUDA and HIP for Tree-Based Optimization
- Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection
- On the Capacity of an Asynchronous MIMO System with Oversampling Reception
- Neural Hierarchical Decomposition for Single Image Plant Modeling
- Develop a Versatile ECM Framework Capable of Accurately Representing Multiple Cell Types
- Glossy Object Reconstruction with Cost-effective Polarized Acquisition
- FLAVC: Learned Video Compression with Feature Level Attention
- MangaNinja: Line Art Colorization with Precise Reference Following
- LM-Offload: Performance Model-Guided Generative Inference of Large Language Models with Parallelism Control
- Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
- Predictive Analytics in Endodontics: Machine Learning Approaches for Treatment Success and Failure Prediction
- K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
- A Nine-Level Switched-Capacitor Based Quadruple Boost Inverter Using Finite Control Set MPC
- Multi-Kernel Enhanced Receding-Horizon Reinforcement Learning for Steering Control of Intelligent Vehicles
- Change Detection Network Based on Deformable Spatiotemporal Convolution
- Generating 3D-Consistent Videos from Unposed Internet Photos
- Features Combination Selection for Metro Ridership Prediction
- A Node Failure Prediction Methodology for Cloud-Native Clusters
- SSHNet: Unsupervised Cross-modal Homography Estimation via Problem Reformulation and Split Optimization
- Analysis Method and Three-Dimensional Distribution Characteristics of Cloud-to-Ground Lightning Under Mesoscale Topography
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
- HSI: A Holistic Style Injector for Arbitrary Style Transfer
- Fault Localization and Severity Estimation in Power Systems
- LLM-Based Multi-Agent Framework for Troubleshooting Distributed Systems
- Pedestrian Trajectory Prediction Based on Multi-Relational Graph Convolution and Dynamic Attention
- AvatarArtist: Open-Domain 4D Avatarization
- Guiding Human-Object Interactions with Rich Geometry and Relations
- Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning
- Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
- FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering