- Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors
- VITED : Video Temporal Evidence Distillation
- Research on Multi-Channel Metal Bottle Cap Defect Detection System Based on Machine Vision
- FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error
- STING-BEE : Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
- Supervising Sound Localization by In-the-wild Egomotion
- LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
- Method for Image Restoration in Oil Well Drilling Fluid
- HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics
- Beyond Clean Training Data: A Versatile and Model-Agnostic Framework for Out-of-Distribution Detection with Contaminated Training Data
- CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
- Advancing Manga Analysis: Comprehensive Segmentation Annotations for the Manga109 Dataset
- GA3CE: Unconstrained 3D Gaze Estimation with Gaze-Aware 3D Context Encoding
- Generalized Diffusion Detector: Mining Robust Features from Diffusion Models for Domain-Generalized Detection
- AI-Enhanced Detection of Dynamic Structural Changes in Inflammatory Protein Interfaces: A Case Study of CD11b/Mac-1 Interactions
- VisionArena: 230K Real World User-VLM Conversations with Preference Labels
- A Novel Subpixel Detection Method Based on Improved Hampel Circle Fitting for Ball Grid Array Components
- PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting
- The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition
- Mask-Enhanced Edge-Aware Knowledge Distillation for Medical Image Segmentation
- Collaborative Forecasting with Reinforcement Learning to Enhance Resilience
- RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting
- AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
- GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
- RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos
- Black Start Strategy for Modern Power Systems Using Inverter-Based Resources
- PerLA: Perceptive 3D language assistant
- GSM based LPG Leakage Detector with SMS Alert System
- OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
- IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
- Loss-Optimized Inverter Modulation in Battery-Electric Powertrains Based on Harmonic Loss Models
- Foggy Target Detection Algorithm Based on CBAM-FE and SPD-Conv
- Comparison of Inductor Current Ripples between GaN and Si-based CI-SIDO Buck Converter
- Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
- Setchain Algorithms for Blockchain Scalability (Extended Abstract)
- Automatic Expansion and Contraction of Trapping Resources Based on Electric Power Environment
- Efficient Transfer Learning for Video-language Foundation Models
- MAD: Memory-Augmented Detection of 3D Objects
- GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections
- Liquid Metal Elastomer Foam-based Sensor Array using Transmission Line for Continuous Spatial Stress Measurement
- A Real Time Data Acquisition of Photovoltaic Solar Panel Monitoring System based on Internet of Things using Arduino UNO
- On the Landscape of Graph Clustering at Scale
- AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
- Security and Resilience in Cyber-Physical Systems: Detection, Estimation, and Control [Bookshelf]
- Fault Identification Scheme for HVDC Systems Based on Single-Sided Trigger Angle Signal and CWT-CNN
- Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
- Optical-Flow Guided Prompt Optimization for Coherent Video Generation
- CADDreamer: CAD Object Generation from Single-view Images
- KMD: Koopman Multi-modality Decomposition for Generalized Brain Tumor Segmentation under Incomplete Modalities
- 3D Modeling of Coal Bunkers Based on LiDAR
- STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification
- Overview of Gaps in LCA Data Quality and Future Perspectives
- RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training
- Underwater Image Recovery Using Low-Frequency Filtering and Polarization Imaging Modeling
- Investigation of Stability Challenges in MEA Onboard DC Microgrids using MTPA based Direct Torque Control
- An Improved YOLOv8 Based Unsafe Behavior Detection Algorithm for Coal Mine Underground Personnel
- T2ICount: Enhancing Cross-Modal Understanding for Zero-Shot Counting
- Spec-YOLO: An Efficient Deep Network for Spectrogram-Based Signals Identification
- WonderWorld: Interactive 3D Scene Generation from a Single Image
- An Enhanced Topic Analysis Method for Mooc Comments Based on Multi-Dimensional Feature Fusion
- In-Band Full-Duplex System for Semantic Communication
- WildAvatar: Learning In-the-wild 3D Avatars from the Web
- Deep Integration Analysis of MEC Computing Nodes and CDN PoP Nodes
- Hybrid Machine Learning Approaches for Enhanced Grid Stability Prediction in Modern Energy Systems
- Deep RL-based Resource Allocation for User Fairness in STAR-RIS–assisted NOMA-enabled B5G Networks
- RIS-Assisted Communications: A Comprehensive Study for Far-and Near-Field Scenarios
- Leveraging Global Stereo Consistency for Category-Level Shape and 6D Pose Estimation from Stereo Images
- Enhanced Multi-Class Driver Behavior Detection in IoMT Environments Using Hybrid LSTM-GRU Model
- Research on Engine Lubrication Oil Temperature Prediction Based on WOA-LSTM Algorithm
- CocoER: Aligning Multi-Level Feature by Competition and Coordination for Emotion Recognition
- PB-TABL: Task Incremental Learning Strategy via Applying Piggyback Architecture on Temporal Attention-Augmented Bilinear Networks for Financial Time-Series Classification
- GraphGPT-o: Synergistic Multimodal Comprehension and Generation on Graphs
- Application Research of Lightning Warning Device for Transmission Lines in the Prediction of Severe Convection Thunderstorm Activities
- Spatial-Spectral Texture-Preserved Total Variation: A Novel Regularization for Hyperspectral Image Denoising
- Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation
- Can Large Vision-Language Models Correct Semantic Grounding Errors By Themselves?
- SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
- Large Language Models for Spatio-Temporal Mobile Traffic Predictions
- Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
- An Efficient Cross-Domain Trusted Authentication Scheme for Microgrids
- ViStream: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network
- Maximizing Grid Forming Capabilities of Solar Inverters with Energy Storage Under Partial Shading Conditions
- Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
- Secret Lies in Color: Enhancing AI-Generated Images Detection with Color Distribution Analysis
- Symbiotic Federated Learning for Giant AI Threat Detection in 6G-IoT Infrastructures
- Polar Dense Ice Layer Ship Path Planning Based on DI-IVYA-A* Algorithm
- PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
- Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
- Probabilistic Generative Approach for Ambiguity-Aware Parameter Extraction
- BLADE: Single-View Body Mesh Estimation through Accurate Depth Estimation
- VSNet: Focusing on the Linguistic Characteristics of Sign Language
- No Pains, More Gains: Recycling Sub-Salient Patches for Efficient High-Resolution Image Recognition
- Activating Sparse Part Concepts for 3D Class Incremental Learning
- SparkPerf: A Benchmarking Framework for Evaluating the Performance of Spark Data Analytics Projects
- Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
- Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
- Anthropomorphic Grasp Motion Planning for Humanoid Robots via Learned Riemannian Metric and Dextrous Grasp Evaluator
- Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
- Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
- Improved Model-Free Adaptive Load Frequency Control for Multi-Area Power Systems