- Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
- Optical-Flow Guided Prompt Optimization for Coherent Video Generation
- CADDreamer: CAD Object Generation from Single-view Images
- KMD: Koopman Multi-modality Decomposition for Generalized Brain Tumor Segmentation under Incomplete Modalities
- 3D Modeling of Coal Bunkers Based on LiDAR
- STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification
- Overview of Gaps in LCA Data Quality and Future Perspectives
- RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training
- Underwater Image Recovery Using Low-Frequency Filtering and Polarization Imaging Modeling
- Investigation of Stability Challenges in MEA Onboard DC Microgrids using MTPA based Direct Torque Control
- An Improved YOLOv8 Based Unsafe Behavior Detection Algorithm for Coal Mine Underground Personnel
- T2ICount: Enhancing Cross-Modal Understanding for Zero-Shot Counting
- Spec-YOLO: An Efficient Deep Network for Spectrogram-Based Signals Identification
- WonderWorld: Interactive 3D Scene Generation from a Single Image
- An Enhanced Topic Analysis Method for MOOC Comments Based on Multi-Dimensional Feature Fusion
- In-Band Full-Duplex System for Semantic Communication
- WildAvatar: Learning In-the-wild 3D Avatars from the Web
- Deep Integration Analysis of MEC Computing Nodes and CDN PoP Nodes
- Hybrid Machine Learning Approaches for Enhanced Grid Stability Prediction in Modern Energy Systems
- Deep RL-based Resource Allocation for User Fairness in STAR-RIS–assisted NOMA-enabled B5G Networks
- RIS-Assisted Communications: A Comprehensive Study for Far- and Near-Field Scenarios
- Leveraging Global Stereo Consistency for Category-Level Shape and 6D Pose Estimation from Stereo Images
- Research on Engine Lubrication Oil Temperature Prediction Based on WOA-LSTM Algorithm
- CocoER: Aligning Multi-Level Feature by Competition and Coordination for Emotion Recognition
- PB-TABL: Task Incremental Learning Strategy via Applying Piggyback Architecture on Temporal Attention-Augmented Bilinear Networks for Financial Time-Series Classification
- GraphGPT-o: Synergistic Multimodal Comprehension and Generation on Graphs
- Application Research of Lightning Warning Device for Transmission Lines in the Prediction of Severe Convection Thunderstorm Activities
- Spatial-Spectral Texture-Preserved Total Variation: A Novel Regularization for Hyperspectral Image Denoising
- Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation
- Can Large Vision-Language Models Correct Semantic Grounding Errors By Themselves?
- SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
- Large Language Models for Spatio-Temporal Mobile Traffic Predictions
- Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
- An Efficient Cross-Domain Trusted Authentication Scheme for Microgrids
- ViStream: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network
- Maximizing Grid Forming Capabilities of Solar Inverters with Energy Storage Under Partial Shading Conditions
- Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
- Secret Lies in Color: Enhancing AI-Generated Images Detection with Color Distribution Analysis
- Symbiotic Federated Learning for Giant AI Threat Detection in 6G-IoT Infrastructures
- Polar Dense Ice Layer Ship Path Planning Based on DI-IVYA-A* Algorithm
- PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
- Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
- Probabilistic Generative Approach for Ambiguity-Aware Parameter Extraction
- BLADE: Single-View Body Mesh Estimation through Accurate Depth Estimation
- VSNet: Focusing on the Linguistic Characteristics of Sign Language
- No Pains, More Gains: Recycling Sub-Salient Patches for Efficient High-Resolution Image Recognition
- Activating Sparse Part Concepts for 3D Class Incremental Learning
- SparkPerf: A Benchmarking Framework for Evaluating the Performance of Spark Data Analytics Projects
- Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
- Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
- Anthropomorphic Grasp Motion Planning for Humanoid Robots via Learned Riemannian Metric and Dextrous Grasp Evaluator
- Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
- Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
- Improved Model-Free Adaptive Load Frequency Control for Multi-Area Power Systems
- Application of Bayesian Price Clearing Auction Model in Enhancing Transactive Energy Systems Vulnerabilities to Cyber Attacks
- Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis
- Video Language Model Pretraining with Spatio-temporal Masking
- Parkinson’s Disease Detection Using Multi-Scale Frequency-Sharing Channel Attention Network With Smartwatch Movement Recordings
- v-CLR: View-Consistent Learning for Open-World Instance Segmentation
- Gate Efficient Composition of Hamiltonian Simulation and Block-Encoding with its Application on HUBO, Chemistry and Finite Difference Method
- OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection
- Enhancing SRAM Efficiency and Stability with Self Pull Up Mechanism and Bitline Charge Sharing
- Re-thinking Temporal Search for Long-Form Video Understanding
- Font-Agent: Enhancing Font Understanding with Large Language Models
- MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
- SimVS: Simulating World Inconsistencies for Robust View Synthesis
- Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding
- Solving Instance Detection from an Open-World Perspective
- SDGOCC: Semantic and Depth-Guided Bird’s-Eye View Transformation for 3D Multimodal Occupancy Prediction
- The Devil is in Low-Level Features for Cross-Domain Few-Shot Segmentation
- Towards Automated Certification Framework of Composite Systems: A SWRL-Based Approach
- VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
- Specific Emitter Identification for Open-Set Satellite TT&C Signals Based on Multi-Task Learning
- Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages
- Experimental Characterization of High-frequency Transformers for Isolated DC-DC Converters
- LiSu: A Dataset and Method for LiDAR Surface Normal Estimation
- UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image
- Research on the Shore-Based Monitoring System for Marine Diesel Engines Based on Digital Twin Technology
- Autonomous Navigation for Mobile Robots in Dynamic Environments Based on Deep Reinforcement Learning
- IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement
- Shape and Texture: What Influences Reliable Optical Flow Estimation?
- Indirect Lightning Strike Analysis of Blocking Diodes in a Large-Scale Photovoltaic System
- Intelligent Image Classification and Emergency Detection for Enhanced CCTV Surveillance Systems using CNN
- Decay Time Processing Technique by Digital Architecture for Noninvasive Optical Monitoring of Biosignals in Healthcare Applications
- CacheQuant: Comprehensively Accelerated Diffusion Models
- DreamTrack: Dreaming the Future for Multimodal Visual Object Tracking
- A Hierarchical Patch Feature Distribution Network for Industrial Multiscale Defect Detection
- Analysis of Lightning Grounding Values Using Variations in Frequency, Distance Ratio, and Measurement Methods
- BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis
- Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways
- MICAS: Multi-grained In-Context Adaptive Sampling for 3D Point Cloud Processing
- Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
- Input Series Output Parallel Connection based Fault Tolerant LV Power Supply in Automotive Applications
- DiffCAM: Data-Driven Saliency Maps by Capturing Feature Differences
- Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment
- Kernel Instruction Optimization Based on the Triton Compiler
- Research on Active Braking Control Strategy of EHB Based on Finite-Time Adaptive Control for Intelligent Vehicle
- HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion
- ML Enabled Parallel R-C Sensor for Level and Electrical Conductivity Measurement
- Study and Validation of a Novel dq-axes Equivalent Circuit Model for PMSM Considering the Iron Loss