- Taillight Detection for Driving Intention Recognition in Multi-Scene Autonomous Driving
- Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
- SDFLMQ: A Semi-Decentralized Federated Learning Framework over MQTT
- World-consistent Video Diffusion with Explicit 3D Modeling
- SpiritSight Agent: Advanced GUI Agent with One Look
- SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models
- Diffusion-Assisted Distillation for Self-Supervised Graph Representation Learning with MLPs
- Charon: An End-to-End Infrastructure for Connecting AI@Edge to HPC
- Medical Image Object Detection via Layout-Aware Convolution and Optimal Transport Collaboration
- ABC-Former: Auxiliary Bimodal Cross-domain Transformer with Interactive Channel Attention for White Balance
- Encoder-Aware Video Downscaling Using Encoding Parameters
- Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection
- SfM-Free 3D Gaussian Splatting via Hierarchical Training
- Design Optimization of a 3kW Bi-Directional Dual Active Bridge Converter for Battery Energy Storage Application
- An Autonomous Driving Vehicle-Road Collaborative Heterogeneous Fusion Localization Method Based on the Epipolar Plane Model and Graph Optimization
- Implicit Correspondence Learning for Image-to-Point Cloud Registration
- Evaluating Expansion Memory for Optimizer State Offloading for Large Transformer Models
- Allocating Battery Energy Storage System in Droop Controlled Islanded Microgrid Considering Uncertainties
- Noise-Resistant Video Anomaly Detection via RGB Error-Guided Multiscale Predictive Coding and Dynamic Memory
- SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer
- MINIMA: Modality Invariant Image Matching
- A High-Density, Deadline-Aware, and Scalable Serverless Platform for Sub-Millisecond Functions at the Edge
- F³OCUS - Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-Heuristics
- Uncertainty-Aware Neighbor Calibration for Positive and Unlabeled Learning in Large Machine Learning Models
- GenFusion: Closing the Loop between Reconstruction and Generation via Videos
- LSNet: See Large, Focus Small
- Practical solutions to the relative pose of three calibrated cameras
- Receiver-Agnostic Radio Frequency Fingerprinting via Domain-Invariant Feature Learning
- PS-EIP: Robust Photometric Stereo Based on Event Interval Profile
- DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
- Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
- ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
- Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
- Sample- and Parameter-Efficient Auto-Regressive Image Models
- Treetap9: a forestry tool for measuring standing tree stiffness
- Multi-Objective Tertiary Layer Optimization for DC Microgrids
- Enhancing Agricultural Decision Making with Machine Learning
- Reversing Flow for Image Restoration
- Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward
- Amplified OFF State Voltage Stress across SiC MOSFETs of 4-Quadrant Switch
- DEFOM-Stereo: Depth Foundation Model Based Stereo Matching
- Explicit Correspondence Matching for Generalizable Neural Radiance Fields
- Toward Efficient Power Scene Detection via Topology-Preserved Knowledge Distillation
- Test-time augmentation improves efficiency in conformal prediction
- A Light-weighted Fusion Vision Mamba for Multimodal Remote Sensing Data Classification
- Fusion-Based Additive Manufacturing of Hastelloy C-Series: A Comparative Study on Microstructure, Mechanical Properties, and Residual Stress
- AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
- HumanMM: Global Human Motion Recovery from Multi-shot Videos
- Improving the Training of Data-Efficient GANs via Quality Aware Dynamic Discriminator Rejection Sampling
- EdgeMovingNet: Edge-preserving Point Cloud Reconstruction via Joint Geometry Features
- Application of Optimization Methods to obtain Switching Angles for Selective Harmonic Minimization Pulse Amplitude Modulation (SHMPAM) Technique for 3-Phase Seven Level CHB Multilevel Inverter
- Classifying Workers for Mitigating Adversarial Attacks in Crowdsourcing
- Relative Representations of Latent Spaces enable Efficient Semantic Channel Equalization
- The Importance and Impact of Adaptability for the Success of Manufacturing Companies and Production-Related Service Providers in a Rapidly Changing World
- Few-shot Personalized Scanpath Prediction
- Incomplete Multi-modal Brain Tumor Segmentation via Learnable Sorting State Space Model
- SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts
- Conductive Noise Modeling using GA Parameter Fitting and Effective Validation of Noise Reduction Filter
- Foggy Weather Scene Object Detection Algorithm: FREFog-Yolov8s
- DEEP: Edge-Based Dataflow Processing with Hybrid Docker Hub and Regional Registries
- Observer-based dynamic event-triggered resilient control for heterogeneous multi-agent systems under DoS attacks
- PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting
- Video Depth without Video Models
- M-LLM Based Video Frame Selection for Efficient Video Understanding
- DL2G: Degradation-guided Local-to-Global Restoration for Eyeglass Reflection Removal
- 6G Infrastructures for Edge AI: An Analytical Perspective
- A Bi-Level Multi-Objective System for Renewable Energy Self-Consumption: A Resident-Aware Approach to Leveraging Energy Flexibility
- Simulation Research on Lightning Withstand Level of $500 \text{kV} / \pm 800 \text{kV}$ Hybrid Tower Large Crossing Transmission Line Considering Leader Current
- Patient-Specific Digital Twins for Personalized Healthcare: A Hybrid AI and Simulation-Based Framework
- Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge
- Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding
- RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability
- Analyzing 16,193 LLM Papers for Fun and Profits
- FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video
- AI-Driven Stress Detection Systems Tailored for IT Industry Challenges
- Co-designing a Variable Reluctance Energy Harvester and Power Management System for Smart Bearing Applications
- Lightweight Semantic Segmentation of Road Cracks Based on Improved DeepLabV3+
- Observability and Incident Response in Managed Serverless Environments Using Ontology-Based Log Monitoring
- TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting
- FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis
- RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance
- Spectral Informed Mamba for Robust Point Cloud Processing
- Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
- A Multiple Access Channel Game with Trade-Off between SINR and Energy Saving
- Resistance Switching Properties of Stoichiometric and Nitrogen Implanted Silicon Nitride Nanolayers on N and P-Type Si Substrates
- ORIDa: Object-centric Real-world Image Composition Dataset
- ICE: Intrinsic Concept Extraction from a Single Image via Diffusion Models
- HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving
- Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body
- A TBD Fuzzy C-Means Clustering Algorithm Based on Quadratic Polynomial for Cardiac Image Segmentation
- Reasoning Mamba: Hypergraph-Guided Region Relation Calculating for Weakly Supervised Affordance Grounding
- ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
- SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing
- DefMamba: Deformable Visual State Space Model
- TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
- Single Domain Generalization for Few-Shot Counting via Universal Representation Matching
- Predicting ocular diseases using squeezenet as feature maps with Convolutional Neural Networks
- Point Cloud Upsampling Using Conditional Diffusion Module with Adaptive Noise Suppression
- APT: Adaptive Personalized Training for Diffusion Models with Limited Data
- 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes