
Open Access
Analyzing Information Leakage on Video Object Detection Datasets by Splitting Images into Clusters with High Spatiotemporal Correlation
Author(s)
Ravi B. D. Figueiredo,
Hugo A. Mendes
Publication year
2024
Publication title
IEEE Access
Resource type
Magazines
Publisher
IEEE
Random splitting is a common strategy for creating the training, testing, and validation sets of deep-learning-based object detection algorithms. It is common for datasets to contain images extracted from video sources, in which there are frames with high spatial correlation, i.e., frames showing the same object in rotated positions or from different view angles. These highly correlated frames may lead to information leakage in training if they are not well distributed. This work shows that datasets built from highly spatially correlated frames of the same video suffer information leakage when the random splitting strategy is used to distribute the images into the sub-datasets. A clustering-based dataset split algorithm is proposed in which images are distributed randomly into the sub-datasets in packs, or clusters, instead of one image at a time. The clusters are created by extracting image features from a video of the dataset using a pre-trained image-text model, CLIP, and reducing the feature vector dimensionality with t-Distributed Stochastic Neighbor Embedding (t-SNE). In this reduced-dimensional representation, images are separated into clusters using clustering algorithms such as DBSCAN, OPTICS, and Agglomerative Clustering. These clusters are distributed randomly into the train, test, and validation datasets to avoid information leakage caused by highly spatially correlated frames. YOLOv8 is used as the object detector to test the dataset splitting.
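The cluster-wise distribution step described in the abstract can be illustrated with a minimal sketch. It assumes each frame already has a cluster label (for example, from DBSCAN run on t-SNE-reduced CLIP features) and is not the authors' implementation. The function name `split_by_cluster`, the 70/20/10 ratios, the greedy fill rule, and the handling of the noise label `-1` as one ordinary group are all illustrative assumptions.

```python
import random
from collections import defaultdict


def split_by_cluster(labels, ratios=(0.7, 0.2, 0.1), seed=0):
    """Assign whole clusters of frames to train/val/test (hypothetical helper).

    labels: one cluster id per frame, indexed by frame position.
            A noise label such as -1 is treated as one more group here,
            which is an assumption, not something the abstract specifies.
    Returns a dict mapping split name -> sorted list of frame indices.
    """
    # Group frame indices by their cluster id.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    # Shuffle the cluster ids (not the frames) so assignment order is random.
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)

    names = ("train", "val", "test")
    targets = {n: r * len(labels) for n, r in zip(names, ratios)}
    splits = {n: [] for n in names}

    for cid in cluster_ids:
        # Place the whole cluster in the split furthest below its target size.
        name = max(names, key=lambda n: targets[n] - len(splits[n]))
        splits[name].extend(clusters[cid])

    return {n: sorted(v) for n, v in splits.items()}


if __name__ == "__main__":
    # Ten frames in four clusters; frames of one cluster never separate.
    example_labels = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3]
    print(split_by_cluster(example_labels))
```

Because whole clusters are assigned, the resulting split sizes only approximate the requested ratios. That trade-off follows from keeping correlated frames together.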
Subject(s)
aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Keyword(s)
Feature extraction, Clustering algorithms, Vectors, Information leakage, Correlation, Training, Spatiotemporal phenomena, Data preprocessing, clustering, information leakage, supervised training, video annotation
Language(s)
English
SCImago Journal Rank
0.587
H-Index
127
eISSN
2169-3536
DOI
10.1109/access.2024.3383047
