Open Access
Robust Multi-Object Detection Based on Data Augmentation with Realistic Image Synthesis for Point-of-Sale Automation
Author(s) -
Saiprasad Koturwar,
Soma Shiraishi,
Kota Iwamoto
Publication year - 2019
Publication title -
Proceedings of the AAAI Conference on Artificial Intelligence
Language(s) - English
Resource type - Journals
eISSN - 2374-3468
pISSN - 2159-5399
DOI - 10.1609/aaai.v33i01.33019492
Subject(s) - computer science , detector , automation , artificial intelligence , computer vision , object detection , task (project management) , product (mathematics) , object (grammar) , code (set theory) , set (abstract data type) , point (geometry) , image (mathematics) , pattern recognition (psychology) , mathematics , engineering , mechanical engineering , telecommunications , geometry , systems engineering , programming language
As an alternative to bar-code scanning, we are developing a real-time retail product detector for point-of-sale automation. The major challenges associated with image-based object detection arise from occlusion and the presence of other objects in close proximity. For robust product detection under such conditions, it is crucial to train the detector on a rich set of images with varying degrees of occlusion and proximity between the products, one that fairly represents the wide range of ways customers place products together. However, generating a sufficiently large database of such images traditionally requires a large amount of human effort. On the other hand, acquiring individual object images with their corresponding masks is a relatively easy task. We propose a realistic image-synthesis approach that uses individual object images and their corresponding masks to create training images with the desired properties (occlusion and congestion among the products). We train our product detector on the images thus generated and achieve a consistent performance improvement across different types of test data. With the proposed approach, the detector achieves improvements of 46.2% in precision (from 0.67 to 0.98) and 40% in recall (from 0.60 to 0.84), compared to using a basic training dataset containing one product per image.
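The synthesis idea described in the abstract can be illustrated with a short Python sketch, assuming each product is available as an RGB crop plus a binary mask; the function and parameter names (synthesize_image, max_overlap, and so on) are illustrative assumptions, not the authors' implementation. Products are pasted onto a shared background through their masks, and a simple rejection-sampling loop bounds the pairwise box overlap so the degree of occlusion and congestion in each generated image can be controlled.

# Minimal sketch of mask-based training-image synthesis (illustrative only,
# not the paper's actual code). Requires Pillow.
import random
from typing import List, Tuple

from PIL import Image

Box = Tuple[int, int, int, int]


def _overlap_ratio(a: Box, b: Box) -> float:
    """Intersection area divided by the area of the smaller box."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                  (b[2] - b[0]) * (b[3] - b[1]))
    return inter / smaller if smaller else 0.0


def paste_with_mask(canvas: Image.Image, obj: Image.Image,
                    mask: Image.Image, top_left: Tuple[int, int]) -> Box:
    """Paste one product crop onto the canvas using its binary mask.

    Returns the (x1, y1, x2, y2) bounding box of the pasted object,
    which doubles as the detection label for the synthesized image.
    """
    x, y = top_left
    canvas.paste(obj, (x, y), mask)  # the mask keeps only the product pixels
    return (x, y, x + obj.width, y + obj.height)


def synthesize_image(background: Image.Image,
                     products: List[Tuple[Image.Image, Image.Image]],
                     max_overlap: float = 0.4,
                     ) -> Tuple[Image.Image, List[Box]]:
    """Compose several products onto one background with bounded mutual overlap.

    max_overlap caps the pairwise box overlap; raising it yields more heavily
    occluded, congested compositions for the training set.
    """
    canvas = background.copy()
    boxes: List[Box] = []
    for obj, mask in products:
        for _ in range(50):  # rejection-sample a placement
            x = random.randint(0, canvas.width - obj.width)
            y = random.randint(0, canvas.height - obj.height)
            cand = (x, y, x + obj.width, y + obj.height)
            if all(_overlap_ratio(cand, b) <= max_overlap for b in boxes):
                boxes.append(paste_with_mask(canvas, obj, mask, (x, y)))
                break
    return canvas, boxes

In this sketch, raising max_overlap produces more heavily occluded and congested compositions, which is the property the synthesized training images are meant to cover; the returned boxes serve as ground-truth annotations for training the detector.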
