Open Access
3D Facial Reconstruction from 2D Portrait Imagery
Author(s) -
Matthew V. Caruana,
Joseph G. Vella
Publication year - 2020
Publication title -
Information & Security: An International Journal
Language(s) - English
Resource type - Journals
eISSN - 1314-2119
pISSN - 0861-5160
DOI - 10.11610/isij.4724
Subject(s) - portrait , computer vision , artificial intelligence , art , computer science , visual arts
3D facial images are reconstructed from 2D portraits using regression trees for facial landmark alignment and 3D morphable models. Two generic regression trees were adopted: one based on the widely used 68-landmark structure, the other on a 74-landmark structure. The FaceWarehouse dataset was used both to create a novel 74-landmark regression tree and during the system's evaluation. The accuracy of the generated models was computed using the Root Mean Square, 75th Percentile, and Arithmetic Mean comparison metrics. Two different datasets of 2D images were reconstructed. The evaluation results demonstrate that a higher level of accuracy and precision was attained by the models reconstructed using the 68-landmark regression tree than by the 74-landmark tree developed here. The accuracy produced by the 68-landmark regression tree applied to the two sets was 85% and 90%, as opposed to the 82% and 83% produced by the 74-landmark regression tree on the same model subsets, thus justifying its wide adoption.

A R T I C L E  I N F O:
RECEIVED: 11 JUNE 2020
REVISED: 23 AUG 2020
ONLINE: 14 SEP 2020

K E Y W O R D S: digital forensics, forensic facial reconstruction, landmark alignment

Creative Commons BY-NC 4.0

Introduction

Facial reconstruction is a vast area and, even though it is researched extensively, there is much scope for improvement. It is used in facial rehabilitation after extensive facial trauma, and in facial recognition for forensic investigations. Moreover, it has other applications that may benefit society, notably in locating missing people, where 3D facial models could usefully complement the material available to base the search on.
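The three comparison metrics named in the abstract can all be computed over per-vertex distances between a reconstructed model and its ground-truth counterpart. A minimal sketch in plain numpy (the vertex arrays and their prior alignment are assumptions for illustration, not the paper's actual evaluation code):

```python
import numpy as np

def model_error_metrics(reconstructed, ground_truth):
    """Per-vertex error metrics between two aligned 3D models.

    Both inputs are (N, 3) arrays of corresponding vertex positions;
    dense correspondence and rigid alignment are assumed to be done already.
    """
    # Euclidean distance between each pair of corresponding vertices.
    d = np.linalg.norm(reconstructed - ground_truth, axis=1)
    return {
        "rms": float(np.sqrt(np.mean(d ** 2))),   # Root Mean Square error
        "p75": float(np.percentile(d, 75)),       # 75th Percentile error
        "mean": float(np.mean(d)),                # Arithmetic Mean error
    }
```

A percentage accuracy such as the 85% and 90% figures above would then be derived from such distance-based errors relative to the model's scale.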
In academia and industry, various approaches have been put forward to achieve the best possible results for these domain requirements. These approaches' main function is to compare an input portrait image in 2D format to a set of 3D models and find the 3D model that fits best. A specific 3D model is generated by a set of algorithms applied to the input image; these ensure that the model produced is devoid of any occlusion (e.g. spectacles), disguise (e.g. a beard), skin texture, and facial expressions. As described by Blanz and Vetter, “Reconstructing the true 3D shape and texture of a face from a single image is an ill-posed problem.” Consequently, this problem does not have a unique solution, as it relies heavily on the input data provided. Nonetheless, datasets containing images annotated with appropriate landmarks help the generated models produce the best results. Furthermore, reconstructing faces from images is currently considered a computationally heavy procedure. Finally, the various available algorithms must be set up and tested against a wide array of input 2D images; consequently, the choice of training dataset is expected to affect the performance and results generated by the same algorithms.

Aims & Objectives

The aim of this application is to address the computation of facial reconstruction and provide a platform that generates and evaluates a 3D model from a single 2D input image. To undertake reconstruction, the system has to create regression trees fitting features extracted from the landmarks found within the 2D input image. The texture of the facial image is also extracted for later application onto the produced 3D model. To achieve a good regression, various datasets containing portrait images and their annotated landmarks need to be available.
Moreover, the developed system needs an efficient and effective approach to reconstruct a 3D model from an input 2D image of a human face. An important part of the system is to ensure that tests are administered using various unit test suites during its development, and that during its use any 3D model created from a 2D image is evaluated using 2D and 3D model comparison techniques.

Background & Related Work

Landmark Alignment

A crucial part of most facial reconstruction projects is the ability to identify and locate key human facial landmarks in a portrait image (see Figure 1). In most cases, having too many landmarks requires inordinate overall execution time to extract them, while having too few landmarks might yield subpar accuracy or data quality. Therefore, careful consideration must be given to the number of landmarks chosen to ensure that the best possible results are extracted.

Figure 1: Results of Landmark Alignment.

Regression trees, as proposed by Kazemi and Sullivan, are applied to efficiently estimate the locations of these facial landmarks (their hyper-parameters are shown in Table 1). The regression tree takes a training set consisting of images and their landmark positions, and regresses a model, based on a set of hyper-parameters, that can align the same landmark collection to an input image. Table 1 describes these hyper-parameters; to produce an acceptable regression tree one must balance speed, accuracy, and model size, and this is done through these parameters. Active Shape Models are often used instead of regression trees for the alignment of facial landmarks.
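The cascade idea behind Kazemi and Sullivan's method can be illustrated with a toy sketch: starting from a mean shape, each stage regresses an additive correction to the current landmark estimate from image features. This is a schematic illustration only, with simple linear stages standing in for the regression trees, and feature re-extraction at each stage omitted:

```python
import numpy as np

def run_cascade(initial_shape, features, stages):
    """Refine a landmark estimate through a cascade of regressors.

    initial_shape : (2 * n_landmarks,) flattened (x, y) coordinates (mean shape).
    features      : (n_features,) image features; real systems re-extract these
                    around the current estimate at every cascade stage.
    stages        : list of (W, b) pairs, each mapping features to a shape update.
    """
    shape = initial_shape.copy()
    for W, b in stages:
        # Each cascade stage adds a learned correction to the running estimate.
        shape = shape + W @ features + b
    return shape
```

In the actual model, the number of stages is the Cascade Depth of Table 1, and each stage is an ensemble of regression trees of the configured Tree Depth rather than a single linear map.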
3D Model Generation

Throughout the various methods quoted in the literature, facial reconstruction projects make use of a number of 2D portrait images rather than a single image, making the reconstruction of a 3D model an even more computationally expensive process. 3D morphable models, as proposed by Blanz and Vetter, make use of facial landmarks identifiable in a 2D image, which are then applied to a reference 3D model based on a morphable model. A 3D morphable model is a database of 3D surfaces in which each vertex of a model corresponds to the same vertex in all the other surfaces within the database. While the morphable model can be a database of any structure, i.e. not only human faces, it is generally aimed at the morphing of 3D human faces. With this approach, each morphable model (e.g. the Surrey Face Model and the Basel Face Model) uses Principal Component Analysis to learn from the database of 3D surfaces on which it is based. For the deformations to be applied to the respective morphable model, a mapping from the landmarks to their corresponding vertices in the reference 3D model is required.

Table 1: Hyper-parameter Descriptions for Regression Trees

Tree Depth: The total depth of the trees in each cascade.
Regularisation: By how much the algorithm will generalise on the training data.
Cascade Depth: The number of cascades that the model will be trained with.
Feature Pool Size: The number of pixels used to generate features for the random trees.
Test Split Amount: The number of times that the training set is used during training.
Oversampling Amount: The number of times the training data is augmented when training the regression tree.
Oversampling Translation Jitter: The amount of augmentation applied to the images in the dataset.
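The PCA step described above can be sketched in a few lines: the morphable model is the mean shape plus a linear combination of principal components learned from the database of registered 3D surfaces. A toy numpy illustration (real models such as the Surrey or Basel Face Model ship these components in dedicated file formats rather than recomputing them):

```python
import numpy as np

def build_morphable_model(surfaces, n_components):
    """Learn a PCA morphable model from registered 3D surfaces.

    surfaces : (n_samples, n_vertices * 3) array; row i is surface i with its
               vertices flattened, in dense correspondence across all rows.
    Returns (mean, components), with components of shape
    (n_components, n_vertices * 3).
    """
    mean = surfaces.mean(axis=0)
    centred = surfaces - mean
    # SVD of the centred data yields the principal components as rows of vt.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_components]

def morph(mean, components, coefficients):
    """Generate a new face as the mean plus a weighted sum of components."""
    return mean + coefficients @ components
```

Fitting then amounts to finding the coefficients whose morphed shape, once projected, best matches the aligned 2D landmarks via the landmark-to-vertex mapping.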
Other approaches attributed to model generation are Shape from Shading (SFS) and the use of Convolutional Neural Networks (CNNs).

Existing Systems

A number of existing projects tackle facial reconstruction from 2D imagery. Currently, these systems rely heavily on a number of input images taken from different angles, which in most cases are difficult to obtain. Moreover, many rely on landmark alignment using regression trees with the 68-landmark structure, meaning the reported results cannot show the effect that the chosen set of landmarks has on the overall accuracy of the model in relation to the input image. Establishing which implementation provides the better results is difficult because the reported tests use different datasets, images, morphable models, and evaluation techniques.

Datasets

In conjunction with the methodologies surrounding facial reconstruction, the choice of dataset for training and evaluation is an important one. The FaceWarehouse dataset, maintained by Chen et al., covers 150 different individuals, each with 20 distinct images showcasing different poses from a frontal angle. Supplementing the images, FaceWarehouse also provides 74 facial landmarks per image and a 3D blend-shape model for that person's specific pose. The Celebrity Face Recognition dataset, collated by Prateek Mehta, focuses on 1100 celebrities and contains 800 thousand images of them. Both these datasets are available and have been used extensively here.

Design & Implementation

The system was developed in Python, availing of well-known packages: OpenCV, dlib, and eos, which focus on image processing, landmark alignment, and 3D model morphing respectively. The system follows the structures and processes depicted in Figure 2.

Figure 2: System flow (database set-up check, shape predictor loading, loading of the input 2D image, and facial detection on the input image).
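The flow depicted in Figure 2 can be sketched as an orchestration of the three package roles. The function names below are illustrative stand-ins, not the system's actual API; in practice, detection and alignment would be backed by OpenCV/dlib and the fitting step by eos:

```python
def reconstruct(image, detect_face, align_landmarks, fit_model):
    """Reconstruct a 3D model from a single 2D portrait image.

    The three callables stand in for OpenCV-based face detection, a
    dlib-style landmark aligner (shape predictor), and an eos-style
    morphable-model fitter, respectively.
    """
    face_box = detect_face(image)
    if face_box is None:
        raise ValueError("no face detected in input image")
    landmarks = align_landmarks(image, face_box)  # 68- or 74-landmark structure
    return fit_model(landmarks)                   # morphable-model fit
```

Structuring the pipeline around injected callables also makes each stage unit-testable in isolation, in line with the test suites mentioned above.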
