

Original Article

J Cosmet Med 2023; 7(2): 60-65

Published online December 31, 2023; https://doi.org/10.25056/JCM.2023.7.2.60

Copyright © Korean Society of Korean Cosmetic Surgery & Medicine.

Artificial-intelligence-automated machine learning as a tool for evaluating facial rhytid images

Alejandro Espaillat, MD

South Florida Eye Institute, Fort Lauderdale, FL, USA

Correspondence to: Alejandro Espaillat
E-mail: drespaillat@evnmiami.com

Received: October 12, 2023; Revised: November 30, 2023; Accepted: December 6, 2023

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: The growing demand for nonsurgical cosmetic treatments necessitates a reliable diagnostic tool to assess the extent of aging, severity of facial wrinkles, and effectiveness of minimally invasive aesthetic procedures. This is crucial to accurately predict the need for botulinum neurotoxin type A neuromodulator injections during facial aesthetic rejuvenation.
Objective: This study aimed to determine the accuracy of artificial intelligence-based machine learning algorithms in analyzing facial rhytid images during facial aesthetic evaluation.
Methods: A prospective validation model was implemented using a dataset of 3,000 de-identified facial rhytid images from 600 patients in a community private medical spa aesthetic screening program. A neural architecture based on Google Cloud’s artificial intelligence-automated machine learning was developed to detect dynamic hyperkinetic skin lines in various facial muscles. Images were captured using a handheld iPad camera and labeled by an American board-certified ophthalmologist using established quantitative grading scales. The dataset was divided into training (80%), validation (10%), and testing (10%) sets. The model’s performance was evaluated using the following metrics: area under the precision–recall curve, sensitivity, specificity, precision, and accuracy.
Results: Facial rhytid images were distributed across the training, validation, and testing sets at 79.9%, 10.7%, and 9.3%, respectively. The model achieved an area under the precision–recall curve of 0.943, with a precision of 91.667% and a recall of 81.881% at a confidence threshold of 0.5.
Conclusion: This study demonstrates the successful application of artificial-intelligence-automated machine learning in identifying facial rhytid images captured using simple photographic devices in a community-based private medical spa program. Thus, the potential value of machine-learning algorithms for evaluating the need for minimally invasive injectable procedures for facial aesthetic rejuvenation was established.

Keywords: artificial intelligence, BoNT/A neuromodulator injections, facial aesthetic rejuvenation, facial rhytid images, machine-learning algorithms

Introduction

The global aesthetic market has witnessed remarkable growth in recent decades, with an evident increase in the annual number of aesthetic procedures performed [1]. According to a survey by the American Society of Plastic Surgeons, the top 5 minimally invasive cosmetic procedures sought by patients in 2021–2022, ranked in order of frequency, were botulinum neurotoxin type A (BoNT/A), soft tissue fillers, non-invasive fat reduction, non-surgical skin tightening, and skin care treatments [1]. However, since the onset of the coronavirus disease-2019 pandemic, the demand for cosmetic procedures has increased to unprecedented levels. The same survey revealed that over 76% of cosmetic surgeons reported a surge in patients seeking these procedures. This was mainly attributed to individuals choosing to spend their travel budgets on cosmetic enhancements instead of traveling. Moreover, many patients were willing to pay higher fees to enhance their self-esteem and confidence because they aspired to regain their sense of normalcy following the pandemic. Since the year 2000, procedures involving BoNT/A and soft tissue dermal fillers have increased by 459% and 174%, respectively, with an estimated total expenditure of over 2 billion dollars in 2020 alone [2].

Facial wrinkles, known as rhytids, are considered the most noticeable signs of facial skin aging, although they can also appear for many other reasons [3]. Such wrinkles are attributable to repeated contractions of the underlying facial muscles, which thin the outer skin layer and form creases. BoNT/A injections are commonly used to chemically relax specific facial muscles to rejuvenate the face and combat the signs of aging. Selective chemodenervation of these muscles provides a targeted treatment for dynamic wrinkles, thus offering a minimally invasive approach to facial rejuvenation [4]. The growing demand for non-surgical cosmetic treatments has led to the introduction of validated and non-validated scales to assess the extent of aging, the severity of facial wrinkles, and the effectiveness of minimally invasive aesthetic procedures [5,6].

Artificial intelligence (AI) is a rapidly advancing field of computer science that enables computers to learn from large datasets. This study demonstrates a significant advance in expanding the application of AI into oculoplastic surgery, in contrast to current applications, which are largely limited to radiological studies and medical records [7]. In addition, this study demonstrates the potential of an AI-based machine learning model as a reliable tool for evaluating facial rhytid images and accurately predicting the need for BoNT/A neuromodulator injections during facial aesthetic rejuvenation.

Materials and methods

This study aimed to develop and validate an AI-automated machine-learning (AI-AutoML) model for classifying dynamic wrinkles using handheld images. We used 3,000 facial images of 600 patients obtained from a privately owned medical spa in South Florida, United States. The images were de-identified for the study. All images were captured using an iPad Pro (OS V.16.5; Apple Inc., Cupertino, CA, USA). The study adhered to the ethical standards outlined in the 1964 Declaration of Helsinki, and the institutional review board approved the study protocol in compliance with the Health Insurance Portability and Accountability Act of 1996 (IRB no. 1119).

Wrinkle severity grading: facial imaging protocol

A facial image database was developed using previously validated grading scales intended for everyday clinical use [8-11]. These scales were designed to assess the outcomes of BoNT/A treatment in clinical trials and consist of a 5-point rating system whose reference images depict stepwise variation in the graded feature. An American board-certified ophthalmologist identified the aging-related anatomical changes shown in the photographs and labeled them accordingly. Both static and dynamic images were recorded to capture facial lines caused by the movements of different muscle groups; however, only dynamic photographs were included in the study. Dynamic wrinkles were graded on a scale of 0 to 4 as follows: 0 (none), 1 (minimal), 2 (mild), 3 (moderate), and 4 (severe). This study focused on dynamic wrinkles graded 3 or 4, in which the expression of facial lines is maximal, thus improving the training, validation, and testing of the proposed AI-AutoML model dataset (Fig. 1).

Figure 1. Dynamic wrinkle images graded from 3 (moderate) to 4 (severe). (A) Corrugator lines; (B) orbicularis oris lines; (C) mentalis lines; (D) orbicularis oculi lines; (E) nasalis lines; (F) frontalis lines; (G) procerus lines.

AI-AutoML

Fig. 2 presents a flowchart outlining the segmentation process applied to the initial dataset of 3,000 facial photographs. Initially, the images were classified into three categories: ungradable images (112; 3.73%), images rated between 0 and 2 (2,446; 81.53%), and images rated 3 or 4 (442; 14.73%). Ungradable images and those rated between 0 and 2 were excluded from further analysis, and the remaining images rated 3 or 4 were assessed for image quality using the AI-AutoML platform. The image quality assessment feature is an integral component of the AI-AutoML platform and includes checks for significant blurring and contrast problems. Consequently, 153 images were deemed unsuitable and rejected.

Figure 2. Dataset splits for artificial intelligence-automated machine learning (AI-AutoML) using a handheld iPad (Apple Inc., Cupertino, CA, USA) camera. The automated AI model assessment evaluated image quality for blur, noise, artifacts, contrast, and distortion, and 153 images were deemed unsuitable for inclusion in the model. A total of 289 images were considered suitable by the AI-AutoML platform and utilized for the model’s training, validation, and testing.
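For illustration, the Python sketch below mirrors this triage logic: excluding ungradable and grade 0–2 images, then dropping quality failures. The field names are hypothetical, and the boolean quality flag stands in for the platform's internal blur and contrast checks:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FacialImage:
    path: str
    grade: Optional[int]    # 0-4 dynamic wrinkle grade; None if ungradable
    passes_quality: bool    # stand-in for the platform's blur/contrast check

def triage(images: list[FacialImage]) -> list[FacialImage]:
    """Keep only gradable images rated 3-4 that pass the quality check."""
    graded_3_or_4 = [im for im in images if im.grade is not None and im.grade >= 3]
    return [im for im in graded_3_or_4 if im.passes_quality]

# In the study's flow: 3,000 images in, 289 suitable images out.
```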

A total of 289 images were deemed suitable by the AI-AutoML platform and used for model training, validation, and testing. The selected photographs were uploaded to Google Cloud’s Vertex AI-AutoML platform (Alphabet Inc., Mountain View, CA, USA) to construct and deploy the models; the platform recommends 8 node hours for this process. The images were randomly divided into three sets: training (231; 80%), validation (31; 10%), and testing (27; 10%). AI-AutoML managed data pre-processing and selected the network structure and parameters best able to detect patterns in the training subset. The algorithm was fine-tuned using the validation set, and the performance and effectiveness of the model were evaluated using the testing set.
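As a sketch of how such a single-label image classification job can be configured with the Vertex AI Python SDK (google-cloud-aiplatform): the project ID, bucket path, and display names below are placeholders, while the split fractions and the 8-node-hour budget mirror the settings described above.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

# A CSV in Cloud Storage lists each image URI with its muscle-line label.
dataset = aiplatform.ImageDataset.create(
    display_name="facial-rhytid-images",
    gcs_source="gs://my-bucket/rhytid_labels.csv",
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
)

job = aiplatform.AutoMLImageTrainingJob(
    display_name="rhytid-classifier",
    prediction_type="classification",
    multi_label=False,
)

model = job.run(
    dataset=dataset,
    training_fraction_split=0.8,    # 80% training
    validation_fraction_split=0.1,  # 10% validation
    test_fraction_split=0.1,        # 10% testing
    budget_milli_node_hours=8000,   # the recommended 8 node hours
)
```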

Statistical analyses

The AI-AutoML platform provided a comprehensive statistical analysis of the model’s performance on the testing set. The model is a binary classifier that predicts categories based on the probability assigned to each image. The algorithm employs confidence thresholds, which represent the minimum confidence levels required to classify images into specific categories. To assess the model’s certainty, a confidence score was assigned to each prediction, representing the estimated probability that the predicted label is correct. A prediction was returned as positive if its confidence score was greater than or equal to the selected threshold.

A higher confidence threshold increases precision but lowers recall, and vice versa. To determine the overall accuracy of the model, the area under the precision–recall curve (AUPRC) was computed by evaluating the testing set at confidence thresholds ranging from 0.0 to 1.0; a higher value indicates a higher-quality model. A confidence threshold of 0.5 was applied in this study. The AUPRC captures the trade-off between precision and recall across varying confidence thresholds: a lower threshold tends to yield higher recall but lower precision, whereas a higher threshold tends to yield lower recall but higher precision. Each point on the curve represents a precision–recall pair at a different confidence threshold. The precision–recall metrics assess the performance of the model for the highest-ranking label across various confidence threshold values.
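The following self-contained Python example makes this trade-off concrete using toy scores rather than the study's data: it sweeps three confidence thresholds, computes precision and recall at each, and estimates the AUPRC with scikit-learn.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

# Toy data: 1 = label truly present; scores = model confidence per image.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 1])
scores = np.array([0.95, 0.9, 0.8, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1])

for t in (0.25, 0.5, 0.75):
    pred = scores >= t  # positive prediction iff confidence >= threshold
    tp = np.sum(pred & (y_true == 1))
    precision = tp / max(pred.sum(), 1)
    recall = tp / (y_true == 1).sum()
    print(f"threshold {t}: precision={precision:.2f}, recall={recall:.2f}")

# Each (recall, precision) pair is one point on the curve; the area under
# that curve summarizes performance across all thresholds.
precisions, recalls, _ = precision_recall_curve(y_true, scores)
print("AUPRC:", auc(recalls, precisions))
```

Running this shows precision rising (0.62 to 1.00) and recall falling (0.83 to 0.50) as the threshold increases, the same pattern reported for the study's model in Table 1.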

The appropriate training sample size for machine learning models applied to medical imaging data is difficult to determine, and studies have shown that it is rarely established systematically [12]. Vertex AI-AutoML suggests using at least 100 images per category for an ideal training set; however, in this study a minimum of 27 images in most classes was still sufficient to train the model and accurately identify each label.

Results

The distribution of facial images in the training set was as follows: 16.2%, 20.4%, 12.4%, 19.7%, 9.3%, 16.2%, and 7.9% for the corrugator, orbicularis oris, mentalis, orbicularis oculi, nasalis, frontalis, and procerus lines, respectively. The model achieved an AUPRC of 0.943, with a precision of 91.7% and a recall of 81.5% at a confidence threshold of 0.5 (Fig. 3). Increasing the confidence threshold decreased recall and increased precision. When evaluating the 10% testing set, the confusion matrix showed that all images with dynamic lines from the corrugator, orbicularis oris, mentalis, orbicularis oculi, and nasalis muscles were accurately categorized (100% true positives and an average precision of 0.999). However, images with dynamic lines from the frontalis and procerus muscles had lower accuracy rates of 75% and 50%, respectively, and false-negative rates of 25% and 50%, with average precisions of 0.804 and 0.417, respectively (Table 1).

Table 1. Model dataset precision, recall, distribution, number of images, and confidence threshold values

Label     Average precision   Precision (%)   Recall (%)   Distribution (%)   Images (n)
CL        0.999               100             80           16.2               47
OorL      0.999               100             100          20.4               59
ML        0.999               100             66.7         12.4               36
OocL      0.999               83.3            100          19.7               57
NL        0.999               100             100          9.3                27
FL        0.804               75              75           16.2               47
PL        0.417               100             50           7.9                23
CT 0.25   -                   83.33           92.59        -                  -
CT 0.5    -                   91.6            81.48        -                  -
CT 0.75   -                   95.0            70.37        -                  -

CL, corrugator lines; OorL, orbicularis oris lines; ML, mentalis lines; OocL, orbicularis oculi lines; NL, nasalis lines; FL, frontalis lines; PL, procerus lines; CT, confidence threshold; -, not available.



Figure 3. Overall accuracy of the model, confidence threshold, precision, recall, and area under the precision–recall curve (AUPRC). (A) Corrugator lines; (B) orbicularis oris lines; (C) mentalis lines; (D) orbicularis oculi lines; (E) nasalis lines; (F) frontalis lines; (G) procerus lines; (H) average AUPRC.
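As an illustration of how the per-label figures in Table 1 follow from a confusion matrix, the Python example below computes per-class recall as the diagonal count over the row sum and per-class precision as the diagonal count over the column sum. The counts are invented, although the two off-diagonal entries echo the reported frontalis (75%) and procerus (50%) recall:

```python
import numpy as np

labels = ["CL", "OorL", "ML", "OocL", "NL", "FL", "PL"]
# Hypothetical 7x7 confusion matrix: rows = true label, columns = predicted.
cm = np.diag([5, 3, 2, 6, 3, 3, 1])
cm[5, 0] = 1  # one frontalis image misclassified as corrugator lines
cm[6, 5] = 1  # one procerus image misclassified as frontalis lines

recall = np.diag(cm) / cm.sum(axis=1)     # per-class sensitivity
precision = np.diag(cm) / cm.sum(axis=0)  # per-class positive predictive value
for name, p, r in zip(labels, precision, recall):
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
# FL recall = 3/4 = 0.75 and PL recall = 1/2 = 0.50 in this toy matrix.
```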

Discussion

Accurate diagnosis and understanding of the likelihood of an illness or physical change rely on data collection and analysis. Healthcare providers learn from past diagnostic patterns and train on real cases in supervised clinical settings to obtain accurate results. In AI, machine learning uses artificial neural networks, computational models that resemble biological neural networks. These networks reveal complex structures and patterns in large datasets, such as medical images [13], and can be fine-tuned and evolve with experience. Consequently, they serve as practical tools for pattern recognition, classification, and prediction.

Ophthalmologists, dermatologists, and radiologists use AI-AutoML algorithms to perform clinical classification tasks at a level comparable to that of traditional learning models for detecting diabetic retinopathy, skin cancer, and pneumonia [14-17]. AI-AutoML has proven to be a valuable tool for plastic surgeons; it aids preoperative and postoperative decision-making, determines burn depth, and estimates healing time with an impressive accuracy rate of 86% [18]. This technology is particularly well suited to oculoplastic, aesthetic medicine, and plastic surgery research, which offers an abundance of unstructured (unlabeled) visual data from easily accessible technologies [19]. A neural network developed using a sizable dataset of 18,000 pre- and post-rhinoplasty images exhibited an accuracy rate of 85% in classifying the rhinoplasty status of the tested photos, matching the sensitivity and specificity of ear, nose, and throat and plastic surgery professionals, including residents and attending physicians [19]. The findings of this study demonstrate the effectiveness of a code-free AI-AutoML model as a reliable diagnostic tool for evaluating facial images and determining the need for BoNT/A neuromodulator injections for facial rejuvenation; the model achieved an impressive AUPRC of 0.943.

Furthermore, at a confidence threshold of 0.5, the model exhibited a precision of 91.7% and a recall of 81.5%. These results underscore the potential of AI applications to positively impact patients’ quality of life and medical care in specific clinical areas, particularly minimally invasive aesthetic procedures such as BoNT/A neuromodulator injections used to smooth hyperkinetic lines and shape the face during facial rejuvenation. It is important to emphasize that the objective of this technology is to optimize the patient–physician relationship rather than to replace it.

Study limitations, future research, and ethics

This study has certain limitations that must be acknowledged. First, a limited number of images were available for the training, validation, and testing of our AI-AutoML model because of the significant cost of privately funding use of the Google Vertex AI platform. In addition, no similar open-source, prospectively collected validation database was available for comparison. Furthermore, the initial photographic grading was performed by the same ophthalmologist who developed the model, which may have introduced bias.

It is important to highlight that AI is a rapidly evolving technology that lacks standardized ethical regulations. Numerous ethical considerations must be addressed to ensure the correct implementation of AI, including machine training ethics, data ownership, data protection, AI cybersecurity, machine accuracy, data transparency, image and research biases, and patient-related ethics. Unfortunately, in many cases, gold standards for these areas are not yet available. Ongoing research focuses on addressing these ethical questions and providing further insight into the implementation of AI in ophthalmology and medicine.

Conflicts of interest

Alphabet Inc.; stock owner.



References

1. American Society of Plastic Surgeons (ASPS). Inaugural ASPS insights and trends report: cosmetic surgery 2022. Arlington Heights (IL): ASPS; 2022.
2. American Society of Plastic Surgeons (ASPS). Plastic surgery statistics report: ASPS national clearinghouse of plastic surgery procedural statistics. Arlington Heights (IL): ASPS; 2020.
3. Fisher GJ, Kang S, Varani J, Bata-Csorgo Z, Wan Y, Datta S, et al. Mechanisms of photoaging and chronological skin aging. Arch Dermatol 2002;138:1462-70.
4. Small R. Botulinum toxin injection for facial wrinkles. Am Fam Physician 2014;90:168-75.
5. Carruthers A, Carruthers J. A validated facial grading scale: the future of facial ageing measurement tools? J Cosmet Laser Ther 2010;12:235-41.
6. Honeck P, Weiss C, Sterry W, Rzany B; Gladys Study Group. Reproducibility of a four-point clinical severity score for glabellar frown lines. Br J Dermatol 2003;149:306-10.
7. Choi HI, Jung SK, Baek SH, Lim WH, Ahn SJ, Yang IH, et al. Artificial intelligent model with neural network machine learning for the diagnosis of orthognathic surgery. J Craniofac Surg 2019;30:1986-9. Erratum in: J Craniofac Surg 2020;31:1156.
8. Carruthers A, Carruthers J, Hardas B, Kaur M, Goertelmeyer R, Jones D, et al. A validated brow positioning grading scale. Dermatol Surg 2008;34 Suppl 2:S150-4.
9. Carruthers A, Carruthers J, Hardas B, Kaur M, Goertelmeyer R, Jones D, et al. A validated grading scale for forehead lines. Dermatol Surg 2008;34 Suppl 2:S155-60.
10. Carruthers A, Carruthers J, Hardas B, Kaur M, Goertelmeyer R, Jones D, et al. A validated grading scale for marionette lines. Dermatol Surg 2008;34 Suppl 2:S167-72.
11. Carruthers A, Carruthers J, Hardas B, Kaur M, Goertelmeyer R, Jones D, et al. A validated grading scale for crow's feet. Dermatol Surg 2008;34 Suppl 2:S173-8.
12. Balki I, Amirabadi A, Levman J, Martel AL, Emersic Z, Meden B, et al. Sample-size determination methodologies for machine learning in medical imaging research: a systematic review. Can Assoc Radiol J 2019;70:344-53.
13. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436-44.
14. Faes L, Wagner SK, Fu DJ, Liu X, Korot E, Ledsam JR, et al. Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study. Lancet Digit Health 2019;1:e232-42.
15. Jacoba CMP, Doan D, Salongcay RP, Aquino LAC, Silva JPY, Salva CMG, et al. Performance of automated machine learning for diabetic retinopathy image classification from multi-field handheld retinal images. Ophthalmol Retina 2023;7:703-12.
16. O'Byrne C, Abbas A, Korot E, Keane PA. Automated deep learning in ophthalmology: AI that can build AI. Curr Opin Ophthalmol 2021;32:406-12.
17. Haenssle HA, Fink C, Schneiderbauer R, Toberer F, Buhl T, Blum A, et al. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol 2018;29:1836-42.
18. Yeong EK, Hsiao TC, Chiang HK, Lin CW. Prediction of burn healing time using artificial neural networks and reflectance spectrometer. Burns 2005;31:415-20.
19. Borsting E, DeSimone R, Ascha M, Ascha M. Applied deep learning in plastic surgery: classifying rhinoplasty with a mobile app. J Craniofac Surg 2020;31:102-6.
