TY - JOUR
T1 - Em-t2i: ensemble AI model for text-to-image synthesis using meta AI and microsoft copilot
AU - Faryad, Shagufta
AU - Shoukat, Ijaz Ali
AU - Ullah, Ubaid
AU - Khan, Javed Ali
AU - Faryad, Ayesha
AU - Rauf, Muhammad Arslan
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.
PY - 2025/5/12
Y1 - 2025/5/12
AB - In the realm of AI-powered image creation, single models often struggle to balance speed and accuracy, excelling in one aspect but falling short in overall effectiveness. So far, no ensemble AI model has pooled the strengths of multiple systems to enhance performance. This highlights the need for an ensemble AI model that overcomes individual limitations and improves overall efficiency. We therefore propose an ensemble model, EM-T2I, that integrates Meta AI for fast text interpretation and initial image creation with Microsoft Copilot for refining visuals to increase clarity and fidelity. The proposed model’s performance is tested across several accuracy metrics: Object Detection Accuracy, Object Attribute Accuracy, Scene Understanding Accuracy, Image Quality Score, and Overall Precision Score (OPS). Using diverse text prompts, we conducted a comprehensive assessment of the accuracy and overall quality of the image outputs. Microsoft Copilot achieves an OPS of 73.5 and is proficient in object detection, attribute identification, scene understanding, and overall image quality. Meta AI demonstrated rapid processing with mostly accurate results, albeit with intermittent clarity issues. The ensemble EM-T2I model represents a promising direction for advancing AI-powered image creation, achieving the highest OPS of 75.58 and indicating its effectiveness in improving image quality and semantic alignment across diverse text prompts. Future work will further refine these prototypes to meet growing demands in image synthesis and interpretation.
KW - Deep dream generator
KW - Ensemble model
KW - Meta AI Llama 3
KW - MS Copilot
KW - Text2Image
UR - http://www.scopus.com/inward/record.url?scp=105004731060&partnerID=8YFLogxK
U2 - 10.1007/s11760-025-04133-4
DO - 10.1007/s11760-025-04133-4
M3 - Article
AN - SCOPUS:105004731060
SN - 1863-1703
VL - 19
JO - Signal, Image and Video Processing
JF - Signal, Image and Video Processing
M1 - 569
ER -