Nowadays, it is hard to find an area of human life that Artificial Intelligence (AI) has not touched. With the recent advances in AI, the change for chatbots has been an ‘evolution’ rather than a ‘revolution’. AI-powered chatbots have become an integral part of customer service, as they are as capable as humans (if not more so) and can provide 24/7 service (unlike humans). Several AI-powered chatbots are publicly available and widely used, so the question “Which one is better?” instinctively comes to mind and deserves to be answered. Motivated by this question, an experimental comparison of two widely used AI-powered chatbots, namely ChatGPT and Bard, is presented in this study. For a quantitative comparison, (i) a gold-standard QA dataset comprising 2,390 questions from 109 topics was used, and (ii) a novel answer-scoring algorithm based on cosine similarity was proposed. The covered chatbots were evaluated with the proposed algorithm on this dataset to measure (i) generated answer length and (ii) generated answer accuracy. According to the experimental results, (i) Bard generated lengthier answers than ChatGPT, and (ii) Bard’s answers were more similar to the ground truth than ChatGPT’s.
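The abstract states only that the answer-scoring algorithm is based on cosine similarity between a chatbot's generated answer and the gold-standard answer. The sketch below illustrates one such scorer; the TF-IDF representation, the function name `score_answer`, and the example answers are assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch of cosine-similarity answer scoring (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def score_answer(generated: str, ground_truth: str) -> float:
    """Score a chatbot answer against the gold answer; result lies in [0, 1]."""
    # Represent both answers as TF-IDF vectors over a shared vocabulary
    # (the vector representation is an assumption, not the paper's exact method).
    tfidf = TfidfVectorizer().fit_transform([generated, ground_truth])
    # Cosine similarity between the two rows: 1.0 means identical term profiles.
    return float(cosine_similarity(tfidf)[0, 1])

# Hypothetical example: a generated answer vs. a ground-truth answer.
print(score_answer("Ankara is the capital of Turkey.",
                   "The capital of Turkey is Ankara."))
```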
| Primary Language | English |
| --- | --- |
| Subjects | Information Systems (Other) |
| Journal Section | Articles |
| Early Pub Date | June 30, 2024 |
| Publication Date | June 30, 2024 |
| Submission Date | November 13, 2023 |
| Acceptance Date | March 6, 2024 |
| Published in Issue | Year 2024, Volume 16, Issue 2 |