Are ChatGPT's Answers Defensible? Recent Research Identifies AI Chatbot's Reasoning Gaps

Since late 2022, ChatGPT has been gaining popularity due to its ability to respond to inquiries that are eerily similar to those provided by humans. A recent study conducted by Ohio State University has revealed a vulnerability in the reasoning powers of the chatbot, which has resulted in the chatbot being investigated.

During the research, large language models (LLMs), such as ChatGPT, were tested by participating in debate-style conversations in which users questioned the chatbot’s correct responses.

Reasoning of ChatGPT

After completing studies on various reasoning puzzles, such as mathematics, common sense, and logic, the research discovered a striking deficiency in ChatGPT’s capacity to defend its accurate opinions when confronted with challenges.

Instead of vehemently defending its accurate results, the model frequently gave in to the fallacious arguments that the users supplied, and it even occasionally apologized for the fact that it had first provided the correct answer incorrectly.

Boshi Wang, the primary author of the study and a PhD student in computer science and engineering at Ohio State, emphasized the significance of determining whether the reasoning capabilities of these generative AI tools are founded on a profound comprehension of the truth or whether they rely on patterns that they have learned to arrive at accurate conclusions.

For the study presented at the 2023 Conference on Empirical Methods in Natural Language Processing in Singapore this week, one ChatGPT was used to simulate a user challenging another ChatGPT.

The objective was to arrive at the proper conclusion through a joint effort, simulating how a human agent might interact with the model. The results were unexpected, with ChatGPT being deceived by users anywhere from 22 per cent to 70 per cent of the time across various benchmarks.

Although versions of ChatGPT, such as GPT-4, have shown lower failure rates, the research highlighted that models, despite their increasingly sophisticated reasoning capabilities, are not entirely foolproof. The failure rates were high even when ChatGPT professed confidence in its answers, suggesting a systemic issue rather than a simple lack of assurance.

According to the study’s findings, there are particular worries regarding the dependability of artificial intelligence models such as ChatGPT, particularly as these models become more ubiquitous and vital in various domains such as healthcare and criminal justice.

Co-author of the paper Xiang Yue noted the possible risks associated with depending on artificial intelligence models that are susceptible to being easily fooled. Because artificial intelligence systems play increasingly important roles in decision-making processes, ensuring their safety is paramount.

Does ChatGPT Lack Reasoning and Truth Understanding?

According to the study’s findings, the model’s difficulty defending itself can be linked to several variables. These aspects include that the base model does not have reasoning and truth comprehension and the influence of subsequent alignment based on human feedback.

It is possible that the training process, designed to produce responses that humans prefer, could mistakenly cause the model to surrender to opposing ideas more easily without maintaining a commitment to the truth.

The study acknowledged the issues discovered in artificial intelligence models such as ChatGPT and highlighted the difficulty of locating effective remedies because substantial language models are black boxes.

To mitigate the possible hazards of the broad usage of artificial intelligence systems, the researchers argue for continued efforts to improve the safety and reliability of these systems.

“This problem could potentially become very severe, and we could just be overestimating these models’ capabilities in really dealing with complex reasoning tasks,” according to Wang.

That we can locate and recognize its issues does not change the fact that we do not yet have perfect ideas about addressing them. He continued, saying, “There will be strategies, but it will take some time to arrive at those solutions.”