Within a matter of months, ChatGPT, the AI-powered chatbot created by Silicon Valley startup OpenAI, has taken the world by storm with its accessible, easy-to-use interface (at the time this story is published, both free and paid subscription versions of ChatGPT are available) and human-like responses of a quality that, until now, has been unheard of in most AI tools.
On paper, ChatGPT (the GPT stands for “Generative Pre-trained Transformer”) is a large language model (LLM) built from OpenAI’s GPT-3 family of LLMs. It uses a type of machine learning model called a neural network, trained on vast amounts of text and data, to generate a human-like response to a user’s prompt.
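At its core, the generation process the paragraph describes is a loop: the model predicts a likely next word (token) given the words so far, appends it, and repeats. The toy sketch below illustrates only that loop; the hand-written probability table stands in for the billions of learned weights a real LLM like GPT uses, and is purely a hypothetical example.

```python
import random

# Toy illustration only (NOT OpenAI's actual model): a language model is,
# at bottom, a function that assigns probabilities to the next token given
# the tokens so far. This tiny hand-written bigram table stands in for the
# learned weights of a real LLM.
BIGRAMS = {
    "the": {"chatbot": 0.6, "model": 0.4},
    "chatbot": {"responds": 1.0},
    "model": {"responds": 1.0},
    "responds": {"helpfully": 1.0},
}

def generate(prompt_token, max_tokens=4, seed=0):
    """Autoregressively sample tokens, feeding each output back as input."""
    rng = random.Random(seed)
    tokens = [prompt_token]
    for _ in range(max_tokens):
        choices = BIGRAMS.get(tokens[-1])
        if not choices:
            break  # no known continuation for this token; stop generating
        words = list(choices)
        weights = [choices[w] for w in words]
        tokens.append(rng.choices(words, weights=weights)[0])
    return " ".join(tokens)

print(generate("the"))
```

The sampling step is why the same prompt can yield different answers on different runs: the model draws from a probability distribution over next tokens rather than returning a single fixed continuation.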
In what has seemed like a lightning round of recent developments, OpenAI has made ChatGPT and its other LLMs available for developers to integrate into their own apps and products, and on March 14 the startup released ChatGPT’s successor: GPT-4, a multimodal large language model, meaning it can respond to both text and images from users.
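The developer integration mentioned above works over a simple HTTP API. As an illustration only, here is a sketch of the JSON request body an app might send to a chat-completions-style endpoint; the field names (`model`, `messages`, and the role labels) follow OpenAI’s publicly documented chat API, but the exact model name and schema are assumptions that should be checked against the current documentation.

```python
import json

# Hypothetical sketch of the request shape for a chat-completions-style
# HTTP API. The model name and field layout follow OpenAI's public chat
# API at the time of writing; verify against current docs before use.
def build_chat_request(user_prompt, model="gpt-3.5-turbo"):
    """Return the JSON body a developer's app might POST to the API."""
    return json.dumps({
        "model": model,
        "messages": [
            # A system message sets overall behavior; the user message
            # carries the actual prompt.
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
    })

body = build_chat_request("Summarize this abstract in one sentence.")
```

In a real integration this body would be POSTed with an API key, and the app would read the generated text out of the JSON response.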
Needless to say, ChatGPT’s rapid rise to mainstream popularity has been nothing short of exciting, but it has also ignited fierce competition among big tech companies, many of which have launched their own AI chatbots in response, such as Google’s Bard and Microsoft’s new ChatGPT-powered Bing search engine.
ChatGPT has also caught the attention of professionals across fields, including academia, healthcare and scientific research. In an anonymous Twitter poll administered to Feinberg faculty, staff and students on February 13, almost 50 percent of respondents said that ChatGPT could “prove useful” in their own work and lives.
Still, the potential of ChatGPT has also been met with concern. Shortly after OpenAI released ChatGPT to the public, investigators led by Catherine Gao, MD, instructor of Medicine in the Division of Pulmonary and Critical Care, sought to determine whether the chatbot could produce scientific abstracts just as good as, if not better than, ones written by humans.
In the study, the team gave blinded human reviewers a mix of real and ChatGPT-generated abstracts, and found that the reviewers could only identify the fake abstracts 68 percent of the time. The reviewers also incorrectly identified 14 percent of the real abstracts as being written by ChatGPT.
The findings, which were published in npj Digital Medicine, show how ChatGPT can successfully produce realistic and convincing scientific abstracts.
“Even though the reviewers found 68 percent of the fake abstracts, that’s not very good differentiation, despite knowing they were being given generated abstracts and were being so skeptical,” Gao said.
Benefits, Risks and Room for Improvement
Like any new technology, ChatGPT is far from perfect and has much room for improvement. While it provides users with confident-sounding answers within seconds, it does not currently disclose to the user where it sources its information.
Fact-checked or not, the chatbot uses web-scraped data to generate its responses, which could increase the risk of spreading misinformation and promoting bias, according to Yuan Luo, PhD, associate professor of Preventive Medicine and of Pediatrics, and chief AI officer for the Northwestern Clinical and Translational Sciences (NUCATS) Institute and the Institute for Augmented Intelligence in Medicine (I.AIM).
“I think the biggest disadvantage is this authoritative appearance without substantiation,” Luo said. “If you are not familiar with certain content, you might be led to believe whatever is written, which could be entirely false, and this has implications regarding the spread of misinformation.”
Despite the potential risks, Luo believes the tool can also be used for good. For example, it could help non-native English speakers write grammatically correct scientific abstracts — more than 95 percent of all scientific abstracts are written in English. In healthcare, physicians could use ChatGPT to compose patient notes more efficiently, potentially reducing burnout. However, using it to diagnose disease and recommend treatments is still questionable.
“If you think about the whole process of industrialization, it keeps automating and standardizing human jobs so that we can focus on ourselves with more higher-level activity. Once you internalize those fundamental things into your own muscle memory, I think this integration can free up a lot of human brain power to focus on the next level of exploration,” Luo said.
In January, the World Association of Medical Editors published its recommendations in response to the use of ChatGPT and other chatbots in research publications. Many high-impact journals have followed suit by releasing their own statements about using the tool in research: some require investigators to disclose the use of ChatGPT in their work but prohibit listing the chatbot as an author, while others have banned the tool altogether.
According to Mohammad Hosseini, PhD, a postdoctoral scholar in the Department of Preventive Medicine’s Division of Health and Biomedical Informatics who is based at Northwestern’s Galter Health Sciences Library, banning the tool is not only controversial but also unenforceable.
“The way that we work together right now is a result of decades, if not centuries, of trial and error in academia. We have tried so many things, and this is the best thing we have come up with. Now we have this new entity that is challenging every single aspect with it,” Hosseini said.
When using ChatGPT in research, regulations that require transparency, accountability and disclosure must be a top priority, according to Hosseini, who authored a recent editorial suggesting the following ethical guidelines for using LLMs such as ChatGPT in research:
- Content generated by LLMs should be checked by a domain expert;
- In the instance of errors or biases, co-authors should be held accountable;
- Investigators should disclose the use of LLMs and indicate text written or co-written by LLMs;
- When a publication’s content is influenced by an LLM, even in the absence of AI-generated text, that use should be disclosed;
- And investigators should not use LLMs to fabricate or falsify data.
Hosseini’s editorial was cited in JAMA in another editorial, which argues that the responsible use of AI language models and transparent reporting can help maintain the integrity of scientific research and trust in medical knowledge.
“The aim is to ensure that disclosure is happening and people who use these systems are as transparent, providing as many details as possible,” Hosseini said. “The recommendations we’ve provided, they’re just the beginning.”
Getting Ahead of the Game
Having discussions right now about ChatGPT can help inform new regulations that also ensure the tool remains both accessible and equitable, said Abel Kho, MD, director of I.AIM and the Institute for Public Health and Medicine (IPHAM)’s Center for Health Information Partnerships.
“If you look at the way technology is distributed in society today, it’s not equal,” Kho said. “One of the risks of technology advancement is that novel technology tends to be driven by people and institutions with the most resources. This can contribute to an environment where people without the means, or who may not be seen as having the same consumer value, can be marginalized.”
In addition to access, it’s also important to determine how ChatGPT will impact larger ecosystems complicated by issues related to information literacy, said Kristi Holmes, PhD, professor of Preventive Medicine, director of Northwestern’s Galter Health Sciences Library and chief of Knowledge Management for I.AIM.
“We need to carefully understand how people find, evaluate and make use of information. Whether we consider students, researchers, or members of the public, we must thoughtfully and thoroughly investigate how this kind of technology can intersect with their work and the way they’re living their lives,” Holmes said.
Recently, I.AIM and IPHAM hosted an open panel discussion inviting Feinberg faculty, staff and students to discuss ChatGPT with Northwestern experts. The panelists, who included Kho, Gao, Luo, Hosseini, Holmes and others, acknowledged the community’s shared hesitation surrounding ChatGPT while also highlighting the tool’s potential benefits and, ultimately, how ChatGPT has democratized the use of AI and why that’s a good thing.
A few weeks later, I.AIM hosted another panel discussion, “Navigating the Legal Landscape of AI in Medicine,” which brought together medical and legal experts to discuss clinical and ethical perspectives and potential directions for future regulation of tools like ChatGPT.
“It has brought a lot of attention to the potential for machine learning and artificial intelligence methods,” Kho said. “The most important thing right now is to engage as many different types of parties as we can in these discussions so that we can get ahead of it and project what the implications are and what policies or protections are necessary to put in place so we can have widespread and equitable impact.”