OpenAI has launched its new AI-powered online search engine, SearchGPT, with the aim of supplanting Google, Microsoft Bing and start-up Perplexity “for specific search tasks.”
But the move is also raising concerns that it could open the door to plagiarism. AI-powered search engines have been accused of plagiarizing web-based content, intentionally or unintentionally, because the platforms scrape material and data from all over the web in real time.
They can also generate content that closely mimics pre-existing content, according to Alon Yamin, CEO of AI-enabled plagiarism detection platform Copyleaks. That’s because the large language model engines behind generative AI (genAI) are trained using existing content.
“The trouble with ‘unintentional plagiarism’ is that it creates a gray area that’s challenging for both content creators and search engines to navigate,” Yamin said.
SearchGPT is a user-facing interface built atop OpenAI’s genAI-based ChatGPT chatbot; it will enable real-time web access for up-to-date sports scores, stock information and news. The search engine will also allow follow-up questions in the same search window, and its answers will take the full context of the previous chat into account to produce a relevant response.
OpenAI is also touting the tool’s ability to handle questions asked in “a more natural,” conversational way.
OpenAI announced on Oct. 31 that it had launched the SearchGPT prototype after beta testing it since July. For now, access to SearchGPT is limited, and free users who want to try it are on a waitlist.
The pilot version of the search engine will be available at chatgpt.com/search as well as through desktop and mobile apps. All ChatGPT Plus and Team users, as well as SearchGPT waitlist users, have access starting now. Enterprise and education users will get access in the next few weeks, OpenAI said, with a “rollout to all free users over the coming months.”
One standout feature is the search engine’s ability to allow follow-up questions that build on the context of the original query.
For example, a user could ask which tomato plants grow best in their region, then follow up by asking when to plant them.
SearchGPT is also designed to credit publishers by citing and linking to them in its answers. “Responses have clear, in-line, named attribution and links so users know where information is coming from and can quickly engage with even more results in a sidebar with source links,” OpenAI said in its announcement.
Search rivals beat OpenAI to the punch
Last year, Google added its own AI-based capabilities to its search tool; so did Microsoft, which integrated OpenAI’s GPT-4 into Bing. “Big hitters like Google are already developing AI detection tools to help identify AI-generated content. But the challenge lies in distinguishing between high-quality AI-assisted content and low-quality, plagiarized material,” Yamin said. “It’s undoubtedly an ongoing process that will require constant refinement of algorithms and policies.”
For its part, Perplexity said in an updated FAQ that its web crawler, PerplexityBot, will not index the full or partial text content of any site that disallows it through robots.txt directives. A robots.txt file is a simple text file, stored at the root of a website, that tells web crawlers which pages or sections of the site they are allowed to crawl.
“PerplexityBot only crawls content in compliance with robots.txt,” the FAQ explained. Perplexity also said it does not build “foundation models” (also known as large language models), “so your content will not be used for AI model pre-training.”
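To make the robots.txt mechanism concrete, here is a minimal sketch using Python’s standard-library `urllib.robotparser` module. The robots.txt contents, the `is_allowed` helper, the “SomeOtherBot” user agent and the example.com URLs are all hypothetical, chosen only for illustration. The sketch shows how a crawler that chooses to honor robots.txt would decide whether it may fetch a page; nothing in the file itself enforces compliance.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block PerplexityBot from the whole site,
# and keep every other crawler out of /private/ only.
SAMPLE_ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to crawl url.

    This only evaluates the rules; a crawler still has to choose to check them.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from in-memory lines
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("PerplexityBot", "https://example.com/article"),
        ("SomeOtherBot", "https://example.com/article"),
        ("SomeOtherBot", "https://example.com/private/page"),
    ]:
        print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

Run as a script, this reports that PerplexityBot may not fetch the article page, while other crawlers may fetch it but not anything under /private/.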
The bottom line, Yamin said, is that search engines are in a “tricky position” as genAI evolves. “They want to provide the best results to users, which increasingly involves AI-generated or AI-enhanced content. At the same time, they need to protect original creators and maintain the integrity of search results. We’re seeing efforts to strike this balance, but it’s a complex issue that will take time to fully address.”
ChatGPT (i.e., SearchGPT) is probably best positioned among all competitors to upset Google’s dominance in online search, according to Damian Rollison, director of market insights at marketing software company SOCi. Still, of all the areas where ChatGPT competes with Google, search is where the latter’s 26-year advantage is strongest.
“The early results of Bing search integrated into ChatGPT have been shaky, and the incredibly complex requirements of maintaining a world-class search platform tap into areas of expertise where OpenAI has yet to demonstrate its capabilities,” Rollison said.
Andy Thurai, a vice president analyst at Constellation Research, noted that Google still owns about 90% of the search engine market, meaning it won’t be easy for anyone to encroach on that dominance.
But Thurai said SearchGPT’s ease of use and conversational interface, which provides synthesized, more prose-like answers instead of a traditional list of search results like Google’s, could attract more users in the future.
While Google can provide personalized search results based on location and previous searches, it still has limitations when it comes to offering concise, conversational answers that stay on point, according to Thurai. “The concise nature of the answers, whether accurate or not, might be appealing to some users versus combing through [the] many pages [that] search engines like Google return.”
When ChatGPT was asked, “Is SearchGPT as good as Google search?” its reply was nuanced.
“Google is great for quickly finding specific, current resources and ChatGPT is better for having interactive conversations, asking detailed questions, or seeking explanations on a wide range of topics,” it responded. “The two can actually complement each other depending on what you need!”
When asked whether it’s as good as or better than Bing, ChatGPT replied: “In short, if you’re looking for real-time information or need to browse the web, Bing is likely better. If you need detailed, conversational, or creative assistance, ChatGPT tends to be more helpful. Each tool excels in different areas!”
The murky issue of plagiarism
Thurai said he’s unsure whether AI-based search engines or “answer engines” will invite plagiarism on their own.
“They are not all that different from Google search, in which you get many answers instead of the most relevant answer that AI thinks is relevant to your question,” he said. “However, AI for content creation is a big concern for plagiarism. What is more concerning is that the current plagiarism tools don’t catch AI-produced content correctly. They are mostly useless.”
There are, however, standards such as C2PA (from the Coalition for Content Provenance and Authenticity) that support digital watermarks and content credentials, providing some content provenance and authenticity mechanisms, Thurai noted.
He also argued that text produced with AI search engines is virtually impossible to detect, and that people are being unfairly penalized for using AI to plagiarize when in reality they didn’t.
“As AI tools become more sophisticated and part of our day-to-day lives, distinguishing between AI-generated and human-created content, properly attributing original sources or authors, and empowering overall originality becomes even more critical,” Copyleaks’ Yamin said. “This is precisely where the focus needs to remain: providing robust content integrity solutions that are evolving alongside the demands of the AI landscape.”