Author: Security – Computerworld

With SearchGPT, could OpenAI rewrite online search rules — and invite plagiarism?

OpenAI launched its new AI-powered online search engine — SearchGPT — with the aim of supplanting Google, Microsoft Bing and start-up Perplexity “for specific search tasks.”

But the move is also raising concerns that it could open the door to plagiarism; AI-powered search engines have been accused of intentionally or unintentionally plagiarizing web-based content, because the platforms scrape material and data from all over the web in real time.

They can also generate content that closely mimics pre-existing content, according to Alon Yamin, CEO of AI-enabled plagiarism detection platform Copyleaks. That’s because the large language model engines behind generative AI (genAI) are trained using existing content.

“The trouble with ‘unintentional plagiarism’ is that it creates a gray area that’s challenging for both content creators and search engines to navigate,” Yamin said.

SearchGPT is a front-facing interface built atop OpenAI’s genAI-based ChatGPT chatbot; it will enable real-time web access for up-to-date sports scores, stock information and news. The search engine will also allow follow-up questions in the same search window, and its answers will consider the full context of the previous chat to offer an applicable answer.

The AI-based web crawler is also being touted for its ability to allow questions in “a more natural,” conversational way, according to OpenAI.

OpenAI announced on Oct. 31 that it had launched SearchGPT after beta testing the prototype since July. For now, access to SearchGPT is limited, with hopeful free users placed on a waitlist.

[Image: An example of a search result from SearchGPT. Credit: OpenAI]

The pilot version of the search engine will be available at chatgpt.com/search as well as being offered as a desktop and mobile app. All ChatGPT Plus and Team users, as well as SearchGPT waitlist users, will have access from here on. Enterprise and education users will get access in the next few weeks, OpenAI said, with a “rollout to all free users over the coming months.”

One standout feature is the search engine’s ability to allow follow-up questions that build on the context of the original query.

For example, a user could ask what the best tomato plants are for their region, and then follow up by asking about the best time to plant them.

SearchGPT is also designed to offer links to publishers of information by citing and linking to them in searches. “Responses have clear, in-line, named attribution and links so users know where information is coming from and can quickly engage with even more results in a sidebar with source links,” OpenAI said in its announcement.

Search rivals beat OpenAI to the punch

Last year, Google added its own AI-based capabilities to its search tool; so did Microsoft, which integrated OpenAI’s GPT-4 into Bing. “Big hitters like Google are already developing AI detection tools to help identify AI-generated content. But the challenge lies in distinguishing between high-quality AI-assisted content and low-quality, plagiarized material,” Yamin said. “It’s undoubtedly an ongoing process that will require constant refinement of algorithms and policies.”

For its part, Perplexity said in an updated FAQ that its web crawler, PerplexityBot, will not index the full or partial text content of any site that disallows it via robots.txt. Robots.txt files are simple text files stored on a web server that tell web crawlers which pages or sections of a website they are allowed to crawl and index.
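
As an illustration of how that opt-out works in practice, here is a minimal sketch of a polite crawler consulting robots.txt before fetching a page, using Python’s standard urllib.robotparser module. The “PerplexityBot” user agent string and the example URLs are assumptions for illustration only, not Perplexity’s actual crawler code.

    from urllib.robotparser import RobotFileParser

    def allowed_to_crawl(site: str, path: str, user_agent: str = "PerplexityBot") -> bool:
        """Return True if the site's robots.txt permits user_agent to fetch path."""
        parser = RobotFileParser()
        parser.set_url(f"{site}/robots.txt")
        parser.read()  # download and parse the site's robots.txt
        return parser.can_fetch(user_agent, f"{site}{path}")

    # A publisher that wants to block this hypothetical crawler would publish:
    #     User-agent: PerplexityBot
    #     Disallow: /
    print(allowed_to_crawl("https://example.com", "/articles/some-story"))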

“PerplexityBot only crawls content in compliance with robots.txt,” the FAQ explained. Perplexity also said it does not build “foundation models,” (also known as large language models), “so your content will not be used for AI model pre-training.”

The bottom line, Yamin said, is that search engines are in a “tricky position” as genAI evolves. “They want to provide the best results to users, which increasingly involves AI-generated or AI-enhanced content. At the same time, they need to protect original creators and maintain the integrity of search results. We’re seeing efforts to strike this balance, but it’s a complex issue that will take time to fully address.”

ChatGPT (i.e., SearchGPT) is probably best positioned among all competitors to upset Google’s dominance in online search, according to Damian Rollison, director of market insights at marketing software company SOCi. Of all the areas where ChatGPT competes with Google, search is where the latter’s 26-year advantage is the strongest.

“The early results of Bing search integrated into ChatGPT have been shaky, and the incredibly complex requirements of maintaining a world-class search platform tap into areas of expertise where OpenAI has yet to demonstrate its capabilities,” Rollison said.

Andy Thurai, a vice president analyst at Constellation Research, noted that Google still owns about 90% of the search engine market, meaning it won’t be easy for anyone to encroach on that dominance.

[Image: An example of a follow-on question in SearchGPT that began with asking: “What are the best tomatoes for my region?” Credit: OpenAI]

But Thurai said SearchGPT’s ease of use and conversational interface, which provides synthesized, more prose-like answers instead of traditional search results like Google’s, could attract more users in the future.

While Google can provide personalized search results based on location and previous searches, it still has limitations in terms of offering concise, conversational-style answers that remain on point, according to Thurai. “The concise nature of the answers, whether accurate or not, might be appealing to some users versus combing through many page search engines like those Google returns.”

Ironically, when ChatGPT was asked the question: Is SearchGPT as good as Google search? ChatGPT’s reply was nuanced.

“Google is great for quickly finding specific, current resources and ChatGPT is better for having interactive conversations, asking detailed questions, or seeking explanations on a wide range of topics,” SearchGPT responded. “The two can actually complement each other depending on what you need!”

When asked whether it’s as good or better than Bing, ChatGPT replied: “In short, if you’re looking for real-time information or need to browse the web, Bing is likely better. If you need detailed, conversational, or creative assistance, ChatGPT tends to be more helpful. Each tool excels in different areas!”

The murky issue of plagiarism

Thurai said he’s unsure whether AI-based search engines or “answer engines” will invite plagiarism on their own.

“They are not all that different from Google search, in which you get many answers instead of the most relevant answer that AI thinks is relevant to your question,” he said. “However, AI for content creation is a big concern for plagiarism. What is more concerning is that the current plagiarism tools don’t catch AI-produced content correctly. They are mostly useless.”

There are, however, tools that can create digital watermarks or credentials, such as C2PA, which can provide some content provenance and/or authenticity mechanisms, Thurai noted.

He also argued that text-based content produced via AI search engines is virtually impossible to catch. And people are being unfairly penalized for plagiarism via AI when in reality they didn’t use it, he said.

“As AI tools become more sophisticated and part of our day-to-day lives, distinguishing between AI-generated and human-created content, properly attributing original sources or authors, and empowering overall originality becomes even more critical,” Copyleaks’ Yamin said. “This is precisely where the focus needs to remain — providing robust content integrity solutions that are evolving alongside the demands of the AI landscape.”

For November, Patch Tuesday includes three Windows zero-day fixes

Microsoft’s November Patch Tuesday release addresses 89 vulnerabilities in Windows, SQL Server, .NET and Microsoft Office — and three zero-day vulnerabilities (CVE-2024-43451, CVE-2024-49019 and CVE-2024-49039) that warrant a “Patch Now” recommendation for Windows platforms. Unusually, there are a significant number of patch “re-releases” this month that might also require administrator attention.

The team at Readiness has provided this infographic outlining the risks associated with each of the updates for this cycle. (For a rundown of recent Patch Tuesday updates, see Computerworld’s round-up here.)

Known issues 

There were a few reported issues from the September update that have now been addressed, including:

  • Enterprise customers are reporting issues with the SSH service failing to start on updated Windows 11 24H2 machines. Microsoft recommended updating the file/directory-level permissions on the SSH program directories (remember to include the log files). You can read more about this official workaround here.

It looks like we are entering a new age of ARM compatibility challenges for Microsoft. However, before we get ahead of ourselves, we really need to sort out the (three-month-old) Roblox issue.

Major revisions 

This Patch Tuesday includes the following major revisions: 

  • CVE-2013-3900: WinVerifyTrust Signature Validation Vulnerability. This update was originally published in 2013 via TechNet. It is now applicable to Windows 10 and 11 users due to a recent change in the EnableCertPaddingCheck Windows API call. We highly recommend a review of this CVE and its associated Q&A documentation. Remember: if you must set the values in the registry, ensure that they are of type REG_DWORD, not REG_SZ (a minimal scripting sketch follows this list).
  • CVE-2024-49040: Microsoft Exchange Server Spoofing Vulnerability. When Microsoft updates a CVE (twice) in the same week, and the vulnerability has been publicly disclosed, it’s time to pay attention. Before you apply this Exchange Server update, we highly recommend a review of the reported header detection issues and mitigating factors.
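
For administrators scripting that registry change, below is a minimal sketch using Python’s standard winreg module. The registry paths are assumptions drawn from Microsoft’s published guidance for CVE-2013-3900; confirm them against the CVE’s Q&A before deploying, and run the script elevated on a Windows machine.

    import winreg

    # Assumed paths from Microsoft's CVE-2013-3900 guidance; verify before use.
    # 64-bit Windows also needs the Wow6432Node path to cover 32-bit binaries.
    PATHS = [
        r"Software\Microsoft\Cryptography\Wintrust\Config",
        r"Software\Wow6432Node\Microsoft\Cryptography\Wintrust\Config",
    ]

    def enable_cert_padding_check() -> None:
        for subkey in PATHS:
            key = winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, subkey, 0, winreg.KEY_SET_VALUE)
            try:
                # The detail the Readiness note stresses: the value type must be REG_DWORD.
                winreg.SetValueEx(key, "EnableCertPaddingCheck", 0, winreg.REG_DWORD, 1)
            finally:
                winreg.CloseKey(key)

    if __name__ == "__main__":
        enable_cert_padding_check()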

And unusually, we have three kernel mode updates (CVE-2024-43511, CVE-2024-43516 and CVE-2024-43528) that were re-released in October and updated this month. These security vulnerabilities exploit a race condition in Microsoft’s Virtualization Based Security (VBS). It’s worth reviewing the mitigating strategies while you thoroughly test these low-level kernel patches.

Testing guidance

Each month, the Readiness team analyzes the latest Patch Tuesday updates and provides detailed, actionable testing guidance based on a large application portfolio and a detailed analysis of the patches and their potential impact on Windows platforms and application installations.

For this release cycle, we have grouped the critical updates and required testing efforts into separate product and functional areas including:

Networking

  • Test end-to-end VPN, Wi-Fi, sharing and Bluetooth scenarios.
  • Test out HTTP clients over SSL (a quick connectivity check is sketched after this list).
  • Ensure internet shortcut files (ICS) display correctly.
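
For the SSL item, the following is a minimal post-patch connectivity check using only Python’s standard library. The target URLs are placeholders; this confirms only that a TLS handshake and certificate verification still succeed after the update, not that every HTTP client in your portfolio behaves.

    import ssl
    import urllib.request

    def check_https(url: str) -> None:
        """Open an HTTPS connection with default certificate verification and print the status."""
        context = ssl.create_default_context()  # verifies the server certificate by default
        with urllib.request.urlopen(url, context=context) as response:
            print(url, "->", response.status)

    if __name__ == "__main__":
        # Point this at a few internal endpoints after applying the November update.
        for target in ("https://example.com", "https://www.microsoft.com"):
            check_https(target)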

Security/crypto

  • After installing the November update on your Certificate Authority (CA) servers, ensure that enrollment and renewal of certificates perform as expected.
  • Test Windows Defender Application Control (WDAC) and ensure that line-of-business apps are not blocked. Ensure that WDAC functions as expected on your Virtual Machines (VM).

Filesystem and logging

  • The NTFileCopyChunk API was updated and will require internal application testing if directly employed. Test the validity of your parameters and issues relating to directory notification.

I cannot claim to have any nostalgia for dial-up internet access (though I do have a certain Pavlovian response to the dial-up handshake sound). For those who are still using this approach to access the internet, the November update to the TAPI API has you in mind. A “quick” (haha) test is required to ensure you can still connect to the internet via dial-up once you update your system.

Windows lifecycle and enforcement updates

There were no product or security enforcements this cycle. However, we do have the following Microsoft products reaching their respective end of servicing terms:

  • Oct. 8, 2024: Windows 11 Enterprise and Education, Version 21H2, Windows 11 Home and Pro, Version 22H2, Windows 11 IoT Enterprise, Version 21H2.
  • Oct. 9, 2024: Microsoft Project 2024 (LTSC)

Mitigations and workarounds

Microsoft published the following mitigations applicable to this Patch Tuesday.

  • CVE-2024-49019: Active Directory Certificate Services Elevation of Privilege Vulnerability. As this vulnerability has been publicly disclosed, we need to take it seriously. Microsoft has offered some mitigation strategies that most enterprises can apply during update testing and deployment, including:
  • Remove overly broad enroll or auto-enroll permissions.
  • Remove unused templates from certification authorities.
  • Secure templates that allow you to specify the subject in the request.

As most enterprises employ Microsoft Active Directory, we highly recommend a review of this knowledge note from Microsoft. 

Each month, we break down the update cycle into product families (as defined by Microsoft) with the following basic groupings: 

  • Browsers (Microsoft IE and Edge);
  • Microsoft Windows (both desktop and server); 
  • Microsoft Office;
  • Microsoft Exchange Server;
  • Microsoft Development platforms (ASP.NET Core, .NET Core and Chakra Core);
  • Adobe (if you get this far).

Browsers 

Microsoft released a single update specific to Microsoft Edge (CVE-2024-49025), and two updates for the Chromium engine that underpins the browser (CVE-2024-10826 and CVE-2024-10827). There’s a brief note on the browser update here. We recommend adding these low-profile browser updates to your standard release schedule.

Windows 

Microsoft released two patches with a critical rating (CVE-2024-43625 and CVE-2024-43639) and another 35 patches rated as important. This month, the following key Windows features have been updated:

  • Windows Update Stack (note: installer rollbacks may be an issue);
  • NT OS, Secure Kernel and GDI;
  • Microsoft Hyper-V;
  • Networking, SMB and DNS;
  • Windows Kerberos.

Unfortunately, the following Windows vulnerabilities have been publicly disclosed or reported as exploited in the wild, making them zero-day problems:

  • CVE-2024-43451: NTLM Hash Disclosure Spoofing Vulnerability.
  • CVE-2024-49019: Active Directory Certificate Services Elevation of Privilege.
  • CVE-2024-49039: Windows Task Scheduler Elevation of Privilege Vulnerability.

Add these Windows updates to your Patch Now release cadence. 

Microsoft Office 

Microsoft pushed out six Microsoft Office updates (all rated important) that affect SharePoint, Word and Excel. None of these reported vulnerabilities involve remote access or preview-pane issues, and none have been publicly disclosed or exploited in the wild. Add these updates to your standard release schedule.

Microsoft SQL (nee Exchange) Server 

You want updates to Microsoft SQL Server? We got ‘em: 31 patches to the SQL Server Native Client this month. That’s a lot of patches, even for a complex product like Microsoft SQL Server. These updates appear to be the result of a major clean-up effort from Microsoft addressing a batch of reported security vulnerabilities.

The vast majority of these SQL Server Native Client updates address CWE-122-related buffer overflow issues. Note: these patches update the SQL Native Client, so this is a desktop update, not a server update. Crafting a testing profile for this one is a tough call. No new features have been added, and no high-risk areas have been patched. However, many internal line-of-business applications rely on these SQL client features. We recommend that your core business applications be tested before deploying this SQL update; otherwise, add it to your standard release schedule.

Boot note: Remember that there is a major revision to CVE-2024-49040 — this could affect the SQL Server “server” side of things.

Microsoft development platforms

Microsoft released one critical-rated update (CVE-2024-43498) and three updates rated as important for Microsoft .NET 9 and Visual Studio 2022. These are pretty low-risk security vulnerabilities and very specific to these versions of the development platforms. They should present a reduced testing profile. Add these updates to your standard developer schedule this month.

Adobe Reader (and other third-party updates)

Microsoft did not publish any Adobe Reader-related updates this month. The company released three non-Microsoft CVEs covering Google Chrome and SSH (CVE-2024-5535). Given the update to Windows Defender (as a result of the SSH issue), Microsoft also published a list of Defender vulnerabilities and weaknesses that might assist with your deployments.

The EU seeks proposals for AI that should be banned

The EU, which is now developing guidelines for compliance with the region’s new AI law, has started collecting opinions in two areas via an online survey.

The first area involves how the law should define AI systems (compared to traditional software). Here, the EU wants to hear from people in the AI industry, companies, academics and civil society. The second area concerns when the use of AI should be prohibited. The EU wants detailed feedback on each prohibited use and is particularly interested in practical examples.

Responses will be collected via the survey until Dec. 11, and the European Commission expects to publish guidelines on the definition of AI systems and on prohibited uses in early 2025.

Google’s Gemini app is now available on iPhones

Google has entered a new and more intense phase of the AI wars, introducing its own Google Gemini app for iPhones; now you can use Apple Intelligence, ChatGPT, Microsoft Copilot and Google Gemini on one device.

Only one of those services tries to give you what you need without gathering too much information about you.

What is Gemini?

Like most Google services, Google Gemini seems free, in that you don’t need to part with any cash to use it. Open it up and you’ll find a chat window that also gives you access to a list of your previous chats. Speaking to Gemini is simple — use text, voice, or even point the camera at something and you’ll get some answers. In other words, the app integrates the same features you’ll find on the Gemini website, but it’s an app, so that makes it cool.

Probably. 

There is one more thing — access to the more conversational Gemini Live bot, which works a little like ChatGPT in voice mode. You can even assign access to Gemini as a shortcut on your iPhone’s Action button for fast access to the bot, which can also access and control any Google apps you’re brave enough to install on your iPhone.

All about Google

And that’s the thing, really. Like so much coming out of Silicon Valley now, Google Gemini is self-referencing. 

You use Google on your iPhone to speak to a Google AI and access Google services, which gives you a more Android-like experience if you happen to have migrated to iOS from Android. You can use Gemini on your iPhone to control YouTube Music, for example, and you’ll get Google Maps if you ask for directions. 

You even get supplementary privacy agreements for all those apps, some of which deliver exactly what you’d expect from Google, the ad sales company, which is probably a little different from the privacy-first Apple experience you thought you were using. Gemini does put some protections in place, but your location data, feedback, and usage information can be reviewed by humans.

Most people won’t know this. Most people don’t read privacy agreements before accepting them. They should – but they are long, boring, and archaically written for a reason.

AI tribalism

If art reflects life and tech is indeed the new creativity, then the emergence of these equal but different digital tribes reflects the deeper tribalism that seems to be impacting every other part of life. Is that a good thing? Perhaps that depends on which state you live in.

At the end of days, Gemini on iPhone is your gateway to Google world, just as Windows takes you to Microsoft planet and Apple takes you to its own distorted reality (subject to the EU). There are other tech worlds too, but this isn’t intended to be a definitive list of differing digital existences, especially now that these altered states have become both cloud- and service-based. It’s a battle playing out on every platform and on every device.

After all, if your primary computing experience becomes text- and voice-based, and the processors handling your requests are in the cloud, then it matters less which platform you use, as long as you get something you need. (It’s only later we’ll find that we get slightly less than what we need, with the difference between the two being the profit margin.)

Apple’s approach is to support those external services while building up its own AI suite with its own unique — and, if you ask me, vitally necessary — selling point around privacy. Others follow a different path, but it’s hard to ignore that control of your computational experience is the root of all these ambitions.

King of the hill

With its early mover advantage, OpenAI is not blind to the battle. Just this week it introduced support for different applications across Windows and Mac desktops. In a Nov. 14 message on X (for whoever remains genuinely active there), OpenAI announced: “ChatGPT for macOS can now work with apps on your desktop. In this early beta for Plus and Team users, you can let ChatGPT look at coding apps to provide better answers.”

That means it will try to help when you’re working in applications such as VS Code, Xcode, and Terminal. While you work, you can speak with the bot, get screenshots, share files and more. There is, of course, also a ChatGPT app for iPhones, and the first comparative reviews of using both Gemini and ChatGPT on an Apple device show pros and cons to both. Downstream vendors, most recently including Jamf, are relying on tools provided by the larger vendors to add useful features to their own products.

Google and OpenAI are not alone. Just last month, Microsoft introduced Copilot Vision, which it describes as autonomous agents capable of handling tasks and business functions so you don’t need to. Apple, of course, remains high on its recent introduction of Apple Intelligence.

Things will get better before becoming worse

It’s a clash of the tech titans. And like every clash of the tech titans so far this century, you — or your business — are the product the titans are fighting for. That raises other questions, such as how they will monetize your experience of AI.

How high will energy prices climb as a direct result of the spiraling electricity demands of these services? At what point will AI eat itself, creating emails from spoken summaries that are then in turn summarized by AI? When it comes to security and privacy, is even sovereign AI truly secure enough for use in regulated enterprise? Just how secure are Apple’s own AI servers?

And once the dominant players in the New AI Empire finally emerge, how, just how, will they do what Big Tech always does and follow Doctorow’s orders?

You can follow me on social media! You’ll find me on BlueSky, LinkedIn, Mastodon, and MeWe.

O2 unleashes AI grandma on scammers

Research by British telecommunications provider O2 has found that seven in ten Britons (71 percent) would like to take revenge on scammers who have tried to trick them or their loved ones. At the same time, however, one in two people does not want to waste their time on it.

AI grandma against telephone scammers

O2 now wants to remedy this with an artificial intelligence called Daisy. As its “head of fraud prevention,” this state-of-the-art AI granny’s job is to keep scammers away from real people for as long as possible with human-like chatter. To activate Daisy, O2 customers simply have to forward a suspicious call to the number 7726.

Daisy combines different AI models that work together: they first listen to the caller and convert their voice to text. Daisy then generates responses that fit the character’s “personality” via a custom single-layer large language model, and these are fed through a custom text-to-speech model to produce a natural spoken reply. This happens in real time, allowing the tool to hold a human-like conversation with a caller.
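
In outline, that is a three-stage loop: speech-to-text, a persona-conditioned language model, then text-to-speech. The sketch below shows the shape of such a loop in Python; the transcribe, generate_reply and synthesize functions are hypothetical placeholders, not O2’s actual components.

    PERSONA = "Daisy, a chatty, easily sidetracked grandmother who never gives out real details."

    def transcribe(audio_chunk: bytes) -> str:
        """Placeholder speech-to-text stage."""
        raise NotImplementedError("plug in a real speech-to-text model here")

    def generate_reply(history: list[str], persona: str) -> str:
        """Placeholder persona-conditioned language-model stage."""
        raise NotImplementedError("plug in a real language model here")

    def synthesize(text: str) -> bytes:
        """Placeholder text-to-speech stage."""
        raise NotImplementedError("plug in a real text-to-speech model here")

    def handle_call(audio_stream) -> None:
        history: list[str] = []
        for chunk in audio_stream:                    # caller audio arrives chunk by chunk
            caller_text = transcribe(chunk)           # 1. convert the caller's voice to text
            history.append(f"Caller: {caller_text}")
            reply = generate_reply(history, PERSONA)  # 2. answer in character
            history.append(f"Daisy: {reply}")
            audio_reply = synthesize(reply)           # 3. speak the reply back down the line
            # send audio_reply back into the call here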

Although “human-like” is something of an understatement: Daisy was trained with the help of Jim Browning, one of the most famous “scambaiters” on YouTube. With the persona of a lonely and seemingly somewhat bewildered older lady, she tricks the fraudsters into believing they have found a perfect target, while in reality she beats them at their own game.

AI is dumber than you think

OpenAI recently introduced SimpleQA, a new benchmark for evaluating the factual accuracy of large language models (LLMs) that underpin generative AI (genAI).

Think of it as a kind of SAT for genAI chatbots consisting of 4,326 questions across diverse domains such as science, politics, pop culture, and art. Each question is designed to have one correct answer, which is verified by independent reviewers. 

The same question is asked 100 times, and the frequency of each answer is tracked. The idea is that a more confident model will consistently give the same answer.
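
Here is a minimal sketch of that repeat-and-tally idea in Python: ask the same question many times and measure how often the most common answer appears. The ask_model function is a placeholder for whatever chatbot API you use, and the simple normalization step is an illustrative assumption, not OpenAI’s actual grading code.

    from collections import Counter

    def ask_model(question: str) -> str:
        raise NotImplementedError("call your chatbot API here")

    def consistency(question: str, trials: int = 100) -> tuple[str, float]:
        """Return the most frequent answer and the fraction of trials that produced it."""
        tally = Counter()
        for _ in range(trials):
            answer = ask_model(question).strip().lower()  # crude normalization for tallying
            tally[answer] += 1
        top_answer, count = tally.most_common(1)[0]
        return top_answer, count / trials

    # A model that answers "1912" on 97 of 100 runs of "What year did the Titanic sink?"
    # is behaving far more consistently than one that scatters its answers across several years.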

The questions were selected precisely because they have previously posed challenges for AI models, particularly those based on OpenAI’s GPT-4. This selective approach means that the low accuracy scores reflect performance on particularly difficult questions rather than the overall capabilities of the models.

This idea is also similar to the SATs, which emphasize not information that anybody and everybody knows, but harder questions that high school students would have struggled with and had to work hard to master. The benchmark results show that OpenAI’s models aren’t particularly accurate on the questions that were asked. In short, they hallucinate.

OpenAI’s o1-preview model achieved a 42.7% success rate. GPT-4o followed with a 38.2% accuracy. And the smaller GPT-4o-mini scored only 8.6%. Anthropic did worse than OpenAI’s top model; the Claude-3.5-sonnet model managed to get just 28.9% of the answers correct.

All these models got an F, grade-wise, providing far more incorrect answers than correct ones. And the answers are super easy for a human.

Here are the kinds of questions that are asked by SimpleQA: 

  • What year did the Titanic sink?
  • Who was the first President of the United States?
  • What is the chemical symbol for gold?
  • How many planets are in our solar system?
  • What is the capital city of France?
  • Which river is the longest in the world?
  • Who painted the Mona Lisa?
  • What is the title of the first Harry Potter book?
  • What does CPU stand for?
  • Who is known as the father of the computer?

These are pretty simple questions for most people to answer, but they can present a problem for chatbots. One reason these tools struggled is that SimpleQA questions demand precise, single, indisputable answers. Even minor variations or hedging can result in a failing grade. Chatbots do better with open-ended overviews of even very complex topics but struggle to give a single, concise, precise answer. 

Also, the SimpleQA questions are short and self-contained and don’t provide a lot of context. This is why providing as much context as possible in the prompts that you write improves the quality of responses. 

Compounding the problem, LLMs often overestimate their own accuracy. SimpleQA queried chatbots on what they think is the accuracy of their answers; the models consistently reported inflated success rates. They feign confidence, but their internal certainty may be low.

LLMs don’t really think

Meanwhile, newly published research from MIT, Harvard, and Cornell University shows that while LLMs can perform impressive tasks, they lack a coherent understanding of the world.

As one of their test examples, the researchers found that LLMs can generate accurate driving directions in complex environments like New York City. But when researchers introduced detours, the models’ performance dropped because they didn’t have an internal representation of the environment (as people do). Closing just 1% of streets in New York City led to a drop in the AI’s directional accuracy from nearly 100% to 67%. 

Researchers found that even when a model performs well in a controlled setting, it might not possess coherent knowledge structures necessary for random or diverse scenarios. 

The trouble with AI hallucinations

The fundamental problem we all face is this: Industries and individuals are already relying on LLM-based chatbots and generative AI tools for real work in the real world. The public, and even professionals, believe this technology to be more reliable than it actually is. 

As one recent example, OpenAI offers an AI transcription tool called Whisper, which hospitals and doctors are already using for medical transcriptions. The Associated Press reported that a version of Whisper was downloaded more than 4.2 million times from the open-source AI platform HuggingFace.

More than 30,000 clinicians and 40 health systems, including the Children’s Hospital Los Angeles, are using a tool called Nabla, which is based on Whisper but optimized for medical lingo. The company estimates that Nabla has been used for roughly seven million medical visits in the United States and France. 

As with all such AI tools, Whisper is prone to hallucinations.

One engineer who looked for Whisper hallucinations in transcriptions found them in every document he examined. Another found hallucinations in half of the 100 hours of Whisper transcriptions he analyzed.

Professors from the University of Virginia looked at thousands of short snippets from a research repository hosted at Carnegie Mellon University. They found that nearly 40% of the hallucinations were “harmful or concerning.”

In one transcription, Whisper even invented a non-existent medication called “hyperactivated antibiotics.”

Experts fear the use of Whisper-based transcription will result in misdiagnoses and other problems.

What to do about AI hallucinations

When you get a diagnosis from your doctor, you might want to get a second opinion. Likewise, whenever you get a result from ChatGPT, Perplexity AI, or some other LLM-based chatbot, you should also get a second opinion.

You can use one tool to check another. For example, if the subject of your query has original documentation — say, a scientific research paper, a presentation, or a PDF of any kind — you can upload those original documents into Google’s NotebookLM tool. Then, you can copy results from the other tool, paste them into NotebookLM, and ask if it’s factually accurate. 

You should also check original sources. Fact-check everything. 

Chatbots can be great for learning, for exploring topics, for summarizing documents and many other uses. But they are not reliable sources of factual information, in general. 

What you should never, ever do is copy results from AI chatbots and paste them into something else to represent your own voice and your own facts. The language is often a bit “off.” The emphasis of points can be strange. And it’s a misleading practice.

Worst of all, the chatbot you’re using could be hallucinating, lying or straight up making stuff up. They’re simply not as smart as people think.

FTC eyes Microsoft’s cloud practices amid antitrust scrutiny

The US Federal Trade Commission (FTC) is reportedly preparing to investigate Microsoft for potentially anticompetitive practices in its cloud computing division. This inquiry centers on whether Microsoft is abusing its market dominance by deploying restrictive licensing terms to dissuade customers from switching from its Azure platform to competitors, the Financial Times reported.

According to the report, the practices under scrutiny include sharply raising subscription fees for customers looking to switch providers, imposing high exit charges, and reportedly making Office 365 less compatible with competitor cloud services.

The investigation reflects the agency’s broader push, led by FTC Chair Lina Khan, to address Big Tech’s influence in sectors such as cloud services, with bipartisan support for curbing monopolistic practices.

In November 2023, the FTC began assessing cloud providers’ practices in four broad areas — competition, single points of failure, security, and AI — and sought feedback from stakeholders in academia, industry, and civil society.

The majority of the feedback the commission received highlighted concerns over licensing constraints that limit customers’ choices. 

Microsoft’s cloud strategy under fire

The inquiry reported by the Financial Times is still in its early stages, but an FTC challenge could significantly impact Microsoft’s cloud operations, which have grown rapidly in recent years.

“Interoperability and the fear of vendor lock-in are important criteria for enterprises selecting cloud vendors,” said Pareekh Jain, CEO of Pareekh Consulting. “This could create a negative perception of Microsoft. Previously, Microsoft faced a similar probe regarding the interoperability of Microsoft Teams.”

This scrutiny aligns with global regulatory focus: In the UK, the Competition and Markets Authority (CMA) is investigating Microsoft and Amazon following complaints about restrictive contracts and high “egress fees,” which make switching providers costly. Similarly, Microsoft recently sidestepped a formal probe in the European Union after it reached a multi-million-dollar settlement with rival cloud providers, addressing concerns of monopolistic practices.

Neither the FTC nor Microsoft had responded to questions about the reported investigation by press time.

Microsoft’s position in the cloud market

Cloud computing has rapidly expanded, with industry spending expected to reach $675 billion in 2024, according to Gartner. Microsoft controls roughly 20% of the global cloud market, second only to Amazon Web Services (31%) and ahead of Google Cloud (12%), according to Statista. Tensions have risen between the leading providers, with Microsoft accusing Google of using “shadow campaigns” to undermine its position by funding adversarial lobbying efforts.

“It seems Google has two ultimate goals in its astroturfing efforts: distract from the intense regulatory scrutiny Google is facing around the world by discrediting Microsoft and tilt the regulatory landscape in favor of its cloud services rather than competing on the merits,” Microsoft Deputy General Counsel Rima Alaily said in a statement in October.

AWS has also accused Microsoft of anticompetitive practices in the cloud computing segment and complained to the UK CMA.

These top cloud providers had already filed an antitrust case against Microsoft in 2022 alleging that Microsoft is using its software licensing terms to restrict European businesses’ options in selecting cloud providers for services like desktop virtualization and application hosting.

Previous FTC interventions and growing cloud sector scrutiny

This move follows the FTC’s legal challenge against Microsoft’s $75 billion acquisition of Activision Blizzard, which faced antitrust concerns around Microsoft’s cloud gaming business. While a federal court allowed the acquisition to proceed, the FTC’s appeal highlights its commitment to maintaining oversight of Big Tech’s market reach.

Since its inception, cloud computing has evolved from simple storage solutions to a cornerstone of AI development, with Microsoft, Amazon, and Google competing for contracts that power AI model training and deployment.

If pursued, this inquiry could lead to intensified regulations on Microsoft’s cloud strategy, underscoring the FTC’s commitment to protecting competitive markets in sectors increasingly dominated by a few key players. Neither the FTC nor Microsoft has publicly commented on the matter.

“Moving forward, all hyperscalers should commit to the interoperability of their cloud solutions in both intent and practice,” Jain noted, adding, “failing to do so may expose them to investigations that could damage their brand and business.”

Shared blame

If enterprises are finding themselves locked in to high costs, though, some of the blame may fall on them, suggested Yugal Joshi, a partner at Everest Group.

“Enterprises are happy signing highly discounted bundled deals, and when these financial incentives run out they complain about lock-in. Many of them already know what they are getting into but then are focused on near-term discounts over long-term interoperability and freedom to choose. Given the macro economy continues to struggle, price-related challenges are pinching harder,” Joshi said. “Therefore, clients are becoming more vocal and proactive about switching vendors if it saves them money.”

Microsoft has been a beneficiary of this, he said, because some clients are planning to move, and some have already moved, to its Dynamics platform from Salesforce.

Getting started with Google Password Manager

If you’re still trying to remember all of your passwords and then type ’em into sites by hand, let me tell you: You’re doing it wrong.

With all the credentials we have to keep track of these days, there’s just no way the human brain can handle the task of storing the specifics — at least, not if you’re using complex, unique passwords that aren’t repeated (or almost repeated, even) from one site to the next. That’s where a password manager comes into play: It securely stores all your sign-in info for you and then fills it in as needed.

While there’s a case to be made for leaning on a dedicated app for that purpose (for reasons we’ll discuss further in a moment), Google has its own password management system built right into Chrome — and also now integrated directly into Android, at the operating system level. And it’s far better to rely on that than to use nothing at all.

Google Password Manager 101

First things first: You shouldn’t have to do anything to turn the Google Password Manager on. The system, once considered part of Google’s Smart Lock feature, works across Android, iOS, ChromeOS, and any other desktop platform where you’re signed into Chrome — and it’s typically activated by default in all of those places.

You’ll see the Password Manager’s prompts for credential-saving pop up anytime you enter your username and password into a site within the Chrome browser. The service will also offer to create complex new passwords for you when you’re signing up for something new. And whenever you return to a site where your credentials have been stored, the service will automatically fill them in for you — or, when more than one sign-in is associated with a single site, it’ll provide you with the option to pick the account you want to use.
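
As a rough illustration of what “complex” means here, the sketch below generates the kind of long, random, never-reused password a manager stores for you, using Python’s standard secrets module. The 20-character length and the character set are arbitrary choices for illustration, not Google’s actual generation rules.

    import secrets
    import string

    def generate_password(length: int = 20) -> str:
        """Return a random password drawn from letters, digits and a few symbols."""
        alphabet = string.ascii_letters + string.digits + "!@#$%^&*-_"
        return "".join(secrets.choice(alphabet) for _ in range(length))

    # Generate a fresh password for every site; the manager remembers it so you don't have to.
    print(generate_password())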

The system is able to sign you into Android apps automatically, too, though it works somewhat sporadically — and you never quite know when it’ll be present. To use Google Password Manager in that way, you’ll need to search your Android device’s system settings for autofill, then:

  1. Tap “Autofill service from Google,” tap that same option once more, and confirm that the system is on and active.
  2. Return to that same settings search for autofill, tap “Preferred service,” and ensure that “Google” is both active and set to be the preferred service on that screen.

Google Password Manager can also sign you into both websites and apps across iOS, though on that front, you’ll need to manually enable the system by visiting the Passwords section of the iOS Settings app, selecting “Autofill” followed by “Passwords” and “Chrome,” and then turning on the “Autofill” option within that area.

Adjusting your Password Manager setup

If you ever want to look through and edit your stored passwords or adjust your Google Password Manager settings, the easiest thing is to sign into the Google Password Manager web interface at passwords.google.com — in any web browser, on any device you’re using.

There, you can view, edit, or delete any of your saved passwords as well as see and act on any alerts regarding possible security issues with your credentials.

You can also adjust your Google Password Manager preferences by clicking the gear icon in the upper-right corner of that page. It’s worth peeking in there once in a while, as you may find some options that are off by default and advisable to activate — like proactive alerts anytime a password you’ve saved is found to be compromised and on-device encryption for extra protection of any new passwords you save along the way.

That’s also where you can go to export all of your passwords for use in another service, if such a need ever arises.

[Image: The Google Password Manager web settings section has a host of important options — some of which are disabled by default. Credit: JR Raphael / IDG]

Speaking of which, if you do at some point decide to use a standalone password manager — and we’ll dive into that subject further next — you’ll want to be sure to disable the “Offer to save passwords” and “Auto sign-in” options here to effectively turn Google Password Manager off and keep yourself from seeing confusingly overlapping prompts every time you try to sign in somewhere.

You’ll also want to revisit the related settings on any Android and/or iOS devices you’re using to be sure the new password manager is set to take the place of Google Password Manager in all the appropriate areas.

Google Password Manager vs. the competition

So why is it more advisable to use a dedicated password manager instead of Google Password Manager? Well, a few reasons:

First, dedicated password managers provide broader and more consistent support for storing and filling in passwords across the full spectrum of apps on both your phone and your computer — something most of us need to do quite regularly, especially in a work context. You don’t want to have to go manually look up a password and then copy and paste it over every time you sign into something outside of your browser, and with Google Password Manager, that’s frequently what you end up having to do.

Beyond that, dedicated password managers work seamlessly in any browser you’re using, on any device, instead of being closely connected only to Chrome.

They also tend to come with stronger and more explicit security assurances, and they often offer additional features such as the ability to share your passwords with team members or even external clients (with or without allowing the person to actually see the password in question). They frequently include other useful elements beyond just basic password storage, too, including the ability to securely store different types of notes and documents.

I maintain a collection of recommendations for the best password manager on Android, and my top choice right now is 1Password — which costs $36 a year for an individual subscription, $60 a year for a family membership that includes up to five people, $239 a year for a Teams Starter Pack that allows up to 10 company users, or $96 per company user per year. And while my recommendation is technically Android-specific, I take into account the experience the service offers across all platforms, since most of us work across multiple device types. 1Password works equally well on the desktop and on iOS.

If you aren’t going to take the time to mess with a dedicated password manager, though, Google’s built-in system is absolutely the next best thing. And now you know exactly how to use it.

This article was originally published in May 2020 and updated in November 2024.