
Apple is the latest company to get pwned by AI

It’s happened yet again — this time to Apple.

Apple recently had to disable AI-generated news summaries in its News app in iOS 18.3. You can guess why: the AI-driven Notification Summaries for the news and entertainment categories in the app occasionally hallucinated, lied, and spread misinformation. 

Sound familiar? 

Users complained about the summaries, but Apple acted only after a complaint from BBC News, which told Apple that several of its notifications were improperly summarized. These were major errors in some cases. 

The generative AI (genAI) tool incorrectly summarized a BBC headline, falsely claiming that Luigi Mangione, who was charged with murdering UnitedHealthcare CEO Brian Thompson, had shot himself. It inaccurately reported that Luke Littler had won the PDC World Darts Championship hours before the final had even begun and falsely claimed that Spanish tennis star Rafael Nadal had come out as gay.

Apple summarized other real stories with false information: The tool claimed Israeli Prime Minister Benjamin Netanyahu had been arrested, that Pete Hegseth had been fired, and that Trump tariffs had triggered inflation (before Donald Trump had re-assumed office), among dozens of other falsehoods.

Apple rolled out the feature not knowing it would embarrass the company and force a retreat — which is amazing when you consider that this happens to every other company that tries to automate genAI information delivery of any kind on a large scale. Microsoft Start’s travel section, for example, published an AI-generated guide for Ottawa that included the Ottawa Food Bank as a “tourist hotspot,” encouraging visitors to come on “an empty stomach.”

In September 2023, Microsoft’s news portal MSN ran an AI-generated obituary for former NBA player Brandon Hunter, who had passed away at the age of 42. The obituary headline called Hunter “useless at 42,” while the body of the text said that Hunter had “performed in 67 video games over two seasons.”

Microsoft’s news aggregator, MSN, attached an inappropriate AI-generated poll to a Guardian article about a woman’s death. The poll asked readers to guess the cause of death, offering options like murder, accident, or suicide.

During its first public demo in February 2023, Google’s Bard AI incorrectly claimed that the James Webb Space Telescope had taken the first pictures of a planet outside our solar system—some 16 years after the first extrasolar planets were photographed.

These are just a few examples out of many. 

The problem: AI isn’t human

The Brandon Hunter example is instructive. The AI knows enough about language to “know” that a person who does something is “useful,” that death means they can no longer do that thing, and that the opposite of “useful” is “useless.” But AI does not have a clue that saying in an obituary that a person’s death makes them “useless” is problematic in the extreme.

Chatbots based on Large Language Models (LLMs) are inherently tone-deaf, ignorant of human context, and can’t tell the difference between fact and fiction, between truth and lies. They are, for lack of a better term, sociopaths — unable to tell the difference between the emotional impact of an obituary and a corporate earnings report. 

There are several reasons for errors. LLMs are trained on massive datasets that contain errors, biases, or inconsistencies. Even if the data is mostly reliable, it may not cover all possible topics a model is expected to generate content about, leading to gaps in knowledge. Beyond that, LLMs generate responses based on statistical patterns using probability to choose words rather than understanding or thinking. (They’ve been described as next-word prediction machines.)
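
Here’s a toy illustration of that “next-word prediction machine” idea. It is a word-frequency sketch, not how production LLMs are built (they use neural networks over tokens), but the core move is the same: choose each next word from a probability distribution, with no notion of truth anywhere in the process.

```python
# Toy "next-word prediction machine": build word-following counts from a
# tiny corpus, then generate text by sampling each next word in proportion
# to how often it followed the previous one. No understanding, no
# fact-checking -- just statistics.
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the dog slept on the mat".split()

# counts[w] maps each word that followed w to how often it did so.
counts = defaultdict(Counter)
for current_word, following_word in zip(corpus, corpus[1:]):
    counts[current_word][following_word] += 1

def generate(start: str, length: int = 8) -> str:
    words = [start]
    for _ in range(length):
        followers = counts[words[-1]]
        if not followers:  # dead end: no observed continuation
            break
        next_word = random.choices(
            list(followers), weights=list(followers.values())
        )[0]
        words.append(next_word)
    return " ".join(words)

print(generate("the"))
```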

The biggest problem, however, is that AI isn’t human, sentient, or capable of thought. 

Another problem: People aren’t AI

Most people don’t pay attention to the fact that we don’t actually communicate with complete information. Here’s a simple example: If I say to my neighbor, “Hey, what’s up?” my neighbor is likely to reply, “Not much. You?”

A logic machine would likely respond to that question by describing the layers of the atmosphere, satellites, and the planets and stars beyond. It would be answering the question factually, as asked, but the literal content of the question does not contain the actual information sought by the asker.

To answer that simple question in the manner expected, a person has to be a human who is part of a culture and understands verbal conventions — or has to be specifically programmed to respond to such conventions with the correct canned response. 

When we communicate, we rely on shared understanding, context, intonation, facial expression, body language, situational awareness, cultural references, past interactions, and many other things. This varies by language. The English language is one of the most literally specific languages in the world, and so a great many other languages will likely have bigger problems with human-machine communication. 

Our human conventions for communication are very unlikely to align with genAI tools for a very long time.  That’s why frequent AI chatbot users often feel like the software sometimes willfully evades their questions. 

The biggest problem: Tech companies can be hubristic

What’s really astonishing to me is that companies keep doing this. And by “this,” I mean rolling out unsupervised automated content-generating systems that deliver one-to-many content on a large scale.

And scale is precisely the difference. 

If a single user prompts ChatGPT and gets a false or ridiculous answer, they are likely to shrug and try again, sometimes chastising the bot for its error, for which the chatbot is programmed to apologize and try again. No harm, no foul. 

But when an LLM spits out a wrong answer for a million people, that’s a problem, especially in Apple’s case, where no doubt many users are just reading the summary instead of the whole story. “Wow, Israeli Prime Minister Benjamin Netanyahu was arrested. Didn’t see that coming,” and now some double-digit percentage of those users are walking around believing misinformation.

Each tech company believes they have better technology than the others. 

Google thought: Sure, that happened to Microsoft, but our tech is better. 

Apple thought: Sure, it happened to Google, but our tech is better.

Tech companies: No, your technology is not better. The current state of LLM technology is what it is — and we have definitely not reached the point where genAI chatbots can reliably handle a job like this. 

What Apple’s error teaches us

There’s a right way and a wrong way to use LLM-based chatbots. The right way is to query with intelligent prompts, ask the question in several ways, and always fact-check the responses before using or believing that information. 

Chatbots are great for brainstorming, providing quick information that isn’t important, or being a mere starting point for research that leads you to legitimate sources. 

But using LLM-based chatbots to write content unsupervised at scale? It’s very clear that this is the road to embarrassment and failure. 

The moral of the story is that genAI is still too unpredictable to reliably represent a company in one-to-many communications of any kind at scale. 

So, make sure this doesn’t happen with any project under your purview. Setting up any public-facing content-producing project meant to communicate information to large numbers of people should be a hard, categorical “no” until further notice.

AI is not human, can’t think, and it will confuse your customers and embarrass your company if you give it a public-facing role.

Copilot AI comes to Microsoft 365 plans: Everything you need to know

If you’re a Microsoft 365 subscriber with a Personal or Family subscription, Microsoft just flipped a switch and activated Copilot AI features for your account. It’s a new part of your subscription, along with that 1TB of OneDrive storage and access to Office apps like Word and Excel. But there are some big catches — including a price hike and some limits.

Microsoft’s latest changes follow in Google’s footsteps, with AI features appearing in the standard subscription for much less than you’d pay for those AI features separately — but with the standard subscription price also going up at the same time.

Let’s dive into how Microsoft just transformed some of the world’s most popular productivity apps, what you can do now — and how you can avoid paying more.


How Copilot works in Microsoft 365 plans

First things first, the basics: Microsoft announced that all Microsoft 365 Personal and Microsoft 365 Family plans now include Copilot features as of Jan. 16 in “most markets worldwide.” This won’t affect you when using a Microsoft 365 plan provided by a workplace — businesses still have to pay separately for AI features — but it will affect your individual plans. And plenty of professionals do pay for their own Microsoft 365 subscriptions. (I should know; I’m one of them!)

In other words, if you pay for Microsoft 365 and use apps like Word, Excel, PowerPoint, OneNote, and Outlook, you’ll now find the Copilot button popping up in these applications. Previously, you had to pay $20 per month for a Copilot Pro subscription to unlock these features.

Here’s the first big catch: The newly expanded paid 365 plans don’t give you unlimited access to Microsoft’s AI features. Instead, you get a monthly allotment of credits that Microsoft says “…should be enough for most subscribers.” In practice, that appears to be 60 credits per month — meaning you can use AI features 60 times per month. After that, you’ll need to pay for a $20-per-month Copilot Pro subscription to keep using those AI features.

[Screenshot: You’ll see an informational pop-up window the first time you open an app like Word. Credit: Chris Hoffman, IDG]

Note: These AI credits are actually shared across various other Microsoft apps — including for AI image generation in Designer, Paint, and Photos and text-editing work in Notepad. They’re not just for Word, Excel, and PowerPoint.

Plus, again, Microsoft is raising its 365 subscription prices, with Copilot bundled into the mix. They’re going up by $3 per month in the US, though the exact price increase will vary by country. For the yearly plans in the US, Microsoft 365 Family goes from $100 to $130 per year, and Microsoft 365 Personal goes from $70 to $100 per year.

This is the first time Microsoft has raised prices since launching the subscription service — originally called Office 365 — back in 2013. While it’s true that Microsoft is using these AI features as a way to hike prices, these subscriptions were overdue for a price increase anyway, and it’s nice to at least get something out of it. (In my opinion, between the 1TB of OneDrive storage and access to Office apps, it’s still a good value.)

It’s worth noting that this Copilot change is only for Microsoft 365 plans. If you buy a more traditional “one-time purchase” version of Office like Office 2024, your setup isn’t changing — and you won’t have access to these newer AI features.

Using Copilot AI in Microsoft 365 apps

With the new adjustments in place, Copilot AI is easy to find in Office apps: You’ll find a Copilot icon on the ribbon; you can also select some text and click the little Copilot icon that appears next to it; or you can just press Alt+i. Then you can prompt Copilot to write or rewrite text in a document for you. You can also ask it questions about the document you’re viewing from the Copilot sidebar.

For more information on exactly how Copilot works in these Office apps, check out my Copilot Pro review from last year. The new built-in Copilot features are exactly the same as what you get with Copilot Pro; the only difference is that you’re limited to 60 uses per month in the 365 setup.

If you run out of credits, Microsoft will encourage you to upgrade to Copilot Pro. In a way, then, these AI features are a bit of a “trial” for Copilot Pro.

To check how many credits you have left, you can click the little menu icon in the Copilot sidebar in an Office app and then click “AI credit balance.” This will take you to your Microsoft 365 account subscription page, where you can see a running balance of the AI credits you’ve used.

[Screenshot: Your AI credit balance is just a few clicks away. Credit: Chris Hoffman, IDG]

Generating images with Microsoft Designer

The same credit system also applies to Microsoft Designer, which is a useful AI image-generation tool. (At our newsletter-focused small business The Intelligence, we use Microsoft Designer to create some feature image illustrations for our articles — we’re writers, not visual artists!)

That means with any paid Personal or Family 365 plan, you can opt to use your 60 monthly AI credits for image generation directly within Designer, too. This is actually quite a downgrade: Previously, everyone got 15 credits per day for AI image generation. Now, subscribers get a total of 60 credits per month, while free accounts only get 15 credits per month.

If you need more than that, you can upgrade to the $20-a-month Copilot Pro plan, which gives you many more AI image generations in Designer and beyond. (Microsoft says you get “at least 10x more credits” for Designer with Copilot Pro, compared to the 15-credits-per-month free setup — so roughly 150 credits per month, then, compared to the 60 monthly credits in the base 365 subscription.)

AI tools are expensive to create and operate, and companies have lost a lot of money on them. It’s no surprise to see many AI tools offering less for free and looking for more payment from their users; that’s what’s happening here.

How to avoid the AI features (and costs) entirely

There are ways to avoid the Microsoft 365 subscription price increases if you don’t anticipate using the AI features and don’t want to pay for them. (The price increase doesn’t take effect until your next subscription renewal, by the way.)

If you already have a Microsoft 365 subscription, you can keep your old subscription price and opt out of the AI features “for a limited time.” Microsoft says you can switch by canceling your subscription and choosing one of the “Classic” plans during the cancelation process. Here are Microsoft’s instructions.

You could also buy “perpetual licenses” of Office instead of using the more prominently offered subscriptions. In other words, with a one-time purchase of Office 2024, you could use Office for a few years for that one-time purchase price. It’s not as good a deal as it sounds — that one-time purchase price will only get you access to Office apps like Word and Excel on a single computer, and you won’t have access to the 1TB of OneDrive storage. (Plus, while your license will be good in perpetuity, Microsoft will stop delivering security updates for Office 2024 in October 2029.)

You can also buy Microsoft 365 subscription keys from other retailers. Without getting into the weeds too far here, it’s worth noting that days after Microsoft implemented the subscription price increase, Amazon is still selling Microsoft 365 Personal subscriptions for $70 and Microsoft 365 Family subscriptions for $100 — the old prices. But these are the standard plans and include those AI features. That’s a bargain.

Of course, you could also turn to other office suites — the web-based Google Docs, the open-source LibreOffice, or the Apple-focused iWork suite — but Word, Excel, and PowerPoint are the business standard for a reason. And even with these AI-adding price increases, getting that 1TB of OneDrive storage at those prices is still a great deal.


For the first time ever, I wish Google would act more like Amazon

Fair warning: This isn’t your average article about what’s happening with all the newfangled AI hullabaloo in this weird and wild world of ours.

Nope — there’ll be no “oohing” and “ahhing” or talk about how systems like Gemini and ChatGPT and their brethren are, like, totally gonna revolutionize the world and change life as we know it.

Instead, I want to look at the state of these generative AI systems through as practical and realistic a lens as possible — focusing purely on how they work right now and what they’re able to accomplish.

And with that in mind, my friend, there’s no way around it: These things seriously suck.

Sorry for the bluntness, but for Goog’s sake, someone’s gotta say it. For all their genuinely impressive technological feats and all the interesting ways they’re able to help with mundane work tasks, Google’s Gemini and other such generative AI systems are doing us all a major disservice in one key area — and everyone seems content to look the other way and pretend it isn’t a problem.

That’s why I was so pleasantly surprised to see that one tech giant seemingly isn’t taking the bait and is instead lagging behind, taking its time to get this right rather than rushing it out half-baked like everyone else.

It’s the antithesis to the strategy we’re seeing play out from Google and virtually every other tech player right now. And my goodness, is it ever a refreshing contrast.


The Google Gemini Bizarro World

I won’t keep you waiting: The company that’s getting it right, at least in terms of its process and philosophy, is none other than Amazon.

I’ll be the first to admit: I’m typically not a huge fan of Amazon or its approach. But within this specific area, it really is creating a model for how tech companies should be thinking about these generative AI systems.

My revelation comes via a locked-down article that went mostly unnoticed at The Financial Times last week. The report’s all about how Amazon is scrambling to upgrade its Alexa virtual assistant with generative AI and relaunch it as a powerful “agent” for offering up complex answers and completing all kinds of online tasks.

More of the same, right? Sure sounds that way — but hang on: There’s a twist.

Allow me to quote a pertinent passage from behind the paywall for ya:

Rohit Prasad, who leads the artificial general intelligence (AGI) team at Amazon, told the Financial Times the voice assistant still needed to surmount several technical hurdles before the rollout.

This includes solving the problem of “hallucinations” or fabricated answers, its response speed or “latency,” and reliability. 

“Hallucinations have to be close to zero,” said Prasad. “It’s still an open problem in the industry, but we are working extremely hard on it.” 

(Insert exaggerated record-scratch sound effect here.)

Wait — what? Did we read that right?!

Let’s look to another passage to confirm:

One former senior member of the Alexa team said while LLMs were very sophisticated, they came with risks, such as producing answers that were “completely invented some of the time.”

“At the scale that Amazon operates, that could happen large numbers of times per day,” they said, damaging its brand and reputation.

Well, tickle me tootsies and call me Tito. Someone actually gives a damn.

If the contrast here still isn’t apparent, let me spell it out: These large-language-model systems — the type of technology under the hood of Gemini, ChatGPT, and pretty much every other generative AI service we’ve seen show up over the past year or two — they don’t really know anything, in any human-like sense. They work purely by analyzing massive amounts of data, observing patterns within that data, and then using sophisticated statistics to predict what word is likely to come next in any scenario — relying on all the info they’ve ingested as a guide.

Or, put into layman’s terms: They have no idea what they’re saying or if it’s right. They’re just coughing up characters based on patterns and probability.

And that gets us to the core problem with these systems and why, as I put it so elegantly a moment ago, they suck.

As I mused whilst explaining why Gemini is, in many ways, the new Google+ recently:

The reality … is that large-language models like Gemini and ChatGPT are wildly impressive at a very small set of specific, limited tasks. They work wonders when it comes to unambiguous data processing, text summarizing, and other low-level, closely defined and clearly objective chores. That’s great! They’re an incredible new asset for those sorts of purposes.

But everyone in the tech industry seems to be clamoring to brush aside an extremely real asterisk to that — and that’s the fact that Gemini, ChatGPT, and other such systems simply don’t belong everywhere. They aren’t at all reliable as “creative” tools or tools intended to parse information and provide specific, factual answers. And we, as actual human users of the services associated with this stuff, don’t need this type of technology everywhere — and might even be actively harmed by having it forced into so many places where it doesn’t genuinely belong.

That, m’dear, is a pretty pressing problem.

Allow me to borrow a quote collected by my Computerworld colleague Lucas Mearian in a thoroughly reported analysis of how, exactly, these large-language models work:

“Hallucinations happen because LLMs, in their most vanilla form, don’t have an internal state representation of the world,” said Jonathan Siddharth, CEO of Turing, a Palo Alto, California, company that uses AI to find, hire, and onboard software engineers remotely. “There’s no concept of fact. They’re predicting the next word based on what they’ve seen so far — it’s a statistical estimate.”

And there we have it.

That’s why Gemini, ChatGPT, and other such systems so frequently serve up inaccurate info and present it as fact — something that’s endlessly amusing to see examples of, sure, but that’s also an extremely serious issue. What’s more, it’s only growing more and more prominent as these systems show up everywhere and increasingly overshadow traditional search methods within Google and beyond.

And that brings us back to Amazon’s seemingly accidental accomplishment.

Amazon and Google: A tale of two AI journeys

What’s especially interesting about the slow-moving state of Amazon’s Alexa AI rollout is how it’s being presented as a negative by most market-watchers.

Back to that same Financial Times article I quoted a moment ago, the conclusion is unambiguous:

In June, Mihail Eric, a former machine learning scientist at Alexa and founding member of its “conversational modelling team,” said publicly that Amazon had “dropped the ball” on becoming “the unequivocal market leader in conversational AI” with Alexa.

But, ironically, that’s exactly where I see Amazon doing something admirable and creating that striking contrast between its efforts and those of Google and others in the industry.

The reality is that all these systems share those same foundational flaws. Remember: By the very nature of the technology, generative-AI-provided answers are woefully inconsistent and unreliable.

And yet, Google’s been going into overdrive to get Gemini into every possible place and get us all in the habit of relying on it for almost every imaginable purpose — including those where it simply isn’t reliable. (Remember my analogy from a minute ago? Yuuuuuup.)

In doing so, it’s chasing short-term market gains at the cost of long-term trust. All other variables aside, being wrong or misleading with basic information 20% of the time — or, heck, even just 10% of the time — is a pretty substantial problem. I’ve said it before, and I’ll say it again: If something is inaccurate or unreliable 10% of the time, it’s useful precisely 0% of the time.

And to be clear, the stakes here couldn’t be higher. In terms of their answer-offering and info-providing capabilities, Gemini and other such systems are being framed and certainly perceived as magical answer machines. Most people aren’t treating ’em with a hefty degree of skepticism and taking the time to ask all the right questions, verify answers, and so on. They’re asking questions, seeing or hearing answers, and then assuming they’re right.

And by golly, are they getting an awful lot of confidently stated inaccuracies as a result — something that, as we established a moment ago, is likely inevitable with this type of technology in its current state.

On some level, Google is clearly aware of this. The company had been developing the technology behind Gemini for years before rushing it out into the world following the success and attention around ChatGPT’s initial rollout — but, as has been said in numerous venues over time, it hadn’t felt the technology was mature enough to be ready for public use.

So what changed? Not the nature of the technology — nope; by all counts, it was just the competitive pressure that forced Google to say “screw it, it’s good enough” and go all-in with systems that weren’t and still aren’t ready for primetime, at least with all of their promoted purposes.

And that, my fellow accuracy-obsessed armadillo, is where Amazon is getting it right. Rather than just rushing to swap Alexa out for some new half-baked replacement, the company is actually waiting until it feels like it’s got the new system ready — with reliability, yes, but also with branding and a consistent-seeming user experience. (Anyone who’s been trying to navigate the comically complex web of Gemini and Assistant on Android — and beyond — can surely relate!)

Whether Amazon will keep up this pattern or eventually relent and go the “good enough” route remains to be seen. Sooner or later, investor pressure may force it to follow Google’s path and put its next-gen answer agent out there, even if it in all likelihood still isn’t ready by any reasonable standard.

For now, though, man: I can’t help but applaud the fact that the company’s taking its time instead of prematurely fumbling to the finish line like everyone else. And I can’t help but wish Google would have taken that same path, too, rather than doing its usual Google Thang™ and forcing some undercooked new concept into every last nook and cranny — no matter the consequences.

Maybe, hopefully, this’ll all settle out in some sensible way and turn into a positive in the future. For the moment, though, Google’s strategy sure seems like more of a minus than a plus for us, as users of its most important products — and especially in this arena, it sure seems like getting it right should mean more than getting it out into the world quickly, flaws and all and at any cost.


Perplexity launches Sonar API, enabling enterprise AI search integration

Perplexity has introduced an API service named Sonar that allows developers and enterprises to embed the company’s generative AI search technology into their applications.

The company has rolled out two initial tiers – a more affordable and faster option called Sonar, and a higher-priced tier, Sonar Pro, tailored for handling more complex queries.

In a blog post, Perplexity described Sonar API as “lightweight, affordable, fast, and simple to use,” noting that it includes features such as citations and the ability to customize sources. The company said the API is ideal for businesses requiring streamlined question-and-answer functionalities optimized for speed.

For enterprises with more complex requirements, Perplexity will offer the Sonar Pro API, which supports multi-step queries, a larger context window for handling longer and more detailed searches, and higher extensibility.

It will also provide approximately twice the number of citations per search compared to the standard Sonar API, the company said.
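
To make the developer angle concrete, here’s a minimal sketch of a Sonar call from Python. It assumes the OpenAI-compatible chat-completions endpoint Perplexity documents, an API key in a PERPLEXITY_API_KEY environment variable, and the model names and citations field the company published at launch; treat it as illustrative rather than definitive.

```python
# Minimal Sonar API sketch (assumptions: OpenAI-compatible chat-completions
# endpoint, "sonar"/"sonar-pro" model names, and a top-level "citations"
# field, per Perplexity's docs at launch).
import os

import requests

API_URL = "https://api.perplexity.ai/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"}

payload = {
    "model": "sonar",  # "sonar-pro" targets more complex, multi-step queries
    "messages": [
        {"role": "user", "content": "What changed in EU AI regulation this week?"}
    ],
}

resp = requests.post(API_URL, headers=headers, json=payload, timeout=30)
resp.raise_for_status()
data = resp.json()

# The generated answer, plus the source citations Perplexity highlights.
print(data["choices"][0]["message"]["content"])
for url in data.get("citations", []):
    print("source:", url)
```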

Competing with bigger players

The launch positions Perplexity as a stronger, more direct competitor to larger players such as OpenAI and Google, offering its real-time, web-connected search capabilities to users.

“Perplexity’s real-time data retrieval and citation-backed responses cater to enterprise demands for reliable, transparent, and actionable information,” said Prabhu Ram, VP of the industry research group at Cybermedia Research. “By prioritizing verifiability and up-to-date insights, it offers a specialized approach that sets it apart from broader conversational models like GPT-4 and Claude, which serve a wider range of use cases but may lack the tailored features enterprises need for effective decision-making and compliance management.”

However, significant challenges remain in areas such as data privacy, depth, scale, and audit trails before any company can establish itself as a leader in the field.

Although real-time information capabilities offer clear advantages, they also introduce additional complexities to an already intricate landscape.

“Enterprises tend to take cautious steps when exploring new technologies like enterprise research, especially when handling sensitive company data,” said Sanchit Vir Gogia, chief analyst and CEO at Greyhound Research. “Beyond concerns around data privacy, security, and compliance, any large company would first evaluate accuracy and performance. Whether Perplexity can outpace Google and OpenAI in this space will depend on the aggressiveness of Google’s and other competitors’ strategies.”

Real-time information and search is an area where Google excels. To stay competitive, Perplexity will need to explore other differentiators, such as compliance, Gogia said.

Use cases for enterprises

Analysts note that Perplexity’s latest offerings have diverse applications, ranging from basic tasks such as identifying service management issues in IT functions to broader use by leaders in supply chain, finance, sales, marketing, and operations.

“Sonar Pro, [especially] caters to the needs of compliance-driven industries such as healthcare, BFSI, and energy by offering robust tools for security reporting, portfolio management, and seamless DevOps integration,” Ram said. “These capabilities not only enhance operational efficiency but also ensure adherence to regulatory standards, positioning it as an indispensable solution for improving software development practices.”

But for the tool to succeed, significant customization may be necessary, tailored to specific industries and individual companies.

Initially, and in the near term, use cases are expected to remain highly customized before gradually evolving into industry-standard templates, all while adhering to local compliance requirements, according to Gogia. “Early adopters may come from industries like banking and manufacturing with a mature data posture, which is an essential foundation for any such deployment,” Gogia said. “Architectural maturity is another key requirement for success, as it enables the embedding and native use of such a tool within the architecture.”

Today’s AI models have a poor grasp of world history

Today’s AI models do a poor job of providing accurate information about world history, according to a new report from the Austrian research institute Complexity Science Hub (CSH).

In an experiment, OpenAI’s GPT-4, Meta’s Llama, and Google’s Gemini were asked to answer yes or no to historical questions — and only 46% of the answers were correct. GPT-4, for example, answered “yes” to the question of whether Ancient Egypt had a standing army, likely because the AI model chose to extrapolate data from other empires such as Persia.

“If you are told A and B 100 times and C one time, and then asked a question about C, you might just remember A and B and try to extrapolate from that,” researcher Maria del Rio-Chanona told TechCrunch.

According to the researchers, AI models have more difficulty providing accurate information about some regions than others, including sub-Saharan Africa.
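
For a sense of how such an experiment is scored, here’s a rough sketch of a yes/no benchmark harness. The sample questions and the stand-in “model” are hypothetical placeholders, not the CSH team’s actual data or prompts.

```python
# Rough sketch of scoring a yes/no history benchmark like the one described
# above. The items and the stand-in "model" below are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Item:
    question: str
    answer: bool  # ground truth: True = "yes", False = "no"

# Hypothetical sample items in the spirit of the study.
ITEMS = [
    Item("Did Ancient Egypt have a standing army?", False),
    Item("Did the Roman Empire mint its own coins?", True),
]

def always_yes(question: str) -> bool:
    """Stand-in 'model' that answers yes to everything; swap in a real
    LLM call that maps the model's reply to True/False."""
    return True

def accuracy(items: list[Item], model: Callable[[str], bool]) -> float:
    correct = sum(model(i.question) == i.answer for i in items)
    return correct / len(items)

print(f"accuracy: {accuracy(ITEMS, always_yes):.0%}")  # 50% on this toy set
```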

Microsoft starts testing an AI-based search engine in Windows 11

Windows Insiders can now download a new beta version of Windows 11 that supports Microsoft’s new search engine based on artificial intelligence (AI); that means, among other things, it is possible to use natural language when performing a search.

For now, search functions are limited to finding images and documents in jpg, png, pdf, txt and xls formats.

To use the new AI search, a Copilot+ computer is required. And thanks to the special NPU chip in these PCs, users can try out the new search feature without being connected to the internet, The Verge reports.

As of now, the search tool works with English, French, Spanish, German, Japanese, and Chinese.

Apple will add AI Mail tools to Macs and iPads this spring

Apple is about to improve its email application, Mail, on iPads and Macs. Both platforms will soon gain the same Apple Intelligence email summary and prioritization systems already available on iPhones — though not every iPhone user seems to have embraced the new systems.

What’s coming?

If you use an iPhone, you’ll already have experienced the system, which introduces a new user interface in Mail and uses artificial intelligence (AI) in an attempt to prioritize incoming email communications with smart categorization. The latter means your device can look at incoming correspondence and assign it to categories, which currently include the following groupings:

  • All Mail:  All your Mail in order of receipt.
  • Primary: All the messages Apple Intelligence thinks might be important to you within this category.
  • Transactions: Invoices, shopping receipts, and key messages from services and organizations, along with banking messages.
  • Updates: All the stuff you subscribe to. (I see this as a quick list of things to unsubscribe from.)
  • Promotions: Most marketing emails, which Apple describes as capturing special offers, deals and more.

Don’t worry about time-sensitive messages. Apple says these should automatically appear in your Primary category, even if the message belongs to a different category. That means you shouldn’t miss an important delivery when it arrives. Mail also uses AI to prioritize messages it believes might be more important, placing these at the top of the Primary mailbox.

How to manage categories

You can sort of teach Mail how to make better decisions when categorizing incoming emails. When you find an email gathered within an inappropriate category, you can intervene.

On an iPhone, you do this by opening the message and tapping the three dots at the top right of the message page. A menu will appear, including an option to Categorize Sender. Tap this and you can assign emails from that sender to a more relevant category or leave automatic categorization on if you choose.

You can’t yet create your own custom groupings, such as for specific projects or workgroups, though you can use Smart Mailboxes to achieve this to some extent. Most observers believe that the capacity to build your own categories will be introduced eventually.

Had you noticed?

Apple has equipped Mail with a tool that lets you dig more deeply into the categories active on your device. Open Mail and tap the three dots icon at top right. Here you can switch between Categories and the traditional List view and also tap the About Categories item. When you do, you’ll be taken to an About Categories page in which you’ll find interesting information, such as:

  • How many of your messages are seen as being ones that matter the most. 
  • How many receipts, orders, and delivery-related emails you received in the last seven days.
  • The frequency and magnitude of promotional email received.

What else is coming?

The new tools will likely appear as Apple introduces new contextual awareness to its devices, which means Siri will be able to answer questions that relate to what’s on your screen or in your apps. That means you’ll be able to engage in tasks such as adding address data to contact cards, or checking your next dinner date with your gran while looking at a message from her.

What people think

Apple drew criticism when it first introduced these features, with some users not adapting to them at all. They complained about the icons used, and didn’t seem to understand how to manage the new features or the layout within Mail. But control remained in their hands; disabling Categories is simple.

On an iPhone, you open Mail, tap the three dots at the top right of the app, and then tap List View. This returns Mail to the time-based list view of incoming messages you are used to.

Which devices can use the feature?

Unlike other Apple Intelligence features, these Mail tools are available to most Apple devices capable of running the latest version of Apple’s operating systems, which basically means you don’t need to be using a recent device such as an M-series Mac if you want to use these tools.

When will these tools reach iPads and Macs?

We anticipate the new Mail features will arrive on iPads and Macs starting with the next major update, which is expected to arrive in April after extensive beta testing. That means you can expect the new Mail tools to arrive with iPadOS 18.4 and macOS 15.4, which should appear in spring. It is not clear whether these tools will form part of a wider introduction of more contextual intelligence across Apple’s devices.


AI can predict career success from a facial image, study finds

A new study by researchers from four universities claims artificial intelligence (AI) models can predict career and educational success from a single image of a person’s face.

The researchers, from Ivy League schools and other institutions, used photos from LinkedIn and photo directories of several top US MBA programs to determine the so-called Big Five personality traits for 96,000 graduates. They then compared those personality traits to the employment outcomes and education histories of the graduates to determine the correlation between personality and success.

The findings highlight the significant impact AI could have as it shapes hiring practices. Employers and job seekers are increasingly turning to generative AI (genAI) to automate their search tasks, whether it’s creating a shortlist of candidates for a position or writing a cover letter and resume. And data shows applicants can use AI to improve their chances of getting a particular job, or to help a company find the perfect talent match.

“I think personality affects career outcomes, and to the extent we can infer personality, we can predict their career outcomes,” said Kelly Shue, a study co-author and a Yale School of Management (SOM) finance professor.

Shue also noted there are many “disturbing moral implications” related to organizations using AI models to determine personalities. “I do worry this could be used in a way — to put lightly — it could make a lot of people unhappy,” she said. “Imagine using it in a hiring setting or as part of university admissions. A firm is trying to hire the best possible workers, and now in addition to screening on standard stuff, such as where you went to school and what degrees you have and your work experience, they’re going to screen you on your personality.

“I think our study may prompt [the technology’s] use, although we’re careful in the way we wrote it up in that we’re not advocating for adoption,” Shue said.

Organizations have been screening job applicants based on personality for years using behavioral assessments such as Pymetrics games, which measure up to 91 personality traits that fit into 9 different categories.

In fact, Shue said, “a ton” of companies already heavily use these more obvious estimates of someone’s personality, “they just haven’t been doing that from pictures of a person’s face,” she said. “I’ve known students who don’t get a callback after the behavioral assessment. So, presumably they were screened out based just on personality.”

Derived from a psychology framework, the Big Five personality traits (also known as the OCEAN model) comprise: Openness (curiosity, aesthetic sensitivity, imagination); Conscientiousness (organization, productiveness, responsibility); Extraversion (sociability, assertiveness, energy level); Agreeableness (compassion, respectfulness, trust); and Neuroticism (anxiety, depression, emotional volatility).

[Image: AI photo analysis. Credit: Yale School of Management]

Depending on which personality trait surfaces in the AI’s assessment, a school or company might pass an applicant by. For example, someone whose photo shows a tendency toward neuroticism is less likely to be hired.

“Neurotic is a very important personality trait,” Shue said. “In much of our analysis, it seems to have substantial predictive power for labor market outcomes, often going in a negative direction.”

Or, for example, someone who is less conscientious might be passed over by college admissions. “I think it’s possible personality matters for admissions,” Shue said. “Maybe schools want people who are going to be successful in their future careers, maybe they want diversity in personality, but certainly personality does matter for a lot of outcomes.

“To the extent that a school wants to admit a class that’s more likely to have [successful outcomes] they’d want to screen on personality.”

Using a combination of computer vision and AI natural language processing (NLP) technologies, the researchers from Yale, the University of Pennsylvania, Reichman University, and Indiana University were able to determine how the personality traits played into career and educational outcomes.

While someone changing their expression in a photo could play into how the AI perceives personality, Shue said the researchers saw “stability” in results using different photographs of the same individual. “We can also use separate algorithms to determine whether a person is smiling or not and if they’re holding that smile fixed,” she said.

There has already been pushback on the use of AI in culling job candidates, as the technology has proven to be flawed based on its data sources. “As AI continues to influence hiring practices, this research invites further exploration into its ethical, practical, and strategic considerations,” the study states.

Shue said the research highlights how cognitive skills and personality traits are key to labor market success, and that if a photo can uncover personality, it could be equally important to other factors on a resume.

“The reason we think it matters is when companies are first looking to hire, the key thing they’re looking at is education, or GPA and standardized test scores sometimes,” she said. “So, then, what we’re saying is our personality measures are in the same ballpark as those other measures or variables for how much they predict career success.”

The study also highlights that individual pay varies widely, and factors like race or education explain only a small portion of this variation. For example, while education matters for income, it doesn’t account for much of the variation in pay, which is also shaped by factors such as experience and proficiency.

“Another way of looking at it,” Shue said, “is among people with say 12 years of education, there’s still huge variations in income within that group.”

The study also drew from previous research conducted on how personality traits could be revealed through an analysis of someone’s face. For example, a 2020 paper published in the scientific journal Nature noted a growing number of researchers had shown a link between facial images and the Big Five personality traits.

Other follow-on studies revealed how facial recognition technology could pick up on a person’s political affiliation from a facial image. That study used more than one million images, predicting each person’s political orientation by comparing their face’s similarity to the faces of known liberals and conservatives.

“Political orientation was correctly classified in 72% of liberal–conservative face pairs, remarkably better than chance (50%), human accuracy (55%), or one afforded by a 100-item personality questionnaire (66%),” the study published in Nature revealed.

The newest study focused on four main objectives:

  • Human Capital: Cognitive skills and personality traits are crucial for labor market success, but scaling personality measurement is challenging.
  • Methodology: Researchers developed “Photo Big 5,” extracting personality traits from facial images of 96,000 MBA graduates, with strong predictive value for career outcomes.
  • Predictive Power: The Photo Big 5 predicts school rank, compensation, seniority, industry choice, job transitions, and career growth, with modest links to GPA.
  • Ethics: The method improves accessibility and resists manipulation, but raises concerns about discrimination and autonomy.

A subsection of the study also cites literature about how a person’s facial image can uncover someone’s genetic makeup or even how the prenatal environment can contribute to personality.

Genetics, Shue said, can explain 30% to 60% of the variation in personality across individuals. There’s also research showing early childhood hormone exposure affects personality and how people look.

“So, then, I don’t think it’s a stretch to say there’s a strong genetic as well as environmental component to how we look, and there’s a strong genetic-environmental component to our personality,” she said.

Delays in TSMC’s Arizona plant spark supply chain worries

Taiwan Semiconductor Manufacturing Company (TSMC) has said it is unlikely to equip its new US plant in Arizona with its most advanced chip technology ahead of its Taiwan factories, raising concerns about supply-chain hurdles for tech companies.

Speaking at a university event in Taiwan, TSMC CEO and Chairman C.C. Wei attributed the delays at TSMC’s Arizona factory to a combination of complex compliance requirements, local construction regulations, and extensive permitting processes, according to a Reuters report.

2025’s first Patch Tuesday: 159 patches, including several zero-day fixes

Microsoft began 2025 with a hefty patch release this month, addressing eight zero-days with 159 patches for Windows, Microsoft Office and Visual Studio. Both Windows and Microsoft Office have “Patch Now” recommendations (with no browser or Exchange patches) for January.

Microsoft also released a significant servicing stack update (SSU) that changes how desktop and server platforms are updated, requiring additional testing on how MSI Installer, MSIX and AppX packages are installed, updated, and uninstalled. 

To navigate these changes, the Readiness team has provided this useful infographic detailing the risks of deploying the updates.

Known issues 

Readiness worked with both Citrix and Microsoft to detail the more serious update issues affecting enterprise desktops, including:

  • Windows 10/11: Following the installation of the October 2024 security update, some customers report that the OpenSSH (Open Secure Shell) service fails to start, preventing SSH connections. The service fails without detailed logging; manual intervention is required to run the sshd.exe process. Microsoft is investigating the issue with no (as of now) published schedule for either mitigations or a resolution.

Citrix reported significant issues with its Session Recording Agent (SRA), causing the January update to fail to complete successfully. Microsoft published a security bulletin (KB5050009) that says: “Affected devices might initially download and apply the January 2025 Windows security update correctly, such as via the Windows Update page in Settings.” Once this situation occurs, however, the update process stops and rolls back to the original state.

In short, if you have the Citrix SRA installed, your device was (likely) not updated this month.

Major revisions

For this Patch Tuesday, we have the following revisions to previously released updates:

Microsoft also released CVE-2025-21224 to address two memory-related security vulnerabilities in the legacy line printer daemon (LPD), a Windows feature that has been deprecated for 15 years. I can’t see things improving for these print-related functions (given the problems we’ve seen for the past decade). Maybe now is the time to start removing these legacy features from your platform.
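
If you do want to start that cleanup, here’s a cautious Python sketch that audits the legacy LPD/LPR optional features with DISM and disables any that are enabled. The feature names shown are assumptions that can vary by Windows build, so verify them with “dism /online /get-features” before running anything for real.

```python
# Audit the legacy LPD/LPR optional print features via DISM and disable
# any that are enabled. Run from an elevated shell on Windows. The feature
# names below are assumptions -- confirm them on your builds first with
# "dism /online /get-features".
import subprocess

LEGACY_FEATURES = [
    "Printing-Foundation-LPDPrintService",  # assumed feature name
    "Printing-Foundation-LPRPortMonitor",   # assumed feature name
]

def feature_state(name: str) -> str:
    """Return DISM's reported state ("Enabled"/"Disabled") for a feature."""
    result = subprocess.run(
        ["dism", "/online", "/get-featureinfo", f"/featurename:{name}"],
        capture_output=True, text=True,
    )
    for line in result.stdout.splitlines():
        if line.strip().lower().startswith("state"):
            return line.split(":", 1)[1].strip()
    return "Unknown"

for feature in LEGACY_FEATURES:
    state = feature_state(feature)
    print(f"{feature}: {state}")
    if state == "Enabled":
        # Disable only after confirming nothing still prints via LPD/LPR.
        subprocess.run(
            ["dism", "/online", "/disable-feature",
             f"/featurename:{feature}", "/norestart"],
            check=True,
        )
```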

Windows lifecycle and enforcement updates

The following Microsoft products will be retired this year:

Of course, we don’t need to mention the elephant in the room. Microsoft will end support for Windows 10 in October.

Each month, we analyze Microsoft’s updates across key product families — Windows, Office, and developer tools — to help you prioritize patching efforts. This prescriptive, actionable guidance is based on assessing a large application portfolio and a detailed analysis of the Microsoft patches and their potential impact on the Windows platforms and apps.

For this release cycle from Microsoft, we have grouped the critical updates and required testing efforts into different functional areas including:

Remote desktop

January has a heavy focus on Remote Desktop Gateway (RD Gateway) and network protocols, with the following testing guidance:

  • RD Gateway Connections: Ensure RD Gateway (RDG) continues to facilitate both UDP and TCP traffic seamlessly without performance degradation. Try disconnecting RDG from an existing/established connection.
  • VPN, Wi-Fi, and Bluetooth Scenarios: Test end-to-end configurations and nearby sharing functionality.
  • DNS Management for Operators: Verify that users in the “Network Configuration Operators” group can manage DNS client settings effortlessly.

Local Windows file system and storage

File system and storage components also get minor updates. Desktop and server file system testing efforts should focus on:

  • Offline Files and Mapped Drives: Test mapped network drives under both online and offline conditions. Pay close attention to Sync Center status updates.
  • BitLocker: Validate drive locking and unlocking, BitLocker-native boot scenarios, and post-hibernation states with BitLocker enabled.

Virtualization and Microsoft Hyper-V

Hyper-V and virtual machines receive lightweight updates:

  • Traffic Testing: Install the Hyper-V feature and restart systems. Monitor network performance and ensure no regressions in virtual network traffic or virtual machine management.

Security and authentication

Key areas for security-related testing include:

  • Digest Authentication Stress Testing: Simulate heavy loads while using Digest authentication to uncover potential issues.
  • SPNEGO Negotiations: Verify Simple and Protected GSS-API Negotiation Mechanism (SPNEGO) functionality in cross-domain or multi-forest Active Directory setups.
  • Authentication Scenarios: Test applications relying on LSASS processes and ensure that protocols like Kerberos, NTLM, and certificate-based authentication remain stable under load.

Other critical updates

There are some additional testing priorities for this release:

  • App Deployment Scenarios: Install and update MSIX/Appx packages with and without packaged services, confirming admin-only requirements for updates.
  • WebSocket Connections: Establish and monitor secure WebSocket connections, ensuring proper encryption and handshake results (see the sketch after this list).
  • Graphics and Themes: Test GDI+-based apps and workflows involving theme files to ensure UI elements render correctly across different view modes. Some suggestions include foreign language applications that rely on Input Method Editors (IMEs).
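
For the WebSocket item in the list above, here’s a minimal smoke test sketched in Python. It is illustrative rather than any official test plan: it relies on the third-party websockets package (pip install websockets), and the endpoint URL is a placeholder to swap for a server you control.

```python
# Minimal WebSocket smoke test: confirm the TLS handshake and WebSocket
# upgrade complete, then verify a simple round-trip. Assumes the
# third-party "websockets" package; the URL below is a placeholder.
import asyncio
import ssl

import websockets  # pip install websockets

TEST_URL = "wss://example.com/ws"  # replace with an echo endpoint you control

async def smoke_test() -> None:
    ssl_ctx = ssl.create_default_context()  # verifies the server certificate
    async with websockets.connect(TEST_URL, ssl=ssl_ctx) as ws:
        await ws.send("patch-validation ping")
        reply = await asyncio.wait_for(ws.recv(), timeout=10)
        print("Handshake OK; server replied:", reply)

if __name__ == "__main__":
    asyncio.run(smoke_test())
```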

January’s updates maintain a medium-risk profile for most systems, but testing remains essential — especially for networking, authentication, and file system scenarios. We recommend prioritizing remote network traffic validation, with light testing for storage and virtualization environments. If you have a large MSIX/Appx package portfolio, there’s a lot of work to do to ensure that your package installs, updates and uninstalls successfully.

Each month, we break down the update cycle into product families (as defined by Microsoft) with the following basic groupings: 

  • Browsers (Microsoft IE and Edge) 
  • Microsoft Windows (both desktop and server) 
  • Microsoft Office
  • Microsoft Exchange and SQL Server 
  • Microsoft Developer Tools (Visual Studio and .NET)
  • Adobe (if you get this far) 

Browsers

There were no Microsoft browser updates for Patch Tuesday this month. Expect Chromium updates that will affect Microsoft Edge in the coming week. (You can find the enterprise release schedule for Chromium here.)

Microsoft Windows

This is a pretty large update for the Windows ecosystem, with 124 patches for both desktops and servers, covering over 50 product/feature groups. We’ve highlighted some of the major areas of interest:

  • Fax/Telephony
  • MSI/AppX/Installer and the Windows update mechanisms
  • Windows COM/DCOM/OLE
  • Networking, Remote Desktop
  • Kerberos, Digital Certificates, BitLocker, Windows Boot Manager
  • Windows graphics (GDI) and Kernel drivers

Unfortunately, Windows security vulnerabilities CVE-2025-21275 and CVE-2025-21308 both affect core application functionality and have been publicly disclosed. Add these Windows updates to your “Patch Now” release schedule.

Microsoft Office

Microsoft Office gets three critical updates, and a further 17 patches rated important. Unusually, three Microsoft Office updates affecting Microsoft Access fall into the zero-day category with CVE-2025-21366, CVE-2025-21395 and CVE-2025-21186 publicly disclosed. Add these Microsoft updates to your “Patch Now” calendar.

Microsoft Exchange and SQL Server

There were no updates from Microsoft for SQL Server or Microsoft Exchange servers this month. 

Microsoft Developer Tools (Visual Studio and .NET)

Microsoft has released seven updates rated as important affecting Microsoft .NET and Visual Studio. Given the urgent attention required for Office and Windows this month, you can add these low-profile patches to your standard developer release schedule.

Adobe and third-party updates

No Adobe-related patches were released by Microsoft this month. However, two third-party, development-related updates were published; they affect GitHub (CVE-2024-50338) and a CERT/CC-coordinated fix (CVE-2024-7344). Both updates can be added to the standard developer release schedule.