After the devastating fire of 2019, Notre Dame Cathedral was rebuilt and restored with the aid of digital twins.
Digital twins have become indispensable tools across industries. Powered by AI, these virtual constructs mirror physical systems in complex manufacturing facilities, supply chains, and operational workflows. By continuously monitoring their physical counterparts and feeding back recommendations, digital twins allow us to predict maintenance needs, optimize production schedules, and prevent disruptions before they occur.
Yet, for all their power, digital twins have largely remained confined to a particular domain: structured, operational decisions with desired outcomes. Could companies also use these digital constructs to tackle unstructured strategic challenges, such as market-entry decisions and long-term planning?
During a recent INSEAD Tech Talk X, I explored this question with Hamza Mudassir, a strategy lecturer at Cambridge University and co-founder and CEO of Strategize Labs, a startup that specializes in digital twins and AI.
Mudassir proposed a solution that resembles science fiction at first glance but is eminently feasible today: Run a debate among several AI agents, each with its own strengths, and, once they converge on a set of recommendations, test their recommendations on the digital twin of your business.
The hive mind
Businesses are already turning to AI to support decision-making. However, relying on a single model often misses the bigger picture. Mudassir observed that breakthrough strategies at companies like Apple and Microsoft didn’t emerge from standard consulting frameworks. Instead, they came from a different process: Having a small team of highly tuned people with different viewpoints—but a shared goal of doing what’s best for the organization—debate each other.
This was what Mudassir and his team did, except instead of humans, they employed multiple large language models (LLMs). They dubbed this process the “hive mind,” after the Borg, the Star Trek cyborgs that appear to work independently but are united in a common objective.
Engineering disagreement
Mudassir’s team took four different state-of-the-art LLMs, gave each a distinct personality, then tasked them with solving unstructured problems on two topics—a strategy question and a human resources question. Finally, the team told them to debate.
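The debate setup described above can be sketched in a few lines of Python. The agent names, personas, and `run_debate` helper below are illustrative stand-ins, not Strategize Labs’ actual code; the `respond` stub is where a real system would call an LLM API with the agent’s persona and the transcript so far:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    """One debater: a persona wrapped around a (stubbed) LLM call."""
    name: str
    persona: str

    def respond(self, question: str, transcript: list) -> str:
        # Stub standing in for a real LLM API call; a production system
        # would prompt the model with its persona, the question, and the
        # transcript so far, so each agent can challenge the others.
        return f"[{self.name}/{self.persona}] turn {len(transcript)}: view on {question!r}"

def run_debate(agents: list, question: str, rounds: int = 3) -> list:
    """Let each agent speak once per round, seeing all prior turns."""
    transcript = []
    for _ in range(rounds):
        for agent in agents:
            transcript.append(agent.respond(question, transcript))
    return transcript
```

Each agent sees the full transcript before responding, which is what lets the models challenge one another rather than answer in isolation.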
“An LLM arguing with another LLM to get to a good answer is effectively taking on the role of the user,” says Mudassir. “As a user, when you’re talking to ChatGPT, you’re giving it feedback. It’s an iterative system... and most of an LLM’s capabilities are based on its ability to be challenged.”
In another nod to science fiction, Mudassir’s team added what they call “the inception layer.” Their setup tricks each LLM into believing it’s interacting with humans, not other machines. Moreover, rather than handcrafting personalities, a master AI analyzes each problem and dynamically generates synthetic personas optimized for that specific challenge.
The system also incorporates “temperature” settings to control creativity. Low settings produce consistent and repeatable, albeit not very novel, results. High settings generate more innovative outputs as patterns collide in unexpected ways. The system manages both semantic and technical dimensions simultaneously.
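Temperature’s effect can be seen in the standard sampling formula LLMs use: logits are divided by the temperature before being normalized into probabilities. A minimal, generic sketch (not the team’s implementation) makes the low-versus-high behavior concrete:

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Sample a token index from logits softened by temperature.

    Low temperature sharpens the distribution toward the top choice
    (consistent, repeatable picks); high temperature flattens it,
    letting unlikely patterns collide (more surprising picks).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    cum = 0.0
    for i, e in enumerate(exps):
        cum += e / total
        if r < cum:
            return i
    return len(exps) - 1
```

At a temperature near zero the same top-scoring option is chosen almost every time; at a high temperature the three options here are picked nearly uniformly.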
Strategy vs. culture: Two experiments
To test their approach, Mudassir’s team pitted the AI hive mind against humans. First, the team came up with two thorny questions: a strategy question about turning around a brewery, and a human resources question involving harassment in an organization.
They recruited two groups of participants and assigned each group to one question. The group that got the complex, if relatively straightforward, strategy case was made up of MBA students with five to six years of work experience. The second group, comprising chief human resources officers (CHROs) and directors with at least 10 years of experience across different geographies, was tasked with solving the HR problem. Then, Mudassir’s team created two AI hive minds, one for each question.
For the strategy case, the AI hive mind trumped the humans hands-down. In 10 minutes, the hive mind produced four times the output that humans generated in 45 minutes—if for no other reason than “there’s only so much that three or four people can say physically in 45 minutes.”
But word count wasn’t the only dimension in which the AI hive mind overperformed. It also generated complete “McKinsey-looking” presentation decks, financial models, and even an unsolicited but much-needed supply chain analysis that Mudassir’s team hadn’t thought to request.
Mudassir’s verdict: “On very generic problems that are about brute-force sort of intelligence and computation on standard frameworks, theory, etc., [problems that] don’t have a lot of variance in terms of culture, geography, specialties, and places that the training data cannot touch—you are probably better off running a hive mind first.”
But the HR harassment case revealed the AI hive mind’s limitations. While the system performed on a par with the British CHRO in the group, it struggled to keep up with regional experts from Pakistan, Bangladesh, and the Middle East. These CHROs exposed gaps the LLMs couldn’t possibly fill, because the models had never been trained on the relevant context. Think of cultural concepts as idiosyncratic as Pakistan’s “seth company culture,” which even Indians may have little clue about.
Mudassir’s assessment of the AI hive mind: “Figuring out the nuance, which is idiosyncratic to a city, to a town, to a state, in a country which isn’t heavily digitized—you’ll probably get wrong answers.”
The surprising discovery
One finding caught Mudassir’s team by surprise. At high creativity settings, they expected incoherent outputs from random pattern combinations. Instead, the agents’ arguments filtered out the garbage at the source: Because multiple agents challenged each idea, weak concepts were pushed down in priority while strong, useful ideas rose to the top.
What’s more, the infrastructure barriers are lower than you might think. Tools like Langflow enable anyone to get started on creating their own AI hive mind today. Engineers and the more technically inclined might use more sophisticated tools like LangGraph.
But remember: This is augmentation, not replacement. The goal is to harness AI’s computational power while preserving the idiosyncratic, culturally embedded wisdom that no training data can yet capture.
As Mudassir put it, “Ultimately, we think of AI not as a decision maker, but a decision enabler. It’s a sandbox.... We should not treat it as anything more than that.”
Published Dec. 4, 2025, by INSEAD.
