30/06/2025
MONDAY | JUNE 30, 2025
AI learning to lie, scheme and threaten
Deceptive behaviour linked to ‘reasoning’ models

NEW YORK: The world’s most advanced AI models are exhibiting troubling new behaviours – lying, scheming and even threatening their creators to achieve goals. In one jarring example, under threat of being unplugged, Anthropic’s latest creation Claude 4 lashed back by blackmailing an engineer and threatened to reveal an extramarital affair. ChatGPT-creator OpenAI’s o1 tried to download itself onto external servers and denied it when caught red-handed.

These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don’t fully understand how their creations work. Yet the race to deploy increasingly powerful models continues.

This deceptive behaviour appears linked to the emergence of “reasoning” models – AI systems that work through problems step-by-step rather than generating instant responses. According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts. “O1 was the first large model where we saw this kind of behaviour,” explained Marius Hobbhahn, head of Apollo Research, which specialises in testing major AI systems. These models sometimes simulate “alignment” – appearing to follow instructions while secretly pursuing different objectives.

For now, this deceptive behaviour only emerges when researchers deliberately stress-test the models with extreme scenarios. But as Michael Chen from evaluation organisation METR warned, “It’s an open question whether future, more capable models will have a tendency towards honesty or deception.”

The concerning behaviour goes far beyond typical AI “hallucinations” or simple mistakes. Hobbhahn said: “What we’re observing is a real phenomenon. We’re not making anything up.” Users report that models are “lying to them and making up evidence”, according to Apollo Research’s co-founder. “This is not just hallucinations. There’s a very strategic kind of deception.”

The challenge is compounded by limited research resources. While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed. As Chen noted, greater access “for AI safety research would enable better understanding and mitigation of deception”. Another handicap: the research world and non-profits “have orders of magnitude less compute resources than AI companies. This is very limiting,” noted Mantas Mazeika from the Centre for AI Safety (CAIS).

Current regulations aren’t designed for these new problems. The European Union’s AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving. In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.

Goldstein believes the issue will become more prominent as AI agents – autonomous tools capable of performing complex human tasks – become widespread. “I don’t think there’s much awareness yet,” he said.

All this is taking place in a context of fierce competition. Even companies that position themselves as safety-focused, like Anthropic, are “constantly trying to beat OpenAI and release the newest model”, said Goldstein. This breakneck pace leaves little time for thorough safety testing and corrections.

“Right now, capabilities are moving faster than understanding and safety,” Hobbhahn said, “but we’re still in a position where we could turn it around.”

Researchers are exploring various ways to address these challenges. Some advocate “interpretability” – an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain sceptical of this approach.

Market forces may also provide some pressure for solutions. As Mazeika pointed out, AI’s deceptive behaviour “could hinder adoption if it’s very prevalent, which creates a strong incentive for companies to solve it”.

Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm. He even proposed “holding AI agents legally responsible” for accidents or crimes – a concept that would fundamentally change how we think about AI accountability. – AFP

Colombia pension reform clears lower house

BOGOTA: Colombia’s lower house approved on Saturday, for the second time, a pension reform supported by leftist President Gustavo Petro, after the constitutional court ordered a repetition of the ballot because of procedural irregularities.

The court’s June decision did not rule on the Bill’s constitutionality but required the lower house to vote again on the version approved by the Senate, saying there was not enough debate held ahead of the first vote in June last year. The Bill was backed by 97 lawmakers on Saturday, while one voted against it. The measure was supposed to come into force in July but will not be valid until the court approves it, the court ruling said.

The Bill is meant to strengthen state pension fund Colpensiones by requiring those who earn less than US$800 (RM3,383) a month to save with the fund. It ensures payments for those without sufficient retirement savings, or with no savings at all.

The legislation, which reduces the number of weeks women who have children must accumulate to be eligible for pensions, will not affect people who have already notched enough weeks to be within striking distance of retirement. It does not change Colombia’s pension age, which is 62 for men and 57 for women. The government estimates that some 2.6 million older adults will benefit from the payments to those with no or insufficient pension savings.

Petro’s ambitious economic and social reforms have faced uphill battles in Congress, though lawmakers in June backed a labour reform similar to an original proposal backed by Petro’s government which was initially rejected. – Reuters

‘Paris fashion show glorifies criminals’

SAN SALVADOR: El Salvador’s government on Saturday criticised a Paris Fashion Week show that made references to inmates at the country’s CECOT prison, with President Nayib Bukele joking that he could send prisoners to France.

At Mexican American designer Willy Chavarria’s show on Friday, the white T-shirts and shorts worn by his models invoked the uniforms worn by inmates at the Terrorism Confinement Centre (CECOT). Bukele had the maximum-security prison built to hold gang members. Also imprisoned at CECOT are 252 Venezuelans deported from the United States.

“We’re ready to ship them all to Paris whenever we get the green light from the French government,” Bukele wrote in response to an X post that said Chavarria was paying tribute to CECOT prisoners. – AFP
ROUSING GRADUATION ... Spectators watch fireworks exploding in the sky and the brig Rossiya (Russia) with scarlet sails floating along the Neva River during festivities in honour of school graduates in Saint Petersburg, Russia yesterday. – REUTERSPIC
US Senate debates Trump spending Bill

WASHINGTON: US senators on Saturday began debating Donald Trump’s “big beautiful” spending Bill, a hugely divisive proposal that would deliver key parts of the president’s domestic agenda while making massive cuts to social welfare programmes.

The Senate formally opened debate on the Bill late on Saturday, after Republican holdouts delayed what should have been a procedural vote – drawing Trump’s ire on social media.

Trump is hoping to seal his legacy with the Bill, which would extend his expiring first-term tax cuts at a cost of US$4.5 trillion (RM19 trillion) and beef up border security. But Republicans eyeing midterm congressional elections next year are divided over the package, which would strip health care from millions of the poorest Americans and add more than US$3 trillion to the country’s debt.

Senators narrowly passed the motion to begin debate, 51-49, hours after the vote was first called, with Vice-President JD Vance joining negotiations with holdouts from his own party. Ultimately, two Republican senators joined 47 Democrats in voting “nay” on opening debate.

If passed in the Senate, the Bill would go back to the House for approval, where Republicans can only afford to lose a handful of votes – and are facing stiff opposition from within their own ranks.

Republicans are scrambling to offset the US$4.5 trillion cost of Trump’s tax relief, with many of the proposed cuts to come from decimating funding for Medicaid, the health insurance programme for low income Americans. Republicans are split on the Medicaid cuts, which will threaten scores of rural hospitals and lead to an estimated 8.6 million Americans being deprived of health care.

The spending plan would also roll back many of the tax incentives for renewable energy that were put in place under president Joe Biden. On Saturday, key Trump ally Elon Musk – with whom the president had a public falling out this month over his criticism of the Bill – called the proposal “utterly insane and destructive”. “It gives handouts to industries of the past while severely damaging industries of the future,” said Musk.

Independent analysis also shows that the Bill would pave the way for a historic redistribution of wealth from the poorest 10% of Americans to the richest. The Bill is unpopular across demographic, age and income groups, according to extensive recent polling.

Although the House has already passed its own version, both chambers have to agree on the same text before it can be signed into law. – AFP