Feature Story

Evidence in deep trouble?

It’s 2019, and in a private hearing a British family court is hearing a child custody dispute. A mother has accused her children’s father of threatening her over the phone, and says she has the evidence to prove it. The father denies the accusation. If proven true, it would destroy his character and ruin his chance of maintaining joint custody of his children. To his surprise, the mother produces an audio recording of him making violent threats towards her over the phone. He admits to his lawyer that the voice he hears is his own, but he vehemently denies he ever threatened his children’s mother.

Trusting his client and looking for an explanation, the father’s lawyer obtains the recording and passes it on to audio experts for examination. By examining the recording’s metadata, the experts find that parts of the audio have been edited. It’s then that they realise a computer-generated clone of the father’s voice has been used to make it sound like he really did threaten his wife. Their findings are presented to the court, and the evidence is dismissed as false by a shocked judge.
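The experts’ actual forensic process isn’t described in detail, but to give a loose sense of the kind of metadata an audio file carries, here is a minimal Python sketch using the standard library’s `wave` module. It writes a small WAV file and then reads back its header metadata. This is purely illustrative: the filename and helper functions are invented for the example, and real forensic examination of edits involves far more than this, such as spectral analysis, encoder signatures and timestamp inconsistencies.

```python
# Toy illustration only -- not a forensic tool. Shows the basic header
# metadata (channels, sample rate, duration) that every WAV file carries,
# the simplest starting point before any deeper analysis of edits.
import wave

def make_wav(path, seconds=1, rate=8000):
    """Write a small mono 16-bit WAV file of silence for demonstration."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * rate * seconds)

def describe(path):
    """Read back the file's basic header metadata."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }

make_wav("sample.wav")
info = describe("sample.wav")
print(info)
```

An examiner looking for tampering would compare fields like these against what the recording claims to be (for example, a declared duration or recording device that does not match the data), but detecting a spliced-in voice clone requires analysis of the audio signal itself, not just its headers.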

It sounds like a scene taken straight out of the nightmarish anthology series Black Mirror. But what this scene actually describes is Britain’s first reported case of deepfake audio being presented as evidence in court. Also known as voice cloning or synthetic voice generation, deepfake audio is a product of the rapidly evolving field of machine learning. A voice clone is produced using artificial intelligence (AI) that analyses recorded samples of a real person’s voice to make a synthetic recreation. A text-to-speech generator can then use the synthetic voice to vocalise any text.

The voice cloning market is expected to be valued at A$16.3 billion globally by 2030. Photo: Seamus Harrison.

This technology, like many forms of artificial intelligence, has grown significantly more accurate and accessible since 2019. Companies like Eleven Labs and Resemble AI allow people to clone voices for a small fee, and in some cases, at no charge at all.

“Voice cloning is reaching a mature stage where the performance is getting very good scores in technical papers,” says Dr Sonny Pham, a Senior Lecturer with the School of Electrical Engineering, Computing, and Mathematical Sciences at Curtin University.

Dr Sonny Pham says his goal is to develop technologies that are built on solid ethics and are good for society. Photo: Seamus Harrison.

Pham has been researching artificial intelligence for seventeen years. He’s the co-founder of iCetana and Hyprfire, two successful start-ups that use AI to enhance video surveillance and cyber security, respectively. Sitting across from me in his office, he says the accuracy of voice cloning depends on the number of samples the technology is provided with.

“If you have only one or two samples of a target speaker, chances are the generated voice won’t be that great and people will be able to easily identify it. But if you have several recordings to input, chances are the AI generated speech will be very indistinguishable from a real speaker,” he says.

Pham acknowledges there are both positive and malicious ways in which the rapidly improving technology can be used.

“It opens up a possibility for hackers to capture your voice in a separate context, and use that recording to build a model that can mimic you,” he says.

“They can use that for the purpose of stealing your digital identity by passing an authentication system and getting unauthorised access to your account.”

As discovered in Britain, this technology can also be used to falsify audio evidence in courtrooms. It might not be long until deepfakes make their way into Western Australia’s own justice system. When that time comes, will the state’s courts be able to separate fact from fiction?

Aidan Ricciardo, an evidence law researcher and lecturer at the University of Western Australia (UWA), says: “The legal profession by and large doesn’t view it as a new problem, because it’s just an evolution of an existing problem. That is, sometimes people submit inauthentic evidence.”

Aidan Ricciardo authors a blog covering the latest developments in evidence law within the state. Photo: Supplied.

Speaking from his office at UWA, Ricciardo makes his passion for evidence law apparent as he explains that the submission of false evidence goes back to forged signatures and documents.

“If a lawyer’s client claims never to have signed something with their signature on it, they can contact a handwriting expert to investigate the signature and make sure that any holes about its authenticity can be uncovered,” says Ricciardo.

If there are any issues relating to authenticity that the attorney wants to raise at trial, they can call the handwriting specialist to act as an expert witness. The specialist then explains their findings to the judge or jury, depending on the case. It’s then up to them to use all the available evidence to decide whom to believe.

Ricciardo says the same process would apply to synthetic voice recordings.

“There are a lot of experts who are, with the current state of technology, pretty good at determining whether something has been doctored or not,” he says.

“That might change in the future, and that’s really scary. But at the moment deepfakes can be sufficiently countered by appropriate experts who can speak to the authenticity of something, just as handwriting experts have always done with people’s signatures.”

As deepfake technology becomes more accessible, Ricciardo says it’s inevitable that we will start to see it in Western Australian courtrooms.

He says because lawyers already have ways to combat forged evidence in court, the only thing that would allow synthetic audio to impact a trial is a lack of awareness among judges and legal practitioners.

“It’s probably important then for lawyers, more than anyone, to have an awareness that audio, video and pictures can now be faked,” says Ricciardo.

It seems, therefore, that the best way to prevent deepfake audio from muddying the truth in WA’s justice system is education. It’s important for attorneys to be aware that people’s voices can now be accurately reproduced. This will allow them to better protect their clients when they state, “I never said that.”