
Led by a founder who sold a video startup to Apple, Panjaya is applying deepfake techniques to video dubbing

There's a big opportunity for generative AI in the world of translation, and a startup called Panjaya is taking the concept to the next level: a hyper-realistic, GenAI-based video dubbing tool that recreates the original speaker's voice in the new language, while automatically adapting the video and the speaker's physical movements to match the new speech patterns.

After operating in stealth for the past three years, the startup is unveiling the first version of its product, BodyTalk, along with its first outside funding of $9.5 million.

Panjaya is the brainchild of Hilik Shani and Ariel Shalom, two deep learning specialists who quietly worked on the technology for the Israeli government for most of their professional lives and are now, respectively, the startup's general manager and CTO. They hung up their G-man hats in 2021 with the startup itch, and Guy Piekarz joined as CEO a year and a half ago.

Piekarz isn't a founder of Panjaya, but he's a notable name to have on board: in 2013, he sold a startup he founded, Matcha, to Apple. Matcha was an early, buzzy player in streaming video discovery and recommendation, acquired in the early days of Apple's TV and streaming strategy, when that strategy was still more rumor than actual product. Matcha was bootstrapped and sold for a relative song, in the range of $10 million to $15 million, modest considering how hard Apple ultimately pushed into streaming media.

Piekarz stayed at Apple for nearly a decade, building Apple TV and then its sports division. He was then introduced to Panjaya through Viola Ventures, one of the startup's backers (others include R-Squared Ventures, JFrog co-founder and CEO Shlomi Ben Haim, Chris Rice, Guy Schory, Ryan Floyd of Storm Ventures, Ali Behnam of Riviera Partners, and Oded Vardi).

“I had since left Apple and was planning on doing something completely different,” Piekarz said. “However, seeing a demo of the technology blew me away and the rest is history.”

BodyTalk is interesting because it simultaneously incorporates multiple technologies that address different aspects of synthetic media.

It starts with audio-based translation, which currently covers 29 languages. The translation is then spoken in a voice that mimics the original speaker's, and that audio is in turn set to a version of the original video in which the speaker's lips and other movements are adapted to the new words and phrases. All of this is generated automatically once users upload a video to the platform, which also includes a dashboard with further editing tools. Future plans include an API as well as a path to real-time processing. (Right now, BodyTalk runs in “near real time,” taking only a few minutes to process a video, Piekarz said.)
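Panjaya hasn't published that API yet, so any client code is speculative, but a minimal sketch of what an upload-dub-poll workflow like the one described above could look like might run as follows. Every endpoint, field and parameter name here is invented for illustration; only the overall flow (upload a video, request a target language, wait a few minutes for the dubbed output) reflects the product as described.

```python
# Hypothetical sketch only: Panjaya has not published a public API, and every
# endpoint, field, and credential below is a placeholder invented for illustration.
import time

import requests

API_BASE = "https://api.example-dubbing-platform.com/v1"  # placeholder URL
API_KEY = "YOUR_API_KEY"  # placeholder credential


def dub_video(path: str, target_language: str) -> str:
    """Upload a video, request a dubbed version, and poll until it's ready."""
    headers = {"Authorization": f"Bearer {API_KEY}"}

    # 1. Upload the source video.
    with open(path, "rb") as f:
        upload = requests.post(f"{API_BASE}/videos", headers=headers, files={"file": f})
    upload.raise_for_status()
    video_id = upload.json()["id"]

    # 2. Kick off a dubbing job: translate the audio, clone the speaker's voice,
    #    and re-render lips and gestures to match the new speech.
    job = requests.post(
        f"{API_BASE}/videos/{video_id}/dub",
        headers=headers,
        json={"target_language": target_language},
    )
    job.raise_for_status()
    job_id = job.json()["job_id"]

    # 3. Processing is "near real time" (a few minutes), so poll for completion.
    while True:
        status = requests.get(f"{API_BASE}/jobs/{job_id}", headers=headers).json()
        if status["state"] == "done":
            return status["output_url"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "dubbing job failed"))
        time.sleep(30)


if __name__ == "__main__":
    # e.g. dub an English keynote into Spanish
    print(dub_video("keynote.mp4", "es"))
```

The polling loop simply mirrors the “near real time” processing Piekarz describes; a real API might instead use webhooks or streaming once true real-time processing arrives.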

“We use the best of breed where we need it,” Piekarz said of the company’s use of large language models and other third-party tools. “And we build our own AI models for which the market has no real solution.”

An example of this is the company's lip syncing, he continued. “Our entire lip sync engine was developed in-house by our AI research team as we have not found anything that achieves this level and quality across multiple speakers, viewpoints and all the business use cases we want to support.”

The focus for now is solely on B2B; customers include JFrog and the TED media organization. The company plans to expand further in media, particularly in areas such as sports, education, marketing, healthcare and medicine.

The resulting translated videos are pretty uncanny, not unlike what you get with deepfakes, although Piekarz cringes at the term, which over the years has taken on negative connotations that are the exact opposite of the market the startup is targeting.

“'Deepfake' is not something we're interested in,” he said. “We want to avoid that whole name.” Instead, think of Panjaya as part of the “deep real category,” he said.

By targeting only the B2B market and controlling who gets access to its tools, the company creates “guardrails” around the technology to protect it from misuse, he added. He also expects there will be more tools in the longer term, including watermarks, to help detect whether videos have been altered to create synthetic media, whether legitimate or nefarious. “We definitely want to be a part of this and not allow any misinformation,” he said.

The not-so-fine print

There are a number of startups competing with Panjaya in the broader AI-based video translation space, including bigger names like Vimeo and Eleven Labs, as well as smaller players like Speechify and Synthesis. For all of them, working on better dubbing and syncing feels a little like swimming against the tide, because subtitles have become an integral part of how videos are consumed these days.

On TV there are a number of reasons for that: poor speakers, background noise in our busy lives, mumbling actors, limited production budgets and ever more sound effects. CBS found in a survey of American television viewers that more than half of them kept closed captions on “some (21%) or all of the time (34%).”

But some people love captions simply because they're entertaining to read, and a whole cult has formed around them.

On social media and other apps, captions are easily integrated into the experience. TikTok, for example, started enabling subtitling by default for all videos in November 2023.

However, there is still a huge market for dubbed content internationally, and while English is often considered the lingua franca of the internet, there is evidence from research groups such as CSA that content delivered in viewers' native languages sees stronger engagement, particularly in a B2B context. Panjaya is betting that more natural, native-language content could perform even better.

Some of its customers seem to support that theory. According to TED, talks dubbed using Panjaya's tools saw a 115% increase in views, with completion rates for the translated videos doubling.