JoyVoice Highlights
Figure 1: Model Architecture
Figure 2: Intelligibility Benchmarks
Multi-Speaker Zero-Shot Voice Clone
Bring Every Conversation to Life
JoyVoice lets you create dynamic conversations among 2 to 8 speakers with striking realism. Expect consistent, stable character voices and expressive delivery that make every line feel alive.
Crosstalk Performers: Guo Degang & Yu Qian
Crosstalk
Prompt Audio
Proposed Systems
Other Models
Tech Podcast (Chinese)
Podcast
Reference Audio
Generated Audio
Podcast by Luo Yonghao & He Tongxue
Podcast
Prompt Audio
Proposed Systems
Other Models
Peppa Pig (Cartoon Multi-Role)
Cartoon
Prompt Audio
Proposed Systems
Other Models
Podcast with Three Speakers
Podcast
Prompt Audio
Proposed Systems
Other Models
Infernal Affairs: Andy Lau & Tony Leung
Movie & TV
Prompt Audio
Proposed Systems
Other Models
Mystery Audiobook "Zheyun"
Audiobook
Prompt Audio
Proposed Systems
Other Models
Whispered Dialogue Between a Man and a Woman
Chatting
Prompt Audio
Proposed Systems
Other Models
Interview: Trump & Host
Interview
Prompt Audio
Proposed Systems
Other Models
Journey to the West
Movie & TV
Prompt Audio
Proposed Systems
Other Models
Customized Multi-Speaker Post-Training
Lower Cost, Greater Convenience, More Authenticity
Nothing sounds more natural than real speech. Fine-tune JoyVoice on just 10 minutes of ordinary-quality natural recordings, and a highly human-like multi-speaker model is ready to use.
Livestream Shopping
Livestream
Livestream Shopping
Livestream
Entertainment
Podcast
Technology
Podcast
Single-Speaker Voice Clone
Capture Every Nuance
JoyVoice excels at single-speaker voice cloning: from a reference clip, it expressively reproduces prosody, timbre, emotion, volume, speech rate, and paralinguistic cues. It also supports cross-lingual voice cloning across Chinese, English, Japanese, Korean, and multiple Chinese dialects.