These samples were synthesized using the averaged representation of speakers and accents. The first set of columns is synthesized without accent conversion; the second set (prefixed "Conv") is synthesized with accent conversion.
Utterance 1: He will knock you off a few sticks in no time.
Utterance 2: I graduated last of my class.
Utterance 3: For the twentieth time that evening the two men shook hands.
Utterance 4: I will go over tomorrow afternoon.
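| Speaker | Ground Truth | CVAE-NL | CVAE-L | GST | GMVAE | Conv CVAE-NL | Conv CVAE-L | Conv GST | Conv GMVAE |
|---|---|---|---|---|---|---|---|---|---|
| ABA (Arabic) | | | | | | | | | |
| HKK (Korean) | | | | | | | | | |
| NCC (Chinese) | | | | | | | | | |
| SVBI (Hindi) | | | | | | | | | |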
These samples were converted from the source speaker's accent to the target accent.
Utterance 1: For the twentieth time that evening the two men shook hands.
Utterance 2: And you always want to see it in the superlative degree.
Utterance 3: I will go over tomorrow afternoon.
| Speaker | Target accent | Source Ground Truth | CVAE-NL | CVAE-L | GST | GMVAE |
|---|---|---|---|---|---|---|
| THV (Vietnamese) | Arabic | | | | | |
| THV (Vietnamese) | Hindi | | | | | |
| NCC (Chinese) | Hindi | | | | | |
| NCC (Chinese) | Spanish | | | | | |
| EBVS (Spanish) | Chinese | | | | | |
| EBVS (Spanish) | Korean | | | | | |
| HKK (Korean) | Arabic | | | | | |
| HKK (Korean) | Spanish | | | | | |