Voice Codes:
set character doctor voice remote M021 <- Saso Doctor's voice
set character elder voice remote M009 <- Saso Elder's voice
Voice Codes:
set character doctor voice remote star
set character doctor voice remote katherine
set character doctor voice remote starconv
Voice Codes:
set character doctor voice remote BradVoice
To trigger a TTS call:
sbm bml char doctor speech "Hello world. Testing Text to Speech"
Sent by SmartBody to TTS Engine:
RemoteSpeechCmd speak doctor 1 M021 ../../data/cache/audio/utt_20110528_175743_doctor_1.aiff <?xml version="1.0" encoding="UTF-8"?> <speech type="text/plain"> Hello world. Testing Text to Speech </speech>
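The command above appears to consist of six space-separated fields followed by the XML payload. A minimal Python sketch of splitting such a message, assuming that layout holds and that the audio path contains no spaces. The names SpeechCommand and parse_remote_speech_cmd are hypothetical and not part of SmartBody:

```python
from typing import NamedTuple


class SpeechCommand(NamedTuple):
    """Fields of a RemoteSpeechCmd message (field names are illustrative)."""
    character: str
    request_id: str
    voice: str
    audio_path: str
    payload: str


def parse_remote_speech_cmd(message: str) -> SpeechCommand:
    # Assumption: six whitespace-separated header fields, then the XML payload.
    parts = message.split(None, 6)
    if len(parts) != 7 or parts[0] != "RemoteSpeechCmd" or parts[1] != "speak":
        raise ValueError("not a RemoteSpeechCmd speak message")
    _, _, character, request_id, voice, audio_path, payload = parts
    return SpeechCommand(character, request_id, voice, audio_path, payload)


if __name__ == "__main__":
    cmd = parse_remote_speech_cmd(
        "RemoteSpeechCmd speak doctor 1 M021 "
        "../../data/cache/audio/utt_20110528_175743_doctor_1.aiff "
        '<?xml version="1.0" encoding="UTF-8"?> '
        '<speech type="text/plain"> Hello world. Testing Text to Speech </speech>'
    )
    print(cmd.character, cmd.request_id, cmd.voice)
```

The payload is kept as a raw string here, since each relay seems to rewrite it differently before passing it to its engine.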
Actual message sent to Rhetorical:
<?xml version="1.0" encoding="UTF-8"?> <speech type="text/plain">Hello world. Testing Text to Speech</speech>
Sent by TTS Engine:
RemoteSpeechReply doctor 2 OK: <?xml version="1.0" encoding="UTF-8"?> <speak> <soundFile name="d:\edwork\saso\core\beavin\..\..\data\cache\audio\utt_20110528_180148_doctor_2.aiff"/> <viseme start="0.0" type="_"/> <word end="0.4049886621315193" start="0.049977324263038546"> <viseme start="0.049977324263038546" type="Ih"/> <viseme start="0.14498866213151929" type="Ih"/> <viseme start="0.2" type="D"/> <viseme start="0.2549659863945578" type="OW"/> </word> <word end="0.8099773242630386" start="0.4049886621315193"> <viseme start="0.4049886621315193" type="OO"/> <viseme start="0.5199546485260771" type="Er"/> <viseme start="0.5849886621315192" type="R"/> <viseme start="0.6649886621315193" type="D"/> <viseme start="0.7699773242630386" type="D"/> </word> <viseme start="0.8099773242630386" type="_"/> <viseme start="0.860498866213152" type="_"/> <viseme start="1.060498866213152" type="_"/> <word end="1.5854875283446712" start="1.1104761904761904"> <viseme start="1.1104761904761904" type="D"/> <viseme start="1.1574603174603175" type="Ih"/> <viseme start="1.2354648526077097" type="Z"/> <viseme start="1.3304761904761904" type="D"/> <viseme start="1.3824943310657596" type="Ih"/> <viseme start="1.4374603174603175" type="NG"/> </word> <word end="1.8724716553287981" start="1.5854875283446712"> <viseme start="1.5854875283446712" type="D"/> <viseme start="1.6424943310657596" type="Ih"/> <viseme start="1.7174603174603174" type="KG"/> <viseme start="1.7674829931972789" type="Z"/> <viseme start="1.8374603174603175" type="D"/> </word> <word end="1.927482993197279" start="1.8724716553287981"> <viseme start="1.8724716553287981" type="D"/> <viseme start="1.9024943310657596" type="Ih"/> </word> <word end="2.408480725623583" start="1.927482993197279"> <viseme start="1.927482993197279" type="Z"/> <viseme start="2.0224943310657597" type="BMP"/> <viseme start="2.1174603174603175" type="EE"/> <viseme start="2.207482993197279" type="j"/> </word> <viseme start="2.408480725623583" type="_"/> <viseme start="2.4584580498866213" type="_"/> </speak>
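A reply like the one above can be read with a standard XML parser. The following Python sketch pulls out the sound file name and the viseme timeline. It assumes the header is always "RemoteSpeechReply <character> <id> OK:" followed by the XML (only the OK form appears in these notes), and parse_remote_speech_reply is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_remote_speech_reply(message: str) -> Tuple[str, str, str, List[Tuple[float, str]]]:
    # Assumption: the header and the XML body are separated by " OK: ".
    header, sep, xml_text = message.partition(" OK: ")
    if not sep:
        raise ValueError("reply does not contain an OK status")
    tag, character, request_id = header.split()
    if tag != "RemoteSpeechReply":
        raise ValueError("not a RemoteSpeechReply message")
    root = ET.fromstring(xml_text.strip())
    sound_file = root.find("soundFile").get("name")
    # iter() walks the tree in document order, so visemes nested inside
    # <word> elements are included alongside the top-level ones.
    visemes = [(float(v.get("start")), v.get("type")) for v in root.iter("viseme")]
    return character, request_id, sound_file, visemes


if __name__ == "__main__":
    reply = (
        'RemoteSpeechReply doctor 1 OK: <?xml version="1.0" encoding="UTF-8"?> '
        '<speak> <soundFile name="utt.wav"/> <viseme start="0" type="_"/> '
        '<word end="0.5" start="0.1"> <viseme start="0.1" type="Ih"/> </word> </speak>'
    )
    print(parse_remote_speech_reply(reply))
```

Flattening the visemes into one list discards the word boundaries; a consumer that needs per-word timing would iterate over the word elements instead.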
Actual message sent to MSSpeech:
<speak version="1.0" xml:lang="en-US">Hello world. Testing Text to Speech .</speak>
(note the added period at the end)
Sent by TTS Engine:
RemoteSpeechReply doctor 1 OK: <?xml version="1.0" encoding="UTF-8"?> <speak> <soundFile name="d:\edwork\vhtoolkit\data\cache\audio\utt_20110528_180527_doctor_1.wav"/> <viseme start="0" type="_"/> <viseme start="0.003" type="Oh"/> <viseme start="0.047" type="Ih"/> <viseme start="0.098" type="D"/> <viseme start="0.258" type="Oh"/> <viseme start="0.418" type="Oh"/> <viseme start="0.479" type="Er"/> <viseme start="0.54" type="R"/> <viseme start="0.601" type="D"/> <viseme start="0.695" type="D"/> <viseme start="0.745" type="_"/> <viseme start="1.367" type="_"/> <viseme start="1.37" type="D"/> <viseme start="1.461" type="Ih"/> <viseme start="1.546" type="Z"/> <viseme start="1.6" type="D"/> <viseme start="1.654" type="Ih"/> <viseme start="1.729" type="KG"/> <viseme start="1.804" type="D"/> <viseme start="1.9" type="Ih"/> <viseme start="2.022" type="KG"/> <viseme start="2.087" type="Z"/> <viseme start="2.16" type="D"/> <viseme start="2.233" type="D"/> <viseme start="2.297" type="Oh"/> <viseme start="2.341" type="Z"/> <viseme start="2.425" type="BMP"/> <viseme start="2.509" type="Ih"/> <viseme start="2.606" type="j"/> <viseme start="2.73" type="_"/> </speak>
Actual text sent to the CereVoice engine:
<?xml version="1.0" encoding="UTF-8"?> <speech type="text/plain">Hello world. Testing Text to Speech </speech>
(Note the trailing space. CerevoiceRelay also removes punctuation, because of an apparent bug in CereVoice.)
Sent by TTS Engine (CerevoiceRelay Example) (hand-formatted):
RemoteSpeechReply doctor 1 OK: <?xml version="1.0" encoding="UTF-8"?> <speak> <soundFile name="d:\edwork\saso\data\cache\audio\utt_20110621_192933_doctor_1.wav"/> <viseme start="0.000000" type="_"/> <mark name="sp1:T0" time="0.010975"/> <mark name="sp1:T1" time="0.010975"/> <word end="2.468209" start="0.010975"> <viseme start="0.010975" type="Ih"/> <viseme start="0.090975" type="Ih"/> <viseme start="0.120952" type="D"/> <viseme start="0.231157" type="Oh"/> <viseme start="0.430088" type="OO"/> <viseme start="0.527008" type="Er"/> <viseme start="0.663673" type="D"/> <viseme start="0.723719" type="D"/> <viseme start="0.768662" type="D"/> <viseme start="0.848662" type="Ih"/> <viseme start="0.948662" type="Z"/> <viseme start="1.113696" type="D"/> <viseme start="1.173651" type="Ih"/> <viseme start="1.223510" type="NG"/> <viseme start="1.357624" type="D"/> <viseme start="1.431655" type="Ih"/> <viseme start="1.511610" type="KG"/> <viseme start="1.566621" type="Z"/> <viseme start="1.636644" type="D"/> <viseme start="1.696644" type="Oh"/> <viseme start="1.833379" type="Z"/> <viseme start="1.958231" type="BMP"/> <viseme start="2.028209" type="EE"/> <viseme start="2.188209" type="j"/> </word> <mark name="sp1:T2" time="2.468209"/> <mark name="sp1:T3" time="2.468209"/> <viseme start="2.468209" type="_"/> </speak>
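Unlike the earlier replies, this one also carries mark elements with a name and a time. A small Python sketch of collecting them into a lookup table, assuming each mark has both attributes as in the example above; mark_times is a hypothetical helper name and takes only the XML part of the reply:

```python
import xml.etree.ElementTree as ET
from typing import Dict


def mark_times(speak_xml: str) -> Dict[str, float]:
    """Map each <mark> name in a <speak> reply body to its time in seconds."""
    root = ET.fromstring(speak_xml.strip())
    # Assumption: every <mark> carries both "name" and "time" attributes.
    return {m.get("name"): float(m.get("time")) for m in root.iter("mark")}


if __name__ == "__main__":
    body = (
        '<?xml version="1.0" encoding="UTF-8"?> <speak> '
        '<mark name="sp1:T0" time="0.010975"/> '
        '<word end="2.468209" start="0.010975"> '
        '<viseme start="0.010975" type="Ih"/> </word> '
        '<mark name="sp1:T2" time="2.468209"/> </speak>'
    )
    print(mark_times(body))
```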
Actual text sent to Festival:
<?xml version="1.0" encoding="UTF-8"?> <speech type="text/plain">Hello world. Testing Text to Speech </speech>
(Note that FestivalRelay edits this text and eventually sends it out as 'Helloworld.TestingTexttoSpeech'.)
Sent by TTS Engine (FestivalRelay Example) (hand-formatted):
RemoteSpeechReply doctor 7 OK: <?xml version="1.0" encoding="UTF-8"?> <speak> <soundFile name="d:\edwork\vhtoolkit\bin\FestivalRelay\data\cache\festival\utt_20110722_185051_doctor_7.wav"/> <viseme start="0.000000" type="_" /> <mark name="T0" time="0.080000"/> <word end="0.640000" start="0.080000" > <viseme start="0.080000" type="Ih" /> <viseme start="0.160000" type="Ih" /> <viseme start="0.240000" type="D" /> <viseme start="0.320000" type="Oh" /> <viseme start="0.400000" type="Er" /> <viseme start="0.440000" type="R" /> <mark name="T1" time="0.480000"/> </word> <mark name="T2" time="0.080000"/> <word end="0.640000" start="0.080000" > <viseme start="0.480000" type="D" /> <viseme start="0.560000" type="D" /> <mark name="T3" time="0.640000"/> </word> <mark name="T4" time="0.640000"/> <word end="0.880000" start="0.640000" > <viseme start="0.640000" type="D" /> <viseme start="0.720000" type="Ao" /> <viseme start="0.800000" type="D" /> <mark name="T5" time="0.880000"/> </word> <mark name="T6" time="0.880000"/> <word end="2.160000" start="0.880000" > <viseme start="0.880000" type="D" /> <viseme start="0.960000" type="Ih" /> <viseme start="1.040000" type="Z" /> <viseme start="1.120000" type="D" /> <viseme start="1.200000" type="Ih" /> <viseme start="1.280000" type="NG" /> <viseme start="1.360000" type="D" /> <viseme start="1.440000" type="Ih" /> <viseme start="1.520000" type="KG" /> <viseme start="1.600000" type="Z" /> <viseme start="1.680000" type="D" /> <viseme start="1.760000" type="Ao" /> <viseme start="1.840000" type="Z" /> <viseme start="1.920000" type="BMP" /> <viseme start="2.000000" type="EE" /> <viseme start="2.080000" type="j" /> <mark name="T7" time="2.160000"/> </word> <viseme start="2.160000" type="_" /> </speak>
RemoteSpeechCmd with SSML input and time marks (BradVoiceFestival voice):
RemoteSpeechCmd speak brad 1 BradVoiceFestival ../../data/cache/audio/utt_20110809_151922_brad_1.aiff <?xml version="1.0" encoding="utf-16"?> <speech id="sp1" ref="tech_sapiTTS" type="application/ssml+xml"> <mark name="T0" />SAPI <mark name="T1" /><mark name="T2" />is <mark name="T3" /><mark name="T4" />a <mark name="T5" /><mark name="T6" />speech <mark name="T7" /><mark name="T8" />and <mark name="T9" /><mark name="T10" />text <mark name="T11" /><mark name="T12" />to <mark name="T13" /><mark name="T14" />speech <mark name="T15" /><mark name="T16" />interface <mark name="T17" /><mark name="T18" />by <mark name="T19" /><mark name="T20" />Microsoft. <mark name="T21" /><mark name="T22" />I <mark name="T23" /><mark name="T24" />use <mark name="T25" /><mark name="T26" />it <mark name="T27" /><mark name="T28" />to <mark name="T29" /><mark name="T30" />be <mark name="T31" /><mark name="T32" />able <mark name="T33" /><mark name="T34" />to <mark name="T35" /><mark name="T36" />talk <mark name="T37" /><mark name="T38" />to <mark name="T39" /><mark name="T40" />you. <mark name="T41" /> </speech>
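In the SSML payload above, each word sits between a pair of mark elements (T0/T1, T2/T3, and so on). A Python sketch of recovering the words together with their opening mark names, assuming that pairing holds throughout. The helper name words_with_marks is hypothetical, and the XML declaration is stripped first so the declared encoding is not an issue when parsing from a string:

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple


def words_with_marks(speech_xml: str) -> List[Tuple[str, str]]:
    """Return (opening mark name, word) pairs from a marked-up <speech> payload."""
    # Drop the leading <?xml ...?> declaration, if there is one.
    body = re.sub(r"^\s*<\?xml.*?\?>", "", speech_xml, count=1)
    root = ET.fromstring(body.strip())
    pairs = []
    for mark in root.iter("mark"):
        # Text following a mark is stored as its "tail"; closing marks have
        # an empty or whitespace-only tail and are skipped.
        text = (mark.tail or "").strip()
        if text:
            pairs.append((mark.get("name"), text))
    return pairs


if __name__ == "__main__":
    payload = (
        '<?xml version="1.0" encoding="utf-16"?> '
        '<speech id="sp1" ref="tech_sapiTTS" type="application/ssml+xml"> '
        '<mark name="T0" />SAPI <mark name="T1" /><mark name="T2" />is '
        '<mark name="T3" /> </speech>'
    )
    print(words_with_marks(payload))
```

Combined with the mark times returned in a reply, this kind of table would let a caller line up individual words with the audio.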