TTS Engines:

Rhetorical (RVoiceRelay)

Voice Codes:
set character doctor voice remote M021 <- Saso Doctor's voice
set character elder voice remote M009 <- Saso Elder's voice

Cerevoice (CerevoiceRelay)

Voice Codes:
set character doctor voice remote star
set character doctor voice remote katherine
set character doctor voice remote starconv

Cepstral (CepstralRelay)

MSSpeech (MSSpeechRelay)

Voice Codes:
set character doctor voice remote BradVoice

Festival (FestivalRelay)

Voice Codes:
set character doctor voice remote BradVoice

RemoteSpeech Interface

To trigger a TTS call:

sbm bml char doctor speech "Hello world.  Testing Text to Speech"

Sent by SmartBody to TTS Engine:

RemoteSpeechCmd speak doctor 1 M021 ../../data/cache/audio/utt_20110528_175743_doctor_1.aiff
<?xml version="1.0" encoding="UTF-8"?>
<speech type="text/plain">
   Hello world.  Testing Text to Speech
</speech>
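
The command above can be assembled programmatically. The following is a minimal sketch, not SmartBody's own code: the function name build_speech_cmd and the field names (character, request id, voice code, audio path) are assumptions inferred from the example, and placing a newline between the header and the XML payload simply mirrors how the message is displayed on this page.

```python
from xml.sax.saxutils import escape


def build_speech_cmd(character, request_id, voice, audio_path, text):
    """Compose a RemoteSpeechCmd message: a header line followed by the XML payload.

    Field order follows the example on this page; the names are illustrative.
    """
    header = f"RemoteSpeechCmd speak {character} {request_id} {voice} {audio_path}"
    payload = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<speech type="text/plain">\n'
        f"   {escape(text)}\n"  # escape &, < and > so the payload stays well-formed
        "</speech>"
    )
    return header + "\n" + payload


if __name__ == "__main__":
    print(build_speech_cmd(
        "doctor", 1, "M021",
        "../../data/cache/audio/utt_20110528_175743_doctor_1.aiff",
        "Hello world.  Testing Text to Speech",
    ))
```

Escaping the utterance text is a defensive choice for this sketch; the page does not say whether the relays expect escaped text.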

RVoiceRelay Example:
Actual message sent to Rhetorical:

<?xml version="1.0" encoding="UTF-8"?>
<speech type="text/plain">
   Hello world.  Testing Text to Speech
</speech>

Sent by TTS Engine:

RemoteSpeechReply doctor 2 OK:
<?xml version="1.0" encoding="UTF-8"?>
<speak>
   <soundFile name="d:\edwork\saso\core\beavin\..\..\data\cache\audio\utt_20110528_180148_doctor_2.aiff"/>
   <viseme start="0.0" type="_"/>
   <word end="0.4049886621315193" start="0.049977324263038546">
      <viseme start="0.049977324263038546" type="Ih"/>
      <viseme start="0.14498866213151929" type="Ih"/>
      <viseme start="0.2" type="D"/>
      <viseme start="0.2549659863945578" type="OW"/>
   </word>
   <word end="0.8099773242630386" start="0.4049886621315193">
      <viseme start="0.4049886621315193" type="OO"/>
      <viseme start="0.5199546485260771" type="Er"/>
      <viseme start="0.5849886621315192" type="R"/>
      <viseme start="0.6649886621315193" type="D"/>
      <viseme start="0.7699773242630386" type="D"/>
   </word>
   <viseme start="0.8099773242630386" type="_"/>
   <viseme start="0.860498866213152" type="_"/>
   <viseme start="1.060498866213152" type="_"/>
   <word end="1.5854875283446712" start="1.1104761904761904">
      <viseme start="1.1104761904761904" type="D"/>
      <viseme start="1.1574603174603175" type="Ih"/>
      <viseme start="1.2354648526077097" type="Z"/>
      <viseme start="1.3304761904761904" type="D"/>
      <viseme start="1.3824943310657596" type="Ih"/>
      <viseme start="1.4374603174603175" type="NG"/>
   </word>
   <word end="1.8724716553287981" start="1.5854875283446712">
      <viseme start="1.5854875283446712" type="D"/>
      <viseme start="1.6424943310657596" type="Ih"/>
      <viseme start="1.7174603174603174" type="KG"/>
      <viseme start="1.7674829931972789" type="Z"/>
      <viseme start="1.8374603174603175" type="D"/>
   </word>
   <word end="1.927482993197279" start="1.8724716553287981">
      <viseme start="1.8724716553287981" type="D"/>
      <viseme start="1.9024943310657596" type="Ih"/>
   </word>
   <word end="2.408480725623583" start="1.927482993197279">
      <viseme start="1.927482993197279" type="Z"/>
      <viseme start="2.0224943310657597" type="BMP"/>
      <viseme start="2.1174603174603175" type="EE"/>
      <viseme start="2.207482993197279" type="j"/>
   </word>
   <viseme start="2.408480725623583" type="_"/>
   <viseme start="2.4584580498866213" type="_"/>
</speak>
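
A reply like the one above can be read with a standard XML parser. This is a sketch under assumptions: the header is taken to be the first line in the form "RemoteSpeechReply <character> <id> <status>:", as in the example, and parse_speech_reply and its returned keys are hypothetical names, not part of any relay.

```python
import xml.etree.ElementTree as ET


def parse_speech_reply(message):
    """Split a RemoteSpeechReply into its header fields and word timings."""
    header, _, xml_text = message.partition("\n")
    fields = header.split()
    if len(fields) < 4 or fields[0] != "RemoteSpeechReply":
        raise ValueError(f"unexpected reply header: {header!r}")

    # Parse from bytes so the encoding="UTF-8" declaration is honoured.
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    sound = root.find("soundFile")

    # Each <word> carries start/end times and the visemes spoken within it.
    words = [
        (
            float(w.get("start")),
            float(w.get("end")),
            [v.get("type") for v in w.findall("viseme")],
        )
        for w in root.findall("word")
    ]
    return {
        "character": fields[1],
        "request_id": fields[2],
        "status": fields[3].rstrip(":"),  # "OK:" in the example
        "sound_file": sound.get("name") if sound is not None else None,
        "words": words,
    }
```

Only top-level word elements are collected here; replies without word elements, such as a flat viseme list, yield an empty words list.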

MSSpeechRelay Example:

Actual message sent to MSSpeech:

<speak version="1.0" xml:lang="en-US">
    Hello world.  Testing Text to Speech .
</speak>

Sent by TTS Engine:

RemoteSpeechReply doctor 1 OK:
<?xml version="1.0" encoding="UTF-8"?>
<speak>
   <soundFile name="d:\edwork\vhtoolkit\data\cache\audio\utt_20110528_180527_doctor_1.wav"/>
   <viseme start="0" type="_"/>
   <viseme start="0.003" type="Oh"/>
   <viseme start="0.047" type="Ih"/>
   <viseme start="0.098" type="D"/>
   <viseme start="0.258" type="Oh"/>
   <viseme start="0.418" type="Oh"/>
   <viseme start="0.479" type="Er"/>
   <viseme start="0.54" type="R"/>
   <viseme start="0.601" type="D"/>
   <viseme start="0.695" type="D"/>
   <viseme start="0.745" type="_"/>
   <viseme start="1.367" type="_"/>
   <viseme start="1.37" type="D"/>
   <viseme start="1.461" type="Ih"/>
   <viseme start="1.546" type="Z"/>
   <viseme start="1.6" type="D"/>
   <viseme start="1.654" type="Ih"/>
   <viseme start="1.729" type="KG"/>
   <viseme start="1.804" type="D"/>
   <viseme start="1.9" type="Ih"/>
   <viseme start="2.022" type="KG"/>
   <viseme start="2.087" type="Z"/>
   <viseme start="2.16" type="D"/>
   <viseme start="2.233" type="D"/>
   <viseme start="2.297" type="Oh"/>
   <viseme start="2.341" type="Z"/>
   <viseme start="2.425" type="BMP"/>
   <viseme start="2.509" type="Ih"/>
   <viseme start="2.606" type="j"/>
   <viseme start="2.73" type="_"/>
</speak>
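
The visemes in this reply carry only a start time. One way a consumer might turn them into intervals is sketched below, assuming each viseme is held until the next one starts. That interpretation is an assumption of this sketch, not something this page states, and viseme_intervals is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def viseme_intervals(speak_xml):
    """Return (type, start, end) tuples; end is None for the final viseme.

    Assumes a viseme lasts until the next viseme's start time.
    """
    root = ET.fromstring(speak_xml.strip().encode("utf-8"))
    # iter() walks the whole tree, so visemes nested inside <word> are included.
    visemes = [(float(v.get("start")), v.get("type")) for v in root.iter("viseme")]
    visemes.sort(key=lambda pair: pair[0])  # stable sort keeps ties in document order

    intervals = []
    for i, (start, vtype) in enumerate(visemes):
        end = visemes[i + 1][0] if i + 1 < len(visemes) else None
        intervals.append((vtype, start, end))
    return intervals
```

Because the helper uses iter(), it handles both a flat list of visemes and replies that group visemes under word elements.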

Sent by TTS Engine (CerevoiceRelay Example) (hand-formatted):

RemoteSpeechReply doctor 1 OK:
<?xml version="1.0" encoding="UTF-8"?>
<speak>
   <soundFile name="d:\edwork\saso\data\cache\audio\utt_20110621_192933_doctor_1.wav"/>
   <viseme start="0.000000" type="_"/>
   <mark name="sp1:T0" time="0.010975"/>
   <mark name="sp1:T1" time="0.010975"/>
   <word end="2.468209" start="0.010975">
      <viseme start="0.010975" type="Ih"/>
      <viseme start="0.090975" type="Ih"/>
      <viseme start="0.120952" type="D"/>
      <viseme start="0.231157" type="Oh"/>
      <viseme start="0.430088" type="OO"/>
      <viseme start="0.527008" type="Er"/>
      <viseme start="0.663673" type="D"/>
      <viseme start="0.723719" type="D"/>
      <viseme start="0.768662" type="D"/>
      <viseme start="0.848662" type="Ih"/>
      <viseme start="0.948662" type="Z"/>
      <viseme start="1.113696" type="D"/>
      <viseme start="1.173651" type="Ih"/>
      <viseme start="1.223510" type="NG"/>
      <viseme start="1.357624" type="D"/>
      <viseme start="1.431655" type="Ih"/>
      <viseme start="1.511610" type="KG"/>
      <viseme start="1.566621" type="Z"/>
      <viseme start="1.636644" type="D"/>
      <viseme start="1.696644" type="Oh"/>
      <viseme start="1.833379" type="Z"/>
      <viseme start="1.958231" type="BMP"/>
      <viseme start="2.028209" type="EE"/>
      <viseme start="2.188209" type="j"/>
   </word>
   <mark name="sp1:T2" time="2.468209"/>
   <mark name="sp1:T3" time="2.468209"/>
   <viseme start="2.468209" type="_"/>
</speak>

Sent by TTS Engine (FestivalRelay Example) (hand-formatted):

RemoteSpeechReply doctor 1 OK:
<?xml version="1.0" encoding="UTF-8"?>
<speak>
   <soundFile name="d:\edwork\vhtoolkit\bin\FestivalRelay\data\cache\festival\utt_20110621_193643_doctor_1.wav"/>
   <viseme start="0.000000" type="_" />
   <mark name="T0" time="0.080000"/>
   <word end="1.920000" start="0.080000" >
      <viseme start="0.080000" type="Ih" />
      <viseme start="0.160000" type="Ih" />
      <viseme start="0.240000" type="D" />
      <viseme start="0.320000" type="Oh" />
      <viseme start="0.400000" type="Er" />
      <viseme start="0.440000" type="R" />
      <mark name="T1" time="0.480000"/>
   </word>
   <mark name="T2" time="0.080000"/>
   <word end="1.920000" start="0.080000" >
      <viseme start="0.480000" type="D" />
      <viseme start="0.560000" type="D" />
      <viseme start="0.640000" type="D" />
      <viseme start="0.720000" type="Ih" />
      <viseme start="0.800000" type="Z" />
      <viseme start="0.880000" type="D" />
      <viseme start="0.960000" type="Ih" />
      <viseme start="1.040000" type="NG" />
      <viseme start="1.120000" type="D" />
      <viseme start="1.200000" type="Ih" />
      <viseme start="1.280000" type="KG" />
      <viseme start="1.360000" type="Z" />
      <viseme start="1.440000" type="D" />
      <viseme start="1.520000" type="Ao" />
      <viseme start="1.600000" type="Z" />
      <viseme start="1.680000" type="BMP" />
      <viseme start="1.760000" type="EE" />
      <viseme start="1.840000" type="j" />
      <mark name="T3" time="1.920000"/>
   </word>
   <viseme start="1.920000" type="_" />
</speak>
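
The mark elements in the Cerevoice and Festival replies can be collected the same way. A minimal sketch, assuming only the name and time attributes shown in the examples above; mark_times is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def mark_times(speak_xml):
    """Return (name, time) pairs for every <mark>, in document order."""
    root = ET.fromstring(speak_xml.strip().encode("utf-8"))
    # Marks may sit beside <word> elements or inside them, so search the whole tree.
    return [(m.get("name"), float(m.get("time"))) for m in root.iter("mark")]
```

A list of pairs is returned rather than a dictionary so that document order is preserved and repeated names, if any occur, are not silently dropped.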