Overview

AcquireSpeech is the integrated speech recognition interface for the Toolkit. It provides real-time monitoring, transcripts, and recording, and it also allows direct text input and playback of recorded speech samples. It has been designed with a focus on configurability and usability, so that it can accommodate different speech recognition systems and usage scenarios. AcquireSpeech is not a speech recognizer itself, but a tool that connects the sound input on your computer to the included speech recognition server. By default, the Toolkit uses PocketSphinx, but AcquireSpeech is compatible with a number of ASR systems.

Quick facts:

Users

AcquireSpeech is the key component for interacting with the Toolkit.  AcquireSpeech is designed for use with a microphone, but also supports text input of user speech and the use of prerecorded speech samples.

 

Launching AcquireSpeech

AcquireSpeech launches by default when you press 'Launch' under the 'Run It All' group in the Virtual Humans Launcher.

To launch AcquireSpeech manually, expand the Launcher options with the 'Advanced' button and press 'Launch', which is found next to 'Speech Recognition' in the 'Input/Output' tab. This launches both the PocketSphinx Wrapper and AcquireSpeech. Typically, there is a slight delay while PocketSphinx loads before AcquireSpeech connects to it.

Setting up a Microphone

AcquireSpeech automatically selects your default audio input at startup and usually does not require manual configuration. If needed, the input can be changed through either a configuration file or the Settings tab.

Speaking to a Character

The Recorder tab is the primary interface for AcquireSpeech.  Go there to start a session, monitor your microphone level, and/or trigger recording with the press-to-talk 'Speak' button.

To start an AcquireSpeech session, click the green 'Start Session' button in the upper right, or press Ctrl+r. Once the session has started, the large 'Speak' button at the top of the tab brightens and becomes interactive. While it is held down, the 'Speak' button turns bright green.

To say something to the Toolkit, press the 'Speak' button and hold it down while speaking. The horizontal level meter shows the input level as you speak. Green indicates a good level; if the meter shows a lot of red or yellow, try turning down the microphone volume. When finished speaking, release the button. The spoken utterance, or the ASR's best guess at it, appears in the 'Text' area below. If necessary, the speaker can be changed with the drop-down box. The default is "user" throughout the Toolkit.

To lock the mouse to the 'Speak' button, such as during a presentation, press Ctrl+m. Press Ctrl+m again to unlock the cursor when you are finished. While the mouse is locked, the 'Speak' button darkens to either dark gray or dark green.

To stop a session, either to end a Toolkit session or to change settings, click the red 'Stop Session' button.

 

Configuring AcquireSpeech

Message API

Receives

vrAllCall

In response, AcquireSpeech broadcasts its VHMsg ID, which is set to asr.

vrKillComponent (all | asr)

Stops AcquireSpeech. If a recording is running, AcquireSpeech will attempt to shut it down gracefully.
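As a rough illustration of the (all | asr) argument, the following Python sketch shows how a component could decide whether a vrKillComponent message applies to it. The function name should_stop and the whitespace-separated message layout are assumptions made for this example; this is not the AcquireSpeech implementation.

```python
def should_stop(message: str, component_id: str = "asr") -> bool:
    """Return True if a vrKillComponent message targets this component.

    Assumes the message is plain text with whitespace-separated fields,
    e.g. "vrKillComponent all" or "vrKillComponent asr".
    """
    parts = message.split()
    if len(parts) < 2 or parts[0] != "vrKillComponent":
        return False
    # "all" addresses every component; otherwise the ID must match ours.
    return parts[1] in ("all", component_id)
```

With this sketch, both "vrKillComponent all" and "vrKillComponent asr" are treated as a request to stop, while a kill message addressed to another component ID is ignored.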

acquireSpeech action <action description in XML>

Tells AcquireSpeech to perform the specified action. The action XML schema can be found in src/java/acquirespeech.xsd. The action takes two attributes:

1. targetComponentID defines the component to apply this action to; if the ID is missing, the action is applied to the overall model structure

2. command defines the action command; see the list of action commands for the complete reference
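A minimal Python sketch of composing such a message from the two attributes described above. The element name "action", the helper name build_action_message, and the command value used in the usage note are all placeholders assumed for illustration; the actual element and command names are defined by acquirespeech.xsd and the list of action commands.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_action_message(command: str,
                         target_component_id: Optional[str] = None) -> str:
    """Compose an 'acquireSpeech action' message body.

    The element name "action" is an assumption for this sketch;
    consult acquirespeech.xsd for the real schema.
    """
    element = ET.Element("action")
    # targetComponentID is optional: when it is omitted, the action is
    # applied to the overall model structure.
    if target_component_id is not None:
        element.set("targetComponentID", target_component_id)
    element.set("command", command)
    xml_text = ET.tostring(element, encoding="unicode")
    return f"acquireSpeech action {xml_text}"
```

For example, build_action_message("exampleCommand", "exampleComponent") yields a message that carries both attributes, while build_action_message("exampleCommand") leaves out targetComponentID. Both names are hypothetical and stand in for real values.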

Output

AcquireSpeech sends out events that transmit recognition results and notifications about the state of recording.

vrSpeech

See the vrSpeech documentation.

acquireSpeech startedListening inputComponentID sessionID utteranceID recordingTimeInMilliseconds

This message is sent when an utterance recording starts.

acquireSpeech stoppedListening inputComponentID sessionID utteranceID recordingTimeInMilliseconds

This message is sent when an utterance recording stops.

acquireSpeech info message

This message is sent when a group component selection is changed.

acquireSpeech startedSession inputComponentID sessionID timeInMilliseconds

This message is sent when a session recording starts.

acquireSpeech stoppedSession inputComponentID sessionID timeInMilliseconds

This message is sent when a session recording stops.
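The listening and session notifications above share a simple positional layout, so a receiving component could parse them along the following lines. This Python sketch assumes the fields are whitespace-separated, that IDs contain no spaces, and that the time field is an integer; the class and function names are hypothetical and not part of the Toolkit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcquireSpeechEvent:
    event: str
    input_component_id: str
    session_id: str
    utterance_id: Optional[str]  # only present for listening events
    time_ms: int


def parse_event(message: str) -> Optional[AcquireSpeechEvent]:
    """Parse a listening or session notification; return None otherwise."""
    parts = message.split()
    if len(parts) < 2 or parts[0] != "acquireSpeech":
        return None
    event = parts[1]
    # startedListening / stoppedListening carry an utterance ID.
    if event in ("startedListening", "stoppedListening") and len(parts) == 6:
        return AcquireSpeechEvent(event, parts[2], parts[3], parts[4],
                                  int(parts[5]))
    # startedSession / stoppedSession do not.
    if event in ("startedSession", "stoppedSession") and len(parts) == 5:
        return AcquireSpeechEvent(event, parts[2], parts[3], None,
                                  int(parts[4]))
    return None
```

Messages that do not match one of the four layouts, such as acquireSpeech info, are returned as None so the caller can handle them separately.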

Known Issues

FAQ

See main FAQ.