
Overview

The Google and AT&T Automatic Speech Recognition (ASR) component is a Java application that acts as a speech server, using either Google's or AT&T's speech recognition engine depending on which one you select. Both implementations share the same code: a single jar that implements the sonic protocol.

They are both web services and therefore require an internet connection.

Neither service provides partial interpretations in real time, only the final interpretation. This component is used together with Acquirespeech. Audio is accumulated while the push-to-talk button is pressed and is sent to the server when the button is released. The results are then received and communicated back to Acquirespeech.
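The accumulate-then-send behavior described above can be sketched as follows. This is a minimal illustration, not the component's actual code: the class name PushToTalkBuffer and its methods are hypothetical, and the network call to the recognition service is left out.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical sketch: collects audio chunks while push-to-talk is held
 * and hands back the whole utterance when the button is released.
 */
class PushToTalkBuffer {
    private final ByteArrayOutputStream audio = new ByteArrayOutputStream();
    private boolean pressed = false;

    /** Button pressed: discard any stale audio and start accumulating. */
    void press() {
        audio.reset();
        pressed = true;
    }

    /** Called for every incoming audio chunk; ignored unless the button is held. */
    void onAudioChunk(byte[] chunk) {
        if (pressed) {
            audio.write(chunk, 0, chunk.length);
        }
    }

    /** Button released: return the accumulated audio, ready to send to the service. */
    byte[] release() {
        pressed = false;
        byte[] utterance = audio.toByteArray();
        audio.reset();
        return utterance;
    }
}
```

Because the whole utterance is only handed over on release, a caller of a buffer like this receives a single final result per button press, which matches the lack of partial interpretations.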

Quick facts:

  • Location: core/GoogleASR
  • Language: Java
  • Distribution: source

Users

Using the AT&T ASR

The AT&T service requires a subscription and allows for custom language models to be trained. Once you subscribe, you will receive a username and password to access your account.

You can find more details on how to subscribe and use the ASR engine here.

To create a language model:

  1. Prepare a text file containing the utterances you want to be recognized, one utterance per line.
  2. Select 'Manage grammar files'.
  3. Select upload and choose the text file (it must have the .train extension).
  4. Compile it.
  5. Optionally, rename it to any other ID you want to use.
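The utterance file from step 1 can be generated programmatically. The sketch below is one possible way to do it; the class name TrainFileWriter is hypothetical, and the UTF-8 encoding and whitespace normalization are assumptions, since the expected encoding is not stated here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that writes one utterance per line to a .train file. */
class TrainFileWriter {

    /**
     * Writes the utterances to dir/modelName.train and returns the file path.
     * Blank utterances are skipped, and internal whitespace (including line
     * breaks) is collapsed so each utterance stays on a single line.
     */
    static Path write(Path dir, String modelName, List<String> utterances) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String utterance : utterances) {
            String cleaned = utterance.trim().replaceAll("\\s+", " ");
            if (!cleaned.isEmpty()) {
                lines.add(cleaned);
            }
        }
        Path file = dir.resolve(modelName + ".train");
        // UTF-8 is an assumption; adjust if the service expects another encoding.
        return Files.write(file, lines, StandardCharsets.UTF_8);
    }
}
```

For example, calling TrainFileWriter.write(dir, "demo", utterances) produces a file named demo.train that can then be uploaded under 'Manage grammar files'.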

Note: To use the AT&T ASR from the Toolkit launcher, select "ATNT" as your speech server and your client type. You will need to provide your username and password as command line parameters, which can be entered in the space provided for them in the launcher.

 

Using the Google ASR

The Google ASR requires no additional setup and cannot be customized (but the current settings work pretty well). Its major limitation is that it only processes audio up to 10 seconds long.
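To get a feel for what the 10 second limit means in bytes, the sketch below computes the byte budget for raw PCM audio and splits a longer recording into pieces that fit. The class name AudioLimit is hypothetical, and the audio format in the usage note (16 kHz, 16-bit, mono) is an assumption for illustration, since the format the component actually sends is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for keeping raw PCM audio under a maximum duration. */
class AudioLimit {

    /** Number of bytes of raw PCM audio that fit in the given number of seconds. */
    static int maxBytes(int sampleRateHz, int bytesPerSample, int channels, int seconds) {
        return sampleRateHz * bytesPerSample * channels * seconds;
    }

    /** Splits raw PCM audio into consecutive pieces of at most maxBytes each. */
    static List<byte[]> split(byte[] pcm, int maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        List<byte[]> parts = new ArrayList<>();
        for (int start = 0; start < pcm.length; start += maxBytes) {
            int end = Math.min(pcm.length, start + maxBytes);
            parts.add(Arrays.copyOfRange(pcm, start, end));
        }
        return parts;
    }
}
```

Under the assumed format, AudioLimit.maxBytes(16000, 2, 1, 10) gives 320000 bytes. Note that splitting at a fixed byte offset can cut a word in half, so keeping utterances short is usually the better option.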

Command line parameters

-h display help
-l <arg> The language model ID to be used in the AT&T ASR
-m <arg> Maximum audio length (seconds) before sending request to service
-p <arg> Port to listen on for sonic activity
-t <arg> Type of ASR to run: G for Google (default) or A for AT&T
-u <arg> The UID used to access the AT&T MashUp site

usage: [-p TCP port  ][-m seconds ] [-h] [-t G|A (G is default)] [-u AT&T_MashUp_UID ] [-l Language model ID for AT&T ]
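As an illustration of how the flags listed above combine, the sketch below assembles argument lists for the two engines. The class name AsrArgs is hypothetical, and the port, UID, and language model ID in the usage note are placeholder values; only the flags themselves come from the help text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles command line arguments for the speech server. */
class AsrArgs {

    /** Arguments for the Google engine: type, port, and maximum audio length in seconds. */
    static List<String> google(int port, int maxSeconds) {
        List<String> args = new ArrayList<>();
        args.add("-t");
        args.add("G");
        args.add("-p");
        args.add(Integer.toString(port));
        args.add("-m");
        args.add(Integer.toString(maxSeconds));
        return args;
    }

    /** Arguments for the AT&T engine: type, port, MashUp UID, and language model ID. */
    static List<String> att(int port, String uid, String languageModelId) {
        List<String> args = new ArrayList<>();
        args.add("-t");
        args.add("A");
        args.add("-p");
        args.add(Integer.toString(port));
        args.add("-u");
        args.add(uid);
        args.add("-l");
        args.add(languageModelId);
        return args;
    }
}
```

For example, String.join(" ", AsrArgs.att(5000, "myUid", "myModel")) yields the string "-t A -p 5000 -u myUid -l myModel", which can be pasted into the launcher's command line parameter field.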

Known Issues

  • The Google ASR only processes audio up to a maximum of 10 seconds long.

FAQ

See the Main FAQ. Please use the Google Groups mailing list for unlisted questions.
