the process of creating a character is iterative in nature. The following steps create an initial character that will probably need to be refined by repeating the steps as required to obtain the desired behavior.
Three actions are required to create the content necessary to drive the natural language component:
The entire information that defines a character, e.g. CakeVendor, for the FLoReS system is defined in a set of files sitting in the directory resources/characters/CakeVendor/
This directory contains three sub-directories that parallel the 3 steps defined above:
Authoring the content consists of editing 2 files. One for the user utterances and one for the system utterances. The file that contains the user utterances is basically the training data for the natural language understanding (NLU) module. The system utterance file instead contains the utterances that the character can say.
The NLU module given an utterance returns the most probably identifying strings. It's based on a maximum entropy multiclass classifier and therefore the user utterance file should list utterances maintaining their natural frequency. That is, the best way to obtain these utterances is by running wizard of oz experiments or role plays. Then annotate the data by assigning to each utterance said by a user during these experiments an identifying string. These identifying strings are sometime called speech acts or dialogue acts (in case more domain specific semantic is attached to the basic speech act). Examples of dialogue acts are: question.age to mark all utterances in which the user is asking about the age of the addressee.
When we design a dialogue policy for the character using this content, whenever we want to wait for the user to say a certain utterance, we will use the string identifier (speech act) associated to that utterance.
these uttearnces
The user and system utterances files use the same Excel spreadsheet format. These files have a number of columns (these are the initial 2 rows of the system utterances file for the character used in the example below):
UTTERANCE_ID | VERSION | CHARACTER | STATE | SPEECH_ACT | TEXT |
statement.not-understand | I'm sorry, I didn't understand what you said. Please try to rephrase it. | ||||
greeting.hello | Hello |
The only 2 columns of relevance are SPEECH_ACT and TEXT.
SPEECH_ACT contains the string identifier for the corresponding utterance found in the TEXT column.
The user utterance file needs to be called user-utterances.xlsx and the system utterance file must be called system-utterances.xlsx (these names can be configured, but the default configuration looks for those names in each character available).
CakeVendor.zip contains all is required to define a CakeVendor character that is an extension of the character created in this other tutorial for NPCEditor.