Introduction

The process of creating a character is iterative. The following steps create an initial character that will probably need to be refined by repeating the steps until the desired behavior is obtained.

Three actions are required to create the content necessary to drive the natural language component:

  1. Create an initial set of system and user utterances. Each utterance must be associated with an identifying string (like a name); the natural language understanding and dialogue management modules use these identifiers instead of the actual utterances.
  2. Create a dialogue policy. This policy consists of:
    1. the set of variables that constitute the dialogue state, also called the information state. The dialogue manager is information-state based: it decides what to say based on what the user said and the current state of the conversation as represented in the dialogue state.
    2. the set of sub-dialogue networks. These sub-dialogues are similar to planning operators, with preconditions and effects. The dialogue manager (DM) selects one based on what the user said and the current dialogue state. Each sub-dialogue typically carries out a short portion of an entire dialogue (e.g. the greeting phase, or answering a question).
  3. Train the natural language understanding module to map a given utterance to one of the known identifying strings defined in the first step.

All the information that defines a character (e.g. CakeVendor) for the FLoReS system is stored in a set of files in the directory resources/characters/CakeVendor/

This directory contains three sub-directories that parallel the 3 steps defined above:

  • content: this directory contains the files that define the user and system utterances and their identifying strings.
  • dm: this directory contains the dialogue policy
  • nlu: this directory contains the natural language understanding model learned by the corresponding module from the data found in the content directory.
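
As a quick sanity check of the layout listed above, the following minimal sketch reports which of the three sub-directories are missing from a character directory. The function name is hypothetical and is not part of FLoReS; only the directory names come from this page.

```python
from pathlib import Path

# Sub-directory names taken from the list above.
EXPECTED_SUBDIRS = ("content", "dm", "nlu")


def missing_subdirs(character_dir):
    """Return the expected sub-directories that are absent from character_dir."""
    root = Path(character_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

For example, missing_subdirs("resources/characters/CakeVendor") returns an empty list when the character directory is complete.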

Step 1: Authoring the content

Authoring the content consists of editing 2 files: one for the user utterances and one for the system utterances. The file that contains the user utterances is essentially the training data for the natural language understanding (NLU) module. The system utterance file contains the utterances that the character can say.

User utterances:

Given an utterance, the NLU module returns the most probable identifying string. It is based on a maximum entropy multiclass classifier, so the user utterance file should list utterances in a way that preserves their natural frequency. The best way to obtain these utterances is to run Wizard of Oz experiments or role plays, and then annotate the data by assigning an identifying string to each utterance said by a user during these experiments. These identifying strings are sometimes called speech acts, or dialogue acts when more domain-specific semantics is attached to the basic speech act. An example of a dialogue act is question.age, which marks all utterances in which the user asks about the age of the addressee.

When we design a dialogue policy for the character using this content, whenever we want to wait for the user to say a certain utterance, we use the string identifier (speech act) associated with that utterance.
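
To illustrate what "maintaining the natural frequency" means for the annotated data, here is a minimal sketch that computes the relative frequency of each speech act in a small annotated transcript. The utterance texts are invented for illustration; repeated utterances are kept on purpose, since removing duplicates would change the frequencies the classifier sees.

```python
from collections import Counter

# Hypothetical annotated transcript: (utterance text, speech act) pairs.
ANNOTATED = [
    ("how old are you", "question.age"),
    ("what's your age", "question.age"),
    ("hello", "greeting.hello"),
    ("how old are you", "question.age"),
]


def speech_act_frequencies(pairs):
    """Relative frequency of each speech act in the annotated data."""
    counts = Counter(act for _, act in pairs)
    total = sum(counts.values())
    return {act: n / total for act, n in counts.items()}
```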

System utterances:

These utterances are the ones the system can say. As with the user utterances, each utterance has a specific string identifier. When designing the dialogue policy, if we want the system to say a certain utterance, we use the corresponding identifier (here too we call the identifier a speech act).

File format:

The user and system utterances files use the same Excel spreadsheet format. These files have a number of columns (these are the initial 2 rows of the system utterances file for the character used in the example below):

UTTERANCE_ID | VERSION | CHARACTER | STATE | SPEECH_ACT | TEXT
             |         |           |       | statement.not-understand | I'm sorry, I didn't understand what you said. Please try to rephrase it.
             |         |           |       | greeting.hello | Hello

The only 2 columns of relevance are SPEECH_ACT and TEXT.

SPEECH_ACT contains the string identifier for the corresponding utterance found in the TEXT column.

The user utterance file needs to be called user-utterances.xlsx and the system utterance file must be called system-utterances.xlsx (these names can be configured, but the default configuration looks for those names in each character available).
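
Since only SPEECH_ACT and TEXT matter, the content of either file can be thought of as a mapping from speech acts to texts. The sketch below assumes the spreadsheet rows have already been read into dictionaries keyed by column name (reading the .xlsx file itself is not shown), and it assumes that several rows may share a speech act. The function name is hypothetical.

```python
# Rows as they might look once read from system-utterances.xlsx.
ROWS = [
    {"SPEECH_ACT": "statement.not-understand",
     "TEXT": "I'm sorry, I didn't understand what you said. Please try to rephrase it."},
    {"SPEECH_ACT": "greeting.hello", "TEXT": "Hello"},
]


def texts_by_speech_act(rows):
    """Group the TEXT values by SPEECH_ACT, ignoring all other columns."""
    table = {}
    for row in rows:
        table.setdefault(row["SPEECH_ACT"], []).append(row["TEXT"])
    return table
```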

Step 2: Authoring the dialogue policy

Overview

The FLoReS (Forward Looking, Reward Seeking) dialogue manager is an information-state and event-driven dialogue manager: it does nothing unless an event is received. When an event is received, it searches for the best action (i.e. sub-dialogue), that is, the one that can be executed in the current information state and achieves the highest expected reward. Once the best action is found, it starts executing it, unless:

  • the current action in execution is the same as the best action found. In that case the dialogue manager simply continues to execute the action.
  • there is no best action. That can happen in 2 cases:
    • there are no actions that can be executed given the current event and information state.
    • all executable actions have negative expected rewards

As mentioned earlier the dialogue manager searches for the best available action every time an event comes in.
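
The selection rule described above can be sketched as follows. This is only an illustration of the rule as stated on this page, not the FLoReS implementation: the function name and the returned labels ("start", "continue", "none") are hypothetical, and the expected rewards are assumed to have been computed already.

```python
def select_best_action(expected_rewards, current_action=None):
    """Decide what to do after an event.

    expected_rewards maps each executable action name to its expected reward.
    Returns (action, decision) where decision is "start", "continue" or "none".
    """
    # No action can be executed in the current event and information state.
    if not expected_rewards:
        return None, "none"
    best = max(expected_rewards, key=expected_rewards.get)
    # All executable actions have negative expected rewards.
    if expected_rewards[best] < 0:
        return None, "none"
    # The best action is already running: keep executing it.
    if best == current_action:
        return best, "continue"
    return best, "start"
```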

Operators (sub-dialogues or actions):

Actions are also called sub-dialogues and define dialogue trees. For example this is one sub-dialogue found in the CakeVendor example below:

These sub-dialogue trees define a small, self-contained portion of conversation. The criteria for deciding what should be a sub-dialogue are similar to those for deciding what should be a function or method in a programming language: generality and reusability.

For example, the sub-dialogue above finds out whether the system can sell the user a cake with normal sugar or with Xylitol, by collecting information about whether the user has many cavities or has diabetes. Because the utterances found in this sub-dialogue can occur only in that specific context, it makes sense to keep them in the same sub-dialogue.

A sub-dialogue can be in 3 states: ACTIVE, INACTIVE and DORMANT.

At any time there is at most 1 active sub-dialogue in the system: the current action. As said above, in some cases there may be no active action. Actions are normally inactive. An action becomes dormant when it was active and was substituted (swapped out) by another action before its natural termination: at some point the system found a better action, changed the state of the current action to dormant, and made the newly found best action active. Not every active action that is swapped out for a new best action can become dormant; some go directly back to the inactive state. To be allowed to become dormant, an action must have special entry paths that allow it to be awoken back to the active state if it becomes the best action again.
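
The three states and the swap-out rule can be summarized in a small sketch. The names State, swap_out and switch_action are hypothetical; the only behavior modeled is what the paragraph above states: a swapped-out action becomes dormant only if it has re-entry paths, and at most one action is active.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    DORMANT = "dormant"


def swap_out(has_reentry_paths):
    """State of an active sub-dialogue replaced before its natural termination."""
    return State.DORMANT if has_reentry_paths else State.INACTIVE


def switch_action(states, with_reentry, new_action):
    """Make new_action the only active sub-dialogue (states is updated in place).

    states maps sub-dialogue names to State; with_reentry is the set of names
    of the sub-dialogues that define re-entry paths.
    """
    for name, state in list(states.items()):
        if state is State.ACTIVE and name != new_action:
            states[name] = swap_out(name in with_reentry)
    states[new_action] = State.ACTIVE
```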

Entry paths:

A sub-dialogue has multiple entry paths. The entry paths have a specific order (decided by the author), and each entry path has conditions that regulate when it can be taken, as well as a start state. When the system considers a certain sub-dialogue during the search for the best available action, it considers all the possible entry paths in the order specified. The first one whose conditions are satisfied is taken, and execution of the action starts at the specified start state in the sub-dialogue tree.

The possible types of entry paths are:

  1. user event entry paths: a user event entry path can be taken only when one of the specified events is received. A user event entry path can also have an optional condition on the information state. This optional condition is a Boolean expression over information state variables.
  2. system initiative entry paths: these entry paths have no events associated with them. They can have an optional information state condition, as for user event entry paths. These entry paths give a sub-dialogue the possibility of being initiated by the system no matter what event has been received. For example, one can define a periodic timer event that wakes up the dialogue manager periodically even if the user is inactive.
  3. re-entry paths: these paths allow an action that has become dormant to become active again. Like the others, each re-entry path can have an optional information state condition. It can also have an optional system utterance identifier associated with it; this system utterance is said when the associated re-entry path is taken. An example of a system utterance appropriate for a re-entry path is something like "coming back to where we were..." or "so, I mentioned earlier...".

We also refer to the entry paths and their conditions as preconditions, the name traditionally used by the planning community.
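
The "first satisfied entry path wins" rule can be sketched as follows. This is a simplified illustration under stated assumptions: EntryPath and first_entry_path are hypothetical names, re-entry paths are left out, and a path with no events stands for a system initiative entry path.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EntryPath:
    start_state: str
    # Events that enable the path; None means system initiative (any event).
    events: Optional[frozenset] = None
    # Optional Boolean condition on the information state.
    condition: Callable[[dict], bool] = lambda info_state: True


def first_entry_path(paths, event, info_state):
    """Return the first path, in author order, whose conditions are satisfied."""
    for path in paths:
        event_ok = path.events is None or event in path.events
        if event_ok and path.condition(info_state):
            return path
    return None
```

With a user event entry path followed by a system initiative one, the second acts as a fallback whenever the first is not satisfied.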

Nodes and edges:

The edges of a sub-dialogue tree are of three types:

  1. user edges: these edges tell the system to wait for a certain event before traversing them. If a state has one outgoing edge that is a user edge, then all outgoing edges of that state are user edges. This property makes the state a user waiting state that blocks the execution of the action until the user produces one of the events on the outgoing user edges.
  2. system edges: when traversed, these edges make the system say a particular utterance. System edges take time to traverse: the time needed to play the associated system utterance (this waiting can be disabled by configuration, but the default is to wait until the animation associated with a system edge has finished playing).
  3. condition edges: these edges are used to connect states when we don't want to wait for an event and we don't want the virtual human to say anything.

Effects:

Nodes can have effects. There are two types of effects:

  1. an information state update. That is, changing the value of some variable in the information state when the node containing the effect is entered.
  2. a reward. A reward can be a numeric constant or an expression returning a number. When the state containing a reward is reached, the system achieves the associated reward. A sub-dialogue can have multiple rewards associated to multiple states.

In the example sub-dialogue given here, the red nodes are states with effects. These states can be inspected to display the particular effects associated with them. This graphical representation of a sub-dialogue is generated for debugging purposes: it is not used to edit the sub-dialogue, only to check that the intended form is correctly generated from the provided information.

End node:

Each sub-dialogue is terminated when the execution path reaches a node that has no more outgoing edges.

Final sub-dialogue:

Each sub-dialogue can be marked final. That means that when the end node of a final sub-dialogue is reached, the conversation ends. When the conversation ends the DM will ignore all events and the user will not be able to interact with the virtual character anymore.

Execution

Execution of a sub-dialogue consists of taking a certain entry path (the one that leads to the maximum expected reward) and then, at every node, taking the first outgoing edge that can be taken (that is, whose condition is satisfied; the order is from left to right and is specified by the author) until a waiting point is reached: a user state (i.e. a state with outgoing user edges). At that point the dialogue manager suspends the execution and waits for the next event. If the incoming event is one of the expected events (i.e. the events specified in the user edges), the execution continues along the first satisfied user edge. If the final node is reached, the sub-dialogue terminates and becomes inactive, and the system searches for a new optimal action to start executing.
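
The execution loop described above can be sketched as follows. The data layout (a dictionary from node names to ordered edge lists), the function names and the status labels are all assumptions made for this illustration; effects, rewards and the time taken by system edges are not modeled.

```python
def run_until_wait(tree, node, info_state, say):
    """Follow system and condition edges from node until a user state or an end node.

    tree maps a node name to its ordered list of outgoing edges. Each edge is a
    dict with "type" ("user", "system" or "condition") and "target", plus
    optionally "utterance", "event" and "condition" (a predicate on info_state).
    Returns (node, status) with status "waiting", "ended" or "blocked".
    """
    while True:
        edges = tree.get(node, [])
        if not edges:
            return node, "ended"          # end node: no outgoing edges
        if edges[0]["type"] == "user":
            return node, "waiting"        # user state: wait for the next event
        for edge in edges:                # author order, first satisfied edge wins
            if edge.get("condition", lambda s: True)(info_state):
                if edge["type"] == "system":
                    say(edge["utterance"])
                node = edge["target"]
                break
        else:
            return node, "blocked"        # hypothetical label: no edge is satisfied


def resume_on_event(tree, node, event, info_state, say):
    """Continue from a user state along the first satisfied user edge for event."""
    for edge in tree[node]:
        satisfied = edge.get("condition", lambda s: True)(info_state)
        if edge["event"] == event and satisfied:
            return run_until_wait(tree, edge["target"], info_state, say)
    return node, "unexpected"
```

Here say is any callable that receives the speech act of the system utterance, for example the append method of a list when testing.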

Information state:

The information state is made up of variables and stores the current state of the conversation. Four things can update the information state:

  1. the dialogue manager (DM) updates a set of special variables (e.g. the time since the last user action). These special variables are listed in a file called specialVariables.xml in the dm sub-directory. The file is automatically generated every time the DM starts.
  2. event listeners: one can associate with certain events automatic updates that are executed every time a particular event is received. These updates are also called stateless updates because they happen regardless of the current action or the best selected action.
  3. effects: as described under Effects above, a sub-dialogue node can have an effect that updates the value of a certain variable.
  4. forward inference rules: one can specify an ordered list of implications. They are evaluated every time a change is made to the information state. When one is found whose antecedent is true, its consequent is executed. For example, given the rule "if A then B else C", if A is true then B is executed, otherwise C is executed. The else part is optional. A is a Boolean expression; B and C are assignments.
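
The "if A then B else C" rules of the last item can be sketched as follows. The rule representation (a predicate plus two assignments) and the variable names in the usage note are hypothetical, and the sketch makes a single pass over the ordered list; it does not model re-running the rules after each change.

```python
def apply_rules(info_state, rules):
    """Run each (antecedent, consequent, alternative) rule once, in order.

    antecedent is a predicate on the state; consequent and alternative are
    (variable, value) assignments, and alternative may be None (no else part).
    """
    for antecedent, consequent, alternative in rules:
        assignment = consequent if antecedent(info_state) else alternative
        if assignment is not None:
            variable, value = assignment
            info_state[variable] = value
    return info_state
```

For instance, a rule (lambda s: s["hasDiabetes"], ("sweetener", "xylitol"), ("sweetener", "sugar")) sets sweetener to one value or the other depending on hasDiabetes.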

Dialogue policy execution

When an event is received, the dialogue manager (DM) checks whether it is expected by the current action (i.e. the current action is at a user node and one of the outgoing user edges is waiting for the received event). If so, the DM continues the execution of the current action. Otherwise it executes a forward search to find the best action to execute. The forward search simulates possible future conversations. It is a breadth-first search limited by time and depth, so it always returns quickly even if the search space is huge. Currently the limits are 250ms or 10 levels: the dialogue manager terminates the search for the optimal action after 250ms, or earlier if the search graph that represents the possible future conversations reaches a depth of 10 sub-dialogues, that is, if within the 250ms timeout the search has explored all possible conversations made up of a sequence of 10 sub-dialogues.

When an event is received, the forward search has two possible scenarios to consider:
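
A breadth-first search bounded by both time and depth, as described above, can be sketched like this. It is a generic illustration, not the FLoReS search: bounded_bfs and successors are hypothetical names, and only the default limits (10 levels, 250ms) come from this page.

```python
import time
from collections import deque


def bounded_bfs(root, successors, max_depth=10, timeout_s=0.25):
    """Visit nodes breadth-first until max_depth is exhausted or the timeout expires.

    successors(node) returns the nodes reachable in one more step. The list of
    visited nodes is returned in visiting order.
    """
    deadline = time.monotonic() + timeout_s
    queue = deque([(root, 0)])
    visited = []
    while queue:
        if time.monotonic() >= deadline:
            break                         # time limit reached: stop with what we have
        node, depth = queue.popleft()
        visited.append(node)
        if depth < max_depth:             # depth limit: do not expand deeper nodes
            for child in successors(node):
                queue.append((child, depth + 1))
    return visited
```

Because shallow nodes are visited first, stopping on the timeout still leaves every shorter sequence of sub-dialogues explored before any longer one.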

  1. Ignore the received event: here the dialogue manager searches for the most promising system initiative operators. Two sub-searches are executed:
    1. consider whether the best operator is to keep the current active operator as it is, and
    2. consider all other system initiative operators.
  2. Handle the received event: here the dialogue manager searches for the most promising operator among those that are dormant or inactive and that handle the received event.

The best action is the one that maximizes the expected reward. More precisely the formula is:

The preconditions are used to limit which sub-dialogues can be executed in a given state. Rewards are used to differentiate among a set of executable sub-dialogues.
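
As a rough illustration of how a discount factor affects the comparison between rewards, the sketch below assumes plain geometric discounting, where a reward reached k steps in the future is multiplied by the discount factor raised to the power k. This is an assumption made for illustration only and is not necessarily the exact formula used by the dialogue manager.

```python
def discounted_reward(rewards, discount=0.9):
    """Sum of rewards[k] * discount**k, where k is the step at which a reward is reached.

    Assumes geometric discounting; rewards[0] is reached immediately.
    """
    return sum(r * discount ** k for k, r in enumerate(rewards))
```

Under this assumption, with a discount of 0.9 a reward of 30 reached two steps later counts as 30 * 0.81 = 24.3, so it still outweighs a reward of 10 reached immediately.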

The policy format

The dialogue policy is composed of several files. The main file that defines it is called policy.xml (this name can also be configured, but this is the default name).

A typical policy.xml file will look like the following:

policy.xml
<policy xmlns:xi="http://www.w3.org/2001/XInclude">
 <xi:include href="initKB.xml"/>
 <xi:include href="goals.xml"/>
 <stepDiscount value="0.9"/>
 <include href="textFormat/policy.txt"/>
</policy>

Line 2 specifies the file used to define and initialize all the variables in the information state.

Line 3 specifies the file that defines the base value of the rewards available in this dialogue policy.

Line 4 specifies the discount factor alpha mentioned in the expected reward formula. The line given above defines a discount factor of 0.9.

Line 5 includes a file that specifies some operators (actions/sub-dialogues) in a particular text format. One could specify the sub-dialogue trees directly in an XML variant, but that is harder, so we prefer to document how to design operators using this special text format. Any number of text format files can be included. When designing complex characters, it is helpful to organize the sub-dialogues in multiple files.

To include multiple files, just duplicate line 5 for each text format file that needs to be included.
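
To see what a policy file with more than one text format file looks like, the following sketch parses such a file with the Python standard library and lists its parts. It only inspects the XML and is not how FLoReS loads a policy; the second included file name (textFormat/qa.txt) and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"

POLICY = """<policy xmlns:xi="http://www.w3.org/2001/XInclude">
 <xi:include href="initKB.xml"/>
 <xi:include href="goals.xml"/>
 <stepDiscount value="0.9"/>
 <include href="textFormat/policy.txt"/>
 <include href="textFormat/qa.txt"/>
</policy>"""


def summarize_policy(xml_text):
    """Return the XML includes, the text format includes and the step discount."""
    root = ET.fromstring(xml_text)
    return {
        "xml_includes": [e.get("href") for e in root.findall(XI + "include")],
        "text_includes": [e.get("href") for e in root.findall("include")],
        "step_discount": float(root.find("stepDiscount").get("value")),
    }
```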

The information state initialization file:

The following example shows the format of the information state initialization file:

Information state initialization file format
<informationState>
   <initialize expr="assign(lastNonNullSubdialog,null)"/>
   <initialize expr="assign(timeSinceLastAction,0)"/>
   <initialize expr="assign(alreadyAsked,false)"/>
...
</informationState>

It is a collection of lines that assign initial values to a set of variables. If a variable is used in a sub-dialogue, it must be defined in this file; otherwise an error is generated indicating which variable is undefined and where it was used.

For example, the line: <initialize expr="assign(lastNonNullSubdialog,null)"/> defines the variable lastNonNullSubdialog and assigns to it the value null.

A note about values:

Variables in the information state can have various types of values:

  • A number like in assign(timeSinceLastAction,0.3)
  • A string like in assign(name,'John')
  • A Boolean like in assign(notTrue,false)
  • No value like in assign(unknown,null)
  • A java object can be assigned but only programmatically
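
The value types listed above can be illustrated with a small reader for the assign(...) expressions shown on this page. It is a sketch that handles only literal values of the four listed kinds; the function name is hypothetical, numbers are returned as floats, and anything else (for example an expression on the right-hand side) raises a ValueError.

```python
import re

ASSIGN = re.compile(r"^assign\((\w+),(.*)\)$")


def parse_assign(expr):
    """Return (variable, value) for a literal assign(variable,value) expression."""
    match = ASSIGN.match(expr.strip())
    if match is None:
        raise ValueError(f"not an assign expression: {expr!r}")
    name, raw = match.group(1), match.group(2).strip()
    if raw == "null":
        return name, None                 # no value
    if raw in ("true", "false"):
        return name, raw == "true"        # Boolean
    if len(raw) >= 2 and raw[0] == raw[-1] == "'":
        return name, raw[1:-1]            # string in single quotes
    return name, float(raw)               # number
```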

The reward definition file:

 

Reward definition file
<goals>
    <goal id="simple" desc="the basic reward" value= "10"/>
    <goal id="quick" desc="reward for something more important" value= "30"/>
...
</goals>

Each <goal> element defines a new reward (we also refer to them as goals, to stick with the planning terminology, even though they are not really goals).

For example, the line <goal id="simple" desc="the basic reward" value= "10"/> defines the reward named simple with description "the basic reward" and value 10. Internally, this line defines a variable named valueFor_simple with value 10. This variable name is used if one wants to change the global value associated with a specific reward at run time (i.e. as an effect of a certain action).
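
The naming convention for the reward variables can be made concrete with a short sketch that reads a reward definition file and lists the variables it implies. The function name is hypothetical and the sketch only mirrors the valueFor_ convention described above; it does not reproduce what the DM does internally.

```python
import xml.etree.ElementTree as ET

GOALS = """<goals>
    <goal id="simple" desc="the basic reward" value= "10"/>
    <goal id="quick" desc="reward for something more important" value= "30"/>
</goals>"""


def reward_variables(xml_text):
    """Map each goal to its valueFor_<id> variable name and numeric value."""
    root = ET.fromstring(xml_text)
    return {"valueFor_" + g.get("id"): float(g.get("value")) for g in root.findall("goal")}
```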

The Text Format used by the files that define the sub-dialogues (aka operators or actions):

This section describes the text format. As mentioned before, the root policy file can include any number of text format files containing sub-dialogue definitions. This feature allows the author to organize the sub-dialogues in some meaningful way.

All files need to be in the format described here.

Examples will be used to illustrate the format. This text defines a sub-dialogue named qamake:

Defining a sub-dialogue
Network qamake {
    #topic: qa
    #entrance condition: current NLU speech act = question.what.you-make
    
    system: answer.what.make
    #goal: simple
}

(Network is one more name for a sub-dialogue.)

First we list the topics associated with this sub-dialogue.
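
To make the parts of the qamake example easier to see, the following sketch pulls out the network name, the "#key: value" lines and the system line. It recognizes only the line shapes that appear in that one example and is not a parser for the text format, whose full grammar is not covered here; the function name is hypothetical.

```python
import re

EXAMPLE = """Network qamake {
    #topic: qa
    #entrance condition: current NLU speech act = question.what.you-make

    system: answer.what.make
    #goal: simple
}"""


def outline(text):
    """Extract the name, the '#key: value' lines and the system speech acts."""
    name = re.search(r"^Network[ \t]+(\w+)[ \t]*\{", text, re.MULTILINE).group(1)
    annotations = re.findall(r"^[ \t]*#([^:\n]+):[ \t]*(.+?)[ \t]*$", text, re.MULTILINE)
    says = re.findall(r"^[ \t]*system:[ \t]*(\S+)[ \t]*$", text, re.MULTILINE)
    return {"name": name, "annotations": annotations, "system": says}
```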

Step 3: Train the natural language understanding module

An example

CakeVendor.zip contains all that is required to define a CakeVendor character that is an extension of the character created in this other tutorial for NPCEditor.

 
