Proceedings of the Association for Computational Linguistics,
Madrid, July 7, 1997.
The ``Casual Cashmere Diaper Bag'': Constraining Speech Recognition Using Examples
| Speaker | Utterance |
|---|---|
| D: | I'd like a soft-sided attache. <displays luggage page> |
| D: | The canvas line. |
| C: | How about kids? |
| B: | Can I see the squall jacket? |
| C: | Could I see the men's clothes? <displays menswear page> |
| C: | Dress shirts. |
| S: | Could we switch to children's clothing. |
| L: | Let's look at some casual dresses. |
| M: | I'd like to see the sweaters please. |
| S: | I'm looking for things from bed and bath. |
| B: | Let's go back to sweaters. |
| B: | Can I go back to the main screen. |
| L: | I'll go back to the womens. |
| A: | I'm looking for a blazer and slacks and skirts to go with it. |
| C: | I need a flat sheet and a fitted sheet in queen. |
We implemented the prototype Lands' End system using our SpeechActs [Martin et al. 1996] system, collecting the relevant semantics from utterances with a simple grammar specifying the allowable phrases.
One over-simplified grammar of such ``item specification'' phrases would allow any basic item (such as ``pants'') to be modified by any combination of meta-style, pattern style, color, size, gender, wearer's age, fabric type, fabric style, and maker's name. A particular sweater could be referred to as ``the petite women's medium dusty sage jewel-neck cashmere fine-knit 'drifter' sweater''. While no one would ever spontaneously utter this monster, we cannot predict which portion of these options will be used in any given utterance. Such an accepting grammar works just fine for extracting the meaning from a written form of the item description, and in fact, is used in the Lands' End system to identify what items are displayed on each ``page'' of the video-accessible catalog.
{leNounPhrase/nosize := [determiners]
[style/sty1][preModifiers][style/sty2]
[sem=style-name][sem=fabric-style]
[sem=material][style/sty3]
leNoun [postModPhrase];
head leNoun;
fabric := material.root;
fabric ^= postModPhrase.material;
index := postModPhrase.index;
fabric-style := fabric-style.root;
fabric-style ^= postModPhrase.fabric-style;
genderCat := preModifiers.genderCat;
genderCat ^= postModPhrase.genderCat;
.....
}
Example UG rule allowing many possible modifiers.
Unfortunately, the perplexity of the grammar produced by the cross product of all these choices is so large that the word accuracy of the speech recognition becomes uselessly low. Phrases that no user would ever utter are ``heard'' by the SR engine; the ``casual cashmere diaper bag'' mentioned in the title of this paper refers to one of the more outrageous combinations that pass muster with this weakly-constraining grammar.
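As a rough illustration of this blow-up, the sketch below counts the phrases admitted when every modifier slot is optional and independent of the others. The slot names follow the text, but the vocabulary sizes and the item count are invented for illustration and are not taken from the Lands' End catalog; the raw phrase count is also only a crude proxy for perplexity.

```python
from math import prod

# Hypothetical vocabulary sizes per optional modifier slot
# (illustrative numbers only, not catalog data).
SLOT_SIZES = {
    "meta-style": 5,
    "pattern-style": 8,
    "color": 30,
    "size": 10,
    "gender": 3,
    "fabric": 20,
}
BASIC_ITEMS = 200  # hypothetical number of basic item nouns


def unrestricted_phrase_count(slot_sizes, basic_items):
    """Count phrases when every slot is optional and independent.

    Each slot contributes (size + 1) choices: one per word, plus "omitted".
    """
    return basic_items * prod(n + 1 for n in slot_sizes.values())


print(unrestricted_phrase_count(SLOT_SIZES, BASIC_ITEMS))
```

Even with these modest made-up sizes the count runs to hundreds of millions, almost all of them phrases no catalog page describes.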
If the lexical entry for every modifier were marked with a feature containing the set of things it could realistically modify (or, better yet, the set of classes of things), then the grammar could be written to allow only the ``reasonable'' combinations and to rule out the ridiculous ones that should be omitted to reduce the perplexity. With a grammar compiler that accepts such restrictions based on features in the lexicon, such a markup appears to be a possible solution. The grammar writer could create and record classes of basic items, noting that ``chinos'' and ``jeans'' were ``tough clothing'' and then only allowing them to be associated with fabrics appropriate for ``tough'' clothes. This strategy would block combinations such as ``lace chinos'' but allow ``silk blouses'' and ``denim jeans.''
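A minimal sketch of such hand-written marking, assuming a lexicon held in plain dictionaries. The class name ``tough-clothing'' follows the text; ``dressy-clothing'', the dictionary names, and the function `allowed` are hypothetical and are not part of the Unified Grammar tools.

```python
# Hypothetical hand-marked lexicon: each basic item belongs to a class,
# and each fabric lists the classes it may realistically modify.
ITEM_CLASS = {
    "chinos": "tough-clothing",
    "jeans": "tough-clothing",
    "blouses": "dressy-clothing",  # class name invented for this sketch
}
FABRIC_MODIFIES = {
    "denim": {"tough-clothing"},
    "silk": {"dressy-clothing"},
    "lace": {"dressy-clothing"},
}


def allowed(fabric, item):
    """True when the fabric is marked as able to modify the item's class."""
    return ITEM_CLASS.get(item) in FABRIC_MODIFIES.get(fabric, set())


for phrase in [("lace", "chinos"), ("silk", "blouses"), ("denim", "jeans")]:
    print(phrase, allowed(*phrase))
```

Every entry in both tables has to be written by hand, which is the cost discussed next.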
The biggest disadvantage of requiring a grammar writer to figure out and record the features that determine allowable modifiers is the large amount of detailed work required to make such annotations. If these markings could be derived automatically from some pre-existing or easily-created data, then the task would be much reduced, and the cost of adding new items to the catalog would be much smaller. (In the particular case of modeling a catalog, the effort required to accommodate each subsequent revision of the items carried is a primary concern -- we don't mind working hard once in a while, but we do not want a new career updating this catalog.)
genderCat = 'womens
fabric = 'cotton
meta-style = 'casual
catalogtype = 'pants
style = 'chino
Example indexing of an item page described as ``women's chino slacks'' and as ``casual cotton pants.''
In the Lands' End example, we already have item descriptions which are part of their standard catalog database. We use these descriptive phrases both to navigate to the item or item collection (such as ``men's jackets'') the user has requested and to verify that the semantic grammar and lexicon will accept the phrases used by the catalog designers. Any new version of the catalog will necessarily already have these phrases created for it; using them additionally for grammar restriction almost automates the update chore for new editions of the catalog.
If the grammar were written incorporating tests to require the lexical markings indicating allowable modifiers, then it would reject any phrase that lacked the needed marks. If such a grammar were used with a ``bare'' lexicon (one lacking these modifier markings), it would not support parsing the page descriptors, and would compile into a speech grammar allowing only bare item names, devoid of any modifiers. We addressed this problem by adding the ability to switch the restrictions on or off, and then turning them off when parsing the (written) page descriptors. (See the example of switched tests in a grammar rule.)
Indexing and then processing the results of all the page descriptor parses provides the information content needed to automatically mark up the lexicon with the compatibility results derived from the page descriptors. Once the lexicon has been enhanced with this information, the restrictions can be turned on while the unified grammar is used by the speech recognizer. In our system, we compile the unified grammar to produce BNF reflecting the restrictions, but logically these restrictions could be applied ``on the fly'' by a speech recognizer or used in post-processing to choose among the n-best alternatives from a less restricted SR. Regardless of how it is implemented, the resultant grammar will not allow ``lace jeans'' simply because no page description phrase mentions any such thing.
{leNounPhrase/nosize := [determiners]
[style/sty1][preModifiers][style/sty2]
[sem=style-name][sem=fabric-style]
[sem=material] [style/sty3]
leNoun [postModPhrase];
head leNoun;
leNoun.cat-type *= material.cat-type-set;
fabric := material.root;
fabric ^= postModPhrase.material;
index := postModPhrase.index;
fabric-style := fabric-style.root;
fabric-style ^= postModPhrase.fabric-style;
genderCat := preModifiers.genderCat;
genderCat ^= postModPhrase.genderCat;
.......
}
The *= operator applied to leNoun is the switched test operator in this
example grammar rule.
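The two-pass idea can be sketched in Python under some simplifying assumptions: the page descriptors have already been parsed (with the tests off) into (fabric, item, item class) triples, and only the fabric/class agreement is tracked. The sample triples and the names `mark_lexicon` and `accepts` are hypothetical; `cat_type_set` merely echoes the feature name in the rule above.

```python
from collections import defaultdict

# Hypothetical results of parsing page descriptors with the tests off:
# (fabric, basic item, item class).
PARSED_DESCRIPTORS = [
    ("denim", "jeans", "pants"),
    ("cotton", "chinos", "pants"),
    ("wool", "jacket", "outerwear"),
]


def mark_lexicon(parsed):
    """Record, for each fabric, the set of item classes it was seen modifying."""
    cat_type_set = defaultdict(set)
    for fabric, _item, item_class in parsed:
        cat_type_set[fabric].add(item_class)
    return dict(cat_type_set)


def accepts(fabric, item_class, lexicon, restrictions_on=True):
    """Mimic a switched test: always pass when off, check the marks when on."""
    if not restrictions_on:
        return True
    return item_class in lexicon.get(fabric, set())


lexicon = mark_lexicon(PARSED_DESCRIPTORS)
print(accepts("denim", "pants", lexicon))   # seen in a descriptor
print(accepts("lace", "pants", lexicon))    # never seen, so rejected
print(accepts("lace", "pants", {}, restrictions_on=False))  # bare lexicon, tests off
```

With the switch off, a bare lexicon still accepts every descriptor, which is what lets the first pass run at all.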
One final problem must be addressed to make this scheme actually useful; there are sure to be some ``reasonable'' combinations of modifiers and basic items that the catalog makers just do not include in their catalog. If there were ``canvas jackets'' and ``denim jeans'' in the catalog but no ``denim jackets,'' then unless jeans and jackets shared a common ``kind of thing'' property on which to base the grammar restrictions, the restricted grammar could not hear the phrase ``denim jacket''. Presented with those sounds, it would probably produce something like the ``d'women jacket'' pronunciation of ``the women['s] jacket'', but it could not ``hear'' what the user actually said. This would be baffling to a naive user of the system, especially since rephrasing his request to include ``a jacket made of denim'' would also fail.
To fix this shortcoming, the examples that generate the automatic marking of the lexicon must be augmented to include the logical extensions of the actual database of ``real'' items. When proposing this approach, Nicole Yankelovich loosely described it as ``listing all the things that aren't in the catalog''. Of course, taking this literally would be an unbounded task and would defeat the whole goal of restricting the grammar; such a list would include the infamous cashmere diaper bag! What we really needed was a listing of the things that one might logically expect to find but which do not exist in this particular catalog. In our Lands' End example, we created pages of ``missing'' items and associated these explicitly missing pages as phantom pages under their logical parent pages in the catalog. These phantom pages serve to attach the information we give the customer when we report the omission. With this addition to the scheme, the user can be ``heard'' asking for a denim jacket, and will receive a helpful response.
We attach explicit helpful messages to some phantom pages (``Sorry, but the jackets do not come in denim, only Polartec, Thinsulate, and wool'') and otherwise generate a message indicating the query was heard, but no such item is in this catalog.
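One way to picture the phantom-page lookup is sketched below. The data layout and the function `respond` are assumptions made for illustration; only the denim message is quoted from the text, and the remaining entries are invented.

```python
# Hypothetical catalog index keyed by (fabric, item).
REAL_PAGES = {("wool", "jacket"): "page: wool jackets"}

# Phantom pages sit under a logical parent page and may carry a custom message.
PHANTOM_PAGES = {
    ("denim", "jacket"): {
        "parent": "jackets",
        "message": "Sorry, but the jackets do not come in denim, "
                   "only Polartec, Thinsulate, and wool",
    },
    ("silk", "jacket"): {"parent": "jackets", "message": None},
}


def respond(fabric, item):
    """Return a page, a phantom-page message, or None if outside the grammar."""
    key = (fabric, item)
    if key in REAL_PAGES:
        return REAL_PAGES[key]
    phantom = PHANTOM_PAGES.get(key)
    if phantom is None:
        return None  # combination was never licensed, so it is never "heard"
    if phantom["message"]:
        return phantom["message"]
    return f"I heard '{fabric} {item}', but no such item is in this catalog."


print(respond("denim", "jacket"))
print(respond("silk", "jacket"))
```

Because phantom pages also feed the lexicon marking, ``denim jacket'' becomes recognizable while ``cashmere diaper bag'' stays outside the grammar.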
The restrictions computed by this scheme must be applied to the speech recognizer if any reduction in perplexity is to be achieved. Testing restrictions during SR or selecting ``semantically'' among the n-best are both possible implementations. Neither works with currently available SRs; these SRs use BNF grammars and do not deliver semantically distinct alternatives for n-best [Hemphill 1993, Smith and Bates 1993].
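For comparison, the n-best alternative might look like the sketch below if a recognizer did return semantically distinct hypotheses, which, as noted, the recognizers available to us do not. The hypothesis format (already-parsed fabric/item pairs), the sample data, and the function name are all assumptions.

```python
def pick_from_nbest(nbest, is_compatible):
    """Return the highest-ranked hypothesis that passes the test, else None."""
    for hypothesis in nbest:
        if is_compatible(*hypothesis):
            return hypothesis
    return None


# Hypothetical combinations licensed by the example set.
SEEN = {("denim", "jeans"), ("silk", "blouse")}


def seen_in_examples(fabric, item):
    return (fabric, item) in SEEN


# Hypothetical recognizer output, best-scoring hypothesis first.
nbest = [("lace", "jeans"), ("denim", "jeans")]
print(pick_from_nbest(nbest, seen_in_examples))
```

This filters after recognition, so it cannot rescue an utterance whose correct reading is missing from the n-best list.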
The tool we use to impose these restrictions is a compiler capable of converting a grammar composed of patterns and calculated ``semantic'' restrictions into two compiled grammars: one for use in a speech recognizer and one to parse the recognized words and produce a structure representing the relevant semantics of the sentence. The Unified Grammar and its associated tools fill this requirement, providing a generally adequate approximation to this ideal compiler. The ideal compiler would turn the patterns and restrictions into just patterns, and do so without expanding the compact notation of the original grammar into some ``rolled-out'' form that is too large for the SR to use; this compactness requirement rules out any approach which enumerates the acceptable sentences of the grammar. The Unified Grammar compiler produces a patterns-only grammar that also reflects the restrictions by pre-computing these tests, when possible, to create more specific patterns reflecting the constraints. It omits restrictions that are too complex for it to effect, thus allowing all the good utterances and possibly some bad ones as well.
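The flavor of pre-computing a test into more specific patterns can be shown with a toy example. This is not the Unified Grammar compiler; it is a hypothetical sketch that folds one fabric/class agreement test into per-class rules, written in an informal BNF where square brackets mark an optional element.

```python
def compile_to_bnf(fabric_classes, nouns_by_class):
    """Fold class-compatibility tests into per-class BNF rules.

    Rules are emitted per item class, so the output grows with the number
    of classes rather than with the number of acceptable sentences.
    """
    rules = []
    for item_class, nouns in sorted(nouns_by_class.items()):
        fabrics = sorted(f for f, classes in fabric_classes.items()
                         if item_class in classes)
        rules.append(f"<noun_{item_class}> ::= " + " | ".join(sorted(nouns)))
        if fabrics:
            rules.append(f"<fabric_{item_class}> ::= " + " | ".join(fabrics))
            rules.append(
                f"<np_{item_class}> ::= [<fabric_{item_class}>] <noun_{item_class}>")
        else:
            rules.append(f"<np_{item_class}> ::= <noun_{item_class}>")
    return rules


# Hypothetical marked lexicon and noun classes.
FABRIC_CLASSES = {"denim": {"pants"}, "cotton": {"pants"}, "wool": {"outerwear"}}
NOUNS_BY_CLASS = {"pants": ["jeans", "chinos"], "outerwear": ["jacket"]}

for rule in compile_to_bnf(FABRIC_CLASSES, NOUNS_BY_CLASS):
    print(rule)
```

The test disappears from the output: ``wool'' simply never appears in the rule for pants, and no sentence list is ever built.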
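To implement the example-based restrictions, the Unified Grammar language was extended to include tests that could be disabled or enabled with a global switch, and then the processing described above was used: the page descriptors are parsed with the tests switched off, the results are indexed and used to mark up the lexicon, and the unified grammar is then compiled for the speech recognizer with the tests switched on.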
It is important to note that this approach depends on two things: some simple way to indicate in a grammar what sort of ``agreement'' is required between the parts of a phrase, and a relatively rich example set illustrating the ``good'' agreements. General language processing lacks one or both of these requirements, so this approach must be understood as having relevance only where the amount of example data is high relative to the variability that must be supported in the spoken language being processed.
Our example case used a ``binary'' decision paradigm, completely ruling out combinations which did not match up with criteria from the example set; by using likelihood weighting instead of rigid exclusion, a more flexible system could be built.
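As one possible reading of ``likelihood weighting'', the sketch below replaces the yes/no test with an add-k smoothed relative frequency, so unseen combinations get a small but nonzero weight. This is a standard smoothing technique substituted for illustration, not something the prototype implemented; the example pairs and the smoothing constant are hypothetical.

```python
from collections import Counter


def weight_table(examples, smoothing=0.5):
    """Return a function giving add-k smoothed P(fabric | item class)."""
    pair_counts = Counter(examples)
    class_counts = Counter(item_class for _fabric, item_class in examples)
    fabrics = {fabric for fabric, _item_class in examples}

    def weight(fabric, item_class):
        numerator = pair_counts[(fabric, item_class)] + smoothing
        denominator = class_counts[item_class] + smoothing * len(fabrics)
        return numerator / denominator

    return weight


# Hypothetical (fabric, item class) pairs taken from page descriptors.
EXAMPLES = [
    ("denim", "pants"),
    ("denim", "pants"),
    ("cotton", "pants"),
    ("lace", "dresses"),
]
weight = weight_table(EXAMPLES)
print(weight("denim", "pants"), weight("lace", "pants"))
```

A recognizer that accepts weighted grammars could then prefer ``denim pants'' without being deaf to ``lace pants''.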
Clearly, categorization of the items used in this technique improves both the simplicity and the generality of the restrictions that can be generated. For example, using the fact that ``diaper bag'' and ``book bag'' are types of luggage, we can write the restriction rules to record and test the markings on their ``type'' rather than their ``species'', and thus get information about the appropriate modifiers for ``duffel bag'' without having ever ``seen'' sentences about duffel bags.
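The difference between marking on ``species'' and marking on ``type'' can be sketched as follows. The type table and the example phrases are hypothetical, as are the function names; only the luggage grouping follows the text.

```python
# Hypothetical item taxonomy: species -> type.
ITEM_TYPE = {
    "diaper bag": "luggage",
    "book bag": "luggage",
    "duffel bag": "luggage",
    "jeans": "pants",
}

# Hypothetical (modifier, item) pairs seen in page descriptors.
EXAMPLES = [("canvas", "book bag"), ("nylon", "diaper bag"), ("denim", "jeans")]


def species(item):
    """Record marks on the item itself."""
    return item


def by_type(item):
    """Record marks on the item's type instead."""
    return ITEM_TYPE[item]


def learn_marks(examples, level):
    """Map each modifier to the labels (at the chosen level) it was seen with."""
    marks = {}
    for modifier, item in examples:
        marks.setdefault(modifier, set()).add(level(item))
    return marks


def allowed(modifier, item, marks, level):
    return level(item) in marks.get(modifier, set())


species_marks = learn_marks(EXAMPLES, species)
type_marks = learn_marks(EXAMPLES, by_type)
print(allowed("canvas", "duffel bag", species_marks, species))  # never seen
print(allowed("canvas", "duffel bag", type_marks, by_type))     # generalizes
```

Marking on type accepts ``canvas duffel bag'' from the book-bag example alone, while ``denim duffel bag'' is still rejected.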
We have presented an implemented scheme which significantly reduces the perplexity of the speech recognition task in cases where the perplexity arises from allowing semantically irrelevant grammatical constructions. The method applies only to certain classes of speech, those where a modest collection of relevant sample sentences is available to support building the restrictions by example, but in those cases it can automate the otherwise quite tedious task of manually marking semantic restrictions for a grammar.
This work is part of a larger effort within Sun Microsystems Labs prototyping tools to make the use of computer speech more practical. Thanks to my co-Principal Investigator Nicole Yankelovich and to Stuart Adams, Eric Baatz, and Andrew Kehler for their contributions.