ChatScript

ChatScript is a combined natural language engine and dialog management system, designed initially for creating chatbots but now also used for various forms of NL processing. It is written in C++. The engine is an open source project hosted on SourceForge[1] and GitHub.[2]

ChatScript was written by Bruce Wilcox and originally released in 2011, after Suzette (written in ChatScript) won the 2010 Loebner Prize, fooling one of four human judges.[3]

Features

ChatScript aims to make authoring extremely concise, since the factor limiting the scalability of hand-authored chatbots is how much script can be written and how quickly.

Because ChatScript is designed for interactive conversation, it automatically maintains user state across volleys. A volley is any number of sentences the user inputs at once, together with the chatbot's response.

The basic element of scripting is the rule. A rule consists of a type, an optional label, a pattern, and an output.

There are three types of rules. Gambits are things a chatbot might say when it has control of the conversation. Rejoinders respond to a user remark tied to what the chatbot just said. Responders respond to arbitrary user input that is not necessarily tied to what the chatbot just said.

Patterns describe the conditions under which a rule may fire. They range from extremely simple to deeply complex (analogous to regular expressions, but aimed at natural language). Heavy use is typically made of concept sets, which are lists of words sharing a meaning. ChatScript contains some 2000 predefined concepts, and scripters can easily write their own.

The output of a rule intermixes literal words to be sent to the user with C-style programming code.

Rules are bundled into collections called topics. Topics can have keywords, which allow the engine to automatically search a topic for relevant rules based on user input.
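The keyword-driven topic search described above can be illustrated with a small Python model. This is only a sketch of the idea, not the engine's actual algorithm: the function name pick_topic, the overlap-count scoring, and the ~music topic with its keywords are all assumptions made for illustration.

```python
import re

# Hypothetical topic table: topic name -> keyword set.
# "~food" reuses the keywords from the article's example; "~music" is invented.
TOPICS = {
    "~food": {"fruit", "food", "eat"},
    "~music": {"music", "song", "band"},
}

def pick_topic(user_input, topics):
    """Return the topic whose keywords overlap the input most, or None.

    Assumption: relevance is scored by a simple count of shared words.
    The real engine's scoring is not described in the article.
    """
    words = set(re.findall(r"[a-z']+", user_input.lower()))
    best_name, best_score = None, 0
    for name, keywords in topics.items():
        score = len(words & keywords)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

With this model, pick_topic("Do you eat fruit?", TOPICS) selects ~food because two of its keywords appear in the input, while an input sharing no keywords with any topic selects nothing.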

Example Code

Topic: ~food(  ~fruit  fruit food eat)

t: What is your favorite food?
    a: (~fruit) I like fruit also.
    a: (~metal) I prefer listening to heavy metal music rather than eating it.

?:  WHATMUSIC ( << what music you ~like >>) I prefer rock music.
s: ( I * ~like * _~music_types)    ^if (_0 == country) {I don't like country.} else {So do I.}

In the example, t: marks a gambit and a: marks a rejoinder. Words starting with ~ are concept sets; for example, ~fruit is the list of all known fruits. The simple pattern (~fruit) reacts if any fruit is mentioned immediately after the chatbot asks for a favorite food. The slightly more complex pattern for the rule labelled WHATMUSIC requires all of the words what, music, and you, plus any word or phrase meaning to like, but they may occur in any order. Responders come in three types: ?: rules react to user questions, s: rules react to user statements, and u: rules react to either.
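To make the any-order matching of the << >> pattern concrete, here is a minimal Python sketch. It is an illustrative model only, not how the engine is implemented: the CONCEPTS table and its members, and the helper names tokenize, token_matches, and matches_unordered, are hypothetical. It compares exact lowercase words and ignores everything else the real engine does.

```python
import re

# Hypothetical concept sets; the real engine ships its own, much larger ones.
CONCEPTS = {
    "~like": {"like", "love", "enjoy", "prefer"},
    "~fruit": {"apple", "banana", "cherry"},
}

def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def token_matches(pattern_token, word):
    """A ~name token matches any member of that concept set;
    any other token must equal the word exactly."""
    if pattern_token.startswith("~"):
        return word in CONCEPTS.get(pattern_token, set())
    return pattern_token == word

def matches_unordered(pattern_tokens, text):
    """Model of << ... >>: every pattern token must match some word
    of the input, in any order."""
    words = tokenize(text)
    return all(
        any(token_matches(p, w) for w in words)
        for p in pattern_tokens
    )
```

Under these assumptions, matches_unordered(["what", "music", "you", "~like"], "What music do you enjoy?") succeeds because "enjoy" is in the assumed ~like set, while the same pattern fails on an input that lacks "what" or "music".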

ChatScript code supports standard if-else, loops, user-defined functions and calls, and variable assignment and access.

Data

Some data in ChatScript is transient, meaning it disappears at the end of the current volley. Other data is permanent, persisting until explicitly deleted.

Internally, all data is represented as text and is automatically converted to numeric form as needed.

Variables

User variables come in several kinds. Variables purely local to a topic or function are transient. Global variables can be declared as transient or permanent. A variable is generally declared merely by using it, and its type depends on its prefix ($, $$, $_).

$_local = 1               is a local transient variable being assigned a 1
$$global1.value = "hi"    is a transient global variable which is a JSON object
$global2 += 20            is a permanent global variable
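The prefix rules above can be modelled in a few lines of Python. This is a sketch of the lifetime behavior as the article describes it, not engine code: the function names is_permanent and end_of_volley and the use of a plain dictionary as the variable store are assumptions.

```python
def is_permanent(name):
    """Per the prefixes above: $$name and $_name are transient,
    while a single $ prefix marks a permanent global."""
    return name.startswith("$") and not name.startswith(("$$", "$_"))

def end_of_volley(variables):
    """Return only the variables that survive into the next volley.

    Assumption: the variable store is modelled as a dict of name -> value.
    """
    return {
        name: value
        for name, value in variables.items()
        if is_permanent(name)
    }
```

Applied to the three variables in the example, only $global2 would remain after the volley ends.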

Facts

In addition to variables, ChatScript supports facts – triples of data, which can also be transient or permanent. Functions can query for facts having particular values of some of the fields, making them act like an in-memory database. Facts can represent record structures and are how ChatScript represents JSON internally. Tables of information can be defined to generate appropriate facts.

table: ~inventors(^who ^what)
createfact(^who invent ^what)
DATA:
"Johannes Gutenberg" "printing press"
"Albert Einstein" ["Theory of Relativity" photon "Theory of General Relativity"]

The table above links people to what they invented, one person per line, with Einstein getting a list of things he did.
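The way a table generates facts, and the way facts can then be queried by field like an in-memory database, can be sketched in Python. This is an illustrative model rather than the engine's implementation: the Fact field names, the helpers expand_table and query, and the assumption that a bracketed list yields one fact per item are all choices made for this sketch.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    """A triple of data; the field names are assumed for illustration."""
    subject: str
    verb: str
    obj: str

# The rows of the ~inventors table from the example above.
ROWS = [
    ("Johannes Gutenberg", "printing press"),
    ("Albert Einstein",
     ["Theory of Relativity", "photon", "Theory of General Relativity"]),
]

def expand_table(rows, verb):
    """Create one fact per (who, what) pair.

    Assumption: a list in the second column produces one fact per item.
    """
    facts = []
    for who, what in rows:
        items = what if isinstance(what, list) else [what]
        facts.extend(Fact(who, verb, item) for item in items)
    return facts

def query(facts, subject=None, verb=None, obj=None):
    """Return facts matching every field that is not None."""
    return [
        f for f in facts
        if (subject is None or f.subject == subject)
        and (verb is None or f.verb == verb)
        and (obj is None or f.obj == obj)
    ]
```

For example, query(expand_table(ROWS, "invent"), subject="Albert Einstein") returns the three Einstein facts, and querying by obj="printing press" finds the Gutenberg fact.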

External communication

ChatScript embeds the cURL library and can directly read and write facts as JSON to and from a website.

Server

A ChatScript engine can run in local or server mode.

POS-tagging, parsing, and ontology

ChatScript comes with an embedded copy of English WordNet, including its ontology, and creates and extends its own ontology via concept declarations. It has an English-language POS tagger and parser, and supports integration with TreeTagger for POS-tagging a number of other languages (a TreeTagger commercial license is required).

Databases

In addition to an internal fact database, ChatScript supports PostgreSQL and MongoDB, both for access by scripts and, if desired, as a central filesystem so that ChatScript can be scaled horizontally.

JavaScript

ChatScript also embeds Duktape, a JavaScript engine with ECMAScript E5/E5.1 compatibility and some semantics updated from ES2015+.

Control Flow

A chatbot's control flow is managed by the control script, which is simply another ordinary topic of rules that invokes API functions of the engine. Control is therefore fully configurable by the scripter, and functions exist to allow introspection into the engine.

References