OLCBP – “Introducing Darmok”
Reference Paper That Implemented Darmok System.
Darmok
The OLCBP cycle was designed with domains such as RTS games in mind. The approach has been implemented in the Darmok system, which is designed to play the full Wargus game. The only aspect of Wargus not yet covered by Darmok is the “fog of war”, which was disabled during the experiments.
The Darmok system learns how to play Wargus by observing how humans play. Darmok learns what we call plan snippets by observing a human play, and stores those snippets in the system in the form of cases. Such snippets are then retrieved and composed together to form plans.
Learning:
During learning, Darmok observes a game trace to learn plan snippets that will be stored in the case base. In the experiments, an expert plays Wargus to generate a trace. Note, however, that the system could learn from any trace, including traces of itself playing or of other systems playing. The trace is then annotated by the expert, who explains the goals he was pursuing with the actions he took while playing. Using those annotations, a set of snippets is extracted from the trace and stored as a set of cases. For each snippet, the situation in which it was executed, the goal it was pursuing, and its success or failure are stored in the case base.
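The learning step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Darmok's actual implementation: the names `TraceEntry`, `Case` and `extract_cases` are hypothetical, the expert annotation is assumed to tag each action with a single goal, and a snippet is assumed to succeed only if all of its actions did.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class TraceEntry:
    """One annotated step of an expert trace (hypothetical layout)."""
    game_state: dict   # features observed when the action was issued
    action: str        # primitive action taken by the expert
    goal: str          # expert annotation: the goal being pursued
    succeeded: bool    # whether the action worked out


@dataclass
class Case:
    """A stored plan snippet together with its context."""
    goal: str
    situation: dict    # game state in which the snippet was executed
    snippet: list      # actions that pursued the goal
    succeeded: bool


def extract_cases(trace):
    """Group consecutive entries that share a goal into one case each."""
    cases = []
    for goal, entries in groupby(trace, key=lambda e: e.goal):
        entries = list(entries)
        cases.append(Case(
            goal=goal,
            situation=entries[0].game_state,
            snippet=[e.action for e in entries],
            # Assumption: a snippet succeeds only if every action succeeded.
            succeeded=all(e.succeeded for e in entries),
        ))
    return cases


trace = [
    TraceEntry({"gold": 500, "peasants": 2}, "harvest(peasant1, gold)", "HaveResources(GOLD)", True),
    TraceEntry({"gold": 700, "peasants": 2}, "move(peasant2, 10, 12)", "HaveUnits(TOWER)", True),
    TraceEntry({"gold": 700, "peasants": 2}, "build(peasant2, TOWER)", "HaveUnits(TOWER)", True),
]
case_base = extract_cases(trace)
for case in case_base:
    print(case.goal, case.snippet, case.succeeded)
```

Each case keeps the three pieces of information named above: the situation, the goal, and the success or failure of the snippet.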
Execution:
The Plan Retrieval, Plan Adaptation, Plan Expansion and Plan Execution modules are jointly in charge of maintaining a current plan to win the game. The Plan Execution module executes the current plan and updates its state (e.g., marking which actions succeeded or failed). The Plan Expansion module identifies open goals in the current plan and expands them. To do so, it relies on the Plan Retrieval module, which, given an open goal and the current game state, retrieves the most appropriate plan snippet to fulfill that goal. Finally, the Plan Adaptation module adapts the retrieved snippets to the current game state.
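The interplay of the four modules can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class and function names are hypothetical, retrieval uses a plain sum of absolute feature differences rather than Darmok's real similarity measure, adaptation is a placeholder that changes nothing, and nested subgoals are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    goal: str
    situation: dict   # game-state features stored with the snippet
    steps: list       # primitive actions only (no nested subgoals here)


@dataclass
class PlanNode:
    goal: str
    steps: list = field(default_factory=list)   # empty while the goal is open
    status: str = "open"                        # open, expanded, succeeded, failed


def retrieve(case_base, goal, state):
    """Plan Retrieval: the case for `goal` whose situation is closest to `state`."""
    candidates = [c for c in case_base if c.goal == goal]

    def distance(case):
        return sum(abs(case.situation.get(k, 0) - v) for k, v in state.items())

    return min(candidates, key=distance) if candidates else None


def adapt(case, state):
    """Plan Adaptation: placeholder that returns the steps unchanged."""
    return list(case.steps)


def expand(node, case_base, state):
    """Plan Expansion: fill in an open goal via retrieval and adaptation."""
    case = retrieve(case_base, node.goal, state)
    if case is None:
        node.status = "failed"
    else:
        node.steps = adapt(case, state)
        node.status = "expanded"


def execute(node, state, do_action):
    """Plan Execution: run the steps in order and record the outcome."""
    for step in node.steps:
        if not do_action(step, state):
            node.status = "failed"
            return
    node.status = "succeeded"


case_base = [
    Case("HaveUnits(TOWER)", {"gold": 700, "peasants": 2}, ["move", "build_tower"]),
    Case("HaveUnits(TOWER)", {"gold": 5000, "peasants": 9}, ["build_tower", "build_tower"]),
]
state = {"gold": 650, "peasants": 2}
plan = PlanNode("HaveUnits(TOWER)")
expand(plan, case_base, state)
execute(plan, state, do_action=lambda step, s: True)   # stub game interface
print(plan.steps, plan.status)
```

In a real game the `do_action` callback would issue commands to Wargus and report back; here it is a stub so that the control flow can be followed in isolation.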
Darmok requires:
1- a set of goals.
2- a set of primitive actions.
3- a vocabulary for conditions.
4- a set of features to represent the game state (used for plan retrieval).
5- a set of annotated expert traces.
6- a set of rules (explained later) to help the system perform precondition-postcondition matching.
Plan Representation in Darmok
The plan representation formalism used by Darmok is designed to allow a system to learn plans, represent them, and reason about them and their intended and actual effects. The basic constituent piece is the snippet. Snippets are composed of three elements:
• A set of preconditions that must be satisfied before the plan can be executed. For instance, a snippet can have as preconditions that a particular peasant exists and that a desired location is empty.
• A set of alive conditions that must be satisfied during the execution of the plan for it to have a chance of success (also known as “maintenance goals” in the planning literature). If at some moment during execution the alive conditions are not met, the plan can be stopped, since it will not achieve its intended goal. For instance, the peasant in charge of constructing a building must remain alive; if he is killed, the building will not be built.
• The plan itself, which can contain the following constructs: sequence, parallel, action, and subgoal. An action represents the execution of a basic action in the application domain (a set of basic actions must be defined for each domain), and a subgoal means that the execution engine must find another snippet to execute in order to satisfy that particular subgoal.
Snippets are also associated with goals. A snippet's goal represents what the snippet is intended to achieve.
For every domain, an ontology of possible goals has to be defined. For instance, a snippet might have the goal of “having a tower”.
Notice that unlike classical planning approaches, postconditions cannot be specified for snippets, since a snippet is not guaranteed to succeed. Thus, we can only specify the goal a snippet pursues, i.e., its success conditions.
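To make the snippet structure concrete, the following sketch models the four plan constructs and the three snippet elements as Python dataclasses. All class names, the dictionary layout of the game state, and the `HaveResources(GOLD)` subgoal are assumptions chosen for illustration; they are not taken from Darmok's code.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

Condition = Callable[[dict], bool]   # a test over the current game state


@dataclass
class Action:
    name: str
    args: tuple = ()


@dataclass
class Subgoal:
    goal: str


@dataclass
class Sequence:
    children: List["Plan"]


@dataclass
class Parallel:
    children: List["Plan"]


Plan = Union[Action, Subgoal, Sequence, Parallel]


@dataclass
class Snippet:
    goal: str                          # what the snippet pursues
    preconditions: List[Condition]     # must hold before execution starts
    alive_conditions: List[Condition]  # must keep holding during execution
    plan: Plan

    def can_start(self, state: dict) -> bool:
        return all(cond(state) for cond in self.preconditions)

    def still_viable(self, state: dict) -> bool:
        return all(cond(state) for cond in self.alive_conditions)


def open_subgoals(plan):
    """Collect the subgoals the execution engine still has to expand."""
    if isinstance(plan, Subgoal):
        return [plan.goal]
    if isinstance(plan, (Sequence, Parallel)):
        return [g for child in plan.children for g in open_subgoals(child)]
    return []


build_tower = Snippet(
    goal="HaveUnits(TOWER)",
    preconditions=[
        lambda s: "peasant1" in s["units"],        # the peasant exists
        lambda s: (10, 12) not in s["occupied"],   # the location is empty
    ],
    alive_conditions=[lambda s: "peasant1" in s["units"]],
    plan=Sequence([
        Subgoal("HaveResources(GOLD)"),            # hypothetical subgoal
        Action("move", ("peasant1", 10, 12)),
        Action("build", ("peasant1", "TOWER", 10, 12)),
    ]),
)

state = {"units": {"peasant1"}, "occupied": set()}
print(build_tower.can_start(state))
state["units"].remove("peasant1")                  # the peasant is killed
print(build_tower.still_viable(state))
print(open_subgoals(build_tower.plan))
```

Note that the snippet carries a goal but no postconditions, in line with the point made above: whether the goal was actually reached has to be checked at run time.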
The difference between a postcondition and a success condition:
A postcondition is a condition that we can ensure will be true after the execution of a snippet (or an action), while a success condition is a condition that, when satisfied, lets us consider the snippet (or action) to have completed.
For example, a side effect of an action is a postcondition but not a success condition. Conversely, “enemy killed” is a success condition of an attack but not a postcondition, since we cannot ensure that the enemy will be dead once the attack is done (the domain is non-deterministic). Using success conditions instead of postconditions provides an abstraction over the notion of a non-deterministic action in planning, which is handled by interleaving planning and execution.
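The distinction can be illustrated with a toy non-deterministic attack. This is a sketch under invented assumptions: the `attacker_energy` feature, the 50% kill chance, and all function names are hypothetical and serve only to separate what is guaranteed from what must be checked.

```python
import random


def attack(state, rng):
    """Hypothetical attack: always costs energy, only sometimes kills."""
    state["attacker_energy"] -= 1    # side effect: true after every attack
    if rng.random() < 0.5:           # outcome that cannot be guaranteed
        state["enemy_alive"] = False


def enemy_killed(state):
    """Success condition of the attack: checked at run time, never assumed."""
    return not state["enemy_alive"]


def attack_until_success(state, rng, max_attempts=10):
    """Interleave execution and checking: retry while the success condition fails."""
    for attempt in range(1, max_attempts + 1):
        energy_before = state["attacker_energy"]
        attack(state, rng)
        # Postcondition: guaranteed regardless of how the attack turned out.
        assert state["attacker_energy"] == energy_before - 1
        if enemy_killed(state):
            return attempt
    return None                      # success condition never became true


state = {"attacker_energy": 10, "enemy_alive": True}
attempts = attack_until_success(state, random.Random(0))
print(attempts, state)
```

The energy cost can be asserted after every call, which is what makes it a postcondition. The death of the enemy can only be observed, which is why the loop has to check it and may have to try again.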
Specifically, three things need to be defined for using Darmok in a particular domain:
• A set of basic actions that can be used in the domain. For instance, in Wargus we define actions such as move, attack, or build. For uniformity, in Darmok actions are treated as standard snippets, and thus have a goal, preconditions and alive conditions (so that the system can reason about them too).
• A set of sensors, which are used to obtain information about the current state of the world and to specify the preconditions, alive conditions and goals of snippets. For instance, in Wargus we might define sensors such as numberOfTroops or unitExists. A sensor might return any of the standard basic data types, such as boolean or integer.
• A set of goals. Goals can be structured in a specialization hierarchy in order to specify the relations among them.
A goal might have parameters, and for each goal a set of success conditions is defined. For instance, HaveUnits(TOWER) is a valid goal in our gaming domain, and its success condition is UnitExists(TOWER). The goal definition can thus be used by the system to reason about the intended result of a snippet, while the success conditions are used by the execution engine to verify whether a particular snippet succeeds at run time.
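The HaveUnits(TOWER) example can be written down as a small sketch in which sensors are plain functions over the game state and each goal name maps to its success condition. The state layout, the `troop` flag, and the `SUCCESS_CONDITIONS` table are assumptions introduced for illustration only.

```python
from dataclasses import dataclass


# Sensors: functions that read the current game state (hypothetical layout).
def unit_exists(state, unit_type):
    """Boolean sensor: is there at least one unit of this type?"""
    return any(u["type"] == unit_type for u in state["units"])


def number_of_troops(state):
    """Integer sensor: how many units are flagged as troops?"""
    return sum(1 for u in state["units"] if u.get("troop", False))


# Success condition for each goal name, expressed through sensors.
SUCCESS_CONDITIONS = {
    "HaveUnits": unit_exists,
}


@dataclass(frozen=True)
class Goal:
    name: str
    params: tuple = ()

    def succeeded(self, state):
        """Used by the execution engine to verify the goal at run time."""
        return SUCCESS_CONDITIONS[self.name](state, *self.params)


goal = Goal("HaveUnits", ("TOWER",))
state = {"units": [{"type": "PEASANT", "troop": False},
                   {"type": "FOOTMAN", "troop": True}]}
print(goal.succeeded(state))
state["units"].append({"type": "TOWER", "troop": False})
print(goal.succeeded(state))
print(number_of_troops(state))
```

Keeping the goal (a name plus parameters) separate from its success condition mirrors the split described above: the goal is what the system reasons about, and the condition is what the execution engine evaluates.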
