On-line Case-Based Planning
This Paper is Made By (i will only try to summarize it for a general Reader):
Santi Ontanon and Kinshuk Mishra and Neha Sugandh and Ashwin Ram
CCL, Cognitive Computing Lab,
Georgia Institute of Technology,
Atlanta, GA 30322/0280, USA
Abstract
Some domains, such as real-time strategy (RTS) games, pose several challenges to traditional planning and machine learning techniques. In this paper, a novel on-line case-based planning architecture that addresses some of these problems. The architecture addresses issues of plan acquisition, on-line plan execution, interleaved planning and execution and on-line plan adaptation. It also introduces the Darmok system, which implements this architecture in order to play Wargus (an open source clone of the well-known RTS game Warcraft II). Then presents empirical evaluation of the performance of Darmok and show that it successfully learns to play the Wargus game.
There are four refinements with respect to the original CBR cycle:
1. Problems “enter the cycle” through the Expansion process. When a new problems arrives to the system, this problem is set as the current plan, i.e. the current plan consists of a single open problem. Thus, the first thing the Expansion module will do is to send this problem to the retrieve process.
2. The CBR cycle is divided in two parts: a first part composed of retrieval and adaptation (or reuse), and a second part composed of revision and retention. The first part is in charge of finding solutions to new problems (i.e. it corresponds to the problem solving capability of the CBR cycle), and the second part is responsible for learning from experience. Revision takes as input the outcome of executing the solutions provided by the system and revises them to verify they achieve their goals. Revised solutions are handed to retention, that will decide whether to retain new cases or not, and update any internal indexes or similarity metrics when required.
3. The new cycle incorporates the world in it’s design, since it is an important part of any real-time problem solving process.
4. The new cycle features a new delayed adaptation cycle (notice that there is a loop between adaptation and expansion). In a domain where the domain changes dynamically, we want to delay adaptation till the last moment to ensure that plans are adapted with the latest information. Thus, the expansion component may send back plans to adaptation if the environment changed too much since the last time the plan was adapted. Moreover, notice that for real-time domains it is important for adaptation echniques to be efficient.
Expansion:
1- Takes the current solution proposed by the system for a problem (i.e. the current plan), and tries to find open sub-problems (sub-goals) in it. If there are any, these sub-problems are sent to the retrieve process so that they can be solved.
2- Also monitor the world state for changes, and send plans to the adaptation module again in case the world state changes enough so that plans have to be changed. We call this a delayed adaptation, since adaptation is delayed and performed at run-time with the latest game state.
Execution:
This process is in charge of executing the current plan and updating its status according to the result of execution. If a particular step in the plan fails when executed, and that causes a particular sub- problem to fail, then the execution process will update the current plan to reflect this. When that happens the expansion module will be responsible to find an alternative plan for such a sub-problem.
Why Planing in RTS(Real Time Strategy) are Hard ?
Real-time strategy (RTS) games have several characteristics that make the application of traditional planning approaches difficult:
• They have huge decision space (i.e. the set of different actions that can be executed in any given state is huge).
• Huge state space (the combination of the previous bullet and this bullet makes them not suitable for search based AI techniques ( Minimax, Alpha-beta pruning )
• They are non-deterministic.
• They are incomplete information games, where the player can only sense the part of the map he has explored and include unpredictable opponents.
• They are real-time. Thus, while the system is deciding which actions to execute, the game continues executing and the game state changes constantly. They are difficult to represent using classical planning formalisms since postconditions for actions cannot be specified easily.


Trackbacks & Pingbacks