Tree search based Game AI: How to avoid AI 'wandering'/'procrastination' with sparse rewards?
My game AI uses an algorithm that searches all possible future states based on the moves I can make (minimax / Monte Carlo-esque). It evaluates these states using a scoring system, picks the highest-scored final state, and follows the path to it.
This works well in most situations, but performs awfully when rewards are sparse. For example: there's a desirable collectable object 3 tiles to the right of me. The natural solution would be to go right->right->right.
But my algorithm searches 6 turns deep, and it will find many paths that eventually collect the object, including ones that take longer than 3 turns. It might, for example, find the path up->right->down->right->right->down, collecting the object on turn 5 instead.
Since the final leaf nodes detect the object as collected in both cases, the search doesn't naturally prefer one over the other. So, instead of going right on turn 1, it might go up, down, or left. This behavior is repeated exactly on the next turn, so the AI basically ends up dancing randomly in front of the collectable object; only luck will make it step on it.
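A minimal sketch of the tie (using a hypothetical evaluation function and reward value, not my actual code): because a leaf is scored only on whether the object has been collected, the 3-move plan and the 5-move plan look identical to the search, so the first move is chosen arbitrarily.

```python
# Hypothetical leaf evaluation: the score depends only on whether the object
# was collected somewhere along the path, not on when.
COLLECT_BONUS = 10  # assumed reward for having picked up the object

def evaluate(leaf_state):
    """Score a depth-6 leaf purely on what has been collected so far."""
    return COLLECT_BONUS if leaf_state["collected"] else 0

fast_plan = {"moves": ["right", "right", "right", "up", "up", "up"], "collected": True}
slow_plan = {"moves": ["up", "right", "down", "right", "right", "down"], "collected": True}

# Both leaves score the same, so the search has no reason to prefer
# the plan that collects on turn 3 over the one that collects on turn 5.
assert evaluate(fast_plan) == evaluate(slow_plan)
```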
That's clearly suboptimal and I want to fix it, but I've run out of ideas for how to handle it appropriately. Are there any solutions for this issue, or is there theoretical work that deals with it?
Solutions I've tried:
Make it value object collection more on earlier turns. While this works, the difference between turns must be quite high to beat evaluator 'noise': turn 1 must be rated higher than turn 2, turn 2 higher than turn 3, and so on. The gap between turn 1 and turn 6 ends up so large that it makes the behavior extremely greedy, which is not desirable in most situations. In an environment with multiple objects, it might choose the path that grabs an object on turn 1 instead of the much better path that grabs objects on turns 5 and 6 (the sketch after this list illustrates the trade-off).
Assign the object as a target and value the distance to it. If this isn't done on a turn-by-turn basis, the original problem persists; if it is, the per-turn difference in importance required once again makes it too greedy. This method also reduces flexibility and causes other issues: target selection is not trivial and somewhat defeats the point of a minimax-style algorithm.
Going much deeper in my searches so that the algorithm can always see a second object. This would cost so much computing power that I'd have to make concessions, like pruning paths much more aggressively. If I do so, I'm back at the same problem, since I won't know how to make it prune the 5-turn version in favor of the 3-turn version.
Give extra value to the plan laid out last turn. If the AI at least followed the earlier suboptimal path consistently, there wouldn't be as much of an issue. Unfortunately, this effect once again has to be quite strong to work reliably, which makes it stick to sub-optimal paths in all scenarios and hurts overall performance.
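To make the greediness problem from the first approach concrete, here is a small sketch of turn-discounted collection value with made-up numbers; the discount factor, reward, and noise level are assumptions, not values from my actual evaluator.

```python
BASE_REWARD = 10.0
DISCOUNT = 0.6     # assumed per-turn multiplier; it has to be this aggressive to beat noise
EVAL_NOISE = 1.0   # assumed typical spread of the heuristic evaluator

def collection_value(turn):
    """Reward for picking up an object on a given turn (1-indexed)."""
    return BASE_REWARD * DISCOUNT ** (turn - 1)

# Breaking the 3-turn vs 5-turn tie requires the gap to exceed the noise...
assert collection_value(3) - collection_value(5) > EVAL_NOISE

# ...but the same discount now prefers one early object over two later ones.
greedy_path = collection_value(1)                         # one object on turn 1
patient_path = collection_value(5) + collection_value(6)  # two objects on turns 5 and 6
print(greedy_path, patient_path)  # 10.0 vs ~2.07 -> the greedy path wins
```

A gentler discount avoids the greediness but then fails to break ties against the noise, which is exactly the dilemma described above.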
artificial-intelligence minimax game-ai monte-carlo-tree-search
asked Nov 13 '18 at 11:52 by AnythingElse
1 Answer
When weighting the outcome of the last step of your plan, are you factoring in the number of moves needed to pick up the object?
I presume you are scoring each step of your plan, giving a +1 if the step results in picking up an object. That means that, in your example, the object can be picked up in 3 steps for a +1 state of the playing field, but it can also be picked up in 4, 5, 6, or more steps for the same +1 state. If only a single object is reachable within your search depth, the algorithm will select one of these equally scored +1 states at random, giving you the behaviour above.
This could be solved by giving a negative score to each move the AI makes. With +1 for the object and -1 per move, getting the object in 3 moves results in a score of -2, while getting it in 6 moves results in -5. This way, the AI clearly knows that it is preferable to get the object in the fewest moves, i.e., 3.
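A minimal sketch of this idea (the reward and per-move cost values are placeholders, not tuned for your game): with a flat cost per move, every extra step strictly lowers the score, so the shortest collecting path wins the tie.

```python
COLLECT_REWARD = 1.0
MOVE_COST = 1.0  # hypothetical cost per step; in practice keep it small relative to other rewards

def score(moves_taken, collected):
    """Score a leaf: reward for collection minus a cost for every move taken."""
    return (COLLECT_REWARD if collected else 0.0) - MOVE_COST * moves_taken

print(score(3, True))  # -2.0 -> object collected in 3 moves
print(score(6, True))  # -5.0 -> object collected in 6 moves: strictly worse, tie broken
```

With these example numbers the agent picks the 3-move plan; the move cost would presumably need to stay small relative to the other rewards so that it only breaks ties rather than swamping them.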
answered Jan 8 at 0:28 by Adam Baranyai