
How to implement custom environment in keras-rl / OpenAI GYM?


Question: How can I implement a custom environment in keras-rl / OpenAI Gym?

Background:

I am a novice in the field of Reinforcement Learning, having been on a quest to find an easy-to-use framework or module that can help me navigate this complex terrain effectively. During my research, I have discovered two notable modules: Keras-RL and OpenAI Gym.


I can get both of them working with the examples they provide on their wikis, but those examples use pre-built environments and give little guidance on how to set up my own custom environment.


I would greatly appreciate it if anyone could point me to a tutorial, or simply clarify the process of setting up a non-game environment.


Solution:

I have been working with these libraries for some time and can share some of my experiments.


As an example of a custom environment, let us first consider a text environment: https://github.com/openai/gym/blob/master/gym/envs/toy_text/hotter_colder.py


For a custom environment, a couple of things should be defined; a minimal sketch that puts them together follows the list.


  • __init__ constructor method
  • Action space
  • Observation space: the set of all possible observations the agent can receive (see gym/gym/spaces on GitHub for all available gym spaces; each space is a kind of data structure)

  • _seed method (not sure that it's mandatory)


  • _step method that accepts an action as a parameter and returns the next observation (the state after the action), the reward for transitioning to that new state, a boolean done flag indicating whether the episode is finished, and some optional additional info


  • _reset method that implements the logic of starting a fresh episode

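Putting these pieces together, a minimal custom environment might look like the sketch below. This assumes an older gym version where the base Env class dispatches to the underscore-prefixed methods (_seed, _reset, _step); the class name, the spaces, and the guess-the-number dynamics are illustrative assumptions, not part of the original answer:

  import gym
  import numpy as np
  from gym import spaces
  from gym.utils import seeding

  class GuessNumberEnv(gym.Env):
      """Illustrative text-style environment: guess a hidden number in [0, 100]."""

      def __init__(self):
          # Action space: the agent's guess, a single number in [0, 100]
          self.action_space = spaces.Box(low=np.array([0.0]), high=np.array([100.0]))
          # Observation space: signed distance between the last guess and the target
          self.observation_space = spaces.Box(low=np.array([-100.0]), high=np.array([100.0]))
          self._seed()
          self._reset()

      def _seed(self, seed=None):
          self.np_random, seed = seeding.np_random(seed)
          return [seed]

      def _reset(self):
          self.target = self.np_random.uniform(0.0, 100.0)
          self.state = np.array([0.0])
          self.action_taken = None
          return self.state

      def _step(self, action):
          self.action_taken = action
          distance = float(action[0]) - self.target
          self.state = np.array([distance])      # next observation
          reward = -abs(distance)                # closer guesses earn higher reward
          done = abs(distance) < 1.0             # episode ends when the guess is close enough
          return self.state, reward, done, {}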

Optionally, you can create a _render method with something like


  # Requires "import sys" and "from io import StringIO" at module level.
  def _render(self, mode='human', **kwargs):
      # Write a simple textual view of the state and last action, to stdout or to a string buffer in 'ansi' mode
      outfile = StringIO() if mode == 'ansi' else sys.stdout
      outfile.write('State: ' + repr(self.state) + ' Action: ' + repr(self.action_taken) + '\n')
      return outfile
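Once such an environment is defined, it can be exercised through the public Env API; in the older gym versions this answer targets, reset() and step() dispatch to _reset() and _step(). The loop below is an illustrative smoke test using the hypothetical GuessNumberEnv from the earlier sketch:

  # Smoke test with a random agent (GuessNumberEnv is the illustrative sketch above)
  env = GuessNumberEnv()
  observation = env.reset()

  for t in range(50):
      action = env.action_space.sample()                 # random guess, just to exercise the environment
      observation, reward, done, info = env.step(action)
      print('obs={} reward={:.2f}'.format(observation, reward))
      if done:
          print('Episode finished after {} steps'.format(t + 1))
          break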

Furthermore, for better code flexibility, you can define the logic of your reward in a _get_reward method, and the changes to the observation caused by an action in a _take_action method.
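A sketch of how that separation might look inside the environment class; _take_action and _get_reward are the method names from the answer, while the bodies and the _is_done helper are illustrative assumptions:

      def _step(self, action):
          self._take_action(action)        # apply the action's effect on the internal state
          reward = self._get_reward()      # reward logic isolated in one place
          done = self._is_done()           # hypothetical helper deciding whether the episode is over
          return self.state, reward, done, {}

      def _take_action(self, action):
          # Update self.state (and any other internal variables) according to the chosen action
          raise NotImplementedError

      def _get_reward(self):
          # Compute the reward for the current self.state
          raise NotImplementedError

      def _is_done(self):
          # Decide whether the current episode has finished
          raise NotImplementedError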

