基于强化学习的煤层气井螺杆泵排采参数智能决策

2020, Vol. 42, No. 1
Intelligent decision making on PCP production parameters of CBM wells based on reinforcement learning
檀朝东 蔡振华 邓涵文 刘世界 秦鹏 王一兵 宋文容
TAN Chaodong, CAI Zhenhua, DENG Hanwen, LIU Shijie, QIN Peng, WANG Yibing, SONG Wenrong
中国石油大学(北京) 中海油能源发展股份有限公司工程技术分公司 新疆中泰集团有限公司 北京雅丹石油技术开发有限公司
China University of Petroleum (Beijing), Beijing 102249, China; CNOOC EnerTech-Drilling & Production Co., Tianjin 300450, China; Xinjiang Zhongtai Group Co., Ltd., Urumqi 830001, Xinjiang, China; Beijing Yadan Petroleum Technology Development Company Limited, Beijing 102200, China
为了实现煤层气井螺杆泵排采参数的连续决策和连续控制,使煤层气井长期高效稳产,以煤层气井螺杆泵生产周期内最大累积产气量为优化目标,提出了一种具有动作自寻优能力的螺杆泵排采强化模型的框架和Q学习及Sarsa、Sarsa(lambda)算法。研究通过与环境的交互式学习,对动态环境进行灵活奖惩,实现智能体在复杂环境下智能决策和参数优化,可有效获取煤层气螺杆泵排采最优协调控制,从而解决传统方法不能根据环境变化迅速做出调整而降低排采效果的问题。实验分析表明,给定煤层气井产气量的变化曲线,以螺杆泵的频率为单一控制变量,应用Q学习方法能有效得到螺杆泵排采变频控制的最优策略,具有一定的应用潜力。
To achieve continuous decision making and continuous control of the production parameters of progressive cavity pumps (PCPs) in coalbed methane (CBM) wells, and to keep CBM wells producing efficiently and stably over the long term, this paper proposes a framework for a PCP production reinforcement learning model with action self-optimization capability, together with Q-learning, Sarsa, and Sarsa(lambda) algorithms, taking the maximum cumulative gas production within a PCP production cycle as the optimization target. Through interactive learning with the environment, rewards and penalties are applied flexibly in a dynamic environment, so that the agent can make intelligent decisions and optimize parameters under complex conditions. The approach can effectively obtain optimal coordinated control of PCP production in CBM wells, and thus addresses the problem that traditional methods cannot adjust quickly to environmental changes, which degrades production performance. Experimental analysis shows that, given the gas production rate curve of a CBM well and with the PCP frequency as the single control variable, the Q-learning method can effectively obtain the optimal strategy for variable-frequency control of PCP production, indicating some application potential.
煤层气; 螺杆泵; 排采参数; 智能决策; 强化学习; Q学习; 动作自寻优; 智慧油田;
coalbed methane; progressive cavity pump; production parameter; intelligent decision making; reinforcement learning; Q-learning; action self-optimization; smart oilfield;
DOI: 10.13639/j.odpt.2020.01.011
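
The abstract describes Q-learning with the PCP frequency as the single control variable. A minimal sketch of that idea in tabular form follows; it is an illustration under stated assumptions, not the authors' model. The frequency levels, the `gas_rate` response curve, the three-action set, and all hyperparameters are hypothetical stand-ins, since the abstract gives none of them. A Sarsa variant would differ only in the update target, which would use the value of the next action actually taken rather than the maximum over next actions.

```python
import random

# Hypothetical discretisation of pump frequency in Hz (not from the paper).
FREQS = [20, 25, 30, 35, 40, 45, 50]
ACTIONS = [-1, 0, 1]  # lower, hold, or raise the frequency by one level


def gas_rate(freq_hz):
    """Toy gas-rate response peaking at 35 Hz; a stand-in for a real well model."""
    return 1000.0 - (freq_hz - 35.0) ** 2


def step(state, action_idx):
    """Apply an action, clip to the valid range, return (next_state, reward)."""
    next_state = min(max(state + ACTIONS[action_idx], 0), len(FREQS) - 1)
    return next_state, gas_rate(FREQS[next_state])


def train_q_learning(episodes=3000, horizon=20, alpha=0.5, gamma=0.9,
                     epsilon=0.3, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in FREQS]
    for _ in range(episodes):
        state = rng.randrange(len(FREQS))  # random starting frequency level
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
            next_state, reward = step(state, action)
            # Q-learning update: bootstrap from the best action in the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_rollout(q, start_state, horizon=10):
    """Follow the greedy policy and return the visited frequencies in Hz."""
    state, path = start_state, [FREQS[start_state]]
    for _ in range(horizon):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, _ = step(state, action)
        path.append(FREQS[state])
    return path


q_table = train_q_learning()
print(greedy_rollout(q_table, start_state=0))
```

In this toy setting the reward is the instantaneous gas rate, so maximizing the discounted return steers the pump toward the frequency with the highest assumed rate. A realistic environment would replace `gas_rate` with a well response that changes over the production cycle.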