Logging lithology identification method based on improved ensemble learning

2023, Vol. 62, No. 2
Renze LUO, Juanjuan TUO, Hualing NI, Xingyu LI, Canru LEI, Liang GUO
1. State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation, School of Earth Science and Technology, Southwest Petroleum University, Chengdu, Sichuan 610500, China
2. Southwest Geophysical Research Institute, BGP Inc., China National Petroleum Corporation, Chengdu, Sichuan 610500, China

Well-logging data contain a large amount of redundant information unrelated to lithology, and lithology labels are unevenly distributed across classes, which severely degrades lithology identification accuracy. Existing logging-based lithology identification algorithms cannot effectively handle this imbalance between lithology classes. To address this, an ensemble learning lithology prediction method for imbalanced sample sets, k-means Synthetic Minority Over-Sampling Ensemble Learning (KSMOSEL), is proposed. First, mud-logging lithology data are used as sample labels and well-log curves are used as model inputs. Second, the k-means algorithm is combined with the Synthetic Minority Over-sampling Technique (SMOTE) to form a k-means synthetic oversampling algorithm (the KS sampling algorithm), which balances the lithology sample set. Finally, the resampled data set is used to build and train an ensemble learning model in which multiple classifiers are fused into a strong learner, and lithology types are predicted by "soft voting". The method was applied to logging lithology data from the Hugoton oil and gas field, and its identification performance was compared with that of conventional classification models: support vector machine (SVM), k-nearest neighbor (KNN), decision tree, XGBoost, and random forest. The experimental results show that KSMOSEL achieves higher accuracy, with a lithology identification accuracy of 94.28%. After KS sampling, the lithology identification accuracy of the SVM, KNN, decision tree, XGBoost, random forest, GBDT, and ensemble learning models increased by 18.68%, 12.03%, 3.77%, 10.23%, 24.77%, 16.69%, and 19.37%, respectively. The method therefore substantially improves lithology identification accuracy when the class distribution of logging lithology data is imbalanced.
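The two steps named in the abstract, cluster-aware oversampling and soft voting, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names (`kmeans`, `ks_oversample`, `soft_vote`), the choice to cluster only the minority class, the parameters `k` and `n_neighbors`, and the toy data are all assumptions made for illustration.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def ks_oversample(minority, n_new, k=2, n_neighbors=3, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation
    between neighbours in the same k-means cluster (an assumed simplification)."""
    rng = random.Random(seed)
    labels = kmeans(minority, k, seed=seed)
    clusters = [[p for p, lab in zip(minority, labels) if lab == c] for c in range(k)]
    clusters = [c for c in clusters if len(c) >= 2]  # interpolation needs a neighbour
    if not clusters:
        raise ValueError("no cluster has two or more minority samples")
    synthetic = []
    for _ in range(n_new):
        cluster = rng.choice(clusters)
        base = rng.choice(cluster)
        others = sorted((p for p in cluster if p is not base),
                        key=lambda p: dist2(p, base))
        neighbour = rng.choice(others[:n_neighbors])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def soft_vote(probas):
    """Average per-class probabilities from several classifiers and
    return (index of the most probable class, averaged probabilities)."""
    n_classes = len(probas[0])
    mean = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c]), mean

# Toy minority class with two well-separated groups (hypothetical data).
minority = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
synthetic = ks_oversample(minority, n_new=10)

# Class probabilities from three hypothetical classifiers for one sample.
label, mean = soft_vote([[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]])
print(len(synthetic), label)
```

In the toy vote, soft voting selects class 0 because one classifier is very confident, whereas a simple majority ("hard") vote over the same three classifiers would select class 1. In practice, library implementations such as `KMeansSMOTE` in imbalanced-learn and `VotingClassifier(voting="soft")` in scikit-learn would be the more likely building blocks; whether the paper used them is not stated in the abstract.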

lithology identification; unbalanced data; oversampling; KSMOSEL; logging data
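National Key Research and Development Program of China, Deep Earth Special Project (2016YFC0601100); Sichuan Science and Technology Project (2019CXRC0027)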
10.3969/j.issn.1000-1441.2023.02.003