Journal of Southwest Petroleum University (Science & Technology Edition)
2020, Vol. 42, No. 6
Title
Research of Extraction on Petroleum Unstructured Information Based on Named Entity Recognition
Authors
ZHONG Yuan
LIU Xiaorong
WANG Jie
CHEN Yan
ZHANG Tai
Organization
School of Computer Science, Southwest Petroleum University, Chengdu, Sichuan 610500, China
Abstract
With the accelerating construction of "intelligent oilfields", building intelligent analysis systems on massive petroleum data is of great significance. However, the dynamic text data generated during oilfield production is often unstructured and of many different types, so extracting the crucial information from it for analysis has become a popular research topic, and information extraction in turn needs high-quality semantic entities to support it. To address this problem, this paper proposes an information extraction method for unstructured petroleum text based on named entity recognition (NER). A bidirectional long short-term memory (Bi-LSTM) network extracts features from the petroleum corpus, and a conditional random field (CRF) serves as the classifier; together they form a high-precision Bi-LSTM+CRF NER model for extracting named entities from unstructured texts in the petroleum industry. Comparative experiments on a well workover text dataset show that the method achieves higher precision and recall than other state-of-the-art methods.
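The abstract says a Bi-LSTM extracts features and a CRF acts as the classifier. As a rough, non-authoritative sketch of the CRF side only, the code below runs Viterbi decoding for a linear-chain CRF, assuming the Bi-LSTM has already produced per-token emission scores and the CRF has learned a tag-transition matrix, and then converts the decoded tags into entity spans. The BIO tagging scheme, the `EQP` entity type, the tokens, and every score are hypothetical placeholders, not labels or values from the paper; training (the Bi-LSTM and the CRF log-likelihood) is omitted.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical BIO tag set for illustration only; the paper's entity types are not listed here.
TAGS = ["O", "B-EQP", "I-EQP"]

# Hypothetical CRF transition scores: TRANSITIONS[i][j] scores tag i followed by tag j.
# The large negative O -> I-EQP score discourages an I- tag that does not continue an entity.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, -1.0, 1.0],   # from B-EQP
    [0.0, -1.0, 0.5],   # from I-EQP
]

# Hypothetical per-token emission scores standing in for Bi-LSTM outputs (one row per token).
TOKENS = ["replace", "sucker", "rod", "then", "test"]
EMISSIONS = [
    [2.0, 0.5, 0.1],
    [0.3, 2.0, 0.4],
    [0.2, 0.3, 1.5],
    [1.8, 0.2, 0.6],
    [1.5, 0.4, 0.3],
]


def viterbi_decode(
    emissions: Sequence[Sequence[float]], transitions: Sequence[Sequence[float]]
) -> List[int]:
    """Return tag indices of the highest-scoring path (assumes at least one token)."""
    n_tags = len(transitions)
    scores = list(emissions[0])  # this sketch has no separate start/end transition scores
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_scores: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Best previous tag i for current tag j, given path score plus transition score.
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the back-pointers from the best final tag to recover the full path.
    path = [max(range(n_tags), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collect (entity_type, start, end_exclusive) spans from a BIO tag sequence."""
    spans: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes any open span
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None and start is not None:
                spans.append((etype, start, i))
            start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
            # An I- tag that does not continue an open span is ignored in this sketch.
    return spans


if __name__ == "__main__":
    tags = [TAGS[k] for k in viterbi_decode(EMISSIONS, TRANSITIONS)]
    print(tags)  # ['O', 'B-EQP', 'I-EQP', 'O', 'O']
    for etype, start, end in bio_to_spans(tags):
        print(etype, " ".join(TOKENS[start:end]))  # EQP sucker rod
```

The transition matrix is what a per-token softmax classifier lacks: because the score for O followed by I-EQP is strongly negative, the decoder prefers well-formed tag sequences even when a single token's emission score alone would favour an ill-formed one. This is the usual motivation for placing a CRF on top of a Bi-LSTM.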
Keywords:
NER;
Bi-LSTM+CRF;
information extraction;
unstructured text;
DOI
10.11885/j.issn.1674-5086.2020.05.12.01