Abstract: This paper proposes a new ontology-based method for extracting citation metadata from technical documents. The method uses pattern matching to extract metadata such as authors, titles, and publication dates, and formalizes the extracted metadata in the OWL ontology description language, laying a foundation for subsequent semantic search and semantic storage. Experimental data demonstrate the effectiveness of the method.
Key words: information extraction; semantic web; ontology; pattern matching
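
To make the pipeline summarized above concrete, the following is a minimal sketch of pattern-matching extraction followed by a formal description of the result. It is an illustration under stated assumptions, not the method evaluated in this paper: the citation layout ("Authors. Title. Venue, Year."), the regular expression, the function names `extract_citation` and `to_turtle`, and the `ex:` vocabulary are all hypothetical. The output is RDF in Turtle syntax, one of the serializations in which OWL individuals can be written.

```python
import re

# Hypothetical pattern for citations shaped like "Authors. Title. Venue, Year."
# This single layout is an assumption for illustration only.
CITATION_RE = re.compile(
    r"^(?P<authors>[^.]+)\.\s+"        # author list, up to the first period
    r"(?P<title>[^.]+)\.\s+"           # title, up to the next period
    r"(?P<venue>.+?),\s*"              # venue, up to the comma before the year
    r"(?P<year>(?:19|20)\d{2})\.?$"    # four-digit year, optional final period
)


def extract_citation(text):
    """Return a dict of citation fields, or None if the pattern does not match."""
    match = CITATION_RE.match(text.strip())
    if match is None:
        return None
    # Split the author list on commas or the word "and".
    parts = re.split(r",|\band\b", match.group("authors"))
    authors = [p.strip() for p in parts if p.strip()]
    return {
        "authors": authors,
        "title": match.group("title").strip(),
        "venue": match.group("venue").strip(),
        "year": int(match.group("year")),
    }


def to_turtle(record, cid="citation1"):
    """Serialize one record as Turtle using a hypothetical ex: vocabulary.

    Literal values are not escaped here, so quotes inside a title would
    need extra handling in real use.
    """
    title = record["title"]
    year = record["year"]
    lines = [
        "@prefix ex: <http://example.org/citation#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"ex:{cid} a ex:Citation ;",
    ]
    for author in record["authors"]:
        lines.append(f'    ex:hasAuthor "{author}" ;')
    lines.append(f'    ex:hasTitle "{title}" ;')
    lines.append(f'    ex:hasYear "{year}"^^xsd:gYear .')
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "Smith J, Doe A. Extracting citation metadata with ontologies. Journal of Examples, 2005."
    record = extract_citation(sample)
    if record is not None:
        print(to_turtle(record))
```

A single pattern like this fails on author names written with dotted initials (for example "J. Smith"), which is why a practical system would need a set of patterns covering the citation styles it expects to encounter.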