Code Notes from Studying Knowledge Graph Embedding Based Question Answering

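Preface

My advisor recently asked me to study the paper "Knowledge Graph Embedding Based Question Answering". The key point of the paper is that it uses a knowledge graph as the dataset and tackles question answering, a natural language processing task, without needing to know the structure of the data. This note only records the code snippets that I, a very inexperienced undergraduate, found potentially useful while reading the paper's GitHub code. Notes on the paper itself will go in a separate post.

I hope we can all learn and improve together. Keep going!

Paper link:

delivery.acm.org/10.1145/330…acm =1564312374_9607150c0f9e4d7029cba11e69cb8903 (please copy the whole link)

GitHub link:

github.com/xhuang31/KE…

I will keep updating the sections below, slowly.

Main text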

If the question starts with specific words, delete them

For example, to remove "what is" from "what is your name" and get "your name", the following code can be used:

    # whhowset[k] holds the question prefixes that are k + 1 words long
    whhowset = [{'what', 'how', 'where', 'who', 'which', 'whom'},
                {'in which', 'what is', "what 's", 'what are', 'what was', 'what were',
                 'where is', 'where are', 'where was', 'where were', 'who is', 'who was',
                 'who are', 'how is', 'what did'},
                {'what kind of', 'what kinds of', 'what type of', 'what types of', 'what sort of'}]

    question = ["what", "is", "your", "name"]

    # Try the longest prefix first (3 words, then 2, then 1)
    for j in range(3, 0, -1):
        if ' '.join(question[0:j]) in whhowset[j - 1]:
            del question[0:j]
            break  # stop after the first match so only one prefix is removed

    print(question)

output: ['your', 'name']
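The same step can be wrapped in a reusable function, which makes it easier to try on several questions. This is only a sketch: the names `WH_PREFIXES` and `strip_wh_prefix` are made up for illustration and do not come from the repository.

```python
# Hypothetical helper; the prefix sets are copied from the snippet above.
WH_PREFIXES = [
    {'what', 'how', 'where', 'who', 'which', 'whom'},
    {'in which', 'what is', "what 's", 'what are', 'what was', 'what were',
     'where is', 'where are', 'where was', 'where were', 'who is', 'who was',
     'who are', 'how is', 'what did'},
    {'what kind of', 'what kinds of', 'what type of', 'what types of', 'what sort of'},
]


def strip_wh_prefix(tokens):
    """Return a copy of tokens without the longest matching leading wh-phrase."""
    for n in range(len(WH_PREFIXES), 0, -1):  # longest prefix first
        if ' '.join(tokens[:n]) in WH_PREFIXES[n - 1]:
            return tokens[n:]
    return list(tokens)  # nothing matched: return the tokens unchanged


print(strip_wh_prefix(["what", "is", "your", "name"]))
print(strip_wh_prefix(["what", "kind", "of", "music", "is", "this"]))
print(strip_wh_prefix(["name", "the", "author"]))
```

Returning a new list instead of deleting in place keeps the caller's token list intact, which is convenient when the original question is still needed later.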

Create an n-gram list from a sentence's word list

The following is quoted from Wikipedia's explanation of n-gram: an n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus.

n can be chosen freely, for example unigram (n = 1) or bigram (n = 2). Concrete examples of n-gram lists:

Word: apple, n-gram list: ['a', 'p', 'p', 'l', 'e', 'ap', 'pp', 'pl', 'le', 'app', 'ppl', 'ple', 'appl', 'pple', 'apple']
Sentence: 'how are u', n-gram list: ['how', 'are', 'u', 'how are', 'are u', 'how are u']

    question = ["how", "are", "u"]

    grams = []
    maxlen = len(question)

    # Unigrams: every single token
    for token in question:
        grams.append(token)

    # n-grams for n = 2 .. maxlen: slide a window of j tokens over the question
    for j in range(2, maxlen + 1):
        for token in [question[idx:idx + j] for idx in range(maxlen - j + 1)]:
            grams.append(' '.join(token))

    print(grams)

output: ['how', 'are', 'u', 'how are', 'are u', 'how are u']
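The same sliding-window idea works for characters as well as words. Below is a minimal sketch with a hypothetical helper named `ngrams` (not from the repository) that reproduces both the sentence example and the "apple" example.

```python
def ngrams(items, join=' '):
    """Return every contiguous n-gram of items, shortest n-grams first."""
    size = len(items)
    return [join.join(items[i:i + n])
            for n in range(1, size + 1)      # n-gram length
            for i in range(size - n + 1)]    # start position of the window


word_grams = ngrams(["how", "are", "u"])
char_grams = ngrams(list("apple"), join='')

print(word_grams)
print(char_grams)
```

A sequence of length L yields L * (L + 1) / 2 n-grams in total, so "apple" (5 letters) gives 15 entries.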

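Write the output into a file

    import os

    mids = ["I", "I", "am", "a", "human"]

    # set() drops the duplicate "I"; the iteration order of a set is arbitrary
    with open(os.path.join('output.txt'), 'w') as outfile:
        for i, entity in enumerate(set(mids)):
            outfile.write("{}\t{}\n".format(entity, i))

output: a file named output.txt. Its content looks like the following (the line order may differ between runs because sets are unordered):

    human	0
    a	1
    am	2
    I	3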
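If a stable order matters, one option is to deduplicate with `dict.fromkeys`, which keeps first-seen order, and then read the file back into a dictionary. This is only a sketch under that assumption; the temporary directory and the name `entity_to_id` are choices made for this illustration, not something taken from the repository.

```python
import os
import tempfile

mids = ["I", "I", "am", "a", "human"]

# dict.fromkeys removes duplicates while keeping first-seen order
unique_mids = list(dict.fromkeys(mids))

# Write "entity<TAB>index" lines into a file inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'output.txt')
with open(path, 'w') as outfile:
    for i, entity in enumerate(unique_mids):
        outfile.write("{}\t{}\n".format(entity, i))

# Read the file back into an entity -> index mapping
entity_to_id = {}
with open(path) as infile:
    for line in infile:
        entity, idx = line.rstrip('\n').split('\t')
        entity_to_id[entity] = int(idx)

print(entity_to_id)
```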

argparse

argparse is part of the Python standard library (not PyTorch). It makes it easy to write user-friendly command-line interfaces. The add_argument method defines how a single command-line argument should be parsed.

function: parser.add_argument(name or flags...[, action][, nargs][, const][, default][, type][, choices][, required][, help][, metavar][, dest])

parameters (quoted from the Python documentation):

const - A constant value required by some action and nargs selections.
dest - The name of the attribute to be added to the object returned by parse_args().
action - The basic type of action to be taken when this argument is encountered at the command line.

    import argparse

    parser = argparse.ArgumentParser(description='Process some integers.')
    # One or more positional integers, collected into a list
    parser.add_argument('integers', metavar='N', type=int, nargs='+',
                        help='an integer for the accumulator')
    # --sum stores the built-in sum in args.accumulate; without it, max is used
    parser.add_argument('--sum', dest='accumulate', action='store_const',
                        const=sum, default=max,
                        help='sum the integers (default: find the max)')

    args = parser.parse_args()
    print(args.accumulate(args.integers))

output: python prog.py 1 2 3 4 --> 4 (the maximum); python prog.py 1 2 3 4 --sum --> 10 (the sum)
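To try the parser without a real command line, `parse_args` also accepts an explicit list of strings. A small sketch of both invocations follows; the variable names `default_args` and `sum_args` are just for illustration.

```python
import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')

# Passing a list replaces sys.argv[1:], which is handy in notebooks and tests
default_args = parser.parse_args(['1', '2', '3', '4'])
sum_args = parser.parse_args(['1', '2', '3', '4', '--sum'])

print(default_args.accumulate(default_args.integers))  # max of the integers
print(sum_args.accumulate(sum_args.integers))          # sum of the integers
```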

Counter object

A counter tool is provided to support convenient and rapid tallies.

    from collections import Counter

    cnt = Counter()
    for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
        cnt[word] += 1
    print(cnt)

output: Counter({'blue': 3, 'red': 2, 'green': 1})
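A Counter can also be built directly from an iterable, and `most_common` returns the entries sorted by count. A short sketch using the same word list:

```python
from collections import Counter

words = ['red', 'blue', 'red', 'green', 'blue', 'blue']

# Passing the iterable to the constructor does the tallying in one step
cnt = Counter(words)

top_two = cnt.most_common(2)   # the two most frequent words with their counts
missing = cnt['yellow']        # a key that was never counted returns 0

print(top_two)
print(missing)
```

Looking up an unseen key returns 0 instead of raising KeyError, which is why `cnt[word] += 1` works on an empty Counter.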

PyTorch manual_seed

    import torch

    # Fix the seed of the random number generator so the results are reproducible
    torch.manual_seed(3)
    print(torch.rand(3))

output: tensor([0.0043, 0.1056, 0.2858]). With the seed fixed, this tensor is the same on every run. Without the manual_seed call, the output is different every time.

CUDNN deterministic

In some circumstances when using the CUDA backend with CuDNN, an operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True.

Example:

    import torch

    torch.backends.cudnn.deterministic = True

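torchtext

Note: the following part comes from the Zhihu article "torchtext入门教程，轻松玩转文本数据处理" (an introductory torchtext tutorial) by Lee. It is organized here only as study notes and will be removed on request if it infringes.

torchtext components:

Field: holds the configuration for data preprocessing, such as the tokenization method, whether to convert to lowercase, the start token, the end token, the padding token, and the vocabulary.
Dataset: inherits from PyTorch's Dataset and is used to load data. TabularDataset loads data conveniently once the path, format, and Field information are specified. torchtext also provides pre-built Dataset objects for common datasets that can be loaded directly, and the splits method loads the training, validation, and test sets at the same time.
Iterator: the iterator that feeds data to the model. It supports customized batching.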

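Field

    # Legacy torchtext API; later releases moved it to torchtext.legacy and then removed it
    from torchtext import data

    TEXT = data.Field(lower=True)

Here the preprocessing is configured to convert all text to lowercase.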

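Dataset

torchtext's Dataset inherits from PyTorch's Dataset and provides a method that downloads compressed data and extracts it (.zip, .gz, and .tgz are supported).

The splits method reads the training, validation, and test sets at the same time.

TabularDataset makes it easy to read files in CSV, TSV, or JSON format.

    # args, TEXT, ED and field are defined elsewhere in the surrounding code
    train = data.TabularDataset(path=os.path.join(args.output, 'dete_train.txt'), format='tsv',
                                fields=[('text', TEXT), ('ed', ED)])
    dev, test = data.TabularDataset.splits(path=args.output, validation='valid.txt', test='test.txt',
                                           format='tsv', fields=field)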

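After loading the data you can build the vocabulary, and pre-trained word vectors can be used while building it.

    # "glove.6B.100d" is torchtext's alias for the 100-dimensional GloVe 6B vectors
    TEXT.build_vocab(train, vectors="glove.6B.100d")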

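Iterator

Iterator is torchtext's output to the model. It provides the usual ways of handling data, such as shuffling and sorting, and the batch size can be changed dynamically. It also has a splits method that outputs the training, validation, and test sets at the same time.

    # Training batches are shuffled; validation batches keep their original order
    train_iter = data.Iterator(train, batch_size=args.batch_size, device=torch.device('cuda', args.gpu), train=True,
                               repeat=False, sort=False, shuffle=True, sort_within_batch=False)
    dev_iter = data.Iterator(dev, batch_size=args.batch_size, device=torch.device('cuda', args.gpu), train=False,
                             repeat=False, sort=False, shuffle=False, sort_within_batch=False)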

Floor division

Python arithmetic operator //: divides the operands and returns the quotient with the digits after the decimal point removed. If exactly one of the operands is negative, the result is floored, i.e., rounded away from zero (towards negative infinity).

    print(9 // 4)
    print(-11 // 3)

output: 2 -4
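The difference between flooring and simply cutting off the decimals only shows up with negative results. The sketch below compares `//` with `int(a / b)`, which truncates towards zero, and checks the `divmod` identity. The list name `results` is just for illustration.

```python
import math

pairs = [(9, 4), (-11, 3), (11, -3), (-11, -3)]
results = []

for a, b in pairs:
    floored = a // b              # rounds towards negative infinity
    truncated = int(a / b)        # rounds towards zero
    quotient, remainder = divmod(a, b)

    # // agrees with math.floor, and quotient * b + remainder rebuilds a
    assert floored == math.floor(a / b) == quotient
    assert quotient * b + remainder == a

    results.append((a, b, floored, truncated))

for row in results:
    print(row)
```

For (-11, 3) the floored quotient is -4 while the truncated one is -3. The two agree whenever the exact quotient is positive.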

Reposted from: https://juejin.im/post/5d3d8157f265da1ba84ada19
