
NLP Topic 1 Word Embeddings and Sentence Embeddings


cs224n-2019

  • lecture 1: Introduction and Word Vectors
  • lecture 2: Word Vectors 2 and Word Senses

slp (Speech and Language Processing)

  • chapter 6: Vector Semantics
  • chapter 14: The Representation of Sentence Meaning

ruder.io/word-embeddings

Language is the medium through which information and knowledge are transmitted; a prerequisite for effective communication is that both parties share equivalent knowledge.

Table of Contents

  • Topic 1: Word Embeddings and Sentence Embeddings
    • How can we represent the meaning of a word?
      • By analyzing the context surrounding each word.
      • Word2vec: Overview
      • objective and prediction function

How to represent the meaning of a word?

Meaning lies in a symbol (the signifier) corresponding to the idea or thing it stands for (the signified), and this correspondence can itself be described in symbolic form.

Representing words by their context

A good representation should encode word similarity directly in the vectors themselves; both the training objectives and the downstream applications revolve around capturing semantic similarity between words. In distributional semantics, a word's meaning is determined largely by the words that frequently occur near it. Following this principle, "you shall know a word by the company it keeps" (J. R. Firth): a word can be characterized by its typical companions. A word embedding is a dense vector assigned to each word, learned so that it is most similar to the vectors of words that appear in similar contexts.
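
To make "similar contexts lead to similar vectors" concrete, here is a minimal sketch that compares toy word vectors with cosine similarity; the words, values, and dimensionality are made up for illustration, whereas real embeddings would be learned by a model such as word2vec or GloVe.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; close to 1.0 means very similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" (illustrative values only; learned word vectors
# usually have 50-300 dimensions and come from co-occurrence statistics).
embeddings = {
    "coffee": np.array([0.9, 0.1, 0.30, 0.00]),
    "tea":    np.array([0.8, 0.2, 0.35, 0.05]),  # occurs in contexts similar to "coffee"
    "laptop": np.array([0.1, 0.9, 0.00, 0.40]),  # occurs in very different contexts
}

print(cosine_similarity(embeddings["coffee"], embeddings["tea"]))     # high similarity
print(cosine_similarity(embeddings["coffee"], embeddings["laptop"]))  # low similarity
```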

Word2vec: Overview

Word2vec (Mikolov et al., 2013) is a framework for learning word vectors. The main idea:

  • We have a large corpus ("body") of text.
  • Every word in a fixed vocabulary is represented by a vector.
  • Go through each position t in the text, which has a center word c and surrounding context words o (see the sketch after this list).
  • Use the similarity of the vectors for c and o to calculate the probability of o given c (or vice versa).
  • Keep adjusting the word vectors to maximize this probability.
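
As a sketch of the "center word c and context words o" step, the snippet below enumerates (center, context) pairs with a fixed window size from a toy sentence; the sentence, window size, and function name are placeholders, not part of the original word2vec code.

```python
def training_pairs(tokens, window=2):
    """Yield (center, context) word pairs for every position t in the corpus."""
    for t, center in enumerate(tokens):
        for j in range(-window, window + 1):
            if j == 0 or not (0 <= t + j < len(tokens)):
                continue  # skip the center word itself and positions outside the text
            yield center, tokens[t + j]

corpus = "problems turning into banking crises as".split()  # toy sentence
for center, context in training_pairs(corpus, window=2):
    print(center, "->", context)
```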

Example windows and process for computing P(w_{t+j}|w_t):
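
The figure with the example windows is not reproduced here; as a substitute, the following minimal sketch shows how skip-gram would score one window, computing P(o | c) as a softmax over dot products between the center vector v_c and every "outside" vector u_w. The vocabulary, dimensionality, and random initialization are placeholders chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["problems", "turning", "into", "banking", "crises", "as"]
word2id = {w: i for i, w in enumerate(vocab)}
dim = 8                                  # embedding dimensionality (placeholder)

V = rng.normal(size=(len(vocab), dim))   # center-word vectors v_w (randomly initialized)
U = rng.normal(size=(len(vocab), dim))   # outside-word vectors u_w (randomly initialized)

def prob_context_given_center(center: str) -> np.ndarray:
    """P(o | c) for every candidate outside word o: softmax over dot products u_w . v_c."""
    scores = U @ V[word2id[center]]             # one score per vocabulary word
    exp_scores = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return exp_scores / exp_scores.sum()

p = prob_context_given_center("into")
for w, pw in zip(vocab, p):
    print(f"P({w} | into) = {pw:.3f}")
```

Training then adjusts V and U so that the probabilities of the context words actually observed in each window become as large as possible.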

objective and prediction function
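
This heading refers to the standard skip-gram formulation from the lecture; it is restated below for completeness, with window size m, corpus length T, center vector v_c, and outside vector u_o.

For each position t = 1, ..., T, predict the context words within a window of size m given the center word:

$$
L(\theta) = \prod_{t=1}^{T} \prod_{\substack{-m \le j \le m \\ j \neq 0}} P(w_{t+j} \mid w_t; \theta)
$$

The objective (loss) is the average negative log likelihood:

$$
J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \neq 0}} \log P(w_{t+j} \mid w_t; \theta)
$$

The prediction function is a softmax over dot products between the center vector and the outside vectors:

$$
P(o \mid c) = \frac{\exp(u_o^{\top} v_c)}{\sum_{w \in V} \exp(u_w^{\top} v_c)}
$$

Minimizing J(θ) is equivalent to maximizing the probability of the observed context words; the parameters θ are simply the two vectors (center and outside) for every word in the vocabulary.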
