Neural machine translation with attention: building and training a translation model
This notebook trains a sequence-to-sequence (seq2seq) model for Spanish-to-English translation. This is an advanced example that assumes some knowledge of seq2seq models.
After training the model in this notebook, you will be able to input a Spanish sentence, such as "¿todavia estan en casa?", and get back its English translation: "are you still at home?"
The translation quality is reasonable for a toy example, but the generated attention plot is perhaps more interesting. It shows which parts of the input sentence had the model's attention while translating:

Note: This example takes approximately 10 minutes to run on a single P100 GPU.
from __future__ import absolute_import, division, print_function
!pip install tensorflow-gpu==2.0.0-alpha0
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import unicodedata
import re
import numpy as np
import os
import io
import time
Download and prepare the dataset
We'll use a language dataset provided by http://www.manythings.org/anki/. This dataset contains language translation pairs in the format:
May I borrow this book? ¿Puedo tomar prestado este libro?
There are a variety of languages available, but we'll use the English-Spanish dataset. For convenience, we've hosted a copy of this dataset on Google Cloud, but you can also download your own copy. After downloading the dataset, here are the steps we'll take to prepare the data:
1. Add a start and end token to each sentence.
2. Clean the sentences by removing special characters.
3. Create a word index and reverse word index (dictionaries mapping from word → id and id → word).
4. Pad each sentence to a maximum length (a toy illustration of steps 3 and 4 follows this list).
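As a minimal sketch, not part of the original notebook, here is what steps 3 and 4 look like for a single toy sentence (the vocabulary and the padded length of 11 are made up for illustration):
# Toy illustration of a word index / reverse word index and padding (hypothetical values)
sentence = '<start> may i borrow this book ? <end>'
words = sentence.split(' ')
word_index = {w: i + 1 for i, w in enumerate(dict.fromkeys(words))}  # word -> id, 0 reserved for padding
index_word = {i: w for w, i in word_index.items()}                   # id -> word
ids = [word_index[w] for w in words]
padded = ids + [0] * (11 - len(ids))                                 # pad to an assumed max length of 11
print(word_index, padded)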
# Download the file
path_to_zip = tf.keras.utils.get_file(
    'spa-eng.zip', origin='http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip',
    extract=True)
path_to_file = os.path.dirname(path_to_zip)+"/spa-eng/spa.txt"
# Converts the unicode file to ascii
def unicode_to_ascii(s):
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')
def preprocess_sentence(w):
    w = unicode_to_ascii(w.lower().strip())

    # creating a space between a word and the punctuation following it
    # eg: "he is a boy." => "he is a boy ."
    # Reference:- https://stackoverflow.com/questions/3645931/python-padding-punctuation-with-white-spaces-keeping-punctuation
    w = re.sub(r"([?.!,¿])", r" \1 ", w)
    w = re.sub(r'[" "]+', " ", w)

    # replacing everything with space except (a-z, A-Z, ".", "?", "!", ",")
    w = re.sub(r"[^a-zA-Z?.!,¿]+", " ", w)

    w = w.rstrip().strip()

    # adding a start and an end token to the sentence
    # so that the model knows when to start and stop predicting.
    w = '<start> ' + w + ' <end>'
    return w
en_sentence = u"May I borrow this book?"
sp_sentence = u"¿Puedo tomar prestado este libro?"
print(preprocess_sentence(en_sentence))
print(preprocess_sentence(sp_sentence).encode('utf-8'))
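For reference, the expected output of the cell above (derived from the preprocessing code, not copied from a saved run) is:
<start> may i borrow this book ? <end>
b'<start> \xc2\xbf puedo tomar prestado este libro ? <end>'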
# 1. Remove the accents
# 2. Clean the sentences
# 3. Return word pairs in the format: [ENGLISH, SPANISH]
def create_dataset(path, num_examples):
    lines = io.open(path, encoding='UTF-8').read().strip().split('\n')

    word_pairs = [[preprocess_sentence(w) for w in l.split('\t')] for l in lines[:num_examples]]

    return zip(*word_pairs)
en, sp = create_dataset(path_to_file, None)
print(en[-1])
print(sp[-1])
def max_length(tensor):
    return max(len(t) for t in tensor)
def tokenize(lang):
    lang_tokenizer = tf.keras.preprocessing.text.Tokenizer(
        filters='')
    lang_tokenizer.fit_on_texts(lang)

    tensor = lang_tokenizer.texts_to_sequences(lang)

    tensor = tf.keras.preprocessing.sequence.pad_sequences(tensor,
                                                           padding='post')

    return tensor, lang_tokenizer
def load_dataset(path, num_examples=None):
    # creating cleaned input, output pairs
    targ_lang, inp_lang = create_dataset(path, num_examples)

    input_tensor, inp_lang_tokenizer = tokenize(inp_lang)
    target_tensor, targ_lang_tokenizer = tokenize(targ_lang)

    return input_tensor, target_tensor, inp_lang_tokenizer, targ_lang_tokenizer
Limit the size of the dataset to experiment faster (optional)
Training on the complete dataset takes a long time. To train faster, we can limit the dataset to 30,000 sentence pairs (of course, translation quality degrades with less data):
# Try experimenting with the size of that dataset
num_examples = 30000
input_tensor, target_tensor, inp_lang, targ_lang = load_dataset(path_to_file, num_examples)
# Calculate max_length of the target tensors
max_length_targ, max_length_inp = max_length(target_tensor), max_length(input_tensor)
# Creating training and validation sets using an 80-20 split
input_tensor_train, input_tensor_val, target_tensor_train, target_tensor_val = train_test_split(input_tensor, target_tensor, test_size=0.2)
# Show length
len(input_tensor_train), len(target_tensor_train), len(input_tensor_val), len(target_tensor_val)
def convert(lang, tensor):
    for t in tensor:
        if t != 0:
            print("%d ----> %s" % (t, lang.index_word[t]))
print ("Input Language; index to word mapping")
convert(inp_lang, input_tensor_train[0])
print ()
print ("Target Language; index to word mapping")
convert(targ_lang, target_tensor_train[0])
Create a tf.data dataset
BUFFER_SIZE = len(input_tensor_train)
BATCH_SIZE = 64
steps_per_epoch = len(input_tensor_train)//BATCH_SIZE
embedding_dim = 256
units = 1024
vocab_inp_size = len(inp_lang.word_index)+1
vocab_tar_size = len(targ_lang.word_index)+1
dataset = tf.data.Dataset.from_tensor_slices((input_tensor_train, target_tensor_train)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
example_input_batch, example_target_batch = next(iter(dataset))
example_input_batch.shape, example_target_batch.shape
Write the encoder and decoder model
Here we'll implement an encoder-decoder model with attention, which you can read about in the TensorFlow Neural Machine Translation (seq2seq) tutorial. This example uses a more recent set of APIs and implements the attention equations from that tutorial. The diagram below shows that each input word is assigned a weight by the attention mechanism, which is then used by the decoder to predict the next word in the sentence.

The input is put through an encoder model which gives us the encoder output of shape (batch_size, max_length, hidden_size) and the encoder hidden state of shape (batch_size, hidden_size).
Here are the equations that are used:
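The equation images embedded in the original notebook are not reproduced here; as a reference, the following is a reconstruction of the attention equations from the referenced seq2seq tutorial (Bahdanau's additive attention), where $h_t$ plays the role of the decoder hidden state H and $\bar h_s$ the encoder output EO at position $s$:
$$\mathrm{score}(h_t, \bar h_s) = v_a^{\top} \tanh\left(W_1 h_t + W_2 \bar h_s\right)$$
$$\alpha_{ts} = \frac{\exp\big(\mathrm{score}(h_t, \bar h_s)\big)}{\sum_{s'=1}^{S} \exp\big(\mathrm{score}(h_t, \bar h_{s'})\big)} \quad \text{(attention weights)}$$
$$c_t = \sum_{s} \alpha_{ts}\, \bar h_s \quad \text{(context vector)}$$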


We're using Bahdanau attention. Let's decide on some notation before writing the simplified form:
- FC = fully connected (dense) layer
- EO = encoder output
- H = hidden state
- X = input to the decoder
And the pseudo-code:
- score = FC(tanh(FC(EO) + FC(H)))
- attention weights = softmax(score, axis = 1). Softmax by default is applied on the last axis, but here we want to apply it on the 1st axis, since the shape of score is (batch_size, max_length, hidden_size). max_length is the length of our input, and since we are trying to assign a weight to each input position, softmax should be applied on that axis (see the short shape illustration after this list).
- context vector = sum(attention weights * EO, axis = 1). Same reason as above for choosing axis 1.
- embedding output = the input to the decoder X is passed through an embedding layer.
- merged vector = concat(embedding output, context vector)
- This merged vector is then given to the GRU.
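A minimal, self-contained sketch (not part of the original notebook; shapes and values are made up) showing why the softmax runs over axis 1 and how the context vector shape comes out:
import tensorflow as tf

# Illustrative shapes only; values are random.
batch_size, max_length, hidden_size = 4, 7, 16
score = tf.random.normal((batch_size, max_length, 1))                 # stand-in for FC(tanh(FC(EO) + FC(H)))
attention_weights = tf.nn.softmax(score, axis=1)                      # normalize across the max_length axis
print(tf.reduce_sum(attention_weights, axis=1))                       # ~1.0 for every example in the batch

enc_output = tf.random.normal((batch_size, max_length, hidden_size))  # stand-in for EO
context_vector = tf.reduce_sum(attention_weights * enc_output, axis=1)
print(context_vector.shape)                                           # (4, 16) == (batch_size, hidden_size)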
The shapes of all the vectors at each step have been specified in the comments in the code:
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.enc_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')

    def call(self, x, hidden):
        x = self.embedding(x)
        output, state = self.gru(x, initial_state=hidden)
        return output, state

    def initialize_hidden_state(self):
        return tf.zeros((self.batch_sz, self.enc_units))
encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)
# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, sample_hidden = encoder(example_input_batch, sample_hidden)
print ('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print ('Encoder Hidden state shape: (batch size, units) {}'.format(sample_hidden.shape))
class BahdanauAttention(tf.keras.Model):
    def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # hidden shape == (batch_size, hidden size)
        # hidden_with_time_axis shape == (batch_size, 1, hidden size)
        # we are doing this to perform addition to calculate the score
        hidden_with_time_axis = tf.expand_dims(query, 1)

        # score shape == (batch_size, max_length, 1)
        # the tensor before applying self.V has shape (batch_size, max_length, units)
        score = self.V(tf.nn.tanh(
            self.W1(values) + self.W2(hidden_with_time_axis)))

        # attention_weights shape == (batch_size, max_length, 1)
        # we get 1 at the last axis because we are applying score to self.V
        attention_weights = tf.nn.softmax(score, axis=1)

        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights
attention_layer = BahdanauAttention(10)
attention_result, attention_weights = attention_layer(sample_hidden, sample_output)
print("Attention result shape: (batch size, units) {}".format(attention_result.shape))
print("Attention weights shape: (batch_size, sequence_length, 1) {}".format(attention_weights.shape))
class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)

        # used for attention
        self.attention = BahdanauAttention(self.dec_units)

    def call(self, x, hidden, enc_output):
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)

        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)

        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # passing the concatenated vector to the GRU
        output, state = self.gru(x)

        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))

        # output shape == (batch_size, vocab)
        x = self.fc(output)

        return x, state, attention_weights
decoder = Decoder(vocab_tar_size, embedding_dim, units, BATCH_SIZE)
sample_decoder_output, _, _ = decoder(tf.random.uniform((64, 1)),
                                      sample_hidden, sample_output)
print ('Decoder output shape: (batch_size, vocab size) {}'.format(sample_decoder_output.shape))
Define the optimizer and the loss function
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
def loss_function(real, pred):
    # mask out the padding tokens (id 0) so they do not contribute to the loss
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)

    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask

    return tf.reduce_mean(loss_)
Checkpoints (object-based saving)
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
                                 encoder=encoder,
                                 decoder=decoder)
Training
- Pass the input through the encoder, which returns the encoder output and the encoder hidden state.
- The encoder output, the encoder hidden state and the decoder input (which is the start token) are passed to the decoder.
- The decoder returns the predictions and the decoder hidden state.
- The decoder hidden state is then passed back into the model, and the predictions are used to calculate the loss.
- Use teacher forcing to decide the next input to the decoder (a toy illustration follows this list).
- Teacher forcing is the technique where the target word is passed as the next input to the decoder.
- The final step is to calculate the gradients, apply them with the optimizer, and backpropagate.
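A toy, self-contained illustration (not part of the original notebook; shapes and values are made up) of the difference between teacher forcing and feeding back the model's own prediction:
import tensorflow as tf

# At decoding step t, the next decoder input can come from two places:
batch_size, vocab_size, t = 4, 10, 1
targ = tf.constant([[1, 5, 2]] * batch_size)              # ground-truth target ids
predictions = tf.random.normal((batch_size, vocab_size))  # decoder logits at step t

teacher_forced_input = tf.expand_dims(targ[:, t], 1)                    # training: feed the true token
free_running_input = tf.expand_dims(tf.argmax(predictions, axis=1), 1)  # evaluation: feed own prediction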
@tf.function
def train_step(inp, targ, enc_hidden):
    loss = 0

    with tf.GradientTape() as tape:
        enc_output, enc_hidden = encoder(inp, enc_hidden)

        dec_hidden = enc_hidden

        dec_input = tf.expand_dims([targ_lang.word_index['<start>']] * BATCH_SIZE, 1)

        # Teacher forcing - feeding the target as the next input
        for t in range(1, targ.shape[1]):
            # passing enc_output to the decoder
            predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)

            loss += loss_function(targ[:, t], predictions)

            # using teacher forcing
            dec_input = tf.expand_dims(targ[:, t], 1)

    batch_loss = (loss / int(targ.shape[1]))

    variables = encoder.trainable_variables + decoder.trainable_variables

    gradients = tape.gradient(loss, variables)

    optimizer.apply_gradients(zip(gradients, variables))

    return batch_loss
EPOCHS = 10
for epoch in range(EPOCHS):
    start = time.time()

    enc_hidden = encoder.initialize_hidden_state()
    total_loss = 0

    for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):
        batch_loss = train_step(inp, targ, enc_hidden)
        total_loss += batch_loss

        if batch % 100 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1,
                                                         batch,
                                                         batch_loss.numpy()))
    # saving (checkpoint) the model every 2 epochs
    if (epoch + 1) % 2 == 0:
        checkpoint.save(file_prefix=checkpoint_prefix)

    print('Epoch {} Loss {:.4f}'.format(epoch + 1,
                                        total_loss / steps_per_epoch))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))
Translate
- The evaluate function is similar to the training loop, except we don't use teacher forcing here.
- The input to the decoder at each time step is its previous prediction, together with the hidden state and the encoder output.
- Stop predicting when the model predicts the end token.
- Store the attention weights for every time step.
Note: The encoder output is calculated only once for one input.
def evaluate(sentence):
    attention_plot = np.zeros((max_length_targ, max_length_inp))

    sentence = preprocess_sentence(sentence)

    inputs = [inp_lang.word_index[i] for i in sentence.split(' ')]
    inputs = tf.keras.preprocessing.sequence.pad_sequences([inputs],
                                                           maxlen=max_length_inp,
                                                           padding='post')
    inputs = tf.convert_to_tensor(inputs)

    result = ''

    hidden = [tf.zeros((1, units))]
    enc_out, enc_hidden = encoder(inputs, hidden)

    dec_hidden = enc_hidden
    dec_input = tf.expand_dims([targ_lang.word_index['<start>']], 0)

    for t in range(max_length_targ):
        predictions, dec_hidden, attention_weights = decoder(dec_input,
                                                             dec_hidden,
                                                             enc_out)

        # storing the attention weights to plot later on
        attention_weights = tf.reshape(attention_weights, (-1, ))
        attention_plot[t] = attention_weights.numpy()

        predicted_id = tf.argmax(predictions[0]).numpy()

        result += targ_lang.index_word[predicted_id] + ' '

        if targ_lang.index_word[predicted_id] == '<end>':
            return result, sentence, attention_plot

        # the predicted ID is fed back into the model
        dec_input = tf.expand_dims([predicted_id], 0)

    return result, sentence, attention_plot
# function for plotting the attention weights
def plot_attention(attention, sentence, predicted_sentence):
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(1, 1, 1)
    ax.matshow(attention, cmap='viridis')

    fontdict = {'fontsize': 14}

    ax.set_xticklabels([''] + sentence, fontdict=fontdict, rotation=90)
    ax.set_yticklabels([''] + predicted_sentence, fontdict=fontdict)

    plt.show()
def translate(sentence):
    result, sentence, attention_plot = evaluate(sentence)

    print('Input: %s' % (sentence).encode('utf-8'))
    print('Predicted translation: {}'.format(result))

    attention_plot = attention_plot[:len(result.split(' ')), :len(sentence.split(' '))]
    plot_attention(attention_plot, sentence.split(' '), result.split(' '))
Restore the latest checkpoint and test
# restoring the latest checkpoint in checkpoint_dir
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))
translate(u'hace mucho frio aqui.')
translate(u'esta es mi vida.')
translate(u'¿todavia estan en casa?')
# wrong translation
translate(u'trata de averiguarlo.')
Next steps
- Download a different dataset to experiment with translations, for example English to German or English to French (a sketch of switching datasets follows this list).
- Experiment with training on a larger dataset.
- Or train for more epochs.
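For instance, switching to another language pair mostly amounts to pointing the download at a different archive. A minimal sketch, assuming an English-German file in the same tab-separated format is available from the Anki page above; the archive name deu-eng.zip and the extracted file name are assumptions, so adjust them to the actual archive contents:
# Hypothetical: download an English-German dataset in the same TSV format.
path_to_zip = tf.keras.utils.get_file(
    'deu-eng.zip', origin='http://www.manythings.org/anki/deu-eng.zip',
    extract=True)
path_to_file = os.path.dirname(path_to_zip) + "/deu.txt"  # inspect the archive for the actual file name

input_tensor, target_tensor, inp_lang, targ_lang = load_dataset(path_to_file, num_examples)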
