Character-Based Sequence-to-Sequence (seq2seq) Models¶
Since release 0.6.0, shorttext has supported sequence-to-sequence (seq2seq) learning. While a general seq2seq class works behind the scenes, the package exposes a character-based seq2seq implementation.
Creating One-hot Vectors¶
To use it, create an instance of the class shorttext.generators.SentenceToCharVecEncoder:
>>> import numpy as np
>>> import shorttext
>>> from urllib.request import urlopen
>>> chartovec_encoder = shorttext.generators.initSentenceToCharVecEncoder(urlopen('http://norvig.com/big.txt'), encoding='utf-8')
The above code is the same as in Character to One-Hot Vector.
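As a rough illustration of what a character-level one-hot encoding looks like (this sketch uses plain numpy with a made-up alphabet, not shorttext's own API), each character of a sentence becomes a row vector with a single 1 at the position of that character:
>>> alphabet = sorted(set('the quick brown fox'))        # hypothetical character inventory
>>> char2idx = {c: i for i, c in enumerate(alphabet)}
>>> sentence = 'the fox'
>>> onehot = np.zeros((len(sentence), len(alphabet)))
>>> for t, c in enumerate(sentence):
...     onehot[t, char2idx[c]] = 1.0
...
>>> onehot.shape
(7, 16)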
shorttext.generators.charbase.char2vec.initSentenceToCharVecEncoder(textfile, encoding=None)¶
Instantiate a SentenceToCharVecEncoder from a text file.
Parameters: - textfile (file) – text file
- encoding (str) – encoding of the text file (Default: None)
Returns: an instance of SentenceToCharVecEncoder
Return type: SentenceToCharVecEncoder
Training¶
Then we can train the model by creating an instance of shorttext.generators.CharBasedSeq2SeqGenerator:
>>> latent_dim = 100
>>> seq2seqer = shorttext.generators.CharBasedSeq2SeqGenerator(chartovec_encoder, latent_dim, 120)
And then train this neural network model:
>>> text = urlopen('http://norvig.com/big.txt').read().decode('utf-8', errors='ignore')   # training corpus as a single string
>>> seq2seqer.train(text, epochs=100)
This model takes several hours to train on a laptop.
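For readers who wonder what latent_dim and maxlen govern, here is a minimal sketch of a character-level LSTM encoder-decoder in Keras, in the spirit of the model that CharBasedSeq2SeqGenerator wraps; the layer wiring and the value of n_chars below are illustrative assumptions, not shorttext's actual internals:
>>> from keras.layers import Input, LSTM, Dense
>>> from keras.models import Model
>>> n_chars = 128                                   # hypothetical size of the character inventory
>>> encoder_inputs = Input(shape=(None, n_chars))   # one-hot characters; sequences are capped at maxlen (120 above)
>>> _, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
>>> decoder_inputs = Input(shape=(None, n_chars))
>>> decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
>>> decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
>>> decoder_outputs = Dense(n_chars, activation='softmax')(decoder_outputs)
>>> model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
>>> model.compile(optimizer='rmsprop', loss='categorical_crossentropy')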
class shorttext.generators.seq2seq.charbaseS2S.CharBasedSeq2SeqGenerator(sent2charvec_encoder, latent_dim, maxlen)¶
Class implementing a character-based sequence-to-sequence (seq2seq) learning model. This class implements the seq2seq model at the character level, and it calls Seq2SeqWithKeras.
Reference: Oriol Vinyals, Quoc Le, “A Neural Conversational Model,” arXiv:1506.05869 (2015). [arXiv]
compile(optimizer='rmsprop', loss='categorical_crossentropy')¶
Compile the keras model.
Parameters: - optimizer (str) – optimizer for gradient descent. Options: sgd, rmsprop, adagrad, adadelta, adam, adamax, nadam. (Default: rmsprop)
- loss (str) – loss function available from keras (Default: 'categorical_crossentropy')
Returns: None
decode(txtseq, stochastic=True)¶
Given an input text, produce the output text.
Parameters: txtseq (str) – input text
Returns: output text
Return type: str
loadmodel(prefix)¶
Load a trained model from various files. To load a compact model, call load_compact_model().
Parameters: prefix (str) – prefix of the file path
Returns: None
prepare_trainingdata(txtseq)¶
Transform the text into sequences of numerical vectors (a sketch of the resulting tensor shapes appears below this method list).
Parameters: txtseq (str) – text
Returns: rank-3 tensors for encoder input, decoder input, and decoder output
Return type: (numpy.array, numpy.array, numpy.array)
savemodel(prefix, final=False)¶
Save the trained models into multiple files. To save it compactly, call save_compact_model(). If final is set to True, the model cannot be further trained. If there is no trained model, a ModelNotTrainedException will be thrown.
Parameters: - prefix (str) – prefix of the file path
- final (bool) – whether the model is final (that should not be trained further) (Default: False)
Returns: None
Raise: ModelNotTrainedException
train(txtseq, batch_size=64, epochs=100, optimizer='rmsprop', loss='categorical_crossentropy')¶
Train the character-based seq2seq model.
Parameters: - txtseq (str) – text
- batch_size (int) – batch size (Default: 64)
- epochs (int) – number of epochs (Default: 100)
- optimizer (str) – optimizer for gradient descent. Options: sgd, rmsprop, adagrad, adadelta, adam, adamax, nadam. (Default: rmsprop)
- loss (str) – loss function available from keras (Default: 'categorical_crossentropy')
Returns: None
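As a sketch of the tensor layout that prepare_trainingdata describes (the exact axis ordering inside shorttext is an assumption here), seq2seq training data in Keras is conventionally arranged as (samples, time steps, characters), with the decoder output shifted one time step ahead of the decoder input:
>>> n_samples, n_chars = 1000, 128                         # hypothetical corpus statistics
>>> encoder_input = np.zeros((n_samples, 120, n_chars))    # maxlen = 120, as above
>>> decoder_input = np.zeros((n_samples, 120, n_chars))
>>> decoder_output = np.zeros((n_samples, 120, n_chars))   # decoder_output[:, t] matches decoder_input[:, t+1]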
Decoding¶
After training, we can use this class as a generative model that answers questions, like a chatbot:
>>> seq2seqer.decode('Happy Holiday!')
It does not give deterministic answers, because there is stochasticity in the prediction.
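The variation comes from sampling the next character from the predicted probability distribution rather than always taking the most probable one. A minimal sketch of that kind of sampling (not shorttext's internal code) is:
>>> probs = np.array([0.6, 0.3, 0.1])                         # hypothetical softmax output over three characters
>>> next_char_index = np.random.choice(len(probs), p=probs)   # stochastic pick, differs from run to run
>>> greedy_index = int(np.argmax(probs))                      # the deterministic alternative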
Model I/O¶
This model can be saved by entering:
>>> seq2seqer.save_compact_model('/path/to/norvigtxt_iter5model.bin')
And it can be loaded by:
>>> seq2seqer2 = shorttext.generators.seq2seq.charbaseS2S.loadCharBasedSeq2SeqGenerator('/path/to/norvigtxt_iter5model.bin')
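Besides the compact form, the trained weights can also be saved and reloaded through savemodel and loadmodel with a file-path prefix; the prefix below is a placeholder, and reloading into a freshly constructed instance is an assumption about typical usage:
>>> seq2seqer.savemodel('/path/to/norvigtxt_iter5')   # writes several files sharing this prefix
>>> seq2seqer3 = shorttext.generators.CharBasedSeq2SeqGenerator(chartovec_encoder, latent_dim, 120)
>>> seq2seqer3.loadmodel('/path/to/norvigtxt_iter5')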
shorttext.generators.seq2seq.charbaseS2S.loadCharBasedSeq2SeqGenerator(path, compact=True)¶
Load a trained CharBasedSeq2SeqGenerator from a file.
Parameters: - path (str) – path of the model file
- compact (bool) – whether it is a compact model (Default: True)
Returns: a CharBasedSeq2SeqGenerator instance for sequence-to-sequence inference
Return type: CharBasedSeq2SeqGenerator
Reference¶
Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow (Sebastopol, CA: O’Reilly Media, 2017). [O’Reilly]
Ilya Sutskever, James Martens, Geoffrey Hinton, “Generating Text with Recurrent Neural Networks,” ICML (2011). [UToronto]
Ilya Sutskever, Oriol Vinyals, Quoc V. Le, “Sequence to Sequence Learning with Neural Networks,” arXiv:1409.3215 (2014). [arXiv]
Oriol Vinyals, Quoc Le, “A Neural Conversational Model,” arXiv:1506.05869 (2015). [arXiv]
Tom Young, Devamanyu Hazarika, Soujanya Poria, Erik Cambria, “Recent Trends in Deep Learning Based Natural Language Processing,” arXiv:1708.02709 (2017). [arXiv]
Zachary C. Lipton, John Berkowitz, “A Critical Review of Recurrent Neural Networks for Sequence Learning,” arXiv:1506.00019 (2015). [arXiv]