# Maximum Entropy (MaxEnt) Classifier

## Maxent

The maximum entropy (maxent) classifier has been a popular text classifier. It parameterizes the model so as to maximize the entropy of the categorical distribution, subject to the constraint that the expected feature counts under the model equal those observed in the training data.
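
Concretely, a maxent (multinomial logistic) model assigns P(y|x) proportional to exp(w_y · x + b_y) over the class labels. The sketch below, with a made-up two-word vocabulary and invented weights (purely illustrative, not taken from shorttext), shows how the class probabilities are computed:

```python
import math

# Hypothetical bag-of-words features for the text "cancer immunology":
# x[i] = count of vocabulary word i in the text.
x = [1.0, 1.0]

# Made-up weight vectors and biases for two class labels.
weights = {"NCI": [2.0, 1.5], "NIAID": [0.5, 1.8]}
biases = {"NCI": 0.1, "NIAID": 0.2}

def maxent_scores(x, weights, biases):
    """Softmax over linear scores: P(y|x) = exp(w_y . x + b_y) / Z."""
    linear = {label: sum(w_i * x_i for w_i, x_i in zip(w, x)) + biases[label]
              for label, w in weights.items()}
    z = sum(math.exp(s) for s in linear.values())
    return {label: math.exp(s) / z for label, s in linear.items()}

probs = maxent_scores(x, weights, biases)
print(probs)  # probabilities over the class labels, summing to 1
```

This is the same computation that `score()` performs with the learned weights.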

The maxent classifier in shorttext is implemented with keras. The optimization algorithm defaults to the Adam optimizer, although other gradient-based or momentum-based optimizers can be used. Traditional methods such as generalized iterative scaling (GIS) or L-BFGS cannot be used here.

To use the maxent classifier, import the package:

>>> import shorttext
>>> from shorttext.classifiers import MaxEntClassifier


Load the example training data, the NIH research grant reports:

>>> classdict = shorttext.data.nihreports()


The classifier can be instantiated by:

>>> classifier = MaxEntClassifier()


Train the classifier:

>>> classifier.train(classdict, nb_epochs=1000)


After training, it can be used for classification, such as

>>> classifier.score('cancer immunology')   # NCI tops the score
>>> classifier.score('children health')     # NIAID tops the score
>>> classifier.score('Alzheimer disease and aging')    # NIAID tops the score


To save the model,

>>> classifier.save_compact_model('/path/to/filename.bin')


To load a saved model as a classifier, enter:

>>> classifier2 = shorttext.classifiers.load_maxent_classifier('/path/to/filename.bin')

class shorttext.classifiers.bow.maxent.MaxEntClassification.MaxEntClassifier(preprocessor=<function MaxEntClassifier.<lambda>>)

This is a classifier that implements the principle of maximum entropy.

Reference: Adam L. Berger, Stephen A. Della Pietra, Vincent J. Della Pietra, “A Maximum Entropy Approach to Natural Language Processing,” Computational Linguistics 22(1): 39-72 (1996).

convert_classdict_to_XY(classdict)

Convert the training data into sparse matrices for training.

Parameters: classdict (dict) – training data
Returns: a tuple, consisting of sparse matrices for X (the training data) and y (the labels of the training data)
Return type: tuple

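
For illustration, the classdict training data is a dictionary mapping each class label to a list of short texts. A minimal sketch of such a conversion, using plain Python lists in place of the scipy sparse matrices the actual method returns, and with invented labels and texts:

```python
# Hypothetical training data in the classdict format:
# class label -> list of short texts.
classdict = {
    "NCI": ["cancer immunology", "tumor biology"],
    "NIAID": ["infectious disease", "immune response"],
}

# Build a token dictionary over the whole corpus.
vocab = sorted({tok for texts in classdict.values()
                    for text in texts for tok in text.split()})
tok2idx = {tok: i for i, tok in enumerate(vocab)}

# X: bag-of-words count vectors; y: the class label of each row.
X, y = [], []
for label, texts in classdict.items():
    for text in texts:
        row = [0] * len(vocab)
        for tok in text.split():
            row[tok2idx[tok]] += 1
        X.append(row)
        y.append(label)
```
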
index_classlabels()

Index the class outcome labels.

Index the class outcome labels into integers, for neural network implementation.

loadmodel(nameprefix)

Load a trained model from files.

Given the prefix of the file paths, load the model from files with name given by the prefix followed by “_classlabels.txt”, “.json”, “.h5”, “_labelidx.pkl”, and “_dictionary.dict”.

If this has not been run, or a model was not trained by train(), a ModelNotTrainedException will be raised while performing prediction or saving the model.

Parameters: nameprefix (str) – prefix of the file path
Returns: None

savemodel(nameprefix)

Save the trained model into files.

Given the prefix of the file paths, save the model into files with names given by the prefix. Five files will be produced, ending with “_classlabels.txt”, “.json”, “.h5”, “_labelidx.pkl”, and “_dictionary.dict” respectively.

If there is no trained model, a ModelNotTrainedException will be thrown.

Parameters: nameprefix (str) – prefix of the file path
Returns: None
Raises: ModelNotTrainedException

score(shorttext)

Calculate the scores for all the class labels for the given short sentence.

Given a short sentence, calculate the classification scores for all class labels, returned as a dictionary with key being the class labels, and values being the scores. If the short sentence is empty, or if other numerical errors occur, the score will be numpy.nan. If neither train() nor loadmodel() was run, it will raise ModelNotTrainedException.

Parameters: shorttext (str) – a short sentence
Returns: a dictionary with keys being the class labels, and values being the corresponding classification scores
Return type: dict
Raises: ModelNotTrainedException

shorttext_to_vec(shorttext)

Convert the shorttext into a sparse vector given the dictionary.

According to the dictionary (gensim.corpora.Dictionary), convert the given text into a vector representation, according to the occurrence of tokens.

This function is deprecated for training because it is too slow to run in a loop, but it is still used while doing prediction.

Parameters: shorttext (str) – short text to be converted
Returns: sparse vector of the vector representation
Return type: scipy.sparse.dok_matrix

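
As an illustration of this conversion, the sketch below uses a plain dict of {column index: count} in place of the scipy.sparse.dok_matrix, and a hand-made token-to-index mapping standing in for the gensim dictionary (all names here are hypothetical):

```python
# Hypothetical token-to-index mapping, standing in for gensim.corpora.Dictionary.
token2id = {"cancer": 0, "immunology": 1, "aging": 2}

def shorttext_to_sparse_dict(shorttext, token2id):
    """Map a short text to {column index: token count}, ignoring unknown tokens."""
    vec = {}
    for tok in shorttext.lower().split():
        if tok in token2id:
            idx = token2id[tok]
            vec[idx] = vec.get(idx, 0) + 1
    return vec

vec = shorttext_to_sparse_dict("Cancer immunology and cancer aging", token2id)
print(vec)  # {0: 2, 1: 1, 2: 1}
```
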
train(classdict, nb_epochs=500, l2reg=0.01, bias_l2reg=0.01, optimizer='adam')

Train the classifier.

Given the training data, train the classifier.

Parameters:

* classdict (dict) – training data
* nb_epochs (int) – number of epochs (Default: 500)
* l2reg (float) – L2 regularization coefficient (Default: 0.01)
* bias_l2reg (float) – L2 regularization coefficient for bias (Default: 0.01)
* optimizer (str) – optimizer for gradient descent. Options: sgd, rmsprop, adagrad, adadelta, adam, adamax, nadam. (Default: adam)

Returns: None

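
What train() fits is an L2-regularized multinomial logistic regression. The toy sketch below, using plain-Python gradient descent in place of the keras/Adam machinery and entirely made-up data, takes a few gradient steps and verifies that the regularized cross-entropy loss decreases:

```python
import math

# Toy data: two bag-of-words rows with their class indices (made up).
X = [[1.0, 0.0], [0.0, 1.0]]
y = [0, 1]
W = [[0.0, 0.0], [0.0, 0.0]]   # weights[class][feature]
b = [0.0, 0.0]
l2reg, lr = 0.01, 0.5

def loss_and_grads(X, y, W, b):
    """Regularized cross-entropy loss and its gradients w.r.t. W and b."""
    n_cls = len(W)
    gW = [[0.0] * len(W[0]) for _ in range(n_cls)]
    gb = [0.0] * n_cls
    loss = 0.0
    for x, cls in zip(X, y):
        scores = [sum(w_i * x_i for w_i, x_i in zip(W[c], x)) + b[c]
                  for c in range(n_cls)]
        z = sum(math.exp(s) for s in scores)
        probs = [math.exp(s) / z for s in scores]
        loss -= math.log(probs[cls])
        for c in range(n_cls):
            err = probs[c] - (1.0 if c == cls else 0.0)
            for i, x_i in enumerate(x):
                gW[c][i] += err * x_i
            gb[c] += err
    # L2 penalty on weights and biases (mirroring l2reg and bias_l2reg).
    loss += l2reg * sum(w_i ** 2 for row in W for w_i in row)
    loss += l2reg * sum(b_c ** 2 for b_c in b)
    for c in range(n_cls):
        for i in range(len(W[0])):
            gW[c][i] += 2 * l2reg * W[c][i]
        gb[c] += 2 * l2reg * b[c]
    return loss, gW, gb

loss0, _, _ = loss_and_grads(X, y, W, b)
for _ in range(20):   # a few plain gradient-descent epochs
    _, gW, gb = loss_and_grads(X, y, W, b)
    for c in range(len(W)):
        for i in range(len(W[0])):
            W[c][i] -= lr * gW[c][i]
        b[c] -= lr * gb[c]
loss1, _, _ = loss_and_grads(X, y, W, b)
print(loss1 < loss0)  # the regularized loss goes down
```
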
shorttext.classifiers.bow.maxent.MaxEntClassification.load_maxent_classifier(name, compact=True)

Load the maximum entropy classifier from saved model.

Given the model file(s), load the maximum entropy classifier.

Parameters:

* name (str) – name or prefix of the file, if compact is True or False respectively
* compact (bool) – whether the model file is compact (Default: True)

Returns: maximum entropy classifier
Return type: MaxEntClassifier

shorttext.classifiers.bow.maxent.MaxEntClassification.logistic_framework(nb_features, nb_outputs, l2reg=0.01, bias_l2reg=0.01, optimizer='adam')

Construct the neural network of maximum entropy classifier.

Given the number of features and the number of output labels, return a keras neural network for implementing the maximum entropy (multinomial) classifier.

Parameters:

* nb_features (int) – number of features
* nb_outputs (int) – number of output labels
* l2reg (float) – L2 regularization coefficient (Default: 0.01)
* bias_l2reg (float) – L2 regularization coefficient for bias (Default: 0.01)
* optimizer (str) – optimizer for gradient descent. Options: sgd, rmsprop, adagrad, adadelta, adam, adamax, nadam. (Default: adam)

Returns: keras sequential model for maximum entropy classifier
Return type: keras.model.Sequential

## Reference

Adam L. Berger, Stephen A. Della Pietra, Vincent J. Della Pietra, “A Maximum Entropy Approach to Natural Language Processing,” Computational Linguistics 22(1): 39-72 (1996). [ACM]

Daniel E. Russ, Kwan-Yuet Ho, Joanne S. Colt, Karla R. Armenti, Dalsu Baris, Wong-Ho Chow, Faith Davis, Alison Johnson, Mark P. Purdue, Margaret R. Karagas, Kendra Schwartz, Molly Schwenn, Debra T. Silverman, Patricia A. Stewart, Calvin A. Johnson, Melissa C. Friesen, “Computer-based coding of free-text job descriptions to efficiently and reliably incorporate occupational risk factors into large-scale epidemiological studies”, Occup. Environ. Med. 73, 417-424 (2016). [BMJ]

Daniel Russ, Kwan-yuet Ho, Melissa Friesen, “It Takes a Village To Solve A Problem in Data Science,” Data Science Maryland, presentation at Applied Physics Laboratory (APL), Johns Hopkins University, on June 19, 2017. (2017) [Slideshare]