Installation

PIP

The shorttext package runs on Python 2.7, 3.5, 3.6, and 3.7. On Python 3.7, however, Tensorflow cannot be used as the backend of keras.

To install the package in Linux or OS X, enter the following in the console:

pip install -U shorttext

You may need to run the command as root, in which case add sudo in front of it.

To get the development version instead, install directly from Github:

pip install -U git+https://github.com/stephenhky/PyShortTextCategorization@master

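pip installs the required packages automatically; the -U flag additionally upgrades shorttext (and, where needed, its dependencies) to the latest version. If the automatic installation fails, you have to install these packages on your own.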
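The pip commands above can also be issued from Python, which ensures the package is installed for the interpreter that is actually running. The following is only a minimal sketch: the helper names pip_install_command and install are hypothetical, and the --user flag (a standard pip option that installs into the per-user site directory) is offered here as one possible alternative to sudo, not as something this page prescribes.

```python
import subprocess
import sys


def pip_install_command(target, upgrade=True, user=False):
    """Build the argument list for a pip install of `target`.

    Hypothetical helper for illustration; it only assembles the command.
    """
    # "python -m pip" runs pip for the interpreter executing this script.
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("-U")      # same flag as in the console commands
    if user:
        command.append("--user")  # per-user install, no root required
    command.append(target)
    return command


def install(target, **options):
    """Run the install and raise CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(target, **options))


# Show the command that would be run, without running it.
print(" ".join(pip_install_command("shorttext", user=True)))
```

Calling install("shorttext", user=True) would then perform the installation. Note that --user is rejected by pip inside a virtual environment, where it is not needed anyway.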

Before using the package, make sure the language model of spaCy has been installed or updated by running:

python -m spacy download en

Backend for Keras

The package keras (version >= 2.0.0) uses Tensorflow, Theano, or CNTK as its backend, with Theano usually being the default. However, Tensorflow is highly recommended as the backend wherever it is supported. Users are advised to install the backend in advance: Tensorflow (preferred for Python 2.7, 3.5, and 3.6) or Theano (preferred for Python 3.7). Refer to Frequently Asked Questions (FAQ) for how to switch the backend. It also helps to have the package Cython installed beforehand.
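As a rough illustration of the version-dependent preference described above, the sketch below picks a backend name from the running Python version and exports it through the KERAS_BACKEND environment variable, which multi-backend keras reads when it is first imported. The function names preferred_backend and select_backend are hypothetical, and falling back to Tensorflow for any version other than 3.7 is an assumption of this sketch; the FAQ remains the reference for switching backends.

```python
import os
import sys


def preferred_backend(version_info=sys.version_info):
    """Return the backend this page recommends for a Python version."""
    major, minor = version_info[0], version_info[1]
    # Python 3.7: Theano is preferred, as stated above.
    if (major, minor) == (3, 7):
        return "theano"
    # Python 2.7, 3.5, and 3.6: Tensorflow is preferred.
    # (Assumption: any other version also falls through to this branch.)
    return "tensorflow"


def select_backend(backend=None):
    """Set KERAS_BACKEND; call this before keras is first imported."""
    backend = backend or preferred_backend()
    os.environ["KERAS_BACKEND"] = backend
    return backend


chosen = select_backend()
print("Keras backend set to:", chosen)
# Only after this point should the program import keras.
```

The environment variable affects only the current process, so the selection has to happen at the top of the program, before any module that imports keras is loaded.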

Possible Solutions for Installation Failures

Most developers can install shorttext with the instructions above. If the installation fails, you may try one (or more) of the following:

  1. Installing the Python development headers (a system package, not a pip package) by typing:
apt-get install python-dev

for Python 2.7, or

apt-get install python3-dev

for Python 3.5, 3.6, and 3.7.

  2. Installing gcc and the C library by entering:
apt-get install gcc libc6

Required Packages

  • Numpy (Numerical Python, version >= 1.11.3)
  • SciPy (Scientific Python, version >= 0.18.1)
  • Scikit-Learn (Machine Learning in Python)
  • keras (Deep Learning Library for Theano and Tensorflow, version >= 2.2.3)
  • gensim (Topic Modeling for Humans, version >= 3.2.0)
  • Pandas (Python Data Analysis Library)
  • spaCy (Industrial Strength Natural Language Processing in Python, version >= 1.7.0)
  • PuLP (Optimization with PuLP)
  • PyStemmer (Snowball Stemmer, the package stemming is no longer used)
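
If pip could not install the required packages automatically, it may help to see which ones are missing. The sketch below checks whether the import module of each package in the list above can be found. It assumes Python 3 (importlib.util.find_spec is not available on Python 2.7), it does not verify the minimum versions, and the names REQUIRED_MODULES and missing_packages are hypothetical.

```python
import importlib.util

# Import names differ from the package names in a few cases.
REQUIRED_MODULES = {
    "Numpy": "numpy",
    "SciPy": "scipy",
    "Scikit-Learn": "sklearn",
    "keras": "keras",
    "gensim": "gensim",
    "Pandas": "pandas",
    "spaCy": "spacy",
    "PuLP": "pulp",
    "PyStemmer": "Stemmer",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages were found.")
```

Any package reported as missing can then be installed individually with pip.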
