News

  • 12/21/2023: shorttext 1.6.1 released.

  • 08/26/2023: shorttext 1.6.0 released.

  • 06/19/2023: shorttext 1.5.9 released.

  • 09/23/2022: shorttext 1.5.8 released.

  • 09/22/2022: shorttext 1.5.7 released.

  • 08/29/2022: shorttext 1.5.6 released.

  • 05/28/2022: shorttext 1.5.5 released.

  • 12/15/2021: shorttext 1.5.4 released.

  • 07/11/2021: shorttext 1.5.3 released.

  • 07/06/2021: shorttext 1.5.2 released.

  • 04/10/2021: shorttext 1.5.1 released.

  • 04/09/2021: shorttext 1.5.0 released.

  • 02/11/2021: shorttext 1.4.8 released.

  • 01/11/2021: shorttext 1.4.7 released.

  • 01/03/2021: shorttext 1.4.6 released.

  • 12/28/2020: shorttext 1.4.5 released.

  • 12/24/2020: shorttext 1.4.4 released.

  • 11/10/2020: shorttext 1.4.3 released.

  • 10/18/2020: shorttext 1.4.2 released.

  • 09/23/2020: shorttext 1.4.1 released.

  • 09/02/2020: shorttext 1.4.0 released.

  • 07/23/2020: shorttext 1.3.0 released.

  • 06/05/2020: shorttext 1.2.6 released.

  • 05/20/2020: shorttext 1.2.5 released.

  • 05/13/2020: shorttext 1.2.4 released.

  • 04/28/2020: shorttext 1.2.3 released.

  • 04/07/2020: shorttext 1.2.2 released.

  • 03/23/2020: shorttext 1.2.1 released.

  • 03/21/2020: shorttext 1.2.0 released.

  • 12/01/2019: shorttext 1.1.6 released.

  • 09/24/2019: shorttext 1.1.5 released.

  • 07/20/2019: shorttext 1.1.4 released.

  • 07/07/2019: shorttext 1.1.3 released.

  • 06/05/2019: shorttext 1.1.2 released.

  • 04/23/2019: shorttext 1.1.1 released.

  • 03/03/2019: shorttext 1.1.0 released.

  • 02/14/2019: shorttext 1.0.8 released.

  • 01/30/2019: shorttext 1.0.7 released.

  • 01/29/2019: shorttext 1.0.6 released.

  • 01/13/2019: shorttext 1.0.5 released.

  • 10/03/2018: shorttext 1.0.4 released.

  • 08/06/2018: shorttext 1.0.3 released.

  • 07/24/2018: shorttext 1.0.2 released.

  • 07/17/2018: shorttext 1.0.1 released.

  • 07/14/2018: shorttext 1.0.0 released.

  • 06/18/2018: shorttext 0.7.2 released.

  • 05/30/2018: shorttext 0.7.1 released.

  • 05/17/2018: shorttext 0.7.0 released.

  • 02/27/2018: shorttext 0.6.0 released.

  • 01/19/2018: shorttext 0.5.11 released.

  • 01/15/2018: shorttext 0.5.10 released.

  • 12/14/2017: shorttext 0.5.9 released.

  • 11/08/2017: shorttext 0.5.8 released.

  • 10/27/2017: shorttext 0.5.7 released.

  • 10/17/2017: shorttext 0.5.6 released.

  • 09/28/2017: shorttext 0.5.5 released.

  • 09/08/2017: shorttext 0.5.4 released.

  • 09/02/2017: end of GSoC project.

  • 08/22/2017: shorttext 0.5.1 released.

  • 07/28/2017: shorttext 0.4.1 released.

  • 07/26/2017: shorttext 0.4.0 released.

  • 06/16/2017: shorttext 0.3.8 released.

  • 06/12/2017: shorttext 0.3.7 released.

  • 06/02/2017: shorttext 0.3.6 released.

  • 05/30/2017: GSoC project (Chinmaya Pancholi).

  • 05/16/2017: shorttext 0.3.5 released.

  • 04/27/2017: shorttext 0.3.4 released.

  • 04/19/2017: shorttext 0.3.3 released.

  • 03/28/2017: shorttext 0.3.2 released.

  • 03/14/2017: shorttext 0.3.1 released.

  • 02/23/2017: shorttext 0.2.1 released.

  • 12/21/2016: shorttext 0.2.0 released.

  • 11/25/2016: shorttext 0.1.2 released.

  • 11/21/2016: shorttext 0.1.1 released.

What’s New

Release 1.6.1 (December 21, 2023)

  • Updated package requirements.

Release 1.6.0 (August 26, 2023)

  • Pinned requirements for ReadTheDocs documentation;

  • Fixed bugs in word-embedding model mean pooling classifiers;

  • Updated package requirements.

Release 1.5.9 (June 19, 2023)

  • Support for Python 3.11;

  • Removing flask.

Release 1.5.8 (September 23, 2022)

  • Package administration.

Release 1.5.7 (September 22, 2022)

  • Removal of requirement of pre-installation of numpy and Cython.

Release 1.5.6 (August 29, 2022)

  • Speeding up inference of VarNNEmbeddedVecClassifier. (Acknowledgement: Ritesh Agrawal)

Release 1.5.5 (May 28, 2022)

  • Support for Python 3.10.

Release 1.5.4 (December 15, 2021)

  • Non-negative stop words.

Release 1.5.3 (July 11, 2021)

  • Documentation updated.

Release 1.5.2 (July 6, 2021)

  • Resolved bugs regarding keras import.

  • Support for Python 3.9.

Release 1.5.1 (April 10, 2021)

  • Replaced TravisCI with CircleCI in the continuous integration pipeline.

Release 1.5.0 (April 09, 2021)

  • Removed support for Python 3.6.

  • Removed buggy BERT representations unit test.

Release 1.4.8 (February 11, 2021)

  • Updated requirements for scipy for Python 3.7 or above.

Release 1.4.7 (January 11, 2021)

  • Updated version of transformers in requirement.txt;

  • Updated BERT encoder for the change of implementation;

  • Fixed unit tests.

Release 1.4.6 (January 3, 2021)

  • Bug regarding Python 3.6 requirement for scipy.

Release 1.4.5 (December 28, 2020)

  • Fixed bugs from the Python 2 to 3 migration, including the use of filter in shorttext.metrics.embedfuzzy.

Release 1.4.4 (December 24, 2020)

  • Bugs regarding SumEmbedVecClassification.py;

  • Fixing bugs due to Python 3.6 restriction on scipy.

Release 1.4.3 (November 10, 2020)

  • Bugs about transformer-based model on different devices resolved.

Release 1.4.2 (October 18, 2020)

  • Documentation requirements and PyUp configs cleaned up.

Release 1.4.1 (September 23, 2020)

  • Documentation and codes cleaned up.

Release 1.4.0 (September 2, 2020)

  • Provided support for BERT-based sentence and token embeddings;

  • Implemented support for BERTScores.

Release 1.3.0 (July 23, 2020)

  • Removed all dependencies on PuLP; all computations of word mover’s distance (WMD) are performed using SciPy.

Release 1.2.6 (June 20, 2020)

  • Removed Python-2 codes (urllib2).

Release 1.2.5 (May 20, 2020)

  • Update on gensim package usage and requirements;

  • Removed some deprecated functions.

Release 1.2.4 (May 13, 2020)

  • Update on scikit-learn requirements to >=0.23.0;

  • Direct dependence on joblib;

  • Support for Python 3.8 added.

Release 1.2.3 (April 28, 2020)

  • PyUP scan implemented;

  • Support for Python 3.5 decommissioned.

Release 1.2.2 (April 7, 2020)

  • Removed dependence on PyStemmer, which is replaced by snowballstemmer.

Release 1.2.1 (March 23, 2020)

  • Added port number adjustability for word-embedding API;

  • Removal of Spacy dependency.

Release 1.2.0 (March 21, 2020)

  • API for word-embedding algorithm for one-time loading.

Release 1.1.6 (December 1, 2019)

  • Compatibility with TensorFlow 2.0.0.

Release 1.1.5 (September 24, 2019)

  • Decommissioned GCP buckets; using data files stored in AWS S3 buckets.

Release 1.1.4 (July 20, 2019)

  • Minor bugs fixed.

Release 1.1.3 (July 7, 2019)

  • Updated codes for Console code loading;

  • Updated Travis CI script.

Release 1.1.2 (June 5, 2019)

  • Updated code for FastText model loading, as the previous function was deprecated.

Release 1.1.1 (April 23, 2019)

Release 1.1.0 (March 3, 2019)

  • Size of embedded vectors set to 300 again when necessary; (possibly break compatibility)

  • Moving corpus data from Github to Google Cloud Storage.

Release 1.0.8 (February 14, 2019)

  • Minor bugs fixed.

Release 1.0.7 (January 30, 2019)

  • Compatibility with Python 3.7 with TensorFlow as the backend.

Release 1.0.7 (January 30, 2019)

  • Compatibility with Python 3.7 with Theano as the backend;

  • Minor documentation changes.

Release 1.0.6 (January 29, 2019)

  • Documentation change;

  • Word-embedding model used in unit test stored in Amazon S3 bucket.

Release 1.0.5 (January 13, 2019)

  • Minor versioning bug fixed.

Release 1.0.4 (October 3, 2018)

  • Package keras requirement updated;

  • Less dependence on pandas.

Release 1.0.3 (August 6, 2018)

  • Bugs regarding I/O of SumEmbeddedVecClassifier.

Release 1.0.2 (July 24, 2018)

  • Minor bugs regarding installation fixed.

Release 1.0.1 (July 14, 2018)

  • Minor bugs fixed.

Release 1.0.0 (July 14, 2018)

  • Python-3 compatibility;

  • Replaced the original stemmer with Snowball;

  • Certain functions cythonized;

  • Various bugs fixed.

Release 0.7.2 (June 18, 2018)

  • Damerau-Levenshtein distance and longest common prefix implemented using Cython.

Release 0.7.1 (May 30, 2018)

  • Decorator replaced by base class CompactIOMachine;

  • API included in documentation.

Release 0.7.0 (May 17, 2018)

  • Spelling corrections and fuzzy logic;

  • More unit tests.

Release 0.6.0 (February 27, 2018)

  • Support of character-based sequence-to-sequence (seq2seq) models.

Release 0.5.11 (January 19, 2018)

  • Removal of word-embedding keras-type layers.

Release 0.5.10 (January 15, 2018)

  • Support of encoder module for character-based models;

  • Implementation of document-term matrix (DTM).

Release 0.5.9 (December 14, 2017)

  • Support of Poincare embedding;

  • Code optimization;

  • Script ShortTextWord2VecSimilarity updated to ShortTextWordEmbedSimilarity.

Release 0.5.8 (November 8, 2017)

  • Removed most explicit user-specification of vecsize for given word-embedding models;

  • Removed old namespace for topic models (no more backward compatibility);

  • Integration of [FastText](https://github.com/facebookresearch/fastText).

Release 0.5.7 (October 27, 2017)

  • Removed most explicit user-specification of vecsize for given word-embedding models;

  • Removed old namespace for topic models (hence no more backward compatibility).

Release 0.5.6 (October 17, 2017)

  • Updated the neural network framework due to the change in gensim API.

Release 0.5.5 (September 28, 2017)

  • Script ShortTextCategorizerConsole updated.

Release 0.5.4 (September 8, 2017)

  • Bug fixed;

  • New scripts for finding distances between sentences;

  • Finding similarity between two sentences using Jaccard index.

End of GSoC Program (September 2, 2017)

Chinmaya summarized his GSoC program in a blog post on RaRe Incubator.

Release 0.5.1 (August 22, 2017)

  • Implementation of Damerau-Levenshtein distance and soft Jaccard score;

  • Implementation of Word Mover’s distance.

Release 0.4.1 (July 28, 2017)

  • Further Travis.CI update tests;

  • Model file I/O updated (for huge models);

  • Migrated documentation to [readthedocs.org](https://readthedocs.org); previous documentation at Pythonhosted.org removed.

Release 0.4.0 (July 26, 2017)

  • Maximum entropy models;

  • Use of gensim Word2Vec keras layers;

  • Incorporating new features from gensim;

  • Use of Travis.CI for pull request testing.

Release 0.3.8 (June 16, 2017)

  • Bug fixed on sumvecframeworks.

Release 0.3.7 (June 12, 2017)

  • Bug fixed on VarNNSumEmbedVecClassifier.

Release 0.3.6 (June 2, 2017)

  • Added deprecation decorator;

  • Fixed path configurations;

  • Added “update” corpus capability to gensim models.

Google Summer of Code (May 30, 2017)

Chinmaya Pancholi, a Google Summer of Code (GSoC) student, is involved in the open-source development of gensim, and his project is closely related to the shorttext package. More information can be found in his first blog entry.

Release 0.3.5 (May 16, 2017)

  • Refactored topic modeling into the generators subpackage, while keeping the package backward compatible;

  • Added Inaugural Addresses as an example training data;

  • Fixed bugs about package paths.

Release 0.3.4 (Apr 27, 2017)

  • Fixed relative path loading problems.

Release 0.3.3 (Apr 19, 2017)

  • Deleted CNNEmbedVecClassifier;

  • Added script ShortTextWord2VecSimilarity.

Release 0.3.2 (Mar 28, 2017)

  • Bug fixed for gensim model I/O;

  • Console scripts update;

  • Neural networks updated to the Keras 2 standard.

Release 0.3.1 (Mar 14, 2017)

  • Compact model I/O: all models are in single files;

  • Implementation of stacked generalization using logistic regression.

Release 0.2.1 (Feb 23, 2017)

  • Removed attempts to load GloVe models, as this can be done using a gensim script;

  • Confirmed compatibility of the package with tensorflow;

  • Use of spacy for tokenization, instead of nltk;

  • Use of the stemming package for the Porter stemmer, instead of nltk;

  • Removal of nltk dependencies;

  • Simplifying the directory and module structures;

  • Module packages updated.

Release 0.2.0 (Dec 21, 2016)
