News
====

* 12/21/2023: `shorttext` 1.6.1 released.
* 08/26/2023: `shorttext` 1.6.0 released.
* 06/19/2023: `shorttext` 1.5.9 released.
* 09/23/2022: `shorttext` 1.5.8 released.
* 09/22/2022: `shorttext` 1.5.7 released.
* 08/29/2022: `shorttext` 1.5.6 released.
* 05/28/2022: `shorttext` 1.5.5 released.
* 12/15/2021: `shorttext` 1.5.4 released.
* 07/11/2021: `shorttext` 1.5.3 released.
* 07/06/2021: `shorttext` 1.5.2 released.
* 04/10/2021: `shorttext` 1.5.1 released.
* 04/09/2021: `shorttext` 1.5.0 released.
* 02/11/2021: `shorttext` 1.4.8 released.
* 01/11/2021: `shorttext` 1.4.7 released.
* 01/03/2021: `shorttext` 1.4.6 released.
* 12/28/2020: `shorttext` 1.4.5 released.
* 12/24/2020: `shorttext` 1.4.4 released.
* 11/10/2020: `shorttext` 1.4.3 released.
* 10/18/2020: `shorttext` 1.4.2 released.
* 09/23/2020: `shorttext` 1.4.1 released.
* 09/02/2020: `shorttext` 1.4.0 released.
* 07/23/2020: `shorttext` 1.3.0 released.
* 06/05/2020: `shorttext` 1.2.6 released.
* 05/20/2020: `shorttext` 1.2.5 released.
* 05/13/2020: `shorttext` 1.2.4 released.
* 04/28/2020: `shorttext` 1.2.3 released.
* 04/07/2020: `shorttext` 1.2.2 released.
* 03/23/2020: `shorttext` 1.2.1 released.
* 03/21/2020: `shorttext` 1.2.0 released.
* 12/01/2019: `shorttext` 1.1.6 released.
* 09/24/2019: `shorttext` 1.1.5 released.
* 07/20/2019: `shorttext` 1.1.4 released.
* 07/07/2019: `shorttext` 1.1.3 released.
* 06/05/2019: `shorttext` 1.1.2 released.
* 04/23/2019: `shorttext` 1.1.1 released.
* 03/03/2019: `shorttext` 1.1.0 released.
* 02/14/2019: `shorttext` 1.0.8 released.
* 01/30/2019: `shorttext` 1.0.7 released.
* 01/29/2019: `shorttext` 1.0.6 released.
* 01/13/2019: `shorttext` 1.0.5 released.
* 10/03/2018: `shorttext` 1.0.4 released.
* 08/06/2018: `shorttext` 1.0.3 released.
* 07/24/2018: `shorttext` 1.0.2 released.
* 07/17/2018: `shorttext` 1.0.1 released.
* 07/14/2018: `shorttext` 1.0.0 released.
* 06/18/2018: `shorttext` 0.7.2 released.
* 05/30/2018: `shorttext` 0.7.1 released.
* 05/17/2018: `shorttext` 0.7.0 released.
* 02/27/2018: `shorttext` 0.6.0 released.
* 01/19/2018: `shorttext` 0.5.11 released.
* 01/15/2018: `shorttext` 0.5.10 released.
* 12/14/2017: `shorttext` 0.5.9 released.
* 11/08/2017: `shorttext` 0.5.8 released.
* 10/27/2017: `shorttext` 0.5.7 released.
* 10/17/2017: `shorttext` 0.5.6 released.
* 09/28/2017: `shorttext` 0.5.5 released.
* 09/08/2017: `shorttext` 0.5.4 released.
* 09/02/2017: end of GSoC project.
* 08/22/2017: `shorttext` 0.5.1 released.
* 07/28/2017: `shorttext` 0.4.1 released.
* 07/26/2017: `shorttext` 0.4.0 released.
* 06/16/2017: `shorttext` 0.3.8 released.
* 06/12/2017: `shorttext` 0.3.7 released.
* 06/02/2017: `shorttext` 0.3.6 released.
* 05/30/2017: GSoC project (`Chinmaya Pancholi `_ ).
* 05/16/2017: `shorttext` 0.3.5 released.
* 04/27/2017: `shorttext` 0.3.4 released.
* 04/19/2017: `shorttext` 0.3.3 released.
* 03/28/2017: `shorttext` 0.3.2 released.
* 03/14/2017: `shorttext` 0.3.1 released.
* 02/23/2017: `shorttext` 0.2.1 released.
* 12/21/2016: `shorttext` 0.2.0 released.
* 11/25/2016: `shorttext` 0.1.2 released.
* 11/21/2016: `shorttext` 0.1.1 released.

What's New
----------

Release 1.6.1 (December 21, 2023)
---------------------------------

* Updated package requirements.

Release 1.6.0 (August 26, 2023)
-------------------------------

* Pinned requirements for ReadTheDocs documentation;
* Fixed bugs in word-embedding model mean pooling classifiers;
* Updated package requirements.

Release 1.5.9 (June 19, 2023)
-----------------------------

* Support for Python 3.11;
* Removing flask.

Release 1.5.8 (September 23, 2022)
----------------------------------

* Package administration.

Release 1.5.7 (September 22, 2022)
----------------------------------

* Removal of requirement of pre-installation of `numpy` and `Cython`.

Release 1.5.6 (August 29, 2022)
-------------------------------

* Speeding up inference of `VarNNEmbeddedVecClassifier`.
  (Acknowledgement: Ritesh Agrawal)

Release 1.5.5 (May 28, 2022)
----------------------------

* Support for Python 3.10.

Release 1.5.4 (December 15, 2021)
---------------------------------

* Non-negative stop words.

Release 1.5.3 (July 11, 2021)
-----------------------------

* Documentation updated.

Release 1.5.2 (July 6, 2021)
----------------------------

* Resolved bugs regarding `keras` import;
* Support for Python 3.9.

Release 1.5.1 (April 10, 2021)
------------------------------

* Replaced TravisCI with CircleCI in the continuous integration pipeline.

Release 1.5.0 (April 09, 2021)
------------------------------

* Removed support for Python 3.6;
* Removed buggy BERT representations unit test.

Release 1.4.8 (February 11, 2021)
---------------------------------

* Updated requirements for `scipy` for Python 3.7 or above.

Release 1.4.7 (January 11, 2021)
--------------------------------

* Updated version of `transformers` in `requirement.txt`;
* Updated BERT encoder for the change of implementation;
* Fixed unit tests.

Release 1.4.6 (January 3, 2021)
-------------------------------

* Bug regarding Python 3.6 requirement for `scipy`.

Release 1.4.5 (December 28, 2020)
---------------------------------

* Fixed bugs from Python 2 to 3 updates (`filter` in `shorttext.metrics.embedfuzzy`).

Release 1.4.4 (December 24, 2020)
---------------------------------

* Bugs regarding ``;
* Fixing bugs due to Python 3.6 restriction on `scipy`.

Release 1.4.3 (November 10, 2020)
---------------------------------

* Resolved bugs with transformer-based models on different devices.

Release 1.4.2 (October 18, 2020)
--------------------------------

* Documentation requirements and PyUp configs cleaned up.

Release 1.4.1 (September 23, 2020)
----------------------------------

* Documentation and code cleaned up.

Release 1.4.0 (September 2, 2020)
---------------------------------

* Provided support for BERT-based sentence and token embeddings;
* Implemented support for BERTScores.

Release 1.3.0 (July 23, 2020)
-----------------------------

* Removed all dependencies on `PuLP`; all computations of word mover's distance (WMD) are performed using `SciPy`.

Release 1.2.6 (June 20, 2020)
-----------------------------

* Removed Python 2 code (`urllib2`).

Release 1.2.5 (May 20, 2020)
----------------------------

* Update on `gensim` package usage and requirements;
* Removed some deprecated functions.

Release 1.2.4 (May 13, 2020)
----------------------------

* Update on `scikit-learn` requirements to `>=0.23.0`;
* Direct dependence on `joblib`;
* Support for Python 3.8 added.

Release 1.2.3 (April 28, 2020)
------------------------------

* PyUP scan implemented;
* Support for Python 3.5 decommissioned.

Release 1.2.2 (April 7, 2020)
-----------------------------

* Removed dependence on `PyStemmer`, which is replaced by `snowballstemmer`.

Release 1.2.1 (March 23, 2020)
------------------------------

* Added port number adjustability for word-embedding API;
* Removal of Spacy dependency.

Release 1.2.0 (March 21, 2020)
------------------------------

* API for word-embedding algorithm for one-time loading.

Release 1.1.6 (December 1, 2019)
--------------------------------

* Compatibility with TensorFlow 2.0.0.

Release 1.1.5 (September 24, 2019)
----------------------------------

* Decommissioned GCP buckets; using data files stored in AWS S3 buckets.

Release 1.1.4 (July 20, 2019)
-----------------------------

* Minor bugs fixed.

Release 1.1.3 (July 7, 2019)
----------------------------

* Updated code for Console code loading;
* Updated Travis CI script.

Release 1.1.2 (June 5, 2019)
----------------------------

* Updated code for FastText model loading, as the previous function was deprecated.

Release 1.1.1 (April 23, 2019)
------------------------------

* Bug fixed.
  (Acknowledgement: `Hamish Dickson `_ )

Release 1.1.0 (March 3, 2019)
-----------------------------

* Size of embedded vectors set to 300 again when necessary (possibly breaking compatibility);
* Moving corpus data from Github to Google Cloud Storage.

Release 1.0.8 (February 14, 2019)
---------------------------------

* Minor bugs fixed.

Release 1.0.7 (January 30, 2019)
--------------------------------

* Compatibility with Python 3.7 with TensorFlow as the backend.

Release 1.0.7 (January 30, 2019)
--------------------------------

* Compatibility with Python 3.7 with Theano as the backend;
* Minor documentation changes.

Release 1.0.6 (January 29, 2019)
--------------------------------

* Documentation change;
* Word-embedding model used in unit test stored in Amazon S3 bucket.

Release 1.0.5 (January 13, 2019)
--------------------------------

* Minor versioning bug fixed.

Release 1.0.4 (October 3, 2018)
-------------------------------

* Package `keras` requirement updated;
* Less dependence on `pandas`.

Release 1.0.3 (August 6, 2018)
------------------------------

* Bugs regarding I/O of `SumEmbeddedVecClassifier`.

Release 1.0.2 (July 24, 2018)
-----------------------------

* Minor bugs regarding installation fixed.

Release 1.0.1 (July 14, 2018)
-----------------------------

* Minor bugs fixed.

Release 1.0.0 (July 14, 2018)
-----------------------------

* Python-3 compatibility;
* Replaced the original stemmer with Snowball;
* Certain functions cythonized;
* Various bugs fixed.

Release 0.7.2 (June 18, 2018)
-----------------------------

* Damerau-Levenshtein distance and longest common prefix implemented using Cython.

Release 0.7.1 (May 30, 2018)
----------------------------

* Decorator replaced by base class `CompactIOMachine`;
* API included in documentation.

Release 0.7.0 (May 17, 2018)
----------------------------

* Spelling corrections and fuzzy logic;
* More unit tests.

Release 0.6.0 (February 27, 2018)
---------------------------------

* Support of character-based sequence-to-sequence (seq2seq) models.

Release 0.5.11 (January 19, 2018)
---------------------------------

* Removal of word-embedding `keras`-type layers.

Release 0.5.10 (January 15, 2018)
---------------------------------

* Support of encoder module for character-based models;
* Implementation of document-term matrix (DTM).

Release 0.5.9 (December 14, 2017)
---------------------------------

* Support of Poincare embedding;
* Code optimization;
* Script `ShortTextWord2VecSimilarity` updated to `ShortTextWordEmbedSimilarity`.

Release 0.5.8 (November 8, 2017)
--------------------------------

* Removed most explicit user-specification of `vecsize` for given word-embedding models;
* Removed old namespace for topic models (no more backward compatibility).
* Integration of [FastText](

Release 0.5.7 (October 27, 2017)
--------------------------------

* Removed most explicit user-specification of `vecsize` for given word-embedding models;
* Removed old namespace for topic models (hence no more backward compatibility).

Release 0.5.6 (October 17, 2017)
--------------------------------

* Updated the neural network framework due to the change in `gensim` API.

Release 0.5.5 (September 28, 2017)
----------------------------------

* Script `ShortTextCategorizerConsole` updated.

Release 0.5.4 (September 8, 2017)
---------------------------------

* Bug fixed;
* New scripts for finding distances between sentences;
* Finding similarity between two sentences using Jaccard index.

End of GSoC Program (September 2, 2017)
---------------------------------------

Chinmaya summarized his GSoC program in a blog post on `RaRe Incubator `_.

Release 0.5.1 (August 22, 2017)
-------------------------------

* Implementation of Damerau-Levenshtein distance and soft Jaccard score;
* Implementation of Word Mover's distance.

Release 0.4.1 (July 28, 2017)
-----------------------------

* Further Travis.CI update tests;
* Model file I/O updated (for huge models);
* Migrating documentation to [](; previous documentation at `` destroyed.

Release 0.4.0 (July 26, 2017)
-----------------------------

* Maximum entropy models;
* Use of `gensim` Word2Vec `keras` layers;
* Incorporating new features from `gensim`;
* Use of Travis.CI for pull request testing.

Release 0.3.8 (June 16, 2017)
-----------------------------

* Bug fixed on `sumvecframeworks`.

Release 0.3.7 (June 12, 2017)
-----------------------------

* Bug fixed on `VarNNSumEmbedVecClassifier`.

Release 0.3.6 (June 2, 2017)
----------------------------

* Added deprecation decorator;
* Fixed path configurations;
* Added "update" corpus capability to `gensim` models.

Google Summer of Code (May 30, 2017)
------------------------------------

Chinmaya Pancholi, a Google Summer of Code (GSoC) student, is involved in the open-source development of `gensim`, and his project is closely related to the `shorttext` package. More information can be found in his first `blog entry `_ .

Release 0.3.5 (May 16, 2017)
----------------------------

* Refactoring topic modeling to generators subpackage, but keeping package backward compatible;
* Added Inaugural Addresses as example training data;
* Fixed bugs about package paths.

Release 0.3.4 (Apr 27, 2017)
----------------------------

* Fixed relative path loading problems.

Release 0.3.3 (Apr 19, 2017)
----------------------------

* Deleted `CNNEmbedVecClassifier`;
* Added script `ShortTextWord2VecSimilarity`.

`More Info `_

Release 0.3.2 (Mar 28, 2017)
----------------------------

* Bug fixed for `gensim` model I/O;
* Console scripts update;
* Neural networks up to Keras 2 standard (refer to `this `_ ).

Release 0.3.1 (Mar 14, 2017)
----------------------------

* Compact model I/O: all models are in single files;
* Implementation of stacked generalization using logistic regression.

Release 0.2.1 (Feb 23, 2017)
----------------------------

* Removed attempts to load GloVe models, as this can be done using a `gensim` script;
* Confirmed compatibility of the package with `tensorflow`;
* Use of `spacy` for tokenization, instead of `nltk`;
* Use of `stemming` for Porter stemmer, instead of `nltk`;
* Removal of `nltk` dependencies;
* Simplifying the directory and module structures;
* Module packages updated.

`More Info `_

Release 0.2.0 (Dec 21, 2016)
----------------------------

Home: :doc:`index`