Introduction

shorttext is a Python package that facilitates supervised and unsupervised learning for short text categorization. Because short texts contain few words and carry little information on their own, an intermediate representation of the texts and documents is needed before they are passed to any classification algorithm. The package supports several types of such representations, including topic modeling and word-embedding algorithms.
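
To illustrate why an intermediate representation helps, here is a minimal sketch in plain Python. It does not use the shorttext API. The two-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration only, and the helper names (`bag_of_words`, `average_embedding`, `cosine`) are hypothetical. It compares two short texts that share no words: their bag-of-words vectors do not overlap at all, while their averaged word vectors remain close.

```python
import math
from collections import Counter

# Hypothetical, hand-made 2-d word vectors (NOT from any trained model).
TOY_EMBEDDINGS = {
    "cheap": (0.9, 0.1),
    "inexpensive": (0.85, 0.15),
    "flights": (0.2, 0.9),
    "airfare": (0.3, 0.8),
}


def bag_of_words(text):
    """Sparse representation: map each lowercased token to its count."""
    return Counter(text.lower().split())


def average_embedding(text):
    """Dense representation: mean of the toy vectors of the known tokens."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        return {}
    dim = len(vectors[0])
    return {i: sum(v[i] for v in vectors) / len(vectors) for i in range(dim)}


def cosine(u, v):
    """Cosine similarity between two vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


a = "cheap flights"
b = "inexpensive airfare"

# No shared words, so the sparse vectors are orthogonal.
bow_similarity = cosine(bag_of_words(a), bag_of_words(b))
# The averaged toy vectors point in nearly the same direction.
embedding_similarity = cosine(average_embedding(a), average_embedding(b))

print(bow_similarity)        # 0.0
print(embedding_similarity)  # close to 1 for these toy vectors
```

In practice the word vectors would come from a trained model or a topic model rather than a hand-written table; the sketch only shows the effect of moving from sparse word counts to a dense representation.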

The package shorttext runs on Python 3.8, 3.9, 3.10, and 3.11.

Characteristics:

Before release 0.7.2, part of the package was implemented in C and interfaced to Python using SWIG (Simplified Wrapper and Interface Generator). Since release 1.0.0, these implementations have been replaced with Cython.

Author: Kwan Yuet Stephen Ho (LinkedIn, ResearchGate, Twitter)

Home: Homepage of shorttext