textnets: A Python package for text analysis with networks

Background
Social scientists increasingly rely on computational tools to make sense of the vast amounts of unstructured data generated in the wake of the ever-expanding digitization of social life. Electronic text, in particular, is a growing area of interest thanks to the social and cultural insights lurking in social media posts, digitized corpora, and web content, among other troves (Evans & Aceves, 2016; Ignatow, 2015).
textnets aims to meet the resulting need for tools to analyze text. It represents collections of texts as networks of documents and words, which opens up powerful possibilities for visualizing and analyzing them.
The package can operate on the bipartite network containing both document and word nodes. Figure 1 shows an example of a visualization created by textnets. The underlying corpus is a collection of statements by U.S. Senators following the conclusion of the impeachment trial against the president in February 2020. Documents appear as triangles (representing the Senators who issued the statements), and words appear as yellow squares.
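To make the bipartite representation concrete, the following is a minimal sketch in plain Python; it is not the textnets API. It assumes a toy corpus, whitespace tokenization, and tf-idf edge weights, and the names docs, tokenize, and bipartite_edges are hypothetical.

```python
from collections import Counter
import math

# Toy corpus (hypothetical data, not the Senate statements shown in Figure 1).
docs = {
    "senator_a": "acquit the president and end the trial",
    "senator_b": "convict the president for abuse of power",
    "senator_c": "the trial was unfair so acquit",
}


def tokenize(text):
    """Lowercase and split on whitespace; real pipelines usually do more."""
    return text.lower().split()


def bipartite_edges(docs):
    """Return {(document, term): tf-idf weight} for a dict of raw texts."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(counts)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for c in counts.values() for term in c)
    edges = {}
    for name, c in counts.items():
        total = sum(c.values())
        for term, n in c.items():
            weight = (n / total) * math.log(n_docs / doc_freq[term])
            if weight > 0:  # a term found in every document gets weight 0
                edges[(name, term)] = weight
    return edges


edges = bipartite_edges(docs)
doc_nodes = {doc for doc, _ in edges}     # one node type: documents
term_nodes = {term for _, term in edges}  # the other node type: words
```

Every edge joins a document node to a word node, never two nodes of the same type, which is what makes the network bipartite. In this sketch the word "the" occurs in all three documents, so it receives a weight of zero and drops out of the network.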
textnets can also project one-mode networks containing only document or word nodes, and it contains tools to analyze them. For instance, it can visualize a backbone graph with nodes scaled by various centrality measures. For networks with a clear community structure, it can also output lists of nodes grouped by cluster as identified by a community detection algorithm. This can help identify latent themes in corpus texts (Gerlach, Peixoto, & Altmann, 2018).
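The idea of projecting a one-mode network can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the package's own code: the edge weights are made up, the projection weight is assumed to be the sum of products of shared edge weights, and weighted degree stands in for the "various centrality measures". The functions project and strength are hypothetical names, and backbone extraction is not shown.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weighted document-term edges (for example, tf-idf scores).
edges = {
    ("doc1", "trial"): 0.5,
    ("doc1", "acquit"): 0.4,
    ("doc2", "acquit"): 0.3,
    ("doc2", "witness"): 0.6,
    ("doc3", "witness"): 0.2,
}


def project(edges, onto="doc"):
    """Collapse a bipartite network into a one-mode network.

    Two nodes of the kept type are linked when they share a neighbour of the
    other type; the link weight sums the products of the two edge weights.
    """
    keep, other = (0, 1) if onto == "doc" else (1, 0)
    neighbours = defaultdict(dict)
    for pair, weight in edges.items():
        neighbours[pair[other]][pair[keep]] = weight
    projected = defaultdict(float)
    for members in neighbours.values():
        for a, b in combinations(sorted(members), 2):
            projected[(a, b)] += members[a] * members[b]
    return dict(projected)


def strength(projected):
    """Weighted degree of each node: the sum of its edge weights."""
    totals = defaultdict(float)
    for (a, b), weight in projected.items():
        totals[a] += weight
        totals[b] += weight
    return dict(totals)


doc_graph = project(edges, onto="doc")    # documents linked by shared words
term_graph = project(edges, onto="term")  # words linked by shared documents
doc_strength = strength(doc_graph)
```

Here doc2 shares a word with each of the other two documents, so it ends up with the highest weighted degree in the document network, while doc1 and doc3 are not linked directly.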
Another implementation of the textnets technique, written in the R programming language by its originator (Bail, 2016), can be found at https://github.com/cbail/textnets. In terms of features, the two implementations are roughly on par. The Python implementation has a modular design, which is meant to improve ergonomics for users and potential contributors alike. This package aims to make text analysis techniques accessible to a broader range of researchers and students. Particularly for use in the classroom, textnets aims at seamless integration with the Jupyter ecosystem (Kluyver et al., 2016).
textnets is well documented: its API reference, contribution guidelines, and a comprehensive tutorial can be found at https://textnets.readthedocs.io. For easy installation, the package is included in conda-forge and the Python Package Index. Its code repository and issue tracker are currently hosted on GitHub at https://github.com/jboynyc/textnets. A test suite is run using Travis, a continuous integration service, before new releases are published to avoid regressions from one version to another. Archived versions of releases are available at doi:10.5281/zenodo.3866676.

Statement of Need
With textnets it is possible to visualize and analyze textual data in novel ways. These are some of the package's distinguishing features:
• Existing text analysis packages, such as Benoit et al. (2018), typically visualize texts as word clouds, not as network graphs. Unlike word clouds, network graphs can visualize not just the frequency and co-occurrence of text features, but also their linking role within corpora.
• The discovery of topics is typically performed using latent Dirichlet allocation (LDA), while textnets uses community detection on the term graph for that purpose. Unlike topic modeling using LDA, this does not require specifying a fixed number of topics.
• textnets can also cluster documents using community detection on the document graph. This can serve as an alternative to techniques like k-means clustering.
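The point that community detection needs no preset number of topics can be illustrated with a small sketch. The text does not say which algorithm textnets uses, so this example swaps in a simple deterministic label propagation, chosen only because it fits in a few lines. The term graph, its integer weights, and the function label_propagation are all hypothetical.

```python
from collections import defaultdict

# Hypothetical term graph: edge weights count shared documents.
term_edges = {
    ("trial", "witness"): 3,
    ("trial", "evidence"): 3,
    ("witness", "evidence"): 3,
    ("vote", "party"): 3,
    ("vote", "senate"): 3,
    ("party", "senate"): 3,
    ("evidence", "vote"): 1,  # weak bridge between the two themes
}


def label_propagation(edges, max_rounds=100):
    """Group nodes by letting each adopt its neighbours' heaviest label.

    Nodes are visited in sorted order and ties go to the alphabetically
    first label, so the result is deterministic. No number of groups is
    passed in; it emerges from the graph structure.
    """
    adjacency = defaultdict(dict)
    for (a, b), weight in edges.items():
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    labels = {node: node for node in adjacency}  # start with one label each
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            scores = defaultdict(int)
            for neighbour, weight in adjacency[node].items():
                scores[labels[neighbour]] += weight
            best = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best)
            if labels[node] not in candidates:  # keep current label on ties
                labels[node] = candidates[0]
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(sorted(group) for group in groups.values())


communities = label_propagation(term_edges)
# communities == [['evidence', 'trial', 'witness'], ['party', 'senate', 'vote']]
```

The two groups of words are found without any parameter corresponding to the number of topics in LDA or the k in k-means. The same function could be applied to a document graph to cluster documents instead of words.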