Wordcloud

Created Monday, 11 September 2023




A word cloud is an image created by fitting the words of a given text into a predefined shape so that the most frequent words appear most prominently. In a sense, it is a summary of the text in image form.
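To make the idea concrete, here is a minimal Python sketch of the counting step: it tallies word frequencies and maps each count linearly to a font size. The function names, the simple \w+ tokenization, and the size range are assumptions made for illustration; the wordcloud package uses its own tokenization, scaling, and layout logic.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs, ignoring case."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words)


def font_sizes(frequencies, min_size=10, max_size=60):
    """Map each word's count linearly onto a font size (illustrative only)."""
    if not frequencies:
        return {}
    top = max(frequencies.values())
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in frequencies.items()
    }


if __name__ == "__main__":
    sample = "cloud word cloud image word cloud"
    sizes = font_sizes(word_frequencies(sample))
    # List the words from most to least prominent.
    for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
        print(f"{word}: {size:.0f}")
```

The most frequent word always receives max_size; every other word is scaled in proportion to its count relative to that word.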


On Linux, the python3-wordcloud package can be installed through the software manager or package manager. The package is fairly large and requires about 2 GB of disk space.


To create the image above from the wordcloud man page, I proceeded as follows:


$ man wordcloud_cli > input.txt
$ wordcloud_cli --text input.txt --imagefile wordcloud.png --width 1024 --height 768


There are many more parameters that let you adjust the resulting image:



$ wordcloud_cli -h
usage: wordcloud_cli [-h] [--text file] [--regexp regexp] [--stopwords file] [--imagefile file] [--fontfile path] [--mask file] [--colormask file] [--contour_width width] [--contour_color color]
                     [--relative_scaling rs] [--margin width] [--width width] [--height height] [--color color] [--background color] [--no_collocations] [--include_numbers] [--min_word_length min_word_length]
                     [--prefer_horizontal ratio] [--scale scale] [--colormap map] [--mode mode] [--max_words N] [--min_font_size size] [--max_font_size size] [--font_step step] [--random_state seed]
                     [--no_normalize_plurals] [--repeat] [--version]


A simple command line interface for wordcloud module.


optional arguments:
  -h, --help            show this help message and exit
  --text file           specify file of words to build the word cloud (default: stdin)
  --regexp regexp       override the regular expression defining what constitutes a word
  --stopwords file      specify file of stopwords (containing one word per line) to remove from the given text after parsing
  --imagefile file      file the completed PNG image should be written to (default: stdout)
  --fontfile path       path to font file you wish to use (default: DroidSansMono)
  --mask file           mask to use for the image form
  --colormask file      color mask to use for image coloring
  --contour_width width
                        if greater than 0, draw mask contour (default: 0)
  --contour_color color
                        use given color as mask contour color - accepts any value from PIL.ImageColor.getcolor
  --relative_scaling rs
                        scaling of words by frequency (0 - 1)
  --margin width        spacing to leave around words
  --width width         define output image width
  --height height       define output image height
  --color color         use given color as coloring for the image - accepts any value from PIL.ImageColor.getcolor
  --background color    use given color as background color for the image - accepts any value from PIL.ImageColor.getcolor
  --no_collocations     do not add collocations (bigrams) to word cloud (default: add unigrams and bigrams)
  --include_numbers     include numbers in wordcloud?
  --min_word_length min_word_length
                        only include words with more than X letters
  --prefer_horizontal ratio
                        ratio of times to try horizontal fitting as opposed to vertical
  --scale scale         scaling between computation and drawing
  --colormap map        matplotlib colormap name
  --mode mode           use RGB or RGBA for transparent background
  --max_words N         maximum number of words
  --min_font_size size  smallest font size to use
  --max_font_size size  maximum font size for the largest word
  --font_step step      step size for the font
  --random_state seed   random seed
  --no_normalize_plurals
                        whether to remove trailing 's' from words
  --repeat              whether to repeat words and phrases
  --version             show program's version number and exit
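
As a sketch of how several of these options can be combined, the following Python helper assembles a wordcloud_cli command line. The helper name build_command, the file stopwords.txt, and the chosen values are assumptions for illustration; only flags listed in the help text above are used.

```python
import shlex


def build_command(text_file, image_file, **options):
    """Assemble the argument list for a wordcloud_cli call.

    Keyword arguments become --name value pairs; a value of True
    yields a bare switch such as --no_collocations, while False
    and None are skipped.
    """
    command = ["wordcloud_cli", "--text", text_file, "--imagefile", image_file]
    for name, value in options.items():
        if value is True:
            command.append(f"--{name}")
        elif value is not False and value is not None:
            command.extend([f"--{name}", str(value)])
    return command


if __name__ == "__main__":
    command = build_command(
        "input.txt",
        "wordcloud.png",
        width=1024,
        height=768,
        background="white",
        max_words=100,
        stopwords="stopwords.txt",  # hypothetical file, one word per line
        no_collocations=True,
    )
    # Print the command for inspection; to execute it, pass the list to
    # subprocess.run(command, check=True) with wordcloud_cli on the PATH.
    print(shlex.join(command))
```

Building the call as a list keeps each option and its value separate, so file names containing spaces need no manual quoting.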


Have fun!