wordgenerator.py v1.2

This is a small project by Gareth Latty (Lattyware).

A generator for word-like strings that follow the 'feel' of a given input language.

It's useful for, for example, naming procedurally generated content in a way that avoids real words, but remains easy to pronounce.

Example output:

Here is some example output with the default british-english output:

./wordgenerator.py british-english -n 20 --min 5 --max 15

fanglas
freering
oxyger
jubbensetrier
amenet
moquiets
mystilaxation
consutey
untive
masestes
curchers
anchottollon
filles
symborse
prasting
protithest
weeloats
dupliquding
autobency
proscolicends

Using a dictionary of italian words, for example, produces italian-sounding words:

./wordgenerator.py italian -n 10 --min 5 --max 15

impiate
aliersi
inaudartererai
ottardiscrerei
addoluccio
deredicassella
coibinarei
impresto
accreste
storano

Warning: This library does no checking on the produced words. They may be real words, and there is the possibility of it generating profanity. If this is unacceptable, then filter the output of this script, or pre-generate a set of words to use which you check. You have been warned.

Requirements:

Python - now works with 2.x and 3.x!

Get it:

Download:

Release: wordgenerator 1.2.0, or clone the github project.

Usage:

This program functions both as a library and a command line application.

You will need a dictionary to use this program - a newline delimited list of words to use as a source. Surprisingly good results can be obtained from very small input dictionaries, but best results will be gained from a good, varied selection from the language you wish to emulate. Linux users will often find good samples in /usr/share/dict. british-english - distributed with this, is taken from Arch Linux's words package, and the associated wbritish.copyright is taken from the same package and provides information on copyright with regards to that file.

Saving as JSON means that you don't need to do the expensive parsing of the dictionary again, and with large dictionaries will produce smaller files. If you intend to use the generator with the same input dictionary multiple times, doing this is highly reccomended.

As a command line application:

For usage as a command line application, see the below explanation of arguments:


usage: wordgenerator.py [-h] [-w BOOL] [-n N] [--min N] [-m N] [-s FILE] [-o]
                        [-l FILE] [--version]
                        [FILE]

positional arguments:
FILE                  The path to a dictionary file for a language - a list
                      of newline separated words. (default: read from
                      standard input)

optional arguments:
-h, --help            show this help message and exit
-w BOOL, --weighted BOOL
                      If true, a common segment in the language ismore
                      likely to show up in an output word. (default: True)
-n N, --number N      The number of words to generate. (default: 1)
--min N               The minimum length of words to generate. (default: 0)
-m N, --max N         The rough maximum length of words to generate.
                      (default: 14)
-s FILE, --save FILE  Save the library to disc as JSON data. When saving,
                      other operations will be ignored.
-o, --output          Save the library, sending output to the standard
                      output.
-l FILE, --load FILE  Load the library from JSON data on disc.
--version             show program's version number and exit

As a library:

Example usage:

from wordgenerator import WordGenerator

generator = WordGenerator("british-english")
for word in generator.generate((10, 15), 10): #Generate 10 words of length 10-15.
	print(word)

with open("british-english.json", 'w') as file:
	generator.save(file) #Save the dictionary to a JSON file for quick usage later.

from wordgenerator import WordGenerator
from itertools import islice

generator = WordGenerator()
with open("british-english.json", 'r') as file:
	generator.load(file) #Load from JSON.

for word in islice(generator, 25): #Get 25 words.
	print(word)

Authors:

Gareth Latty gareth@lattyware.co.uk

Licence:

This script is provided under the GPLv3 licence, see LICENCE or http://www.gnu.org/licenses/ for more.