Project: Generator

The Story

I'm a gamer; I often play role-playing games with my friends. Role-playing requires only a good imagination, but sometimes you get stuck in a rut, describing the same old things in the same old ways. Random generators can help you break out of that rut and spice up your descriptions. For example:

After a long and ferocious battle, you have finally slain the dragon. In its treasure hoard is a pile of gems and a longsword +1.

Sure, that's cool and all, but it's not memorable. Try this on for size:

After a long and ferocious battle, you have finally slain the dragon. In its treasure hoard is a pile of gems and the legendary gnome-steel longsword named "Virtual Knife", forged in the great heat of a volcano by the legendary gnomesmith Zavurp Mountaincranker.

There are many generators on the Internet, but I was not satisfied with the quality of results from the vast majority of those I tried. At best, their word banks lacked variety and the results were repetitive; at worst, the results were grammatically incorrect and made no sense. The logical solution was to write my own generator script.

I have also developed an API to interact with the generator via an IRC bot.

All the Features

Generator Class

This is the master class where all the actual generating, formatting, and grammar-checking gets done.

Patterns are given in the format [pattern]:

  • My shoes are [color].
  • My shoes are [adjective].
  • I have [roll:1d20] pairs of shoes.

These are basic patterns, of course, and they can get much more complex. The driving force behind every pattern is an associated list of words, phrases, or even other patterns (see "Recursive Pattern-Matching" below) that can be a potential result. For example, the [color] pattern might have a list that looks like this:

[color]: red, orange, yellow, green, blue, indigo, violet, black, white, ... etc

Therefore, whenever [color] is called, the generator will look through the database for all entries stored under the "color" key and then randomly pick one.
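As a rough sketch of that lookup (written in Python purely for illustration, which is not necessarily the language the generator itself uses), assume a hypothetical in-memory WORD_BANK dictionary standing in for the real database. The function name fill is likewise made up for this example:

```python
import random
import re

# Hypothetical word bank standing in for the real database.
WORD_BANK = {
    "color": ["red", "orange", "yellow", "green", "blue",
              "indigo", "violet", "black", "white"],
}

def fill(pattern, rng=random):
    """Replace every [key] in the pattern with a random entry for that key."""
    def pick(match):
        key = match.group(1)
        # Leave unknown keys untouched rather than guessing.
        if key not in WORD_BANK:
            return match.group(0)
        return rng.choice(WORD_BANK[key])
    return re.sub(r"\[([^\[\]]+)\]", pick, pattern)

print(fill("My shoes are [color]."))
```

Keys that are not in the word bank (such as [roll:1d20]) are passed through unchanged here; the real generator handles those with its own special patterns.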

Recursive pattern-matching

The list of words/phrases associated with a particular pattern need not be restricted to plaintext. Patterns can contain other patterns, and the generator will just recurse until it has a result.

For example, let's say we have the following wordlists defined:

[color]: red, orange, yellow, green, blue, purple, black, white
[type-of-shoes]:
  • a pair of [color] shoes
  • one [color] shoe and one [color] shoe
  • lost my shoes

and the master pattern is:

I have [type-of-shoes].

Example generated results could be:

  • I have a pair of red shoes.
  • I have lost my shoes.
  • I have one green shoe and one orange shoe.
  • I have a pair of black shoes.
  • I have a pair of purple shoes.
  • I have one white shoe and one yellow shoe.
  • I have lost my shoes.

Even given this simplistic example, there are 73 total possible unique results for what color your shoes are (if you even have shoes at all...). Pretty neat, huh?
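The recursion and the count of 73 can both be checked with a short Python sketch. It assumes the word lists from the example live in a plain dictionary (the real generator reads them from a database) and that no list refers back to itself; the names generate and all_results are made up for this illustration:

```python
import random
import re

# Word lists from the example; a dict stands in for the real database.
WORD_BANK = {
    "color": ["red", "orange", "yellow", "green",
              "blue", "purple", "black", "white"],
    "type-of-shoes": [
        "a pair of [color] shoes",
        "one [color] shoe and one [color] shoe",
        "lost my shoes",
    ],
}
TOKEN = re.compile(r"\[([^\[\]]+)\]")

def generate(pattern):
    """Expand one random result, recursing while any [key] remains."""
    match = TOKEN.search(pattern)
    if match is None:
        return pattern
    choice = random.choice(WORD_BANK[match.group(1)])
    return generate(pattern[:match.start()] + choice + pattern[match.end():])

def all_results(pattern):
    """Enumerate every distinct result a pattern can produce."""
    match = TOKEN.search(pattern)
    if match is None:
        return {pattern}
    results = set()
    for choice in WORD_BANK[match.group(1)]:
        expanded = pattern[:match.start()] + choice + pattern[match.end():]
        results |= all_results(expanded)
    return results

print(generate("I have [type-of-shoes]."))
print(len(all_results("I have [type-of-shoes].")))  # 8 + 8*8 + 1 = 73
```

The total breaks down as 8 single-color pairs, 8 × 8 = 64 two-shoe combinations, and 1 result with no shoes at all.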

The downside to truly random generation is that the same pattern can produce the same result more than once, even within a single sentence. For example, you could generate "I have one red shoe and one red shoe." That's simply the nature of the beast, but the larger the list of words for each pattern, the less likely duplicates become.

Strong English Syntax Parsing

Patterns can carry modifiers that tell the tokenizer to transform the result so that the final phrasing reads more naturally.

Examples:

  • [Pattern]: capitalizes the result
  • [pattern]s: pluralizes the result
  • [pattern]ed: returns the result in its past participle form
  • [pattern]ing: returns the result in its gerund form
  • [pattern]er: returns the result in its comparative form
  • [pattern] + [pattern]: concatenates the results of both patterns

Checks are made with every result to ensure it conforms to proper English spelling and syntax. For example, the comparative ("-er") form of "bright" is "brighter", while the comparative form of "happy" is "happier" (not "happyier").
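To give a feel for the kind of spelling checks involved, here is a deliberately simplified Python sketch of two modifiers. The function names and rules are assumptions made for illustration, not the generator's actual rule set, and they ignore cases such as consonant doubling ("big" to "bigger") and irregular plurals:

```python
VOWELS = "aeiou"

def comparative(adjective):
    """Form the "-er" comparative using a few basic spelling rules."""
    if adjective.endswith("y") and len(adjective) > 1 and adjective[-2] not in VOWELS:
        return adjective[:-1] + "ier"   # happy -> happier
    if adjective.endswith("e"):
        return adjective + "r"          # large -> larger
    return adjective + "er"             # bright -> brighter

def pluralize(noun):
    """Form a regular plural using a few basic spelling rules."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"              # box -> boxes
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"        # pony -> ponies
    return noun + "s"                   # shoe -> shoes

print(comparative("bright"), comparative("happy"))
print(pluralize("shoe"), pluralize("pony"))
```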

Custom Pattern Handling

Fantasy games have monsters, but sometimes you want a random beast of burden rather than a dragon. Listing every possible permutation of a monster database as its own pattern (for example, the monsters listed in the d20 SRD) wouldn't be very extensible if you later decided to add more monsters, so I created a database that is accessible via a specific syntax.

Custom patterns follow a key=value syntax that can be stacked to get a more complex result, much like URL query parameters. For example, to get a random undead monster, the pattern could be:

[monster|type=undead]

In Dungeons & Dragons, undead monsters can range from level 1 mooks (Skeletons) to über-powerful level 20+ creatures (the various Nightshades) that are practically gods in their own right. If you want to limit the pattern so that you don't get ridiculously powerful creatures, you could modify it like so:

[monster|type=undead:cr_max=5]

This pattern lets you limit the maximum level (5) of the specified monster type (undead).
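A minimal Python sketch of how such a pattern might be parsed and applied is shown below. It assumes that "|" introduces the filters and ":" separates them, as in the examples above. The MONSTERS table, its field names, and the function names are hypothetical stand-ins for the real database, and the levels are illustrative only:

```python
import random

# Hypothetical monster table; entries and levels are illustrative only.
MONSTERS = [
    {"name": "skeleton", "type": "undead", "cr": 1},
    {"name": "ghoul", "type": "undead", "cr": 1},
    {"name": "nightshade", "type": "undead", "cr": 20},
    {"name": "wolf", "type": "animal", "cr": 1},
]

def parse_pattern(pattern):
    """Split '[monster|type=undead:cr_max=5]' into a name and a filter dict."""
    inner = pattern.strip("[]")
    name, _, query = inner.partition("|")
    filters = {}
    for pair in filter(None, query.split(":")):
        key, _, value = pair.partition("=")
        filters[key.strip()] = value.strip()
    return name, filters

def matching_monsters(filters):
    """Return the names of all monsters that satisfy every filter given."""
    matches = []
    for monster in MONSTERS:
        if "type" in filters and monster["type"] != filters["type"]:
            continue
        if "cr_max" in filters and monster["cr"] > int(filters["cr_max"]):
            continue
        matches.append(monster["name"])
    return matches

name, filters = parse_pattern("[monster|type=undead:cr_max=5]")
print(random.choice(matching_monsters(filters)))
```

Because each filter is optional, adding a new monster only means adding a row to the table; no pattern has to change.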

Other special patterns currently defined are:

  • Critter bits - not all monsters have the same anatomy. A boar would have a hide, while humans have skin, and a fantastical creature like a will-o'-the-wisp has neither hide nor skin. I wrote this feature after a very early attempt at a potion ingredients generator that resulted in "shimmering werewolf wings". (Werewolves don't have wings, and even if they did, they wouldn't shimmer.) Oops.
  • Dicebot - a powerful dice-roller that can roll virtual dice with any number of sides, just like real-life polyhedral dice. In addition to returning decimal numbers, the dicebot can also format the result in words, Roman numerals, ordinals, and even ordinal words.
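The dice-rolling and formatting ideas can be sketched in Python as follows. This is only an illustration under assumptions: it accepts the simple "NdS" notation seen in [roll:1d20], covers just two of the output formats, and the function names are invented for the example:

```python
import random
import re

def roll(spec, rng=random):
    """Roll dice given as 'NdS', e.g. '1d20' or '3d6', and return the total."""
    match = re.fullmatch(r"(\d+)d(\d+)", spec)
    if match is None:
        raise ValueError(f"not a dice expression: {spec!r}")
    count, sides = int(match.group(1)), int(match.group(2))
    return sum(rng.randint(1, sides) for _ in range(count))

def to_roman(number):
    """Format an integer from 1 to 3999 as a Roman numeral."""
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in numerals:
        while number >= value:
            out.append(symbol)
            number -= value
    return "".join(out)

def ordinal(number):
    """Format an integer as an ordinal such as 1st, 2nd, 3rd, or 11th."""
    if 10 <= number % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(number % 10, "th")
    return f"{number}{suffix}"

result = roll("1d20")
print(result, to_roman(result), ordinal(result))
```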

(Almost) Guaranteeing Proper English Syntax

Irregular Verbs Database

I created a database of approximately 650 irregular verbs. Each entry has the following:

  • base verb
  • past simple tense
  • participle
  • singular
  • gerund

This database is checked first whenever special pattern modifiers are indicated because irregular verbs just can't be converted on the fly. Regular verbs are much more consistent.
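The "irregular first, regular rules second" order can be sketched in Python like this. The small IRREGULAR_VERBS dictionary is a stand-in for the real database, and the fallback rules are intentionally simplified assumptions (they skip consonant doubling and "y" to "ied", among other things):

```python
# Tiny illustrative stand-in for the irregular verbs database.
IRREGULAR_VERBS = {
    "go":   {"past": "went", "participle": "gone",
             "singular": "goes", "gerund": "going"},
    "take": {"past": "took", "participle": "taken",
             "singular": "takes", "gerund": "taking"},
}

def conjugate(verb, form):
    """Return the requested form, checking the irregular verbs first."""
    if verb in IRREGULAR_VERBS:
        return IRREGULAR_VERBS[verb][form]
    # Regular verbs: a deliberately simplified rule set.
    if form in ("past", "participle"):
        return verb + "d" if verb.endswith("e") else verb + "ed"
    if form == "gerund":
        if verb.endswith("e") and not verb.endswith("ee"):
            return verb[:-1] + "ing"    # bake -> baking
        return verb + "ing"             # walk -> walking
    if form == "singular":
        return verb + "s"
    raise ValueError(f"unknown form: {form!r}")

print(conjugate("go", "past"), conjugate("walk", "past"))
```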

See an excerpt from the database

CMU Dictionary Database for Fixing Indefinite Articles

After all patterns are finished generating, the generator performs one last check to search the final result and fix any indefinite articles it finds. For example, this pattern:

I have a [adjective] umbrella.

might result in:

I have a ugly umbrella.

which would then be changed to:

I have an ugly umbrella.

The generator makes extensive use of the Carnegie Mellon University Pronouncing Dictionary, which is an open-source plaintext pronunciation guide with 39 phonemes for many English and common foreign words. I downloaded their huge list of words and wrote a script to put everything into a MySQL table for quick lookup.

Vowels, particularly in English, are pretty tricky to associate with the correct indefinite article by code, because the choice depends on sound rather than spelling: "hour" starts with a consonant letter but takes "an", while "unicorn" starts with a vowel letter but takes "a". With the CMU database, I can look up a target word, determine whether its pronunciation begins with a vowel phoneme, and return the correct indefinite article.
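Here is a minimal Python sketch of that final pass. The PRONUNCIATIONS dictionary is a small illustrative excerpt in the CMU dictionary's notation, standing in for the MySQL table, and the function names and the first-letter fallback for unknown words are assumptions made for this example:

```python
import re

# Illustrative excerpt in the CMU dictionary's notation: word -> phonemes.
PRONUNCIATIONS = {
    "ugly": ["AH1", "G", "L", "IY0"],
    "umbrella": ["AH0", "M", "B", "R", "EH1", "L", "AH0"],
    "hour": ["AW1", "ER0"],
    "unicorn": ["Y", "UW1", "N", "IH0", "K", "AO2", "R", "N"],
    "sword": ["S", "AO1", "R", "D"],
}

# Vowel phoneme symbols in the CMU dictionary's notation, without stress digits.
VOWEL_PHONEMES = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
                  "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

def article_for(word):
    """Pick 'a' or 'an' from the word's first phoneme."""
    phonemes = PRONUNCIATIONS.get(word.lower())
    if phonemes:
        first = phonemes[0].rstrip("012")  # drop the stress marker
        return "an" if first in VOWEL_PHONEMES else "a"
    # Assumed fallback for unknown words: guess from the first letter.
    return "an" if word[0].lower() in "aeiou" else "a"

def fix_articles(text):
    """Rewrite every 'a'/'an' so that it agrees with the word after it."""
    def repl(match):
        article, word = match.group(1), match.group(2)
        fixed = article_for(word)
        if article[0].isupper():
            fixed = fixed.capitalize()
        return f"{fixed} {word}"
    return re.sub(r"\b(an?)\s+([A-Za-z][\w'-]*)", repl, text, flags=re.IGNORECASE)

print(fix_articles("I have a ugly umbrella."))
```

Running the check once on the finished sentence, rather than inside each pattern, means the article is always matched against whatever word ended up next to it.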

It's the little things in life, even if it's just one letter, that can make all the difference.

API

( view API )

The API is just that, an accessible interface to the generator. You can use cURL or any other URL fetcher to send any whitelisted pattern and receive a plaintext generated result back.