Question: In the game of Anigrams, you unscramble successively larger, nested collections of letters to create a valid “chain” of six English words between four and nine letters in length.
For example, a chain of five words (sadly, less than the six needed for a valid game of Anigrams) can be constructed using the following sequence, with each term after the first including one additional letter than the previous term:
DEIR (which unscrambles to make the words DIRE, IRED or RIDE) DEIRD (DRIED or REDID) DEIRDL (DIRLED, DREIDL or RIDDLE) DEIRDLR (RIDDLER) DEIRDLRS (RIDDLERS) What is the longest chain of such nested anagrams you can create, starting with four letters?
For specificity, all valid words must come from Peter Norvig’s word list (a list we’ve used previously here at The Riddler).
Extra credit: How many possible games of Anigrams games are there? That is, how many valid sets are there of four initial letters, and then five more letters added one at a time in an ordered sequence, that result in a sequence of valid anagrams? (Note: Swapping the order of the first four letters does not result in a distinct game.)
The sketch shows a winning sequence of plays for a game of anigrams on the left, along with the underlying game on the right. For any given round, there may be multiple words that can be formed from the letter set.
To build a valid game of anigrams, we can pick a set of letters from a $4$-letter word, then add letters one at a time such that, at every stage, one or more real words can be formed from the letters.
This suggests an efficient way to build anigram games.
- turn all $4$-letter words into barcodes and add them to a set
- go through each $5$-letter word, remove one letter at a time to form the barcodes that could have preceded it, and check if any of them are already in the set of
good_barcodes. If at least one is, barcode the $5$-letter word and add it to the list of
- repeat the process for all $6$-letter words, $7$-letter words, and so on.
This ensures that all barcodes in the set have an ancestor in the set and therefore could be in a game of anigrams.
At the end of this process, the longest barcode(s) in
good_barcodes will be the anigram game(s) we’re looking for.
But knowing that the game exists is one thing — we’d also like to efficiently recreate it. To do this, we’ll log a precursor word for each of a word’s valid precursor-barcodes in the map
To start, we need to write a function to turn any word into the corresponding barcode
def word_to_barcode(word): return "".join(sorted(word))
and initialize the set
good_barcodes with the barcodes
# list of words of length 4 initial_words = [word for word in words if len(word) == 4] # keep track of which words have sub-barcodes good_barcodes = set(word_to_barcode(word) for word in initial_words)
To convert from barcodes back to words, we make a map from barcodes to words
from typing_extensions import DefaultDict barcode_to_word = DefaultDict() for word in words: barcode_to_word[word_to_barcode(word)] = word
Now, we build the set
good_barcodes with a linear pass over
# store precursor words precursors = DefaultDict(lambda: set()) for word in words: # make potential precursor barcodes for each word by removing each letter precursor_barcodes = [word_to_barcode(word[0:i-1] + word[i:]) for i in range(1, len(word)+1)] # loop over the potential precursor-barcodes to see if any already exist in good_barcodes # if so, add current barcode to good_barcodes, log a sample precursor-word in precursors for precursor_barcode in precursor_barcodes: if precursor_barcode in good_barcodes: good_barcodes.add(word_to_barcode(word)) precursors[word].add(barcode_to_word[precursor_barcode])
With this in hand, we can find the barcodes of maximum length
longest = max(good_barcodes, key=lambda x: len(x)) longest_barcodes = [bc for bc in good_barcodes if len(bc) == len(longest)] longest_words = [w for w in words if word_to_barcode(w) in longest_barcodes] list(zip(longest_barcodes, longest_words))
[('adeeiiimnnnorstt', 'indeterminations'), ('adeeiimnnorssttu', 'underestimations')]
So, there are two longest anigram games, terminating in the $16$-letter words
We can walk backwards through
precursors to resurrect sample gameplay.
indeterminations we find
['indeterminations', 'intermediations', 'determinations', 'antimodernist', 'terminations', 'nitrosamine', 'antinomies', 'nominates', 'sonatine', 'stanine', 'anenst', 'anent', 'neat']
['underestimations', 'underestimation', 'determinations', 'antimodernist', 'terminations', 'nitrosamine', 'antinomies', 'nominates', 'sonatine', 'stanine', 'anenst', 'anent', 'neat']
The games are identical up until the second to last word, where adding a
u or an
i is the determining step. At the outer limits of the game, we should expect exceptional sequences of mutations. Perhaps it’s not so surprising that the two longest games have the same origin.
We can use the
precursors map to count the number of valid anigram games starting from a given terminal word. We just recurse through
precursors and add up the number of possible paths, returning
1 when we reach a $4$-letter word. This is the proper counting because
precursors only stores one representative precursor word for each precursor barcode (e.g. it doesn’t log both
from functools import lru_cache @lru_cache(maxsize=1_000_000) def count_anigrams(word): # count the number of anigram paths for a given word if len(word) == 4: return 1 else: return sum(count_anigrams(p) for p in precursors[word])
Summing over all possible terminal words at each stage, there are $4,510,515$ possible regulation games of anigrams, and the number of distinct games hits a max around terminal words of length $11$ or $12$.
For the two longest games (ending in
underestimations), the possibilities split up into $1,071$ possibilities apiece.
Since there are no games of length $17,$ there can’t be games of length greater than $17,$ and $16$ is the maximum.