Somewhere in this extract of an article by John Paul Rathbone lurks a haiku. Not placed there deliberately, it, and others like it, have been sitting in plain sight, unrecognised for what they are.
Lastly, Mr Temer needs to build a Congressional majority. Although Mr Temer is a wily negotiator, this is perhaps his hardest task — especially as the Petrobras corruption scandal has fractured Congress into myriad whirlpools of seething factions, not only between parties but within them too. Mr Temer has already had to shelve earlier plans to reduce the number of ministers because such appointments are a traditional way of dealing out pork and thus building coalitions.
… and here it is:
- has fractured Congress
into myriad whirlpools
of seething factions
There are plenty more such haiku: identified by computer algorithm, selected by a human, and brought to light in ft.com/hidden-haiku.
What is a haiku?
As Wikipedia describes, “A haiku in English is a very short poem in the English language, following to a greater or lesser extent the form and style of the Japanese haiku”, sometimes having “a three-line format with 17 syllables arranged in a 5–7–5 pattern”.
For the purposes of this project, we have concentrated more on finding this 5-7-5 syllable structure, and less on “a focus on some aspect of nature or the seasons” and the other, more subtle, criteria. That being said, some of the haiku we have found have come pleasingly close to being powerful pieces of prose in their own right.
Recognising haiku
Inspired by the lovely New York Times Haiku project, we conducted a series of mini-experiments, looking at our content in new ways, exploring a variety of other avenues for the manipulation of search results and the automated identification and manipulation of ‘accidental’ poetry*. Along the way, perhaps inevitably, the tool we built was easily tweaked to become a haiku detector, and it seemed wrong not to point it at the FT articles to see what we could find. A conversation with Jacob Harris (@harrisj), formerly of the NYT, concluded that we’ve ended up in a similar haiku place.
The code for our haiku detector is available but comes with plenty of caveats, chief among which is that it was My First Golang Program (™). The tool itself is live but, at the time of writing, the part involving haiku is restricted to FT Staff accounts.
On startup, the haiku detector reads in the 134K defined words from the CMU Pronouncing Dictionary, an enormous and enormously useful phoneme dataset, along with some extra items added as part of this project (more details below).
An example defined word is: HAIKU: HH AY1 K UW0. There are two numbered phonemes, AY1 and UW0, so the word has two syllables, with the primary emphasis on the phoneme marked with a 1. Longer words can have a syllable with a secondary emphasis, e.g. DEFENESTRATION: D IY0 F EH2 N EH0 S T R EY1 SH AH0 N.
A word’s phonemes are then mapped to a string of emphasis points. In the case of the example word HAIKU, this would be “10”, indicating this is a two syllable word with emphasis on the first syllable.
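As a minimal sketch of that mapping (not the project's actual code), the function below assumes the dictionary's convention that vowel phonemes end in a stress digit of 0, 1 or 2 and consonants carry no digit. The name emphasisString is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// emphasisString maps a space-separated list of phonemes to a string of
// stress digits, one digit per syllable. Only vowel phonemes end in a
// digit, so collecting the trailing digits yields the emphasis points.
func emphasisString(phonemes string) string {
	var b strings.Builder
	for _, p := range strings.Fields(phonemes) {
		last := p[len(p)-1]
		if last >= '0' && last <= '2' {
			b.WriteByte(last)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(emphasisString("HH AY1 K UW0"))                         // 10
	fmt.Println(emphasisString("D IY0 F EH2 N EH0 S T R EY1 SH AH0 N")) // 02010
}
```

The length of the result is the syllable count, which is all the 5-7-5 check needs.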
The extra items added to the dictionary include
- straightforward definitions, such as BITCOIN and EUROSCEPTIC, which were missing from the original
- alternate (mis)spellings, such as CRITICISED → CRITICIZED
and extra functionality
- regexes for word boundaries, such as /\w+[‘’][sStdmM]/
- regexes for transforming awkward text, such as apostrophes, /(\w+)’([sStTdDmM])/$1’$2/
- marking certain words as being unsuited to terminating a phrase in poetry, such as AND and BUT
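To show how these extras might fit together, here is a rough sketch under several assumptions: the word-boundary regex quoted above is used as one alternative in a tokeniser, alternate spellings live in a simple map, and unsuitable endings in a set. The names wordRe, aliases, badEndings, tokens and endsBadly are all hypothetical, and the real tool may organise this quite differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wordRe keeps a word plus a curly-apostrophe suffix (e.g. "Temer’s")
// together as one token, and otherwise takes a plain run of word characters.
var wordRe = regexp.MustCompile(`\w+[‘’][sStdmM]|\w+`)

// aliases maps alternate spellings onto the spelling the dictionary knows.
var aliases = map[string]string{"CRITICISED": "CRITICIZED"}

// badEndings holds words judged unsuited to terminating a phrase.
var badEndings = map[string]bool{"AND": true, "BUT": true}

// tokens splits text into upper-cased dictionary keys, applying aliases.
func tokens(text string) []string {
	words := wordRe.FindAllString(text, -1)
	for i, w := range words {
		w = strings.ToUpper(w)
		if a, ok := aliases[w]; ok {
			w = a
		}
		words[i] = w
	}
	return words
}

// endsBadly reports whether the last word is marked as an unsuitable ending.
func endsBadly(words []string) bool {
	return len(words) > 0 && badEndings[words[len(words)-1]]
}

func main() {
	ws := tokens("Mr Temer’s plan was criticised but")
	fmt.Println(ws)
	fmt.Println(endsBadly(ws)) // true
}
```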
The user specifies the structure of the haiku as “..... ....... .....”, representing contiguous blocks of 5+7+5 syllables with no particular required emphasis. An individual word cannot be split across two blocks of syllables. This structure, aka meter, is turned into a regular expression which is matched against concatenations of strings of emphasis points.
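One way such a translation could work is sketched below. It assumes each word's emphasis string is separated from the next by a single space, that a dot in the meter accepts any stress digit, and that a digit in the meter must match exactly. The function name meterRegexp is hypothetical and the tool's real regex may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// meterRegexp builds a pattern over space-separated emphasis strings.
// Inside a block the space between syllables is optional, so a block may
// span several words. Between blocks the space is mandatory, so no word
// can straddle two blocks.
func meterRegexp(meter string) *regexp.Regexp {
	var blocks []string
	for _, block := range strings.Fields(meter) {
		var syllables []string
		for _, c := range block {
			if c == '.' {
				syllables = append(syllables, "[012]")
			} else {
				syllables = append(syllables, regexp.QuoteMeta(string(c)))
			}
		}
		blocks = append(blocks, strings.Join(syllables, " ?"))
	}
	// \b at both ends stops a match from starting or ending mid-word.
	return regexp.MustCompile(`\b` + strings.Join(blocks, " ") + `\b`)
}

func main() {
	re := meterRegexp("..... ....... .....")
	// Illustrative emphasis strings: words of 1+2+2, 2+3+2 and 1+2+2 syllables.
	fmt.Println(re.MatchString("1 10 10 10 100 12 1 10 10")) // true
	// Same words with a two-syllable first word: no 5+7+5 split on word boundaries.
	fmt.Println(re.MatchString("10 10 10 10 100 12 1 10 10")) // false
}
```

Because an unknown word contributes “?”, which is neither a digit nor a space, it breaks any candidate match that would otherwise run through it.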
The user specifies some articles, such as those currently on the FT.com homepage (usually 15), or the most recently published articles, or Lucy Kellaway’s latest thoughts. The content of each article is pulled in via our Search and Content APIs, and split into words. Each word is looked up in the dictionary to obtain its emphasis string. If there is no matching definition, the emphasis string defaults to “?”, which ensures the word cannot be part of any match for the meter regex. The emphasis strings for all the words in the article are concatenated into a space-separated string and the meter regex is applied to look for matching sequences of emphasis points. The matches are unpacked to give the original article text.
The haiku detector lists all the specified articles, and for each one lists the fragments of text which match the haiku meter and do not end with unsuitable words, presented in such a way as to make it easy for the user to visually scan through large numbers of them.
The ‘unsuitable’ haiku are listed at the end, along with all the unrecognised words, such as (at the time of writing) BREWDOG and WHISTLEBLOWING.
Numbers, dates, percentages, etc, could be converted into text, and then be candidates to match within a haiku, but that has been left as an exercise for (maybe, but probably not) later.
Choosing haiku
There’s no subtle way of saying it: you have to wade through an awful lot of nonsense to get to the good ones. No concrete stats yet, but a haiku hit rate of 1 in 100 seems about right, i.e. 1 “Hm, maybe” to 100 “No”s.
Returning to John Paul Rathbone’s article, here are all the matching, suitable haiku:
week is a supposed procedural glitch announced on Monday that could |
the lower house will decide any differently than it did last month |
faces four daunting challenges although even these are not unique |
rule Mauricio Macri Argentina’s new president faces |
they or candidates of a similar stature can inject a dose |
private Brazilian companies can currently pay to their partners |
Congress including the head of the lower house Eduardo Cunha |
scandal has fractured Congress into myriad whirlpools of seething |
has fractured Congress into myriad whirlpools of seething factions |
into myriad whirlpools of seething factions not only between |
whirlpools of seething factions not only between parties but within |
of seething factions not only between parties but within them too |
here is that in both countries most citizens care little for party |
reasonably well run and are willing to give fresh leaders a chance |
run and are willing to give fresh leaders a chance to do that at least |
Not shown here are the matching ‘unsuitable’ haiku, of which there are perhaps twice as many.
The selection of haiku from this list of candidates is highly subjective, and done by eye, at speed. No doubt there are still some nice haiku which remain hidden even after this scan.
Categorising haiku
It is early days, but over the course of 3 months we have accumulated approximately 300 haiku which pass the “Hm, maybe” test. There are enough to attempt to categorise them into different types, ranging from the mundane to the really rather profound. Here are some of the (overlapping) categories:
- Cropping changing meaning
  - approval ratings
    have sunk to new lows prompting
    calls for a party
- Humour
  - The company was
    almost forced out of business
    due to collapsing
- Gnomic/profound
  - with them to find out
    if they will be good or bad
    for humanity
- Imagery
  - the two main parties
    taking turns to rip themselves
    apart in public
- Reportage
  - toilet was destroyed
    in a controlled explosion
    by army experts
- Very FT
  - Idle capital
    is an opportunity
    cost for the system
Publishing haiku
We are taking some small, early steps to establish if there is interest among our readers (existing or prospective) in having these haiku brought to their attention, with a weekly collection being published as an article: ft.com/hidden-haiku. These will be tweeted and posted onto Facebook.
Perhaps we will provide an option for our readers to get all their news in haiku form. You heard it here first.
* Awful poetry
Albeit not used in this haiku project, having the details of the syllable emphasis within each word plus the final syllable (e.g. K UW0, sounds like “coo”) gives the raw material from which to auto-generate cringingly awful, metered, rhyming poetry. The user can specify other meters, such as iambic pentameter, “0101010101”, and resulting matches are sorted by final syllable.
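A rough sketch of the sorting step is given below. It approximates the final syllable as the last vowel phoneme, at most one consonant before it, and everything after it, which is an assumption chosen to reproduce the “K UW0” example; real syllabification is subtler. The names line, finalSyllable and sortByFinalSyllable are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// line pairs a matched fragment with the phonemes of its last word.
type line struct {
	text     string
	phonemes string
}

// isVowel reports whether a phoneme ends in a stress digit.
func isVowel(p string) bool {
	c := p[len(p)-1]
	return c >= '0' && c <= '2'
}

// finalSyllable returns the last vowel, one preceding consonant if there
// is one, and any trailing consonants.
func finalSyllable(phonemes string) string {
	ps := strings.Fields(phonemes)
	for i := len(ps) - 1; i >= 0; i-- {
		if isVowel(ps[i]) {
			if i > 0 && !isVowel(ps[i-1]) {
				i--
			}
			return strings.Join(ps[i:], " ")
		}
	}
	return ""
}

// sortByFinalSyllable groups lines that share an ending next to each other.
func sortByFinalSyllable(lines []line) {
	sort.SliceStable(lines, func(a, b int) bool {
		return finalSyllable(lines[a].phonemes) < finalSyllable(lines[b].phonemes)
	})
}

func main() {
	lines := []line{
		{"... defenestration", "D IY0 F EH2 N EH0 S T R EY1 SH AH0 N"},
		{"... haiku", "HH AY1 K UW0"},
		{"... nation", "N EY1 SH AH0 N"},
	}
	sortByFinalSyllable(lines)
	for _, l := range lines {
		fmt.Println(finalSyllable(l.phonemes), "|", l.text)
	}
}
```

Once sorted, fragments with the same ending sit side by side, ready to be paired into couplets.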
The reader is spared some examples here, and may have to wait for a followup post.