Debateworthy

by FT Labs

“Most commented” on FT.com uses a proprietary index from Livefyre, which combines the number of comments with the age of the comments to produce a top ten list of articles. This fails to surface content on which discussion has resumed after a long hiatus, and it cannot distinguish between genuinely insightful comments and those which are simply flame wars. Could we make an algorithm that does it better?

Most commented slot on FT.com

Working with Lisa Pollack, our editorial stakeholder, on a timeboxed three-week project, we used heuristic indicators to try to build a more complete picture of where the most interesting discussion is taking place across all FT content. The aim was to have sufficient confidence in the results to alert our Twitter followers to especially interesting discussions that they may want to get involved in: threads that are ‘debateworthy’.

The outcome is a working app which serves an RSS feed of ‘debateworthy’ articles, plus a test page that compares the feed with the Livefyre-based “most commented” list and exposes our algorithm’s config params for dynamic adjustment.

Comparison of Livefyre and Debateworthy

Choice of heuristics

The indicators we chose to use represented a compromise between the data we could easily get hold of in near-real-time and the effectiveness of that data in producing good results. We ended up using the following indicators:

  • A reputation score for each commenter based on the number of likes on their prior comments (with likes from FT staff counting for more), aggressively discounted for each comment that has been removed by our moderation team
  • A score for the thread which we call weightedNumberOfCommenters, equal to the sum of all the commenter reputation scores for all the comments in the thread within a configurable time window
  • A second thread score, weightedNumberOfInteractions, which is the total number of likes on the comments, with each comment weighted by how closely in time it followed its parent; the weight falls off as a linear inverse function of that gap
  • Finally, commentersPerCollection is calculated as the lower bound of the Wilson score interval for the ratio of numberOfCommenters to numberOfComments. This value represents the amount of back and forth between the commenters: the fewer distinct commenters there are per comment, the more the same people are replying to each other. We want lots of discussion, so we optimise for a lower value of this.
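To make the indicators above more concrete, here is a rough Python sketch of three of them. It is not the code behind Debateworthy: the function names, the Commenter fields, the STAFF_LIKE_WEIGHT and REMOVAL_DISCOUNT values, the multiplicative form of the moderation discount and the 95% confidence level (z = 1.96) are all assumptions chosen for illustration. Only the Wilson score lower bound is a standard formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Commenter:
    likes: int        # likes from ordinary readers on prior comments
    staff_likes: int  # likes from FT staff on prior comments
    removed: int      # prior comments removed by the moderation team


# Hypothetical tuning values; the real config params are not published.
STAFF_LIKE_WEIGHT = 3.0
REMOVAL_DISCOUNT = 0.5


def reputation(c: Commenter) -> float:
    """Likes (staff likes count for more), halved per removed comment.

    The multiplicative discount is an assumed reading of
    'aggressively discounted'.
    """
    base = c.likes + STAFF_LIKE_WEIGHT * c.staff_likes
    return base * REMOVAL_DISCOUNT ** c.removed


def weighted_number_of_commenters(comments, now, window_seconds):
    """Sum author reputation over comments posted within the time window.

    `comments` is a list of (author, posted_at) pairs, with posted_at
    and now given in seconds.
    """
    return sum(
        reputation(author)
        for author, posted_at in comments
        if now - posted_at <= window_seconds
    )


def wilson_lower_bound(successes, trials, z=1.96):
    """Lower bound of the Wilson score interval for successes / trials."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denominator = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * trials)) / trials)
    return (centre - margin) / denominator


def commenters_per_collection(number_of_commenters, number_of_comments):
    """Lower is better: fewer distinct commenters per comment."""
    return wilson_lower_bound(number_of_commenters, number_of_comments)
```

Under these assumptions, a thread with 20 comments from 2 people scores lower on commenters_per_collection than a thread with 20 comments from 10 people, which matches the idea that the first thread contains more back and forth.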

Evaluating results

To evaluate how successful Debateworthy is, we created a page that shows two ‘most commented’-style lists side by side – one powered by Livefyre’s heat index, and the second from our Debateworthy algorithm. There turned out to be too many ‘tells’ to make this a truly blind test, but the comparison proved helpful in showing where the algorithm could be improved, which allowed us to iterate on it in the final week.

Feedback:

“As a concept, I think this is brilliant and a number of editorial people are excited about it. I think it will take a bit of time to get the calibration right, and to get to know the service, e.g. is it good in the morning or stronger in the evening? How reliable is it?”

Resources

This project is not open source, so the following links will work only for FT staff.