BLOG


We got a FORGE 2022 Torchie Award!

Cheers to the Blinkist Signal, the GPT-3-driven editorial project I’ve been working on for the better part of 2022 with my colleagues from the Data Science & Business Insights and CRM team at Blinkist! The Signal just won a FORGE 2022 Torchie Award for Best Data Application of the Year!

The Signal aims to take current news events, contextualise them with Blinks of nonfiction titles that can provide more information, and serve these insights directly to Blinkist members’ devices via push notifications. We scraped current news events, trained our internal GPT-3 model to write short notifications about them, and had an algorithm match each story to a relevant Blink. The raw copy GPT-3 generated wasn’t wrong, but it left a lot to be desired: for anyone familiar with the corpus of source data (aka the Internet), it was easy to see that the model was regurgitating phrasing from written news media and from transcripts of radio and broadcast news items. Not awful, but also not ideal for creating the shortest, best-quality path to a good push notification, and in the Blinkist tone of voice, to boot.
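The pipeline above (scrape, generate, match, push) can be sketched roughly as follows. All function names and the keyword-overlap matcher are illustrative assumptions of mine, not the team's actual implementation, which used a fine-tuned GPT-3 model and an internal matching algorithm.

```python
# Hypothetical sketch of the Signal pipeline: scrape headlines, draft
# notification copy, and match each headline to a Blink from the library.

def fetch_headlines():
    # Stand-in for the news scraper.
    return ["Central banks raise interest rates again"]

def draft_notification(headline):
    # Stand-in for the fine-tuned GPT-3 call that writes the push copy.
    return f"Why does this keep happening? {headline}. Here's the context."

def match_blink(headline, library):
    # Stand-in for the matching algorithm: naive keyword overlap
    # between the headline and each title in the library.
    def overlap(title):
        return len(set(headline.lower().split()) & set(title.lower().split()))
    return max(library, key=overlap)

library = ["The Psychology of Money", "Interest Rates Explained", "Atomic Habits"]

for headline in fetch_headlines():
    copy = draft_notification(headline)
    blink = match_blink(headline, library)
    print(f"{copy} -> read: {blink}")
```

In the real system, of course, the matching step has to weigh far more than token overlap, which is exactly where the quality questions below come in.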

To create a model with better-quality, more varied phrasing that fits our brand voice, I worked with the team to define linguistic parameters around channel-appropriate phrasing, repeatable substrates vs liquid/novel substrates, and tone calibration. Key questions that guided the process included:

Is this characteristic of spoken language, written language, or both?
Push notifications are ideally a mix of both: written language to optimise for visual scanning patterns and information processing, and spoken language to shorten the pathway to emotional identification and to match the mode of processing specific to content on today’s devices.

How can we linguistically quantify the twist in the lede that provides the curiosity factor the Signal hinges on?
This was perhaps the toughest question to answer, since one could easily argue it’s a case-by-case matter. However, by analysing a sample of news stories over a longer period, we were able to identify trends in what’s commonly covered in our corpus sources vs what intersects with our library, the relative volume of titles in any one library category, and finally the “closeness” of the connection between the source item and the best-fit title returned for it. From there we can build a corpus of phrasings with associated composite values that help select for curiosity-based phrasing.
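One way to picture those composite values is as a weighted score over candidate phrasings. The sketch below is purely illustrative: the feature names, weights, and numbers are my own assumptions, not the team's actual scoring model.

```python
# Hypothetical composite scoring for candidate phrasings. Each candidate
# carries illustrative feature values: topic overlap with the source item,
# closeness to the best-fit title, and a curiosity rating.

CANDIDATES = [
    # (phrasing, topic_overlap, title_closeness, curiosity)
    ("New study confirms what economists suspected.", 0.9, 0.8, 0.3),
    ("Economists were wrong about this for decades.", 0.8, 0.7, 0.9),
    ("A report on economic indicators was published.", 0.9, 0.6, 0.1),
]

# Assumed weights; curiosity is weighted highest because the Signal
# hinges on the twist in the lede.
WEIGHTS = {"topic_overlap": 0.3, "title_closeness": 0.3, "curiosity": 0.4}

def composite_score(overlap, closeness, curiosity, w=WEIGHTS):
    # Simple weighted sum of the three feature values.
    return (w["topic_overlap"] * overlap
            + w["title_closeness"] * closeness
            + w["curiosity"] * curiosity)

best = max(CANDIDATES, key=lambda c: composite_score(*c[1:]))
print(best[0])  # the candidate with the strongest curiosity-weighted score
```

The point of the weighting is that a phrasing which is merely accurate (high overlap, low curiosity) loses out to one that sets up a question the matched Blink can answer.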

The Signal is still running, and we’re working to determine its overall scalability, impact, and potential for optimisation. More to come!