Why Book Publishing Seeks Artificial Intelligence

by Holly Payne
as published on Huffington Post

Six years ago, I finished a book that I believed was my best work to date. I had written three books, published in 11 countries by big publishers, and was hoping Damascena would be my breakout novel—but after submitting it to agents for more than a year, it was rejected.

I was confused until one of the most respected agents told me, “We love the story and the writing, but we don’t know how to sell it.” I didn’t know how to respond.

After much probing, the agent finally told me that the problem wasn’t the writing. Or the story. The problem was the marketing. Marketing? My book didn’t fit into a single genre, making it impossible to find a readership, and this, from her perspective, was a huge problem.

Damascena, she explained, was a work of literary fiction but also historical, suspenseful, inspiring, haunting, and oh yeah, spiritual. Which was it? The agent’s comment gnawed at me for two years until it struck me: the value proposition for any book is the impact it has on its reader. Haunting, inspiring, suspenseful. The agent had listed all the experiences that Damascena offered to readers—and planted the seed for my tech startup named Booxby.

There had to be a better way for books to find their readership: analyze the content itself and link it to reader experience. Using machine learning, a branch of artificial intelligence, Booxby identifies, quantifies and predicts reader experience based on the text itself, which lays the foundation for a radical new way to approach book discovery.

 

Publishing’s Biggest Problem

 

The number of books published annually in the U.S. has exploded by 400% in the last decade, according to Berrett-Koehler’s September 2016 report. With 1 million books published each year, publishers find it increasingly difficult to find readers for each book, making book discovery the publishing industry’s biggest problem.

Two waves of technological disruption drove this book discovery dilemma. First, Amazon disrupted procurement and distribution by opening the first online bookstore in 1995. Second, digital publishing disrupted production, making it easy for anyone to publish a book via ebook or POD (print on demand) with companies like Lightning Source, Smashwords, CreateSpace, Lulu and others offering easy and affordable ways for independent authors to produce a book.

Compounding the disruption were the closing of big chain bookstores, the shrinking (and in some cases total elimination) of newspaper book review sections, and the shifting center of the conversation around books, with so many other forms of media competing for consumers’ attention and time. Over time, these forces made book discovery a huge problem.

The numbers coming out of the indie author sector highlight the problem. According to a 2015 report from Bowker, the only U.S. company that sells ISBNs (a book’s identification number), the independent author community has grown 36% per year since 2008. Approximately 770,000 ISBNs were issued to independent authors in 2015, making their output ten times that of traditional publishers.

The resulting signal-to-noise crisis has forced traditional publishers to “pulp,” or destroy, 25% of their inventory, according to the Dec. 11, 2009, piece “Pulping is The Publishing Industry’s Dirty Little Secret” in The Latest Outrage. This is a wasted opportunity, considering that roughly 3% of books make a return yet account for nearly $28 billion in annual sales.

The glut of books challenges consumers as well. With millions to choose from each year, how do readers find what’s right for them? Last year, Booxby interviewed hundreds of avid readers in the Bay Area and polled 700 college-educated readers from around the country to learn how they discover books. They reeled off numerous sources: Amazon, Goodreads, best seller lists, podcasts, and bookseller and friend recommendations. While the time and effort they spent on these sources gave them a slew of options, very few suited their taste and mood (the top two criteria for buying a book, according to 84% of our respondents). These readers’ hit rate was abysmal: only about 1 in 25 books met their needs.

Clearly, the problem of book discovery looms large for authors, publishers and readers. Authors and publishers lose money, and readers waste time trying to find what works for them. We simply don’t have the human resources to surmount the problem. This is where artificial intelligence can step in and provide meaningful analytics to inform acquisition decisions, understand a book’s full market potential, and create an effective mechanism to connect books to the readers who would most enjoy them—none of which currently exist. Through innovations in machine learning backed by a grant from the National Science Foundation, Booxby is addressing these challenges.

 

Current Solutions

 

“The publishing world is full of lore about what sells and what gets read, but precious little of the lore is informed by data and analysis,” writes Thomas Davenport, author of Only Humans Need Apply: Winners and Losers in the Age of Smart Machines, in a 2014 article for the Harvard Business Review.

A lot has changed since 2014, and publishers are now beginning to invest in reader analytics. Companies like Jellybooks and Inkitt engage readers and then analyze the book experience by embedding tracking software into digital Advance Reader Copies (ARCs). The tracking software is activated by roughly 300 focus group readers, who sign up to get free ebooks in exchange for providing information about their reading experience.

While the insights gained are helpful, the challenges with this process are time, scale and sampling bias. According to Andrew Rhomberg, the founder and CEO of Jellybooks, “the data collection is usually completed within 2-4 weeks.” This is time lost, and it would cost an inordinate amount of money to have focus group readers analyze every book before publication. Beyond that, biases inevitably exist among the people who sign up as readers, and it’s uncertain whether those readers mirror the optimal market for the books they are being asked to read.

On the consumer side, metadata and collaborative filtering have been the key techniques for solving book discovery. Unfortunately, neither offers personalized recommendations. The “customers who bought X also bought Y” analysis doesn’t understand why anyone chose X or Y. Of the avid readers we interviewed, none were satisfied with Amazon’s book recommendations, and most distrusted or ignored them.

Why isn’t purchase history helping? Because the algorithms extrapolate from past behavior, using what a reader bought before to predict what that reader will buy next. While helpful in many pursuits, this technique doesn’t generate recommendations specific to a reader’s varying taste and mood. Just because I purchased a guilty pleasure doesn’t mean I don’t want something more challenging for my next read. These algorithms do not generate recommendations relevant to what I want to read now.
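
To make the limitation concrete, here is a minimal sketch of the “customers who bought X also bought Y” idea. It is an illustration only, not Amazon’s or anyone else’s actual system; the readers, titles, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: reader -> set of titles bought.
purchases = {
    "reader_a": {"Thriller One", "Thriller Two", "Literary Novel"},
    "reader_b": {"Thriller One", "Thriller Two"},
    "reader_c": {"Thriller One", "Cookbook"},
}

def co_purchase_counts(histories):
    """Count how often each pair of titles shows up in the same history."""
    counts = {}
    for titles in histories.values():
        for x, y in combinations(sorted(titles), 2):
            counts.setdefault(x, Counter())[y] += 1
            counts.setdefault(y, Counter())[x] += 1
    return counts

def also_bought(title, histories, top_n=1):
    """Return the titles most often bought alongside `title`."""
    counts = co_purchase_counts(histories)
    return [t for t, _ in counts.get(title, Counter()).most_common(top_n)]

print(also_bought("Thriller One", purchases))
```

Nothing in this sketch ever looks at a single word of the books themselves. It only counts which titles were bought together, which is why it cannot say why a reader chose a book or what that reader is in the mood for now.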

Without understanding why consumers read a certain book, it is difficult to build an effective system that makes deeply personalized recommendations. While purchase history may help understand consumer behavior, it does not understand the product itself—in this case, the actual text. Machine learning makes it not only possible but probable that the next frontier for acquiring, positioning and marketing books will be informed by the analysis of the text itself. Know the product well to serve the consumer well.

 

Let A.I. Work for the Reader

 

As an author-led company, Booxby understands that a writer’s creation is nothing without the reader’s imagination. While no other medium is so dependent on the consumer for actualization, authors, editors and publishers have few mechanisms to understand the experience they are offering. Without that ability, acquiring, positioning, and marketing books become endeavors based on guts and guesswork. The result is books lumped into broad genres, which helps neither book nor reader.

The agent rightly passed on my book Damascena because she was afraid to take on intellectual property that didn’t fit within the structure forced upon her. She liked my book, but couldn’t identify a market using the rudimentary tools at the publishing industry’s disposal. But what if she had unbiased data based on the text of the book to better understand how (and which) readers would respond?

Armed with this information, the agent (or acquisition editor) can replace guesswork with a real sense of the book’s market potential. A.I. at Booxby isn’t about replacing human input, but augmenting it with data and a fresh perspective. “Data helps shape decisions. It does not make decisions,” writes Jellybooks’ founder, Rhomberg, in the March 17, 2016, Digital Book World article “Data Vs. Instinct: The Publisher’s Dilemma.”

Booxby’s algorithms analyze a book to generate comps (comparable titles) and predict reader experience. Think of comps as a compass providing the most critical point of reference for publishers when positioning a book. If a publisher gets these comps wrong, their marketing will be directed at the wrong audience—drastically impacting R.O.I.

Currently, comps are determined by sales data, genre and human intuition. Booxby’s comps are blind to sales data and genre, helping a book find its true tribe beyond the limitations of genre. We use artificial intelligence to help each book find its rightful readers, so that both authors and publishers can make a return on their investment.
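
As a rough sketch of what content-based comps could look like, assume each book has already been reduced to a numeric vector describing its text. The catalog, the vectors, and the use of cosine similarity are all assumptions made for illustration; this is not a description of Booxby’s actual algorithms.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical catalog: title -> content vector. No genre, no sales figures.
catalog = {
    "Title A": (0.9, 0.1, 0.3),
    "Title B": (0.2, 0.8, 0.5),
    "Title C": (0.8, 0.2, 0.4),
}

def find_comps(vector, catalog, top_n=2):
    """Rank catalog titles by how closely their vectors match `vector`."""
    ranked = sorted(catalog, key=lambda t: cosine(vector, catalog[t]), reverse=True)
    return ranked[:top_n]

new_manuscript = (1.8, 0.2, 0.6)  # hypothetical vector for a new book
print(find_comps(new_manuscript, catalog))
```

Because the ranking depends only on the vectors, two books shelved in different genres can still surface as comps for each other if their texts are similar.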

 

Nuts & Bolts of Booxby’s A.I.

 

Our solution to generating useful data to solve book discovery involves teaching computers how to read—not as a human would, but in a manner that creates a workable dataset.

Booxby applies natural language processing (NLP) and machine learning (ML) to understand the author’s unique style (which we refer to as Literary DNA) and then maps it onto the way a reader experiences that style. This allows us to create truly personalized recommendations based on the content’s effect on the reader, and it allows us to help publishers efficiently and effectively position and market books so that each reaches its full market potential.

Natural language processing breaks human language into its component parts by quantifying readability, phonology (lyricism, the rhythmic quality of a voice), writing density and many other features. It works by building feature vectors that measure each distinctive linguistic facet of a piece. At Booxby, we think of each vector as a genetic code representing some expression.
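
A toy version of such a feature vector might look like the following. The three measurements chosen here (average sentence length, average word length, and vocabulary variety) are stand-ins picked for simplicity, not the features Booxby actually uses, and the function name is hypothetical.

```python
import re

def style_vector(text):
    """Turn a passage into a small vector of style measurements.

    Returns (average words per sentence, average letters per word,
    share of words that are distinct). Purely illustrative features.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        return (0.0, 0.0, 0.0)
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    lexical_variety = len(set(words)) / len(words)
    return (avg_sentence_len, avg_word_len, lexical_variety)

print(style_vector("The rose bloomed. The rose died."))
```

A real system would measure far more facets, but the principle is the same: every passage becomes a row of numbers that can be compared with every other passage.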

Machine learning, a branch of A.I., builds predictive models by studying patterns over a large training set. The larger the training set, the more accurate the prediction tends to be. In other words, to do this right, you’ve got to sample a lot of Literary DNA, then study its ‘code’ in order to understand and predict its expression. For example, a certain confluence of genes generates blue eyes, and a certain confluence of words and phrases equals suspenseful. Or haunting. Or beautiful.
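
The idea of learning an “expression” from examples can be sketched with one of the simplest predictive models, a nearest-centroid classifier. The training data, the two-number vectors, and the labels below are invented for illustration, and the technique is a deliberately basic stand-in rather than the model Booxby uses.

```python
import math

# Hypothetical training set: (style vector, experience reported by readers).
training = [
    ((8.0, 4.1), "suspenseful"),
    ((9.0, 4.3), "suspenseful"),
    ((22.0, 5.2), "haunting"),
    ((24.0, 5.0), "haunting"),
]

def fit_centroids(examples):
    """Average the vectors for each label to get one centroid per label."""
    sums, counts = {}, {}
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in acc) for label, acc in sums.items()}

def predict(vec, centroids):
    """Predict the label whose centroid lies closest to `vec`."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))

centroids = fit_centroids(training)
print(predict((10.0, 4.0), centroids))
```

With only four examples the model is crude. More labeled books pull each centroid toward a better estimate of what that experience looks like in the text, which is the sense in which a larger training set helps.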

While machines won’t be brought to tears, join a cause, or feel transported to another time and place, they have an extraordinary ability to recognize patterns by linking words on a page to the experience of readers themselves. The end result is matchmaking: connecting book to reader in order to facilitate a deeply satisfying, immersive experience for each person.

 

Not a Fad

 

Just a decade ago, nobody in the top publishing houses considered using A.I. Today, A.I. has changed the face of industry, and the publishing world recognizes it must keep pace. According to IBM, 59% of enterprises are transforming the way they work by utilizing data and analytics to “turn insight into action faster.” A.I. is pervasive across industries, helping companies operate, market, invest, stay secure, and manage people, to name just a few applications.

The desire for data-driven innovation is taking hold in the publishing world and affecting every player in its ecosystem. The March 2017 issue (Vol. 35, No. 3) of the Independent Book Publishers Association’s magazine, Independent, focuses solely on data-driven innovation. In the cover story, “Publishing by The Numbers,” Deb Vanasse writes, “today’s publishing innovators are reaching deep into the vast stores of data generated by 21st century technologies to make smarter, nimbler decisions.”

Judith Curr, the president and publisher of Atria Books, a division of Simon & Schuster (one of the Big Five publishers), is one of the publishing industry’s pioneers in the age of machine learning. Curr has led Atria for 15 years, and her mission is to ‘break out authors.’ In a recent meeting, she told Booxby how she and Atria Books are adopting technology to create a customized analytics dashboard to help position and market the imprint’s front list titles.

Booxby also met with various global offices of Hachette, another pioneer, and learned they, too, are actively vetting cutting-edge technologies, including the application of artificial intelligence. Hachette launched its Global Innovation Project to do just this, sourcing innovations around the world that will bring positive and sustainable change to the publishing industry.

Not only is A.I. here to help publishers and editors be more efficient with their resources, it’s also here to help authors evolve and to connect readers with that perfect book. Far from limiting, replacing, or curtailing the creative process, A.I.-driven technology and the pioneers who implement it will inform and form the connection between author, book and reader.

 

Democracy of A.I.

 

What if we could better inform the decisions going into acquisition, positioning and marketing with truly effective data? What if we created an algorithm that was blind to genre so that books could find their audience based solely on the experiences they offered their readers and the quality of their writing? What if A.I. helped to develop the voices that have been squandered, or politely dismissed with “We don’t know how to sell it”?

But everyone knows how to sell inspiring, beautiful, haunting, hilarious. Pee-your-pants funny.

Books are experiences that engender human emotion. And human emotion binds us, transcending our ethnicity, our gender, our culture, our politics and our beliefs. Machine learning offers us an opportunity to express ourselves fully and have equal access to a readership—a firmly democratic experience. The Booxby platform is making this a reality.

 

A.I. Supercharges Intuition

 

The need for data-based analysis is critical in publishing, where so much emotion is involved.

Most of us in book publishing are highly intuitive people. When we are right, all is good. When wrong, things go bad quickly, with poor books coming to market, gross advances paid for books that don’t sell and, worst of all, wonderful, powerful books that never find an audience.

Would Damascena have fared better with the help of content-based analytics? Six years ago, no way. Today, yes. The content analysis that A.I. offers will be a godsend for writers and will allow editors to acquire well-honed works, and then make the best use of their own talent to foster the narrative and other aspects of the creative process that far surpass a computer’s capacity.

But developing a great manuscript is only the first of many hurdles in reaching the right reader. Many good books drown in the sea of millions. We now have the tools to link each book with readers hungering for just that style and experience. We have the means to connect the right book to the right reader at the right time. This is an unprecedented time in publishing as A.I. becomes the bridge connecting book to reader. Harnessing this cutting-edge technology, Booxby looks to amplify the unique voice of each writer and guide his or her work to an eager audience.