Technology Trends in Indie Publishing:
A professional perspective

OCTOBER 2022

Nat 'Nose' Connors

<nat@kindletrends.com>

This is a web-based slide presentation.  To go forward, click the slides and then press the space bar, use the arrow keys, or swipe if you're on a touchscreen.
To see all the slides at once, press the 'o' key.

What are we talking about?

Technology trends relevant to publishing

Impact on creative activity

Impact on real-world team activity

'Artificial Intelligence'

'Cloud computing'

GPT-3/DALL-E/Stable Diffusion

How are we going to do it?

  1. A bit about me and my background
  2. Primer on machine learning and computational linguistics
  3. Generative models - covers, speech, text
  4. Long-form fiction analysis: a hard problem
  5. Workflow consequences
  6. Plagiarism: another hard problem
  7. Commercial/legal issues
  8. Roundup - how will it affect you as publishing industry professionals?

Fun!

But first, me

(my favourite subject)

Tech writing

Startup culture

Enterprise software development

Cancer research

(gene network modelling)

Author services

(data analysis)

Kindletrends

A weekly and monthly genre research newsletter

Research done for you, in your inbox every week

Machine learning

Asking 'what number comes next in the sequence?'

Lots of confusing definitions and taxonomic argy-bargy

Artificial intelligence

Deep learning

Neural networks

Random forests

But here's my definition:

What number comes next in the sequence?

The numbers might represent different things

We might express the output in different ways

2

4

6

8

10

Input data: 2, 4, 6, 8

Output prediction: 'probably between 9 and 11'

'A dog'

'A dog'

'A dog'

'A dog'
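As a concrete (toy) sketch of 'what number comes next?': fit a straight line to the numbers so far and extrapolate one step. Real models are far fancier, but the input -> prediction shape is the same:

```python
# Toy 'what number comes next?' predictor: least-squares fit of a straight
# line to the sequence so far, extrapolated one step forward.
# (Illustrative only - real models are much more elaborate.)

def predict_next(seq):
    n = len(seq)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(seq) / n
    # slope and intercept of the best-fit line y = a*x + b
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, seq)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a * n + b  # value of the line at the next position

print(predict_next([2, 4, 6, 8]))  # -> 10.0
```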

The point

Ask 'where did the data come from?'

Ask 'what is done with the prediction?'

More important than the fine details of the method or galactic amounts of computing power

'Hickory dickory...

How did this happen?

'Artificial intelligence' was quite different from the 1960s to the 2010s (although taxonomy isn't very important)

Made possible by: 1) accessible computing power, and 2) oodles of 'free' metadata-tagged information

But profound questions of attribution, bias, provenance and consequences remain

"The Internet was...the boot loader for AI"

Computational linguistics

Understanding language using algorithms

(= sets of rules)

Computational linguistics

Text analysis

...but these tools are oriented toward business writing

Computational linguistics

Some claims about 'using computers to analyse bestsellers'...

...but I've yet to see actionable insights at the story level (why? I think I know...)

Computational linguistics

Translation

Computational linguistics

Interpretation (question answering)

Computational linguistics

Actual language research in the humanities

Making 'new' content

(generative models)

Cover art

Just like our dog example

Associate pictures with words, and group 'similar' items together.

Input a phrase or picture -> get back a 'similar' picture
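A toy sketch of the 'similar picture' idea: text and images are both mapped to vectors ('embeddings'), and 'similar' just means 'nearby vector'. The embeddings below are invented for illustration - in a real system they come from a trained model:

```python
# Toy 'find a similar picture' lookup: compare an invented text embedding
# against invented image embeddings using cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Pretend these embeddings came from a trained model
image_library = {
    "corgi.png":  [0.9, 0.1, 0.0],
    "sunset.png": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # invented embedding for the phrase 'dog'

best = max(image_library, key=lambda name: cosine(query, image_library[name]))
print(best)  # -> corgi.png
```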

'dog' using DALL-E 2

'A dog'

'A dog'

'A dog'

Examples

'beast romance fantasy photorealistic'

'man chest black and white'

Variations

Models

DALL-E/DALL-E 2 (OpenAI)

Stable Diffusion (StabilityAI)

Imagen (Google)

Midjourney (maybe)

Craiyon (open source)

Services

https://www.artbreeder.com/

https://www.wombo.art/

https://accomplice.ai

https://www.midjourney.com

https://replicate.com/

https://nightcafe.studio/

etc.

Making new models is hard work/expensive

Our questions

What are they trained on?

What is the output?

LAION/Common Crawl

MSCOCO

YFCC100M

ImageNet

Text -> related image

Image -> related image

The consequences

'Related' is defined wholly by the data used to make ('train') the model

So you get some unexpected effects

Greg Rutkowski

Who?

Prolific fantasy artist who has a large portfolio online

Carefully added metadata to his portfolio

Result: used >50x more frequently as a prompt than Michelangelo/Picasso/da Vinci

...he's not making a cent off it

Unexpected effects

Logos turning up in pictures (sort of)

https://twitter.com/amoebadesign/status/1534542037814591490

Unexpected effects

Attribution/copyright is murky

"Images for ImageNet were collected from various online sources. ImageNet doesn't own the copyright for any of the images."
"If you become aware that any social media website uses any Content in a manner that exceeds your license hereunder, you agree to remove all derivative works incorporating Content from such Social Media Site, and to promptly notify Shutterstock of each such social media website's use."

Unexpected effects

Bias in the input data -> bias in the output

'a CEO seated at a desk'

Stable Diffusion

(via Accomplice)

DALL-E 2

So what?

(for the publishing industry)

If you want anything outside the norm for a cover/other illustration, it may not be easy to produce

It will be much more likely to bear a close resemblance to the input data -> increased commercial risk

Publishing outside the Anglosphere requires quite different stylistic elements in cover design (cf. Cozy Mysteries)

Synthesised speech

Here, sequences of numbers represent sounds (words or parts of words)

But the same principles apply

Our questions

What are they trained on?

What is the output?

Common Voice (Mozilla)

Speech recognition services? (Google, Microsoft, Apple)

Probably large private datasets

Text -> audio waveform

Fine-tuning required

'Authenticity' of speech is heavily affected by context

...but the results can be impressive

Will this work for publishing?

...possibly, but 'synthetic voice producer' will be a new job

Text generation

Has a long history predating machine learning

'Musical dice games' were popular in the 18th century
'Plotto', a book from 1928, generates (quite involved) plots from a systematic process

I made a (simple) generator for 5K short romance stories!  It was...not the money-printing machine I had hoped for

Text generation

Computational approaches started with 'Markov chain' methods

- read in a lot of text and look at what word follows what

- kind of like a dice game too

I -> am -> a -> fireman (0.5) / lamp-post (0.5)
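The dice-game idea fits in a few lines of Python. This is a toy version 'trained' on a single sentence - repeated successors in the table are what encode the probabilities:

```python
# Minimal Markov-chain text generator: record which word follows which,
# then walk the table, picking successors at random in proportion to
# how often they were seen.
import random
from collections import defaultdict

def build_chain(text):
    words = text.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)  # duplicates encode the probabilities
    return chain

def generate(chain, start, length=8):
    out = [start]
    for _ in range(length - 1):
        choices = chain.get(out[-1])
        if not choices:
            break  # dead end: no recorded successor
        out.append(random.choice(choices))
    return " ".join(out)

corpus = "I am a fireman I am a lamp-post"
chain = build_chain(corpus)
print(generate(chain, "I"))  # e.g. 'I am a lamp-post'
```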

Markov chains

Actually work very well for short, structured things without much context

The sea darkens; 
the daffodil
bending at a post
This first snow
falls from the water jar  
   on the year ending rain

Wine reviews (a favourite!)

Fancier examples

There once was a young man

Who never smiled as he walked.

And his smile always looked to be

A frown, and his eyes were cold.

He didn't laugh or grin,

Just glared at the world around him.

I looked at the pistol on the mantelpiece, and

as I did so a thought occurred to me. It was a logical thought, but it took some time before I realised that I had never seen myself as being part of the logic.

'You're going to have to kill me,' I said.

Models

GPT-2/GPT-3 family (OpenAI)

Services

https://www.sudowrite.com/

https://inferkit.com/

https://novelai.net/

Making new models is hard work/expensive

BERT family (Google)

Our questions

What are they trained on?

What is the output?

'BooksCorpus'

Text -> text (sentence/paragraph) completions

Distributions of words and phrases
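To make 'distributions of words and phrases' concrete: the model assigns a probability to every candidate next word, and generation samples from that distribution. The words and numbers below are invented for illustration:

```python
# Toy next-word sampler: draw a word at random according to an (invented)
# probability distribution, the basic move in text generation.
import random

next_word_probs = {
    "dark": 0.4,
    "stormy": 0.3,
    "quiet": 0.2,
    "purple": 0.1,
}

def sample(probs):
    r = random.random()
    total = 0.0
    for word, p in probs.items():
        total += p
        if r < total:
            return word
    return word  # guard against floating-point rounding at the top end

print("It was a", sample(next_word_probs), "night.")
```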

But

Most of these methods 'fall off a cliff' at about 500 words or so

Why?

Modelling the structure of fiction is hard

Fiction has 'long-range' structures as well as short-range ones

Phrases

Sentences

Paragraphs

Plot arc(s)

Character arc(s)

Themes

Motifs

And those structures are complex

(which is a fancy way of saying 'carrying meaning that isn't explicitly encoded')

And detecting plagiarism is hard too

Computational notions of similarity don't match human notions of similarity

(possibly because of the previous point)

I made a plagiarism detector based on 'n-gram overlap'

It sort-of worked, but couldn't talk humans out of their feelings/beliefs
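For the curious, here's a minimal version of the n-gram overlap idea (word trigrams plus Jaccard overlap) - a sketch, not my actual detector:

```python
# Toy n-gram overlap similarity: slice both texts into overlapping word
# triples and measure the fraction of triples they share (Jaccard).

def ngrams(text, n=3):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(a, b, n=3):
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

same = overlap("the quick brown fox jumps", "the quick brown fox leaps")
diff = overlap("the quick brown fox jumps", "a slow green turtle crawls")
print(same, diff)  # -> 0.5 0.0
```

The catch, as above: a score of 0.5 may or may not match what a human reader feels is 'half copied'.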

Commercial/legal issues

Attribution is murky

CC/other permissive licenses aren't the free-for-all they are believed to be

Those making the tools don't profit from the images

Similarity is a subjective issue

Bias in datasets, both present and introduced

But there are some useful workflow consequences

Guided iteration on a concept

Change management for books - a whole separate (interesting!) issue

Customers iterating toward options/choices on their own?

Roundup - how will this affect you?

Creatives

Don't panic

Creative work will change its toolset, but it will remain creative

Example: 'AI prompt generator' is now a job!

Familiarise yourself with the major generative tools in your area

Work out how they were trained

Use your nous+experience to perceive their consequent biases and act accordingly

An upside: generative technologies may help with work-for-hire

Editors

There may be value in positioning your client's work relative to others

Consider 'visualisation' tools to explain to clients what you are doing - a fascinating area

Help me make an editing tool which provides actionable insights at the whole story level!

Marketers

Positioning relative to other titles may become more precise/fine-grained

Rapidly generating variations may allow for targeting to specific audiences (e.g. audio)

New channels are coming out all the time - stay alert to the possibility of generative content for them (= TikTok!)

Publishers

'Success' metrics should be treated with suspicion

Don't rely on your legal team to safeguard you against copyright/plagiarism exposure

Support your team to learn about new technologies - they will see the specifics where you may not

Take-home messages

  • Change is likely to be more evolutionary than revolutionary (despite what you might read)
  • I don't think whole jobs will 'cease to exist' (although they may well change quite a bit)
  • Significant issues remain with attribution/derivation in both a legal/commercial and a professional sense - and there is no great appetite to tackle them.
  • If you have children, tell them to pursue a career in copyright law

Acknowledgements

Harry for giving me the chance to speak

Susie for organising everything

Authors and artists who helped with this talk

Lana Love

Sansa Rayne

Sacha Black

Rachael Herron

Steffanie Holmes

The SPA Girls

Relay staff

Thank you for listening!

Useful links

Accomplice AI:  https://accomplice.ai/ - a good starting point for playing with image generation

Have I Been Trained: https://haveibeentrained.com - a resource for artists to check their work

How DALL-E 2 works: https://www.assemblyai.com/blog/how-dall-e-2-actually-works/ - a good nontechnical explanation

State of AI 2022 presentation:  https://www.stateof.ai/ - long and very general (not publishing-focused) but comprehensive

Prompter:  https://www.thedreamingstate.com/portfolio/art/prompter/ - a useful tool for getting started with making image prompts

Whisper (OpenAI): https://openai.com/blog/whisper/ - alarmingly good speech recognition
