Part I
Who are we?

Good news! The data cleaning is done! The nice thing about working with a survey this small is that it takes me hours instead of days to get it ready to analyze. We had a total of 220 authors volunteer their information (thank you!) for the good of this community. Out of that, 199 of the group were completely self-published, so for now, that is the population I am going to focus on. I may bring that extra 21 back into the fold later, but because we are focused on this market, they get to chill out on the side and watch. 

The next step in doing this sort of analysis is to get to know my data. This is usually where most survey "analyses" stop. They show us a bunch of pie charts or histograms and then talk about them like they are demonstrative of key trends. 

News flash: that's not necessarily the case. That's where stats comes in.

That said, these kinds of charts ARE important because they give me some ideas about what kinds of key differences to look for. Those trends we see from RWA or Publishers Weekly? They really need to be analyzed for their "statistical significance," which is a jargon-y way of asking whether we are at least 95% positive that a trend is a legitimate pattern instead of just the luck of the sample. I won't go into all the ways we determine that (it involves calculus, matrix algebra, and a lot of Greek symbols), but just believe me: we can do it.
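For the curious, here is a minimal sketch of one way to ask the "pattern or luck of the sample?" question: an exact permutation test on the difference between two group means. This is a swapped-in illustration, not necessarily the method used for this survey. The function name and the numbers are made up for the example, and the usual cutoff that matches the "95% positive" idea is a p-value below 0.05.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means.

    Hypothetical helper for illustration only.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    total = 0
    # Try every way of splitting the pooled values into two groups
    # of the original sizes, and count how often the gap between the
    # group means is at least as large as the one we actually observed.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # tolerance for float noise
            extreme += 1
    return extreme / total


# Made-up numbers, NOT survey data.
p = exact_permutation_p_value([1, 2, 3, 4], [5, 6, 7, 8])
print(f"p = {p:.4f}")  # p = 0.0286
```

Only 2 of the 70 possible splits produce a gap as big as the observed one, so the p-value lands under 0.05. Enumerating every split only works for tiny groups; for a sample of 199, a random-shuffle version or a standard test from a stats package would be the practical choice.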

But more importantly to me, these charts indicate which variables are likely to be most reliable, which might not be, and which might suggest ways the data needs to be compartmentalized in order to identify patterns specific to different subgroups.

I'll recap some of my initial thoughts below on some of the first key variables that pop out to me. If you are familiar with this kind of analysis and think differently, send me a note—I love chatting stats with people, and I'm always open to alternative thoughts on interpreting the data.

In a regression (a predictive model) or an ANOVA (an analysis of differences between groups), I would convert these years to the number of years in the publishing industry. But this is easier for us to read.
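A minimal sketch of that conversion, assuming the survey recorded the year each author first published. The function name is hypothetical, and the survey year of 2019 below is a placeholder, not a fact from the survey.

```python
def years_in_industry(first_year, survey_year):
    """Convert a first-publication year into years in the industry."""
    if first_year > survey_year:
        raise ValueError("first_year cannot be later than survey_year")
    return survey_year - first_year


# Placeholder values for illustration only, NOT survey data.
ASSUMED_SURVEY_YEAR = 2019
first_years = [2000, 2011, 2014, 2017]
tenure = [years_in_industry(y, ASSUMED_SURVEY_YEAR) for y in first_years]
print(tenure)  # [19, 8, 5, 2]
```

A numeric tenure like this can go straight into a regression, while the raw year is friendlier on a chart.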

Two things I notice. One you probably already knew: we are a new industry. Check out that boom between 2014 and 2017. Everyone said self-publishing exploded in the last five years, but you can really see it right there. 

The second thing I notice is that this is a pretty normal distribution (i.e., it has mostly that nice bell-curve shape), which makes it an initially solid candidate for analysis...with the exception of those outliers before 2011. I'm guessing these authors used to be traditionally published and moved over, but it's hard to say. If this variable gives us some significant results, I'll have to check those people in particular to see if they are doing anything weird.

I apologize for the weird legend at the bottom of some of these. It's late, and I've run out of patience with Excel. But hopefully you can see the essentials.

So, a couple of things here. That weird "skew" (that's when a histogram's data looks like it's pushed to one side) and the tail (when it drifts out a ways) are both present again, as in the previous chart. One of our authors reported publishing over 96 books, and another reported 115! That's amazing, but clearly, as we can see, out of the ordinary. We also have another set of authors who have published more than 45 books in their career. My hunch is those people are doing something different than other authors, so we'll want to keep an eye on them too. Are they the ones who have been publishing since 2000? Or is this a reflection of book length or rapid release? We shall see...

The beautiful bell curve of this graph makes me really happy—except, of course, for those outliers above 12 books a year. I'll talk about this later, but a pattern is starting to emerge here, no? We look at multiple frequencies with this kind of data to see if patterns in skew emerge too. Again, I'm wondering who these outliers are, and if they are doing something extreme enough to affect the larger trends we might find. We'll be watching.
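The outliers above were spotted by eye on the charts. One common rule of thumb for flagging them in code is Tukey's fences (anything more than 1.5 times the interquartile range beyond the quartiles). The sketch below is a swapped-in technique, not necessarily what was used here, and the books-per-year numbers are made up.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return values outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up books-per-year figures, NOT survey data.
books_per_year = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 30]
flagged = tukey_outliers(books_per_year)
print(flagged)  # [30]
```

Flagging is only the first step; the flagged authors still need a closer look to see whether they are doing something different or whether the numbers are simply entry errors.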

Moving on from histograms. I like to use pie charts to look at what we call "categorical" data—that's data that measures categories of people that have no numeric logic linking them. For instance, here we have series types: standalones, series standalones, cliffhanger series, and other.

As a cliffhanger writer myself, I depend a lot on my series read-through to maximize my return on investment. I'm curious whether series type affects income levels. Unfortunately, we may not have enough cliffhanger writers in the survey to determine that. 19 people isn't a very big sample, nor does it meet the general 10% rule of thumb you need to examine subgroups against others (19 of 199 is about 9.5%). But we're close, so I'll probably give it a go, if only for my own curiosity.

Another question I have as a LONG book writer (do you see me in the 100k-150k group?) is whether or not my investment in lots of words really does affect income. As someone who has several books in KU, I sort of depend on it, but I do wonder sometimes if readers see "430 pages" in the product description and decide they don't have time for this Dickensian crap. So that will be something to look for, but again, that 4.5% (all 9 of us!) will probably be too small a sample to say for sure. What might be more interesting is the potential contrast between shorter novels (40k-70k) and longer ones (70k-100k). As the majority of the sample, they are going to dominate any patterns we find.
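The two subgroup checks above are quick arithmetic. Here is a minimal sketch using the counts quoted in this email (19 cliffhanger writers and 9 long-book writers out of 199). The function names are hypothetical, and the 10% threshold is the rule of thumb mentioned above, not a hard statistical law.

```python
SAMPLE_SIZE = 199  # self-published authors in the survey


def subgroup_share(subgroup_size, sample_size=SAMPLE_SIZE):
    """Fraction of the sample that falls in a subgroup."""
    return subgroup_size / sample_size


def meets_rule_of_thumb(subgroup_size, sample_size=SAMPLE_SIZE, threshold=0.10):
    """True if the subgroup is at least `threshold` of the sample."""
    return subgroup_share(subgroup_size, sample_size) >= threshold


for label, size in [("cliffhanger series", 19), ("100k-150k words", 9)]:
    share = subgroup_share(size)
    print(f"{label}: {share:.1%}, meets 10% rule: {meets_rule_of_thumb(size)}")
```

Both groups fall short, with the cliffhanger group missing the line by a single respondent (20 of 199 would clear it).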

Speaking of KU and pages...I have my doubts about this metric, but we'll try connecting it to revenue or income. When 14% of the entire sample decides just not to tell me what's going on, I have to wonder if this is like income, one of those metrics people get cagey about (hey, no judgment). What I'll probably do here is code a separate category for the no-responses to see if they are doing something funky with other metrics. There's a lesson for you: just because you don't answer a question doesn't mean your lack of response won't mean something. Especially if the other people doing it are a lot like you.
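Coding the no-responses as their own category can be as simple as the sketch below. It assumes blank answers arrive as `None` or empty strings; the label, the function name, and the sample answers are all hypothetical.

```python
from collections import Counter

NO_RESPONSE = "No response"  # hypothetical category label


def code_no_response(answers):
    """Replace missing or blank answers with an explicit category."""
    coded = []
    for answer in answers:
        if answer is None or str(answer).strip() == "":
            coded.append(NO_RESPONSE)
        else:
            coded.append(answer)
    return coded


# Made-up answers, NOT survey data.
raw = ["Yes", None, "No", "", "Yes"]
coded = code_no_response(raw)
counts = Counter(coded)
print(counts["Yes"], counts["No"], counts[NO_RESPONSE])  # 2 1 2
```

Once the blanks have a label, that group can be compared against everyone else on other metrics like any other category.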

Last chart of the day (yay!). Obviously I asked a lot more questions than just these seven, but they give us a good portrait of whom we are looking at. This is the big one—one of two questions I asked about income. 

Like the KU question, a solid percentage of people did not answer the question about direct revenue, and spoiler: they didn't reveal any specific patterns in early analyses either. But for the bracket question, only four people neglected to answer. 

I'll be real: it's a LOT easier to build predictive models around a numeric measure instead of a categorical one. So I'm not giving up on the revenue question just yet. But this one will give me a good basis to see how we split ourselves up as a group. Already I see one thing: low-income authors are my control, which is another way of saying they'll be the group I frequently compare everyone else to. 86 people is nearly half the sample, but it also sounds about right for this field. For the sake of analysis, I'll also probably have to group everyone earning $500k and above together, even though that will still only get them to 2%. Here's the real question, and I genuinely don't know the answer: are there more than 20 $500k+ earners out there? Would they be willing to share their basic metrics? Or is it top secret? This is why that revenue question will come in handy. We can see how each of these groups is clustering and whether they need to be reorganized to help with our analysis.
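Grouping the top earners together is a simple recode. In the sketch below, the bracket labels are placeholders (the survey's actual bracket wording isn't shown in this email), and the function name is hypothetical.

```python
TOP_BRACKET = "$500k+"


def collapse_top_brackets(brackets, top_labels):
    """Merge every bracket listed in `top_labels` into one $500k+ group."""
    return [TOP_BRACKET if b in top_labels else b for b in brackets]


# Placeholder labels and answers, NOT the survey's real brackets or data.
hypothetical_top = {"$500k-$999k", "$1M+"}
answers = ["Under $10k", "$500k-$999k", "$1M+", "Under $10k"]
collapsed = collapse_top_brackets(answers, hypothetical_top)
print(collapsed)  # ['Under $10k', '$500k+', '$500k+', 'Under $10k']
```

Merging sparse categories like this trades detail for group sizes large enough to compare, which is the same tradeoff behind the 10% rule of thumb.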

All right. That's it for today. I do have a book to finish, and since I write those cliffies, there might be literal pitchforks at my door if I don't get it done. Next I'll send range and means for some of the numeric data and talk about a very important concept: standard deviations and what they help us do. 

If you have any questions, thoughts, or comments, send me a note. Good research is collaborative, after all. 

- Nic

Nicole French Romance
