Good news! The data cleaning is done! The nice thing about working with a survey this small is that it takes me hours instead of days to get it ready to analyze. We had a total of 220 authors volunteer their information (thank you!) for the good of this community. Out of those, 199 were completely self-published, so for now, that is the population I am going to focus on. I may bring those extra 21 back into the fold later, but because we are focused on this market, they get to chill out on the side and watch.
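If you're curious what that subsetting step looks like in practice, here's a minimal sketch in Python. The records and the "path" field are made up for illustration; the real survey's fields almost certainly have different names.

```python
# Toy respondent records standing in for the real survey data.
# The "path" field is a hypothetical stand-in for however the survey
# recorded each author's publishing path.
respondents = [
    {"id": 1, "path": "self"},
    {"id": 2, "path": "hybrid"},
    {"id": 3, "path": "self"},
]

# Keep only the fully self-published authors for the main analysis;
# set everyone else aside in case we bring them back later.
self_pub = [r for r in respondents if r["path"] == "self"]
others = [r for r in respondents if r["path"] != "self"]

print(len(self_pub), len(others))  # → 2 1
```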
The next step in doing this sort of analysis is to get to know my data. This is usually where most survey "analyses" stop. They show us a bunch of pie charts or histograms and then talk about them like they are demonstrative of key trends.
News flash: that's not necessarily the case. Which is where stats comes in.
That said, these kinds of charts ARE important because they give me some ideas about what kinds of key differences to look for. Those trends we see from RWA or Publishers Weekly? They really need to be analyzed for their "statistical significance," which is a jargon-y way of asking whether we are at least 95% confident that a trend is a legitimate pattern instead of just the luck of the sample. I won't go into all the ways we determine that (it involves calculus, matrix algebra, and a lot of Greek symbols), but just believe me: we can do it.
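For the curious, here is a hedged sketch of one of the simplest significance checks: a two-proportion z-test, written from scratch in Python. The counts are invented for illustration, not from the survey, and this is just one of many tests a real analysis might use.

```python
import math

def two_prop_z_test(success_a, n_a, success_b, n_b):
    """Return the two-sided p-value for a difference in two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pool the two samples to estimate the shared proportion under
    # the "no real difference" assumption.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Made-up example: 45 of 100 authors in one subgroup show some trait,
# versus 70 of 100 in another subgroup.
p = two_prop_z_test(45, 100, 70, 100)
print(p < 0.05)  # → True: significant at the 95% level
```

If the p-value comes out below 0.05, that is exactly the "at least 95% confident it's not just the luck of the sample" threshold described above.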
But more importantly to me, these charts indicate which variables are likely to be most reliable, which might not be, and which might suggest ways the data needs to be compartmentalized in order to identify patterns specific to different subgroups.
I'll recap some of my initial thoughts below on some of the first key variables that pop out to me. If you are familiar with this kind of analysis and think differently, send me a note—I love chatting stats with people, and I'm always open to alternative thoughts on interpreting the data.