All right. That right there is a chart with the "descriptives" of some of the main numeric variables we recorded. I'm going to talk about what I notice about each one.
Up top, we have a few measures of platform size: newsletter size (NLSubscribers), Facebook page size, Facebook group size, and Facebook friend count. Yes, I could have asked about others (Twitter, Instagram, etc.). If there are signs that a particular social media account really affects sales, that might be worth tracking more broadly. But current wisdom says that most interaction and advertising happens on Facebook right now (with a good chunk going to AMS and BookBub). And it's where many of us build our primary platforms. So I started there.
The N-value is the number of people who answered that question. Notice how it isn't the same for every variable? That's because some people didn't answer every question. That's typical, but it's also why I wanted a sample large enough to absorb the missing responses. Our listwise valid N across all of these variables is 120, after every participant with missing data in ANY of those measures is "deleted." This means that if I want to keep my prediction model solid, I'll have to keep the number of predictor variables down to about 9 or 10 max, because a prediction model needs enough complete cases per predictor to give stable estimates. Hopefully it doesn't come to that. But this is why it's good to answer ALL questions in a survey. You might not think there is a connection, but the researcher probably does. If you don't answer, your data is not included in that assessment.
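To make listwise deletion and the predictor limit concrete, here is a minimal Python sketch. The responses are invented (None marks a skipped question), and every column name except NLSubscribers is a hypothetical stand-in for the chart's Facebook measures. The cases-per-predictor ratios are an assumption too: I haven't said which rule of thumb I'm using, so the sketch tries a few commonly quoted values between 10 and 15.

```python
# Hypothetical survey responses; None marks a question the person skipped.
# Only NLSubscribers comes from the chart; the other names and all values are made up.
responses = [
    {"NLSubscribers": 1200, "FBPage": 800,  "FBGroup": None, "FBFriends": 950},
    {"NLSubscribers": 300,  "FBPage": None, "FBGroup": 150,  "FBFriends": 400},
    {"NLSubscribers": 5000, "FBPage": 2500, "FBGroup": 900,  "FBFriends": 1800},
    {"NLSubscribers": None, "FBPage": 120,  "FBGroup": 60,   "FBFriends": 700},
    {"NLSubscribers": 750,  "FBPage": 400,  "FBGroup": 200,  "FBFriends": 1100},
]
variables = ["NLSubscribers", "FBPage", "FBGroup", "FBFriends"]

# Per-variable N: how many people answered each individual question.
per_variable_n = {v: sum(r[v] is not None for r in responses) for v in variables}

# Listwise deletion: keep only the people who answered EVERY variable.
complete_cases = [r for r in responses if all(r[v] is not None for v in variables)]
listwise_n = len(complete_cases)

print(per_variable_n)  # {'NLSubscribers': 4, 'FBPage': 4, 'FBGroup': 4, 'FBFriends': 5}
print(listwise_n)      # 2


def max_predictors(n_cases: int, cases_per_predictor: int) -> int:
    """Largest whole number of predictors a sample supports under a given ratio."""
    return n_cases // cases_per_predictor


# Assumed rules of thumb (not taken from the post), applied to a listwise N of 120.
for ratio in (10, 12, 15):
    print(ratio, max_predictors(120, ratio))  # 10 12, then 12 10, then 15 8
```

With 120 complete cases, ratios of 10, 12, and 15 cases per predictor allow 12, 10, and 8 predictors respectively, a range that brackets the 9-or-10 limit above. Note how quickly listwise deletion shrinks the usable sample: every question here has at least four answers, yet only two people answered all of them.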
Minimum and maximum are pretty self-explanatory (if they aren't, let me know). Mean is another word for average: the sum of all the reported values in the sample divided by the number of values. It's telling, for instance, that the mean for Revenue is $80k. Even without looking at a histogram, that tells me there are a lot more low earners than high earners. If revenue were spread evenly across the range, the mean would sit much closer to $1.25 million, halfway between the minimum and maximum. It also tells me that the high earners are outliers. As much as we all want to be like them, they might skew our results. But we'll keep them in because we want to know their patterns.
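Here is a small Python sketch of that mean-versus-midpoint reasoning, using the standard library's statistics module. The revenue figures are invented for illustration and are not the survey's data; the point is only how a pile of modest earners plus one very large value behaves.

```python
from statistics import mean, median

# Hypothetical annual revenues in dollars: mostly modest earners plus one big outlier.
# These numbers are made up for illustration only.
revenues = [2_000, 8_000, 15_000, 25_000, 40_000, 60_000, 90_000, 2_500_000]

average = mean(revenues)                        # sum of values / number of values
midrange = (min(revenues) + max(revenues)) / 2  # halfway between minimum and maximum
middle = median(revenues)                       # value in the middle of the sorted list

print(f"mean:     {average:,.0f}")   # mean:     342,500
print(f"midrange: {midrange:,.0f}")  # midrange: 1,251,000
print(f"median:   {middle:,.0f}")    # median:   32,500
```

The mean lands far below the midpoint between minimum and maximum because most of the values are small. Yet it is still roughly ten times the median, which shows how a single outlier can pull an average upward.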
The last thing to really understand here is the standard deviation column. This is a hard one to explain without a chalkboard, but I'll do my best with the graphic I stole from Wikipedia.