Round-up of my tweets at #nicar16

Localizing the NYT Data Viz on Police Racial Disparity

One of the most important attributes of data driven journalism is that it scales, and the primary goal of my OpenRural, Open N.C. and data dashboard projects has been to democratize data so that we start seeing the same types of reporting and presentation in small community papers that we see in the big national news sites. So when I saw Thursday’s New York Times graphic on the race gap in America’s police departments, I immediately thought that something similar could be done pretty quickly that would look at North Carolina towns.

Rebecca Tippett at UNC’s Carolina Demography service was able to pull and clean the data within about three hours. She posted the data in CSV format to her blog, along with a nice explanation.

Being a words guy rather than a picture guy, I used data visualization software Tableau to put together a prototype of something similar to what The Times had done. It is absolutely nowhere near as good as what they did, but I copied their concept, color scheme and fonts. And about two hours later I had something that told the same story.

Click on the image to see the interactive version of an embeddable graphic that can easily — and at no cost — be dropped in to any news site or blog (except this one … because I’m still hosting it on javascript-averse

Click to view the interactive version on Tableau Public.

The graphic alone doesn’t tell the whole story. Tippett pointed out when I showed her the chart that most of the Latinos in Siler City aren’t even eligible to join the city’s police force — 40% are not adults, and 80% of adult Hispanics there are not citizens.
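Tippett’s two percentages can be combined into a rough eligible share. What follows is a minimal sketch, not her calculation: it assumes that being an adult and being a citizen are the only two filters, and that the 80% figure applies to all adult Latino residents. The function name is hypothetical.

```python
def eligible_share(pct_not_adult: float, pct_adults_not_citizen: float) -> float:
    """Share of a population that is both adult and a citizen.

    Assumes adulthood and citizenship are the only eligibility filters,
    which is a simplification made for this illustration.
    """
    adult_share = 1.0 - pct_not_adult
    citizen_share_of_adults = 1.0 - pct_adults_not_citizen
    return adult_share * citizen_share_of_adults


# Figures quoted above: 40% are not adults, 80% of adults are not citizens.
share = eligible_share(0.40, 0.80)
print(f"Adult citizens as a share of Latino residents: {share:.0%}")
```

Under those assumptions, only about 12 percent of the town’s Latino residents would clear both hurdles, which is the comparison group a reporter may want to ask about.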

And many of these police forces are very small, which makes it easy for them to end up with huge percentage disparities in the racial breakdowns of their police and residents. Tiny Biscoe, for example, only has nine police officers. Wagram has two police officers, half of whom are white and half of whom are “other.”
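To see how much a single hire moves the percentages on a small force, here is a quick sketch using the two force sizes mentioned above. The function name is hypothetical, and the only inputs are the officer counts.

```python
def points_per_officer(force_size: int) -> float:
    """Percentage points of the force that one officer represents."""
    if force_size <= 0:
        raise ValueError("force_size must be a positive number of officers")
    return 100.0 / force_size


# Force sizes taken from the paragraph above.
for town, officers in [("Biscoe", 9), ("Wagram", 2)]:
    swing = points_per_officer(officers)
    print(f"{town}: one officer is {swing:.1f} percentage points of the force")
```

One hire or one retirement shifts Biscoe’s breakdown by about 11 points and Wagram’s by 50, so the percentages for towns like these are best read alongside the raw counts.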

The other potential problem with the data is that it’s seven years old. But so is the data used by The Times.

This is just an example of how we might continue to democratize data. This graphic could be emailed to an editor of each news outlet in North Carolina, along with a list of suggested questions that local reporters could ask to quickly make the data more relevant.

Suggested Questions to Localize This Data Driven Story

  • “This data is seven years old. Does it still look accurate to you? Can you provide me with some more recent data of the racial and ethnic breakdown of the police department?”
  • “Why do you think your department has a higher percentage of white officers than the residents?”
  • “How does the racial disparity between the police department and local residents affect the way your department works?”
  • “Walk me through the hiring process for new officers. How does a candidate’s race factor into hiring decisions, if at all?”
  • “How do you publicize vacancies in the department? Do you do anything to recruit minority applicants?”
  • “What percentage of your officers live in the city? How important is it that officers come from within the city? Why?”
  • Also, seek opinions of others — both insiders such as city council members and community leaders as well as people on the street. Consider using social media such as Facebook or Twitter to ask people what they think about the data and these questions. This is the start of a conversation, not the end. Be sure to get a diversity of perspectives — age, gender, geography and certainly race and ethnicity.

The Challenge: News Deserts

But even if we acquire, clean and produce data along with some simple story guides, data driven journalism may still not find its way into smaller newspapers if nobody is there to receive our help. At many papers, this would still be seen as enterprise reporting. As an editor with a staff you can count on one hand, do you send a reporter out prospecting for answers to these somewhat uncomfortable questions? Or do you have them write up the day’s arrests? Or preview this weekend’s chamber of commerce golf tournament?

North Carolina also has broad news deserts: whole counties that have no reporters shining light in dark places, holding powerful people accountable and explaining an increasingly complex and interconnected world. Siler City, for example, is in a county of 65,000 people with a single newspaper that reaches only 12 percent of them. The News & Observer provides scant coverage of the county.

What other story templates would you like to see? What would make them easier to use?

Why ‘Robot Reporters’ Are a Good Thing

First of all, let’s not allow the alluring alliteration to distract from what we’re really talking about: not robot reporters, but robot writers.

Mashable’s Lance Ulanoff asked me what I thought about the news that Durham’s Automated Insights would be writing automated business stories for the Associated Press.

This trend excites me about the future of journalism. I’ve been talking with folks about it for about five years, since I first saw similar work that was being incubated by Northwestern’s journalism school. That effort grew into the company Narrative Science, which has been writing earnings preview stories. The Los Angeles Times uses an algorithm to write earthquake stories. The Washington Post has looked into using Narrative Science for high school sports stories.

The Guardian learned how hard it is to build a robot writer, but the automated stories I’ve seen written by both Automated Insights and Narrative Science are pretty good. And 46 media and communications undergrads couldn’t distinguish a computer written story from one written by a human.

The trend in automation should free up the best writers and best reporters to add the how and why context that still needs to be done by humans. If I were a beat reporter at a newspaper I’d be working as fast as I could to convince my editor to let a computer write the scut stories I have to write and free me up to do more explanatory and accountability reporting, or to craft beautifully written narratives.

One significant risk is that for the last decade we’ve seen “good enough” journalism growing in popularity. News organizations that continue to have a strategy of harvesting profits rather than investing in growth will no doubt cut reporters if machines can write commodity news at a lower cost.

If I were a young journalist looking for my first job, I’d be looking for news organizations that are sustaining a small margin and growing both expenses and revenues — the ones that are using both bots and humans.

The trend toward automation will result in an emphasis on the news value of impact. Mass customization is going to change the nouns in the leads of stories from the third person to the second — “investors” will become “you.”

The trick is how to make money off this. News organizations that continue to see themselves as manufacturers of goods will probably increase the volume of digital commodity content they publish and continue to drive down ad rates.

But smart content companies are evolving from a manufacturing industry to a service industry, and trying to create, explain and capture the value they provide to each client by getting the right information to the right people at the right time.

What we see now as data is as unsophisticated as what many of us thought of as data when Google first made it its mission to organize all of it. We think of data now as numbers in tables (scores, money, temperatures), but we’ll soon see data as behavior and content metadata. And we will see automated stories that incorporate the user’s data and the data of her social network as well.

That level of concierge news service, though, is going to come at a price for users. If we’ve seen the democratization of media, this automation trend has the potential to create a world of media haves and have-nots: the haves will pay premium subscription fees to get highly personalized news from bots. The have-nots will get generic news (maybe written by bots as well).

The one thing from which I think everyone will benefit is an increase in the quality and frequency of narrative writing, and of explanatory and accountability reporting.

To aid that transition I’m working on the idea that we can use digital public records to build a newsroom dashboard system that will alert beat reporters to possible story ideas. Automated Insights and Narrative Science are scaling commodity news stories. I want to see if we can lower the human reporters’ opportunity cost of pursuing enterprise stories that land with much bigger and much longer lasting impact.

If you want a pithy quote from a journalism prof. on the effect that robot writers are going to have on the job market for journalism students, here it is: “My C students are probably screwed. My A students are going to do better than ever.”

Data Journalism Class Exercise (Or, Teaching Critical Thinking)

Here’s a great exercise for journalism professors who are introducing their students to data-driven journalism. It provides a good opportunity to show them that they have to get over the common perception that data is unbiased — clean and clear. It gives instructors an opportunity to talk about the need to “interview” the data.

The assignment is deceptively simple: Have the students download the Census Bureau’s list of rural and urban counties and calculate the population density for the counties in your state.

That’s it. Tell them no more. Depending on where they get stuck, slowly reveal to them the clues they need to complete the project. What you may not be surprised to find is that too many college undergrads seem to be accustomed to following step-by-step instructions and too few know how to break down a problem into smaller, sequential pieces. These are the kinds of critical thinking skills that they need to be good journalists. Or, as I like to say, to think journalistically regardless of their eventual profession.

Helping Them Get Unstuck

Force your students to get a quick start. Don’t let them sit and stare at their computer screens for even a second. Agitate them in whatever way you need to make them feel like an asteroid is about to smash the earth to smithereens. They can’t solve the whole problem all at once, so what are the pieces of the problem hidden inside this big problem?

  • Where can you find the Census list of rural and urban counties?

The answer — of course — is Google. So, there’s an opportunity to teach efficient search strategies.

Students will click around the Census site a bit trying to find what they want. Ask how many skimmed and how many read every word on each page. It’s a good opportunity to talk about the way people use information online.

You can help students find the data they need. And from there you can show them basic file-management and Excel techniques. Where does the file download on their computer? What’s the difference between a .csv and a .xlsx file?

With the data open in Excel, they’ll need to sort to filter out just their state. But now what? Ask the students what they think each of the columns represents. What does it mean that something has a POP_UA of 10791 and a STATE of 37?
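For instructors who want a scripted alternative to sorting in Excel, the state filter can be sketched in a few lines of Python. This is only an illustration: the sample rows and county names are made up, the COUNTYNAME column is my assumption about the file layout, and the only thing taken from the real data is that STATE holds a FIPS code, with 37 being North Carolina.

```python
import csv
import io

# Made-up sample rows for illustration only; real values come from the Census file.
SAMPLE_CSV = """STATE,COUNTYNAME,POP_UA
37,Example County A,10791
37,Example County B,0
45,Example County C,52000
"""

NC_FIPS = "37"  # North Carolina's state FIPS code


def rows_for_state(csv_text: str, state_fips: str) -> list[dict]:
    """Return only the rows whose STATE column matches the given FIPS code."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # CSV values arrive as text, so compare strings rather than numbers.
    return [row for row in reader if row["STATE"].strip() == state_fips]


nc_rows = rows_for_state(SAMPLE_CSV, NC_FIPS)
print(len(nc_rows), "rows for state", NC_FIPS)
```

Comparing the codes as text is a deliberate choice: state codes below 10 carry a leading zero that disappears if a spreadsheet or script treats them as numbers.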

Once they figure that out, they may note that the data includes some pre-calculated population density. But it’s not the information you asked them to find, so they’ll have to calculate population density, a commonly needed, very simple journalism math equation.

This gives you a chance to explain that numbers are only meaningful in relation to other numbers. And how to do basic calculations in Excel.

The students will do the math correctly, but they won’t get answers that make any sense. A chance for you to talk with them about how data still has to pass the sniff test. Why doesn’t the data make sense? They can find the answer back on the Census website.

Once they’ve made the correct calculations (how many meters are in a mile anyway?), you can talk with them about how you still need to find the story in the data. Even though their calculations have added value to the data — essentially refining raw ore — mere presentation is of marginal value.
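The unit trap in this exercise can be sketched in a few lines of Python. This assumes, as the meters-and-miles hint suggests, that the file reports land area in square meters. The function name is hypothetical and the county figures are invented round numbers, not real Census values.

```python
METERS_PER_MILE = 1609.344  # exact, by the international definition of the mile
SQ_METERS_PER_SQ_MILE = METERS_PER_MILE ** 2


def density_per_sq_mile(population: int, land_area_sq_m: float) -> float:
    """People per square mile, given a land area measured in square meters."""
    if land_area_sq_m <= 0:
        raise ValueError("land area must be positive")
    land_area_sq_mi = land_area_sq_m / SQ_METERS_PER_SQ_MILE
    return population / land_area_sq_mi


# Invented example county, for illustration only.
population = 50_000
land_area_sq_m = 1_000_000_000

naive = population / land_area_sq_m  # people per square METER: fails the sniff test
converted = density_per_sq_mile(population, land_area_sq_m)

print(f"Naive division: {naive:.5f} people per square meter")
print(f"After converting units: {converted:.1f} people per square mile")
```

The naive result is arithmetically correct but meaningless to a reader, which is the point of the sniff test: the math can be right while the units are wrong for the story.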

You can top off the conversation by coming back to language, and that journalistic aspiration for precision and objectivity. What does “rural” mean anyway? What does the dictionary say? Is it an abstract concept or something you can measure? How (many different ways) does the Census measure it? How is it different than the USDA’s definition? Which is better? Why?

This is a project that could take several weeks as a module in a college class, or as a MOOC or quick conference or newsroom workshop. Its strength is its scope and flexibility. Just like a good journalist.