I posted an earlier draft of this proposal already, but here is the full version, which the Knight Foundation has selected as one of the finalists in its 2011 Knight News Challenge. Let me know your thoughts. Thanks to everyone who has helped spawn and cultivate this project. Every conversation has me more excited about what we’ll be able to do for rural newspapers in North Carolina and across the country.
Describe your project
We will build a not-for-profit clearinghouse of data from state, county and municipal governments in North Carolina and deploy them through pilot OpenBlock installations at the websites of nine rural newspapers in the state. The datasets will include public records of particular interest, such as crime reports, real estate and restaurant inspections.
We have already conducted research, funded by the McCormick Foundation, that indicates deployment of OpenBlock on the websites of small and mid-sized papers could provide significant digital revenue potential, given the interest readers have in understanding their communities. The same research indicates that the main barriers to implementing OpenBlock are a lack of technical expertise at small papers and the high cost of ongoing data collection.
At the end of our 27-month funding period, we will have reduced the costs of acquiring, aggregating and publishing public data at community newspapers. We will also have developed one or more revenue models that demonstrate how meeting the information needs of a community can also be good business, even in small towns and rural counties of fewer than 75,000 people.
Rural news organizations are struggling to move to the digital age in part because their staffs are so small that they don’t have the capacity to identify, digitize, re-aggregate and map all the various public records available at the state and local level into databases that can be accessed intelligently by both reporters and the public.
This project tackles the lack of capacity at rural papers from two directions. It will create a centralized clearinghouse of state, county and city schemas and datafeeds that could be easily used in OpenBlock. It will also create compelling editorial content that will draw new, young readers to community information presented in a format and medium they want. Audiences for this kind of editorial product are loyal. They generate repeat visits, especially by returning to seek updates on crime. They also generate page-views and increased time-on-site as they search and sort the information.
Through this project, we expect to lower the costs of data acquisition and organization through a variety of methods that we will be able to assess and compare. In some cases, volunteers will pick up CDs of data from county offices. In others, journalists may scan and upload PDFs of hand-written police incident reports. In still other cases, people will key the information on those PDFs into a database. This job is so big that no single small news organization could do it. But with the support of a not-for-profit organization that provides centralized technical, editorial and advertising expertise, we could create a model for gathering valuable public records from rural America. To individual communities, these records are necessary to foster an informed civic dialog and healthy economy. But in aggregate, these records may also be able to shed light on trends in rural America that would otherwise go unreported.
This project will demonstrate one way that universities can support and advance journalistic activity – by providing a launchpad for new ventures that draws upon broad faculty expertise and student workers to lower the costs of professional, independent public affairs journalism and by absorbing some of the risk associated with new editorial product development.
Knight funding will get us off the ground and put us in a position to be a self-sustaining not-for-profit company, serving North Carolina journalists and citizens and providing a model for other states and regions to adopt.
Improving Delivery of News and Information to Geographic Communities
In small towns and rural America, the local newspaper is more than just a source of information and an engine of commerce. More importantly, it fosters and builds geographic community and sets the agenda for public policy debate. By making public records readily available and well-organized, we will support decision-making and accountability in local and state government.
This project most clearly improves the delivery of news and information to geographic communities by helping rural community newspapers make the transition to the digital age and remain relevant for younger audiences that are less informed and engaged in their own communities.
We expect several community newspapers to incorporate crowd-sourcing – a technique once known in their newsrooms as “neighbors editors” – into the process of data acquisition. Where this happens, we expect an increase in civic and community engagement: first, by forming a network of knowledgeable volunteer citizen-journalists, and second, by creating greater demand for truly open government records.
Unmet Needs
In many cases, data that is readily available in GeoRSS or at least CSV format from big cities is simply not available even in print from rural governments. For example, journalism students at the University of North Carolina working last semester to gather and organize public records in two rural counties for an OpenBlock application met with a number of obstacles – ranging from significant photocopying fees to inappropriate redactions and denial of access to public information.
Even when acquisition of public datasets is relatively simple (public health restaurant inspections, for example), someone must request that data from a specific county be exported in fielded data format. It is inefficient for each rural news organization to make separate requests for this data in each of North Carolina’s 100 counties. In these cases, our centralized organization would outline an initial request for the data for each county.
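To illustrate why a fielded export matters, here is a minimal sketch of turning a county CSV of restaurant inspections into GeoRSS-Simple entries inside an Atom feed. It is not OpenBlock code and not our production pipeline. The column names, the sample row, and the function name `csv_to_georss` are all assumptions made for illustration, and the output omits elements (such as `id`) that a fully valid Atom feed would need.

```python
import csv
import io
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
GEORSS_NS = "http://www.georss.org/georss"

# Hypothetical export: a real county file would have its own column names.
SAMPLE_CSV = """name,inspected,score,lat,lon
Example Diner,2011-03-01,96.5,34.3388,-78.7031
"""


def csv_to_georss(csv_text, feed_title):
    """Build a bare-bones Atom feed with one georss:point per CSV row."""
    ET.register_namespace("atom", ATOM_NS)
    ET.register_namespace("georss", GEORSS_NS)

    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = feed_title

    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        title = f"{row['name']}: score {row['score']}"
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
        # Assumes the inspection date is already in YYYY-MM-DD form.
        updated = f"{row['inspected']}T00:00:00Z"
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
        # GeoRSS-Simple encodes a point as "latitude longitude".
        point = f"{row['lat']} {row['lon']}"
        ET.SubElement(entry, f"{{{GEORSS_NS}}}point").text = point

    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    print(csv_to_georss(SAMPLE_CSV, "Restaurant inspections (sample)"))
```

Once the data arrives with named fields and coordinates, the conversion is a few lines of code. The expensive part is everything that comes before it: requesting the export, or keying and geocoding records that exist only on paper.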
When Rick Thames, the editor of The Charlotte Observer, reviewed our proposal, he offered his enthusiastic endorsement. “There is no question but what this would fill a need,” he said. “Small papers can’t do this sort of work on their own. So, sadly, it just isn’t getting done. What a gift this would be for those communities. A very worthy effort that would be warmly received by the editors and publishers of every small and mid-sized paper that I know.”
What’s New?
Currently there is no tool or service that can efficiently gather, format and publish public records on rural news organizations’ sites. In part, this is a technology problem that may soon be overcome with the alpha rollout of OpenBlock later in 2011. But a much bigger piece of the problem is the data itself – neither OpenBlock nor any other technology has the ability to obtain public records as fielded digital data and create a newsworthy user interface for all the various types of records that a news organization might need.
Without a project like this, there is no indication that OpenBlock will be a viable option for papers in rural communities.
What Will Change?
By the end of the project, we will have …
1. About 95 up-to-date feeds of local government data in standardized, fielded formats such as GeoRSS. These feeds will be available under a Creative Commons Attribution, Share Alike license. By providing public information in this format, we will lower the barriers to North Carolinians interested in researching trends or patterns in public policy and we’ll provide the raw material for the development of mashups or entrepreneurial applications we haven’t even thought of yet.
2. Nine community newspapers using OpenBlock to publish fresh, local government data to their audiences. These newspapers will be on the frontlines of a statewide effort to get complete and current government datasets in open, machine-readable formats. They will demonstrate multiple approaches to implementation that will be relevant to others during the broader roll-out.
3. New revenue opportunities, structured around the presentation and analysis of this data, that will support the newspapers’ journalism.
4. A new tool that journalists and citizens interested in public policy issues can use to analyze trends and patterns in rural issues such as environmental stewardship, public health, crime and justice, education, and economic development. Community newspapers will be able to more easily compare the experiences of their communities with the experiences in other places across the state.
5. A cost-effective model for building similar independent, not-for-profit data repositories in other states.
6. Most importantly, greater public awareness of open government, along with the first rural counties and towns publishing public data in standardized, machine-readable formats on the Web.
Why are you the right person?
This project would be tested in North Carolina and rolled out nationally. While the Raleigh-Chapel Hill area is a hub for information technology, the state has a high percentage of rural counties (roughly 70 out of 100) and a strong tradition of quality community news organizations.
The project builds on extensive and longstanding collaborations between the University of North Carolina and the North Carolina Press Association.
“WOW! This is an interesting and ambitious project and I know there will be many Carolina newspapers that will want this service,” Beth Grace, the director of the N.C. Press Association, told us. “At a time when papers have lost staff and have had to postpone in-depth/investigative and trend reporting, this could bring some of that information back to papers and their readers. The North Carolina Press Association stands ready to assist – we can work with you to help assess what records most papers – and importantly, their readers – would want.”
This project will address a critical need that’s been identified through the work of UNC Knight Chair Penny Muse Abernathy with three rural papers, and in partnership with professor Ryan Thornburg, whose students have already begun collecting digital public data in these rural counties. That work was funded by the McCormick Foundation to develop sustainable business models for community news organizations.
“Our newspaper has worked with Ryan Thornburg for the past year as we try to figure out how to take advantage of OpenBlock for Whiteville.com,” said Les High, the editor and publisher of The Whiteville News-Reporter. “As is the case in most rural communities in this state, the public information we plan to display is not readily downloadable to the site. This project would provide an important community service to residents of all of North Carolina’s 100 counties – bringing the benefits of the digital highway to even the most remote areas. And just as important, OpenBlock could well be an important source of new revenue for community newspapers everywhere. This is a very important first step in making OpenBlock economically feasible for small papers to implement and use.
“A database of local information – and we believe OpenBlock is the best solution at this point – is a central component of the financial strategy in the digital age. Yet, the obstacles in collecting and digitizing loom as a barrier to successful implementation.”
What tasks/benchmarks need to be accomplished to develop your project and by when will you complete them?
The project has three phases, each with its own tasks and benchmarks. We have developed a detailed timeline and budget that are available immediately upon request.
Phase I is underway with funding from the McCormick Foundation to install the OpenBlock codebase on a virtual machine; to format, ingest and publish two datasets from North Carolina local governments; and to write a public report on the technical risks of the project.
The report and any code we develop will be shared with the OpenBlock community. We will publish this report by April 15. (We understand this is before funding would be available from the Knight News Challenge.)
This summer, with Knight funding, journalism students and community newspaper reporters around the state will conduct a census of public records. We will pay participants to complete forms describing the location and characteristics of state and local datasets.
At the completion of Phase I, we will publish a directory of the datasets and a report that describes the economic cost to journalism of governments not publishing data in machine-readable format.
Phase I will end September 30, 2011.
Phase II: The focus in this phase will be on reducing the costs of deploying OpenBlock at rural papers as well as the costs of acquiring, organizing and maintaining data feeds that can be easily integrated into the OpenBlock application.
By the end of this Phase, we will install eight additional OpenBlock sites, publish relevant data feeds and make them freely available under a Creative Commons non-commercial, share alike license.
We will also design, test, iterate and document sample data-collection processes for a variety of scenarios we expect to encounter during the statewide deployment of OpenBlock installations. The documentation will be critical to news organizations across the country as they plan and budget their own efforts.
Phase II will run from October 2011 through September 2012.
Phase III: During the final phase of the project, we will focus on developing revenue models for community newspapers that will support and encourage the continuing maintenance and development of our OpenBlock installations.
During this phase we will begin a phased, statewide rollout of OpenBlock to community newspapers and we will have a comprehensive, statewide collection of public records feeds available from our clearinghouse.
Phase III will also see the incorporation of a not-for-profit organization that will house the project after the end of the grant. It will be funded with the annual membership fees from community newspapers at which we have installed OpenBlock. This organization would also maintain the clearinghouse of public data, some of which may come from places where we don’t have media partners. Finally, it will provide editorial guidance to anyone interested in using the data to create their own data tools or to write stories about trends or patterns revealed by the aggregated data.
Phase III will run from October 2012 to August 2013.
How will you measure progress?
We will measure progress primarily by meeting our benchmarks on deadline and within budget. We will recruit partners, successfully install OpenBlock at community news websites, and collect and distribute feedback from partner newspapers.
We have developed a detailed timeline and budget that are available immediately upon request.
Ultimately, we hope to see a statewide movement to support laws and systems that make government documents and data more easily accessible to North Carolina citizens. With those public policy shifts, we believe we will see more and better public affairs journalism as well as faster and more equitable resolution of civic debates.
Do you see any risk in the development of your project?
The risks of our project fall into three categories: data acquisition, data management and publication, and revenue generation.
Data Acquisition: The goal of the project is to reduce the cost of acquiring current and complete local government data in small communities. Today those costs make widespread deployment of the OpenBlock application prohibitive for small publishers.
Challenges to low-cost data acquisition are technical, political and legal. The technical problems are all surmountable – at some cost, perhaps higher than we hope. Early on, we anticipate many data sources that will require manual entry. The risks with these data sources will be accuracy and efficiency. We hope to test various quality assurance methods across our nine initial sites.
But we believe the real challenge will be reluctant government agencies and uncooperative vendors that make their money through government contracts for digital data storage and management.
Our experiences with student efforts to collect digital, fielded data in rural communities give us a pretty good idea of the type of challenges, if not their scope. For this project, we intend to employ reporters within each community to leverage their community-based knowledge and relationships to help overcome these challenges.
Based on conversations with attorneys for the N.C. Press Association, we don’t see any legal reason that we cannot gather the data we need for the feeds to be editorially meaningful.
Data Management and Publication: This project depends significantly on the successful alpha launch of the OpenBlock installations at The Columbia Daily Tribune and The Boston Globe. We expect these Knight-funded launches to happen in late spring 2011.
To mitigate this risk, we have already consulted with developers to help us more clearly see the technical challenges that might stand between data collection and our goal of deploying the OpenBlock application at nine community newspapers by the end of August 2012.
As we understand it now, the technical challenges involve scraping data, developing locally meaningful schemas for various datatypes, developing a simple user interface for data editing on the backend, and customizing the front-end look and feel of OpenBlock to match the websites of existing community newspapers, many of which use the commercial TownNews CMS/service.
To ensure these challenges are assessed and solved by people with sufficient technical experience, we will hire a qualified and cost-effective group of developers to help us.
Revenue Generation: Community newspaper publishers will participate in this project only if we can demonstrate a positive return on their investment. While Foundation support is essential to the launch of this project, sustaining it will only be worthwhile if we can help small newspapers generate revenue. On the cost side, we will need to quickly discover the most efficient strategies for data acquisition and maintenance. On the revenue side, Penny Abernathy has been working with three small newspapers to develop sponsorship models that we believe will yield enough revenue for publishers to justify the annual cost of the service.
How will people learn about what you are doing?
There are three critical audiences for this project. First is a national audience that we will reach through trade websites and conferences as well as the OpenBlock community that is being so well nurtured by OpenPlans and its Knight-funded efforts.
In North Carolina, we have a statewide audience of newspaper publishers, editors and engaged citizens. Our affiliations with the N.C. Press Association, the N.C. Open Government Coalition, and the School of Journalism and Mass Communication at the University of North Carolina will help us identify partner newspapers and datasets that are editorially significant.
Our most important audiences will be the local news site users and advertisers. We expect and need these citizens to become consumers of public records and advocates for digital, fielded local government data. In many cases, we expect that these audiences will also be our collaborators and key elements of the data collection workflow. For these audiences, the local newspaper partners will be our most important channels of communication.
Is this a one-time experiment or do you think it will continue after the grant?
The information needs of our state’s communities will be best served if this project continues beyond the term of the grant. Our application anticipates that and asks the Knight Foundation to help us create a sustainable not-for-profit organization that will be self-funding at the end of the grant.
But even if one of the risks we’ve outlined prevents us from creating a self-funding not-for-profit, the journalism community at the end of this grant will have several hard deliverables that will be used to guide further efforts:
- A description of state and local datasets in one of the nation’s most populous states. (August 15, 2011)
- A paper that describes the economic cost to journalism of governments not publishing data in machine-readable format, compared to the costs to governments – and taxpayers – of doing so. (September 30, 2011)
- A clearinghouse of state and local government datasets, in open, machine-readable format. (September 30, 2012)
- A handbook of data collection processes suitable for six different public records request scenarios. (September 30, 2012)
- Nine installations of the OpenBlock application at community newspapers. (July 2011 to August 2012)
- Scraper scripts, schemas and other contributions to the OpenBlock Project. (April 2011 to August 2013)