2. Assessing the Availability of Open Election Data

Understanding when and where to collect open election data will help inform observation plans. Groups should conduct a preliminary assessment of the open election data environment, including what data is likely to be feasible and viable, where it resides, and when it may be available. This assessment should also help organizations determine which datasets may be associated with the election procedures and processes they have identified as high priority.

Electoral Processes Associated with Open Election Data

The most valuable information about election day processes and environment is still the information that well-trained and well-deployed election monitors can collect on the integrity of the voting, counting, and tabulation processes. However, a number of aspects of the electoral process throughout the pre- and post-election periods may have accompanying open data that can enhance observation findings. Groups that are unsure what any of these key categories entail should familiarize themselves with them.

Open Election Data: Feasibility and Viability

There will be limits to what election data actually exists, what data is available to the public, and whether that data is useful or usable. To make the best use of time and resources, monitoring groups should only pursue election data that is both feasible and viable. Feasibility means that monitoring groups can reasonably access the data, or can reasonably expect the government to have it. Viability means that the data is relevant to your electoral analysis and can be reliably used for that analysis.

Open election data is not feasible if:

The data isn’t collected in the first place. For instance, monitoring organizations may want to use ballot qualification data to determine how many candidates from a certain ethnic or religious background were accepted. However, if that information is not captured on candidate nomination forms, then it likely doesn’t exist. If data that should be collected is not, that gap can feed into your advocacy plan (covered in Step 4).
The data exists but is not available in the timeframe needed. For instance, a group may want comprehensive information on electoral disputes a week after the elections. If the dispute resolution process is still ongoing at that point, then that data is not feasible at that time.
The data is legally protected from the public. Some data, or some fields in a dataset, may be protected by law. This is often the case for personally identifiable data, such as national ID numbers, or for data on protected individuals such as witnesses, victims of abuse, and some security personnel. However, data produced and collected by EMBs is paid for by citizens and should inherently be public. Additionally, this data is often made public during voter registration and on election day, even if in limited scope. Just as elections and government belong to the people, public data belongs to the people.
Other barriers make the data impossible to reasonably acquire. For instance, the fees for accessing the information may be prohibitively expensive, which also violates basic principles for open election data.

Open election data is not viable if:

The data is incomplete. Missing data may prevent groups from conducting comprehensive analysis, so groups will have to determine what analysis, if any, is possible with the limited data. Completeness is especially important when the data has a geographic component, because missing data for a particular region, state, or district can give the impression that an EMB is biased against a particular party or candidate.
The data is inaccurate or of such poor quality that it cannot be trusted. Groups will likely struggle to conduct analysis with inaccurate data. In some cases, names and other information are no longer accurate because of digitization and transcription, for example when transliterating from non-Latin scripts. In other cases, how the data was collected may be unclear or poorly documented. In addition, if the dataset is perceived to be particularly sensitive and the exact process of how it was collected is unclear or inconsistent, then it may not be viable.
There is no government consensus on what data is “official.” In some cases, data from different government sources may contradict one another or may not be considered formal or official. If there is no clarity about which data takes precedence or is officially used by the government or EMB, then it may not be viable. However, groups should still try to obtain as much of this information as possible, because comparing multiple datasets can provide insights into data quality.
It is impossible to receive the data in an analyzable format within the necessary timeframe. This is particularly important for files containing large datasets that must be converted to a machine-readable format. Some formats, such as PDFs, may take time or special expertise to convert but can be managed with the right resources. Converting other formats, such as image files, can be very long and laborious and can put groups at risk of being unable to conduct timely analysis.
The data does not enhance your analysis. Just because some open data exists does not mean it will lead to fruitful analysis of electoral integrity. Always consider the goals of your analysis before spending time acquiring data that will not ultimately be helpful.
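The completeness and consistency checks described above can be partly automated once data is in a machine-readable form. The sketch below is illustrative only. It assumes each source has already been reduced to a Python dictionary mapping district names to registered-voter totals. The function names, district names, figures, and the 2% tolerance are all hypothetical. It lists districts from an official district list that have no data, and it flags districts where two government sources disagree by more than the tolerance.

```python
from typing import Dict, Iterable, List


def missing_districts(official_list: Iterable[str], reported: Dict[str, int]) -> List[str]:
    """Districts on the official list for which the dataset has no figure."""
    return sorted(set(official_list) - set(reported))


def conflicting_districts(source_a: Dict[str, int], source_b: Dict[str, int],
                          tolerance: float = 0.02) -> List[str]:
    """Districts where two sources' totals differ by more than `tolerance`.

    The difference is measured relative to the larger of the two figures.
    The 2% default is an illustrative assumption, not a standard.
    """
    flagged = []
    # Only districts present in both sources can be compared.
    for district in sorted(set(source_a) & set(source_b)):
        a, b = source_a[district], source_b[district]
        larger = max(a, b)
        if larger > 0 and abs(a - b) / larger > tolerance:
            flagged.append(district)
    return flagged


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    official = ["North", "South", "East", "West"]
    emb_totals = {"North": 120_000, "South": 98_500, "East": 143_200}
    registry_totals = {"North": 121_000, "South": 87_000, "East": 143_900}

    print(missing_districts(official, emb_totals))             # ['West']
    print(conflicting_districts(emb_totals, registry_totals))  # ['South']
```

A flagged district is a prompt to ask which source is correct. It does not by itself show which figure is wrong.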

Groups should take the time to assess what desired data is actually feasible and viable as they develop their open election data strategy.
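One way to keep the assessment consistent across many candidate datasets is to record each of the criteria above as a yes/no answer. The following is a minimal sketch that assumes a hypothetical `CandidateDataset` record. Its fields mirror the feasibility and viability criteria in this section, and it is not a standard tool. The section notes that incomplete or contested data may still support limited analysis, so the reported gaps are best treated as prompts for discussion rather than automatic rejections.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateDataset:
    """Hypothetical record of one dataset; True means the criterion is met."""
    name: str
    # Feasibility criteria
    is_collected: bool
    available_in_time: bool
    legally_accessible: bool
    free_of_other_barriers: bool
    # Viability criteria
    is_complete: bool
    is_accurate: bool
    has_official_version: bool
    analyzable_in_time: bool
    enhances_analysis: bool


def feasibility_gaps(ds: CandidateDataset) -> List[str]:
    """Feasibility criteria the dataset fails."""
    checks = {
        "not collected": ds.is_collected,
        "not available in the timeframe needed": ds.available_in_time,
        "legally protected from the public": ds.legally_accessible,
        "other barriers to access": ds.free_of_other_barriers,
    }
    return [reason for reason, met in checks.items() if not met]


def viability_gaps(ds: CandidateDataset) -> List[str]:
    """Viability criteria the dataset fails."""
    checks = {
        "incomplete": ds.is_complete,
        "inaccurate or poor quality": ds.is_accurate,
        "no agreed official version": ds.has_official_version,
        "cannot be made analyzable in time": ds.analyzable_in_time,
        "does not enhance the analysis": ds.enhances_analysis,
    }
    return [reason for reason, met in checks.items() if not met]


def worth_pursuing(ds: CandidateDataset) -> bool:
    """Pursue only data that is both feasible and viable."""
    return not feasibility_gaps(ds) and not viability_gaps(ds)


if __name__ == "__main__":
    # Hypothetical example: a polling station list published only as scanned images.
    stations = CandidateDataset(
        name="Polling station list (scanned images)",
        is_collected=True, available_in_time=True, legally_accessible=True,
        free_of_other_barriers=True, is_complete=True, is_accurate=True,
        has_official_version=True, analyzable_in_time=False, enhances_analysis=True,
    )
    print(viability_gaps(stations))  # ['cannot be made analyzable in time']
    print(worth_pursuing(stations))  # False
```

Keeping the feasibility and viability gaps separate reflects the section's distinction. A feasibility gap such as "not collected" may belong in an advocacy plan, while a viability gap is a question about analysis.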

Sources

Generally speaking, most open election data will reside within the election management body (EMB). This includes items such as voter registration lists, candidate or party registration information, polling station lists, campaign finance reports, and most other administrative election data. However, some data of interest may reside with other government or semi-government institutions. For instance:

Legislative bodies/executive offices – Lawmaking and budgetary bodies may hold valuable data on the most up-to-date consolidated legal framework for elections, as well as budget or procurement data relevant to the electoral process.
Judicial bodies – Election tribunals, appeals courts, and regular judiciary bodies will likely be important sources of election complaint and dispute resolution data, and may also hold data on other election-related cases (for instance, ballot qualification appeals or challenges to aspects of the legal framework or election boundaries).
Census/statistics bureaus – Departments that work with population data, such as census or statistics bureaus, can provide information on vital statistics, population, and demographic information.
Interior or land ministries/geological surveys – Departments that deal with geopolitical and spatial information, such as an interior or land ministry, may have information, like official maps, shape files, or GPS data, that groups may consider using in their analysis.
Peace and security actors – Security forces may have information regarding electoral violence incidents, campaign or election violations, or other information related to electoral security and criminal election offenses.
Third party vendors – Occasionally governments and EMBs may subcontract out management of some election data to third party vendors. While governments should, in theory, be able to access and disseminate this data from their contractors at any time, there may be situations where it is easier to work directly with vendors to acquire data.

Open-Source Spatial Tools

Geopolitical information, such as information on administrative boundaries (like prefectures, counties, and states) and electoral boundaries (electoral districts with associated elected representatives), should be retrieved from official government sources. However, other helpful analytical tools, such as GPS information, shape files, and mapping software, can often be obtained for free from other sources, including open-source, proprietary, and cloud-based ones. For example, adding GPS coordinates to voter registration centers may help identify geographic gaps in coverage, which can then be compared with lower registration rates in some districts.
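As a rough illustration of the registration-center example, straight-line distances can show which communities are far from any center. This sketch assumes GPS coordinates (latitude and longitude in decimal degrees) are already available for communities and registration centers. The community names, coordinates, and 10 km threshold are hypothetical. Straight-line (haversine) distance also ignores roads and terrain, so actual travel distances will be longer.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_KM = 6371.0     # mean Earth radius


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle (straight-line) distance between two points, in km."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def coverage_gaps(communities: Dict[str, Coord], centers: List[Coord],
                  max_km: float = 10.0) -> Dict[str, float]:
    """Communities whose nearest registration center is farther than max_km.

    Returns {community: distance to nearest center, rounded to 0.1 km}.
    The 10 km default is an illustrative assumption.
    """
    gaps = {}
    for name, location in communities.items():
        nearest = min(haversine_km(location, center) for center in centers)
        if nearest > max_km:
            gaps[name] = round(nearest, 1)
    return gaps


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    villages = {"Village A": (0.0, 0.0), "Village B": (1.0, 0.0)}
    registration_centers = [(0.0, 0.0)]
    print(coverage_gaps(villages, registration_centers))  # {'Village B': 111.2}
```

The resulting list of distant communities could then be set against district-level registration rates to see whether the gaps coincide with lower registration.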

Every government is different so keep in mind any other local or national institutions that could have relevant information. For instance, in some places voter registries are linked to civil registries – like national ID services – and those bodies may contain significant data relevant to a group’s needs.

Timing

Not all open data will be available at the same time, and the electoral calendar established by the EMB will largely determine when organizations can reasonably expect data to be released. For instance, it is unreasonable to expect the EMB to have a polling station list available a year before election day, because polling station lists are often drawn up or modified following voter registration exercises to ensure polling stations are well distributed among the voting population.

Some data, such as census or spatial data, may remain relatively static (unchanged after it is generated) throughout the electoral process, so it can be acquired at any point. In some cases, groups may even be able to use open election data to analyze parts of the electoral process they might otherwise have missed. For instance, electoral boundaries may be drawn several years before an election, when a group may not yet be ready to conduct analysis. However, because that information is likely to remain static after it is generated, groups can use it to comment on the process whenever they are ready to conduct pre-election analysis.

Groups should also plan around when other electoral data is likely to be generated. In addition, groups should consider whether the data is likely to be complete and static after generation (such as a final candidate list following ballot qualification) or variable and requiring regular collection (such as reports of electoral violence as they occur).

Illustrative Datasets, Sources, and Timing Considerations

Groups should always consider the broad range of data available to them that can support their observation priorities. See Illustrative Datasets, Sources and Timing Considerations in the Supplementary Materials for the key electoral categories associated with open election data, with datasets that may be relevant to them. This list is illustrative, as many datasets and sources will be unique to the country or political context.

Versatile Datasets

As you can see from the table, some data sources can be used to analyze multiple electoral processes. Such datasets may be considered a high priority to capture because they serve multiple purposes. For instance, a set of consolidated election laws and amendments and EMB regulations can support analysis of all major parts of the electoral process. Population and demographic data, such as a census, demographic or social surveys, and development reports, can provide important context and baselines for assessing access and participation issues. Voter registration lists and statistics can be used to analyze several electoral processes. In addition, most of these datasets are typically publicly available and tend not to change after they are produced. The chart below highlights some of the more common sets of data that can support analysis in multiple electoral categories.

Dataset	Relevant Election Processes for Analysis
Consolidated electoral laws and amendments, and EMB regulations	Legal framework, Electoral boundaries, EMB administration, EMB processes, Election security, Political party registration, Ballot qualification, Election campaign, Campaign finance, Voter registration, Voter list, Voter education, Polling stations, Election results, Electronic voting/counting
Population and demographic data	Electoral boundaries, Voter registration, Voter list, Voter education, Election results
Voter registration list and statistics	Electoral boundaries, Voter registration, Voter list, Voter education, Election results
Maps and spatial data	Electoral boundaries, Voter registration, Polling stations, Election results
EMB procurement data and budget	EMB processes, EMB administration, Electronic voting/counting
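The table above can also be treated as data, which makes it easy to rank datasets by how many of a group's priority processes they support. The sketch below transcribes the table into a hypothetical `VERSATILE_DATASETS` dictionary and counts overlaps with an optional set of priorities. The function name and the priority set in the usage example are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Transcribed from the table above: dataset -> election processes it supports.
VERSATILE_DATASETS: Dict[str, List[str]] = {
    "Consolidated electoral laws and amendments, and EMB regulations": [
        "Legal framework", "Electoral boundaries", "EMB administration",
        "EMB processes", "Election security", "Political party registration",
        "Ballot qualification", "Election campaign", "Campaign finance",
        "Voter registration", "Voter list", "Voter education",
        "Polling stations", "Election results", "Electronic voting/counting",
    ],
    "Population and demographic data": [
        "Electoral boundaries", "Voter registration", "Voter list",
        "Voter education", "Election results",
    ],
    "Voter registration list and statistics": [
        "Electoral boundaries", "Voter registration", "Voter list",
        "Voter education", "Election results",
    ],
    "Maps and spatial data": [
        "Electoral boundaries", "Voter registration", "Polling stations",
        "Election results",
    ],
    "EMB procurement data and budget": [
        "EMB processes", "EMB administration", "Electronic voting/counting",
    ],
}


def rank_by_versatility(datasets: Dict[str, List[str]],
                        priorities: Optional[Iterable[str]] = None) -> List[Tuple[str, int]]:
    """Rank datasets by how many (priority) processes each supports.

    With no priorities, every listed process counts. Ties keep table order,
    because Python's sort is stable even with reverse=True.
    """
    wanted = None if priorities is None else set(priorities)
    scored = []
    for name, processes in datasets.items():
        relevant = set(processes) if wanted is None else set(processes) & wanted
        scored.append((name, len(relevant)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical priorities from a group's risk assessment.
    my_priorities = {"Voter registration", "Polling stations", "Election results"}
    for name, count in rank_by_versatility(VERSATILE_DATASETS, my_priorities):
        print(f"{count}  {name}")
```

Counting overlaps is only a starting point. A dataset that supports fewer processes may still matter more if it covers the single highest-risk process.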

Matching Observation Priorities with Open Data Availability

Open election data should ideally meet the nine principles for open data, including being available for free on the internet. In these cases, groups should be able to access data easily. However, data sometimes does not meet all of these criteria. For instance, it may not be released as promptly as it should be, may not include the level of granular detail necessary, or may be in a format that is not machine-readable, such as a PDF. While reviewing each election data priority, groups should consider:

Is this data both feasible and viable?
What data is immediately available and accessible, for example free on the internet? Or, if it is too early in the process, what data do you expect to be easily available and accessible?
Can you use the current format available to conduct your analysis, or will the format need to be changed?
Is it granular enough for substantive analysis?
What, if any, datasets will you need to request from government institutions? What, if any, alterations to pre-existing datasets – such as formatting changes – will you need to request from government institutions?

Open Data and Cross-Cutting Issues

While groups should try to prioritize their observation and data collection plan based on major election functions, it’s important to keep in mind cross-cutting issues that are relevant throughout the electoral process, especially the participation of women, youth, minorities, and vulnerable groups.

The use of open data can be particularly helpful in collecting information regarding women and other marginalized populations in many key election procedures. For instance, voter registration and turnout data, candidate information, and EMB staff profiles can and should be disaggregated by gender and other demographic information when possible. In addition, the distribution of polling stations, electoral boundaries, or security forces can be overlaid with demographic information to identify any trends that could impact certain populations.
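To illustrate disaggregation, the sketch below compares women's share of registered voters in each district with their share of the voting-age population. It assumes both sources are available as hypothetical `{district: (women, men)}` counts. The 5-percentage-point threshold is an arbitrary assumption. A flagged district indicates a pattern worth investigating, not proof of exclusion.

```python
from typing import Dict, Tuple

Counts = Dict[str, Tuple[int, int]]  # {district: (women, men)}


def share(part: int, total: int) -> float:
    """Fraction part/total, or 0.0 when total is zero."""
    return part / total if total else 0.0


def registration_gender_gap(registered: Counts, population: Counts,
                            threshold: float = 0.05) -> Dict[str, float]:
    """Districts where women's share of registered voters trails their share
    of the voting-age population by more than `threshold` (0.05 = 5 points).

    Districts missing from the population data are skipped.
    """
    flagged = {}
    for district, (reg_women, reg_men) in registered.items():
        if district not in population:
            continue
        pop_women, pop_men = population[district]
        gap = (share(pop_women, pop_women + pop_men)
               - share(reg_women, reg_women + reg_men))
        if gap > threshold:
            flagged[district] = round(gap, 3)
    return flagged


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    registered = {"North": (40_000, 60_000), "South": (51_000, 49_000)}
    voting_age = {"North": (52_000, 48_000), "South": (50_000, 50_000)}
    print(registration_gender_gap(registered, voting_age))  # {'North': 0.12}
```

The same comparison can be repeated for turnout or for other demographic categories, provided both datasets use the same district definitions.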

Take the top eight election categories identified by the risk assessment and review each category's feasibility, viability, and compliance with open data principles. Answer the questions on the assessment form for each category.