Research Habits and Strategies
- Ask LOTS of questions and regularly examine your assumptions
- Build a foundational understanding of your topic/problem, familiarize yourself with the context(s) within which it is studied, and learn related terminology
- When researching how to address a problem, consider both direct and indirect factors and solutions. For instance, increasing access to food assistance such as food banks could reduce food insecurity, but what factors lead a family to need food assistance? Could addressing these factors have more impact?
- Skim relevant literature reviews (found in library databases, PubMed Central, Google Scholar, etc.) to learn what’s known and debated about the topic among scholars
- To find models, search for your topic/problem plus words like model, index, indicator(s), etc.
- Consider the potential bias of the statistics-gathering or model-creating agency
- Strategies for finding datasets
- Some experts recommend defining your problem, variables or units of analysis (e.g., college students), time frame, and location of interest first; however, examining available data may also inform these decisions (note: some locations may be statistical, such as block groups)
- Search/browse relevant data repositories (note: you may be able to search/limit by variable, raw data, etc.; data access may require registration)
- Search sites of related government agencies, trade/industry groups, NGOs, research centers or institutes at universities, or other organizations likely to gather data on the topic to see if datasets are available (note: it’s rare for private companies to make data available for free)
- Find relevant scholarly research in library databases, Google Scholar, open-access databases (e.g., PubMed Central), and elsewhere (note: in PMC, you can limit to articles with “Associated Data”). Check references for leads to datasets. For more tips about leveraging studies to find health/medical datasets, see this guide from Yale Libraries. You can even search a special database from ICPSR designed to help you “discover data via the literature.”
- Strategies for finding statistics (or using stats as an indirect route to finding data)
- Search for your topic along with words like statistics, data, report, analysis, findings, etc., or possibly surveillance, monitoring, or a unit of analysis related to your topic (e.g., accidents)
- Few or no results? Try synonyms or related terms for a data point you’re seeking (e.g., fatalities, deaths, mortality rate) as well as broader terms for your topic (e.g., crops instead of corn)
- Check the sites of government agencies and university research centers or institutes with a stake in your topic (e.g., bicycle share program) for reports, research, publications, or white papers – sometimes under subtopics like education or advocacy – and skim for factors that could become data points (e.g. showers for cyclists in DC). Then check the references for additional leads.
- Repeat the previous step on sites of NGOs, trade/industry groups (e.g., American Public Transportation Association), advocacy groups, or special interest groups (e.g., International Bicycle Fund)
- You may wish to limit results by desired file format (e.g., filetype:xls, filetype:pdf)
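The search strategies above (topic plus a data-related word, synonyms for the data point, and an optional file-format limit) can be combined systematically. The sketch below is only an illustration: the function name build_queries and the sample terms are hypothetical, and it assumes a search engine that accepts quoted phrases and the filetype: operator mentioned above.

```python
from itertools import product


def build_queries(topic_terms, data_terms, filetype=None):
    """Pair every topic phrasing with every data-related word.

    topic_terms: synonyms or broader terms for the topic or data point
    data_terms:  words like "statistics", "report", "surveillance"
    filetype:    optional file extension for a filetype: limit (e.g. "xls")
    """
    queries = []
    for topic, data_word in product(topic_terms, data_terms):
        # Quote the topic so multi-word phrases are searched as a unit.
        query = f'"{topic}" {data_word}'
        if filetype:
            query += f" filetype:{filetype}"
        queries.append(query)
    return queries


if __name__ == "__main__":
    # Hypothetical example terms; substitute your own topic and synonyms.
    for q in build_queries(
        ["bicycle fatalities", "bicycle deaths"],
        ["statistics", "report"],
        filetype="xls",
    ):
        print(q)
```

Pasting each generated query into a search engine one at a time makes it easier to notice which synonym or data-related word actually surfaces usable sources.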
Cite your data sources. Check this Quick Guide for citing data or the detailed How to Cite Data guide from librarian Hailey Mooney at Michigan State.
Understanding Foundation & Context
Finding Published Literature
Data repositories covering many topics
(from the folks who built the system that runs data.gov)
– data hub that’s part of the Medium ecosystem
(portal for market and consumer data)
Research Tip: If applicable, look for options to search by variable, mathematical method, etc.
Research Tip: If a site with historical data is no longer active, you can sometimes find an archival copy on the Internet Archive’s Wayback Machine or in government archives, such as this BSE Inquiry site archived by the UK’s National Archives. Since these site copies are not maintained, some links may no longer work. If a site is just temporarily down, you may be able to click the down-arrow next to its title in your Google search results to view the cached page.
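Checking for an archived copy can also be scripted. The sketch below assumes the Internet Archive's public Wayback availability endpoint (https://archive.org/wayback/available) and the JSON layout it returns (an archived_snapshots object with a closest entry); the function names are hypothetical, and the endpoint may be rate-limited or change over time.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_API = "https://archive.org/wayback/available"


def wayback_query_url(site, timestamp=None):
    """Build the availability query; timestamp is YYYYMMDD (optional)."""
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the archived URL from a parsed response, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_archived_copy(site, timestamp=None):
    """Query the service over the network and return the closest snapshot URL."""
    with urlopen(wayback_query_url(site, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))


if __name__ == "__main__":
    # example.com is a placeholder; use the address of the inactive site.
    print(find_archived_copy("example.com", "20060101"))
```

A result of None only means this service reported no snapshot; government archives may still hold a copy.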
Research Tip: Investigating a food safety issue that a government agency may be tracking? Along with the disease or contaminant, search for words like surveillance or monitoring. Sample result: BSE Surveillance Information Center (USDA). Such sites may only highlight key points from a more detailed plan; if you find the underlying plan, the references may provide additional leads.
American Community Survey (ACS) – annual survey; see questions and why each is asked; searchable by 1-yr, 3-yr, and 5-yr estimates – a shorter period covers fewer geographic areas
Education and Health
Research Tip: A solid foundational understanding of your problem/topic, including a look at which data sources others tackling the problem are using, can inform your solutions and your data-finding strategies (e.g., Data sources section on p. 35 of this global report: A New Model for Water Access).
Labor and Trade
– financial data sets (set filter to free to avoid for-fee data)
For industry-specific data, check the sites of trade/industry organizations for sections labeled research, publications, reports, or white papers, possibly under headings like education or advocacy. Sample paper: Myths and Statistics (Owner-Operator Independent Drivers Association), a trucking-related organization
Politics and Social Science
(PEW) – opinion of US and of China, confidence in American president
How to retrieve block group level data from the ACS
The American Community Survey is one of the richest data sets compiled by the U.S. Census Bureau and is used by local governments, emergency services, and non-profit organizations to anticipate community needs. The most granular level of data is by Census Tract or Block Group. The tutorial below demonstrates how to use the Summary File Retrieval Tool (in conjunction with the Tech Document related to the Summary File you’re using) to retrieve this CSV-formatted data using Excel 2007 (or higher).
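As an alternative to the Excel-based tool, block group data can also be requested programmatically. The sketch below is not the Summary File Retrieval Tool workflow; it assumes the Census Bureau's public data API for ACS 5-year estimates (api.census.gov), which returns a JSON array whose first row is the header. The function names are hypothetical, B01003_001E (total population) is used only as an example variable, and heavier use of the API may require a free key.

```python
from urllib.parse import urlencode, quote

ACS5_BASE = "https://api.census.gov/data/{year}/acs/acs5"


def acs_block_group_url(year, variables, state, county, tract="*"):
    """Build a request for every block group in the given county.

    state and county are FIPS codes as strings (e.g. "11" and "001").
    """
    params = [
        ("get", ",".join(["NAME", *variables])),
        ("for", "block group:*"),
        ("in", f"state:{state} county:{county} tract:{tract}"),
    ]
    # Keep ":", "*" and "," readable; spaces are encoded as %20.
    query = urlencode(params, safe=":*,", quote_via=quote)
    return ACS5_BASE.format(year=year) + "?" + query


def rows_to_records(payload):
    """Turn the header-plus-rows JSON array into a list of dicts."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


if __name__ == "__main__":
    # Example: total population by block group in the District of Columbia.
    print(acs_block_group_url(2022, ["B01003_001E"], state="11", county="001"))
```

Opening the printed URL in a browser, or fetching it with urllib.request and passing the parsed JSON to rows_to_records, yields one record per block group. Variable codes should be checked against the documentation for the survey year you use.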