A dataset (also spelled ‘data set’) is a collection of raw statistics and information generated by a research study. Datasets produced by government agencies or non-profit organizations are usually free to download, while datasets from for-profit entities may require payment for access.
Locating datasets typically involves identifying the agency or organization specializing in the desired research area. For instance, for insights into public opinion on social issues, the Pew Research Center is a reputable source. Similarly, for population-related data, the U.S. government’s Population Estimates Program via American FactFinder serves as a reliable resource.
The concept of “open data” is gaining traction globally, advocating for the unrestricted accessibility of data. Governments and businesses alike are embracing this philosophy, spurred by initiatives led by entities such as the Open Knowledge Foundation. The Open Data Handbook serves as a comprehensive resource for delving deeper into this movement. Furthermore, the emergence of “Big Data” and data visualization techniques is revolutionizing data analysis, enabling the exploration of vast datasets for novel perspectives and insights.
For those unsure of where to commence their search, here are some recommended starting points:
Site | Structure | Source Type | Topics |
Data.gov | Repository | Public | U.S. Environment, Climate, Health, Government |
DataPlanet | Repository | Public | Multidisciplinary |
Dept. of Education | Website | Public | Education, Educational Institutions |
Google Dataset Search | Search Engine | Public | Multidisciplinary |
Harvard Dataverse | Repository | Public | Multidisciplinary, *Social Sciences |
Healthdata.gov | Repository | Public | Health, Healthcare |
ICPSR | Repository | 3rd Party | Multidisciplinary, *Social Sciences |
NCES | Repository | Public | Education, Educational Institutions |
Pew Research Center | Website | Public | Social Science Demographics, Trends |
Re3 | Registry of Repositories | Public | n/a |
Statista (*contains mostly aggregated data, raw data may be available through clicking on “source link”) | Database | Subscription (provided by NU Library) | All, *Business |
- The Evolution of Big Data, and Where We’re Headed
- Data Visualization and Infographics
For further guidance on locating statistics, refer to our dedicated Statistics page.
Subject Specific and Additional Dataset Resources
Business
- Damodaran Online: Corporate Finance and Valuation
Access corporate finance and valuation resources curated by Dr. Aswath Damodaran from NYU’s Stern School of Business.
- IMF DataMapper
Gain access to the International Monetary Fund’s extensive fiscal rules dataset spanning from 1985 to 2013, along with a wealth of other data and statistics.
- National Longitudinal Surveys
Delve into longitudinal survey data provided by the Bureau of Labor Statistics, offering valuable insights into economic trends and labor market dynamics.
- Organization for Economic Co-Operation and Development Data
Tap into economic data and statistics from the OECD, covering a wide range of topics including international trade, employment, and economic development.
- Quandl
Explore Quandl’s extensive collection of time-series numerical data, specializing in economics, finance, markets, and energy. Utilize their step-by-step wizard for streamlined data retrieval.
- Statistical Abstract of the United States (2012)
Access banking, finance, insurance, and business enterprise data compiled in the 2012 edition of the Statistical Abstract of the United States.
- Surveys of Consumers
Access consumer survey data compiled by Thomson Reuters in collaboration with the University of Michigan, offering insights into consumer sentiment and behavior.
- U.S. Bureau of Economic Analysis
Access economic data and statistics provided by the U.S. Bureau of Economic Analysis, offering comprehensive insights into the nation’s economy.
- Mergent Online
Access financial records, country and industry reports, and news articles on recent mergers and acquisitions. Searchable by company name, country, number of employees, and more, with up to 15 years of historical data.
These resources offer a wealth of information for business research, analysis, and decision-making.
Computer science
- ACM (Association for Computing Machinery)
ACM provides a comprehensive research, discovery, and networking platform for computer science professionals. Its database offers access to a wide range of scholarly resources, including journals, conference proceedings, technical magazines, newsletters, and books. Features include author listings, dataset search filters, and sorting options by citation count, facilitating efficient exploration of relevant content.
- IEEE (Institute of Electrical and Electronics Engineers)
IEEE offers access to full-text peer-reviewed journals, transactions, magazines, conference proceedings, and published standards covering electrical engineering, computer science, and electronics. It also provides access to the IEEE Standards Dictionary Online, enabling users to stay abreast of current industry trends. Similar to the ACM database, IEEE’s platform includes a dataset search function, enhancing the accessibility of valuable research data.
- Google Scholar
Google Scholar is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of disciplines, including computer science. It provides a convenient way to search for academic papers, theses, books, conference proceedings, and technical reports. Google Scholar’s citation metrics and related articles feature further aid in exploring relevant research topics.
- arXiv
arXiv is a preprint repository that hosts research papers in various scientific disciplines, including computer science. It offers access to a vast collection of manuscripts covering topics such as artificial intelligence, machine learning, cryptography, and more. Researchers can discover cutting-edge research, engage in discussions, and access papers before they are formally published in peer-reviewed journals.
- Scopus
Scopus is a comprehensive abstract and citation database covering a wide range of academic disciplines, including computer science. It provides access to peer-reviewed literature, conference proceedings, patents, and more. With powerful search and analytical tools, Scopus enables researchers to track citations, identify emerging trends, and discover potential collaborators.
- DBLP (Digital Bibliography & Library Project)
DBLP is a computer science bibliography database that indexes scholarly articles, conference papers, and proceedings from major computer science conferences and journals. It offers a convenient way to explore publications by author, venue, or topic, making it an invaluable resource for researchers, students, and practitioners in the field.
These resources offer a wealth of information and tools for computer science professionals, facilitating research, collaboration, and innovation in the ever-evolving field of computing.
Education
- Barro-Lee Dataset
Access datasets from the article by Barro and Lee, providing information on educational attainment worldwide from 1950 to 2010. This dataset offers valuable insights into global education trends and patterns over time.
- Child Care and Early Education Research Connections
Explore datasets curated by the National Center for Education Statistics (NCES) focusing on child care and early education. These datasets enable researchers to analyze trends and outcomes in early childhood education and development.
- Education Data.gov
Dive into educational datasets available on Data.gov, providing access to a wide range of educational data from various government agencies. These datasets cover topics such as school performance, student demographics, funding, and more.
- Higher Education General Information Survey (HEGIS) Series
Access datasets from the HEGIS series, offering comprehensive information on higher education institutions, students, faculty, and finances. These datasets are valuable for analyzing trends and patterns in higher education.
- Integrated Postsecondary Education Data System (IPEDS)
Explore datasets from IPEDS, a comprehensive source of data on postsecondary education institutions in the United States. These datasets cover enrollment, graduation rates, finances, and other key metrics for colleges and universities.
- National Center for Education Statistics (NCES)
Access datasets and statistical reports from NCES, providing a wealth of information on education in the United States. These datasets cover K-12 education, postsecondary education, literacy rates, educational attainment, and more.
- Statistical Abstract of the United States (2012): Education
Explore educational datasets from the Statistical Abstract of the United States, offering a wide range of statistics and data on education topics such as enrollment, expenditures, and educational attainment.
- U.K. Department of Education Datasets
Access datasets from the U.K. Department of Education, providing insights into educational trends, policies, and outcomes in the United Kingdom. These datasets cover areas such as school performance, funding, and student demographics.
These resources offer valuable datasets and information for education researchers, policymakers, and practitioners, facilitating analysis, decision-making, and innovation in the field of education.
Psychology
- American Psychological Association (APA)
Access links to datasets and repositories curated by the APA, providing researchers with a wealth of resources for psychological research. These datasets cover various topics in psychology, including mental health, cognition, and behavior.
- Children Born to Unwed Parents between 1998-2000 (Princeton)
Explore datasets related to children born to unwed parents between 1998 and 2000, offering insights into family dynamics, child development, and social outcomes.
- Childstats.gov
Access datasets and resources from the Forum on Child and Family Statistics, offering comprehensive data on children and families in the United States. These datasets cover areas such as health, education, and socio-economic status.
- Gender & Achievement Research Program
Access datasets related to gender and achievement research, providing insights into factors influencing academic performance and achievement disparities between genders.
- The Kinsey Institute Data Archives
Explore datasets from The Kinsey Institute, offering valuable data on human sexuality, relationships, and behavior. These datasets are useful for researchers studying sexual health, intimacy, and social attitudes.
- National Archive of Criminal Justice Data
Access datasets related to criminal justice and forensic psychology research, providing information on crime rates, law enforcement, and criminal behavior. These datasets enable researchers to analyze trends and patterns in crime and justice systems.
- National Data Archive on Child Abuse and Neglect
Access datasets and resources related to child abuse and neglect research, offering valuable information on risk factors, interventions, and outcomes for children and families affected by abuse.
- National Longitudinal Study of Adolescent Health (Add Health)
Access longitudinal datasets on adolescent health and development, providing insights into the social, behavioral, and health outcomes of adolescents over time.
- Neuroscience Information Framework (NIF) Data Federation
Access neuroscience datasets and resources through the NIF Data Federation, offering a comprehensive collection of data on brain structure, function, and disorders. These datasets are valuable for researchers studying neuroscience and neuropsychology.
- Substance Abuse and Mental Health Data Archive (SAMHDA)
Access datasets related to substance abuse and mental health research, providing information on prevalence, treatment outcomes, and risk factors for substance use disorders and mental illnesses.
These resources offer a diverse range of datasets and repositories for psychology researchers, facilitating data-driven analysis and advancements in the field of psychology.
Public opinion/surveys
Explore a variety of resources for accessing public opinion surveys and datasets:
- Gallup.com
Access global datasets on important social issues, financial behavior, and literacy from people around the world. Gallup.com offers comprehensive insights into public opinion trends and attitudes.
- General Social Survey (GSS)
Conducted on American society since 1972, the GSS provides valuable insights into social trends and attitudes. Datasets are available in SPSS and STATA formats, with additional options for analysis.
- International Social Survey Programme (ISSP)
Affiliated with the GSS, the ISSP has been conducting surveys since 1980, offering comparative insights into international social trends and attitudes.
- The Latin American Databank
Provides access to Latin American datasets acquired, processed, and archived by the Roper Center for Public Opinion Research. Users can browse data by country or decade, with keyword search options available.
- Pew Research Center
Download datasets from the Pew Research Center’s main projects, covering a wide range of social and political topics. Free registration is required to access the datasets.
- Roper Center Public Opinion Archives
Access over 20,000 datasets spanning from 1935 to the present, offering a rich collection of public opinion data. Users can set up RSS feeds for updates on new datasets.
- World Values Survey
Download datasets from surveys dating back to 1981 in SPSS, SAS, and STATA formats. The World Values Survey provides insights into global values, beliefs, and attitudes across different cultures and societies.
These resources offer a wealth of public opinion surveys and datasets, allowing researchers to analyze trends, attitudes, and social behaviors across various regions and time periods.
Social sciences
Discover a plethora of resources for social science data and research:
- Consortium of European Social Science Data Archives (CESSDA)
CESSDA offers access to a wide range of social science data archives across Europe. Researchers can explore diverse datasets covering various topics and regions within the European context.
- Gapminder
Gapminder, a non-profit organization, provides access to over 500 demographic indicators from reputable sources like the World Bank and Lancet. Users can download data in Excel format and utilize visualization tools to gain insights into global trends.
- Inter-university Consortium for Political and Social Research (ICPSR)
ICPSR hosts one of the largest collections of social and behavioral research data. Researchers can access datasets in formats such as SPSS, SAS, and CSV, covering a broad spectrum of topics.
- National Archive of Criminal Justice Data
Access datasets related to criminal justice research, allowing scholars to analyze trends and patterns in crime and law enforcement.
- National Center for Health Statistics (NCHS)
NCHS provides extensive tutorials to assist researchers in incorporating health-related data into their studies. Explore a wealth of health statistics and demographic information for informed research.
- The Odum Institute Dataverse
Hosted by the University of North Carolina Chapel Hill, this repository offers access to a diverse range of social science datasets contributed by researchers worldwide.
- U.S. Department of Housing and Urban Development (HUD)
Access housing and housing market data provided by the U.S. government, enabling researchers to analyze trends in housing affordability, homelessness, and urban development.
- U.K. Data Service
Sponsored by the U.K. Economic & Social Research Council (ESRC), the U.K. Data Service offers a comprehensive collection of social science datasets for researchers interested in British demographics and social trends.
- Association of Religion Data Archives
Explore datasets related to religious studies, providing insights into religious demographics, practices, and beliefs.
- U.S. Bureau of Labor Statistics
Access economic and labor market data from the U.S. government, facilitating research on employment trends, wage disparities, and workforce demographics.
- U.S. Census Bureau
Dive into population demographics provided by the U.S. Census Bureau, offering valuable insights into the composition and characteristics of American society.
These resources cater to the diverse needs of social science researchers, offering access to a wide array of datasets and statistics for rigorous analysis and informed decision-making.
Social Media or Community Driven Datasets
Explore a variety of social media and community-driven datasets to enhance your research:
- Guardian (UK) Datablog
The Guardian’s Datablog offers a wealth of datasets curated from various sources, providing insights into a wide range of social, economic, and political issues. Researchers can access credible data for analysis and exploration.
- Kaggle
Kaggle serves as a third-party, multi-disciplinary crowd-sourcing platform, hosting diverse datasets contributed by individuals and organizations worldwide. Researchers should evaluate the credibility of data provided by contributors not affiliated with academic or professional institutions. The platform also supports collaborative data-driven projects and competitions.
- Social Computing Data Repository
Arizona State University’s Social Computing Data Repository offers downloadable datasets from popular social networks such as Twitter, FourSquare, and YouTube. Researchers can analyze social media interactions, trends, and user behavior to gain insights into online communities and communication patterns.
- Stanford Large Network Dataset Collection
Stanford University’s collection features datasets from social networks, online reviews, and other online platforms. Researchers can explore network structures, user interactions, and content analysis to understand the dynamics of online communities and information dissemination.
These resources provide valuable datasets for studying social media dynamics, community interactions, and online behavior, enabling researchers to delve into the complexities of digital society and communication.
Additional Dataset Resources
Discover more valuable resources for accessing datasets:
- Registry of Open Data on AWS
The Registry of Open Data on AWS provides access to a diverse range of datasets hosted on the Amazon Web Services (AWS) platform. Notable datasets include the NASA NEX project, offering satellite imagery and Earth observation data, and the 1000 Genomes Project, which provides genomic information from human populations worldwide. Researchers can leverage these datasets for various applications, including environmental monitoring, genetics research, and data-driven decision-making.
- Figshare
Figshare is a versatile third-party, multi-disciplinary repository that hosts a vast collection of datasets from researchers and institutions worldwide. Users can search for datasets by keyword or browse by subject area, making it easy to find relevant data for their research projects. With datasets covering diverse fields such as biology, physics, social sciences, and more, Figshare serves as a valuable resource for accessing and sharing research data across disciplines.
These additional dataset resources expand the range of options available to researchers, offering access to high-quality data from various sources and domains. Whether seeking satellite imagery, genomic data, or multidisciplinary datasets, researchers can utilize these platforms to enhance their research endeavors and uncover valuable insights in their respective fields.
Large datasets
- Africa Open Data
Access over 900 datasets from countries across Africa, covering diverse topics and regions. File formats include csv, zip, and shapefile (shp), suitable for use with Geographic Information System (GIS) software. Researchers can leverage these datasets for various purposes, including socioeconomic analysis, environmental monitoring, and urban planning.
- American FactFinder
Operated by the US Census Bureau, American FactFinder offers datasets from censuses and surveys conducted by the Bureau. Researchers can explore demographic, economic, and social data at various geographic levels, enabling in-depth analysis and understanding of population trends and characteristics in the United States.
- Data.gov
Dive into the vast repository of over 90,000 datasets provided by the U.S. government through Data.gov. This platform serves as a gateway for discovering and accessing government data across diverse domains, empowering researchers, policymakers, and the public to explore and utilize valuable information for research, analysis, and decision-making.
- Data.gov.uk
Search through more than 17,000 datasets from the government of the United Kingdom via Data.gov.uk. Users can filter search results by theme, format, and publisher, facilitating efficient discovery of relevant datasets for various research and analytical purposes.
- European Union Open Data Portal
Access data produced by EU member institutions through the European Union Open Data Portal. The portal features a wide range of datasets, with options to download in pdf or zip formats. Researchers can explore datasets on topics ranging from economics and finance to social issues and environmental sustainability.
- National Digital Archive of Datasets
Delve into datasets spanning from 1997 to 2010 from the U.K. National Archives. Fully searchable and available in multiple formats including html, csv, and xls, these datasets offer valuable historical insights and trends across various sectors and disciplines.
- Open Data Canada
Search and download datasets in different formats including csv, xml, and zip from Open Data Canada. Featured datasets cover a wide range of categories, providing researchers with valuable information for analysis and decision-making in areas such as demographics, transportation, and government expenditures.
- United Nations Data
Gain access to data and statistics for UN-supported projects, including the Monthly Bulletin of Statistics, through the United Nations Data portal. Researchers can utilize these datasets to explore global trends, monitor progress towards development goals, and conduct cross-national comparative analysis.
- UN Statistical Databases
Explore a directory of UN statistical databases provided by the United Nations Dag Hammarskjöld Library. These databases offer a wealth of statistical information covering various aspects of global development, social progress, and economic indicators.
- World Bank
Browse and search datasets across a wide range of indicators and categories through the World Bank. From basic demographic data to advanced economic indicators, researchers can access and download datasets to support research, policy analysis, and development initiatives. Additionally, users can utilize the World Bank Databank tutorial to enhance their understanding of dataset navigation and utilization.
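Several of the portals above distribute data as csv files, sometimes bundled inside zip archives. As a rough illustration only, the following Python sketch reads one csv file out of a zip archive using the standard library. The file name, column names, and figures are made up for the example and do not come from any of the listed sites; a real download would replace the in-memory sample archive.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_bytes, member_name):
    """Return the rows of one CSV file inside a zip archive as a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name) as raw:
            # archive.open yields bytes, so wrap it for the text-based csv module.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


def build_sample_zip():
    """Build a tiny in-memory archive standing in for a downloaded file (made-up data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "population.csv",
            "region,year,population\nNorth,2010,1200\nSouth,2010,950\n",
        )
    return buffer.getvalue()


rows = read_csv_from_zip(build_sample_zip(), "population.csv")
total = sum(int(row["population"]) for row in rows)
print(len(rows), total)
```

Reading the archive in memory avoids unpacking files to disk, which can be convenient when a portal offers many small downloads.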
Searchable sites
- Datacatalogs.org
Discover a comprehensive collection of open data catalogs worldwide, encompassing both governmental and non-governmental sources. Browse or search through a vast array of datasets to find valuable information for your research.
- Datacite
Access a repository of openly available datasets accessible online. Easily navigate through datasets by subjects, publishers, and descriptions, with direct links to dataset homepages for further exploration.
- Dryad
A meticulously curated resource dedicated to making research data openly discoverable, reusable, and citable. With a broad spectrum of data types, Dryad serves as a reliable repository for diverse research datasets.
- Google Public Data
Utilize Google’s freely available tool to explore a wide range of public datasets. Discover, import, save, and link datasets effortlessly, leveraging powerful tools to enhance your research endeavors.
- Harvard Dataverse Network
Access an extensive network housing over 50,000 research studies, providing researchers with an open platform to share and access valuable scientific data.
- Qualitative Data Repository
A specialized archive dedicated to storing and sharing digital data generated through qualitative and multi-method research in social sciences and related disciplines. Access accompanying documentation alongside qualitative data for comprehensive research exploration.
- Figshare
Figshare serves as a versatile repository where researchers can share their research outputs in a citable, shareable, and discoverable manner, fostering collaboration and knowledge dissemination.
- Re3 Data
A global registry of research data repositories spanning various academic disciplines. Promoting a culture of sharing and increased access to research data, Re3 Data facilitates permanent storage and accessibility of datasets for researchers, funding bodies, publishers, and scholarly institutions. Explore the registry to enhance visibility and access to valuable research data across disciplines.
Datasets for Learning Purposes
- Kaggle
Kaggle, a for-profit company, offers data forecasting services for the energy industry and hosts predictive modeling competitions. Engage with a team to participate in competitions and challenge yourselves to hone your data analysis skills.
- SAGE Research Methods: Datasets
This resource provides practical guides to data analysis, featuring peer-reviewed datasets and tools for data management. Ideal for learning and practicing data analysis techniques, including data cleaning and normalization.
- Sociology Data Set Server
Access datasets from St. Joseph’s University’s Department of Sociology, offering valuable resources for sociological research and analysis.
- SPSS Data Page
East Carolina University’s Department of Psychology, under Dr. Karl L. Wuensch, offers datasets suitable for SPSS analysis, providing valuable resources for psychology students and researchers.
- SPSS Data Sets
Butler University’s Department of Psychology, led by Dr. Roger J. Padgett, offers SPSS datasets for educational purposes, aiding students in gaining practical experience in statistical analysis.
- Statistical Reference Datasets
The National Institute of Standards and Technology provides reference datasets for statistical analysis, offering reliable resources for research and educational purposes.
- Statistics for Psychology
Explore datasets from the University of Bath’s Department of Psychology, curated by Dr. Ian Walker, to enhance understanding and application of statistical methods in psychology research.
- Teaching with Data
While not offering downloadable datasets, this platform offers excellent resources for locating datasets and utilizing data in educational settings, providing valuable guidance for educators and students alike.
- UCI Machine Learning Repository
Primarily focused on computer sciences, this repository also offers social sciences datasets. Each dataset includes cited references, making it a valuable resource for interdisciplinary research and learning.
- V7 Open Datasets
Access over 500 high-quality datasets through this open-access searchable platform, offering diverse datasets suitable for learning, research, and analysis across various fields.
Tools for Data Analysis
- National Map The National Map website offers datasets showcasing U.S. government data through various map tools. These include The National Atlas of the United States, U.S. Topo, Historical Topographic Map Collection, and the National Map Viewer, providing comprehensive geographic insights.
- Nesstar (Norwegian Social Science Data Services) Nesstar is an open-access, web-based tool designed for publishing and analyzing data, particularly in the social sciences. Its user-friendly interface facilitates data exploration and analysis, making it a valuable resource for researchers and analysts.
- OpenRefine Formerly known as Google Refine, OpenRefine is a versatile, free tool tailored for intermediate to advanced users dealing with large datasets. It offers multiple options for data cleaning, transformation, and reconciliation, empowering users to efficiently manage and preprocess their data.
- Social Explorer Social Explorer empowers users to manipulate demographic and economic data from various sources, enabling them to create customized maps, interactive visualizations, and more. While the limited free version provides access to data from the 2000 US Census, the platform offers subscription-based plans for broader data access.
- Statwing Statwing offers a limited free version for data analysis and visualization, catering to users seeking to gain insights from their datasets. It provides intuitive tools for exploring and visualizing data, simplifying the analysis process for users across different skill levels.
- Tableau Public Tableau Public is a free, robust tool for visualizing data in diverse design options. With its user-friendly interface and powerful visualization capabilities, Tableau Public allows users to create interactive and engaging visualizations, making it ideal for sharing insights with a broader audience.
These tools offer a range of functionalities to suit different analytical needs, empowering users to extract valuable insights and unlock the potential of their datasets.
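For readers who prefer a script to a point-and-click tool, the kind of basic cleaning mentioned above (trimming stray whitespace, normalizing values, dropping incomplete and duplicate rows) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not a replacement for the tools listed; the column names and sample values are invented for the example.

```python
import csv
import io

# Made-up survey extract with messy spacing, inconsistent case,
# a missing value, and a duplicate row.
RAW = """state, respondents ,year
 Ohio ,120,2019
ohio,120,2019
Texas,,2019
 Maine,85,2019
"""


def clean_rows(csv_text):
    """Trim whitespace, normalize state names, drop incomplete and duplicate rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]
    seen = set()
    cleaned = []
    for values in reader:
        if not values:
            continue  # skip blank lines
        row = dict(zip(header, (value.strip() for value in values)))
        row["state"] = row["state"].title()
        if not row["respondents"]:
            continue  # skip rows with no respondent count
        key = tuple(row[name] for name in header)
        if key in seen:
            continue  # skip exact duplicates after normalization
        seen.add(key)
        row["respondents"] = int(row["respondents"])
        cleaned.append(row)
    return cleaned


cleaned = clean_rows(RAW)
for row in cleaned:
    print(row["state"], row["respondents"], row["year"])
```

Whether to drop or fill in rows with missing values depends on the analysis; dropping them here is just the simplest choice for a sketch.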
Health Dataset Sites
Diseases
- BRFSS (Behavioral Risk Factor Surveillance System) BRFSS conducts health-related telephone surveys to gather state-level data on various aspects of U.S. residents’ health. It focuses on risk behaviors, chronic health conditions, and preventive service utilization, providing insights crucial for public health interventions and policies.
- CDC (Centers for Disease Control and Prevention) Data The CDC offers a wealth of statistics covering major diseases, enabling researchers and healthcare professionals to analyze trends, assess prevalence rates, and identify emerging health threats. These datasets serve as foundational resources for epidemiological research and public health decision-making.
- CDC Wonder (Wide-ranging Online Data for Epidemiologic Research) CDC Wonder is a comprehensive platform providing access to a wide array of data on diseases, mortality, and prevention measures. It offers a user-friendly interface for querying and retrieving information, making it a valuable tool for conducting epidemiological studies and surveillance activities.
- World Health Organization (WHO) Global Health Observatory (GHO) The WHO GHO provides a vast repository of global health data, including statistics on diseases, health systems, and risk factors. With its extensive collection of datasets and indicators, it serves as a valuable resource for monitoring health trends worldwide and informing evidence-based decision-making.
- National Institutes of Health (NIH) Data Sharing Repositories NIH hosts several data sharing repositories focused on specific diseases and research areas, such as cancer, infectious diseases, and mental health. These repositories facilitate collaboration, data reuse, and scientific discovery by providing access to curated datasets and related resources.
- Global Burden of Disease Study (GBD) The GBD study, conducted by the Institute for Health Metrics and Evaluation (IHME), produces comprehensive estimates of global disease burden, mortality, and disability. Its datasets offer insights into the leading causes of morbidity and mortality worldwide, informing global health priorities and policies.
By leveraging these datasets and resources, researchers, policymakers, and healthcare professionals can gain deeper insights into disease patterns, risk factors, and health outcomes, ultimately contributing to improved prevention, diagnosis, and treatment strategies.
Hospitals and spending
- American Hospital Association (AHA) The AHA conducts an annual survey covering hospitals across the United States. This survey provides comprehensive data on various aspects, including the number of government hospitals, bed counts, and other vital metrics essential for understanding the landscape of healthcare facilities.
- American Hospital Directory® (AHD) The American Hospital Directory® offers a wealth of data, statistics, and analytics for over 7,000 hospitals nationwide. It aggregates information from diverse sources such as Medicare claims data, hospital cost reports, and commercial licensors. While AHD.com® is not affiliated with the AHA, its data are evidence-based and provide valuable insights into hospital operations and spending.
- Agency for Healthcare Research and Quality (AHRQ) AHRQ’s Healthcare Cost and Utilization Project (HCUP) offers HCUPnet, a free online query system. HCUPnet provides access to a wide range of health care statistics, including data on hospital inpatient, emergency department, and ambulatory care settings. Additionally, it offers population-based health care data at the county level, facilitating in-depth analyses of hospital utilization and spending patterns.
- HCUPnet HCUPnet, based on data from the Healthcare Cost and Utilization Project (HCUP), offers an online query system for accessing health care statistics. Users can retrieve data on hospital inpatient, emergency department, and ambulatory care settings, as well as population-based health care data for counties. This platform serves as a valuable resource for researchers, policymakers, and healthcare professionals seeking insights into hospital utilization and spending trends.
- Medicare Payment Data The Centers for Medicare & Medicaid Services (CMS) provides access to Medicare payment data, offering insights into hospital spending patterns and reimbursement rates. Researchers can analyze Medicare claims data to understand payment trends, healthcare utilization, and costs associated with hospital services.
- State Health Departments Many state health departments collect and publish data on hospitals within their jurisdictions. These datasets may include information on hospital finances, utilization rates, quality measures, and more. By accessing state-level data, stakeholders can gain insights into regional variations in hospital spending and performance.
- Healthcare Financial Management Association (HFMA) HFMA offers resources and publications related to healthcare finance, including data and insights on hospital spending trends. Their research reports, surveys, and educational materials provide valuable information for healthcare finance professionals and policymakers seeking to understand and address financial challenges facing hospitals.
By utilizing these diverse sources of data and statistics, stakeholders can gain a comprehensive understanding of hospitals and hospital spending, enabling informed decision-making, policy development, and resource allocation in the healthcare sector.
Medicaid and Medicare
- CMS.gov The Centers for Medicare & Medicaid Services (CMS), a division of the Department of Health and Human Services (HHS), offers a comprehensive repository of research, data, and statistics on Medicare and Medicaid. Through CMS.gov, stakeholders can access a wealth of information on program enrollment, utilization, expenditures, quality measures, and more. Researchers, policymakers, and healthcare professionals rely on CMS data to inform policy decisions, evaluate program performance, and drive quality improvement initiatives in Medicare and Medicaid.
- Medicare.gov Medicare.gov provides valuable resources for comparing the quality of care provided by hospitals and other healthcare facilities. Through tools such as Hospital Compare, users can access data on hospital performance indicators, including mortality rates, readmission rates, patient experience ratings, and adherence to clinical guidelines. This information empowers patients, caregivers, and healthcare providers to make informed decisions about healthcare services and choose high-quality providers.
- Medicaid.gov Medicaid.gov serves as the official website for the Medicaid program, offering a wealth of information on program eligibility, benefits, financing, and enrollment. While the website primarily focuses on program administration and policy guidance, it also provides access to research reports, data briefs, and statistical resources related to Medicaid. Stakeholders can leverage these resources to better understand Medicaid’s role in providing healthcare coverage to low-income and vulnerable populations and assess program performance at the state and national levels.
- State Medicaid Websites Each state operates its own Medicaid program, and many states maintain dedicated websites that offer data and statistics on Medicaid enrollment, expenditures, and health outcomes. These state-specific resources provide valuable insights into the implementation and impact of Medicaid programs within individual jurisdictions. Researchers, policymakers, and advocates can utilize state Medicaid data to conduct comparative analyses, evaluate program effectiveness, and identify areas for improvement in healthcare delivery and coverage.
- Medicare Payment Data The CMS also publishes Medicare payment data, which includes information on payments made to healthcare providers and suppliers for services rendered to Medicare beneficiaries. These datasets offer transparency into Medicare spending patterns, reimbursement rates, and utilization trends across different types of healthcare services and provider settings. Researchers and policymakers use Medicare payment data to monitor program integrity, identify potential fraud and abuse, and assess the efficiency and effectiveness of Medicare payment policies.
Multi-topic
- ProPublica: ProPublica offers a diverse range of healthcare datasets covering various aspects of the healthcare system. These datasets encompass information on Medicare utilization and spending, treatment outcomes, and nursing home quality metrics. Researchers, policymakers, and healthcare professionals can utilize ProPublica’s datasets to conduct analyses, monitor healthcare trends, and identify opportunities for quality improvement in the delivery of healthcare services.
- HealthData.gov: HealthData.gov serves as a central repository for over 50 datasets, with a predominant focus on health-related topics, particularly those related to the COVID-19 pandemic. These datasets encompass a wide array of COVID-19 metrics, including case counts, testing data, vaccination rates, and healthcare facility capacity. Researchers, public health officials, and policymakers leverage HealthData.gov to access timely and comprehensive data for tracking the spread of COVID-19, assessing public health interventions, and informing evidence-based decision-making.
- Society of General Internal Medicine (SGIM): The Society of General Internal Medicine offers a curated list of public datasets spanning various topics relevant to internal medicine and healthcare. These datasets cover a broad spectrum of healthcare-related issues, including clinical outcomes, healthcare utilization patterns, health disparities, and patient satisfaction. Researchers and healthcare professionals affiliated with SGIM can access these datasets to support research endeavors, inform clinical practice guidelines, and advance the understanding of internal medicine topics.
- Data.gov: Data.gov is a comprehensive platform that hosts thousands of datasets across multiple domains, including healthcare, environmental science, energy, and public safety. While healthcare datasets on Data.gov cover a broad range of topics beyond COVID-19, they provide valuable insights into healthcare delivery, health outcomes, population health, and healthcare disparities. Researchers, policymakers, and analysts leverage Data.gov to access a wealth of open data for conducting research, developing data-driven policies, and fostering innovation in various sectors.
- CDC Data and Statistics: The Centers for Disease Control and Prevention (CDC) offers an extensive collection of datasets and statistical resources covering a wide range of public health topics. These datasets include epidemiological data, surveillance reports, health behavior surveys, and vital statistics. Healthcare professionals, researchers, and policymakers utilize CDC data and statistics to monitor disease trends, track public health indicators, and inform evidence-based interventions aimed at improving population health and reducing health disparities.
Non-Profit Hospitals
The IRS initiated the Hospital Compliance Project (Project) in May 2006 with the aim of investigating nonprofit hospitals’ adherence to regulations regarding community benefit and executive compensation reporting. As part of this initiative, the IRS distributed a detailed compliance check questionnaire to 544 nonprofit hospitals and meticulously analyzed the responses provided by these institutions.
The primary focus of the Project was to assess how nonprofit hospitals fulfill their obligations concerning community benefit activities and the disclosure of executive compensation. By examining the information provided by these hospitals, the IRS sought to gain insights into the practices and procedures employed by nonprofit healthcare organizations in these critical areas.
Through the comprehensive analysis of the questionnaire responses, the IRS aimed to ensure transparency and accountability within the nonprofit hospital sector. By scrutinizing executive compensation practices and evaluating the extent of community benefit activities, the IRS aimed to uphold regulatory compliance standards and promote the effective utilization of resources within the nonprofit healthcare sector.
The findings of the Hospital Compliance Project are instrumental in guiding regulatory oversight efforts and informing policy decisions aimed at enhancing transparency, accountability, and effectiveness in the nonprofit hospital sector. The insights gained from this initiative contribute to ongoing efforts to strengthen governance practices, improve reporting mechanisms, and ensure that nonprofit hospitals continue to fulfill their vital role in serving their communities.
Healthcare
The CDC’s National Center for Health Statistics (NCHS) provides access to a wealth of data derived from national health surveys. These surveys serve as vital tools for collecting comprehensive health information across various demographic groups and geographic regions within the United States.
Through the NCHS, researchers, policymakers, healthcare professionals, and the general public can access a wide range of health-related datasets, covering diverse topics such as chronic diseases, health behaviors, healthcare access and utilization, vital statistics, and much more.
These datasets play a crucial role in advancing public health research, epidemiological studies, health policy development, and healthcare decision-making. By analyzing and interpreting the data collected through national health surveys, stakeholders can identify health trends, disparities, and emerging issues, as well as evaluate the effectiveness of public health interventions and healthcare programs.
Moreover, the NCHS facilitates the dissemination of health data through various platforms and tools, including online databases, interactive dashboards, statistical reports, and data visualization resources. This enables users to explore, analyze, and interpret health-related information to inform evidence-based decision-making and improve population health outcomes.
Overall, the CDC’s National Center for Health Statistics serves as a critical resource for accessing reliable and comprehensive health data, empowering stakeholders with the information they need to address public health challenges and promote health equity across the nation.
Searching for Datasets Online
Google Dataset Search
Google Dataset Search functions as a specialized search engine designed to explore metadata for millions of datasets scattered across various repositories on the internet. Much like Google Scholar, Dataset Search enables users to locate datasets regardless of their hosting platform, be it a publisher’s website, a digital library, or an individual author’s webpage.
This tool caters to a diverse audience, encompassing researchers, policymakers, journalists, and anyone else in need of scientific, governmental, or journalistic data. By simply inputting relevant keywords or topics, users can sift through the search results and access the desired dataset on the repository provider’s website.
Moreover, Dataset Search offers the convenience of persistent links to datasets, which can be accessed by clicking on the share icon. This feature allows users to easily share or bookmark dataset locations for future reference or collaboration.
Beyond Dataset Search, a regular Google web search can help locate open data for a specific U.S. state or country. Enter the keywords “open data” followed by the name of the state or country. For instance, searching for “open data California” finds datasets related to the state of California.
Additionally, Google can be used to find datasets on specific topics by including relevant keywords such as “raw data” or “datasets” in the search query. For example, if someone is interested in researching barriers to AI adoption, they could search for “barriers to AI adoption raw data” or “barriers to AI adoption datasets”.
Furthermore, Google allows users to search for specific file types, such as Excel files (.xls), which may contain raw data. By including the “filetype:xls” modifier in the search query, individuals can narrow down their search results to Excel documents. For instance, searching for “artificial intelligence filetype:xls” would retrieve Excel files related to artificial intelligence.
In summary, Google provides several effective methods for finding open data and datasets, including searching by state or country, using relevant keywords, and specifying file types. This versatility makes Google a valuable tool for researchers, analysts, and anyone else seeking access to data for analysis and exploration.
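The three query patterns described above can be sketched in a few lines of Python. This is a minimal illustration, not an official Google interface: the helper names (`open_data_query`, `topic_queries`, `filetype_query`, `search_url`) are hypothetical, and the only assumption about Google itself is that a search can be opened through the familiar `https://www.google.com/search?q=...` address.

```python
from urllib.parse import urlencode

# Assumption: the ordinary Google web search address, opened in a browser.
GOOGLE_SEARCH_URL = "https://www.google.com/search"


def open_data_query(place: str) -> str:
    """Build a query for open data about a U.S. state or a country."""
    return f"open data {place}"


def topic_queries(topic: str) -> list:
    """Build the two keyword variants suggested for topic searches."""
    return [f"{topic} raw data", f"{topic} datasets"]


def filetype_query(topic: str, extension: str = "xls") -> str:
    """Restrict results to one file type using the filetype: modifier."""
    return f"{topic} filetype:{extension}"


def search_url(query: str) -> str:
    """Turn a query string into a URL that can be pasted into a browser."""
    return f"{GOOGLE_SEARCH_URL}?{urlencode({'q': query})}"


if __name__ == "__main__":
    queries = [
        open_data_query("California"),
        *topic_queries("barriers to AI adoption"),
        filetype_query("artificial intelligence"),
    ]
    for query in queries:
        # Print each query next to the link that runs it.
        print(f"{query}\n  {search_url(query)}")
```

Running the script prints each example query from this section alongside a link that can be opened in a browser. Passing a different extension, such as `filetype_query("artificial intelligence", "csv")`, narrows results to another file format in the same way.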
Locating an Original Dataset from a Journal Article
The ACM Digital Library serves as a comprehensive research, discovery, and networking platform in the field of computing and technology. This database offers access to a wide range of resources including journals, conference proceedings, technical magazines, newsletters, and books.
Purposefully curated for computing and technology research topics, the ACM Digital Library is an indispensable tool for scholars, researchers, and professionals seeking authoritative information in their respective fields.
One of its notable features is the capability to search for datasets associated with research articles. Here’s how you can locate datasets within the ACM Digital Library:
1. Access the ACM Digital Library database from the A-Z Databases List.
2. Utilize the search box to enter relevant keyword terms and locate research articles pertaining to your topic.
3. Datasets from research articles may be available as Zip files or Txt files. To filter search results, navigate to the left-hand side of the search results page and under “Refine by Publications,” limit your results to Zip or Txt under Content Formats.
4. Ensure that icons for “Artifacts” are included in the search results. Artifacts encompass digital objects integral to the study, such as software systems, scripts used in experiments, input datasets, raw data collected during the experiment, or scripts employed in result analysis.
Look for badges denoting the availability of datasets, such as “Artifacts Available,” “Artifacts Evaluated-Functional,” and “Artifacts Evaluated-Reusable.”
5. To access the artifacts associated with a specific search result, click on the article link, which will direct you to the resource page. Then, navigate to the “Source Material” tab and review the content links under “Linked Artifacts” to access the datasets.
By following these steps, users can effectively locate and access datasets associated with research articles within the ACM Digital Library database, enhancing their research capabilities and facilitating data-driven analysis.
IEEE Xplore Digital Library
IEEE Xplore Digital Library is a comprehensive platform offering full-text peer-reviewed journals, transactions, magazines, conference proceedings, and published standards covering various disciplines such as electrical engineering, computer science, and electronics. It provides users with valuable insights into the latest developments and research in technology-related fields.
One of the standout features of IEEE Xplore is its inclusion of datasets, allowing users to access and explore the raw data used in research articles. This feature enhances the platform’s utility for researchers and scholars interested in analyzing and understanding the underlying data behind published studies.
To make the most of the dataset search feature:
1. Access the IEEE Xplore Digital Library from the A-Z Databases List available through your institution’s login portal.
2. Utilize the search box to enter relevant keywords related to your research topic.
3. On the search results page, use the filters located on the left-hand side to select “Datasets” under Supplemental Items.
4. Apply the filter to refine your search results, which will now include research articles along with associated datasets.
5. Look for the “dataset icon” alongside search results, indicating the availability of datasets associated with specific articles.