Module 3: Data and Data Sources

Learning Objectives

What Counts as Data?

Data can be defined as anything that can be perceived through observation (e.g., symbols, stimuli, events, and properties of objects) but that must be transformed or aggregated in some way in order to have meaning [1]. Thus, when we talk about data, we are essentially talking about components that have potential but have yet to become information (data + meaning). Data scientists create new information by analyzing large sets of data. For instance, a list of numbers may represent the prices of cars sold in 2024; once we calculate the mean and standard deviation, we have information about the average price of a car sold in 2024 and how much prices varied around that average.
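The car-price example can be sketched in a few lines of Python. This is only an illustration: the prices below are made-up values, not real 2024 sales figures, and the variable names are hypothetical. It uses the sample standard deviation, which is one reasonable choice when the list is treated as a sample of all cars sold.

```python
from statistics import mean, stdev

# Hypothetical sale prices in US dollars (made-up values for illustration only).
# On their own, these numbers are just data.
car_prices = [22000, 25000, 27000, 30000, 46000]

# Aggregating the raw numbers turns them into information.
average_price = mean(car_prices)   # the typical price
price_spread = stdev(car_prices)   # sample standard deviation: how much prices vary

print(f"Average price: ${average_price:,.0f}")
print(f"Standard deviation: ${price_spread:,.0f}")
```

The raw list says little by itself; the two summary values are what let us say something about car prices as a whole.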

Data comes in many shapes and forms. In research, we often refer to data as “quantitative” (numbers or measurements) or “qualitative” (text, audio, video). These are the two major categories of data used by researchers, and each comes with its own techniques for analyzing the data and creating new meaning and understanding of our world. Data can also be classified by how it is organized. It can be structured (e.g., organized in a database with proper metadata) or unstructured (e.g., a folder of video recordings with no accompanying description). Data can also be arranged in massive datasets (big data) or come in small quantities. These attributes help determine what information can be derived from the data.

When teaching others about data, we might find it helpful to show some examples (e.g., datasets) and contrast them with other forms of representation, such as information. One could, for instance, show how individual words carry very little information by themselves (e.g., “this” and “is” and “a” and “sentence”) but carry significantly more when combined. One can similarly show how pervasive statistics are in society and discuss how individual data points (e.g., the perspective of a single person), aggregated at large scale (thousands of people), lead to the creation of these statistics (e.g., polling data).
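For a classroom demonstration, the step from individual data points to a polling statistic can be shown with a short Python sketch. The responses below are invented for illustration, and a real poll would involve far more respondents plus sampling and weighting decisions that this sketch ignores.

```python
from collections import Counter

# Hypothetical answers from eight individuals to a yes/no question
# (invented values; each entry is a single data point).
responses = ["yes", "no", "yes", "undecided", "yes", "no", "yes", "yes"]

# Count how often each answer appears, then convert counts to shares.
counts = Counter(responses)
total = len(responses)
shares = {answer: count / total for answer, count in counts.items()}

for answer, share in shares.items():
    print(f"{answer}: {share:.1%}")
```

No single response tells us much, but the shares computed from all of them are the kind of statistic learners encounter in news coverage of polls.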

Discussing Data Sources

The ability to use data effectively begins with an understanding of where data comes from and why it was initially collected. In the higher education context, our data may come from several sources, including internal data like enrollment trends, course evaluations, student success metrics, and library usage statistics. Often, we are aware this data exists and may even use it in limited cases for reporting and advocacy, but we do not come close to tapping the full potential of what it can tell us [2]. Additionally, we might find meaningful data from the U.S. Census Bureau, the National Center for Education Statistics, or our local health departments. These are valuable sources of data for faculty, staff, and administrators.

Our research faculty and students may seek data from places like data repositories or create their own data through empirical research. Survey responses, interview transcripts, and citation counts are all research data. Additionally, all members of the campus community interact with data in the form of statistics and visualizations - in news stories, YouTube and TikTok videos, and in their textbooks.

Regardless of the source of the data, there are some common questions that should be asked: Who collected the data? For what purpose? How was it collected, and is the methodology transparent? Who or what might be excluded from the dataset? The answers to these questions are necessary to build trust in the data and avoid misinterpretation. In our data literacy activities, we can have students answer these questions for themselves, perhaps by selecting a source of data available online and then researching how and why this data was created and what the creator says about its meaning.

Evaluating Data Quality

It is also important that members of our campus community understand that not all data is equally trustworthy or useful, which is why the ability to evaluate data quality is a fundamental concept of data literacy [3]. Some of the questions we want to ask when evaluating data quality are:

If we have survey data that purports to tell us whether a student is data literate, but none of the survey questions actually asked about data, or the survey left out a major component of data literacy like data quality, then the quality of that data will be quite poor. Conclusions drawn from the data will likely be inaccurate. We must encourage learners to consider and answer these questions each time they interact with data to avoid fatal errors. A bonus of developing these skills is that they benefit not only data literacy but information literacy as well, since both depend on critically evaluating sources and their quality.

Case Example

Colgate University Libraries has created a research guide for Data Sources and Tools, which introduces common data sources used by members of their campus community. They also offer a series of tips and guidance on how to evaluate these sources and translate them into effective research or teaching tools. This research guide is a great resource for providing the campus with information about data sources (as well as access to said sources) in a format that does not require constant monitoring and updates. You can simply create a guide like this one and then share the link at presentations and events - or you can go further and link it to other resources to build a data literacy toolkit of your own for your school.

Reflection Activity

In a notebook or digital document, reflect on:

Summary

Understanding what data is, the sources from which it might originate, and how to evaluate its quality are foundational skills for data literacy. By helping members of your campus community reflect critically on data sources and their limitations, you can empower them to make more thoughtful, evidence-based decisions. Teaching these concepts is much more closely aligned with teaching information literacy than with teaching advanced data science or statistics - it is uniquely aligned to your skills as an academic librarian!

Additional Resources

References

  1. Rowley, J. (2007). The wisdom hierarchy: Representations of the DIKW hierarchy. Journal of Information Science, 33(2), 163-180.
  2. Romero, C., & Ventura, S. (2010). Educational data mining: A review of the state of the art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 40(6), 601-618.
  3. Koltay, T. (2016). Data governance, data literacy and the management of data quality. IFLA Journal, 42(4), 303-312.