Review of Everybody Lies by Seth Stephens-Davidowitz

Everybody Lies

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz is a compilation of research that extends his PhD dissertation. Stephens-Davidowitz, a data scientist, draws his data primarily from Google searches. Using a variety of statistical methods that the book does not elaborate on, he examines Google search trends for insights into human behavior and demographics, including geographical patterns, gender disparities, and racial distinctions. He briefly touches on Facebook, Wikipedia, and other sources, but the book centers on Google data. Throughout, he argues that big data and data science could revolutionize psychology, medicine, and the social sciences.

Navigating Correlations and Causation

The author highlights some interesting correlations, but he seems to fall into the trap of claiming causation where only a correlation has been shown. It is unclear whether this stems from oversight, a low estimate of readers’ sophistication, or a preference for sensational impact over sound scientific practice. For instance, he reports a correlation between spouses sharing a friend group and divorce, yet frames it as a causal argument against couples having shared friends.
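To make the concern concrete, here is a minimal simulation sketch of how two variables can be correlated without either causing the other. Everything in it is hypothetical: the variable names, the hidden common factor, and the numbers are my own illustration, not data or methods from the book.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Hypothetical setup: a hidden factor drives both a and b.

    Neither a nor b has any effect on the other.
    """
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    a = [h + rng.gauss(0, 1) for h in hidden]
    b = [h + rng.gauss(0, 1) for h in hidden]
    return hidden, a, b


hidden, a, b = simulate()

# a and b are clearly correlated even though neither causes the other.
r_ab = pearson(a, b)

# Once the hidden factor is subtracted out, the correlation vanishes.
resid_a = [x - h for x, h in zip(a, hidden)]
resid_b = [y - h for y, h in zip(b, hidden)]
r_resid = pearson(resid_a, resid_b)

print(f"correlation of a and b: {r_ab:.2f}")
print(f"correlation after removing hidden factor: {r_resid:.2f}")
```

In this toy setup the raw correlation is substantial, but it disappears once the common factor is accounted for. Search data alone rarely lets us rule out such a factor, which is why a correlation by itself cannot support advice about what couples should do.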

Challenging Assumptions

The author’s examination of the assumption that the NBA is predominantly composed of players from poor backgrounds also warrants scrutiny. The evidence provided to support this belief seems limited, relying heavily on the author’s personal belief and a statement from a coach. However, the text does touch on the idea of using converging lines of evidence, and it suggests that the assumption may really be about players from such backgrounds being overrepresented rather than forming a majority.
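The difference between "overrepresented" and "a majority" is easy to blur, so a small worked example may help. The shares below are invented purely for illustration and are not figures from the book.

```python
def representation_ratio(share_in_group, share_in_population):
    """How many times more common a background is in the group
    than in the general population (1.0 means proportional)."""
    return share_in_group / share_in_population


# Hypothetical numbers, chosen only to illustrate the distinction.
population_share = 0.20  # share of the general population from a poor background
league_share = 0.30      # share of players from a poor background

ratio = representation_ratio(league_share, population_share)
is_overrepresented = ratio > 1.0
is_majority = league_share > 0.5

print(f"representation ratio: {ratio:.2f}")
print(f"overrepresented: {is_overrepresented}, majority: {is_majority}")
```

With these made-up shares, the group is overrepresented by half again its population share and still makes up well under half of the league, so the two claims need separate evidence.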

The Expanding Landscape of Data Science

The book appears to have been published between 2016 and 2018, judging from its references to the 2016 election of Donald Trump. During this period, Big Data and data science were beginning to realize their potential, although the author suggests there is still considerable untapped promise. The potential for major paradigm shifts extends beyond data science, with implications for fields such as psychology, medicine, and the broader social sciences.

Survey Data and Its Limitations

One method that data science may not replace, but can significantly augment, is the survey. It is essential to remember that people often misrepresent themselves or have limited self-awareness when responding to surveys. For that reason, surveys have historically been considered one of the least reliable sources of data. Data science could also be applied to enhance large-scale psychological studies. Many social science studies traditionally rely on small samples of college students, which may limit the generalizability of their findings. With data science, researchers can expand their subject pool and work with far larger samples, ultimately improving the quality of their studies.
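As a rough illustration of why sample size matters, the standard error of an estimated proportion shrinks with the square root of the sample size. The sketch below uses the textbook formula for a simple random sample; the sample sizes and the 50% proportion are hypothetical, not figures from the book.

```python
import math


def standard_error(proportion, sample_size):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(proportion * (1 - proportion) / sample_size)


# Hypothetical scenarios: a small classroom study versus a very large dataset.
small_se = standard_error(0.5, 100)
large_se = standard_error(0.5, 1_000_000)

print(f"n = 100:       standard error = {small_se:.4f}")
print(f"n = 1,000,000: standard error = {large_se:.4f}")
```

A larger sample narrows random error only. It does nothing about systematic bias, such as respondents misreporting or a sample that does not resemble the wider population, so size and honesty of the data are separate questions.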

Racism and Reviewer Skepticism

I first came across this book in the context of President Obama’s election. Many were intrigued by the author’s assertion that racism was a potent predictor of who did not support President Obama. It opened the eyes of numerous individuals to the lingering presence of racism in America. Remarkably, the author mentions that his initial research, particularly the portion addressing racism and its connection to President Obama’s election, was rejected multiple times. The rejections stemmed from the skepticism of some reviewers who couldn’t fathom the persistence of overt racism in the United States.

Balancing Data Access and Responsibility

This book offers a unique perspective that resonates with a recent and ongoing debate, particularly in the United States. In recent years, Facebook has faced significant scrutiny from the government due to reports of the platform granting access to its users’ data to third-party companies, often unbeknownst to the users themselves. While it’s worth noting that disclaimers regarding data usage are typically included in Facebook’s terms of service, the reality is that most people do not read these lengthy documents.

This situation has raised concerns about how personal data is leveraged, and rightfully so. Even though there are terms of service in place, many users are understandably alarmed by how their data is being utilized. The pushback against such practices has been substantial. It’s becoming increasingly evident that the issue extends beyond Facebook alone; third-party companies accessing user data is likely a far more widespread practice than many initially realized.

Data Utilization and Societal Benefits

From one perspective, it’s disconcerting that these third parties can harness our data for potentially manipulative, commercial purposes, exploiting their knowledge of our preferences and behaviors. However, this book sheds light on a potential positive aspect of this type of data access. It demonstrates that such access can serve a constructive purpose, giving us insight into critical issues like developments in public health, responses to child abuse, abortion trends, the prevalence of racism, and even potential terrorist threats.

This nuanced view emphasizes the importance of responsible data usage and the potential for leveraging such access for societal benefits.

The Unfiltered Mind

People tend to keep their innermost thoughts and deepest secrets to themselves, often reluctant to share them with others. However, they are more inclined to enter such personal information into Google, where they believe their data isn’t under specific surveillance and no one is scrutinizing them individually. This dynamic gives us access to the genuine thoughts and concerns that occupy people’s minds. It’s important to note that this data is primarily presented in an aggregated format, reflecting collective patterns rather than any one individual’s searches.

Conclusion

In my assessment, this book offers a thought-provoking exploration of the relationship between data science, human behavior, and societal norms. It underscores the need for caution in inferring causation from correlations, urging us to acknowledge the limitations of such studies.

Moreover, “Everybody Lies” raises the exciting prospect of citizen scientists utilizing data for research. With the creative formulation of questions, everyday individuals can contribute to insightful research, thereby expanding the horizons of knowledge.

The book’s findings, such as the connection between racism and voting patterns, shed light on pertinent issues in contemporary debates. It also resonates with current concerns about data privacy and third-party use, as exemplified by the Facebook data scandal.

Original draft written in February 2019
