Book Club : <Calling Bullshit: The Art of Skepticism in a Data-Driven World>

Sep 2, 2022

I have the honor of hosting a bi-weekly book club at work. The book we are covering is Calling Bullshit: The Art of Skepticism in a Data-Driven World. I’ll be posting discussion questions prior to each meeting here, so feel free to follow along even if you are not part of our book club. I’m excited to discuss the book with my fellow data scientists and hear their thoughts!

Chapter 1 & 2

Liars are usually punished because they lose people’s trust, while people who bullshit are able to get away with it. Is there something we can do to make people who bullshit face some consequences?
“Headlines are trying to make you click, rather than summarizing the story, and the ones that draw the most clicks are those that promise you an emotional experience.” Headlines have more or less become the ad for the story, trying to get people’s attention. Had you noticed this change in the function of headlines before reading this book?
The Internet as a public sector: financial institutions face all kinds of regulation, more so after the 2008 financial crisis, while there’s close to zero regulation for the Internet, where the majority of people consume and produce information nowadays. What kind of regulation do you think we should prioritize, based on your knowledge of regulation for financial institutions?
“When sharing news on social media, you’re using your own social capital to back the information.” Would people share more responsibly if they knew this was the case? How can we make people aware of this?
The authors think the most powerful way to protect ourselves from misinformation and disinformation online is education. What are your thoughts?
The authors argue that higher education in STEM disciplines doesn’t provide enough training in critical thinking to prepare people for identifying bullshit. Does this align with your experience? Do you agree or disagree with this statement?

Chapter 3 & 4

“Quantitative evidence seems to carry more weight than qualitative arguments” — Do you agree with this statement? If yes, why do people generally find quantitative evidence to be more believable?
“It’s human nature trying to find patterns and/or draw conclusions.” Is there something we can do to prevent ourselves from jumping to conclusions when there’s not enough evidence?
Have you ever seen a data visualization that’s misleading, either at work, in the news or in academia?
As a data scientist, have you ever encountered selection bias, either in your own work or in analysis presented by others?

Chapter 5 & 6

Under what circumstances should we use percentages or percentage points to present the difference in numbers?
“When a measure becomes a target, it ceases to be a good measure.” (Goodhart’s law) — We often set goals/KPIs/OKRs in business settings. Have you seen examples where the chosen metrics led to a change in people’s behaviors? What can we do to avoid these situations?
“If something seems too good or too bad to be true, it probably is.” Have you seen results in either your own or other people’s analysis where the numbers seemed too good or too bad to be true? What were the underlying causes of those numbers appearing so good or bad?
Observation selection effects occur when there’s an association between the very presence of the observer and the variable that the observer reports — Have you ever encountered an example of this effect?
Do you have an example of data censoring to share (e.g. customer churn)?
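To make the percentages vs. percentage points question above concrete, here is a minimal Python sketch. The two rates (4% and 5%) and the function names are made up for illustration and do not come from the book.

```python
def percentage_point_change(old_rate: float, new_rate: float) -> float:
    """Absolute difference between two rates, expressed in percentage points."""
    return (new_rate - old_rate) * 100


def relative_percent_change(old_rate: float, new_rate: float) -> float:
    """Change of new_rate relative to old_rate, expressed in percent."""
    return (new_rate - old_rate) / old_rate * 100


# Hypothetical example: a conversion rate moves from 4% to 5%.
old_rate, new_rate = 0.04, 0.05

print(f"{percentage_point_change(old_rate, new_rate):.1f} percentage points")  # 1.0 percentage points
print(f"{relative_percent_change(old_rate, new_rate):.1f} percent")            # 25.0 percent
```

The same change can be reported as "up 1 percentage point" or "up 25 percent", which is why it matters to say which one is being used.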

Note: The examples in Chapter 6 are hilarious. I highly recommend reading through this chapter for a good laugh (and to learn something, too)!

Chapter 7

Among the various examples of “glass slippers” mentioned in this chapter, which graphic type/data visualization do you find to be the most common? Which example do you find to be the most intriguing or surprising?
The authors suggest that a bar graph needs to include zero on the vertical axis, whereas a line graph doesn’t. Do you agree with this statement? Why or why not?
The authors showed a great example of how “binning” can affect the story conveyed by a bar chart. Do you think the bin size should remain the same throughout the chart? What are your thoughts on choosing the appropriate bin size?
Do you agree with the principle of proportional ink as mentioned in this chapter? Why or why not? Are there any exceptions where the principle need not apply?
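For the binning question above, a small Python sketch may help frame the discussion. The ages and bin edges below are invented for illustration (they are not the book's example), and `bin_counts` is a hypothetical helper.

```python
from bisect import bisect_right


def bin_counts(values, edges):
    """Count values falling in the half-open bins [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts


# Hypothetical data: ages of 15 survey respondents.
ages = [21, 23, 24, 26, 28, 31, 33, 35, 38, 41, 44, 52, 58, 63, 67]

equal_bins = [20, 30, 40, 50, 60, 70]  # five bins, each 10 years wide
unequal_bins = [20, 25, 30, 70]        # two 5-year bins and one 40-year bin

print(bin_counts(ages, equal_bins))    # [5, 4, 2, 2, 2]
print(bin_counts(ages, unequal_bins))  # [3, 2, 10]
```

With equal bins, the youngest group has the tallest bar. With unequal bins, the last bar is the tallest only because that bin is eight times wider than the others, so the same data tells a different story.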

Chapter 8

The authors emphasized the importance of good data when building models. What does this mean for data scientists? How, if at all, does it affect their role?
What do you think about the results of the gaydar example in this chapter?
The book mentioned that most studies don’t take the extra step of “looking into” how machine learning models make decisions — What are the pros and cons of making this step mandatory?
The authors mentioned the tradeoff between transparency and efficiency. Are there any examples where algorithmic transparency matters little as long as the outputs are highly accurate?

Chapter 9

For those who have worked in academia, have you ever seen or heard stories about p-hacking?
Do you agree or disagree with John Ioannidis’s argument in “Why Most Published Research Findings Are False”? Why or why not?
Have you ever received spam emails from predatory journals? What was the experience like?

Chapter 10 & 11

Has anyone ever used “reverse image lookup” before? What prompted you to use this search function?
Has anyone ever heard of fact-checking organizations or visited their websites before? What was the experience like?
As deepfake technology becomes more advanced, how can we protect ourselves from scams that rely on it?
Has anyone ever deployed a null model to validate their analytical results, either at work or in academia?
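As a starting point for the null model question above, here is a minimal Python sketch of one common way to build a null model: a permutation test that shuffles group labels to see how large a difference chance alone would produce. This is an assumption about what such a check could look like, not necessarily the approach the authors had in mind. The function name, the data, and the parameters are all hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Share of label shuffles whose mean difference is at least as large as observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # null model: group labels carry no information
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical data: task completion times (seconds) under two page designs.
design_a = [12.1, 11.8, 13.0, 12.4, 11.9, 12.7]
design_b = [12.9, 13.4, 12.8, 13.6, 13.1, 12.5]

print(permutation_p_value(design_a, design_b))
```

A small value suggests the observed gap would be unusual if the two designs were interchangeable. A large value suggests the "effect" could easily be noise.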

Related resources

Deepfake video of Elon Musk
https://www.whichfaceisreal.com

News about technology companies trying to combat misinformation online

The following excerpt comes directly from The Batch, the DeepLearning.AI newsletter.

Misinformation Recognition

Google updated a key model behind the algorithm that ranks its search results to respond to the flood of misinformation on the web.

What’s new: The search giant aims to minimize the prominence of falsehoods in the information it presents near the top of search results, which it calls snippets.
How it works: Google revised its Multitask Unified Model to verify the accuracy of snippets.

The model evaluates how well the top results agree. It can compare pages on a given topic even if they use different phrases or examples.
If the model doesn’t have high confidence in available sources, instead of a snippet, it generates an advisory such as, “It looks like there aren’t many great results for this search.”
The model also recognizes misleading questions such as, “When did Snoopy assassinate Abraham Lincoln?” The update cuts inappropriate snippets in response to such queries by 40 percent.

Behind the news: Google isn’t the only major website to task AI with filtering the torrent of disinformation.

Facebook uses multimodal learning to detect misinformation related to COVID-19.
In 2020, YouTube deployed a classifier that downgraded recommendations for videos that contain conspiracy theories and anti-scientific misinformation.

Written by The Adventures of Ellie