Thinking about a Data Deal
By Laurence Kotler-Berkowitz
American Jewish communal organizations have long collected data to better serve their communities. In the Spring 2024 issue of Sources, our colleague Ari Kelman, a Stanford professor of education and Jewish studies, introduced a provocative idea: a bold new “data deal.” What if American Jews, tech companies, Jewish organizations, and researchers joined forces to create a secure, ethical system for using personal data to better understand Jewish life? After all, many people already share data with “big tech” without realizing it. Could we harness that data for communal good? Kelman challenged his colleagues to think about it: “This prospective and partial vision is meant as a provocation to my colleagues in both research and Jewish organizations to think anew about data: how it is produced, and what it means for the American Jewish community.”
Kelman’s essay indeed provoked us to think about his ideas. As social scientists and long-time researchers in the Jewish communal sector, we at Rosov Consulting certainly support and share Kelman’s vision of American Jewish communities and organizations using data to inform policy, evaluate programs, and strengthen Jewish life. We agree that digital data, sometimes referred to as “big data,” offers new opportunities to communal organizations and researchers. We admire his call for our field to innovate and think creatively.
However, we have some key points we’d like to add—and concerns we want to raise about the foundation of his proposal. The vision as proposed misrepresents existing survey research, overestimates what big data alone can achieve, and sidesteps significant methodological challenges.
The use of distinctive Jewish names in survey research
For many years, survey researchers have used distinctive Jewish names, commonly referred to as DJNs, to help identify American Jews for surveys. Kelman is highly critical of the practice. “American Jewish population studies regularly used the presence of ‘distinctive Jewish names’ to estimate population sizes,” he writes. “Despite its obvious problems, the reliability of this practice remained the subject of serious discussion among some scholars of American Jewry well into the 21st century.”
We caution against accepting Kelman’s assessment. Researchers have never used DJNs as the exclusive source of Jewish population estimates because they have always known that would yield biased results. Instead, they utilized DJNs as one of several components of sampling designs that, together with proper survey weights, took advantage of the efficiencies of DJNs, mitigated the biases produced by using DJNs alone, generated valid and reliable survey findings, and did so at significantly reduced costs compared to not using DJNs at all.
Over time, the use of DJNs in sampling has improved dramatically. Starting with a small group of common Ashkenazi surnames, DJNs now include a much more extensive set of Ashkenazi, Sephardi, Mizrachi, Israeli, Russian, and Persian first and last names. Researchers today typically use algorithms that combine this expanded set of DJNs with big data on consumer behaviors, geography, and other information to calculate the likelihood of someone (or some household) being Jewish for sampling purposes. As with DJNs in the past, the use of likely Jewish individuals (and households) today is just one of many components of complex sampling designs and survey weighting that jointly and cost-efficiently generate reliable and valid Jewish population estimates and characteristics.
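To make the idea of likelihood-based sampling concrete, here is a minimal sketch of how a scoring algorithm of this general kind might work. The indicator names, weights, and stratum cutoffs below are entirely hypothetical, invented for illustration; they do not represent any researcher’s actual model.

```python
import math

# Hypothetical indicators and weights -- illustrative only.
WEIGHTS = {
    "djn_surname_match": 2.0,    # surname appears on an expanded DJN list
    "djn_firstname_match": 1.0,  # first name appears on a DJN list
    "jewish_geography": 0.5,     # lives in an area with a sizable Jewish population
    "consumer_signal": 0.5,      # e.g., subscription to a Jewish periodical
}
BIAS = -3.0  # baseline log-odds when no indicator is present

def likelihood_jewish(record: dict) -> float:
    """Toy logistic score: a probability-like value in (0, 1)."""
    z = BIAS + sum(w for key, w in WEIGHTS.items() if record.get(key))
    return 1 / (1 + math.exp(-z))

def stratify(records, cutoffs=(0.25, 0.6)):
    """Assign each record to a low/medium/high-likelihood sampling stratum."""
    strata = {"low": [], "medium": [], "high": []}
    for r in records:
        p = likelihood_jewish(r)
        if p >= cutoffs[1]:
            strata["high"].append(r)
        elif p >= cutoffs[0]:
            strata["medium"].append(r)
        else:
            strata["low"].append(r)
    return strata
```

In a real design, households in the high-likelihood stratum would be sampled at a higher rate for efficiency, while the lower strata would still be sampled (and later weighted) to mitigate the bias of relying on name matches alone.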
It has not been the case, then, that researchers continued to discuss the reliability of DJNs well into the 21st century despite their “obvious problems.” Nor is it the case, as Kelman writes elsewhere, that “the Distinctive Jewish Names strategy has outlived its utility.” Researchers have long understood both the potential biases of DJNs, as well as their advantages when used prudently. Their 21st century efforts have focused on developing better methods, including the use of big data, to effectively leverage DJNs as one of many components of survey sampling. The suggestion that they have done otherwise is simply inaccurate.
Not-so-befuddling survey data about US Jews
In 2020, the Pew Research Center conducted its second nationally representative survey of U.S. Jews[1] (the first was in 2013). Kelman cites a single finding from the 2020 study—that 20% of respondents reported their “religious faith provides them with a great deal of meaning and fulfillment”—and then poses a series of questions: “Is that a lot of people or just a few people? Does it mean that Judaism, which the majority of American Jews claim as their religion, is not actually a popular source of fulfillment or meaning for American Jews? How does it compare to the situation a generation ago? Two generations ago? In the Middle Ages? Is 20% a spiritual trough or a peak? How do Jews compare with other Americans? How does the 20% compare to the 43% of American Jews who find spending time with their pets to be meaningful and fulfilling? How do you measure ‘meaning’ anyhow? The Pew Report is probably the best national study of American Jews we have, and still, it contains some strangeness that can be more befuddling than illuminating.”
Kelman’s questions are designed to raise doubts about the value of Pew’s survey in particular and conventional survey research in general. But Kelman ignores comparative data on the same page of the same report that provide important context and answer some of his own questions. For example, the item on “religious faith” was one of seven items asked about in the same question. Religious faith placed last. The next closest item, “your job, career or education,” nearly doubled it at 38%. The top item, spending time with family, garnered 74%. In addition, the report shows that 40% of U.S. adults as a whole say their religious faith provides them with a great deal of meaning and fulfillment, twice the share of U.S. Jews, while U.S. adults overall match Jewish adults on spending time with their pets (43%) and family (74%).
From just these few additional data points, we can begin to infer that Judaism as a religious faith is not as popular a source of fulfillment or meaning for most American Jews as other factors are, and that it provides less fulfillment and meaning to Jews than other religious traditions do for their adherents. This does not mean, however, that being Jewish is unimportant to American Jews. The same report also shows that 76% of U.S. Jews say being Jewish is very (42%) or somewhat (34%) important to them. Nor does it mean that U.S. Jews understand being Jewish only in religious terms. Again, the same report shows that more than half of U.S. Jews (55%) say being Jewish is mainly about culture while a third (36%) say being Jewish is mainly about religion.
When Kelman’s single data point is put in context with other readily available information from the same survey, we see that it is not so befuddling after all, but rather part of a bigger picture about the role of religion in American Jewish life. We also know from experience that the publicly available Pew survey data file, as well as other surveys and their data files, can give us even more specific answers to Kelman’s questions. All of which is to say that conventional surveys continue to provide us with important information.
But this example also reminds us of something fundamental about data that Kelman’s initial framing elided. We should never expect a single data point—whether from a survey as in this case, or from qualitative methods such as interviews, focus groups, ethnographic observations, or case studies—to tell us an entire story. To the contrary: understanding data requires context, comparisons, and an openness to complexity.
The data deal vs. community studies and evaluations
The idea that Jewish organizations can add to what they know about Jews by tracking their digital data is intriguing. But contrary to Kelman’s suggestion, we doubt that Jewish organizations can learn more from tracking digital data than they can from the community studies and evaluations they typically commission, or that such tracking can replace those studies as sources of information about American Jews.
Digital data can disclose behaviors, but they cannot reveal the attitudes, preferences, motivations, or priorities that shape behavior. For example, digital data cannot tell us about Jews’ preferences for accessing human services from Jewish organizations, the factors they consider when making Jewish educational choices for their children, their priorities for local communal funding, or how welcome or excluded they feel from their local Jewish community. Likewise, digital data cannot tell us about Jews’ often complex feelings toward Israel or toward other Jews in their local communities, their perceptions of and experiences with antisemitism, or their inability to engage in Jewish life due to financial constraints. These are the types of vital topics that community studies typically cover through surveys, focus groups, and in-depth interviews.
Furthermore, to evaluate a program or initiative, researchers undertake specific tasks that big data are not positioned to support. Researchers collect Jewish background information about program participants, measure desired outcomes before and after a program or over the course of an initiative, interview participants about their program experiences, and ideally, compare participants and non-participants (unfortunately, this last task is rarely feasible in Jewish communal research). Data that people leave behind “in their digital wakes” cannot aid in accomplishing these research tasks. They cannot substitute for the direct interactions with program participants—through both quantitative and qualitative methods—that evaluation research both requires and facilitates.
Sample size, representativeness, and non-response bias
Recalling a conversation with a friend who works at a big social media company, Kelman approvingly quotes her as saying, “We don’t sample…We don’t have to. We have so many users that we don’t have to sample or recruit specific pools of participants. Our numbers are so big that however many people respond to a survey or participate in an experiment, they basically represent everyone.” Kelman adds his own commentary: “With the number of its users in the billions, my friend’s employer can avoid the trouble of ensuring that the responses of a relatively few people can reasonably represent the larger population. At that scale, the challenges of representativeness nearly vanish.”
Sample size, however, does not guarantee representativeness. In survey research, all else being equal, larger samples are better than smaller samples—but non-response bias can undermine accuracy no matter how big a sample is. Non-response bias occurs when those who decline to participate differ systematically from those who do, skewing results. Statistical adjustments made by weighting survey data can address this, but they require data external to the achieved sample and the expertise to implement the adjustments. In other words: while big social media companies can select and invite a million people to take a survey and generate a sample of several hundred thousand, if the non-respondents differ in meaningful ways, the findings may still be flawed. The challenges of representativeness do not vanish because a sample is exceedingly large. They remain front and center, just as they do in conventional survey research.
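A toy post-stratification example makes the point numerically. The shares, response rates, and attendance figures below are made up for illustration. Suppose an external benchmark says 30% of a population is “highly engaged,” but engaged people respond at much higher rates, so they make up 60% of an achieved sample—whether that sample has 300 people or 300,000. The unweighted estimate is equally skewed at either size; weighting to the benchmark recovers the population value.

```python
# External population shares (the benchmark) -- illustrative numbers.
BENCHMARK = {"engaged": 0.30, "not_engaged": 0.70}

def post_stratify(sample_counts: dict) -> dict:
    """Weight per stratum = population share / achieved-sample share."""
    n = sum(sample_counts.values())
    return {g: BENCHMARK[g] / (sample_counts[g] / n) for g in sample_counts}

def weighted_mean(sample_counts, stratum_means, weights):
    """Weighted average of a stratum-level outcome."""
    num = sum(sample_counts[g] * weights[g] * stratum_means[g] for g in sample_counts)
    den = sum(sample_counts[g] * weights[g] for g in sample_counts)
    return num / den

# A sample of 300,000 with the same skew is just as biased as one of 300:
counts = {"engaged": 180_000, "not_engaged": 120_000}
attends = {"engaged": 0.80, "not_engaged": 0.20}  # share reporting attendance

naive = sum(counts[g] * attends[g] for g in counts) / sum(counts.values())
weights = post_stratify(counts)
adjusted = weighted_mean(counts, attends, weights)
# naive = 0.56, but the true population value is 0.30*0.8 + 0.70*0.2 = 0.38,
# which the weighted estimate recovers.
```

Note that the correction is only possible because the benchmark shares come from somewhere outside the achieved sample—which is precisely the role rigorous probability surveys play.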
The pressure of socially desirable behaviors
Kelman argues that getting people to agree to have their data tracked would allow us to see what they are doing without having to ask them about it on surveys. Tracking phone locations, he suggests, could estimate synagogue attendance (excluding those without phones, like some halakhically observant Jews). Similarly, charitable donations, media consumption, participation at JCCs, and attendance at Jewish film festivals could be followed. Google searches and Amazon purchases might uncover Jewish interests, needs, or holiday habits. As Kelman puts it, “If Amazon knows what I might be interested in before I do, imagine the possibilities for American Jewish organizations.”
It’s an enticing idea: sign them up and watch them do their thing. But it’s probably not as easy as that. Like traditional surveys, tracking could face social desirability effects, where people adjust behaviors to align with perceived expectations. For example, respondents to political surveys consistently overreport voting compared to official turnout records. Similarly, in Jewish survey research, people may overstate synagogue attendance or charitable giving. Focus groups show similar bias, with some participants toning down or altering opinions to fit in.
Imagine, then, what some people might do if they knew their Jewish behaviors were being tracked. The person who goes to synagogue once every few months might start to go once a month. The person who gave a charitable contribution to one Jewish cause last year might give to two or three this year. The person who is on the fence about attending a Jewish film festival because it’s raining out and they’re tired from a long day might just decide to go. When the tracking is done, and they go back to their more typical behaviors, we would be left with a biased view of their Jewish lives. As with non-response bias, social desirability effects remain a significant methodological challenge, even in a big data-driven approach.
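A small simulation sketches how a tracking-induced behavior change of this kind would bias estimates. All parameters here are invented for illustration: a baseline monthly attendance probability, a share of people who react to being tracked, and the amount by which they temporarily inflate their behavior.

```python
import random

random.seed(7)  # make the illustration reproducible

def simulate(n=100_000, baseline=0.25, inflation=0.15, share_reactive=0.4):
    """Compare attendance observed under tracking vs. typical attendance.

    Each simulated person has a baseline monthly probability of attending;
    a 'reactive' subset inflates that probability while being tracked,
    then reverts afterward. All parameters are hypothetical.
    """
    tracked_attend = 0
    usual_attend = 0
    for _ in range(n):
        reactive = random.random() < share_reactive
        p_tracked = baseline + (inflation if reactive else 0.0)
        tracked_attend += random.random() < p_tracked   # behavior while tracked
        usual_attend += random.random() < baseline      # typical behavior
    return tracked_attend / n, usual_attend / n

observed, usual = simulate()
# The tracked rate overstates the typical rate by roughly
# share_reactive * inflation = 0.06 (six percentage points).
```

The gap never washes out with more participants; like non-response bias, it is a systematic distortion, not sampling noise.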
The potential of selection bias
It has long been known that people who are more interested in a topic are more likely to participate in research about it, whether by taking a survey or agreeing to an interview. When research is conducted among Jews about Jewish topics, this selection bias can yield samples that are disproportionately engaged in Jewish life. It is quite easy to see how this dynamic would extend to Kelman’s data deal. Among all those with an opportunity to participate in the data deal, whether invited under some kind of random sampling design or an opt-in approach, we would expect that the people who agree to it would be more and differently engaged in Jewish life and community than those who don’t.
To mitigate these biasing effects, we would need some valid, external benchmarks against which to compare Kelman’s data deal participants and make statistical adjustments through weighting. As of today, those benchmarks come from methodologically rigorous probability surveys—a situation that turns the data deal on its head. It is not the case, as Kelman writes, that “although survey…data will always be useful, such approaches cannot reveal what Amazon or Google can about American Jews and their lives.” Rather, the usefulness of digital data tracking would likely depend on high-quality probability surveys to correct the self-selection biases that the data deal would bring.
Conclusion
To be clear: we do not advocate standing still. Our team occasionally has reason to look back at survey methods and findings from decades past, and we are struck by the advances Jewish social research has made. Without doubt, big data presents an exciting opportunity for our field’s continued advancement. We are encouraged that colleagues like Ari Kelman are thinking creatively about it and we are grateful for the opportunity to engage in conversation about it.
As our field innovates and advances, though, we need to do so with prudence and humility. We can’t build our field’s future on misunderstandings—and worse, mischaracterizations—of past and present methods. We need to grapple with how digital data research participants might differ from non-participants, and how participant behavior might change knowing their digital data are being tracked. We must also recognize the limits of big data methods, such as the inability to directly measure participant attitudes and preferences that underlie the behaviors digital data capture. Leveraging big data holds promise for our field, but it is not a panacea for all data collection challenges, nor an easy substitute for conventional methods that have proven their value over time to many communal organizations.
[1] Pew Research Center, Jewish Americans in 2020, https://www.pewresearch.org/religion/2021/05/11/jewish-americans-in-2020/. Pew Research Center, A Portrait of Jewish Americans, https://www.pewresearch.org/religion/2013/10/01/jewish-american-beliefs-attitudes-culture-survey/. Kotler-Berkowitz and Kelman were advisors to the Pew Research Center on the 2020 study. Kotler-Berkowitz was also an advisor on the 2013 Pew study. The Pew Research Center bears no responsibility for the contents of this essay.