Publications and Research Projects

If you use Empirica, please cite the following paper:

Almaatouq, Abdullah, Joshua Becker, James P. Houghton, Nicolas Paton, Duncan J. Watts, and Mark E. Whiting. "Empirica: a virtual lab for high-throughput macro-level experiments." Behavior Research Methods (2021): 1-14. https://doi.org/10.3758/s13428-020-01535-9
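For convenience, here is the same citation as a BibTeX entry. The field values are taken directly from the citation above; the entry key is arbitrary and can be renamed to match your bibliography's conventions.

```bibtex
@article{almaatouq2021empirica,
  title   = {Empirica: a virtual lab for high-throughput macro-level experiments},
  author  = {Almaatouq, Abdullah and Becker, Joshua and Houghton, James P. and
             Paton, Nicolas and Watts, Duncan J. and Whiting, Mark E.},
  journal = {Behavior Research Methods},
  year    = {2021},
  pages   = {1--14},
  doi     = {10.3758/s13428-020-01535-9}
}
```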

The following is a non-exhaustive list of publications and preprints based on Empirica. If your paper is missing, please let us know at hello@empirica.ly.

2022

Victorious and Hierarchical: Past Performance as a Determinant of Team Hierarchical Differentiation

Authors: Christopher To, Thomas Taiyi Yan, Elad N. Sherf.
In: Organization Science.
Abstract: Hierarchies emerge as collectives attempt to organize themselves toward successful performance. Consequently, research has focused on how team hierarchies affect performance. We extend existing models of the hierarchy-performance relationship by adopting an alternative: Performance is not only an output of hierarchy but also a critical input, as teams’ hierarchical differentiation may vary based on whether they are succeeding. Integrating research on exploitation and exploration with work on group attributions, we argue that teams engage in exploitation by committing to what they attribute as the cause of their performance success. Specifically, collectives tend to attribute their success to individuals who wielded greater influence within the team; these individuals are consequently granted relatively higher levels of influence, leading to a higher degree of hierarchy. We additionally suggest that the tendency to attribute, and therefore grant more influence, to members believed to be the cause of success is stronger for teams previously higher (versus lower) in hierarchy, as a higher degree of hierarchical differentiation provides clarity as to which members had a greater impact on the team outcome. We test our hypotheses experimentally with teams engaging in an online judgement task and observationally with teams from the National Basketball Association. Our work makes two primary contributions: (a) altering existing hierarchy-performance models by highlighting performance as both an input and output to hierarchy and (b) extending research on the dynamics of hierarchy beyond individual rank changes toward examining what factors increase or decrease hierarchical differentiation of the team as a whole.


2021

Will We Trust What We Don't Understand? Impact of Model Interpretability and Outcome Feedback on Trust in AI

Authors: Daehwan Ahn, Abdullah Almaatouq, Monisha Gulabani, Kartik Hosanagar.
In: arXiv preprint.
Abstract: Despite AI's superhuman performance in a variety of domains, humans are often unwilling to adopt AI systems. The lack of interpretability inherent in many modern AI techniques is believed to be hurting their adoption, as users may not trust systems whose decision processes they do not understand. We investigate this proposition with a novel experiment in which we use an interactive prediction task to analyze the impact of interpretability and outcome feedback on trust in AI and on human performance in AI-assisted prediction tasks. We find that interpretability led to no robust improvements in trust, while outcome feedback had a significantly greater and more reliable effect. However, both factors had modest effects on participants' task performance. Our findings suggest that (1) factors receiving significant attention, such as interpretability, may be less effective at increasing trust than factors like outcome feedback, and (2) augmenting human performance via AI systems may not be a simple matter of increasing trust in AI, as increased trust is not always associated with equally sizable improvements in performance. These findings invite the research community to focus not only on methods for generating interpretations but also on techniques for ensuring that interpretations impact trust and performance in practice.

Task Complexity Moderates Group Synergy

Authors: Abdullah Almaatouq, Mohammed Alsobay, Ming Yin, Duncan J. Watts.
In: Proceedings of the National Academy of Sciences.
Abstract: Complexity—defined in terms of the number of components and the nature of the interdependencies between them—is clearly a relevant feature of all tasks that groups perform. Yet the role that task complexity plays in determining group performance remains poorly understood, in part because no clear language exists to express complexity in a way that allows for straightforward comparisons across tasks. Here we avoid this analytical difficulty by identifying a class of tasks for which complexity can be varied systematically while keeping all other elements of the task unchanged. We then test the effects of task complexity in a preregistered two-phase experiment in which 1,200 individuals were evaluated on a series of tasks of varying complexity (phase 1) and then randomly assigned to solve similar tasks either in interacting groups or as independent individuals (phase 2). We find that interacting groups are as fast as the fastest individual and more efficient than the most efficient individual for complex tasks but not for simpler ones. Leveraging our highly granular digital data, we define and precisely measure group process losses and synergistic gains and show that the balance between the two switches signs at intermediate values of task complexity. Finally, we find that interacting groups generate more solutions more rapidly and explore the solution space more broadly than independent problem solvers, finding higher-quality solutions than all but the highest-scoring individuals.

Algorithmically Mediating Communication to Enhance Collective Decision-Making in Online Social Networks

Authors: Jason W Burton, Ulrike Hahn, Abdullah Almaatouq, M Amin Rahimian.
In: Proceedings of the 9th ACM Collective Intelligence Conference.
Abstract: Many collective decision-making contexts involve communication among group members. Sometimes this communication helps the collective reach an accurate decision because it allows individuals to gain otherwise unknown information from their peers, but sometimes this communication gives rise to detrimental social influence or “groupthink.” Whether communication is ultimately good or bad for a group’s collective decision-making depends on the underlying network structure (i.e., who communicates with whom): high levels of connectivity and free-flowing information can lead to “excess correlation” (i.e., correlation between individuals that is not accuracy inducing) [Jönsson et al. 2015]; high levels of centralization can lead to certain individuals wielding excessive influence over the network [Becker et al. 2017]; and a lack of structural plasticity can prevent networks from effectively responding to feedback about individuals’ performance [Almaatouq et al. 2020b]. Despite abundant knowledge on the relationship between network structure and collective accuracy, strategies for exploiting network structure to increase collective accuracy remain under-explored. In the present work, we experiment with one such strategy, rewiring algorithms, which mediate online communication by manipulating social networks’ structure. Crucially, the algorithms considered may improve accuracy by modifying connectivity based on the distribution of participant responses alone, that is, without access to a ground truth on the issue at the time of communication.

Screening Diabetic Retinopathy Using an Automated Retinal Image Analysis System in Independent and Assistive Use Cases in Mexico: Randomized Controlled Trial

Authors: Alejandro Noriega, Daniela Meizner, Dalia Camacho, Jennifer Enciso, Hugo Quiroz-Mercado, Virgilio Morales-Canton, Abdullah Almaatouq, Alex Pentland.
In: JMIR Formative Research.
Abstract: The automated screening of patients at risk of developing diabetic retinopathy represents an opportunity to improve their midterm outcome and lower the public expenditure associated with direct and indirect costs of common sight-threatening complications of diabetes. This study aimed to develop and evaluate the performance of an automated deep learning–based system to classify retinal fundus images as referable and nonreferable diabetic retinopathy cases, from international and Mexican patients. In particular, we aimed to evaluate the performance of the automated retina image analysis (ARIA) system under an independent scheme (ie, only ARIA screening) and 2 assistive schemes (ie, hybrid ARIA plus ophthalmologist screening), using a web-based platform for remote image analysis to determine and compare the sensitivity and specificity of the 3 schemes. A randomized controlled experiment was performed where 17 ophthalmologists were asked to classify a series of retinal fundus images under 3 different conditions. The conditions were to (1) screen the fundus image by themselves (solo); (2) screen the fundus image after exposure to the retina image classification of the ARIA system (ARIA answer); and (3) screen the fundus image after exposure to the classification of the ARIA system, as well as its level of confidence and an attention map highlighting the most important areas of interest in the image according to the ARIA system (ARIA explanation). The ophthalmologists’ classification in each condition and the result from the ARIA system were compared against a gold standard generated by consulting and aggregating the opinion of 3 retina specialists for each fundus image. The ARIA system was able to classify referable vs nonreferable cases with an area under the receiver operating characteristic curve of 98%, a sensitivity of 95.1%, and a specificity of 91.5% for international patient cases. There was an area under the receiver operating characteristic curve of 98.3%, a sensitivity of 95.2%, and a specificity of 90% for Mexican patient cases. The ARIA system performance was more successful than the average performance of the 17 ophthalmologists enrolled in the study. Additionally, the results suggest that the ARIA system can be useful as an assistive tool, as sensitivity was significantly higher in the experimental condition where ophthalmologists were exposed to the ARIA system’s answer prior to their own classification (93.3%), compared with the sensitivity of the condition where participants assessed the images independently (87.3%; P=.05). These results demonstrate that both independent and assistive use cases of the ARIA system present, for Latin American countries such as Mexico, a substantial opportunity toward expanding the monitoring capacity for the early detection of diabetes-related blindness.

The Crowd Classification Problem: Social Dynamics of Binary-Choice Accuracy

Authors: Joshua Aaron Becker, Douglas Guilbeault, Edward Bishop Smith.
In: Management Science.
Abstract: Decades of research suggest that information exchange in groups and organizations can reliably improve judgment accuracy in tasks such as financial forecasting, market research, and medical decision making. However, we show that improving the accuracy of numeric estimates does not necessarily improve the accuracy of decisions. For binary-choice judgments, also known as classification tasks—for example, yes/no or build/buy decisions—social influence is most likely to grow the majority vote share, regardless of the accuracy of that opinion. As a result, initially, inaccurate groups become increasingly inaccurate after information exchange, even as they signal stronger support. We term this dynamic the “crowd classification problem.” Using both a novel data set and a reanalysis of three previous data sets, we study this process in two types of information exchange: (1) when people share votes only, and (2) when people form and exchange numeric estimates prior to voting. Surprisingly, when people exchange numeric estimates prior to voting, the binary-choice vote can become less accurate, even as the average numeric estimate becomes more accurate. Our findings recommend against voting as a form of decision making when groups are optimizing for accuracy. For those cases where voting is required, we discuss strategies for managing communication to avoid the crowd classification problem. We close with a discussion of how our results contribute to a broader contingency theory of collective intelligence.

Human Biases Limit Algorithmic Boosts of Cultural Evolution

Authors: Levin Brinkmann, Deniz Gezerli, Kira von Kleist, Thomas F Müller, Iyad Rahwan, Niccolo Pescetelli.
In: SocArXiv preprint.

Respect the Code: Speakers Expect Novel Conventions to Generalize within but Not Across Social Group Boundaries

Authors: Robert D Hawkins, Irina Liu, Adele E Goldberg, Thomas L Griffiths.
In: Proceedings of the 43rd Annual Meeting of the Cognitive Science Society.
Abstract: Speakers use different language to communicate with partners in different communities. But how do we learn and represent which conventions to use with which partners? In this paper, we argue that solving this challenging computational problem requires speakers to supplement their lexical representations with knowledge of social group structure. We formalize this idea by extending a recent hierarchical Bayesian model of convention formation with an intermediate layer explicitly representing the latent communities each partner belongs to, and derive predictions about how conventions formed within a group ought to extend to new in-group and out-group members. We then present evidence from two behavioral experiments testing these predictions using a minimal group paradigm. Taken together, our findings provide a first step toward a formal framework for understanding the interplay between language use and social group knowledge.

Multi-party Referential Communication in Complex Strategic Games

Authors: Jessica Mankewitz, Veronica Boyce, Brandon Waldon, Georgia Loukatou, Dhara Yu, Jesse Mu, Noah D Goodman, Michael C Frank.
In: PsyArXiv preprint.
Abstract: Verbal communication is a ubiquitous aspect of human interaction occurring in many contexts; however, it is primarily studied in the limited context of two people communicating information. Understanding communication in complex, multi-party interactions is both a scientific challenge for psycholinguistics and an engineering challenge for creating artificial agents who can participate in these richer contexts. We adapted the reference game paradigm to an online 3-player game where players refer to objects in order to coordinate selections based on the available utilities. We ran games with shared or individual payoffs and with or without access to language. Our paradigm can also be used for artificial agents; we trained reinforcement learning-based agents on the same task as a comparison. Our dataset shows the same patterns found in simpler reference games and contains rich language of reference and negotiation.

Probabilistic Social Learning Improves the Public’s Judgments of News Veracity

Authors: Douglas Guilbeault, Samuel Woolley, Joshua Becker.
In: PLOS ONE.
Abstract: The digital spread of misinformation is one of the leading threats to democracy, public health, and the global economy. Popular strategies for mitigating misinformation include crowdsourcing, machine learning, and media literacy programs that require social media users to classify news in binary terms as either true or false. However, research on peer influence suggests that framing decisions in binary terms can amplify judgment errors and limit social learning, whereas framing decisions in probabilistic terms can reliably improve judgments. In this preregistered experiment, we compare online peer networks that collaboratively evaluated the veracity of news by communicating either binary or probabilistic judgments. Exchanging probabilistic estimates of news veracity substantially improved individual and group judgments, with the effect of eliminating polarization in news evaluation. By contrast, exchanging binary classifications reduced social learning and maintained polarization. The benefits of probabilistic social learning are robust to participants’ education, gender, race, income, religion, and partisanship.


2020

Adaptive Social Networks Promote the Wisdom of Crowds

Authors: Abdullah Almaatouq, Alejandro Noriega-Campero, Abdulrahman Alotaibi, PM Krafft, Mehdi Moussaid, Alex Pentland.
In: Proceedings of the National Academy of Sciences.
Abstract: Social networks continuously change as new ties are created and existing ones fade. It is widely acknowledged that our social embedding has a substantial impact on what information we receive and how we form beliefs and make decisions. However, most empirical studies on the role of social networks in collective intelligence have overlooked the dynamic nature of social networks and its role in fostering adaptive collective intelligence. Therefore, little is known about how groups of individuals dynamically modify their local connections and, accordingly, the topology of the network of interactions to respond to changing environmental conditions. In this paper, we address this question through a series of behavioral experiments and supporting simulations. Our results reveal that, in the presence of plasticity and feedback, social networks can adapt to biased and changing information environments and produce collective estimates that are more accurate than their best-performing member. To explain these results, we explore two mechanisms: 1) a global-adaptation mechanism where the structural connectivity of the network itself changes such that it amplifies the estimates of high-performing members within the group (i.e., the network “edges” encode the computation); and 2) a local-adaptation mechanism where accurate individuals are more resistant to social influence (i.e., adjustments to the attributes of the “node” in the network); therefore, their initial belief is disproportionately weighted in the collective estimate. Our findings substantiate the role of social-network plasticity and feedback as key adaptive mechanisms for refining individual and collective judgments.

Interdependent Diffusion: The Social Contagion of Interacting Beliefs

Authors: James P Houghton.
In: arXiv preprint.
Abstract: Social contagion is the process in which people adopt a belief, idea, or practice from a neighbor and pass it along to someone else. For over 100 years, scholars of social contagion have almost exclusively made the same implicit assumption: that only one belief, idea, or practice spreads through the population at a time. It is a default assumption that we don't bother to state, let alone justify. The assumption is so ingrained that our literature doesn't even have a word for “whatever is to be diffused”, because we have never needed to discuss more than one of them. But this assumption is obviously false. Millions of beliefs, ideas, and practices (let's call them “diffusants”) spread through social media every day. To assume that diffusants spread one at a time (or more generously, that they spread independently of one another) is to assume that interactions between diffusants have no influence on adoption patterns. This could be true, or it could be wildly off the mark. We've never stopped to find out. This paper makes a direct comparison between the spread of independent and interdependent beliefs using simulations, observational data analysis, and a 2400-subject laboratory experiment. I find that in assuming independence between diffusants, scholars have overlooked social processes that fundamentally change the outcomes of social contagion. Interdependence between beliefs generates polarization, irrespective of social network structure, homophily, demographics, politics, or any other commonly cited cause. It also leads to the emergence of popular worldviews that are unconstrained by ground truth.

Collective Learning in News Consumption

Authors: Niccolò Pescetelli, Alex Rutherford, Albert Kao, Iyad Rahwan.
In: PsyArXiv preprint.
Abstract: In a complex digital space—where information is shared without vetting from central authorities and where emotional content, rather than factual veracity, better predicts content spread—individuals often need to learn through experience which news sources to trust and rely on. Although public and experts' intuition alike call for stronger scrutiny of public information providers, and reliance on global trusted outlets, there is a statistical argument to be made that counters these prescriptions. We consider the scenario in which news statements are used by individuals to achieve a collective payoff—as is the case in many electoral contexts. In this case, a plurality of independent though less accurate news providers might be better for the public good than having fewer highly accurate ones. In a carefully controlled experiment, we asked people to make binary forecasts and rewarded them for their individual or collective performance. In accordance with theoretical expectations, we found that when collectively rewarded people learned to rely more on local information sources and that this strategy accrued better collective performance. Importantly, these effects positively scaled with group size so that larger groups benefited more from trusting local news sources. We validate these claims against a real-world news dataset. These findings show the importance of independent (instead of simply accurate) voices in any information landscape, but particularly when large groups of people want to maximize their collective payoff. These results suggest—at least statistically speaking—that emphasizing collective payoffs in large networks of news end-users might foster resilience to collective information failures.

Network Structures of Collective Intelligence: The Contingent Benefits of Group Discussion

Authors: Joshua Becker, Abdullah Almaatouq, Emőke-Ágnes Horvát.
In: arXiv preprint.
Abstract: Research on belief formation has produced contradictory findings on whether and when communication between group members will improve the accuracy of numeric estimates such as economic forecasts, medical diagnoses, and job candidate assessments. While some evidence suggests that carefully mediated processes such as the "Delphi method" produce more accurate beliefs than unstructured discussion, others argue that unstructured discussion outperforms mediated processes. Still others argue that independent individuals produce the most accurate beliefs. This paper shows how network theories of belief formation can resolve these inconsistencies, even when groups lack apparent structure as in informal conversation. Emergent network structures of influence interact with the pre-discussion belief distribution to moderate the effect of communication on belief formation. As a result, communication sometimes increases and sometimes decreases the accuracy of the average belief in a group. The effects differ for mediated processes and unstructured communication, such that the relative benefit of each communication format depends on both group dynamics as well as the statistical properties of pre-interaction beliefs. These results resolve contradictions in previous research and offer practical recommendations for teams and organizations.

Exposure to Common Enemies can Increase Political Polarization: Evidence from a Cooperation Experiment with Automated Partisans

Authors: Eaman Jahani, Natalie McDaniel Gallagher, Friedolin Merhout, Nicolo Cavalli, Douglas Guilbeault, Yan Leng, Christopher A Bail.
In: SocArXiv preprint.
Abstract: Longstanding theory indicates the threat of a common enemy can mitigate conflict between members of rival groups. We tested this hypothesis in a pre-registered experiment where 1,670 Republicans and Democrats in the United States were asked to complete a collaborative online task with an automated agent or “bot” that was labelled as a member of the opposing party. Prior to this task, we exposed respondents to primes about a) a common enemy (involving threats from Iran, China, and Russia); b) a patriotic event; or c) a neutral, apolitical prime. Though we observed no significant differences in the behavior of Democrats as a result of these primes, we found that Republicans—and particularly those with very strong conservative views—were significantly less likely to cooperate with Democrats when primed about a common enemy. We also observed lower rates of cooperation among Republicans who participated in our study during the 2020 Iran crisis, which occurred in the middle of our fieldwork. These findings indicate common enemies may not reduce inter-group conflict in highly polarized societies, and contribute to a growing number of studies that find evidence of asymmetric political polarization. We conclude by discussing the implications of these findings for research in social psychology, political conflict, and the rapidly expanding field of computational social science.


2019

The Wisdom of Partisan Crowds

Authors: Joshua Becker, Ethan Porter, Damon Centola.
In: Proceedings of the National Academy of Sciences.
Abstract: Theories in favor of deliberative democracy are based on the premise that social information processing can improve group beliefs. While research on the “wisdom of crowds” has found that information exchange can increase belief accuracy on noncontroversial factual matters, theories of political polarization imply that groups will become more extreme—and less accurate—when beliefs are motivated by partisan political bias. A primary concern is that partisan biases are associated not only with more extreme beliefs, but also with a diminished response to social information. While bipartisan networks containing both Democrats and Republicans are expected to promote accurate belief formation, politically homogeneous networks are expected to amplify partisan bias and reduce belief accuracy. To test whether the wisdom of crowds is robust to partisan bias, we conducted two web-based experiments in which individuals answered factual questions known to elicit partisan bias before and after observing the estimates of peers in a politically homogeneous social network. In contrast to polarization theories, we found that social information exchange in homogeneous networks not only increased accuracy but also reduced polarization. Our results help generalize collective intelligence research to political domains.

Exploring Improvisational Approaches to Social Knowledge Acquisition

Authors: Dan Feng, Elin Carstensdottir, Magy Seif El-Nasr, Stacy Marsella.
In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems.
Abstract: To build agents that can engage users in more open-ended social contexts, more and more attention has been focused on data-driven approaches to reduce the requirement of extensive, hand-authored behavioral content creation. However, one fundamental challenge of data-driven approaches is acquiring human social interaction data with sufficient variety to capture more open-ended social interactions, as well as their coherency. Previous work has attempted to extract such social knowledge using crowdsourced narratives. This paper proposes an approach to acquire the knowledge of social interaction by integrating an improvisational theatre training technique into a crowdsourcing task aimed at collecting social narratives. The approach emphasizes theory of mind concepts, through an iterative prompting process about the mental states of characters in the narrative and paired writing, in order to encourage the authoring of diverse social interactions. To assess the effectiveness of integrating prompting and two-worker improvisation into the knowledge acquisition process, we systematically compare alternative ways to design the crowdsourcing task, including a) a single worker vs. two workers authoring interaction between different characters in a given social context, and b) with or without prompts. Findings from 175 participants across two different social contexts show that the prompts and two-worker collaboration could significantly improve the diversity and the objective coherency of the narratives. The results presented in this paper can provide a rich set of diverse and coherent action sequences to inform the design of socially intelligent agents.