1: \begin{abstract}
2: Altruistic cooperation is costly yet socially desirable.
3: As a result, agents struggle to learn cooperative policies through independent reinforcement learning (RL).
4: Indirect reciprocity, where agents consider their interaction partner's reputation, has been shown to stabilise cooperation in homogeneous, idealised populations.
5: However, more realistic settings comprise heterogeneous agents with different characteristics and group-based social identities.
6: We study cooperation when agents are stratified into two such groups, and allow reputation updates and actions to depend on group information.
7: We consider two modelling approaches: evolutionary game theory, where we comprehensively search for social norms (i.e., rules to assign reputations) leading to cooperation and fairness; and RL, where we consider how the stochastic dynamics of policy learning affect the analytically identified equilibria.
8: We observe that a defecting majority leads the minority group to defect, but not vice versa.
9: Moreover, changing the norms that judge in- and out-group interactions can steer a system towards either fair or unfair cooperation.
10: This becomes clearer when moving beyond equilibrium analysis to independent RL agents, where convergence to fair cooperation occurs under a narrower set of norms.
11: Our results highlight that, in heterogeneous populations with reputations, carefully defining interaction norms is fundamental to tackling the dilemmas of both cooperation and fairness.
12: \end{abstract}
13: