Algorithmic decision-making (ADM) promises to strengthen evidence-based decisions, particularly by helping to manage risks in various domains. Its use extends to the criminal justice system, where algorithmic risk assessments can provide valuable evidence to inform highly sensitive decisions. Yet such algorithmic tools also introduce intricate problems tied to the fundamental question of exactly what kind and what quality of evidence they offer. This paper illustrates the problem through a comparison of pretrial risk assessments that have been implemented statewide in the USA. The authors highlight the empirical variation in the construction, evaluation and documentation of these tools to bring out the considerable discretion involved along these dimensions. They also point to further possible ways of looking at the performance of these tools and show why evaluating the quality of the evidence delivered by algorithmic risk assessments is far from straightforward.