The “Robust” Data Scientist: Winning with Messy Data and Pingouin
This article explores the craft of using robust statistics in data science workflows, showing what to do when data fail to meet the assumptions of standard tests.
The Guardian AI
Readers respond to an editorial on difficulties with replicability of results in social science research
Your editorial on social science research (15 April) highlights the poor replicability of results, and the misuse of this by some to dismiss all social science. As was indicated, in a field as complex as human behaviour, poor replicability can be due to many factors: methodology, misused statistics, variations in sample characteristics and so on. There is one factor underlying much of this, not much discussed, which is a dearth of observation of human behaviour in everyday environments in the same manner as scientists would observe any other species in order to find out what the behaviour is and so what needs to be understood.
Science rarely produces identical outcomes. Mistaking this for failure turns caution into an excuse for inaction
A new set of studies out this month suggests that as many as half of all results published in reputable journals in the social sciences can’t be replicated by independent analysis. This is part of a long-running problem across many research fields – most visibly in the social sciences and psychology, though concerns have also been raised in areas of biomedical research. The latest work is a seven-year project called Systematizing Confidence in Open Research and Evidence (Score), which has now published three studies looking at 3,900 social science papers. It found that newer papers, and those published in journals requiring extensive sharing of underlying data, were more likely to be reproduced. Separately, medical research faces its own constraints: differing patient caseloads and limited sample sizes mean that, in practice, it can resemble the social sciences more than lab