There is a lot of excitement about the potential for machine learning and artificial intelligence to improve global health. Over the last few years, our research at the London School of Hygiene & Tropical Medicine (LSHTM) – part of HDR UK London – has increasingly looked towards these methods to help tackle important global health questions. Of course, this has raised questions about how, and perhaps more importantly what, we teach.

Earlier this year, HDR UK launched a competition for new MSc programmes in Health Data Science. LSHTM were delighted to be one of the institutions whose bid for support was successful. This has provided the impetus for us to think more ambitiously about our teaching in this area; our initial thoughts of a handful of new data science sessions rapidly escalated into an entirely new one-year taught programme! We are really excited to be opening for admissions to our new MSc in Health Data Science this week.

I think the hardest part of the process, certainly for me, was grappling with the surprisingly slippery issue of exactly what health data science is. Health data science means different things to different people: some see it as a branch of computer science applied to health data, while others feel it is simply a cynical rebranding of medical statistics. Xiao-Li Meng, the Whipple V. N. Jones Professor of Statistics at Harvard University, in the inaugural issue of the Harvard Data Science Review, suggests that it is unwise to try to conceptualize data science as a single discipline and that it should perhaps be thought of as a collection of disciplines with “complementary foundations, perspectives, approaches, and aims, but with a shared grand mission”. This grand mission, adapted to the health context, may leave us with the definition of health data science as the use of digital technologies and information of any kind to advance human health.

Such a broad definition, however, leaves us with a dilemma: what, then, are the essentials of health data science? What skills, knowledge, and abilities would – if lacking – identify someone as not being a real health data scientist?

For us, these include: strong computational and technical skills; the ability to apply statistical and machine learning algorithms; an appreciation of what data can validly be used for, and how this depends on how the data arose; and knowledge of ethical, information governance and data security issues. Additionally, a commitment to patient involvement throughout the research process needs to underlie all of these aspects.

Teaching such wide-ranging skills requires a step away from more traditional teaching approaches. In parallel with learning cutting-edge machine learning algorithms, our students will participate in team-based learning, flipped classrooms, work-integrated learning, and real interactions with industry clients and patients.

Building on LSHTM’s global reach and wealth of experience in data science for health research, we have crafted a programme which will enable our graduates to develop the broad skills and expertise required for them to play a vital role in the emerging health data science landscape, contributing to LSHTM’s mission of improving health and health equity worldwide.