We sometimes forget what medicine was like before data was at its centre. The well-known tenet in data science of ‘garbage in, garbage out’ has driven many analysts to focus on the quality and origins of their data, yet few may realise how literal this tenet once was. In the 18th and 19th centuries, as the foundations of modern clinical practice were being laid, humans themselves were serving as input-output (I/O) devices, frequently and with no mirth whatsoever, via their own gastrointestinal tracts.

Shortly after its founding in 1774, the Royal Humane Society focused on treatments for the apparently drowned and began espousing the importance of rapid artificial respiration. The most potent method was thought to be the blowing of tobacco smoke into the rectum to deliver oxygen and, presumably, chemical stimulation, resulting in a hastily conscious and mildly embarrassed Englishman. The concept gained such popularity that tobacco enema kits, complete with bellows, lined the River Thames. Eventually, in the early 1800s, as the number of attempted resuscitations increased, the questionable efficacy of the tobacco enema became clear to the Society and a competing technique, respiration via the lungs, won the day.

Some treatments took far longer, even centuries, to establish as ‘garbage in’. Around the same time that tobacco enema kits lined the Thames, the intrepid explorers Lewis and Clark were assembling the latest medicines to begin their journey from the edge of the Missouri River to the Pacific Ocean. The greatest risk to such an undertaking in those days was coming down with an infection, though more often an adventurer needed relief from bog-standard constipation; luckily, one of the most powerful remedies for both was the “thunderclapper”, an oral pill filled with mercurous chloride (calomel). It wasn’t until the 1920s that mercury was finally established as a poison, but during Lewis and Clark’s expedition it was frequently used to purge, leaving a unique mercury signature in the latrine pits of their camp sites across the western United States. While it did not necessarily aid the health of the expedition, many years later researchers utilised Lewis and Clark’s journal entries and the physical evidence of mercury-laced latrines to reconstruct a map of their epic journey.

The ins and outs of the (infected) human gastrointestinal tract, together with carefully catalogued health records, ultimately led to the best example of 19th century health data science as well as the birth of epidemiology. In 1854, John Snow was a British physician investigating a cholera outbreak in London. Snow took a dim view of those who blamed the outbreak on ‘bad air’ and set about mapping cholera cases throughout the Soho neighbourhood. After much painstaking work using medical and public records to produce a map of disease cases, he used statistical analysis to identify the water pump on Broad Street as the outbreak’s source, and subsequently expanded his analysis to compare the water supplies of two companies, implicating the one that drew contaminated water from the Thames. Incidentally, the insight inherent in Snow’s analysis, namely fecal-oral transmission of the pathogen, was politically controversial and did not gain credibility until another statistician, William Farr (namesake of the Farr Institute of Health Informatics Research), investigating later cholera epidemics, essentially replicated Snow’s results. The validity and robustness of Snow’s data were therefore established, and the future incidence of cholera, along with many other waterborne pathogens, was drastically reduced by water sanitation.

No matter the method, humans have survived for millennia by using the inputs and outputs of our own bodies, in isolation or by anecdote, to guide our health decisions. Now we find ourselves flooded with more health data than Snow or Farr could have imagined; clinical practice is firmly based on meticulous data gathering, cataloguing and analysis. And, thankfully, our guts are taking somewhat less of a beating.