The full list of BREATHE data-driven commercial services is set out below.  For the purposes of this Case Study, attention is focussed on Services 1-3:

  1. Fast and accurate feasibility counts in large population-level datasets such as Clinical Practice Research Datalink (CPRD) and Secure Anonymised Information Linkage (SAIL) Databank, with links to Public Health Scotland (PHS) and Health and Social Care Northern Ireland (HSCNI) to round out four nations’ UK coverage where necessary.
  2. Full support for projects, studies and publications using any of the above datasets, plus genomic and/or disease-specific datasets in respiratory science such as EXCEED (Extended Cohort for eHealth, Environment, and DNA); GASP (Genetics of Asthma Severity and Phenotypes); the Cystic Fibrosis (CF) Registry; EAVE II (Early Pandemic Evaluation and Enhanced Surveillance of COVID-19); and COVIDENCE.
  3. Full support for projects, studies and publications in any respiratory condition recorded in CPRD or SAIL databanks, such as asthma, chronic obstructive pulmonary disease (COPD), interstitial lung disease (ILD), obstructive sleep apnoea (OSA), CF, COVID-19 (with particularly frequent data updates) and respiratory syncytial virus (RSV).
  4. Training workshops, using synthetic data, for data scientists in industry and academia.
  5. Storage, curation, and linkage of respiratory and respiratory-related datasets in SAIL TRE.
  6. Industry Forum.
  7. Patient and Public Involvement and Engagement (PPIE) through our BREATHE Curiosity Group.

Insight into action

BREATHE data-driven commercial services are built on, and improved by, the continuous feedback loop provided by our commercial and data partners and the insight and input of our Industry Forum (see Figure 1).

Through continuous engagement with our Founding and Supporting Partners, BREATHE has narrowed down the datasets that we choose to hold, curate, and promote, and the disease areas in which we proactively specialise. These choices will continue to evolve as market needs change, and are regularly checked against the pipelines of leading pharma companies and the demands of innovative small and medium-sized enterprises (SMEs) for the data necessary to train their AI algorithms.

We keep abreast of current trends and anticipate market changes, while identifying and agreeing partnerships with the most appropriate and collaborative data partners.  Currently, COVID-19 and related respiratory vaccines and therapeutics remain front of mind for many BREATHE commercial customers, but renewed interest in chronic and niche respiratory diseases is accelerating as the pharma industry’s focus on the pandemic slowly starts to fade.

Innovative SMEs see great opportunity in developing algorithms to help support care pathways in, for example, virtual wards, and demand for access to data in this space is growing.

For our customers, awareness and understanding of what data are available, fast access to data that are as near real time as possible, and rapid responses to questions on feasibility, counts and study potential are crucial.  Also important are quality markers such as the size and provenance of datasets, where data are housed and curated, and where studies will be carried out.

Figure 1 – BREATHE Data-Driven Commercial Services

BREATHE/SAIL Databank TRE partnership

The bedrock of our data-driven commercial services is the partnership between BREATHE and SAIL Databank TRE. The SAIL Databank TRE has been active since 2007 and provides a reputable and commercially experienced space within which BREATHE PIs, partners, and customers can conduct respiratory research.  BREATHE analysts, working within SAIL Databank TRE, can quickly and accurately perform counts and feasibility analyses for customers on datasets housed and curated by BREATHE. They have also been heavily involved in the extensive curation work done by BREATHE to produce ready-to-use disease-specific datasets for asthma, COPD, and ILD from within Big Data collections such as CPRD and SAIL, as well as in importing high-value genomic datasets such as EXCEED and GASP.

To give a sense of scale, in SAIL Databank alone, there are now over 75,000 records of events by patients with a diagnosis of COPD since 2010.  All these records are linked across primary and secondary care and can provide a rich longitudinal picture of COPD care and care pathways in the whole Welsh population.  BREATHE is currently in discussions with several AI companies around training their algorithms to better predict and manage patients with COPD (and asthma) in a virtual ward setting.

BREATHE also has relationships and access agreements with other TREs, such as Public Health Scotland TRE eDRIS (electronic Data Research & Innovation Service), and can offer a similar standard of service for customers looking to work with Scotland-specific datasets such as EAVE II. BREATHE has just won an RFP with a large Pharma company seeking to understand the effectiveness of its recently launched COVID-19 therapeutic using Scottish data from EAVE II via eDRIS.

Commercial model

As alluded to above, BREATHE has made a decision to proactively focus on a small number of datasets and to specifically develop pre-curated cuts of these datasets for asthma, COPD, and ILD in the first instance.  These data cuts, along with disease-specific datasets such as EAVE II, EXCEED, and GASP, are what BREATHE’s commercial team can promote to Pharma and SME customers, using a multi-channel marketing strategy and collaboration with Edinburgh Innovations (BREATHE Lead Institution University of Edinburgh’s commercialisation service).

Our multi-channel marketing strategy leverages a BREATHE website, fed by our active presence on LinkedIn and Twitter, aligned behind Commercial Leaders who proactively target medical, health outcomes and commercial customers with a special interest in respiratory disease.  This strategy enables BREATHE to showcase the work we do quickly and efficiently, to promote the data we hold, and to build a thought-leadership platform from which to better engage with industry customers.

Since its launch in January 2022, our BREATHE website has had over 3,000 unique visits:

Home | BREATHE (

For an organisation at this stage of its commercialisation journey, awareness and understanding of who we are, what we do, and why we do it remain critical.  So too is being available to our customers through multiple channels. Given our target commercial market, BREATHE prioritises LinkedIn (supported by Twitter) as the means through which we seek to drive customers to our website and/or to contact us directly.

For examples of BREATHE LinkedIn activity, please see:

BREATHE – The Health Data Research Hub for Respiratory Health: Posts | LinkedIn

Commercial rates and approvals

As a collaboration of academic institutions, NHS, industry, and third sector, BREATHE is committed to a not-for-profit operation, but still needs to cover costs, meet academic overheads, and make a multi-economy model work to sustain itself in the long term. To this end, BREATHE has worked with Edinburgh Innovations to identify and set competitive commercial day rates (by role) for all BREATHE staff involved in our data-driven commercialisation services. These rates have been benchmarked where possible against external comparators and anchored in numbers adhered to and published in the public domain by universities. Monies raised through commercial projects, studies and publications by BREATHE will be held in a rolling three-year clearing account within the University of Edinburgh and discharged, at the BREATHE Board’s discretion, to pay for project delivery, salaried staff costs and/or investment in new services.

The future

BREATHE is fiercely ambitious and keen to live up to its vision of driving the use of health data in research to transform respiratory health. The next steps in this journey will focus on the production and dissemination of synthetic data via collaborative training workshops at which prospective industry customers will come and experience the power of our health data in a totally safe environment. This is made possible in no small part by the grant BREATHE recently received from UK Research and Innovation to cover the capital cost of a two-year synthetic data software licence.

At this point in its history and evolution, BREATHE sees synthetic data* as the perfect tool with which to fully engage prospective customers at a reasonable price (the cost of a pass to attend a training workshop), while simultaneously showcasing key BREATHE datasets that could be used for full projects, studies and publications based on customers’ current, and pipeline, interests.

*Please see Appendix for BREATHE synthetic data expansion proposal