
Thinking big data

Published on 23/01/17 at 03:53pm

The potential solutions offered by ‘big data’ seem to offer a vision of the future, but potential solutions aren’t enough for those in need of medication now. Ben Hargreaves takes a look at how data analytics is beginning to transform the pharmaceutical industry, from clinical trials to drug discovery.

‘Big data’ is a thorny topic. It is one that you read about in the news with increasing regularity, most often, though not always, in connection with the online giants: Facebook, Google, Twitter and so forth. In these examples, there’s a somewhat uneasy relationship between the service provided and what you must pay in exchange. The data collected in the background is the real prize for such companies, which they can sell on in the form of precisely-aimed advertising. The information gathered through such services offers companies a vast amount of data on customers’ habits, desires and interests.

Big data is essentially able to construct your online persona and, using algorithms, understand what you may be interested in. How does this relate to the pharmaceutical industry? The same approach is now beginning to gain momentum within the industry, but with a different motive. Instead of using the data to sell products, big data can be used as a tool to aid clinical trial recruitment and the identification of drug candidates.

The struggle to recruit sufficient patients for clinical trials is no secret. There are numerous studies looking into improving clinical trial recruitment numbers, as well as many different proposed methods for doing so. What is clear is that there is no one magic bullet that will fix the issues; if clinical trial recruitment is to cease being such a headache for the industry, it will be through many small steps. At present, half of clinical trials are delayed by issues with recruitment – and each day that a drug is delayed means that patients lose time with potential treatments and pharmaceutical companies lose money on drugs not reaching market.

Allowing machine learning to do the heavy lifting

The prospect of how big data can be applied to patient recruitment seems simple: if a database of patients’ illnesses exists and this is cross-referenced against clinical trials that are available in the area, a suitable solution can be found that is beneficial both to the patient and to those recruiting for the trial. One receives treatment that is the most appropriate for their condition and the other has a patient in their trial that has the condition they are looking to treat.
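
As a rough illustration of the cross-referencing idea described above, the short Python sketch below matches patients to trials that share their condition and area. The record fields, names and sample data are all hypothetical, invented for this example; it is not a description of how any real recruitment system works.

```python
from dataclasses import dataclass


# Hypothetical record types: the fields are assumptions made for illustration.
@dataclass(frozen=True)
class Patient:
    patient_id: str
    condition: str
    area: str


@dataclass(frozen=True)
class Trial:
    trial_id: str
    condition: str
    area: str


def match_patients_to_trials(patients, trials):
    """Map each patient ID to the IDs of trials for their condition in their area."""
    # Index trials by (condition, area) so each patient needs only one lookup.
    index = {}
    for trial in trials:
        index.setdefault((trial.condition, trial.area), []).append(trial.trial_id)
    return {p.patient_id: index.get((p.condition, p.area), []) for p in patients}


# Invented sample data.
patients = [Patient("P1", "melanoma", "Leeds"), Patient("P2", "asthma", "Bristol")]
trials = [Trial("T1", "melanoma", "Leeds"), Trial("T2", "melanoma", "London")]

matches = match_patients_to_trials(patients, trials)
print(matches)  # {'P1': ['T1'], 'P2': []}
```

In practice the hard part is not the lookup but the data itself: real eligibility depends on far more than a condition label and a location.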

At present, the amount of medical data doubles every 18 months. By the year 2020, it is expected that the amount of medical data will double every two to three months, according to LifeScienceLeader. It means that there is a massive amount of data being created but it is not necessarily available for use nor is it being taken advantage of. There has been some progress, however, as companies begin to grasp the reams of data available.

Katie Mazuk, senior director, global head investigator and patient engagement at Johnson & Johnson, commented upon the use of datasets within Janssen. She notes that, “Population datasets are used to look for geographical or health care system hot spots for certain disease areas which can better inform where we select clinical sites and do direct to patient outreach. Ultimately, an ideal situation would be if a pop-up/prompt could appear to a Health Care Professional (HCP) in a patient’s Electronic Health Record (EHR) that they may be a good candidate for a clinical trial based on data. Connecting all of the HCPs who see a patient with the potential treatment options available, including clinical trials, is really where we want to go. Technology is rapidly picking up in this area and several companies are investing in Big Data or Real World Evidence. Janssen is involved in several pilots with companies that are able to query hospital data to identify patients to HCP’s and provide insights to pharma companies to optimise protocol designs.”

The ties between the pharmaceutical industry and digital technology firms are clearly growing tighter, especially recently, but the relationship is not an easy one just yet. There is simultaneously a fear and an acceptance that applying data analytics and technology is an inevitable process; there is an awareness that the world is turning towards digital solutions to many issues, but that this comes at a cost. In a recent Eyeforpharma industry report on clinical trials and technology, an anonymous data integrity expert was quoted as saying: “When you add technology, you need to navigate a learning curve; the human animal does not always respond well to change”. For a long time, the pharmaceutical industry has been regarded as conservative in its approach and reluctant to change but not all change has to occur from the inside.

Struggling to predict what the future may hold and what could become everyday is a challenge facing any large company. Thomas Watson, for example, is reported to have once remarked, "I think there is a world market for maybe five computers." He was the head of a company that is now trying to take advantage of the mass of data available in the modern world, and one readily associated with ‘big data’: IBM.

IBM Watson Health is named after the company’s illustrious former CEO and is the tool that IBM are offering to pharmaceutical companies to aid them with a variety of tasks. The key offering of interest for those looking to improve the running of their clinical trials is IBM Watson for Clinical Trial Matching.

We spoke to Thomas Balkizas, IBM Watson Healthcare executive lead, about the potential benefits to be found by applying the vast sets of data available to IBM to develop cognitive systems for use in clinical trials. He explained: “Cognitive systems are not programs, they’re trained. They’re based on a body of knowledge and the quality of that determines the quality of outcome. You use cognitive systems to ask questions, upon concepts not just with keywords. You can train it upon a domain and it will become an expert in that domain. We’ve been training Watson and it’s been in the public domain since its appearance on Jeopardy gameshow in 2011 – that is just one application, with question and answer. We’ve trained and built a number of APIs (Application programming interfaces), probably about 56 to this day, and counting. These can be chained together or used in various ways to generate solutions”.

He continued, “In healthcare, you mentioned clinical trials, it’s the same principle of machine learning and natural language processing. You can give Watson for Clinical Trial Matching a number of clinical attributes; it could be anything from age, demographic, lab results, or, specifically for oncology, details of stage, size and so on. Watson will very quickly identify, and by quickly I mean within seconds, clinical trials for which the patients might be eligible. It will do this by criteria level evaluation based on the patient’s attributes. To give you an example of how complicated this problem is currently, if you’re in the care of your doctor and there’s no treatment approved, for example in cancer, then this doctor will, down the phone, speak to clinicians about what treatment that might match the inclusion criteria. This process will take a long time because the inclusion criteria are notoriously complicated.”

This is one of the major issues that the pharmaceutical industry currently faces when it comes to clinical trials. Each day during which a clinical trial is delayed can cost the trial sponsor an estimated $8 million and, with 94% of clinical trials being delayed by one month, that stacks up to a sizeable figure.
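
To put the figures above into perspective, the following back-of-the-envelope Python calculation multiplies the quoted daily cost by a one-month delay. Treating "one month" as 30 days is an assumption made here for illustration, as is the function name; the $8 million per day estimate is the one quoted above.

```python
COST_PER_DAY_USD = 8_000_000  # estimated daily cost of delay, as quoted above
DELAY_DAYS = 30               # assumption: "one month" taken as 30 days


def delay_cost(days, cost_per_day=COST_PER_DAY_USD):
    """Total estimated cost to the sponsor of a delay lasting the given number of days."""
    return days * cost_per_day


total = delay_cost(DELAY_DAYS)
print(f"${total:,}")  # $240,000,000
```

On these assumptions, a single one-month delay would cost a sponsor roughly $240 million.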

The right treatment to the right patient

An independent study by Indiana University found that machine learning could result in drastic improvement on both cost and the quality of health care offered, focusing upon the United States. The researchers used machine learning to determine the best treatment at a single point in time for a patient. The study used 500 randomly selected patients for simulations, comparing the decisions of the artificial intelligence against doctor performance. The research found that health care costs were reduced by 58.5% whilst patient outcomes were improved by 41.9%.

With impressive figures such as these, it is possible to become carried away with the prospect of a fully automated future – where artificial intelligence is coolly able to take over doctors’ roles. The reality is far from it; the systems are really of benefit as a tool for freeing doctors from complicated means of identifying clinical trials for patients. This would allow doctors more time to consult properly with patients and to provide them with full support and a better knowledge of what treatments may involve.

It’s not only doctors who would benefit from this situation; the patient stands to gain by having more targeted medicine and more targeted clinical trials. The latter is the significant benefit of Watson for Clinical Trial Matching, especially as sometimes a suitable clinical trial cannot be found for patients by conventional means. Research has shown that being unable to offer patients any form of treatment results in worse outcomes than being able to offer some form of treatment, such as a clinical trial.

Balkizas states the case for IBM’s services in strong terms: “If not Watson, then what? The lack of patient recruitment is the number one reason a clinical trial fails. It’s finding the right tool to perform a function that is not a human function; that is to match thousands of data points to other thousands of data points from clinical trials to patients. Even if you’re very well trained, it is not within cognitive capabilities to match with all the criteria to consider.”

Data privacy issues

The benefit for the patient, for the doctors, for the CROs and the sponsors of the clinical trials is clear to see from the increased efficiency with which clinical trials could be processed. However, the question that will most often be raised regarding the use of vast reserves of patients’ data is that of privacy. It’s a topic that is never far from the news, especially after Edward Snowden’s releases hit the headlines and generated a wider knowledge, and acknowledgement, of the pervasive nature of data collection.

Data has become highly valuable to companies and it is in their own interests to maintain confidence in how it is being gathered and used. Recently, the King’s Fund began a series of independent pieces entitled ‘The NHS if…’, one of which was ‘What if people controlled their own health data?’ The premise of the piece is what would happen if the public were allowed to limit who could view their health data. With people becoming more aware of the value of their own data, the piece hypothesises a situation where a crisis in security or a leak of data could provoke a knee-jerk reaction from the public to deny access to their data.

It is a hypothetical piece but an interesting thought-experiment nonetheless; in 2014, for example, medical records accounted for 43% of all data stolen, according to Forbes. Data stolen from cyber-attacks will cost US hospitals $305 billion in cumulative lifetime revenue, whilst one in 13 patients will have their personal information stolen over the next five years, reported Accenture. Hackers are clearly identifying health data as a particular target, and that’s because healthcare records are worth a considerable amount of money when sold on illegally. On top of this, there is a suggestion that healthcare information is simply easier to hack, with data security not previously being seen as high-priority, especially considering the other financial pressures being placed on those looking after patients’ data.

There is a real incentive to make sure that security remains as strong as possible because a public crisis in confidence regarding the security of their data could be catastrophic for those involved in healthcare and research. With the speed with which online trends spring up and gain momentum, any movement by the public away from allowing the sharing of patient data could seriously hinder important research and advances in technology, such as those developed by IBM.

This is why there has been a rush, of late, to strengthen security around patient data. IBM Watson Health recently agreed a research initiative with the FDA to develop and test the exchange of patients’ health data in a secure manner through blockchain technology. Blockchain technology first came to attention through cryptocurrency, such as Bitcoin, as a secure means of transferring funds; this technology could now be utilised to protect patient data.

The positive effects of this technology might be felt not only in making data harder to misappropriate but also in engendering trust between collaborators. If a blockchain framework were able to establish a secure method by which to share patient data, it would lead to an increase in the sharing and circulation of that data between research bodies and institutions. It could also allow doctors and caregivers to support individuals under their care and better manage their health. This is particularly important as new forms of data are constantly emerging as technology advances, meaning that even now there are incredibly varied data streams: for example, Electronic Medical Records, clinical trials, genomic data, and health data from mobile devices, wearable technology and the “Internet of Things”.

If ensuring that data is adequately protected is of interest to those in charge of health data then educating the public about the benefits of this practice is also of importance. Researchers benefit from being able to analyse patients’ data and there are many institutions that rely on this data to further their research.

The Institute of Cancer Research, for example, uses patients’ data to identify potential drug targets and to inform the management of patients. As can be imagined, the institute is keen to protect access to this data and has a page on its website dedicated to detailing how it protects patient data and why the data is important to the institution.

A hard sell to the public

The NHS commissioned research, in 2013, into the general public’s views upon data collection and management and used the results to produce a white paper entitled, ‘Survey of the general public: attitudes towards health research’. The report found that “74% of all respondents said that they would be confident that their personal data would be held securely if they were asked by their doctor to take part in a health research study in the UK. 25% said they would not feel confident.”

Public trust in security is then almost without question, as a clear majority of respondents were confident in the NHS to keep their data secure. The more worrying findings are the last to be listed by the report: “33% of respondents would be very happy for their GP to access their patient records to see if they might be suitable to join a health research study. 25% would be very happy for a hospital consultant to access their records and 18% would be very happy for an NHS doctor who does not provide their care but is doing research to access their records.” Clearly there is a significant drop-off from the public’s trust in security to their trust in how their data is being used.

Balkizas acknowledges an issue with trust when it comes to data collection: “People care about their data, they understand that it has value and they value their privacy. People have to trust what we do with big data and have to trust our systems and that we will not do anything untoward with their data. So, we need to build that trust.”

Perhaps a similar example can be found in the UK’s strategy to increase organ donations – levels have increased year upon year through greater exposure and greater knowledge of the benefits; in 2012, rates of consent to organ donation were at 57%, which increased to 62% by 2015. It shows a steady increase on a particularly tough topic – not many want to consider what will happen after they die.

The general public’s health is affected by the use of patient data in research much as it is by organ donation, with both resulting in positive health outcomes and an increased likelihood of survival for those facing life-threatening illnesses. The question is whether those that work with the data will ever countenance revealing the extent of data collection, and bring it out of the shadows. Whilst it remains obscure, there is less accountability to the public but, as a result, there will always be issues of trust.

A brighter future with data analysis

Perhaps this is why, rather than a focused PR campaign for widespread use of patient data, there are instead numerous stories regarding the benefits and breakthroughs built upon the back of that data. It could perhaps be argued that there is an intention to let the benefits speak for themselves rather than engage too much in a dialogue. It’s a tactic that has so far worked well for Google and Facebook, who have managed to dodge serious inquiries into their practices by becoming ubiquitous and convenient in everyday life.

There is no doubt that the use of data analytics and cognitive systems will increase as technology’s potential is also pushed forward, and there could be exciting breakthroughs made not far into the future. Those watching the news will have noticed a trend of the big pharmaceutical companies pairing up with technology firms to develop new drug targets; at the forefront of those companies is IBM, whose Watson for Drug Discovery is drawing interest from all quarters.

Balkizas explains how this works: “We help organisations that search for drugs to get there faster and be more efficient. We all know it takes a long time; it is widely accepted that it takes between 10 to 15 years from finding a protein target to launching a pill on the market. With Watson, they can search between proteins, genes, drugs and diseases to find interactions to see where they go next, to see what they can research next. Before, it was like searching for a needle in a haystack but now, with Watson, we can shorten that journey to having an idea to pumping millions of dollars into research”.

Perhaps the key point to be picked out is that, by allowing companies to search through their reams of data intelligently using Watson’s cognitive systems, it takes away some of the difficulty of whittling down targets to the one that is taken to clinical development – ridding the process of the proverbial needle-in-the-haystack scenario. The clinical trial process, as has been mentioned, is laboriously slow at every stage and any means to speed this up is being grasped with both hands. Already Pfizer, Teva and Celgene, amongst others, have begun using Watson for Drug Discovery and it will only take the first reported success of a drug discovered in this fashion to get as far as market applications for the uptake to increase rapidly.

Bart Vannieuwenhuyse, senior director of Health Information Sciences at Janssen, spoke to Pharmafocus about the company’s current approach to drug discovery through data analytics. He commented: “In the domain of real ‘big data’ i.e. the multi-omics data types that become more and more available for analysis, the expectation is indeed that this will (as is already being seen) lead to better insights into disease etiology and underlying molecular disease pathways. This in turn can lead to identification of novel therapeutic targets.”

He continued: “To really take full advantage of this opportunity, it is necessary to combine medical, biological knowledge with advanced analytics skills whereby machine learning and other pattern recognition methodologies are crucial. While Janssen is strengthening its capabilities in this area, we are also a strong believer in open innovation and public-private collaborations, such as IMI (Innovative Medicines Initiative), to approach these challenges. One example in which we are combining large public datasets, like biochemical activity datasets, with similar internal data is an approach to gradually build ‘in silico assays’ of biological activity based on molecular structure of potential new chemical entities. By using these large datasets and applying machine learning and predictive modelling to the data, we can predict for a number of biological targets what the best chemical structures should be and, vice versa, from a chemical structure predict what biological activity this will have.”

The steps taken by Janssen are a sign of wider movements within the industry and it will only be a matter of time before the initial small successes offered by this approach see more pharmaceutical firms ready to take the plunge and carry the whole industry with them. The pharmaceutical industry is notoriously slow to adopt new technologies, preferring a more cautious approach when tasked with trying to oblige regulatory bodies across continents and countries. However, with technology developing at such a rapid pace, there is no denying that the tide of technology will sweep industries along with it, carrying those companies ready to remain flexible; those that are not will be left foundering in the water.

