
Challenges Evaluating mHealth’s Success

September 25, 2014

I saw another exciting news story on a mobile health intervention the other day.  I honestly don’t remember the company or product, but what stuck with me was the declaration of success based on 10 patients using the product for three months.  Success was touted in terms of cost reduction and resource utilization reduction in a before/after analysis.  This inspired me to collect some thoughts on some of the challenges around evaluating success in mHealth.


mHealth represents the collision of two interesting worlds — mobile, which changes on what seems to be a daily basis, and health care, which changes infrequently, only after significant deliberation and usually much empirical analysis.  In the tech (mobile) world, companies are talking about creating a minimally viable product (MVP), getting it out in the market, assessing adoption through metrics such as downloads and customer feedback, and iterating accordingly.  This would seem to make sense in the consumer world where the goal is to sell a game, an information app or productivity app.  If people use it and are willing to pay, that proves its utility, right?

There is something to this line of thinking.  Empiric market success is in some ways the ultimate success, at least for those who want to make a big difference in how humanity benefits from technology.

But does this work in health care?  I’m not so sure.  As clinicians, we’re trained to turn our noses up at this sort of measure of success.  But maybe we’re the ones who are wrong.  Let me use the 10-patients-for-three-months example to illustrate some issues.

  1. Selection bias. Virtually all pilots and trials of any sort suffer from this to some extent.  These days, it seems that patient/consumer engagement is the holy grail and we all must realize that people who show up to enroll in any sort of study are already engaged to an extent.  What about the people who are great candidates for an intervention (conventional wisdom says the disengaged are sicker and more costly) but are too unmotivated even to show up to enroll? Does anyone know how to handle this one? 


  2. Regression to the mean. This is a pesky and annoying one, and a favorite of folks trained in public health, but unfortunately it is a real phenomenon.  This is the stake in the heart of virtually all before/after studies.  If you follow a group of people, particularly sick ones, a certain percentage of them will get better over time no matter what you do.  The sicker the starting sample, the more dramatic the effect.  This is why some sort of comparison group is so helpful and why before/after studies are weak.
  3. Small sample size bias. This one can go either way, meaning you can exaggerate an effect or miss one.  If you want to run a proper study, find someone who has training in clinical trial design to estimate the size of the effect of your intervention, and thus the size of the sample you need to show its efficacy.  Lots of technical jargon here (power calculations, type I error, type II error, etc.), enough to make your head spin. But bottom line, you can’t really say much about the generalizability of data based on 10 patients.
  4. Novelty effect. I made that up, and there is probably a more acceptable scientific term for it. But what I’m referring to is, when you take that same group of people that was motivated enough to enroll in a study and apply an intervention to them, the newness will drive adoption for a while.  We see this all of the time in our studies at the Center for Connected Health.  The novelty always wears off over time.  In fact, I’d say the state-of-the-art in understanding the impact of connected health is one of cautious optimism because we haven’t yet done long-term studies to show if our interventions have lasting effects over time.  There is room for argument here, I guess, but three months is awfully short.
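The regression-to-the-mean point in the list above can be made concrete with a small simulation. This is only an illustrative sketch: the cost figures, noise levels and the "enroll the costliest 10%" rule are all made-up assumptions, not data from any study. Each simulated patient has a stable underlying cost level and two noisy measurements, and no intervention is applied at all.

```python
import random
import statistics


def simulate_before_after(n_patients=10_000, top_fraction=0.10, seed=0):
    """Return (mean_before, mean_after) for the costliest patients at baseline.

    No intervention is simulated: both measurements are drawn around the
    same stable per-patient cost level. All numbers are hypothetical.
    """
    rng = random.Random(seed)
    patients = []
    for _ in range(n_patients):
        true_cost = rng.gauss(10_000, 2_000)      # stable underlying level
        before = true_cost + rng.gauss(0, 3_000)  # noisy "before" measurement
        after = true_cost + rng.gauss(0, 3_000)   # noisy "after" measurement
        patients.append((before, after))

    # "Enroll" only the patients who looked most expensive at baseline.
    patients.sort(key=lambda p: p[0], reverse=True)
    enrolled = patients[: int(n_patients * top_fraction)]

    mean_before = statistics.mean(p[0] for p in enrolled)
    mean_after = statistics.mean(p[1] for p in enrolled)
    return mean_before, mean_after


mean_before, mean_after = simulate_before_after()
print(f"Enrolled group, before: {mean_before:,.0f}")
print(f"Enrolled group, after:  {mean_after:,.0f}")
```

The enrolled group's average cost falls between the two measurements even though nothing was done to them, which is exactly the "savings" a before/after analysis without a comparison group would report.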


Why is health care tech different than finding the MVP in the rapidly-changing, market-responsive world of mobile tech?  One reason may be that we’re dealing with health and sickness which are qualitatively different than sending a friend the latest snapshot from vacation.  It is cliché to say it, but lives are at stake.  So we’re more careful and more demanding of evidence.  Is this holding us up from the changes that need to occur in our broken health care non-system?  Possibly.

It is true that a well-designed trial with proper sample size is expensive and takes time.  Technologies change faster than we can evaluate them.
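To give a rough sense of what "proper sample size" means, here is a minimal sketch of a textbook power calculation for comparing two group means, using the normal approximation n = 2(z₁₋α/₂ + z₁₋β)² / d² per group. The standardized effect sizes tried below (0.2, 0.5, 0.8) are conventional illustrative values, not estimates for any particular mHealth intervention, and a trial statistician would refine this for the actual design.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Patients needed per arm for a two-sided, two-group comparison of means.

    effect_size is the standardized difference (difference in means divided
    by the standard deviation). Uses the normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against type I error
    z_beta = NormalDist().inv_cdf(power)           # guards against type II error
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def minimum_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized effect detectable with n_per_group per arm."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * math.sqrt(2 / n_per_group)


for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: {sample_size_per_group(d)} patients per group")

print(f"with 10 per group, detectable effect: {minimum_detectable_effect(10):.2f}")
```

Under these assumptions, a study with only 10 patients per arm could reliably detect nothing smaller than an effect of more than one full standard deviation, which is far larger than most interventions deliver.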

One thing we’ve done at CCH is design studies that use a large matched data set from our electronic record as a comparator.  This speeds things up a bit, eliminating the need to enroll, randomize and follow a control group.  Results are acceptable to all but the most extreme purists.
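For readers unfamiliar with matched comparators, the general idea can be sketched as follows. This is not the method used at CCH, which the post does not describe; it is a deliberately simple greedy one-to-one match, and the `Patient` fields, the five-year age window and the example records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: str
    age: int
    sex: str
    baseline_cost: float


def match_controls(enrolled, record_pool, max_age_gap=5):
    """Greedy 1:1 matching without replacement.

    For each enrolled patient, pick the unused record with the same sex,
    an age within max_age_gap years, and the closest baseline cost.
    """
    available = list(record_pool)
    pairs = []
    for patient in enrolled:
        candidates = [
            c for c in available
            if c.sex == patient.sex and abs(c.age - patient.age) <= max_age_gap
        ]
        if not candidates:
            continue  # leave unmatched rather than force a poor match
        best = min(candidates,
                   key=lambda c: abs(c.baseline_cost - patient.baseline_cost))
        available.remove(best)  # each record serves as a control only once
        pairs.append((patient, best))
    return pairs


# Made-up example data.
enrolled = [Patient("E1", 64, "F", 12_000.0), Patient("E2", 58, "M", 8_000.0)]
pool = [
    Patient("R1", 66, "F", 11_500.0),
    Patient("R2", 40, "F", 12_000.0),
    Patient("R3", 57, "M", 15_000.0),
    Patient("R4", 60, "M", 8_200.0),
]

pairs = match_controls(enrolled, pool)
for patient, control in pairs:
    print(f"{patient.patient_id} matched to {control.patient_id}")
```

Outcomes for the matched records can then be followed over the same period as the enrolled group, giving a comparison that a plain before/after design lacks. Matching on observed fields does nothing about unobserved differences such as motivation, which is where the selection bias concern comes back in.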

What ideas do you have on this dilemma?

15 Comments
  1. September 29, 2014 12:35 pm

    The issues you mention are, of course, pervasive wherever and whenever decisions are based on statistical interpretation of data. (Naturally, I’m of the opinion that we should strive to make all decisions based on that.)
    1. Selection bias. This is an insidious problem that is often easy to overlook, but that can be totally devastating when trying to generalize your results. In consumer lending we use a process referred to as “reject inferencing” to alleviate this problem somewhat. Imagine you’re a manager at a bank and you have a group (hopefully large) of individuals that applied for a loan. If you approve everyone, you’ll be out of business in no time. A small number of individuals will not pay you back and thereby wipe out any profits you might have made on the rest of the population. So you come up with an “acquisition strategy” which decides who to decline because they are too risky. This strategy takes things like an applicant’s FICO score, income, and lending history into consideration. Much of this data comes from credit bureaus; when you apply for a loan, lenders are allowed to peek at your credit bureau record. You then watch the population that got approved for the loan and see whether or not they default after a certain period of time. This gives you a “bad rate” for your acquisition strategy; hopefully it is low. The problem is that this bad rate is not generalizable to the whole population that applied, because you don’t know how people that got declined would have performed had you approved them. Therefore you don’t really know how well your acquisition strategy worked.
    There is no ideal solution to this, since you really want to know what would have happened if you had approved everyone. However, you can make some inferences about the declined population: if they defaulted on another loan after being declined by our bank, then likely they would have defaulted on our loan as well, if we had approved them. This information is captured by the credit bureaus. Banks can request default rates for certain populations from the credit bureaus. In this case the population would be “one minus the acquisition strategy.” The default rates of the approved and declined populations are then blended to give an overall bad rate.
    The situation in healthcare is more complex, of course. First there is no credit bureau equivalent that I’m aware of. Second the definition of “default” is multi-dimensional in healthcare, as there are many forms of outcomes. This solution may remain theoretical in healthcare for the time being.
    2. Regression to the mean. I think “weak” is a generous term to use. Without a control group, any conclusions about the effect of an intervention are meaningless.
    3. Small sample size bias. I’m always amazed at conclusions people draw from small samples. People, even statisticians apparently, greatly underestimate the effects of noise. In statistics we talk about the significance of a result, which captures how much confidence one can have that the effect seen is real. We’ll skip the math (you can find it on Wikipedia if you’re interested), but suffice to say that the correlation needs to be very strong to be able to draw any conclusions with just 10 data points.
    4. Novelty effect. Keeping participants engaged is one of the big issues in healthcare, I think. It gets back to tickling the dopamine receptors to push people in the right direction. Mobile technology should be able to really help here. Just like in banking: keeping people financially healthy is an on-going process that spans years.

    Lives are at stake, of course, but that cuts both ways. Ask an Ebola patient whether they’d mind taking a drug that wasn’t fully tested, but, who knows, it may help them.

    “Fail fast and often” is only an option when the cost of failure is relatively low. In many cases in healthcare failure is expensive and care in experiment design and outcome interpretation is warranted. Having said that, there are likely also many areas in healthcare (I’m thinking probably in the wellness and prevention arenas) where innovation methods used in marketing and tech could be used effectively. Even in those cases, however, I’d advocate proper data collection and interpretation. How else are you truly going to know if you succeeded or failed?

    • September 29, 2014 9:43 pm

      Thanks so much for this thoughtful commentary.

    • October 1, 2014 9:59 am

      How often is the targeted patient/caregiver population represented in the design process? We are trying a different spin on the entrepreneur pitch that is NOT focused on the business/market opportunity, but asking entrepreneurs to describe HOW they would collaborate with a patient philanthropy/organization in a way that accomplishes milestones relevant to the fit-for-purpose from an end-user standpoint. It’s our first time doing this as part of our conference, and we are open to suggestions if we do this again the following year.

      Information about our pitch session can be found at

      • October 1, 2014 12:04 pm

        Thanks for sharing this innovative approach. Please keep us updated on what you learn

  2. millenson
    October 2, 2014 3:47 pm

    Joe, you are absolutely right from the point of view of clinical ethics, science and patient welfare. On the other hand, if one looks at say, the amount of money raised by Healthways in its IPO before “regression to the mean” finally reared its ugly head, there is another factor to consider: If they discover it doesn’t work, you don’t have to give the money back. Cochrane vs. PT Barnum.

  3. October 14, 2014 1:00 pm

    Clearly Joe, improved outcomes (preventative, medical, surgical, palliative) are the only measure of ‘success’ in health care, mhealth or dig health. Digital or mhealth has the means of tabulating and revealing those outcomes according to the thousands of contributing and non contributing variables via integration and standardization of EMR/EHR and billing systems.

    Don’t you find it interesting, confounding and somewhat frustrating that the ‘connected’ health revolution you are leading searches to evaluate ‘successes’, but has been almost completely unable to implement the necessary dig health software and hardware integration on a global basis which can both contribute to those changes and evaluate those outcomes for the betterment of medicine? Sure, MGH and Partners are somewhat integrated, but your institutions represent a fraction of the clinical outcome data and distinct variables relative to our nation or world.

    How can mhealth or digital health realize its full potential without full integration and standardization of software and hardware? To measure or evaluate mHealth’s success, the 4 clinical outcomes need to be measured in real time according to all their thousands of variables (age, sex, disease, hospital, income, date, etc.). Outcomes are the only ‘products’ manufactured and produced by health care in America; therefore, digital health or mhealth hardware or software will be unable to compete capitalistically in a marketplace based on the quality, cost or improvement or successes of outcomes if they are not tabulated and revealed.

    Companies such as Modernizing Medicine and Flatiron Health are ‘sniffing’ around the edges of standardization and tabulation of clinical outcomes, without revelation. With the 5 major industries supporting our congresspeople and the administration opposed to tabulation and revelation of health care data in real time (insurance, pharma, med mal, electronic medical records manufacturers and Academe) how do you as the leader of ‘Connected Health’ in the world plan to measure or “evaluate mhealth success” on a non-empiric global basis??

    I’m intrigued by the contradictory aims of the major players in the field of mhealth who want to sell devices, hardware and software, but don’t want to truly evaluate their quality or costs concerning clinical outcomes by revealing outcomes according to all variables.

  4. October 14, 2014 1:05 pm

    Thanks for taking the time and for your incisive commentary.

  5. October 14, 2014 1:17 pm

    Biggest problem seems to be calling the phenomenon by the wrong name

    • October 14, 2014 1:46 pm

      Please elaborate, Matthew. You’ve whetted our collective appetite.

  6. liam glynn
    October 15, 2014 10:25 am

    Very much agree with Nick on several points but bottom line is we need strong evidence and sound research methodologies to generate this evidence…. we have to do the RCTs or we might as well join the circus…. our latest contribution described in the link below:

    …only a small trial but compelling intervention effect size…. now at least when I sit and recommend a physical activity app to my patients I don’t feel like PT Barnum…

  7. October 15, 2014 11:52 am

    No one wants to feel like PT Barnum. But you also know that clinical trials are very artificial environments so one could argue that an effect derived from a clinical trial may not translate into an effect in the general patient population. We’ve certainly seen that. I’ve learned to argue both sides of this one 🙂


