DataDirect: Synergy with the Mobile Technologies Core | MCN
Let me just see. I'm going to talk for maybe 10 minutes about the DataDirect tool specifically. Then, for another five or 10 minutes, I can demo any aspect of the tool or the documentation website, wherever you want to take it, whatever is most helpful to you. I love that we did introductions, because this is a hugely diverse group from lots of different departments. You've probably interacted with central data tools in a lot of different ways.
If you've used DataDirect before, hopefully I'll tell you something new about it. If you haven't, hopefully you can try it out afterward. The one takeaway about DataDirect: it's a homegrown tool, supported by a small, nimble team. When there are features you don't see, or data you'd like added to DataDirect, we have a very quick turnaround, because it's a small, local team. That's the one takeaway. I'm going to talk about how DataDirect can augment all the work you do with mobile technologies. We sit in Michigan Medicine's Office of Research; we're one of the units there, like the IRB, Fast Forward Medical Innovation and the Central Biorepository. We're one of the strategic investments from about ten years ago, when leadership in the Office of Research said: there's a lot of data, a lot of it is electronic, and it's really hard to get at throughout the data life cycle, whether that's a feasibility assessment of how many patients we see with a particular condition for a study.
All the way through to the need to share data: there's now an NIH requirement that data be made available for reproducibility. And all the way to storage: how long should I store it? Where should I store it? The Office of Research created the Data Office. We used to be called the Honest Broker Office, and we still serve an honest broker function, but you could think of us now as the research data warehouse, with DataDirect sitting on top of that, along with a group of programmers who can pull data for you. One of the needs you can have met via the Data Office, and the DataDirect tool specifically, is cohort discovery. A lot of people use these tools for large retrospective observational studies.
Or they use them to design a prospective study: are there enough patients for an industry trial? Could we start to use them for recruitment? There are recruitment features in the tool itself. There's also a real intersection with the Precision Health efforts to expose which of our Michigan Medicine patients also have genetic data, mainly genotyping, though some have been fully sequenced, so you can use their genetic information together with their medical record information and their mobile device data.
You can start to glue all these pieces together. DataDirect is meant to be the go-to place where you can at least see what's out there and what's possible. We also do a lot of work mapping Michigan Medicine data to national standards so they can be used in network-based research. Say you and your PI have a study with Duke and Mayo Clinic: everyone has Epic, but we all implemented it differently.
So our data need to be mapped to a standard so we can share them more easily. A lot of what we do is also help determine, based on what data you need, who can have it. I just need a limited data set: I need dates, but I don't need anyone's address or first or last name. Could you pull a limited data set for me? Can I give this data to the fellow I'm working with this semester? I have a visiting med student. Could they have access to DataDirect, or could you pull data for them? Some researchers say to us, could you just give me all the data? I don't want just a finite set. I want to build a machine learning algorithm.
I want to build a predictive model for a given outcome. I need all the data. Can you just give that to me? Can I store it in the cloud? Can I store it on Dropbox? These are the nuances: in addition to what data people need, there are all these different flavors of who can have it, where it has to be stored and what requirements apply. Of course, there's also linking data sets, and that's really what today is about. That's what this whole Mobile Technologies Core is about.
And then, how can I link data I have from a Fitbit or a sleep study with the patient's diagnosis in MiChart, or with socioeconomic status data derived from their geolocation, or with surveys we ask them to fill out as part of Prompt or the Intern Study? How can I link all those together so I can get a fuller picture? We're not the IRB and we're not compliance, but we do everything through those two lenses, and that's a lot of why people come to us. They'll ask us: what kind of IRB application do I need for the type of study I want to do? What if I'm going to share this data with GlaxoSmithKline after I do the collection, and they're going to pay me for those findings? What consent do I need? What data sharing agreement? Who in Innovation Partnerships do I work with? I know we have someone on from Innovation Partnerships; we work with them to make the partnership strong and to make those things possible.
Our main customer is the Michigan research community at large. Increasingly, we also help researchers participate in networks across the country, and industry is another customer. I'll talk mostly about DataDirect, in case you haven't used it before. The main thing to know is that there are two main modes, if you will.
Without an IRB approval, you can do cohort feasibility: you can get a good sense of how many patients meet a set of inclusion and exclusion criteria, and you can see the race breakdown and the insurance type breakdown. It's all just aggregate counts. With an IRB approval, you can download your own patient-level, row-level data export to do your analysis.
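To make the two modes concrete, here is a minimal sketch, assuming a hypothetical `PatientRow` record and an `irb_approved` flag (neither is part of DataDirect): the no-IRB path only ever returns aggregate counts and breakdowns, while the row-level path refuses to run without an approval.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRow:
    # Hypothetical, simplified row; real DataDirect views have many more fields.
    patient_id: str
    race: str
    insurance: str
    meets_criteria: bool


def feasibility_counts(rows):
    """No-IRB mode (sketch): return only aggregate counts, never patient rows."""
    cohort = [r for r in rows if r.meets_criteria]
    return {
        "total": len(cohort),
        "by_race": dict(Counter(r.race for r in cohort)),
        "by_insurance": dict(Counter(r.insurance for r in cohort)),
    }


def row_level_export(rows, irb_approved):
    """IRB mode (sketch): release patient-level rows only with an approval attached."""
    if not irb_approved:
        raise PermissionError("row-level export requires an IRB approval")
    return [r for r in rows if r.meets_criteria]


rows = [
    PatientRow("p1", "White", "Medicaid", True),
    PatientRow("p2", "Black", "Private", True),
    PatientRow("p3", "White", "Private", False),
]
print(feasibility_counts(rows))
```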
There are actually two versions of DataDirect. The level one version came out of the Precision Health Initiative, because we didn't want everyone to have to get a level two account in order to use this really powerful tool. We wanted those outside the firewall to be able to use it in a very safe and efficient way. You can see the two URLs here, and you don't have to write them down: I'm going to share these slides with Nicole and Victoria, and the second-to-last slide has a ton of links you may need. Just know there are different features in the two tools.
They work similarly, but they're not identical. In the level one version, which sits outside the firewall, there are dates, but they've been shifted. You can still analyze time relationships, but the dates have been shifted and all identifiers have been removed. The level two version has real dates and other identifiers.
If you've been approved for recruitment, for example, you could have access to patient email or home address. That's level two. The level one version is sometimes called the de-identified version and sometimes the Precision Health version. I'll share a couple of different features with you. In general, DataDirect is a tool to get at structured data from the electronic medical record. Depending on the type, the data go back as far as 2000. We're really good with billing data, encounters, dates of admission and dates of discharge; those data go back that far. Lab data start to become more robust, with less missing data, around 2004 to 2006. That's still before we got Epic, our EMR product, but we've been able to glue together all the different electronic systems Michigan has had over the years.
Then, as I mentioned at the top of the talk, there's lots of opportunity to continue to enhance the tool and make it more usable. I'm constantly asking groups like yours and other research communities within Michigan: what is still missing? What do you need? One thing we heard a lot was the need for socioeconomic status (SES) data. The only way to adjust for SES in the data we had was to use Medicaid enrollment as a proxy for greater disadvantage. So we built in two types of data. The first is generated from geolocation: we take each patient's street address, determine its latitude and longitude, associate that with a census block ID, and then link it to national data sets. Many of those data sets are actually curated at the Institute for Social Research here at Michigan, and they make available things like household income, highest education level, crime and access to parks. Over 100 different data elements are available from those geolocation data. It's not the individual patient's household income or education level; it describes the block on which that patient lives.
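The geolocation pipeline described above (street address, then latitude and longitude, then census block ID, then block-level measures from national data sets) can be sketched as a chain of lookups. The `geocode` and `block_for_point` callables and the toy tables are hypothetical stand-ins for a real geocoding service, a census block boundary lookup and an SES data set; this is not the Data Office's implementation.

```python
from typing import Callable, Dict, Optional, Tuple

LatLon = Tuple[float, float]


def ses_for_address(
    address: str,
    geocode: Callable[[str], Optional[LatLon]],
    block_for_point: Callable[[LatLon], Optional[str]],
    block_ses: Dict[str, Dict[str, float]],
) -> Optional[Dict[str, float]]:
    """Street address -> (lat, lon) -> census block ID -> block-level SES measures.

    The result describes the block the patient lives on, not the patient.
    Returns None when any step fails (e.g. the address can't be geocoded).
    """
    point = geocode(address)
    if point is None:
        return None
    block_id = block_for_point(point)
    if block_id is None:
        return None
    return block_ses.get(block_id)


# Toy stand-ins (hypothetical values) for a geocoder, a block lookup and an SES table.
fake_points = {"123 Main St": (42.28, -83.74)}
fake_blocks = {(42.28, -83.74): "block-A"}
fake_ses = {"block-A": {"median_household_income": 61000.0}}

print(ses_for_address("123 Main St", fake_points.get, fake_blocks.get, fake_ses))
```

Passing the lookups in as functions keeps the sketch runnable offline; a real pipeline would swap in an actual geocoder and boundary files.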
This is the average household income, and this is the average percentage of households that are immigrant-led or female-led. It gives you a real sense of these two patient populations with hypertension, which did very differently.
Could socioeconomic factors be playing into those different outcomes? The other type of data, and Ma, I don't know if you've used this, comes from questionnaires that patients answer when they come to see their primary care physician: things like food insecurity. Did you have to forgo medications this month because of cost? Did you have to forgo paying a utility bill, your rent or your mortgage? Again, it gives a sense of some of the factors affecting a patient and possibly their health care. Researchers also said, hey, every time I go in, I have to recreate asthma or some other phenotype or cohort. Could you just have starting populations I could grab? I just want all surgical patients, because I'm in anesthesia and I don't want to have to type in every operating room to capture them. I just want all primary care patients. I want all diabetic patients.
If there's a particular population that you and the researchers you support have to build again and again, we can build it into the tool as a starting population, or what we call a computable phenotype. We've also heard that there are certain data types we weren't pulling over from MiChart that are really helpful. Some of the family medicine researchers have a lot of hearing-impaired patients, and they wanted to know, in the cohort definition, which patients prefer ASL as the language for the visit.
We were able to build that in up front. Researchers wanted ejection fraction in a structured field; they didn't want to have to go to the echo report and read it. Same with pulmonary function tests. Others said, I just need to know who's enrolled in the patient portal, so I can potentially use the portal to send questionnaires or to recruit. Others wanted to know who has a Michigan Medicine PCP, pregnancy outcomes, death data, and who's on a given inpatient service. Think about what you and your researchers might still struggle with in the DataDirect tool.
We could look at building that in. Right now, we have really good clinical data from the EMR going back, I don't know, 15 years. I think we're close to 100,000 patients who have consented to the Michigan Genomics Initiative study, and we have genotype data on them. Increasingly, though, with your partnership, we're looking to round out some of the other data types I talked about, which will give us a fuller picture of the patient. Certainly, incorporating mobile health technologies, whether watches or monitors, is going to be an incredible asset for better understanding the patient. I think the role the research data warehouse and the Data Office play is providing a way to centrally link those data by a common patient ID and a common encounter ID.
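Linking by a common patient ID is, at its core, a join. A minimal sketch, assuming wearable summaries and EMR diagnoses arrive as lists of dicts sharing a hypothetical `patient_id` field; the values are toy data for illustration only.

```python
from collections import defaultdict


def link_by_patient_id(wearable_rows, diagnosis_rows):
    """Attach each patient's EMR diagnosis codes to their wearable records.

    Both inputs are lists of dicts carrying a shared "patient_id" key (hypothetical
    field names). Wearable rows with no matching diagnoses get an empty list.
    """
    dx_by_patient = defaultdict(set)
    for dx in diagnosis_rows:
        dx_by_patient[dx["patient_id"]].add(dx["code"])
    return [
        {**row, "diagnoses": sorted(dx_by_patient.get(row["patient_id"], ()))}
        for row in wearable_rows
    ]


sleep = [{"patient_id": "p1", "night": "2021-03-01", "minutes_asleep": 412},
         {"patient_id": "p2", "night": "2021-03-01", "minutes_asleep": 380}]
diagnoses = [{"patient_id": "p1", "code": "G47.00"}, {"patient_id": "p1", "code": "F32.9"}]
for row in link_by_patient_id(sleep, diagnoses):
    print(row)
```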
With the data linked, researchers can better analyze their findings. I mentioned some of these, but when in doubt, I just encourage you to reach out to us and ask: do you have this data? Where are the Prompt data? Can I add this data I currently have? Just yesterday, we got asked about sleep studies. The findings and measurements from sleep studies live in PDF format.
Is there any way to get a data dump from the vendor, Philips, so we could have the important sleep study measurements in structured fields? Those are the kinds of things we can help you work on. Then here, on this slide, which I'll share with Victoria, are links to the different DataDirect tools.
The documentation site is really helpful. It tells you where the data came from, whether any mapping or cleanup was done, and how far back the data go. And then there's the email address to set up a consultation with us. So I'm going to stop there, stop sharing, and see if there are any questions, or if I can demo any of the tools for you all. Oh, thanks. I saw Gabriel has a question in the chat about dates being shifted, and I'm quickly reading Emily's too.
Yeah, okay, great question. We call these obfuscated dates. Essentially, what our programmers have done is take the database that underlies the level one version outside the firewall and shift all the dates within a given patient's record in the same direction. Maybe it's one, two or three weeks forward, or maybe it's one or two weeks back. But all of the dates within one patient's record are shifted by the same amount, so you can still tell age at a given episode.
You can still tell when the antibiotics were started, when the white blood cell count changed, and the time between them. The temporal relationships are preserved, but the dates aren't the real dates.
And so it's not PHI; the shifted dates aren't one of the HIPAA identifiers. Yeah. Go ahead, Gabriel. Yeah. So is that different per patient? Because if you're looking at a cohort and the data are shifted in different directions, that affects the analysis, right? It could. I mean, it depends on your study. Day of the week is maintained.
So if you're studying admissions to the ED on a Sunday, it'll still show that the patient was admitted on a Sunday; it might be a week before or a week after the real Sunday. And because all the dates are shifted together, it won't affect your analysis within that patient's record: you'll still know that the patient came back to the ED within 15 days of that admission. The dates are shifted differently for different patients, so that if you figure out the pattern for one patient, it won't apply to any other patient.
It's really just a form of protection. Within a data set, you would treat it like any other date. You just know that if the file gets out there, those aren't real dates.
They're fake dates. Does that make sense, Gabriel, or, I'm sorry, Emma or Emily? Does that answer the question? Sure. I just have a couple of follow-ups.
Sure. So the amount it's shifted: is it, for everyone, in a time frame of weeks, or are some people shifted by months or years? Yeah. It's either one, two or three weeks forward, or one, two or three weeks back. Okay. Seasonality was another consideration; this became especially relevant during COVID. People would say, I don't want the dates shifted six months, because I really want to know the patient's immunization status in March 2020. So it might be a week or two off, but it's always seven, 14 or 21 days, so it still maintains day of the week and it still maintains seasonality.
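A minimal sketch of per-patient date shifting as described here, assuming each patient's offset is drawn uniformly at random from one, two or three weeks in either direction (the actual algorithm isn't described in the talk). Because every date in a record moves by the same whole number of weeks, intervals, age to the day, and the day of the week all survive. `draw_offset` and `shift_patient_record` are hypothetical names.

```python
import datetime as dt
import secrets

# Offsets described in the talk: one, two or three weeks, forward or back.
# Whole weeks keep the day of the week; small offsets keep the season.
ALLOWED_OFFSETS_DAYS = (-21, -14, -7, 7, 14, 21)


def draw_offset() -> dt.timedelta:
    """Pick one offset per patient (assumed uniform; not the Data Office's algorithm)."""
    return dt.timedelta(days=secrets.choice(ALLOWED_OFFSETS_DAYS))


def shift_patient_record(record: dict, offset: dt.timedelta) -> dict:
    """Apply the same offset to every date in one patient's record; leave other fields alone."""
    return {k: v + offset if isinstance(v, dt.date) else v for k, v in record.items()}


record = {
    "patient_id": "p1",
    "birth_date": dt.date(1980, 5, 4),
    "ed_admit": dt.date(2020, 3, 15),  # a Sunday
    "ed_return": dt.date(2020, 3, 27),
}
shifted = shift_patient_record(record, draw_offset())
print(shifted)
```

Drawing the offset independently per patient is what keeps one patient's pattern from revealing anyone else's.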
It's just a way to avoid using real dates. Okay. And then their age at the time of an encounter, for example, should be accurate? So even if a date is shifted, their age should still reflect their actual birth date?
Yeah. Their birth date has been shifted by the exact same amount, so their age to the day is exactly the same. Okay. And some people do need real dates. Think of a study like one colleague's C. diff study: she needs to know who's in the hospital today with these vitals, because they might be at risk for C. diff and need to start antibiotics prophylactically. In that case, you'd use the level two DataDirect tool, which has real dates. But the vast majority of users outside the firewall don't need to know it was March 13. It could be March 20; it wouldn't make a difference to them. They just need to know, within a given patient, the length of time between this and that: how many days before that? When were they diagnosed? They need to know those relationships, but not the real dates. So you said it's consistent within a patient, so that holds even across all views? If I'm downloading medication data with a prescription date, that should be shifted the same amount as the date of an earlier encounter?
Okay. And again with their birth date, or whatever anchor date: all dates within a given patient have been shifted by the same algorithm, so you can still do everything? Yeah. Okay. Thank you for clarifying. I know when I say shifted dates, people think it sounds like their findings won't be usable. Right. Yeah, right. I am not smart enough to do this; there are really smart people who do these algorithms.
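Gabriel's example, comparing a prescription date from the medication view to an earlier encounter date, can be sketched as a windowed within-patient comparison. The sketch assumes hypothetical field names and a 30-day window, and checks that the result is identical on real dates and on per-patient shifted dates, since only differences within one patient are ever computed.

```python
import datetime as dt


def rx_after_encounter(encounters, prescriptions, window_days=30):
    """Pair prescriptions with same-patient encounters up to `window_days` earlier.

    Only within-patient date differences are used, so the result is the same on
    real dates and on per-patient shifted dates. Field names are hypothetical.
    """
    pairs = []
    for rx in prescriptions:
        for enc in encounters:
            if enc["patient_id"] != rx["patient_id"]:
                continue
            gap = (rx["date"] - enc["date"]).days
            if 0 <= gap <= window_days:
                pairs.append((rx["patient_id"], gap))
    return pairs


encounters = [{"patient_id": "p1", "date": dt.date(2016, 1, 10)},
              {"patient_id": "p2", "date": dt.date(2016, 2, 1)}]
prescriptions = [{"patient_id": "p1", "date": dt.date(2016, 1, 25)},
                 {"patient_id": "p2", "date": dt.date(2016, 4, 1)}]

# Different patients get different offsets, but one patient's views share one offset.
offsets = {"p1": dt.timedelta(days=14), "p2": dt.timedelta(days=-21)}


def shift(rows):
    return [{**r, "date": r["date"] + offsets[r["patient_id"]]} for r in rows]


print(rx_after_encounter(encounters, prescriptions))
print(rx_after_encounter(shift(encounters), shift(prescriptions)))
```

The nested loop is fine for a sketch; for a large export you would group rows by patient first.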
I can almost explain them in English, but not well. Yeah, that makes sense. I have a cohort that I downloaded from 2016 onward, and I also compare dates within, like, 30 days of...
Yes, that kind of thing. So yeah. Okay, that's good to know. Thank you. Victoria, I had a question about geolocation. Is the geolocation based on primary residence or mailing address, and what happens when that changes, or for someone who has unstable housing or is homeless? Great. And is that tied to social determinants of health?
Gabriel, you're asking all the hard questions today. I love it. Yes, it's tied to the current street address, and yes, it's associated with census data and other national efforts. And it's not necessarily every ten years: there are updates to census data, and some of these SES data are collected more frequently than every ten years. And yes, these are the social determinants of health, or SES data, however you want to call them, based on geolocation.
They're proxies, not that patient's own data: for each patient, you'll get the percentage of households on that patient's actual block that had these characteristics. If you download the geolocation views and, let's say, a patient has moved three times in the last five years, you'll get the social determinants data associated with each of those addresses. Every time MiChart captured a new address when they came in, that street address has its associated SES data.
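When a patient has several captured addresses, one common approach (an assumption here, not a DataDirect feature) is an as-of lookup: for each encounter, use the SES values of the most recent address captured on or before that date. A minimal sketch with a hypothetical, date-sorted address history and toy values:

```python
import bisect
import datetime as dt


def ses_as_of(encounter_date, address_history):
    """SES values for the address in effect on `encounter_date`.

    address_history: (captured_on, ses) pairs sorted by captured_on, one per new
    address recorded at registration. Returns None before the first address.
    """
    captured = [when for when, _ in address_history]
    i = bisect.bisect_right(captured, encounter_date) - 1
    return address_history[i][1] if i >= 0 else None


history = [
    (dt.date(2017, 1, 1), {"pct_female_led_households": 0.18}),
    (dt.date(2019, 6, 1), {"pct_female_led_households": 0.31}),
    (dt.date(2021, 3, 1), {"pct_female_led_households": 0.12}),
]
print(ses_as_of(dt.date(2020, 8, 15), history))
```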
Does that make sense, April? Yeah, that helps. What happens when someone is homeless? I don't know what their address looks like if they're homeless, or whether we could identify that. Some of our patients are also in the state penitentiary, and I don't know whether the latitude and longitude of the prison has SES data. It probably does; I don't know.
But that's a great question for some of the less common situations, like a group home, a homeless shelter or a prison. I don't actually know what those look like in the registration system, but we could look.
Yeah. That'd be interesting to find out from a DEI perspective. It absolutely would. Yeah. I know our Michigan Medicine population isn't especially disadvantaged, but we do have homeless patients.
We do have patients in prison, and of course we have patients in adult care facilities, nursing homes and other group living situations. I suppose if there is a street address, you could still do the geolocation assessment. But yeah, that's a great question.
Ma put the NaNDA website in the chat. It gives you some background on the data types and how they were collected. Certainly, if you use those data, reference the website Ma shared; it's got some really interesting information. It is so cool. We're using it for one of our projects, which is very much focused on health equity during COVID, based on address at the time of service.
We have these NaNDA elements based on where people lived, and the NaNDA data set gives you those tie-ins to social determinants of health: race and income, and for us also things like broadband access, because again, we're concerned about telehealth during COVID. It's incredible data. It's big, scary data. But it's been critical for us as we try to piece together which populations were affected by the telehealth expansion during COVID.
That's awesome. Ma, do you know anything about Gabriel's question on patients whose status is homeless? I can't speak to that, but I can say what we have noted. Because we have multiple years of data, we did have to go back to Arena and others at the Data Office and say, hey, what if someone moved? So I think we have, in essence, a nested structure where people have multiple addresses, and therefore their NaNDA data shift within our time period. It does get really complicated if you're doing a big retrospective pull and you're getting people who are transient on a weekly or daily basis, or even if you just have multiple years and people moved. It is something you need to consider. Thanks. I want to say something about future studies and how to think about this.
When I think about wearables data, I think about incredibly large, granular data sets; Nicole knows this all too well. A device is going to tell you my heart rate every 10 seconds, it's going to tell you my blood... These are huge data sets. If the goal of your study is to centrally locate those data for other researchers to use, the research data warehouse and the Data Office are the place to go.
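The arithmetic behind "huge data sets" is simple. Assuming a single heart-rate stream sampled every 10 seconds, as in the example above, and a hypothetical cohort of 1,000 participants (other sensors would multiply this further):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def readings(interval_seconds: int, days: int, patients: int = 1) -> int:
    """Number of readings from one stream sampled every `interval_seconds`."""
    return (SECONDS_PER_DAY // interval_seconds) * days * patients


print(readings(10, 1))                   # 8640 per patient per day
print(readings(10, 365))                 # 3153600 per patient per year
print(readings(10, 365, patients=1000))  # 3153600000 for a 1,000-person cohort
```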
That doesn't necessarily mean we'll house the raw data elements from the wearable devices in the research data warehouse themselves. More likely, we'll have metadata about what's available that's discoverable in the data tools, in DataDirect and the like. Say you're interested in physical activity, sleep and self-reported depression from a study where it was all collected on the Apple Watch.
You could identify that cohort and the time frame for which you want the wearables data, and then we could connect you to the wearables database where those huge, granular data sets live. Prompt, for example, is already in the level one version of DataDirect.
You can look up how we represented the wearables data. If you have suggestions on how to better represent them in a self-serve tool, let us know. But I think we'll probably have more metadata, so you can at least see what's available on which patients and for how long.
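A self-serve metadata layer of the kind described here might answer "which patients have which data types, over which dates" without exposing the raw streams. A minimal sketch, assuming a hypothetical `Availability` catalog with one row per patient and data type:

```python
import datetime as dt
from dataclasses import dataclass


@dataclass(frozen=True)
class Availability:
    # Hypothetical metadata row: one data type held for one patient over a date range.
    patient_id: str
    data_type: str
    first_day: dt.date
    last_day: dt.date


def patients_with_data(catalog, cohort, wanted_types, start, end):
    """Cohort members whose every wanted data type overlaps the [start, end] window."""
    found = {}
    for row in catalog:
        overlaps = row.first_day <= end and row.last_day >= start
        if row.patient_id in cohort and row.data_type in wanted_types and overlaps:
            found.setdefault(row.patient_id, set()).add(row.data_type)
    return sorted(pid for pid, types in found.items() if types >= set(wanted_types))


catalog = [
    Availability("p1", "sleep", dt.date(2019, 1, 1), dt.date(2021, 12, 31)),
    Availability("p1", "steps", dt.date(2019, 1, 1), dt.date(2021, 12, 31)),
    Availability("p2", "sleep", dt.date(2020, 6, 1), dt.date(2020, 9, 30)),
    Availability("p3", "sleep", dt.date(2018, 1, 1), dt.date(2018, 12, 31)),
    Availability("p3", "steps", dt.date(2018, 1, 1), dt.date(2018, 12, 31)),
]
print(patients_with_data(catalog, {"p1", "p2", "p3"}, {"sleep", "steps"},
                         dt.date(2020, 1, 1), dt.date(2020, 12, 31)))
```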
If that's what you're after, we can connect you with the granular data, but as Nicole knows, when you look at how much data is available from a device, it's overwhelming. Do we have time here for you to give us an example of pulling a cohort in DataDirect? Let's say we look for a specific social determinant of health, and patients who have heart data. Victoria, you tell me. We do. I wanted to make a quick announcement, just because of the conversation around social determinants of health. I put a link to the agenda in the chat. On the agenda, there's an open invitation to, I don't know if it's a conference; it's a biannual meeting, I guess. It's a 4.5-hour meeting with a lot of presentations and such. I just wanted to share that so you can go if you want to.
I would say absolutely. Go ahead and do a show-and-tell, Aaron. Okay, great. And please know I'm the worst person to do this, because I work with smart people who do this. Let me show you. I'm going to log in; you should see a DataDirect login screen here. I'm going to show this version because this is where the Prompt data live. We're working on exposing the Prompt data in level two right now.
Go to the link I gave you: that's the tool built with Precision Health that sits outside the firewall, and I'll show you how to use it. This version is level one. It will send me a Duo notification to authenticate, and then I'll name my query.
This one's a test. Are you seeing "Create a new query" on the screen? Yes. Okay, good; sometimes it doesn't advance. If you have an IRB approval, any IRB study on which your uniqname is listed as a study team member will appear in this dropdown list.
I'll pick a fake study. From there, you can choose either Cohort, which gives you aggregate counts, or De-identified Data Download, which is row-level data, one patient per row, with shifted dates as we talked about earlier. I'll choose that one, and I'm going to create a new cohort.
Again, I could start with a defined population. Let's start with primary care patients, and I'm going to click Add. Our starting place is 5 million unique Michigan Medicine patients who have a medical record number, and this says that of those, 251,000 are considered primary care patients. We're a tertiary care hospital: a lot of people come here only for specialty services, then go home to where they live and see their primary care physician there.
The 5 million are not all active patients. Our active patients number just under 2 million, meaning they've had at least one encounter in the last 18 months. The 5 million is anyone who's ever had a medical record number.
It could be someone from the early '90s who got a medical record number and never came back. Then I'm going to look for a diagnosis of depression.
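The active-patient definition mentioned above (at least one encounter in the last 18 months) can be sketched as a cutoff test. This assumes "last 18 months" means on or after the same calendar day 18 months before the as-of date; the tool's exact rule isn't stated, and the function names are hypothetical.

```python
import calendar
import datetime as dt


def months_before(day: dt.date, months: int) -> dt.date:
    """Same calendar day `months` earlier, clamped to the month's end (Aug 31 -> Feb 28)."""
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return dt.date(year, month0 + 1, min(day.day, last_day))


def active_patients(last_encounter, as_of, lookback_months=18):
    """Patients whose most recent encounter falls within the lookback window."""
    cutoff = months_before(as_of, lookback_months)
    return {pid for pid, last in last_encounter.items() if last is not None and last >= cutoff}


last_seen = {
    "p1": dt.date(2023, 1, 5),
    "p2": dt.date(2021, 12, 31),
    "p3": None,  # has a medical record number but no encounters on file
}
print(active_patients(last_seen, as_of=dt.date(2023, 8, 31)))
```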
I'm going to scroll down here. I can pick any of the ICD-10 codes I see. I can set it to all sources, to look for depression in facility or professional billing. It might just be on the problem summary list, it might be in their medical history, or it might be the reason they're there that day, the visit diagnosis. I'll also say, sure, bring in ICD-9 codes, in case the data go back before the switch to ICD-10 in late 2015. If I want to refine this list, there are Xs along the right-hand side that I can use to filter it down. For the sake of the demo, I'm just going to cast a wider net. Once I have my inclusion and exclusion criteria, I go down to the output view selection. I can go into way more detail here if you want anything in particular. I can then say, all right, I am interested in downloading the following types of Excel spreadsheet data about these primary care patients with depression. Just to show you, Demographics is where we have some of the geolocation data.
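The cohort logic in this walk-through (a starting population, intersected with patients who have a depression diagnosis from any source, in either ICD-10 or ICD-9) can be sketched with set operations. The code prefixes are illustrative (F32 and F33 in ICD-10; 296.2, 296.3 and 311 in ICD-9) and are not DataDirect's actual code list; the field and source names are hypothetical.

```python
# Illustrative code prefixes only, not DataDirect's actual depression code set.
DEPRESSION_PREFIXES = {
    "ICD-10": ("F32", "F33"),
    "ICD-9": ("296.2", "296.3", "311"),
}


def is_depression(dx):
    """True if the code starts with a listed prefix for its code system."""
    return dx["code"].startswith(DEPRESSION_PREFIXES.get(dx["system"], ()))


def build_cohort(starting_population, diagnoses, sources=None):
    """Starting population intersected with patients having a qualifying diagnosis.

    sources=None plays the role of "all sources"; otherwise only the named
    sources (e.g. a problem list) count. Field names are hypothetical.
    """
    matched = {
        dx["patient_id"]
        for dx in diagnoses
        if is_depression(dx) and (sources is None or dx["source"] in sources)
    }
    return set(starting_population) & matched


primary_care = {"p1", "p2", "p3"}
diagnoses = [
    {"patient_id": "p1", "system": "ICD-10", "code": "F32.9", "source": "problem_list"},
    {"patient_id": "p2", "system": "ICD-9", "code": "311", "source": "professional_billing"},
    {"patient_id": "p3", "system": "ICD-10", "code": "I10", "source": "visit_diagnosis"},
    {"patient_id": "p4", "system": "ICD-10", "code": "F33.1", "source": "visit_diagnosis"},
]
print(sorted(build_cohort(primary_care, diagnoses)))
```

Leaving out the ICD-9 rows shows why the ICD-9 option matters for older records: patient p2 would drop out of the cohort.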
Demographics is also where the Michigan Death Index data are: date of death and cause of death, though only for patients who died in Michigan, plus patient race data. I would just click on any of these and then select the columns I'd like to see in my output. We have the death date. Again, to remind you, this isn't a real date.
It's a date that has been shifted. Death, like birth, has been shifted by the exact same amount, so you can tell age at death, things like that. That's all under Demographics. There are also procedures, Medicaid, all sorts of different things you can download. I'm going to come down here to Wearables. These are the various views, or output types, you can download.
For Prompt, you can see we have about eight or nine different views. We have activity logs, and we have the daily data. The daily data are things like calories, BMI, distance and elevation, all the things the device monitors, summed up as a total for the day. It's not every single count: it's food and calories for the day, not for each meal.
Same with resting heart rate. Then there's the sleep log summary; I'll show you what those look like. Again, this one is the minutes it took the user to get out of bed after waking.
This is how granular the data get: how much time they spent in the various levels of sleep. This would make sense to Kathy. These are HealthKit activities: energy burned and exercise time. And then there's another HealthKit view; I can't remember what this one is.
It's for Google. Google Fit? Yeah. Thanks. Colin, this is your study; you should be presenting this. Yes. So here, again, is a view to better understand what the start and stop times were, what device was used, things like that. What this would do is spit out data for any of these 66,000 patients who were in Prompt.
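The daily views described above report one total per day (calories for the day rather than per meal). A minimal sketch of that roll-up, assuming intraday readings as hypothetical `(patient_id, timestamp, measure, value)` tuples with timestamps already in the participant's local time:

```python
import datetime as dt
from collections import defaultdict


def daily_totals(readings):
    """Collapse intraday readings into one total per (patient, day, measure).

    readings: (patient_id, timestamp, measure, value) tuples, e.g. calories logged
    per meal. Suited to additive measures only; something like resting heart rate
    needs a different summary than a sum.
    """
    totals = defaultdict(float)
    for patient_id, ts, measure, value in readings:
        totals[(patient_id, ts.date(), measure)] += value
    return dict(totals)


log = [
    ("p1", dt.datetime(2021, 3, 1, 8, 0), "calories_in", 450),
    ("p1", dt.datetime(2021, 3, 1, 12, 30), "calories_in", 700),
    ("p1", dt.datetime(2021, 3, 2, 9, 15), "calories_in", 500),
]
for key, total in daily_totals(log).items():
    print(key, total)
```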
Does that make sense? Another thing here is survey data. There were surveys given as part of Prompt, which you can look at as patient-reported outcomes, if you will.
The PQ survey is a 25-item survey on things like how much you sleep and your family history. There's also pain severity, which comes from the Michigan Genomics Initiative. For the primary care questionnaire, I'd have to look at what it is. I'm not totally sure, but I can find out.
I'll stop sharing there for just a second to see if there are any questions, anything I should highlight, or if you want to play with it yourselves. At the bottom of each DataDirect screen, whether you're in level one or level two, there's "Email for support and feedback." That brings you right to an email where you can say, hey, I would've expected to see this here, or, I got zero and I know there are patients in here, about anything you're doing. It's like a real-time report, so you can remember it and not have to pop out to Outlook. I want to clarify a few things about the Fitbit data. Yeah, some of the categories say things like food tracking. We don't require our participants to do any of that.
There's probably a small amount of data for that. The only things we require for the Fitbit data are heart rate, sleep and steps. Just FYI, also, on the HealthKit data for the Apple users: a decent number of people have Apple Watch data in there because they use that instead of the Fitbit, so there might be some data there.
You can do comparisons between the devices if you want, but there's a decent number of Apple Watch users in our study who use it exclusively. Just wanted to point that out. That's awesome, thank you. Once these get used, whether it's the Prompt views or views from whatever study we expose next in the tool, if there are views that aren't useful, we can take those out to make it easier for the user.
Or if there are better descriptions, say, this really isn't the amount of time it took them to get out of bed, it's blah, blah, blah, we can just continue to iterate and make that more helpful. We love Prompt because it was such a great guinea pig for us: how would a self-serve tool show someone what's available, in a hopefully clear way, so they could decide whether they wanted it for research? Prompt is phenomenal to work with. But yeah, if you're working on a study and one of the goals is to eventually make those data available for others to use, think about playing with this and see how we could help expose it and link it with all the other types of data we have. We have 10 minutes left, so I'm going to stop.