How Democratizing Access to Data Strengthens Our Institution

NYU Langone Health Tech Hub
5 min read · Apr 14, 2022


At NYU Langone Health, clinical and research analytics are deeply embedded in our institutional DNA, continuously and securely delivering timely information and insights through a variety of channels. Data and analytics are shared extensively across all our “missions”: from clinical operations and corporate functions to research and education. At NYU Langone, secure, prompt access to actionable data and analytics is not aspirational; it is an essential, routine part of our daily work.

NYU Langone has a long and proud history of pioneering the use of data and analytics to improve both patient care and business management. A Harvard Business Review article published in 2017 highlights the importance we place on identifying and maintaining a single source of truth, on deploying dashboards across the enterprise, and on the critical role business users and leaders play in ensuring data quality. NYU Langone institutional and Medical Center Information Technology (MCIT) leadership has consistently prioritized investments in data and analytics, from dashboards to robust data governance platforms, and from traditional data warehousing to modern “big data” platforms in the data center and the cloud. Our MCIT leaders and experts are collaborating with clinical and research leadership to pioneer the concept of “data hubs,” which are dramatically democratizing secure data access and utilization across the institution through multi-modal data integration, self-service analytics, and research collaborations.

So, when we talk about democratizing data, which data are we referring to? The obvious answer might seem to be “all of it,” and the straightforward approach would be to simply move all data into a “data lake” and then provision access for everyone who needs it. But this approach is problematic for many reasons. Twenty years ago, we would have considered building an enterprise data warehouse for this purpose. It would have taken years to build and cost millions of dollars, and we would have ultimately found that users access only a small portion of the warehoused data. While our modern Hadoop data lake provides a far more economical and agile solution, connecting users with data is still not a trivial, one-size-fits-all proposition. One simply cannot assume that all users have the same needs and the same level of technical expertise.

We therefore take a targeted approach to data democratization, in which each individual request is evaluated to determine the source that best meets the needs of the use case as well as those of the person accessing the data. When reviewing requests, we consider the following:

· Technical expertise of the user

· Security and confidentiality

· Whether only clinical data is needed

· External data sources

· Intended purpose (reporting metrics/creating dashboards, data discovery/data mining, ad hoc reporting, creating predictive models)

· Data latency

Based on the answers to these questions, we can provision access to the best source. This ensures users will gain access to the data they need, and only the data they need.
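To make the triage concrete, here is a minimal sketch of how the criteria above could drive routing to a source. All of the names here (the `DataRequest` fields, the source labels, the policy itself) are illustrative assumptions for this post, not NYU Langone's actual systems or rules:

```python
# Hypothetical sketch of request triage: route a data-access request
# to a best-fit source based on the evaluation criteria listed above.
# Field names and source labels are illustrative only.
from dataclasses import dataclass

@dataclass
class DataRequest:
    technical_expertise: str   # e.g., "analyst", "data_scientist", "business_user"
    needs_phi: bool            # outcome of the security/confidentiality review
    clinical_only: bool        # is only clinical data needed?
    external_sources: bool     # does the use case join in external data?
    purpose: str               # "dashboard", "data_mining", "ad_hoc", "modeling"
    max_latency_hours: int     # how fresh must the data be?

def route(req: DataRequest) -> str:
    """Pick a best-fit source for a request; one plausible policy."""
    if req.purpose == "dashboard" and req.technical_expertise == "business_user":
        return "curated BI semantic layer"
    if req.clinical_only and req.max_latency_hours <= 1:
        return "operational clinical reporting database"
    if req.purpose in ("data_mining", "modeling") or req.external_sources:
        return "governed data lake workspace"
    return "enterprise data warehouse marts"

# Example: a data scientist building a predictive model on mixed data
print(route(DataRequest("data_scientist", True, False, True, "modeling", 24)))
# -> governed data lake workspace
```

The point of the sketch is simply that the answers to the checklist, taken together, select a source; the real evaluation is a human review, not an automated rule.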

Of course, democratizing data only provides value to the organization if analytic tools and training are in place for those who will be accessing the data. Providing data without the proper tools would be like giving a car to someone who doesn’t have a driver’s license or know how to drive. Yes, some analysts may need no training to query and massage the data, but democratizing data is not just for the few with technical skills; the goal is getting data and analytics to the masses. In the future, we may be able to simply ask questions of the data and have AI do the rest, but we’re not quite there yet. Until then, the proper tools and training are a necessity.

It is also crucial to have a strategy for your business intelligence (BI) and analytics tools. This provides an additional layer of security when your user community interacts with the data, and it helps make data available to non-technical users. Most tools offer a semantic or physical layer that sits on top of the original data source and allows for the creation of custom slices of the data. These slices can limit a user’s access and available data elements, making it easier to navigate the tools and the information. Standardizing on a single tool, as we have, certainly helps streamline this process and allows users to share and collaborate easily across the organization.
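The “custom slice” idea can be illustrated with a database view. The sketch below uses SQLite purely for demonstration; the table, columns, and values are invented and do not reflect NYU Langone’s actual schema or BI stack:

```python
# Illustrative only: a view exposes just the columns and rows a given
# user community should see, so non-technical users never touch the
# raw table (or its sensitive columns).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE encounters (
    encounter_id INTEGER, patient_mrn TEXT, department TEXT,
    los_days REAL, ssn TEXT          -- sensitive column
);
INSERT INTO encounters VALUES
    (1, 'MRN001', 'Cardiology', 3.2, '***'),
    (2, 'MRN002', 'Oncology',   5.1, '***');

-- Custom "slice": cardiology metrics only, with no identifiers
CREATE VIEW cardiology_metrics AS
SELECT encounter_id, department, los_days
FROM encounters
WHERE department = 'Cardiology';
""")

rows = conn.execute("SELECT * FROM cardiology_metrics").fetchall()
print(rows)   # -> [(1, 'Cardiology', 3.2)]
```

A BI tool’s semantic layer does essentially this at scale: it narrows what each audience can see while leaving the underlying source untouched.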

The results of democratizing data across our institution have surpassed our expectations. The NYU Langone data lake has become the “go to” resource for data analysts, clinical informaticists, and researchers to procure the data they need in a secure and governed manner. In addition, it serves as a learning platform for our budding clinical informaticists: analytically inclined undergraduate and graduate medical students from the NYU Grossman School of Medicine.

We were cognizant of horror stories about “dirty lakes,” so we remained vigilant and took several measures to prevent our data lake from degenerating into a data swamp infested with security risks. Rather than taking a “build it and they will come” approach, we populated the data lake incrementally, driven by use cases. For example, to study cancer genomics in the context of longitudinal patient clinical data, we developed a pipeline to bring mutation data into the data lake.

Very early on, our CEO, Dr. Robert Grossman, laid out a clear vision for analytics that included a simple but very powerful maxim: business users, not IT, are responsible for data stewardship (including data quality). Consistent with this vision, we engaged and entrusted our clinical informaticists and analytic leaders across the institution to serve as domain data stewards in areas such as oncology, cardiology, and population health. Their role is to serve as subject matter experts in their respective fields and to provide guidance on data organization and access.

The data lake user population reflects the diversity of NYU Langone’s multiple “missions” (and, of course, the diversity of New York City!) and comprises extraordinarily talented and motivated teams and individuals. To provide a secure and consistent user experience, we encourage our data lake users to utilize enterprise-standard tools and technologies, such as secure virtual desktops and specific open source and licensed analytic tools. Naturally, there are always some outlier use cases (which researcher does not like to use their favorite graduate-school programming language?), but we find that reason usually prevails, helped along by institutional and IT security policies.

Data management, data science, and analytics are rapidly evolving fields, and we strive to keep pace in a manner that balances the aspirational with on-the-ground realities. As an example, our on-premises data lake is now securely connected to the cloud, which benefits multi-site studies that require collaboration between researchers across institutions.

In hindsight, we attribute the success of our data democratization effort to a clear vision, persistent work, deep collaboration at multiple levels, and strong leadership support from both IT and the business.

If we had to start over again, would we do things differently? In our opinion, probably not!

Jeff Shein, Senior Director, Data Warehousing & Analytics, NYU Langone Health

Rajan Chandras, Director, Data Management and Architecture, NYU Langone Health
