Data Engineering to Support Advanced Analytics

By Charles Yoon | Data Engineer

As a growing company that leverages health data to provide value to patients, we are finding it increasingly apparent that a dedicated data engineering function is a necessary and crucial part of a successful data science implementation.

Data science is an interdisciplinary field concerned with extracting meaningful knowledge and useful insights from vast amounts of data. It favours generalized solutions that can tackle many problems at once, in contrast to the more traditional statistical approach of tailoring a solution to one specific problem. Today, data science has far-reaching implications in many fields, especially healthcare, where machine learning and artificial intelligence software is being used to augment diagnoses from medical imaging, predict adverse events from electronic health record (EHR) data, and even discover potential therapeutic treatments for COVID-19.

Data engineering, meanwhile, supports the application of data science by organizing heterogeneous data sources, facilitating data cleaning, and developing standardization models that meet advanced analytics and business requirements. In other words, data engineering is the first phase of an analytics pipeline and data science is the second. With the ever-shifting landscape of data, especially in healthcare, we are moving to a world where data from multiple sources must be pulled together and organized to generate a complete picture of a patient’s health.

The scope of data engineering can be broken down into three major components:

  1. Optimize data consolidation by implementing data integration tools that connect to a myriad of data sources, from modern systems such as wearables and connected devices, while remaining compatible with the legacy systems and existing data warehouses that are common in healthcare.
  2. Build real-time data streams and pipelines that route incoming data into our data lake.
  3. Process this data by cleaning and transforming it so that it is easily accessible to data scientists and business analysts.
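To make the three components concrete, here is a minimal Python sketch of a tiny generator-based pipeline. It is an illustration under stated assumptions, not MEMOTEXT's actual implementation: the raw record formats and field names (`member_id`, `fill_date`, `days_supply`, `user`, `timestamp`, `steps`), the `Observation` schema, and the plain list standing in for a data lake are all hypothetical. Each source gets an adapter that maps its format onto one shared schema (integration), records flow one at a time through Python generators (a stand-in for real-time streaming), and a cleaning stage drops invalid and duplicate records before they are loaded.

```python
from dataclasses import dataclass
from datetime import date, datetime
from itertools import chain
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Observation:
    """Hypothetical unified record that analysts query downstream."""
    patient_id: str
    source: str        # which upstream system produced the record
    observed_on: date
    metric: str
    value: float


# Integration: one adapter per source maps its raw format onto Observation.
# Both raw formats are invented for illustration.
def from_claims(rows: Iterable[dict]) -> Iterator[Observation]:
    for row in rows:
        yield Observation(
            patient_id=row["member_id"].strip().upper(),
            source="claims",
            observed_on=datetime.strptime(row["fill_date"], "%Y-%m-%d").date(),
            metric="days_supply",
            value=float(row["days_supply"]),
        )


def from_wearable(events: Iterable[dict]) -> Iterator[Observation]:
    for event in events:
        yield Observation(
            patient_id=event["user"].strip().upper(),
            source="wearable",
            observed_on=datetime.fromisoformat(event["timestamp"]).date(),
            metric="steps",
            value=float(event["steps"]),
        )


# Processing: drop invalid records and exact duplicates as they stream past.
def clean(stream: Iterable[Observation]) -> Iterator[Observation]:
    seen = set()  # unbounded here; a long-running stream would dedupe per time window
    for obs in stream:
        if not obs.patient_id or obs.value < 0:
            continue  # fails a basic validity check
        if obs in seen:
            continue  # exact duplicate, e.g. a re-sent device sync
        seen.add(obs)
        yield obs


# Loading: append each cleaned record to the (stand-in) data lake.
def load(stream: Iterable[Observation], lake: List[Observation]) -> int:
    count = 0
    for obs in stream:
        lake.append(obs)
        count += 1
    return count


claims = [{"member_id": " p001 ", "fill_date": "2021-03-01", "days_supply": "30"}]
wearable = [
    {"user": "p001", "timestamp": "2021-03-01T08:15:00", "steps": 4200},
    {"user": "p001", "timestamp": "2021-03-01T08:15:00", "steps": 4200},  # re-sent sync
    {"user": "", "timestamp": "2021-03-02T09:00:00", "steps": 100},       # no patient ID
]

# Streaming: chained generators pass each record through one at a time.
lake: List[Observation] = []
loaded = load(clean(chain(from_claims(claims), from_wearable(wearable))), lake)
print(loaded)  # 2
```

Keeping every stage as a generator means a record is validated and stored as soon as it arrives rather than after a whole batch is collected. In a production system, the `chain` of in-memory sources would more likely be replaced by a message broker or stream-processing framework, and the list by durable storage.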

At MEMOTEXT, we currently integrate data from multiple sources, including prescription drug claims, Fitbit devices, ambient mobile data, and behavioural data from program responses.

As we continue to grow, we are transitioning to a modern data architecture. Building new data pipelines and technologies that feed our advanced analytics will better prepare us to scale. In turn, this will allow our algorithms to make better real-time decisions and improve the accuracy of our predictive models, ultimately helping our patients take control of their own health.