Projects

   

Rare Disease Detection and Physician Targeting: A Factor Graph Machine Learning Approach

A rare disease is any disease that affects a small percentage of the population. In many cases rare diseases are so rare that many individual physician may have never seen a single patient with that disorder. The extremely low incidence rate of rare diseases makes it particularly difficult to recognize and diagnose. A major challenge in the rare disease market is how to target physicians who are potentially involved with patients with a specific rare disease. The existing detection and targeting methodologies, such as segmentation and profiling, are developed under an assumption of a large and mass market, and are thus not suitable for the rare disease market where the objective classes are extremely imbalanced. This paper proposes a graphical model approach to predict targets by jointly modeling physician and patient features from different data spaces. Through an empirical example with medical claim and prescription data, the proposed approach demonstrates better accuracy compared with benchmark models. The graph representation also provides visual interpret-ability of the relationship among physicians and patients. This paper contributes to the literature of exploring the benefit of utilizing relational dependencies among entities in health-care industry.

Estimation of Heterogeneity For Multinomial Probit Models Using Dictionary Learning

Empirical studies suggest that utility functions are often irregularly shaped, and individuals often deviate widely from each other. In this paper, we introduce a multinomial probit model involving both parametric covariates and nonparametric covariates. To combine heterogeneity across individuals with flexibility, we have two different strategies for the parametric component and the nonparametric component. For the parametric component, heterogeneity is incorporated by the inclusion of random effects. As for the nonparametric component, each individual has a unique individual-level function. However, all those nonparametric components share the same basis. Because the basis is unknown, we use dictionary learning to learn the basis. Additionally, an EM algorithm serves to estimate the multinomial probit models.

 

Click Prediction Using Deep Learning

We all have experienced real time bidding advertising. When we load a webpage with advertisements, they might be related to your google search history, or maybe your last amazon purchase. These advertisements are distributed by demand side platform companies, that try to determine if an advertisement is worth placing and how much they are willing to pay for it. One of these DSPs is iPinYou, who released a dataset of their advertisement transactions. The challenge is to determine based on the user information iPinYou receives, whether or not the user is going to click on a certain advertisement. This prediction process model is what I want to build with deep neural networks, and compare the results with traditional regression methods for this prediction problem.