Academic Seminars
Mengke Qiao (乔梦柯): Correcting Misclassification Bias in Regression Models with Variables Generated via Data Mining
Published: 2023-06-12

Time: Friday, June 16, 2023, 10:00-11:30

Venue: Conference Room 1125, New Building, School of Management

Speaker: Mengke Qiao, Specially Appointed Associate Professor

Affiliation: University of Science and Technology of China

Host: School of Management, Hefei University of Technology

Biography

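Mengke Qiao is a Specially Appointed Associate Professor at the School of Management, University of Science and Technology of China. Qiao received a PhD from the National University of Singapore and a bachelor's degree from Huazhong University of Science and Technology. Qiao has published in leading international and domestic journals and conference proceedings, including Information Systems Research, the International Conference on Information Systems, and the Workshop on Information Technologies and Systems. Qiao's research focuses on machine learning and causal inference, text mining, and econometrics.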

Abstract

As a result of advances in data mining, a growing number of empirical studies in the social sciences apply classification algorithms to construct independent or dependent variables, which are then analyzed with standard regression methods. In the classification phase of these studies, the standard procedure requires researchers to choose, subjectively, a classification performance metric to optimize. Whichever metric is chosen, the constructed variable still contains classification error, because such variables cannot be classified perfectly. This misclassification leads to inconsistent regression coefficient estimates in the subsequent regression phase, a problem documented in the econometrics literature as measurement error. Pioneering discussions of the estimation inconsistency caused by misclassification in such studies already exist. Our study systematically investigates the theoretical foundation of this problem when a newly constructed variable is used as the independent or dependent variable in linear and nonlinear regressions. Our theoretical analysis shows that consistent regression estimators can be recovered in all models studied in this paper. The main implication is that researchers do not need to tune the classification algorithm to minimize the inconsistency of the estimated regression coefficients, because that inconsistency can be corrected with theoretical formulas, even when classification accuracy is poor. Instead, we propose tuning the classification algorithm to minimize the standard error of the focal regression coefficient, computed from the corrected formula. As a result, researchers can obtain an estimator that is both consistent and as precise as possible in all models studied in this paper.
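The abstract does not give the speaker's correction formulas. As a rough illustration only of why misclassification biases regression slopes and how a correction can undo it, the Python sketch below uses the textbook method-of-moments correction for a binary variable under nondifferential misclassification. It assumes the classifier's false-positive rate (`fpr`) and false-negative rate (`fnr`) are known, for example from a labelled validation sample, and covers only two linear cases: a misclassified binary regressor, and a misclassified binary outcome in a linear probability model. The function names and simulation settings are hypothetical; this is not the estimator presented in the talk.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def misclassify(labels, fpr, fnr, rng):
    """Nondifferential misclassification: 0 -> 1 with prob. fpr, 1 -> 0 with prob. fnr."""
    out = []
    for v in labels:
        flip = rng.random() < (fpr if v == 0 else fnr)
        out.append(1 - v if flip else v)
    return out


def corrected_slope_x(naive, p_obs, fpr, fnr):
    """Hypothetical helper: correct the slope on a misclassified binary regressor.

    plim(naive) = b * pi * (1 - pi) * (1 - fpr - fnr) / (p_obs * (1 - p_obs)),
    where pi = (p_obs - fpr) / (1 - fpr - fnr) is the implied true prevalence.
    """
    k = 1.0 - fpr - fnr  # must be > 0, i.e. the classifier beats chance
    pi = (p_obs - fpr) / k
    return naive * p_obs * (1.0 - p_obs) / (pi * (1.0 - pi) * k)


def corrected_slope_y(naive, fpr, fnr):
    """Hypothetical helper: correct a linear probability model with a misclassified outcome.

    E[y_obs | x] = fpr + (1 - fpr - fnr) * E[y | x], so the slope shrinks by (1 - fpr - fnr).
    """
    return naive / (1.0 - fpr - fnr)


rng = random.Random(0)
n = 100_000
fpr, fnr = 0.15, 0.25  # assumed known, e.g. estimated on a labelled validation set

# Case 1: classifier output used as a binary regressor; true slope 2.0.
x = [1 if rng.random() < 0.4 else 0 for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
x_obs = misclassify(x, fpr, fnr, rng)
naive_x = ols_slope(x_obs, y)
fixed_x = corrected_slope_x(naive_x, sum(x_obs) / n, fpr, fnr)

# Case 2: classifier output used as a binary outcome; true slope 0.5.
z = [rng.random() for _ in range(n)]
w = [1 if rng.random() < 0.2 + 0.5 * zi else 0 for zi in z]
w_obs = misclassify(w, fpr, fnr, rng)
naive_y = ols_slope(z, w_obs)
fixed_y = corrected_slope_y(naive_y, fpr, fnr)

print(f"regressor: naive {naive_x:.3f}, corrected {fixed_x:.3f} (true 2.0)")
print(f"outcome:   naive {naive_y:.3f}, corrected {fixed_y:.3f} (true 0.5)")
```

With these settings both naive slopes are attenuated toward zero (by a factor of roughly 0.6 in each case), while the corrected slopes should land close to the true values. In practice `fpr` and `fnr` are themselves estimated, so their sampling error also feeds into the standard error of the corrected coefficient, which is the quantity the abstract proposes to minimize when tuning the classifier.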
