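Translated foreign paper: Predicting Customer Churn in the Telecommunications Industry, an Application of a SAS Survival Analysis Model (edited revision). Content summary: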

... each customer in the sample, a variable DUR is used to indicate the time at which customer churn occurred or, for censored cases, the last time at which the customer was observed, both measured from the origin of time (August 16, 2020). A second variable, STATUS, is used to distinguish censored cases from observed cases. It is common to set STATUS = 1 for observed cases and STATUS = 0 for censored cases. In this study, the survival data are singly right censored, so all censored cases have a value of 15 (months) for the variable DUR.

DATA SOURCES

There are four major data sources for this study: block level marketing and financial information, customer level demographic data provided through a third party vendor, customer internal data, and customer contact records. A brief description of some of the data sources follows.

Demographic Data – Demographic data is from a third party vendor. In this study, the following are examples of customer level demographic information: primary household member’s age; gender and marital status; number of adults; primary household member’s occupation; household estimated income and wealth ranking; number of children and children’s age; number of vehicles and vehicle value; credit card; frequent traveler; responder to mail orders; dwelling and length of residence.

Customer Internal Data – Customer internal data is from the company’s data warehouse. It consists of two parts. The first part is customer information such as market channel, plan type, bill agency, customer segmentation code, ownership of the company’s other products, dispute, late fee charge, discount, promotion/save promotion, additional lines, toll free services, rewards redemption, billing dispute, and so on. The second part of customer internal data is the customer’s telecommunications usage data.
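The DUR and STATUS coding described at the start of this section can be illustrated with a minimal Python sketch. It is not the paper's SAS code. The function names, the whole-month granularity, and the rule that churn at or after month 15 is treated as censored are all assumptions made for illustration; only the origin date, the 15-month censoring value, and the STATUS convention come from the text.

```python
from datetime import date
from typing import Optional, Tuple

ORIGIN = date(2020, 8, 16)  # origin of time as stated in the text
WINDOW_MONTHS = 15          # censoring value for DUR as stated in the text


def months_since_origin(d: date, origin: date = ORIGIN) -> int:
    """Whole months elapsed from origin to d (assumed granularity)."""
    if d < origin:
        raise ValueError("date precedes the origin of time")
    months = (d.year - origin.year) * 12 + (d.month - origin.month)
    if d.day < origin.day:
        months -= 1  # the current month is not yet complete
    return months


def code_survival(churn_date: Optional[date]) -> Tuple[int, int]:
    """Return (DUR, STATUS) for one customer.

    STATUS = 1: churn observed inside the window, DUR = months to churn.
    STATUS = 0: censored, DUR = 15 (single right censoring).
    Assumption: churn at or after month 15 counts as censored.
    """
    if churn_date is not None:
        dur = months_since_origin(churn_date)
        if dur < WINDOW_MONTHS:
            return dur, 1
    return WINDOW_MONTHS, 0
```

For example, a customer who churns on January 20, 2021 is coded as DUR = 5, STATUS = 1, while a customer with no recorded churn is coded as DUR = 15, STATUS = 0.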
Examples of customer usage variables are: weekly average call counts; percentage change of minutes; share of domestic/international revenue.

Customer Contact Records – The company’s Customer Information System (CIS) stores detailed records of customer contacts. These mainly comprise customer calls to service centers and the company’s mail contacts to customers. The customer contact records are then classified into customer contact categories, such as customer general inquiry, customer requests to change service, customer inquiry about cancellation, and so on.

MODELING PROCESS

The modeling process includes the following four major steps.

Exploratory Data Analysis (EDA) – Exploratory data analysis was conducted to prepare the data for the survival analysis. A univariate frequency analysis was used to pinpoint value distributions, missing values, and outliers. Some numerical variables were transformed to reduce skewness, because such transformations can improve the fit of a model to the data. Observations with outliers or other extreme values that should not be included in the data mining analysis were filtered out. Filtering extreme values from the training data tends to produce better models because the parameter estimates are more stable. Variables with missing values are not a big issue, except for the demographic variables. Demographic variables with more than 20% missing values were eliminated. For observations with missing values, one choice is to discard incomplete observations, but that may ignore useful information in the variables that have nonmissing values. It may also bias the sample, since observations that have missing values may have other things in common as well. Therefore, in this study, missing values were replaced by appropriate methods.
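Two of the preparation steps in this paragraph can be sketched in plain Python: eliminating variables with more than 20% missing values and transforming a skewed numerical variable. The text does not say which transformation was used, so the log(1 + x) transform here is an assumption, as are the function names and the representation of a table as a dictionary of columns with None marking a missing value.

```python
import math
from typing import Dict, List, Optional

Column = List[Optional[float]]


def missing_fraction(values: Column) -> float:
    """Share of entries in a column that are missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def drop_sparse_variables(table: Dict[str, Column],
                          max_missing: float = 0.20) -> Dict[str, Column]:
    """Keep only variables whose missing share does not exceed max_missing."""
    return {name: col for name, col in table.items()
            if missing_fraction(col) <= max_missing}


def log_transform(values: Column) -> Column:
    """Apply log(1 + x) to reduce right skew; missing values stay missing.

    Assumes non-negative inputs; this choice of transform is illustrative.
    """
    return [None if v is None else math.log1p(v) for v in values]
```

In this sketch a variable with exactly 20% missing values is kept, since the text eliminates only those with more than 20%.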
For interval variables, replacement values were calculated based on random percentiles of the variable’s distribution, ...
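One possible reading of replacement by random percentiles is sketched below: each missing value is replaced by the value found at a randomly drawn percentile of the variable's nonmissing observations. The source text is cut off before it finishes describing the method, so this interpretation, the linear interpolation between order statistics, and all function names are assumptions rather than the study's actual procedure.

```python
import random
from typing import List, Optional, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Linearly interpolated percentile of sorted data, with p in [0, 1]."""
    if not sorted_values:
        raise ValueError("no nonmissing values to draw from")
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def impute_random_percentile(values: List[Optional[float]],
                             rng: random.Random) -> List[float]:
    """Replace each None with the value at a random percentile of the
    nonmissing observations; observed values are left unchanged."""
    observed = sorted(v for v in values if v is not None)
    return [percentile(observed, rng.random()) if v is None else v
            for v in values]
```

Passing a seeded random.Random instance makes the replacement reproducible, and every imputed value falls within the range of the observed data.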