Friday, March 13, 2015

Understanding ARIMA Using an Eyesight Measurement Analogy


I’ll use an analogy from eyesight measurement and prescription eyeglasses to explain a serious concept. Nowadays there is a lot of sophisticated equipment for measuring human eyesight, so it is relatively easy to measure eyesight accurately and get the right prescription for eyeglasses. But a few decades back, before today’s computerized eye-testing machines, doctors accomplished this task manually. Today, almost everyone who has visited an eye doctor will recognize the picture presented in Figure 12-12. It is known as a Snellen chart (though the name is not as well known as the chart itself).



Figure 12-12. A Snellen chart (courtesy: Wikipedia)

To test eyesight and prescribe eyeglasses, doctors performed a small test. Instead of using special equipment, doctors in the past had a box full of lenses of different powers. The patient was asked to sit in a chair and was given an empty frame to wear. The doctor placed differently powered lenses in the frame, one by one, and asked the patient to read from the Snellen chart. Some patients, for example, read the top seven rows and struggled with the lower ones. The doctor then removed the first lens and tried another. After many such iterations, the doctor settled on the exact lenses to be used in the patient’s glasses. Some patients were diagnosed as nearsighted and some as farsighted. The basic assumption in this process was that all patients were literate. But what if the patient was illiterate and couldn’t read any letters?

So, the major steps in the process of eyesight determination and prescription can be listed as follows:

1.  Assume that the patient is literate.
2.  Based on some tests, identify nearsightedness or farsightedness and get a rough estimate of eyesight.
3.  Estimate the exact eyesight by trying various lenses.
4.  Use the test results to give the prescription.

A similar analogy can be used with the Box–Jenkins approach. Say you have a time series in your hands and you want to forecast some future values in this series. You first need to identify whether the time-series process is an AR process, an MA process, or an ARMA process. You also need to identify the orders, p and q, of the AR(p), MA(q), or ARMA(p,q) process as applicable. Once you identify the type of series and the orders, you can attempt to write the series equation. You are already familiar with the AR(p), MA(q), and ARMA(p,q) equations. The next step is to find parameters such as a1, a2, … ap, and b1, b2, … bq as applicable. Before you move on to identifying the model, there is an assumption to be made: the series has to be stationary (in the coming sections, I explain more about stationary time series). This assumption simplifies the overall model identification process. Table 12-2 uses the eyeglasses prescription analogy to illustrate the Box–Jenkins approach. After that I discuss the various steps involved in the Box–Jenkins approach.

Table 12-2. An Analogy Between a Vision Test and the Box–Jenkins Approach
Vision Test | Box–Jenkins Approach
Assume the patient is literate. | Assume that the time series is stationary; otherwise, make it stationary.
Based on some tests, identify nearsightedness or farsightedness and get a rough estimate of eyesight. | Based on plots (ACF and PACF functions, explained later in this chapter), identify whether the model is an AR or MA or ARMA process.
Estimate the exact eyesight by trying various lenses. | Estimate the parameters such as a1, a2, … ap, and b1, b2, … bq.
Use the test results to give the prescription. | Use the final model for forecasting.
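
For reference, the ARMA(p,q) process whose parameters a1 … ap and b1 … bq are estimated in the third step is commonly written as shown below. This is a sketch in one widespread convention, not necessarily the exact notation used earlier in the chapter: the symbols x_t (the series value at time t) and e_t (the random error at time t) are assumptions, and some texts put minus signs in front of the b terms.

```latex
x_t = a_1 x_{t-1} + a_2 x_{t-2} + \dots + a_p x_{t-p}
      + e_t + b_1 e_{t-1} + b_2 e_{t-2} + \dots + b_q e_{t-q}
```

Setting q = 0 leaves only the a terms and gives an AR(p) process; setting p = 0 leaves only the error terms and gives an MA(q) process.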


Steps in the Box–Jenkins Approach

Once again, consider a time-series forecasting problem. Take the time-series data of a premium stock over a period of time. Let’s assume you have the stock price data for the past year, and that you want to predict the stock prices for the next week. You will use the Box–Jenkins approach for this forecasting. First you need to make sure that the stock price time-series process is stationary. Then you need to identify the type of process that approximates the pattern followed by the stock price data. Is it an AR process, an MA process, or an ARMA process? Once you have the basic model equation in place, you can estimate the parameters. Successfully completing all these steps concludes the model-building process. You now have the final equation that can be used for forecasting the future values of the stock under consideration. You also need to take a look at the model accuracy or the error rate before you can continue with the final forecasting and model deployment. What follows is a detailed explanation of each step.
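
The estimation and forecasting steps described above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the SAS workflow the chapter goes on to use: the series is simulated rather than real stock data, the identification step is skipped by assuming the process is a zero-mean AR(1), and the function names (simulate_ar1, estimate_ar1, forecast_ar1) are hypothetical helpers invented for this example.

```python
import random

def simulate_ar1(a1, n, seed=0):
    """Generate n values of a zero-mean AR(1) series x_t = a1 * x_{t-1} + e_t."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(a1 * x[-1] + rng.gauss(0.0, 1.0))
    return x

def estimate_ar1(x):
    """Least-squares estimate of a1 from the pairs (x_{t-1}, x_t)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

def forecast_ar1(x, a1_hat, steps):
    """Iterate the fitted equation forward, with future errors set to their mean of 0."""
    forecasts, last = [], x[-1]
    for _ in range(steps):
        last = a1_hat * last
        forecasts.append(last)
    return forecasts

# Simulated stand-in for a stationary series (|a1| < 1 keeps it stationary).
series = simulate_ar1(a1=0.6, n=2000)
a1_hat = estimate_ar1(series)                          # parameter estimation
week_ahead = forecast_ar1(series, a1_hat, steps=5)     # forecasting
print(round(a1_hat, 2), [round(v, 3) for v in week_ahead])
```

With a long enough series the estimate lands close to the value used in the simulation, and the forecasts shrink toward the series mean of zero as the horizon grows.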


Step 1: Testing Whether the Time Series Is Stationary

If a time-series process is stationary, it’s much easier to build the model using the Box–Jenkins methodology.


What Is a Stationary Time Series?

“A time series is said to be stationary if there is no systematic change in mean (no trend), if there is no systematic change in variance, and if strictly periodic variations have been removed,” according to Dr. Chris Chatfield in The Analysis of Time Series: An Introduction (Chapman and Hall, 2003). A stationary time series is in a state of statistical equilibrium........
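
As a rough illustration of the definition quoted above, the sketch below compares the mean and variance of the first and second halves of a series. It is an informal check rather than a formal stationarity test, and it rests on assumptions: the data are simulated (a linear trend plus noise), half_stats and difference are hypothetical helper names, and first differencing is used as one common way to remove a trend.

```python
import random
import statistics

def half_stats(x):
    """Return (mean, variance) for the first and second halves of x."""
    mid = len(x) // 2
    first, second = x[:mid], x[mid:]
    return ((statistics.mean(first), statistics.variance(first)),
            (statistics.mean(second), statistics.variance(second)))

def difference(x):
    """First difference: x_t - x_{t-1}."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]

# Simulated series with a systematic change in mean (a trend), so not stationary.
rng = random.Random(1)
trending = [0.5 * t + rng.gauss(0.0, 1.0) for t in range(200)]

(m1, v1), (m2, v2) = half_stats(trending)
print("original:    means", round(m1, 2), round(m2, 2))

# After differencing, the two halves have nearly the same mean.
(dm1, dv1), (dm2, dv2) = half_stats(difference(trending))
print("differenced: means", round(dm1, 2), round(dm2, 2))
```

The means of the two halves of the trending series are far apart, while the halves of the differenced series have nearly the same mean, which is what "no systematic change in mean" looks like in practice.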

  1. Chapter 12: Time-Series Analysis and Forecasting
    1. What Is a Time-Series Process?
    2. Main Phases of Time-Series Analysis
    3. Modeling Methodologies
    4. Box–Jenkins Approach
      1. What Is ARIMA?
      2. The AR Process
      3. The MA Process
      4. ARMA Process
    5. Understanding ARIMA Using an Eyesight Measurement Analogy
    6. Steps in the Box–Jenkins Approach
      1. Step 1: Testing Whether the Time Series Is Stationary
      2. Step 2: Identifying the Model
      3. Step 3: Estimating the Parameters
      4. Step 4: Forecasting Using the Model
      5. Case Study: Time-Series Forecasting Using the SAS Example
      6. Checking the Model Accuracy
    7. Conclusion









Thursday, February 26, 2015

Wednesday, February 18, 2015

What is Dispersion? An extract from the book "Practical Business Analytics using SAS"



What is Dispersion

Dispersion is the variation in data: the nonuniformity or inconsistency in the values of a variable. The measures of dispersion indicate nothing about the middle value of the data. Rather, they give you an idea about the spread in the data. Dispersion can be measured using range, variance, and standard deviation.

Anderson Wants to Cross a River 

Mr. Anderson, who can’t swim, wants to cross a small waterway. He asks a neighbor to describe the depth of the river, and the neighbor says its depth is 4 feet on average. Mr. Anderson is happy and starts to cross it. His happiness does not last long. Although the average is 4 feet, the depth at some places might be 7 feet, which is more than Mr. Anderson’s height. If he had inquired about the deviation from the average depth, the inconsistency of depth at various points, or at least the range of depth (minimum and maximum depth) in addition to the average depth of the river, it would have saved Mr. Anderson from drowning.


Therefore, merely knowing the average or the center value may not be sufficient in all cases. The deviation from center (or the dispersion) or the spread of a variable is also important. Given next are a few measures of dispersion.
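
To make the river example concrete, here is a small Python sketch using the standard statistics module. The depth readings are hypothetical numbers chosen only so that both rivers average 4 feet and one of them reaches 7 feet; they do not come from the book.

```python
import statistics

def describe(depths):
    """Return the mean, range, sample variance, and sample standard deviation."""
    return {
        "mean": statistics.mean(depths),
        "range": max(depths) - min(depths),
        "variance": statistics.variance(depths),
        "std_dev": statistics.stdev(depths),
    }

# Hypothetical depth readings in feet; both rivers average 4 feet.
uneven_river = [1, 2, 3, 4, 7, 7]
even_river = [4, 4, 4, 4, 4, 4]

uneven = describe(uneven_river)
even = describe(even_river)
print(uneven)
print(even)
```

Both rivers report the same average, but only the dispersion measures reveal that the first one has 7-foot stretches.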















Wednesday, February 11, 2015

(TH101) Peer Comparison Case Study - Testing of Hypothesis


Business Problem


This is a peer comparison project. Suppose that you are working for Samsunge on the customer experience management team. The idea is to regularly monitor customer satisfaction levels and the moves of peer companies. The competitor company is Appleo. The objective is to test two main hypotheses.
1. The average customer satisfaction score of Samsunge is at least 75%.
2. The overall average satisfaction score of Samsunge is the same as that of Appleo. There is no significant difference in the satisfaction scores.


It is possible that both hypotheses are correct, that only one of them is correct, or that both are wrong. Perform the relevant tests to verify these assumptions.

The Data


The data is collected for 100 Samsunge customers and 100 Appleo customers. Their satisfaction scores are recorded. The sample is representative of the customer population and is unbiased.




Approach



Download the data and import it into SAS

Part-1
Take the Samsunge_Score column
Identify the right test (testing a sample mean)
Reject or fail to reject the null hypothesis based on the P-value
Part-2
Calculate the means of Samsunge and Appleo
Perform a mean comparison test / two-sample equal means test
Reject or fail to reject the null hypothesis based on the P-value
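
The case study is meant to be solved in SAS, but the logic of the two parts can be sketched in Python. The sketch below makes several assumptions: the scores are synthetic placeholders rather than the case-study data, the function names are hypothetical, hypothesis 1 is read as a left-tailed test (null hypothesis: mean of at least 75), and the P-values use a normal approximation to the t distribution, which is reasonable for samples of 100 but is not what a dedicated t-test procedure reports.

```python
import math
import random
import statistics

def one_sample_test(sample, mu0):
    """Left-tailed test of H0: mean >= mu0 against H1: mean < mu0."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    stat = (statistics.mean(sample) - mu0) / se
    p_value = statistics.NormalDist().cdf(stat)  # normal approximation
    return stat, p_value

def two_sample_test(a, b):
    """Two-sided test of H0: mean(a) == mean(b), without assuming equal variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    stat = (statistics.mean(a) - statistics.mean(b)) / se
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(stat)))  # normal approximation
    return stat, p_value

# Synthetic placeholder scores, NOT the case-study data.
rng = random.Random(42)
samsunge_scores = [rng.gauss(78, 10) for _ in range(100)]
appleo_scores = [rng.gauss(72, 10) for _ in range(100)]

stat1, p1 = one_sample_test(samsunge_scores, mu0=75)          # Part-1
stat2, p2 = two_sample_test(samsunge_scores, appleo_scores)   # Part-2
print("Part-1:", round(stat1, 3), round(p1, 4))
print("Part-2:", round(stat2, 3), round(p2, 4))
```

In each part, a P-value below the chosen significance level (0.05 is a common choice) leads to rejecting the null hypothesis; otherwise you fail to reject it.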

References:

Chapter 8, Testing of Hypothesis, from the book Practical Business Analytics Using SAS: A Hands-on Guide http://www.amazon.com/Practical-Business-Analytics-Using-Hands/dp/1484200446
SAS code from Chapter 8 of the book Practical Business Analytics Using SAS: A Hands-on Guide http://www.amazon.com/Practical-Business-Analytics-Using-Hands/dp/1484200446
Case study id: TH 101-Peer Comparison