Tuesday, May 5, 2020
Scenarios as a Statistical Problem Harris Green Pty Ltd
Questions:

Question 1
(a) For each of the following scenarios: classify the variable as either numerical or categorical, AND state whether the scale of measurement is nominal, ordinal, interval or ratio.
(i) Fast food restaurants sell soft drinks in three sizes - small, medium and large.
(ii) A manufacturing company is sending millions of car parts overseas.
(b) Classify each of the following scenarios as a statistical problem in either descriptive statistics, probability or statistical inference.
(i) Based on the survey conducted by Harris Green Pty Ltd, researchers predicted that group buying websites will be the most popular method for buying electrical and electronics products in the future.
(ii) A survey of 1000 adult drivers conducted by News Today shows that 45% of drivers admit to drinking and 36% admit to talking on the mobile phone while driving a vehicle.

Question 2
The following data shows the service times (in seconds) for a sample of 96 customers who arrived at the service counter of a local hospital.

105 101 99 100 105 101 102 91 102 100 104 100
98 99 107 99 101 97 101 92 100 100 101 103
94 106 94 102 93 109 100 103 103 109 96 101
103 103 101 100 98 96 98 104 96 105 103 97
102 106 100 108 100 100 99 99 104 98 106 107
108 102 93 100 101 105 108 99 96 101 100 99
106 95 92 108 102 105 105 81 89 103 108 98
109 106 101 102 104 97 103 108 104 98 109 108

(a) Construct a stem-and-leaf diagram using this data. You need to generate between 5 and 10 stems only for the diagram.
(b) Draw a histogram for the frequency distribution with the first class 79 to less than 86. On the same graph draw the frequency polygon.
(c) Draw an ogive for the frequency distribution in Part (b) with the first class 79 to less than 86.
Question 3
A health research agency has recently collected the following information when investigating the occurrences of skin cancer in a certain population of beach goers:
7% of beach goers who do not use any sun-screen lotion develop skin cancer at some stage in their life.
1% of beach goers who use sun-screen lotion develop skin cancer at some stage in their life.
90% of beach goers use sun-screen.
Use this information to answer the following questions. (Hint: construct a contingency table.)
(a) If a beach goer is randomly selected, what is the probability that the person uses sun-screen lotion and yet develops skin cancer at some stage in life?
(b) If a beach goer is randomly selected, what is the probability that the person develops skin cancer at some stage in life?
(c) If a beach goer is randomly selected who has already developed skin cancer, what is the probability that the person does not use sun-screen lotion?
(d) What is the probability that a randomly selected beach goer will not develop skin cancer in their lifetime or uses sun-screen lotion?

Question 4
(a) A local supermarket receives a delivery of fresh fruit each morning at a time that varies uniformly between 6:00am and 8:00am. Before what delivery time can you be confident that 95 percent of deliveries will arrive?
(b) The maintenance department of a city's electric power company finds that it is cost-efficient to replace all street-light bulbs at once, rather than to replace the bulbs individually as they burn out. Assume that the lifetime of a bulb is normally (Gaussian) distributed with a mean of 8000 hours and a standard deviation of 300 hours. If the department wants no more than 3% of the bulbs to burn out before they are replaced, after how many hours should all of the bulbs be replaced?
(c) The time between unplanned shutdowns of an Internet service provider has an exponential distribution with a mean of 20 days.
Find the probability that the time between two unplanned shutdowns is 13 days.

Question 5
(a) The probability of success in a trial is 0.70. In 500 trials, what is the probability of succeeding between 280 and 355 times? Use the normal approximation to the binomial distribution with continuity correction.
(b) Toyota requires a quality assurance check of new cars before a shipment is made. The tolerable exception rate for this internal control is 0.05. During an audit, 400 cars were sampled from a population of 4,000 cars, and 10 were found that violated the internal control. Calculate the upper bound for a 95% one-sided confidence interval estimate for the rate of noncompliance.
(c) BP wishes to estimate the mean amount of water that has seeped into the fuel storage tanks at its refineries in Sydney. A preliminary sample of n = 16 tanks showed that the standard deviation is s = 48 litres. How much larger should the sample be in order to estimate the mean water content of the tanks to within 10 litres with 95% confidence?

Answers:

1.
a(i) Categorical variable, ordinal scale. The data are categorical because there are three categories (small, medium and large), and because the categories have a natural order the scale of measurement is ordinal.
a(ii) Numerical variable, ratio scale. The variable is the number of car parts being shipped, which is a count; counts are numerical and have a true zero, so the scale is ratio.
b(i) Statistical inference. The researchers use the results of a sample survey to draw a conclusion about the wider population, predicting that group buying websites will be the most popular method for buying electrical and electronics products in the future. Drawing conclusions or predictions about a population from a sample is statistical inference.
b(ii) Descriptive statistics. The percentages (45% and 36%) simply summarise the responses of the 1000 surveyed drivers. No inference or prediction beyond the sample is made; the data are only described.

2.
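The split-stem display in the answer below (five lines per stem, two leaf digits per line, with the 80s listed as extremes) can be reproduced from the raw data. The following is a minimal Python sketch; the function name `stem_and_leaf` and the cut-off of 89 for extremes are illustrative assumptions chosen to mirror the SPSS-style output, not part of the original answer.

```python
from collections import defaultdict

# Service times in seconds for the 96 customers in Question 2, in the order given.
DATA = [
    105, 101, 99, 100, 105, 101, 102, 91, 102, 100,
    104, 100, 98, 99, 107, 99, 101, 97, 101, 92,
    100, 100, 101, 103, 94, 106, 94, 102, 93, 109,
    100, 103, 103, 109, 96, 101, 103, 103, 101, 100,
    98, 96, 98, 104, 96, 105, 103, 97, 102, 106,
    100, 108, 100, 100, 99, 99, 104, 98, 106, 107,
    108, 102, 93, 100, 101, 105, 108, 99, 96, 101,
    100, 99, 106, 95, 92, 108, 102, 105, 105, 81,
    89, 103, 108, 98, 109, 106, 101, 102, 104, 97,
    103, 108, 104, 98, 109, 108,
]


def stem_and_leaf(values, extreme_cutoff=89):
    """Return (label, frequency, leaves) rows for a five-lines-per-stem display.

    Each stem is split into five lines holding leaves 0-1, 2-3, 4-5, 6-7 and 8-9.
    Values at or below `extreme_cutoff` are grouped into a single 'Extremes' row.
    """
    extremes = [v for v in values if v <= extreme_cutoff]
    groups = defaultdict(list)
    for v in sorted(values):
        if v <= extreme_cutoff:
            continue
        stem, leaf = divmod(v, 10)
        groups[(stem, leaf // 2)].append(str(leaf))  # leaf // 2 picks the sub-line
    rows = [("Extremes", len(extremes), f"(=<{extreme_cutoff})")]
    for stem, part in sorted(groups):
        leaves = "".join(groups[(stem, part)])
        rows.append((f"{stem} .", len(leaves), leaves))
    return rows


for label, freq, leaves in stem_and_leaf(DATA):
    print(f"{freq:8.2f}  {label:>8}  {leaves}")
```

Each printed row gives the frequency, the stem and the sorted leaves, so the counts can be checked line by line against the plot.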
The stem-and-leaf diagram is given below. Most of the data lie in the 90s and 100s, so the diagram uses 10 stem lines, five for stem 9 and five for stem 10, with each line holding two leaf digits (0-1, 2-3, 4-5, 6-7 and 8-9). The 80s contain the fewest values: 81 and 89 are listed as extremes. Because the data lie in a narrow interval and the values are close together, the stem-and-leaf plot gives a good representation of them. The most frequent values are 100 (12 customers) and 101 (10 customers). Among the values in the 90s, the upper half is much more frequent: 99 occurs most often (7 times), followed by 98 (6 times).

Stem-and-Leaf Plot

Frequency    Stem &  Leaf

    2.00 Extremes    (=<89)
    1.00        9 .  1
    4.00        9 .  2233
    3.00        9 .  445
    7.00        9 .  6666777
   13.00        9 .  8888889999999
   22.00       10 .  0000000000001111111111
   15.00       10 .  222222233333333
   11.00       10 .  44444555555
    7.00       10 .  6666677
   11.00       10 .  88888889999

 Stem width:   10
 Each leaf:     1 case(s)

The histogram gives almost the same picture as the stem-and-leaf plot. Only 8 of the 96 values lie between 86 and 94, while most values (66 of 96) are concentrated in the 98 to 106 range, so most customers had service times in this range. The stem-and-leaf diagram likewise shows the highest frequencies at 100 and 101, after which the frequencies decline gradually. A table of summary statistics, such as the mean and standard deviation, would make this pattern more evident. The frequency polygon representing the histogram is given below the histogram (Conway, 1963).
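Part (b) asks for classes of width 7 starting at 79 (79 to less than 86, 86 to less than 93, and so on), while the table that follows uses Excel histogram bins with upper limits 86, 90, ..., 110. The minimal sketch below, assuming half-open classes [lower, lower + 7), computes both sets of frequencies and the cumulative counts needed for the ogive in Part (c). The helper names `class_frequencies` and `upper_limit_frequencies` are illustrative.

```python
from itertools import accumulate

# Service times in seconds for the 96 customers in Question 2.
DATA = [
    105, 101, 99, 100, 105, 101, 102, 91, 102, 100,
    104, 100, 98, 99, 107, 99, 101, 97, 101, 92,
    100, 100, 101, 103, 94, 106, 94, 102, 93, 109,
    100, 103, 103, 109, 96, 101, 103, 103, 101, 100,
    98, 96, 98, 104, 96, 105, 103, 97, 102, 106,
    100, 108, 100, 100, 99, 99, 104, 98, 106, 107,
    108, 102, 93, 100, 101, 105, 108, 99, 96, 101,
    100, 99, 106, 95, 92, 108, 102, 105, 105, 81,
    89, 103, 108, 98, 109, 106, 101, 102, 104, 97,
    103, 108, 104, 98, 109, 108,
]


def class_frequencies(values, start=79, width=7, n_classes=5):
    """Counts for half-open classes [start, start + width), [start + width, ...)."""
    counts = [0] * n_classes
    for v in values:
        k = (v - start) // width  # index of the class containing v
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts


def upper_limit_frequencies(values, limits):
    """Excel-style bins: each value goes to the first bin whose upper limit is >= value."""
    counts = [0] * len(limits)
    for v in values:
        for i, limit in enumerate(limits):
            if v <= limit:
                counts[i] += 1
                break
    return counts


freq = class_frequencies(DATA)
cumulative = list(accumulate(freq))  # running totals for the ogive
for i, (f, c) in enumerate(zip(freq, cumulative)):
    lower = 79 + 7 * i
    pct = 100 * c / len(DATA)
    print(f"{lower} to <{lower + 7}: frequency {f:2d}, cumulative {c:2d} ({pct:.2f}%)")

print(upper_limit_frequencies(DATA, [86, 90, 94, 98, 102, 106, 110]))
```

With these classes the frequencies are 1, 4, 25, 53 and 13, and the cumulative counts are 1, 5, 30, 83 and 96. For the ogive, the cumulative counts are plotted against the upper class boundaries 86, 93, 100, 107 and 114, starting from 0 at 79. The last printed list reproduces the Excel counts in the table that follows.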
Histogram frequency table (Excel bins: each count covers values above the previous upper limit up to and including the listed limit; these 4-second bins are narrower than the 7-second classes, starting at 79 to less than 86, specified in Part (b)):

Upper limit (s)   Frequency
86                1
90                1
94                7
98                14
102               36
106               24
110               13
More              0

Cumulative frequency table for the ogive:

Upper limit (s)   Frequency   Cumulative %
86                1           1.04%
90                1           2.08%
94                7           9.38%
98                14          23.96%
102               36          61.46%
106               24          86.46%
110               13          100.00%
More              0           100.00%

The ogive (cumulative frequency diagram) plots the cumulative frequencies over the histogram and shows a steadily increasing trend. To draw an ogive, the frequency for each bin limit is taken from the table and accumulated into the cumulative frequency table, because an ogive is a plot of cumulative frequencies. The slope of the ogive shows where the observations accumulate fastest: it is steepest in the bin ending at 102 seconds, which contains 36 of the 96 values.

3.
From the given data the following contingency table has been formed for the population of beach goers. The first division is between beach goers who use sunscreen and those who do not; the second is between those who develop skin cancer and those who do not. The 1% and 7% figures are conditional rates (1% of sunscreen users and 7% of non-users), so the joint percentages are 1% of 90% = 0.9% and 7% of 10% = 0.7%. The table is a bivariate table showing each group as a percentage of all beach goers.

                       Skin cancer   No skin cancer   Total
Use sunscreen          0.9           89.1             90
Do not use sunscreen   0.7           9.3              10
Total                  1.6           98.4             100

(a) P(uses sunscreen and develops skin cancer) = 0.9/100 = 0.009
(b) P(develops skin cancer) = (0.9 + 0.7)/100 = 1.6/100 = 0.016
(c) P(does not use sunscreen | has developed skin cancer) = 0.7/1.6 = 0.4375
(d) P(no skin cancer or uses sunscreen) = P(no skin cancer) + P(uses sunscreen) - P(no skin cancer and uses sunscreen) = 0.984 + 0.90 - 0.891 = 0.993. Equivalently, the only group excluded is non-users who develop skin cancer, so the probability is 1 - 0.007 = 0.993.

4.
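Each part of Question 4 reduces to a single quantile or CDF evaluation. The sketch below uses Python's standard `statistics.NormalDist` and `math.exp`; it also covers part (b), for which no worked answer is given here. It assumes times in part (a) are measured in hours after midnight, and for part (c) it reports both the density at 13 days and P(X <= 13), since for a continuous distribution the probability of exactly 13 days is zero.

```python
import math
from statistics import NormalDist

# (a) Delivery time ~ Uniform(6, 8), measured in hours after midnight.
start, end = 6.0, 8.0
t95 = start + 0.95 * (end - start)       # 95th percentile of a uniform distribution
hours = int(t95)
minutes = round((t95 - hours) * 60)
print(f"(a) 95% of deliveries arrive before {hours}:{minutes:02d} am")

# (b) Bulb lifetime ~ Normal(mean 8000 h, sd 300 h); at most 3% may fail first,
#     so replace at the 3rd percentile of the lifetime distribution.
lifetime = NormalDist(mu=8000, sigma=300)
replace_at = lifetime.inv_cdf(0.03)
print(f"(b) replace all bulbs after about {replace_at:.0f} hours")

# (c) Exponential with mean 20 days, so the rate is lambda = 1/20 per day.
lam = 1 / 20
x = 13
density = lam * math.exp(-lam * x)       # f(13): a density value, not a probability
p_at_most = 1 - math.exp(-lam * x)       # P(X <= 13) from the exponential CDF
print(f"(c) f(13) = {density:.4f} per day, P(X <= 13) = {p_at_most:.4f}")
```

Under these assumptions the script gives 7:54 am for (a), about 7,436 hours for (b), and f(13) of about 0.0261 with P(X <= 13) of about 0.478 for (c).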
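a The delivery time follows a uniform distribution between 6:00 am and 8:00 am. The mean of this distribution is $(6 + 8)/2 = 7$ am and its variance is $(8 - 6)^2/12 = 4/12 = 0.333$, so its standard deviation is about 0.577 hours. Because the distribution is uniform rather than normal, the required time is the 95th percentile, found directly from the uniform cumulative distribution function rather than from a z-value:
$P(X \le t) = \frac{t - 6}{8 - 6} = 0.95 \;\Rightarrow\; t = 6 + 0.95 \times 2 = 7.9$ hours.
Therefore we can be confident that 95 percent of deliveries will arrive before 7.9 hours after midnight, i.e. before 7:54 am.

c The exponential distribution here describes the time between unplanned shutdowns of the Internet service provider, where the mean time between shutdowns is 20 days (Ahsanullah & Hamedani, 2010). The pdf of an exponential distribution is $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, where the rate is $\lambda = 1/\text{mean} = 1/20 = 0.05$ per day. Let $X$ be the time between two unplanned shutdowns. Because $X$ is continuous, the probability that it is exactly 13 days is 0; the density at 13 days is $f(13) = 0.05\,e^{-0.05 \times 13} = 0.05\,e^{-0.65} \approx 0.0261$ per day. If the question is read as "at most 13 days", the probability is $P(X \le 13) = 1 - e^{-0.65} \approx 0.478$.

5. The number of successes in a fixed number of trials follows a binomial distribution. Because the sample size n = 500 is much greater than 20 and p = 0.70 lies between 0.05 and 0.95 (so that $np = 350$ and $n(1-p) = 150$ are both large), the binomial distribution can be approximated by a normal distribution (Miller et al., 2009).

a The success probability per trial is p = 0.70 and the number of trials is n = 500. The standardised variable is $Z = \frac{X - np}{\sqrt{np(1-p)}}$, where X is the binomial count, $np = 500 \times 0.70 = 350$ and $np(1-p) = 105$, so $\sqrt{105} \approx 10.247$. With the continuity correction, "between 280 and 355 times" (inclusive) becomes 279.5 to 355.5:
$P(279.5 \le X \le 355.5) = P\left(\frac{279.5 - 350}{\sqrt{105}} \le Z \le \frac{355.5 - 350}{\sqrt{105}}\right) = P(-6.88 \le Z \le 0.537) \approx 0.7043 - 0.0000 = 0.704$.
Therefore the required probability is approximately 0.704.

b The tolerable exception rate is 0.05, but the confidence bound is built around the sample rate of noncompliance, $\hat{p} = 10/400 = 0.025$. For a one-sided 95% interval the critical value is $Z_{0.05} = 1.645$ (not the two-sided 1.96), and because the 400 cars were drawn from a finite population of N = 4,000, the finite population correction is applied:
$\hat{p} + Z_{0.05}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{\frac{N-n}{N-1}} = 0.025 + 1.645 \times \sqrt{\frac{0.025 \times 0.975}{400}} \times \sqrt{\frac{3600}{3999}} = 0.025 + 1.645 \times 0.00781 \times 0.9488 \approx 0.025 + 0.0122 = 0.0372$.
The upper bound is therefore about 0.0372, or 3.72% (about 3.78% without the finite population correction). Since this is below the tolerable exception rate of 5%, the rate of noncompliance is within the tolerable limit at 95% confidence.

c The standard deviation from the preliminary sample is s = 48 litres, the permitted sampling error is E = ±10 litres and the confidence level is 95%, so $Z_{\alpha/2} = 1.96$. The required sample size is
$n = \left(\frac{Z_{\alpha/2}\, s}{E}\right)^2 = \left(\frac{1.96 \times 48}{10}\right)^2 = 88.51$,
which is rounded up to 89 tanks. Since 16 tanks have already been sampled, 89 - 16 = 73 more tanks are needed.

References
Ahsanullah, M., & Hamedani, G. (2010). Exponential distribution. New York: Nova Science Publishers.
Conway, F. (1963). Descriptive statistics. Leicester: Leicester University Press.
Miller, F. P., et al. (2009). Confidence interval. [S.l.]: VDM Publishing House.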
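The probabilities in Questions 3 and 5 can be computed numerically with the Python standard library. The sketch below is a minimal check under stated assumptions: the 1% and 7% skin-cancer rates in Question 3 are treated as conditional on sunscreen use, "between 280 and 355" in Question 5(a) is read as inclusive, the audit bound in 5(b) applies the finite population correction, and z-values come from `statistics.NormalDist` rather than rounded table values.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal, mean 0 and sd 1

# Question 3: the 1% and 7% rates are conditional on sunscreen use.
p_s = 0.90           # P(uses sunscreen)
p_c_given_s = 0.01   # P(skin cancer | uses sunscreen)
p_c_given_ns = 0.07  # P(skin cancer | no sunscreen)

p_s_and_c = p_s * p_c_given_s            # (a) joint probability
p_ns_and_c = (1 - p_s) * p_c_given_ns
p_c = p_s_and_c + p_ns_and_c             # (b) total probability
p_ns_given_c = p_ns_and_c / p_c          # (c) Bayes' theorem
p_notc_or_s = 1 - p_ns_and_c             # (d) complement of "cancer and no sunscreen"
print(f"Q3: {p_s_and_c:.3f} {p_c:.3f} {p_ns_given_c:.4f} {p_notc_or_s:.3f}")

# Question 5(a): Binomial(500, 0.70) approximated by a normal, continuity-corrected.
n, p = 500, 0.70
mu = n * p
sd = math.sqrt(n * p * (1 - p))
p_between = Z.cdf((355.5 - mu) / sd) - Z.cdf((279.5 - mu) / sd)
print(f"Q5(a): {p_between:.4f}")

# Question 5(b): one-sided 95% upper bound on the noncompliance rate.
N, n_audit, defects = 4000, 400, 10
p_hat = defects / n_audit
fpc = math.sqrt((N - n_audit) / (N - 1))      # finite population correction
z_one_sided = Z.inv_cdf(0.95)                 # about 1.645
upper = p_hat + z_one_sided * math.sqrt(p_hat * (1 - p_hat) / n_audit) * fpc
print(f"Q5(b): upper bound {upper:.4f}")

# Question 5(c): sample size for a margin of error E = 10 litres with s = 48.
s, E, n_prelim = 48, 10, 16
z_two_sided = Z.inv_cdf(0.975)                # about 1.96
n_needed = math.ceil((z_two_sided * s / E) ** 2)
print(f"Q5(c): n = {n_needed}, i.e. {n_needed - n_prelim} more tanks")
```

Under these assumptions the script gives 0.009, 0.016, 0.4375 and 0.993 for Question 3, and about 0.704, an upper bound of about 0.0372, and a sample of 89 tanks (73 more) for Question 5.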