Statistics Research and Report Assignment
In this assignment you will examine data used by a real estate investment advisor. She wants you to answer some specific questions put by clients about house prices in the neighbourhood encompassed by four suburbs around the city of Melbourne. The data is contained in the file ‘Real_Estate.xls’, which contains the following columns (variables):
Variable Name   Description
ID              House identity number
Price           Selling price of the house (in 000s)
Bedrooms        Number of bedrooms
Size            House size (m²)
Pool            0 = house without a pool, 1 = house with a pool
Distance        Distance from city centre (km)
Suburb          Suburb number
Garage          0 = house without a garage, 1 = house with a garage
Before you begin your analysis, you are required to take a random sample of size 150 from the 170 cases in the file. Use the file Random_Sample_Generator-2.xls to do this. Answers to the questions below are to be based on your sample. Make sure you keep a safe copy of the sample you use, since you cannot use Random_Sample_Generator-2.xls to reproduce it. Provide a printout of the data in your sample, with ID numbers in ascending order.
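The brief requires Random_Sample_Generator-2.xls for the actual draw. The minimal Python sketch below only illustrates what that step amounts to: simple random sampling without replacement, followed by sorting on ID. It assumes, hypothetically, that the house IDs are the integers 1 to 170; the function name draw_sample and the seed value are illustrative choices, not part of the assignment.

```python
import random

def draw_sample(ids, n, seed=None):
    """Hypothetical helper: draw a simple random sample of n IDs without
    replacement and return them in ascending order (as the printout requires)."""
    ids = list(ids)
    if n > len(ids):
        raise ValueError("sample size cannot exceed the number of cases")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return sorted(rng.sample(ids, n))

# Assumption: house IDs run from 1 to 170.
all_ids = range(1, 171)
sample_ids = draw_sample(all_ids, 150, seed=2024)
print(len(sample_ids), sample_ids[:5])
```

Unlike the spreadsheet tool, fixing the seed lets a code-based draw be reproduced exactly.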
Part 1: Initial Data Analysis
Variable List
Using the variables listed in the table above, state for each variable whether it is qualitative or quantitative. If it is qualitative, state whether it is nominal or ordinal; if it is quantitative, state whether it is discrete or continuous.
Histogram
(a) Create a histogram showing the distribution of the selling price of the houses.
(b) Comment upon the shape of the distribution: is it symmetric? If it is not, is it positively or negatively skewed?
(c) Are there any outliers present? If so, are they of particular interest?
(d) State which central measure would be best to use to describe the centre of this distribution, and the reason(s) why.
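As a rough numerical companion to the histogram, the minimal Python sketch below computes a five-number summary, flags outliers with Tukey's 1.5 × IQR fences (the convention used by many boxplot tools), and measures skewness with the same adjusted formula as Excel's SKEW function. The price values are made up for illustration and are not from Real_Estate.xls. Quartile conventions differ between packages (statistics.quantiles defaults to method='exclusive' and also offers method='inclusive'), so results may differ slightly from a spreadsheet's.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using statistics.quantiles' default method."""
    q1, med, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, med, q3, max(data)

def iqr_outliers(data, k=1.5):
    """Values outside Tukey's fences Q1 - k*IQR and Q3 + k*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

def sample_skewness(data):
    """Adjusted sample skewness, the formula behind Excel's SKEW():
    n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

def bin_counts(data, start, width):
    """Frequencies of the non-empty bins [start + i*width, start + (i+1)*width)."""
    counts = {}
    for x in data:
        left = start + width * int((x - start) // width)
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

# Toy prices in 000s, made up for illustration only.
prices = [410, 455, 470, 480, 495, 510, 520, 540, 560, 610, 650, 1200]
print(five_number_summary(prices))
print(iqr_outliers(prices))
print(round(sample_skewness(prices), 2))
print(statistics.mean(prices), statistics.median(prices))
print(bin_counts(prices, 400, 100))
```

A positive skewness coefficient together with a mean noticeably above the median suggests a positively skewed distribution, in which case the median is usually the more representative measure of centre.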
Descriptive statistics
(a) Prepare a table that shows the five-number summary of Price for houses in the four suburbs.
(b) Construct side-by-side boxplots of the price of the houses in the four suburbs. Briefly comment upon any differences in house price you observe between the suburbs.
(c) Are there any outliers present? If so, are they of particular interest?
(d) State which central measure would be best to use to describe the centre of these distributions, and the reason(s) why.
(e) Prepare a summary table that shows the mean and standard deviation of Price for houses in the four suburbs, broken down by the variable Bedrooms. Think carefully about the layout of the rows and columns of your table. As well as means and standard deviations, you should also include the number of houses in each group, so each cell in your final table should contain the mean, the standard deviation and n, the number of houses in that group.
(f) Refer to part (e). Comment, in bullet-point form, on the Price for any combinations of the Suburb and Bedrooms variables (i.e. cells in the table).
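One way to build the five-number summary table and the Suburb × Bedrooms summary table programmatically is sketched below in Python: Price is grouped by Suburb for the five-number summaries, and by Suburb and Bedrooms for the mean, standard deviation and n of each cell. Rows are assumed to be dictionaries keyed by the column names in the variable table; the helper names group_prices and cell_summary and the example rows are hypothetical.

```python
import statistics
from collections import defaultdict

def group_prices(rows, *keys):
    """Hypothetical helper: collect Price values keyed by the given column name(s)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["Price"])
    return dict(groups)

def cell_summary(prices):
    """Mean, standard deviation (n - 1 divisor) and n for one table cell."""
    n = len(prices)
    sd = statistics.stdev(prices) if n > 1 else float("nan")  # undefined for n = 1
    return statistics.mean(prices), sd, n

# Toy rows, made up for illustration only.
rows = [
    {"Price": 520, "Suburb": 1, "Bedrooms": 3},
    {"Price": 560, "Suburb": 1, "Bedrooms": 3},
    {"Price": 700, "Suburb": 1, "Bedrooms": 4},
    {"Price": 450, "Suburb": 2, "Bedrooms": 3},
    {"Price": 480, "Suburb": 2, "Bedrooms": 3},
    {"Price": 490, "Suburb": 2, "Bedrooms": 3},
]

# Five-number summary of Price for each suburb
for (suburb,), prices in sorted(group_prices(rows, "Suburb").items()):
    q1, med, q3 = statistics.quantiles(prices, n=4)
    print(suburb, (min(prices), q1, med, q3, max(prices)))

# Suburb x Bedrooms table: each cell holds (mean, sd, n)
table = {cell: cell_summary(p) for cell, p in group_prices(rows, "Suburb", "Bedrooms").items()}
for (suburb, beds), (mean, sd, n) in sorted(table.items()):
    print(f"Suburb {suburb}, {beds} bed: mean={mean:.1f}, sd={sd:.1f}, n={n}")
```

The standard deviation uses the n - 1 divisor, as Excel's STDEV.S does, and is left undefined (nan) for a cell containing a single house.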
One of the clients wants information on the size of houses as it relates to price.
(a) Produce a scatter plot of Price vs Size (Size should be on the horizontal axis). Make sure you label your axes properly and that your graph has an appropriate title.
(b) Refer to part (a). Briefly describe the nature of the relationship between these two variables.
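To put a number on the relationship seen in the scatter plot, the minimal Python sketch below computes Pearson's correlation coefficient r and the least-squares line of Price on Size. The size and price values are made up for illustration. On Python 3.10 or later, statistics.correlation and statistics.linear_regression give the same results; the formulas are written out here so they are visible.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(x, y):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Toy data, made up for illustration: Size (m2) on x, Price (000s) on y
size = [150, 180, 200, 220, 260, 300]
price = [450, 500, 520, 560, 610, 700]
r = pearson_r(size, price)
slope, intercept = least_squares_line(size, price)
print(f"r = {r:.3f}; Price = {intercept:.1f} + {slope:.3f} * Size")
```

The sign of r and of the slope gives the direction of the relationship, and the size of |r| its strength. Because r only measures linear association, the scatter plot should still be checked for curvature or outliers.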
Now create a new variable (column) labelled Size Group, which divides Size into two size groups as follows:
Size                         Size Group
Under 200 square metres      Small
200 square metres and over   Large
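The Size Group rule above can be written as a one-line function. The Python sketch below uses a hypothetical helper size_group and made-up rows, and shows where the boundary falls: a house of exactly 200 m² is Large. In a spreadsheet, the equivalent would be a formula such as =IF(C2<200,"Small","Large"), assuming (hypothetically) that Size is in column C.

```python
def size_group(size_m2):
    """Hypothetical helper: 'Small' if Size is under 200 m2, otherwise 'Large'."""
    return "Small" if size_m2 < 200 else "Large"

# Toy rows, made up for illustration only
rows = [{"ID": 1, "Size": 185.0}, {"ID": 2, "Size": 200.0}, {"ID": 3, "Size": 240.5}]
for row in rows:
    row["Size Group"] = size_group(row["Size"])  # add the new column
print([(row["ID"], row["Size Group"]) for row in rows])
```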
(i) Produce suitable graphs or charts to help provide the information requested on the size of the house as it relates to price.
(ii) Construct 95% confidence intervals for the mean price of small houses and of large houses.
(iii) Refer to (ii). Is there any overlap between the two confidence intervals? What does this tell you about the prices for the two sizes?
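The minimal Python sketch below computes each interval as mean ± critical value × s/√n and then checks whether the two intervals overlap. It rests on a stated assumption: the critical value defaults to the large-sample normal value (about 1.96), while for the exact t interval the t critical value with n - 1 degrees of freedom should be passed in (from tables, or from scipy.stats.t.ppf(0.975, n - 1) if SciPy is installed). The prices are made up for illustration.

```python
import statistics

def mean_ci(data, crit=None):
    """Confidence interval for a population mean: mean +/- crit * s / sqrt(n).

    crit defaults to the 97.5th percentile of the standard normal (about 1.96),
    a large-sample approximation to the 95% t interval; pass the t critical
    value with n - 1 degrees of freedom for the exact t interval.
    """
    n = len(data)
    if crit is None:
        crit = statistics.NormalDist().inv_cdf(0.975)
    mean = statistics.mean(data)
    half_width = crit * statistics.stdev(data) / n ** 0.5
    return mean - half_width, mean + half_width

def intervals_overlap(a, b):
    """True if intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Toy prices (000s) for each Size Group, made up for illustration only.
small = [430, 455, 470, 490, 500, 515]
large = [560, 590, 610, 640, 655, 700]

# With only 6 houses per group, use the t critical value for 5 df (2.571, from tables).
ci_small = mean_ci(small, crit=2.571)
ci_large = mean_ci(large, crit=2.571)
print(ci_small, ci_large, intervals_overlap(ci_small, ci_large))
```

Intervals that do not overlap point to a difference in mean price between small and large houses. Overlapping intervals, however, do not on their own show that the means are equal; a two-sample test is the more direct check.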
Part 2: Research Questions
Based on your random sample, identify and investigate TWO research questions of your own using inferential statistics (estimation and hypothesis testing).
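For the hypothesis-testing part, one common tool for comparing the mean price of two groups is Welch's two-sample t test, which does not assume equal variances. The minimal Python sketch below computes its t statistic and Welch-Satterthwaite degrees of freedom. The research question (pool vs no pool) and the prices are hypothetical illustrations only. If SciPy is installed, scipy.stats.ttest_ind(a, b, equal_var=False) returns the same statistic together with a p-value.

```python
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom
    for H0: mean(a) = mean(b), without assuming equal variances."""
    na, nb = len(a), len(b)
    va = statistics.variance(a) / na  # squared standard error of mean(a)
    vb = statistics.variance(b) / nb  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical research question: do houses with a pool (Pool = 1) sell for more
# than houses without one (Pool = 0)? Toy prices (000s), made up for illustration.
pool = [560, 600, 640, 610, 700, 655]
no_pool = [430, 470, 455, 515, 500, 490]
t, df = welch_t(pool, no_pool)
print(f"t = {t:.2f}, df = {df:.1f}")
```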