Using the original data, select the Date and AveragePrice and Region. Filter the data where region e
2022-11-22
2) AveragePrice is the column of interest. Compute the summary statistics for the AveragePrice variable.
data <- read.csv("r 数据.csv")
summary(data$AveragePrice)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.440 1.100 1.370 1.406 1.660 3.250
What is the average value of AveragePrice from all data points? For each year when data exists, what is its corresponding average value? Plot a bar graph showing Year on X-axis and Mean of AveragePrice per specified year. (8)
library(tidyverse)
data %>%
group_by(year) %>%
summarise(AveragePrice = mean(AveragePrice))
## # A tibble: 4 x 2
## year AveragePrice
## <int> <dbl>
## 1 2015 1.38
## 2 2016 1.34
## 3 2017 1.52
## 4 2018 1.35
data %>%
  group_by(year) %>%
  summarise(AveragePrice = round(mean(AveragePrice), 2)) %>%
  ggplot(aes(x = factor(year), y = AveragePrice)) +
  geom_col() +
  xlab("year") +
  geom_text(aes(label = AveragePrice), vjust = 2, colour = "white")
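The first part of the question, the average over all data points, is the Mean already reported by summary() (1.406). As a minimal sketch, both figures can be cross-checked in base R. The `toy` data frame below is an invented stand-in for `data`, not the real file, so its numbers are for illustration only.

```r
# Hypothetical stand-in for `data`; the values are invented for illustration.
toy <- data.frame(
  year         = c(2015, 2015, 2016, 2016, 2017),
  AveragePrice = c(1.0, 2.0, 1.5, 2.5, 3.0)
)

# Overall mean across every row
overall_mean <- mean(toy$AveragePrice)

# Mean per year, the base-R counterpart of group_by(year) %>% summarise(...)
yearly_mean <- aggregate(AveragePrice ~ year, data = toy, FUN = mean)

print(overall_mean)
print(yearly_mean)
```

Replacing `toy` with `data` should reproduce the per-year table shown earlier.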

Which region has the highest average price in the entire dataset, and what is its value? (8)
data %>%
group_by(region) %>%
summarise(AveragePrice = mean(AveragePrice)) %>%
arrange(desc(AveragePrice))
## # A tibble: 54 x 2
## region AveragePrice
## <chr> <dbl>
## 1 HartfordSpringfield 1.82
## 2 SanFrancisco 1.80
## 3 NewYork 1.73
## 4 Philadelphia 1.63
## 5 Sacramento 1.62
## 6 Charlotte 1.61
## 7 Northeast 1.60
## 8 Albany 1.56
## 9 Chicago 1.56
## 10 RaleighGreensboro 1.56
## # ... with 44 more rows
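Because the table is sorted in descending order, its first row answers the question: HartfordSpringfield, with an average price of about 1.82. A minimal base-R sketch of extracting only that row follows. The `toy` data frame and its region names are invented for illustration.

```r
# Hypothetical stand-in for `data`; regions and prices are invented.
toy <- data.frame(
  region       = c("A", "A", "B", "B", "C"),
  AveragePrice = c(1.0, 1.2, 1.9, 1.7, 1.5)
)

# Mean price per region, returned as a named vector
region_mean <- tapply(toy$AveragePrice, toy$region, mean)

# Name and value of the region with the largest mean
top_region <- names(region_mean)[which.max(region_mean)]
top_value  <- region_mean[[top_region]]

print(top_region)
print(top_value)
```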
Remove all other columns except Date, AveragePrice, and Region. Group data according to the TotalUS Region. Generate a line graph with Date on the X-axis and the average price of TotalUS Region for each week on the Y-axis. (5)
# Convert the Date strings (month/day/year) to Date objects
data$Date <- parse_date(data$Date, format = "%m/%d/%Y")
data %>%
  select(Date, AveragePrice, region) %>%
  filter(region == "TotalUS") %>%
  group_by(Date) %>%
  summarise(AveragePrice = mean(AveragePrice)) %>%
  ggplot(aes(x = Date, y = AveragePrice)) +
  geom_line() +
  ggtitle("Average price of TotalUS for the week")

Models
Based on the original dataset, extract Date, AveragePrice, and region. Keep only the records where the region is Plains. Then build a Prophet model and an ARIMA model to forecast one year into the future.
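# Weekly mean price for the Plains region, one row per Date
df <- data %>%
  select(Date, AveragePrice, region) %>%
  filter(region == "Plains") %>%
  group_by(Date) %>%
  summarise(AveragePrice = mean(AveragePrice))
# Rows 1-158 are used for training; the test set runs from row 158 to the end.
# Note that row 158 falls in both sets.
train <- df[1:158, ]
test <- df[158:nrow(df), ]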
### prophet
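library(prophet)
# prophet() expects the columns to be named ds (date) and y (value)
proline <- train
names(proline) <- c("ds", "y")
pro1 <- prophet(proline,
  growth = "linear",
  yearly.seasonality = FALSE, weekly.seasonality = TRUE, seasonality.prior.scale = 5,
  daily.seasonality = FALSE, changepoint.prior.scale = 0.015, seasonality.mode = "additive"
)
# Extend the training dates by 12 weekly periods
future <- make_future_dataframe(pro1, periods = 12, freq = "week")
pro <- predict(pro1, future)
# Rows 159-170 are the 12 forecast weeks that follow the 158 training rows
pred <- pro$yhat[159:170]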
mae <- function (actual, predicted)
{
return(mean(abs(actual - predicted)))
}
mae(test$AveragePrice,pred)
## [1] 0.1862995
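### arima
library(forecast)
# Weekly series starting in 2015; named train_ts to avoid masking base::t()
train_ts <- ts(train$AveragePrice, frequency = 52, start = 2015)
fit <- auto.arima(train_ts)
# Point forecasts for the next 12 weeks
pred <- as.numeric(forecast(fit, h = 12)$mean)
mae(test$AveragePrice, pred)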
## [1] 0.07156024
The ARIMA model performs better on the test set: its MAE (about 0.072) is lower than that of the Prophet model (about 0.186).
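In the split used here, `train` and `test` share row 158, while both models forecast the weeks that follow row 158. One way to keep the hold-out set disjoint from the training data and aligned with the forecast horizon is sketched below, under stated assumptions: `split_holdout` is a hypothetical helper, the series is invented, and a naive last-value forecast stands in for the Prophet and ARIMA models.

```r
# Hypothetical helper: keep the last `h` rows as the test set and everything
# before them as the training set, so no row appears in both.
split_holdout <- function(df, h) {
  n <- nrow(df)
  stopifnot(h > 0, h < n)
  list(
    train = df[seq_len(n - h), , drop = FALSE],
    test  = df[(n - h + 1):n, , drop = FALSE]
  )
}

# Mean absolute error, as defined earlier in the report
mae <- function(actual, predicted) mean(abs(actual - predicted))

# Invented weekly series, for illustration only
toy <- data.frame(
  Date         = as.Date("2015-01-04") + 7 * (0:9),
  AveragePrice = c(1.0, 1.1, 1.2, 1.1, 1.0, 1.2, 1.3, 1.1, 1.0, 1.2)
)

parts <- split_holdout(toy, h = 3)

# Naive forecast: repeat the last training value for each of the h test weeks
naive_pred <- rep(tail(parts$train$AveragePrice, 1), nrow(parts$test))

naive_mae <- mae(parts$test$AveragePrice, naive_pred)
print(naive_mae)
```

With this layout, a model fitted on `parts$train` and asked for `h` steps ahead produces forecasts for exactly the dates in `parts$test`.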
