Using the original data, select the Date and AveragePrice and Region. Filter the data where region e

2022-11-22
2) AveragePrice is the column of interest. Compute summary statistics for the AveragePrice variable.

    data <- read.csv("r 数据.csv")
    summary(data$AveragePrice)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##   0.440   1.100   1.370   1.406   1.660   3.250

What is the mean of AveragePrice across all data points? What is the mean for each year present in the data? Plot a bar graph with year on the x-axis and the mean AveragePrice for that year on the y-axis. (8)

    library(tidyverse)
    
    # Mean AveragePrice for each year
    data %>%
      group_by(year) %>%
      summarise(AveragePrice = mean(AveragePrice))
    ## # A tibble: 4 x 2
    ##    year AveragePrice
    ##   <int>        <dbl>
    ## 1  2015         1.38
    ## 2  2016         1.34
    ## 3  2017         1.52
    ## 4  2018         1.35
    
    # Bar chart of the yearly means, each bar labelled with its rounded value
    data %>%
      group_by(year) %>%
      summarise(AveragePrice = round(mean(AveragePrice), 2)) %>%
      ggplot(aes(x = factor(year), y = AveragePrice)) +
      geom_col() +
      xlab("year") +
      geom_text(aes(label = AveragePrice), vjust = 2, colour = "white")
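
The snippet above covers the per-year part of the question; the overall mean is the `Mean` entry that `summary()` reported earlier (1.406). As a minimal base-R sketch on made-up numbers (the `toy` data frame and its values are assumptions for illustration, not rows from the avocado file), the following puts both quantities side by side and shows why the overall mean is generally not the plain average of the yearly means when years have different row counts:

```r
# Toy data: hypothetical values, three rows for 2015 and one for 2016
toy <- data.frame(
  year = c(2015, 2015, 2015, 2016),
  AveragePrice = c(1.0, 1.2, 1.4, 2.0)
)

# Overall mean across every row
overall_mean <- mean(toy$AveragePrice)

# Per-year means: base-R counterpart of group_by(year) %>% summarise(mean(...))
by_year <- aggregate(AveragePrice ~ year, data = toy, FUN = mean)

# Unweighted mean of the yearly means; differs from overall_mean here
# because 2015 contributes three rows and 2016 only one
mean_of_means <- mean(by_year$AveragePrice)

print(overall_mean)
print(by_year)
print(mean_of_means)
```

On the real data, `mean(data$AveragePrice)` would give the overall figure directly.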
[Figure: bar chart of mean AveragePrice by year]

Which region has the highest average price in the entire dataset, and what is its value? (8)

    data %>%
      group_by(region) %>%
      summarise(AveragePrice = mean(AveragePrice)) %>%
      arrange(desc(AveragePrice))
    ## # A tibble: 54 x 2
    ##    region              AveragePrice
    ##    <chr>                      <dbl>
    ##  1 HartfordSpringfield         1.82
    ##  2 SanFrancisco                1.80
    ##  3 NewYork                     1.73
    ##  4 Philadelphia                1.63
    ##  5 Sacramento                  1.62
    ##  6 Charlotte                   1.61
    ##  7 Northeast                   1.60
    ##  8 Albany                      1.56
    ##  9 Chicago                     1.56
    ## 10 RaleighGreensboro           1.56
    ## # ... with 44 more rows
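
To pull the top region out programmatically rather than reading it off the sorted table, one option is `which.max()` on the per-region means. A base-R sketch on a hypothetical `toy` data frame (the regions A, B, C and their prices are invented for illustration):

```r
# Toy data: hypothetical regions and prices
toy <- data.frame(
  region = c("A", "A", "B", "B", "C"),
  AveragePrice = c(1.0, 1.2, 1.8, 1.6, 1.3)
)

# Mean price per region
region_means <- aggregate(AveragePrice ~ region, data = toy, FUN = mean)

# Row holding the largest mean (which.max returns the first maximum on ties)
top <- region_means[which.max(region_means$AveragePrice), ]

print(top)
```

On the real data this corresponds to the first row of the sorted table above: HartfordSpringfield, at about 1.82.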

Keep only the Date, AveragePrice, and region columns, and restrict the data to the TotalUS region. Plot a line graph with Date on the x-axis and the weekly average price for TotalUS on the y-axis. (5)

    # Parse the month/day/year strings into Date objects
    data$Date <- parse_date(data$Date, format = "%m/%d/%Y")
    
    # Weekly mean price for TotalUS, drawn as a line
    data %>%
      select(Date, AveragePrice, region) %>%
      filter(region == "TotalUS") %>%
      group_by(Date) %>%
      summarise(AveragePrice = mean(AveragePrice)) %>%
      ggplot(aes(x = Date, y = AveragePrice)) +
      geom_line() +
      ggtitle("Average price of TotalUS for the week")
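
`parse_date()` comes from readr, which is loaded with tidyverse; base R's `as.Date()` accepts the same `%m/%d/%Y` format string. A small sketch with made-up date strings (the `raw` vector is an assumption for illustration), showing that a string in a different layout becomes `NA` instead of raising an error, which is worth checking before plotting:

```r
# Hypothetical inputs: two month/day/year strings and one ISO-style string
raw <- c("1/4/2015", "12/27/2015", "2015-01-04")

# Parse with the same format string used above
parsed <- as.Date(raw, format = "%m/%d/%Y")
print(parsed)

# Count entries that did not match the format
n_failed <- sum(is.na(parsed))
print(n_failed)
```

A nonzero `n_failed` on the real column would suggest the file uses a different date layout than the format string assumes.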
[Figure: line chart of the weekly average price for TotalUS]

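Models. Based on the original dataset, keep Date, AveragePrice, and region, and retain only the records where region is Plains. Then fit a Prophet model and an ARIMA model to forecast future prices. The task asks for a one-year forecast; the code below trains on the first 158 weeks and scores 12-week forecasts against the held-out rows.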

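    # Weekly mean price for the Plains region
    df <- data %>%
      select(Date, AveragePrice, region) %>%
      filter(region == "Plains") %>%
      group_by(Date) %>%
      summarise(AveragePrice = mean(AveragePrice))
    
    # Train on the first 158 weeks and hold out the remaining rows.
    # As written, row 158 is in both sets, so the forecasts (which start the
    # week after the last training row) sit one week later than the test rows.
    train <- df[1:158, ]
    test <- df[158:nrow(df), ]
    
    ### Prophet
    library(prophet)
    
    # prophet() expects the columns to be named ds (date) and y (value)
    proline <- train
    names(proline) <- c("ds", "y")
    pro1 <- prophet(proline,
      growth = "linear",
      yearly.seasonality = FALSE, weekly.seasonality = TRUE, seasonality.prior.scale = 5,
      daily.seasonality = FALSE, changepoint.prior.scale = 0.015, seasonality.mode = "additive"
    )
    # Extend the training dates by 12 weeks and predict over history plus future
    future <- make_future_dataframe(pro1, periods = 12, freq = "week")
    pro <- predict(pro1, future)
    
    # Rows 159-170 of the prediction are the 12 forecast weeks
    pred <- pro$yhat[159:170]
    
    # Mean absolute error between actual and predicted values
    mae <- function(actual, predicted) {
      mean(abs(actual - predicted))
    }
    mae(test$AveragePrice, pred)
    ## [1] 0.1862995
    
    ### ARIMA
    library(forecast)
    # Weekly series starting in 2015
    train_ts <- ts(train$AveragePrice, frequency = 52, start = 2015)
    
    fit <- auto.arima(train_ts)
    
    # 12-step-ahead point forecasts
    pred <- data.frame(forecast(fit, 12))$Point.Forecast
    mae(test$AveragePrice, pred)
    ## [1] 0.07156024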
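
In the split above, row 158 belongs to both `train` and `test`, while the 12 forecasts begin at the week after the last training row, so predictions and actuals are offset by one week. A sketch of a non-overlapping split on a synthetic weekly series follows. The 169-row length, the values, and the names `df_toy`, `h`, and `naive_pred` are assumptions for illustration, and the last-value forecast is only a stand-in for Prophet or ARIMA:

```r
n <- 169  # hypothetical number of weekly rows
h <- 12   # forecast horizon in weeks

# Synthetic weekly series: steadily increasing made-up prices
df_toy <- data.frame(
  Date = seq(as.Date("2015-01-04"), by = "week", length.out = n),
  AveragePrice = seq_len(n) / 100
)

# The last h rows form the test set; everything before them is training data
split_at <- n - h
train_toy <- df_toy[seq_len(split_at), ]
test_toy <- df_toy[(split_at + 1):n, ]

# The first test date is exactly one week after the last training date
gap_days <- as.numeric(test_toy$Date[1] - train_toy$Date[nrow(train_toy)])

# Stand-in forecast: repeat the last training value h times
naive_pred <- rep(tail(train_toy$AveragePrice, 1), h)

# Mean absolute error, same definition as in the post
mae <- function(actual, predicted) {
  mean(abs(actual - predicted))
}
toy_mae <- mae(test_toy$AveragePrice, naive_pred)

print(gap_days)
print(toy_mae)
```

With a split like this, a model fitted on the training rows and forecast `h` steps ahead lines up row for row with the test set. The MAE values reported above come from the original split, so they would change if the models were refit on an aligned one.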

The ARIMA model performs better on the test set: its MAE is about 0.072, compared with about 0.186 for Prophet.
