Scientific article on the topic 'Bike rental volume prediction via linear regression model'

Bike rental volume prediction via linear regression model. Text of a scientific article in the specialty "Earth sciences and related environmental sciences"

CC BY
Keywords
PREDICTIVE MODEL / BIKE RENTAL / PREDICTIVE INSTRUMENT / LINEAR REGRESSION

Abstract of the scientific article in Earth sciences and related environmental sciences; author of the work: Tianyu Yu

Aim: This study aimed to build a predictive model for bike rental volume using linear regression. Method: The data set under study is a 2-year usage log of a bike sharing system, Capital Bike Sharing (CBS), in Washington, D.C., USA. External sources make it possible to extract the corresponding historical environmental values, such as weather conditions, weekday and holidays. All records were randomly assigned to two groups: a training sample (50%) and a testing sample (50%). A linear regression model was built. Results: For the testing sample, the MSE of the linear regression was 798. In cross-validation, the average MSE of the linear model was 806, which indicated that the model was stable. Conclusions: In this study, we built a predictive model for bike rental volume using linear regression. This study suggests that it is possible to develop a reproducible and transportable predictive instrument for bike rental volume.


Text of the scientific paper on the topic "Bike rental volume prediction via linear regression model"

Section 2. Mathematical methods in economics

Tianyu Yu,

Wilbraham & Monson Academy, MA E-mail: [email protected]

BIKE RENTAL VOLUME PREDICTION VIA LINEAR REGRESSION MODEL


1. Introduction

Since the introduction of bike share, its popularity and usage have skyrocketed in recent years. Millions of trips are taken by millions of people every year. Bike stations are on every street corner, and the sheer number of bikes in circulation raises the question: does the distribution of bikes really meet people's needs? Alternatively, we could develop a model that predicts bike rental volume, thereby bringing every bike closer to its maximum productivity.

The demand for bike share is influenced by many factors, which is the main reason why at one station people cannot return bikes while at another station people cannot find a bike to rent. For example, in the morning more bikes are needed at stations near residential areas, and during weekends the demand for bike share in big cities increases. We cannot simply produce millions of bikes and distribute them everywhere in the city: not only is mass production costly, it also creates chaos that is hard to manage. In China, there are bikes everywhere along the streets and renting one is convenient, but because of the massive number of bikes most of them are not in good condition, which defeats the whole point of bike sharing. Instead, they become a burden that the companies have to get rid of. If a bike share company can predict the rough number of bikes needed at each station by modeling influencing factors such as wind, temperature, time and humidity, it can move bikes in advance and satisfy people's needs with a limited number of bikes.

2. Data and Methods

Data

The data set under study is the usage log of a bike sharing system, Capital Bike Sharing (CBS), in Washington, D.C., USA.

In the CBS system, when a rental occurs, the operation software collects basic data about the trip such as duration, start date, end date, start station, end station, bike number and member type. The historical data set of such trip transactions is available online. To avoid trend issues, we select only the data corresponding to the year 2011. Several weather data sources exist; however, most of them provide only forecasts and do not contain historical weather reports. Another group of forecasting sources contains historical weather reports only for a specific number of recent days (e.g. 14 days). A further group also contains historical weather reports, but on a daily scale. From such a source, attributes such as temperature, apparent temperature, wind speed, wind gust, humidity, pressure, dew point and visibility were obtained for each hour of the period from 1 January 2011 to 31 December 2011 for Washington, D.C., USA.

Table 1.- Variables available in this data

Variable     Description
season       Season (1: spring, 2: summer, 3: fall, 4: winter)
mnth         Month (1 to 12)
hr           Hour (0 to 23)
holiday      Whether the day is a holiday or not
weekday      Day of the week
workingday   1 if the day is neither a weekend nor a holiday, otherwise 0
weathersit   1: Clear, Few clouds, Partly cloudy
             2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
             3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
             4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp         Normalized temperature in Celsius. The values are derived via (t - t_min)/(t_max - t_min), t_min = -8, t_max = +39 (only in hourly scale)
atemp        Normalized feeling temperature in Celsius. The values are derived via (t - t_min)/(t_max - t_min), t_min = -16, t_max = +50 (only in hourly scale)
hum          Normalized humidity. The values are divided by 100 (max)
windspeed    Normalized wind speed. The values are divided by 67 (max)
casual       Count of casual users
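The normalization formulas in Table 1 can be inverted to read the scaled values on their original scale. The following is a minimal sketch that uses only the ranges quoted in the table; the function and constant names are illustrative assumptions, not part of the data set.

```python
# Ranges quoted in Table 1; all names below are illustrative only.
T_MIN, T_MAX = -8.0, 39.0       # temp range in Celsius
AT_MIN, AT_MAX = -16.0, 50.0    # atemp (feeling temperature) range in Celsius
HUM_MAX = 100.0                 # humidity is divided by 100
WIND_MAX = 67.0                 # wind speed is divided by 67


def normalize(value, lo, hi):
    """Min-max scaling: (t - t_min) / (t_max - t_min)."""
    return (value - lo) / (hi - lo)


def denormalize(scaled, lo, hi):
    """Inverse of normalize: recover the value on the original scale."""
    return scaled * (hi - lo) + lo


# 0.49 is the temp value reported for the training sample in Table 2.
temp_c = denormalize(0.49, T_MIN, T_MAX)
humidity_pct = 0.65 * HUM_MAX
print(round(temp_c, 2), round(humidity_pct, 1))
```

Under these ranges, a normalized temp of 0.49 corresponds to roughly 15 degrees Celsius.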

3. Results

The sample size is 4322 in the test sample and 4323 in the training sample, a total of 8645 records from the year 2011.
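A random 50/50 assignment of 8645 records necessarily yields one group of 4322 and one of 4323. The sketch below shows one way such a split could be done; the seed, the function name and the use of integer record identifiers are assumptions made for illustration, and the paper does not state how its split was implemented.

```python
import random


def split_half(records, seed=0):
    """Shuffle a copy of the records and cut it into two near-equal halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    cut = len(shuffled) // 2
    return shuffled[:cut], shuffled[cut:]


record_ids = range(8645)                    # stand-in for the hourly records
first_half, second_half = split_half(record_ids)
print(len(first_half), len(second_half))
```

Which half serves as the training sample and which as the test sample is arbitrary once the records have been shuffled.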

Table 2.- Descriptive information of test sample and training sample

                   Training sample       Test sample
                   N        %            N        %
Season
  1                1000     48.36        1068     51.64
  2                1129     51.25        1074     48.75
  3                1128     50.36        1112     49.64
  4                1065     49.91        1069     50.09
Holiday
  No               4210     50.08        4196     49.92
  Yes              112      46.86        127      53.14
Week day
  0                609      49.47        622      50.53
  1                596      48.26        639      51.74
  2                597      48.85        625      51.15
  3                642      52.24        587      47.76
  4                628      51.27        597      48.73
  5                624      50.36        615      49.64
  6                626      49.53        638      50.47
Working day
  No               1347     49.27        1387     50.73
  Yes              2975     50.33        2936     49.67

                   Mean     SD           Mean     SD
weathersit         1.44     0.66         1.44     0.65
temp               0.49     0.20         0.48     0.20
atemp              0.47     0.17         0.46     0.18
hum                0.65     0.20         0.64     0.20
windspeed          0.19     0.12         0.19     0.12
Casual rentals     28.67    39.19        28.53    38.49
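The percentages in Table 2 are row percentages: each count is divided by the total of that row across the two samples. A short check in Python, with a hypothetical helper name, reproduces two of the rows:

```python
def row_percentages(n_training, n_test):
    """Share of each sample within one table row, in percent."""
    total = n_training + n_test
    return (round(100 * n_training / total, 2),
            round(100 * n_test / total, 2))


season_1 = row_percentages(1000, 1068)     # Season = 1
holiday_yes = row_percentages(112, 127)    # Holiday = Yes
print(season_1, holiday_yes)
```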

Figure 1. Distribution of bike rentals in training sample (x-axis: casual)

Figure 2. Distribution of bike rentals in test sample (panel title: Distribution of casual; x-axis: casual)

According to the linear regression, season, hour, holiday or not, working day or not, temperature and humidity were significant predictors for bike rental volume.

Table 3.- Linear regression to predict the volume of bike rental

               Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)    21.0       2.7          7.7       0.000 ***
season         1.9        0.7          2.5       0.011 *
mnth           -0.3       0.2          -1.4      0.150
hr             1.1        0.1          17.0      < 2e-16 ***
holiday        -8.6       2.7          -3.1      0.002 **
weekday        0.1        0.2          0.5       0.645
workingday     -29.6      1.0          -30.7     < 2e-16 ***
weathersit     0.3        0.8          0.5       0.643
temp           78.8       18.3         4.3       0.000 ***
atemp          11.0       20.5         0.5       0.593
hum            -49.2      2.7          -18.2     < 2e-16 ***
windspeed      -2.3       3.9          -0.6      0.550

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
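To illustrate how the estimates in Table 3 turn into a prediction, the sketch below evaluates the linear predictor for one hour. The coefficients are copied from the table, so results are only as precise as its one-decimal rounding; the input row is a hypothetical example, not a record from the data set, and the function name is an assumption.

```python
# Estimates copied from Table 3 (rounded to one decimal there).
COEF = {
    "intercept": 21.0, "season": 1.9, "mnth": -0.3, "hr": 1.1,
    "holiday": -8.6, "weekday": 0.1, "workingday": -29.6,
    "weathersit": 0.3, "temp": 78.8, "atemp": 11.0,
    "hum": -49.2, "windspeed": -2.3,
}


def predict_casual(row):
    """Linear predictor: intercept plus the sum of coefficient * value."""
    return COEF["intercept"] + sum(COEF[name] * value
                                   for name, value in row.items())


# Hypothetical hour: 15:00 on a non-working day with clear weather.
example = {"season": 3, "mnth": 7, "hr": 15, "holiday": 0, "weekday": 6,
           "workingday": 0, "weathersit": 1, "temp": 0.8, "atemp": 0.75,
           "hum": 0.5, "windspeed": 0.2}
same_hour_working_day = dict(example, workingday=1)

print(round(predict_casual(example), 1))
print(round(predict_casual(same_hour_working_day), 1))
```

With all other inputs held fixed, switching workingday from 0 to 1 lowers the predicted casual count by 29.6, which is the size of that coefficient.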

4. Discussion

The number of major cities becoming bike-friendly has grown in recent years. It is expected that in the near future most major cities will provide this service alongside their other public transport services. How to better predict rental volume is a key challenge for the business.

In this study, we built a predictive model for bike rental volume using linear regression. This study suggests that it is possible to develop a reproducible and transportable predictive instrument for bike rental volume.
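The abstract reports a test-sample MSE and an average MSE from cross-validation. The following is a minimal sketch of how such figures can be computed, using ordinary least squares solved through the normal equations in plain Python. It runs on synthetic data rather than the CBS records, and the number of folds, the seeds and all function names are assumptions; the paper does not describe its own implementation.

```python
import random


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(x) for x in X]           # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def mse(beta, X, y):
    """Mean squared error of the fitted model on (X, y)."""
    return sum((predict(beta, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def cross_val_mse(X, y, k=5, seed=0):
    """Average held-out MSE over k random folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        scores.append(mse(beta, [X[i] for i in fold], [y[i] for i in fold]))
    return sum(scores) / k


# Synthetic data (not the CBS records): y = 20 + 1.0*hr - 50*hum + noise.
rng = random.Random(1)
X = [[rng.randint(0, 23), rng.random()] for _ in range(400)]
y = [20 + 1.0 * hr - 50 * hum + rng.gauss(0, 5) for hr, hum in X]

half = len(y) // 2
beta = fit_ols(X[:half], y[:half])                # fit on the first half
test_mse = mse(beta, X[half:], y[half:])          # evaluate on the second
cv_mse = cross_val_mse(X, y)
print(beta, test_mse, cv_mse)
```

When the held-out MSE and the cross-validated average are close, as the abstract reports for 798 and 806, the fit does not depend strongly on which records happened to fall into the training sample.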

According to the linear regression, season, hour, holiday or not, working day or not, temperature and humidity were significant predictors of rental volume.
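The list of significant predictors can be read directly from the p-values in Table 3. The sketch below applies the 0.05 threshold and the significance codes from the table legend. Because the table prints some p-values only as bounds, "< 2e-16" is stored as 2e-16 and "0.000" as 0.0005; these stand-in values and the names used are assumptions.

```python
# p-values as printed in Table 3. "< 2e-16" is stored as its upper bound,
# and "0.000" as 0.0005, the largest value that would round to it.
P_VALUES = {
    "season": 0.011, "mnth": 0.150, "hr": 2e-16, "holiday": 0.002,
    "weekday": 0.645, "workingday": 2e-16, "weathersit": 0.643,
    "temp": 0.0005, "atemp": 0.593, "hum": 2e-16, "windspeed": 0.550,
}


def stars(p):
    """Map a p-value to the significance code used in the table legend."""
    for cutoff, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p < cutoff:
            return code
    return ""


significant = [name for name, p in P_VALUES.items() if p < 0.05]
print(significant)
print({name: stars(p) for name, p in P_VALUES.items()})
```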

In conclusion, we used a linear regression model to predict bike rental volume. This study is of great importance considering the fast growth of the bike rental business in major cities around the world.
