Section 2. Mathematical methods in economics
Tianyu Yu,
Wilbraham & Monson Academy, MA E-mail: [email protected]
BIKE RENTAL VOLUME PREDICTION VIA LINEAR REGRESSION MODEL
Abstract
Aim: This study aimed to build a predictive model for bike rental volume using linear regression.
Method: The data set is a 2-year usage log of a bike sharing system, Capital Bike Sharing (CBS), in Washington, D.C., USA. Corresponding historical environmental values, such as weather conditions, weekday and holidays, were extracted from external sources. All records were randomly assigned to 2 groups: a training sample (50%) and a testing sample (50%). A linear regression model was built.
Results: For the testing sample, the MSE of the linear regression was 798. In cross-validation, the average MSE of the linear model was 806, which indicated that the model was stable.
Conclusions: In this study, we built a predictive model for bike rental volume using linear regression. This study suggests that it is possible to develop a reproducible and transportable predictive instrument for bike rental volume.
Keywords: predictive model, bike rental, predictive instrument, linear regression.
1. Introduction
Since the introduction of bike share, its popularity and usage have skyrocketed in the past years. Millions of trips are taken by millions of people every year. Bike stations are at every street corner, and the tremendous number of bikes out there raises the question: does the distribution of bikes really meet people's needs? Or could we develop a model that predicts the bike rental volume, thereby bringing every bike to its maximum productivity?
The need for bike share is influenced by many factors, which is the main reason why at one station people cannot return bikes while at another station people cannot find a bike to rent. For example, in the morning more bikes are needed at stations near residential areas, and during weekends the need for bike share in big cities increases. We cannot just produce millions of bikes and distribute them everywhere in the city. Besides the cost of massive production, this also causes chaos that is hard to manage. In China, there are bikes everywhere along the streets, and it is convenient to rent a bike, but because of the massive number of bikes, most of them are not in good condition, which defeats the whole point of bike sharing. Instead, they become trouble that the companies have to get rid of. If a bike share company can predict the rough number of bikes needed at each station by modeling influencing factors such as wind, temperature, time and humidity, it can transport the bikes ahead of time and satisfy people's needs with a limited number of bikes.
2. Data and Methods
Data
The data set under study is the usage log of a bike sharing system, Capital Bike Sharing (CBS), in Washington, D.C., USA.
In the CBS system, when a rental occurs, the operation software collects basic data about the trip such as duration, start date, end date, start station, end station, bike number and member type. The historical data set of such trip transactions is available online. To avoid trend issues, we selected only the data for the year 2011. There exist several weather data sources; however, most of them provide only forecasting data and do not contain historical weather reports. Another group of forecasting sources contains historical weather reports only for a specific number of recent days (e.g. 14 days). Another group also contains historical weather reports, but only on a daily scale. From the source used, attributes such as weather temperature, apparent temperature, wind speed, wind gust, humidity, pressure, dew point and visibility were obtained for each hour of the period from 1 January 2011 to 31 December 2011 for Washington, D.C., USA.

Table 1.- Variables available in this data

Variable     Description
season       Season (1: spring, 2: summer, 3: fall, 4: winter)
mnth         Month (1 to 12)
hr           Hour (0 to 23)
holiday      Whether the day is a holiday or not
weekday      Day of the week
workingday   1 if the day is neither a weekend nor a holiday, otherwise 0
weathersit   1: Clear, Few clouds, Partly cloudy
             2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
             3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
             4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp         Normalized temperature in Celsius. The values are derived via (t - t_min)/(t_max - t_min), t_min = -8, t_max = +39 (only in hourly scale)
atemp        Normalized feeling temperature in Celsius. The values are derived via (t - t_min)/(t_max - t_min), t_min = -16, t_max = +50 (only in hourly scale)
hum          Normalized humidity. The values are divided by 100 (max)
windspeed    Normalized wind speed. The values are divided by 67 (max)
casual       Count of casual users
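
The normalizations in Table 1 can be inverted to recover values on the original scale. The following is a minimal sketch under the constants stated in the table; the function names are hypothetical, and the input values are the training-sample means reported in Table 2, used here only as example inputs.

```python
def denormalize_minmax(value, t_min, t_max):
    """Invert the min-max scaling (t - t_min) / (t_max - t_min)."""
    return value * (t_max - t_min) + t_min


def denormalize_max(value, max_value):
    """Invert a scaling in which the raw value was divided by its maximum."""
    return value * max_value


# Example inputs: training-sample means from Table 2 (illustration only).
temp_c = denormalize_minmax(0.49, t_min=-8, t_max=39)     # temperature, Celsius
atemp_c = denormalize_minmax(0.47, t_min=-16, t_max=50)   # feeling temperature, Celsius
hum_pct = denormalize_max(0.65, 100)                      # humidity, percent
wind_raw = denormalize_max(0.19, 67)                      # wind speed, original scale

print(round(temp_c, 2), round(atemp_c, 2), round(hum_pct, 2), round(wind_raw, 2))
```

Under these constants, a normalized temperature near 0.49 corresponds to roughly 15 degrees Celsius.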
3. Results
The sample size is 4322 in the test sample and 4323 in the training sample, a total of 8645 records from the year 2011.
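
A random 50/50 assignment of the 8645 records can be sketched as follows. This is an illustration only: the text does not describe the randomization procedure, so the shuffling approach, the seed, and the function name `split_half` are assumptions.

```python
import random


def split_half(n_records, seed=0):
    """Shuffle record indices and cut them into two near-equal halves.

    The seed is an arbitrary choice made for reproducibility of this sketch.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    half = n_records // 2
    return indices[:half], indices[half:]


first_half, second_half = split_half(8645)
print(len(first_half), len(second_half))
```

With an odd number of records, one half necessarily receives one record more than the other, which is consistent with the sample sizes of 4322 and 4323.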
Table 2.- Descriptive information of test sample and training sample

                   Training sample      Test sample
Categorical        n        %           n        %
Season
  1                1000     48.36       1068     51.64
  2                1129     51.25       1074     48.75
  3                1128     50.36       1112     49.64
  4                1065     49.91       1069     50.09
Holiday
  No               4210     50.08       4196     49.92
  Yes              112      46.86       127      53.14
Weekday
  0                609      49.47       622      50.53
  1                596      48.26       639      51.74
  2                597      48.85       625      51.15
  3                642      52.24       587      47.76
  4                628      51.27       597      48.73
  5                624      50.36       615      49.64
  6                626      49.53       638      50.47
Working day
  No               1347     49.27       1387     50.73
  Yes              2975     50.33       2936     49.67

Continuous         Mean     SD          Mean     SD
weathersit         1.44     0.66        1.44     0.65
temp               0.49     0.20        0.48     0.20
atemp              0.47     0.17        0.46     0.18
hum                0.65     0.20        0.64     0.20
windspeed          0.19     0.12        0.19     0.12
Casual rentals     28.67    39.19       28.53    38.49
Figure 1. Distribution of bike rentals in training sample (x-axis: casual)
Figure 2. Distribution of bike rentals in test sample (x-axis: casual)
According to the linear regression, season, hour, holiday or not, working day or not, temperature and humidity were significant predictors for bike rental volume.
Table 3.- Linear regression to predict the volume of bike rental

              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   21.0       2.7          7.7       0.000     ***
season        1.9        0.7          2.5       0.011     *
mnth          -0.3       0.2          -1.4      0.150
hr            1.1        0.1          17.0      < 2e-16   ***
holiday       -8.6       2.7          -3.1      0.002     **
weekday       0.1        0.2          0.5       0.645
workingday    -29.6      1.0          -30.7     < 2e-16   ***
weathersit    0.3        0.8          0.5       0.643
temp          78.8       18.3         4.3       0.000     ***
atemp         11.0       20.5         0.5       0.593
hum           -49.2      2.7          -18.2     < 2e-16   ***
windspeed     -2.3       3.9          -0.6      0.550
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
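
To show how the estimates in Table 3 turn into a prediction, the sketch below evaluates the fitted linear equation for one input record. The record values are hypothetical and chosen only for illustration, the function name `predict_rentals` is not from the study, and the response is assumed to be the casual rental count described in Table 1.

```python
# Coefficient estimates copied from Table 3.
COEFFICIENTS = {
    "season": 1.9, "mnth": -0.3, "hr": 1.1, "holiday": -8.6,
    "weekday": 0.1, "workingday": -29.6, "weathersit": 0.3,
    "temp": 78.8, "atemp": 11.0, "hum": -49.2, "windspeed": -2.3,
}
INTERCEPT = 21.0


def predict_rentals(record):
    """Linear prediction: intercept plus the sum of coefficient * predictor value."""
    return INTERCEPT + sum(COEFFICIENTS[name] * record[name] for name in COEFFICIENTS)


# Hypothetical record: a clear, warm July Saturday at 15:00 (not from the data).
saturday = {
    "season": 3, "mnth": 7, "hr": 15, "holiday": 0, "weekday": 6,
    "workingday": 0, "weathersit": 1, "temp": 0.70, "atemp": 0.65,
    "hum": 0.50, "windspeed": 0.20,
}
# Same conditions on a working Wednesday.
wednesday = dict(saturday, weekday=3, workingday=1)

print(round(predict_rentals(saturday), 2), round(predict_rentals(wednesday), 2))
```

The only differences between the two records are the weekday and working-day values, so the gap between the two predictions comes almost entirely from the workingday coefficient of -29.6.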
4. Discussion
The number of major cities that are becoming bike-friendly has been growing in recent years. It is expected that in the near future most major cities will provide this service alongside their other public transport services. How to better predict the rental volume is a key challenge for the business.
In this study, we built a predictive model for bike rental volume using linear regression. This study suggests that it is possible to develop a reproducible and transportable predictive instrument for bike rental volume.
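
The evaluation idea behind this claim, comparing a held-out MSE with a cross-validated average MSE, can be sketched as follows. This is a simplified illustration and not the study's code: it uses a single predictor, synthetic data rather than the CBS data, and an assumed 5-fold scheme, since the text does not state the number of folds.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(xs, ys, intercept, slope):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def kfold_mse(xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds of shuffled indices."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    scores = []
    for i in range(k):
        held_out = idx[i::k]
        held = set(held_out)
        train = [j for j in idx if j not in held]
        b0, b1 = fit_line([xs[j] for j in train], [ys[j] for j in train])
        scores.append(mse([xs[j] for j in held_out], [ys[j] for j in held_out], b0, b1))
    return sum(scores) / k


# Synthetic data (assumption, for illustration): y = 20 + 80 * x + Gaussian noise.
rng = random.Random(42)
x = [rng.random() for _ in range(1000)]
y = [20 + 80 * xi + rng.gauss(0, 5) for xi in x]

half = len(x) // 2
intercept, slope = fit_line(x[:half], y[:half])    # fit on the first half
test_mse = mse(x[half:], y[half:], intercept, slope)  # evaluate on the second half
cv_mse = kfold_mse(x, y, k=5)
print(round(test_mse, 1), round(cv_mse, 1))
```

When the held-out MSE and the cross-validated average are close, as with the values of 798 and 806 reported in the abstract, the fit does not depend strongly on which records were used for training.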
According to the linear regression, season, hour, holiday or not, working day or not, temperature and humidity were significant predictors for rental volume.
In conclusion, we used a linear regression model to predict bike rental volume. This study is of great importance considering the fast growth of the bike rental business in major cities around the world.