№4(8) 2007
F. Carlevaro, C. Schlesser, M.-E. Binet, S. Durand, M. Paul
Econometric Modeling and Analysis of Residential Water Demand Based on Unbalanced Panel Data
This paper develops an econometric methodology for analyzing a sample of time-unbalanced panel data on residential water consumption in the French island of La Reunion, with the purpose of bringing out the main determinants of household water consumption and estimating the importance of water consumption by use. For this purpose, we specify a daily panel econometric model and derive, by time aggregation, a general linear regression model accounting for water consumption data recorded over periods of any calendar date and length. To estimate the parameters of this model efficiently, we develop a feasible two-step generalized least squares method. Using the principle of best linear unbiased prediction, we finally develop an approach that consistently breaks down the volume of water consumption recorded on household water bills by use, namely by forcing the estimated decomposition to add up to the observed total. The application of this methodology to a sample of 437 unbalanced panel observations shows the scope of this approach for the empirical analysis of actual data.
1. Introduction
A study carried out in 1997 in the French island of La Reunion brought out an over-consumption of water, namely a consumption of 246 liters/day/inhabitant against 145 liters/day/inhabitant in continental France.
In order to explain this global figure, the Regional Directorate for the Environment (DIREN) commissioned the «Centre de recherches economiques et sociales de l'Universite de La Reunion» (CERESUR) to carry out a study aiming at:
• measuring domestic water consumption, stricto sensu, in order to assess the quantitative importance of the over-consumption of water;
• explaining the determinants of water consumption for the residential sector, in particular the role played by uses, equipment, and the behaviour of individuals in the over-consumption of water, in order to develop policy measures for a rational use of water resources that reduce waste.
To achieve these goals, a stratified random survey was designed and conducted in 2004 on a sample of 2000 households, in order to collect the information needed for these analyses. Carried out by telephone, this first survey was completed with a mailing to 1000 volunteer households, intended to collect the volume of water consumption displayed on their last three bills. Unfortunately, this mail survey provided only 173 reliable responses.
Our study follows the first analysis of the data of this survey carried out by CERESUR's team in charge of the DIREN study [Binet et al. (2005)]. It intends to go deeper into the analysis of this information by developing a microeconometric model that brings out the main climatic, demographic, economic, and technological determinants of household water consumption and estimates the share of each type of use in the total volume of water consumed by households.
Our presentation is organized as follows. In Section 2, we specify a causal econometric model describing the daily water consumption of a panel of households according to the distinct uses of water they make at home. The impact of unobserved factors of heterogeneity in household behaviour is taken into account by means of an error component structure of the model disturbances. In Section 3 we derive, from this model, a conventional general linear regression model explaining the available time-unbalanced panel data for empirical analysis. Based on this conventional regression model, we develop, in Section 4, an efficient two-step feasible generalized least squares estimator for the model parameters. Using this model, we analyse, in Section 5, the DIREN survey data in order to bring out the determinants of two broad categories of home water uses, namely «essential water uses» and «leisure water uses», respectively¹. Finally, we develop, in Section 6, an efficient method to decompose the observed volumes of water consumption according to these two categories of uses and we analyse the results of its empirical application to our sample of observations. To conclude, we comment on the scope of our methodology for the empirical analysis of actual data and outline the most promising developments we plan to carry out.
¹ Empirical results are drawn from the Master's thesis of [Schlesser (2006)].

2. Modeling Daily Water Consumption
Our model decomposes the total daily water consumption in a given housing, $Y_i(t)$, where $i$ is the index of a housing and $t$ that of the day, according to $h = 1,\dots,H$ distinct uses of water in a home. This decomposition is expressed by the following identity

$$Y_i(t) = \sum_{h=1}^{H} Y_{ih}(t), \tag{1}$$

where $Y_{ih}(t)$ stands for the quantity of water consumption in housing $i$, during day $t$, for the use of type $h$.
In turn, we decompose water consumption per use according to the following identity

$$Y_{ih}(t) = Q_{ih}(t)\,N_{ih}, \tag{2}$$

where $Q_{ih}(t)$ expresses a daily water consumption per user and $N_{ih}$ the size of the population of water users (individuals or equipment) for use $h$ in housing $i$.
From identities (1) and (2) we infer a causal econometric model explaining variable $Y_i(t)$, by modeling the unobserved variable $Q_{ih}(t)$ according to the linear error components model, namely

$$Q_{ih}(t) = x_{ih}(t)'\beta_h + e_{ih} + e_{ih}(t), \tag{3}$$

where
$x_{ih}(t)$ = vector of $K_h$ exogenous explanatory factors,
$\beta_h$ = vector of $K_h$ impact parameters on $Q_{ih}(t)$, for explanatory factors $x_{ih}(t)$,
$e_{ih}$ = random disturbance, expressing the households' heterogeneity with respect to water consumption per user for use of type $h$,
$e_{ih}(t)$ = random disturbance, expressing the impact of unspecified factors influencing water consumption per user for use of type $h$ during day $t$.
Inserting (3) into (2), and the result into (1), we obtain a linear regression model

$$Y_i(t) = x_i(t)'\beta + \varepsilon_i(t), \tag{4}$$

where $x_i(t) = [x_{ih}(t)N_{ih}]_{h=1,\dots,H}$, $\beta = [\beta_h]_{h=1,\dots,H}$ and

$$\varepsilon_i(t) = \sum_{h=1}^{H}\bigl(e_{ih} + e_{ih}(t)\bigr)N_{ih}. \tag{5}$$

The moments of the random disturbances $\varepsilon_i(t)$ are derived from the moments of the joint distribution of the random vectors $e_i = [e_{ih}]_{h=1,\dots,H}$ and $e_i(t) = [e_{ih}(t)]_{h=1,\dots,H}$, which are assumed to be
• centered,
$$E(e_i) = E(e_i(t)) = 0; \tag{6}$$
• homoscedastic,
$$V(e_i) = S \quad\text{and}\quad V(e_i(t)) = \Sigma; \tag{7}$$
• non-autocorrelated,
$$E(e_i(t)\,e_j(\tau)') = 0,\quad t \ne \tau; \tag{8}$$
• non-intercorrelated,
$$E(e_i e_j') = E(e_i(t)\,e_j(t)') = 0,\ i \ne j \quad\text{and}\quad E(e_i\,e_j(t)') = 0. \tag{9}$$
Expressing the disturbances $\varepsilon_i(t)$ as a linear combination of the random vectors $e_i$ and $e_i(t)$, namely as
$$\varepsilon_i(t) = n_i'\bigl(e_i + e_i(t)\bigr), \tag{10}$$
where $n_i = [N_{ih}]_{h=1,\dots,H}$,
we conclude that these disturbances are
• centered,
$$E(\varepsilon_i(t)) = 0; \tag{11}$$
• heteroscedastic,
$$V(\varepsilon_i(t)) = n_i'(S + \Sigma)n_i = s_i^2 + \sigma_i^2,\quad\text{where } s_i^2 = n_i'Sn_i \text{ and } \sigma_i^2 = n_i'\Sigma n_i; \tag{12}$$
• equicorrelated (for a given housing),
$$\mathrm{Cov}(\varepsilon_i(t);\ \varepsilon_i(\tau)) = n_i'Sn_i,\quad t \ne \tau; \tag{13}$$
• non-intercorrelated,
$$E(\varepsilon_i(t)\,\varepsilon_j(\tau)) = 0,\quad i \ne j. \tag{14}$$
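As a small numerical illustration of moments (12) and (13), the sketch below computes the variance, the within-housing covariance and the implied correlation coefficient of the daily disturbances for one housing. The function name and the numerical values of the user-population vector and of the two covariance matrices are hypothetical, chosen only to make the arithmetic visible; they are not estimates from the paper.

```python
import numpy as np

def daily_disturbance_moments(n, S, Sigma):
    """Return (variance, covariance, correlation) of the daily disturbances
    of one housing, following (12) and (13)."""
    s2 = float(n @ S @ n)          # household-effect part: n' S n
    sigma2 = float(n @ Sigma @ n)  # day-specific part: n' Sigma n
    variance = s2 + sigma2         # equation (12)
    covariance = s2                # equation (13), for two distinct days
    return variance, covariance, covariance / variance

# Hypothetical values for H = 2 uses (not taken from the paper)
n_i = np.array([2.0, 1.0])
S = np.array([[4.0, 1.0], [1.0, 9.0]])
Sigma = np.array([[1.0, 0.0], [0.0, 2.0]])
var_i, cov_i, rho_i = daily_disturbance_moments(n_i, S, Sigma)
```

With these made-up inputs the household-effect part is 29 and the day-specific part is 6, so the correlation between two days of the same housing is 29/35; a housing with a different user-population vector would get a different variance and correlation, which is the heteroscedasticity discussed next.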
Heteroscedasticity (12) follows from the variability of the size $n_i$ across the population of water users but does not change in time. Similarly, the correlation between the terms of the time series of disturbances $\varepsilon_i(t)$, which is constant (equicorrelation) for a given household, varies across households as a function of the size $n_i$ of the population of users. As for the absence of correlation between the terms of two distinct disturbance time series, $\varepsilon_i(t)$ and $\varepsilon_j(t)$, $i \ne j$, it is the consequence of assumptions (8) and (9).
Russian-Swiss Seminar on Econometrics and Statistics
3. Modeling Observed Water Consumption
Residential water consumption from DIREN's survey is not observed on a daily basis, but for durations and periods of time that change across the surveyed households.
Let $A = \{t_0, t_0 + 1, \dots, t_1\}$ be a period of observation, where $t_0$ and $t_1$ are the first and last day of the period, respectively. The water consumption of housing $i$ observed during such a period is, by definition,

$$Y_i(A) = \sum_{t \in A} Y_i(t). \tag{15}$$

The econometric model explaining such an observation can be derived from the daily econometric model (4)-(5) by performing a time aggregation of this model over the observation period $A$, which leads to

$$Y_i(A) = x_i(A)'\beta + \varepsilon_i(A), \tag{16}$$

where

$$x_i(A) = \sum_{t \in A} x_i(t) = [x_{ih}(A)N_{ih}]_{h=1,\dots,H},\qquad x_{ih}(A) = \sum_{t \in A} x_{ih}(t) \tag{17}$$

and

$$\varepsilon_i(A) = \sum_{t \in A} \varepsilon_i(t) = n_i'\bigl(d(A)e_i + e_i(A)\bigr),\qquad d(A) = t_1 - t_0 + 1,\qquad e_i(A) = \sum_{t \in A} e_i(t). \tag{18}$$

Note that if an exogenous factor $x_{ihk}(t)$ does not vary during the observation period $A$, i.e. if
$$x_{ihk}(t) = x_{ihk},$$
we obtain
$$x_{ihk}(A) = d(A)\,x_{ihk}.$$
To compute the moments of the random disturbances $\varepsilon_i(A)$, it is useful to first compute those of the random vectors $e_i(A)$. Using moments (6) to (9), we derive

$$E(e_i(A)) = \sum_{t \in A} E(e_i(t)) = 0, \tag{19}$$

$$V(e_i(A)) = \sum_{t \in A}\sum_{\tau \in A} E(e_i(t)\,e_i(\tau)') = \sum_{t \in A} V(e_i(t)) = d(A)\,\Sigma, \tag{20}$$

$$\mathrm{Cov}(e_i(A);\ e_i(A^*)) = \sum_{t \in A}\sum_{\tau \in A^*} E(e_i(t)\,e_i(\tau)') = \sum_{t \in A \cap A^*} V(e_i(t)) = d(A \cap A^*)\,\Sigma; \tag{21}$$

$$\mathrm{Cov}(e_i(A);\ e_j(A^*)) = \sum_{t \in A}\sum_{\tau \in A^*} E(e_i(t)\,e_j(\tau)') = 0 \quad\text{if } i \ne j; \tag{22}$$

$$\mathrm{Cov}(e_i;\ e_j(A)) = \sum_{t \in A} E(e_i\,e_j(t)') = 0 \quad \forall\, i, j. \tag{23}$$

From these moments, we easily derive those of the random disturbances $\varepsilon_i(A)$ of the model (16), namely
$$E(\varepsilon_i(A)) = n_i'\bigl(d(A)E(e_i) + E(e_i(A))\bigr) = 0; \tag{24}$$

$$V(\varepsilon_i(A)) = n_i'\,V\bigl(d(A)e_i + e_i(A)\bigr)\,n_i = n_i'\bigl[d(A)^2 S + d(A)\Sigma\bigr]n_i; \tag{25}$$

$$\mathrm{Cov}(\varepsilon_i(A);\ \varepsilon_i(A^*)) = n_i'\,\mathrm{Cov}\bigl(d(A)e_i + e_i(A);\ d(A^*)e_i + e_i(A^*)\bigr)\,n_i = n_i'\bigl[d(A)d(A^*)S + d(A \cap A^*)\Sigma\bigr]n_i; \tag{26}$$

$$\mathrm{Cov}(\varepsilon_i(A);\ \varepsilon_j(A^*)) = n_i'\,\mathrm{Cov}\bigl(d(A)e_i + e_i(A);\ d(A^*)e_j + e_j(A^*)\bigr)\,n_j = 0,\quad i \ne j. \tag{27}$$

Compared to the random disturbances of the daily water consumption model (4), those of the observed water consumption model (16) are subject to a new source of heteroscedasticity and autocorrelation, following from the possible variability of the duration of the observation periods, $d(A)$ and $d(A^*)$.
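The covariance formulas (25) and (26) can be sketched as a short function. This is only an illustration under stated assumptions: periods are represented as inclusive (first day, last day) integer pairs, the function name is hypothetical, and the numerical values are made up rather than taken from the paper.

```python
import numpy as np

def aggregated_covariance(n, S, Sigma, period_a, period_b):
    """Covariance between the aggregated disturbances of one housing over two
    periods, following (25)-(26). Periods are inclusive (first_day, last_day)."""
    d_a = period_a[1] - period_a[0] + 1
    d_b = period_b[1] - period_b[0] + 1
    # Number of days shared by the two periods, d(A ∩ A*)
    overlap = max(0, min(period_a[1], period_b[1]) - max(period_a[0], period_b[0]) + 1)
    s2 = float(n @ S @ n)
    sigma2 = float(n @ Sigma @ n)
    return d_a * d_b * s2 + overlap * sigma2

# Hypothetical values for H = 2 uses (not taken from the paper)
n_i = np.array([2.0, 1.0])
S = np.array([[4.0, 1.0], [1.0, 9.0]])
Sigma = np.array([[1.0, 0.0], [0.0, 2.0]])

var_30_days = aggregated_covariance(n_i, S, Sigma, (1, 30), (1, 30))    # case (25)
cov_disjoint = aggregated_covariance(n_i, S, Sigma, (1, 30), (31, 90))  # case (26), no overlap
```

Passing the same period twice gives the variance (25); for disjoint bills the day-specific term drops out and only the household effect, scaled by both durations, remains.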
4. Generalized Least Squares Estimation
Relation (16) leads to a generalized linear regression model that allows the parameters $\beta$ to be estimated from DIREN's sample of observations, denoted by

$$Y_{it} = Y_i(A_{it}),\quad x_{it} = x_i(A_{it}),\quad t = 1,\dots,T_i,\quad i = 1,\dots,m, \tag{28}$$

where $A_{it}$, $t = 1,\dots,T_i$, refer to the available disjoint periods of observation ($A_{it} \cap A_{i\tau} = \emptyset$, $t \ne \tau$) for housing $i$.
The observations of this unbalanced panel are explained by the generalized linear regression model

$$Y_{it} = x_{it}'\beta + \varepsilon_{it},\quad t = 1,\dots,T_i,\quad i = 1,\dots,m, \tag{29}$$

where the random disturbances $\varepsilon_{it} = \varepsilon_i(A_{it})$ are
• centered,
$$E(\varepsilon_{it}) = 0, \tag{30}$$
• heteroscedastic,
$$E(\varepsilon_{it}^2) = \sigma_i^2 d_{it} + s_i^2 d_{it}^2,\quad\text{where } d_{it} = d(A_{it}), \tag{31}$$
• serially correlated (for a given housing),
$$E(\varepsilon_{it}\varepsilon_{i\tau}) = s_i^2 d_{it} d_{i\tau},\quad t \ne \tau, \tag{32}$$
• uncorrelated (across housings),
$$E(\varepsilon_{it}\varepsilon_{j\tau}) = 0,\quad i \ne j. \tag{33}$$
To write this model in the classical matrix form, we first specify the models explaining the available individual housing time series of observations. These models are written as

$$y_i = X_i\beta + \varepsilon_i,\quad i = 1,\dots,m, \tag{34}$$

where
$$y_i = [Y_{it}]_{t=1,\dots,T_i},\quad X_i = [x_{it}']_{t=1,\dots,T_i},\quad \varepsilon_i = [\varepsilon_{it}]_{t=1,\dots,T_i}.$$
The moments of the random disturbances $\varepsilon_i$ may be written in matrix form:

$$E(\varepsilon_i) = 0,\quad E(\varepsilon_i\varepsilon_i') = \sigma_i^2 D_i + s_i^2 d_i d_i' = \Omega_i,\quad E(\varepsilon_i\varepsilon_j') = 0,\ i \ne j, \tag{35}$$

where $d_i = [d_{it}]_{t=1,\dots,T_i}$ and $D_i = \mathrm{diag}(d_i)$.
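Finally, we derive a compact matrix form of a generalized linear model for the whole panel of observations by stacking the $m$ housing time series models

$$y = X\beta + \varepsilon, \tag{36}$$

where $y = [y_i]_{i=1,\dots,m}$, $X = [X_i]_{i=1,\dots,m}$, $\varepsilon = [\varepsilon_i]_{i=1,\dots,m}$, $E(\varepsilon) = 0$ and

$$E(\varepsilon\varepsilon') = \mathrm{diag}(\Omega_1,\dots,\Omega_m) = \Omega. \tag{37}$$

The BLUE estimator for $\beta$ of this model is the GLS estimator

$$\hat\beta_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y = \Bigl(\sum_{i=1}^{m} X_i'\Omega_i^{-1}X_i\Bigr)^{-1}\sum_{i=1}^{m} X_i'\Omega_i^{-1}y_i. \tag{38}$$

It may be computed by using an analytic expression of $\Omega_i^{-1}$ obtained through the application of the inversion formula:

$$(M + \mu\nu')^{-1} = M^{-1} - \frac{M^{-1}\mu\,\nu'M^{-1}}{1 + \mu'M^{-1}\nu},$$

where $M = \sigma_i^2 D_i$, $\mu = s_i^2 d_i$, $\nu = d_i$. This formula may be written as

$$\Omega_i^{-1} = \frac{1}{\sigma_i^2}\Bigl[D_i^{-1} - \frac{s_i^2}{\sigma_i^2 + s_i^2\delta_i}\,\iota_i\iota_i'\Bigr], \tag{39}$$

where $\iota_i$ indicates the unit vector with $T_i$ components and $\delta_i = \iota_i'd_i = \sum_{t=1}^{T_i} d_{it}$ denotes the total duration of observation (in days) of water consumption in housing $i$.
By transforming the matrices and the vectors of observations as follows:

$$X_i^{*} = D_i^{-1/2}X_i = \Bigl[\frac{x_{it}'}{\sqrt{d_{it}}}\Bigr]_{t=1,\dots,T_i},\qquad \bar x_i = X_i'\iota_i = \sum_{t=1}^{T_i} x_{it}; \tag{40}$$

$$y_i^{*} = D_i^{-1/2}y_i = \Bigl[\frac{Y_{it}}{\sqrt{d_{it}}}\Bigr]_{t=1,\dots,T_i},\qquad \bar y_i = \iota_i'y_i = \sum_{t=1}^{T_i} Y_{it}; \tag{41}$$

we can compute the matrices and vectors in formula (38) directly from these transformed data:

$$X_i'\Omega_i^{-1}X_i = \frac{1}{\sigma_i^2}\Bigl[X_i^{*\prime}X_i^{*} - \frac{s_i^2}{\sigma_i^2 + s_i^2\delta_i}\,\bar x_i\bar x_i'\Bigr] = M_i; \tag{42}$$

$$X_i'\Omega_i^{-1}y_i = \frac{1}{\sigma_i^2}\Bigl[X_i^{*\prime}y_i^{*} - \frac{s_i^2}{\sigma_i^2 + s_i^2\delta_i}\,\bar x_i\bar y_i\Bigr] = m_i. \tag{43}$$

Finally, we derive the simplified GLS estimation formula:

$$\hat\beta_{GLS} = M^{-1}m, \tag{44}$$

where $M = \sum_{i=1}^{m} M_i$ and $m = \sum_{i=1}^{m} m_i$.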
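The closed-form inverse (39) and the accumulation of the housing-level blocks in (42) to (44) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the variance components are treated as known, and the data are synthetic numbers generated only to compare the block formula with a direct matrix inversion.

```python
import numpy as np

def omega_inverse(d, sigma2, s2):
    """Closed-form inverse (39) of Omega_i = sigma2 * diag(d) + s2 * d d'."""
    T = d.size
    c = s2 / (sigma2 + s2 * d.sum())            # d.sum() is the total duration delta_i
    return (np.diag(1.0 / d) - c * np.ones((T, T))) / sigma2

def gls_from_blocks(X_list, y_list, d_list, sigma2_list, s2_list):
    """GLS estimate (44), accumulating M_i (42) and m_i (43) housing by housing."""
    K = X_list[0].shape[1]
    M, m = np.zeros((K, K)), np.zeros(K)
    for X, y, d, sigma2, s2 in zip(X_list, y_list, d_list, sigma2_list, s2_list):
        X_star = X / np.sqrt(d)[:, None]        # D_i^{-1/2} X_i, as in (40)
        y_star = y / np.sqrt(d)                 # D_i^{-1/2} y_i, as in (41)
        x_bar, y_bar = X.sum(axis=0), y.sum()   # column sums X_i' iota_i and iota_i' y_i
        c = s2 / (sigma2 + s2 * d.sum())
        M += (X_star.T @ X_star - c * np.outer(x_bar, x_bar)) / sigma2
        m += (X_star.T @ y_star - c * x_bar * y_bar) / sigma2
    return np.linalg.solve(M, m)

# Synthetic, made-up data: 4 housings with 1 to 3 bills each, K = 2 regressors
rng = np.random.default_rng(0)
d_list = [np.array([30.0, 61.0, 92.0]), np.array([45.0, 60.0]),
          np.array([90.0]), np.array([31.0, 28.0, 31.0])]
X_list = [rng.normal(size=(d.size, 2)) * d[:, None] for d in d_list]
y_list = [X @ np.array([80.0, 190.0]) + rng.normal(size=d.size) * d
          for X, d in zip(X_list, d_list)]
sigma2_list = [6.0, 4.0, 9.0, 5.0]
s2_list = [29.0, 10.0, 15.0, 20.0]
beta_hat = gls_from_blocks(X_list, y_list, d_list, sigma2_list, s2_list)
```

The point of the block form is that no matrix larger than K by K is ever inverted, whatever the number of bills per housing.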
In practice the variance-covariance matrices $S$ and $\Sigma$ are not known and, therefore, our GLS estimator is not feasible. To develop a feasible GLS estimator we rely on a two-step method, starting with a consistent estimation of these matrices, $S$ and $\Sigma$, using the ordinary least squares residuals of the model (36), followed by a GLS estimation of $\beta$ based on the previous estimation of $S$ and $\Sigma$.
According to a suggestion of [Vonesh and Chinchilli (1997)], consistent estimators of $S$ and $\Sigma$ can be obtained by using the following two-step procedure.
• In the first step, we look for consistent estimators of $S$ and $\Sigma$, assuming $\beta$ to be known. Such estimators, denoted by $\hat S(\beta)$ and $\hat\Sigma(\beta)$, may be derived by applying OLS to the following linear regression model with respect to the vector $\theta$ of the independent components of $S$ and $\Sigma$

$$\varepsilon_{it}(\beta)\,\varepsilon_{i\tau}(\beta) = d_{it}d_{i\tau}\,n_i'Sn_i + \delta_{t\tau}d_{it}\,n_i'\Sigma n_i + \eta_{it\tau} = z_{it\tau}'\theta + \eta_{it\tau},\quad t \le \tau = 1,\dots,T_i,\quad i = 1,\dots,m, \tag{45}$$

where $\varepsilon_{it}(\beta) = Y_{it} - x_{it}'\beta$, $\delta_{t\tau}$ is the «Kronecker delta» (1 if $t = \tau$, 0 otherwise) and $\eta_{it\tau}$ a centered random disturbance. For the true value of $\beta$, the OLS estimator $\hat\theta(\beta)$ is consistent for $\theta$.
• In the second step, we estimate the dependent variables of the model (45) using the residuals of an OLS estimate of the model (29), namely

$$\hat\varepsilon_{it} = Y_{it} - x_{it}'\hat\beta_{OLS}, \tag{46}$$

where $\hat\beta_{OLS} = (X'X)^{-1}X'y$.
Then we estimate the parameters $\theta$ of this model by OLS, which provides consistent estimates, $\hat\theta = \hat\theta(\hat\beta_{OLS})$, for the elements of the matrices $S$ and $\Sigma$.
This FGLS estimator for $\beta$ is consistent and asymptotically distributed according to the normal distribution

$$N\Bigl(\beta;\ \frac{1}{T}\Bigl(\frac{1}{T}X'\Omega^{-1}X\Bigr)^{-1}\Bigr), \tag{47}$$

where $T = \sum_{i=1}^{m} T_i$.
Therefore, statistical inference about the parameters $\beta$ can be based on this asymptotic distribution, using the following consistent estimator for the asymptotic variance-covariance matrix of $\hat\beta_{FGLS}$:

$$\hat V_a(\hat\beta_{FGLS}) = (X'\hat\Omega^{-1}X)^{-1}, \tag{48}$$

where $\hat\Omega = \Omega(\hat\theta)$.
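The auxiliary regression (45) used to estimate the variance components can be sketched as below. This is an illustration under assumptions that are not in the paper: the vector of independent components is taken to be the upper triangle of each of the two symmetric matrices, stacked one after the other, and all function names are hypothetical. Residuals would come from the OLS fit (46); the block only shows how the regressors of (45) are built and how the least squares fit is run.

```python
import numpy as np

def quad_weights(n):
    """Weights w such that n' S n = w @ S[upper triangle] for a symmetric S."""
    iu = np.triu_indices(n.size)
    w = np.outer(n, n)[iu]
    return np.where(iu[0] == iu[1], w, 2.0 * w)   # off-diagonal terms count twice

def design(d, n):
    """Regressor rows of (45) for one housing, for all pairs t <= tau."""
    w = quad_weights(n)
    rows, pairs = [], []
    for t in range(d.size):
        for tau in range(t, d.size):
            z_household = d[t] * d[tau] * w                 # multiplies the household-effect matrix
            z_daily = (d[t] if t == tau else 0.0) * w       # Kronecker delta term
            rows.append(np.concatenate([z_household, z_daily]))
            pairs.append((t, tau))
    return np.array(rows), pairs

def fit_theta(resid_list, d_list, n_list):
    """OLS fit of (45) on products of residuals, pooled over housings."""
    Z_all, q_all = [], []
    for e, d, n in zip(resid_list, d_list, n_list):
        Z, pairs = design(d, n)
        Z_all.append(Z)
        q_all.append(np.array([e[t] * e[tau] for t, tau in pairs]))
    theta, *_ = np.linalg.lstsq(np.vstack(Z_all), np.concatenate(q_all), rcond=None)
    return theta
```

The estimated components would then be plugged into the housing-level covariance matrices before running the GLS step. Note that nothing in this sketch forces the fitted matrices to be positive semi-definite, which would need to be checked in an actual application.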
5. Empirical Application
Due to data limitations, our present application of the model considers only two broad categories of water uses, namely «essential uses of water» and «leisure uses of water».
We measure the size of the population of water users for the first category of uses by means of an equivalence scale, where the adult is chosen as unit of measurement and any child is weighted as a fraction of an adult. Therefore, the size of this population of users is computed as a number of equivalent adults according to the following formula:
$$N_{i1} = A_i + \alpha E_i, \tag{49}$$

where $A_i$ refers to the number of adults of household $i$, $E_i$ to the number of children of the household, and $0 < \alpha \le 1$ to the weighting coefficient for children. Experiments performed by using different values of the equivalence coefficient $\alpha$, ranging between 0.25 and 1, have convinced us that 0.5 is the optimal choice for this coefficient, both from the point of view of model fitting and with respect to the economic relevance of parameter estimates.
When measuring the size of the population of users for the second category of uses, we would have liked to use data on the size of the garden and the volume of the swimming pool, but DIREN's survey only reports whether or not this equipment is present. This led us to assume that the size of this equipment is a function of household income per equivalent adult, as such equipment is present only in well-off households. As income information provided by DIREN's survey is recorded in terms of income intervals, we use an imputation model (presented in the Appendix) to estimate the household's level of income. Therefore, the size of the population of water users for the second category of uses is defined as

$$N_{i2} = \begin{cases} R_i / N_{i1} & \text{for housings with private garden,} \\ 0 & \text{otherwise,} \end{cases} \tag{50}$$

where $R_i$ refers to the imputed income for household $i$ in thousands of euros per month (k€).
The daily water consumption per user of the «essential uses of water» is modeled as a linear function of two explanatory variables, namely:
• the proportion of employed adults, $X_{i1}$, defined as the number of employed adults of household $i$ divided by the household number of equivalent adults. We expect a negative impact of this variable on household water consumption, as employed adults spend more time outside home than the other members of the family;
• the number of rooms per equivalent adult, $X_{i2}$, defined as the number of rooms of housing $i$ divided by the household number of equivalent adults. We expect a positive impact of this variable on household water consumption, as water consumption for keeping up the house increases with the size of the housing.
We experimented with other explanatory variables, such as the use of a dishwasher, the presence of a vegetable garden, the type of housing, etc., but the impact parameters of all these variables turned out to be non-significant or of unsuitable sign.
We modeled the daily water consumption per k€/equivalent adult for leisure needs as a linear function of two explanatory variables:
• the rainfall, $X_{i3}(t)$ (when considering the maintenance of the private garden), defined as the precipitations in centimetres (cm) for the day $t$. We expect a negative impact of this variable on household water consumption, as garden watering decreases when it rains;
• the presence of a private swimming pool, $X_{i4}$, defined as a dichotomous variable

$$X_{i4} = \begin{cases} 1 & \text{if the housing has a private swimming pool,} \\ 0 & \text{otherwise.} \end{cases} \tag{51}$$

Using these assumptions, the specified model of household water consumption is written as

$$Y_i(A) = \beta_1 d(A)N_{i1} + \beta_2 X_{i1}d(A)N_{i1} + \beta_3 X_{i2}d(A)N_{i1} + \beta_4 d(A)N_{i2} + \beta_5 \bar X_{i3}(A)d(A)N_{i2} + \beta_6 X_{i4}d(A)N_{i2} + \varepsilon_i(A). \tag{52}$$

Here $Y_i(A)$ stands for the water consumption of household $i$ in liters (lt) during the period $A$, $d(A)$ represents the duration in days of period $A$ and $\bar X_{i3}(A)$ symbolizes the average daily precipitations in centimetres for the period $A$. This last variable allows expressing the total precipitations $X_{i3}(A)$ during the period $A$ as the product $\bar X_{i3}(A)d(A)$. To make the interpretation of the model estimates easier, we present in Table 1 the economic meaning and the unit of measurement of the parameters of the model.
Table 1
Economic meaning and unit of measurement of model parameters

| Parameter | Economic meaning | Unit of measurement |
|---|---|---|
| $\beta_1$ | Constant daily water consumption per equivalent adult | lt/[day · equivalent adult] |
| $\beta_2$ | Marginal impact of employed adults on daily water consumption | lt/[day · employed adult] |
| $\beta_3$ | Marginal impact of number of rooms on daily water consumption | lt/[day · room] |
| $\beta_4$ | Constant daily water consumption per unit of household income | lt/[day · k€/equivalent adult] |
| $\beta_5$ | Marginal impact of rainfall on daily water consumption per unit of household income | lt/[cm rain · day · k€/equivalent adult] |
| $\beta_6$ | Impact of owning a private swimming pool on daily water consumption per unit of household income | lt/[day · k€/equivalent adult] |
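To make the specification (52) concrete, the sketch below builds the two user populations from (49) and (50) and the six regressors of one water bill. The household is entirely hypothetical (2 adults, 2 children, 1 employed adult, 4 rooms, an imputed income of 3 k€ per month, a garden, no pool, a 60-day bill with 0.5 cm of rain per day on average), and the function names are illustrative. The coefficients are the Income 1 column of Table 2, used here only to show how a fitted value is formed.

```python
import numpy as np

def equivalent_adults(adults, children, alpha=0.5):
    """Equivalence scale (49): each child counts as a fraction alpha of an adult."""
    return adults + alpha * children

def leisure_population(income_keur, n1, has_garden):
    """Proxy (50): income per equivalent adult if there is a private garden, else 0."""
    return income_keur / n1 if has_garden else 0.0

def regressors(days, n1, n2, employed_share, rooms_per_eq_adult, mean_rain_cm, has_pool):
    """The six regressors of model (52) for one bill covering `days` days."""
    return np.array([
        days * n1,
        employed_share * days * n1,
        rooms_per_eq_adult * days * n1,
        days * n2,
        mean_rain_cm * days * n2,
        has_pool * days * n2,
    ])

# FGLS estimates from Table 2, Income 1 column
beta_income1 = np.array([80.5, -78.5, 104.0, 190.0, -94.5, 69.0])

# Hypothetical household and bill (not an observation from the survey)
n1 = equivalent_adults(adults=2, children=2)                 # 3 equivalent adults
n2 = leisure_population(income_keur=3.0, n1=n1, has_garden=True)
x = regressors(days=60, n1=n1, n2=n2, employed_share=1 / n1,
               rooms_per_eq_adult=4 / n1, mean_rain_cm=0.5, has_pool=0)
predicted_liters = float(x @ beta_income1)
```

For this made-up household the fitted consumption over the 60-day bill is about 43,305 liters, that is roughly 722 liters per day for the whole household.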
To estimate this model, we used the data from the DIREN survey complemented by daily meteorological observations recorded by the meteorological stations (about a hundred) set up on La Reunion island by the Meteo France agency. These geographically distributed meteorological observations allowed us to compute the rainfall to be used for each household according to the observations recorded at the closest meteorological station.
The socio-economic characteristics of the household and the characteristics of its housing, used to quantify the non-meteorological explanatory variables of the model, were collected from a random sample of 2000 households interviewed by telephone in 2004. The volumes of water consumption, however, were provided only by some volunteers of this sample, who agreed to fill out a questionnaire concerning the amounts and volumes of their last three water bills. Unfortunately, this approach allowed us to collect only 173 reliable household answers providing information on one to three bills staggered over the years 1998 to 2004, namely 437 observations, of which 340 come from households with a private garden and only 35 (12 households) from households with a private swimming pool as well. Moreover, a comparison of the characteristics of the telephone survey sample and of the sample of volunteers with data provided by the 1999 national census for La Reunion island [Binet et al. (2005)] confirms the representativeness of the first sample with respect to household size and the distribution by district and weather, but shows in the second sample an overrepresentation of households with two persons and of those located in some urban districts. Therefore, our sample may have been generated by some self-selection mechanism that can flaw the optimal sampling properties of our FGLS estimator.
Finally, the household income level used to quantify a proxy for the size of the user population for «leisure uses of water» was not observed directly, but was indirectly estimated with the imputation model described in the Appendix. In order to assess the sensitivity of the estimates of the model (52) with respect to the assumptions underlying the imputation model, we conducted three evaluations of the household income level based on different specifications of the imputation model. Table 2 presents the FGLS estimates of the impact parameters of model (52), using these three imputed values of household income level, namely:
• income 1: household income imputed without income indicators;
• income 2: household income imputed using household standard of living;
• income 3: household income imputed using household head profession.
Table 2
Parameter estimates of model (52)
Parameter estimatesᵃ according to imputed household income

| Parameter | Income 1 | Income 2 | Income 3 |
|---|---|---|---|
| $\beta_1$ | 80.5* (19.6) | 85.8* (19.8) | 87.6* (19.4) |
| $\beta_2$ | -78.5* (23.6) | -72.3* (23.8) | -65.7* (23.3) |
| $\beta_3$ | 104.0* (12.2) | 99.1* (12.5) | 94.0* (12.2) |
| $\beta_4$ | 190* (45.3) | 200* (49.0) | 200* (48.7) |
| $\beta_5$ | -94.5* (2.62) | -76.3* (2.42) | -43.4* (1.99) |
| $\beta_6$ | 69.0 (90.7) | 39.9 (94.8) | 45.0 (98.8) |

ᵃ Figures in brackets are estimated asymptotic standard errors. * Indicates statistical significance against a one-sided alternative at the 1% level.

These results call for the following comments:
• Estimates are robust with respect to the model specification used to impute an income level to households. All three household income imputations lead to very close parameter estimates, showing the expected sign and a similar level of statistical significance against the relevant one-sided alternative. Clearly, robustness is stronger for the parameters describing the daily water consumption for «essential uses of water», as household income is only used to quantify a proxy for the size of the population using water for leisure needs.
• Water consumption per equivalent adult for essential uses consists of a committed consumption $\beta_1$ estimated at 80 to 88 liters per day. Every employed adult decreases the household committed consumption by a quantity $\beta_2$ estimated at 65 to 79 liters per day, while an extra room in the housing increases the household committed consumption by a quantity $\beta_3$ estimated at 94 to 104 liters per day.
• In turn, water consumption per k€/equivalent adult for leisure uses consists of a committed consumption for garden watering $\beta_4$ estimated at 190 to 200 liters per day. This fixed consumption is reduced by rainfall by a quantity $\beta_5$ estimated at 43 to 95 liters per day and per cm of rain. When the housing is equipped with a private swimming pool, one must add to the committed consumption $\beta_4$ an extra consumption for leisure $\beta_6$ estimated at 40 to 69 liters per day. Although non-significant, due to the small number of observations in our sample for housings equipped with a private swimming pool, this parameter estimate is certainly economically relevant.
6. Decomposing Observed Water Consumption by Use
Inspired by former studies, in particular [Chow and Lin (1971)], [Carlevaro (1987, 1994)] and [Carlevaro and Bertholet (1998, 2000)], we develop in this section a methodology for decomposing the observed water consumption data by use. More specifically, this approach provides an estimate $\hat Y_{ih}(A)$, $h = 1,\dots,H$, of the unobserved water consumption by use $Y_{ih}(A)$, $h = 1,\dots,H$, consistent with the observed water consumption $Y_i(A)$, as it enforces the identity:

$$Y_i(A) = \sum_{h=1}^{H} Y_{ih}(A) = \sum_{h=1}^{H} \hat Y_{ih}(A). \tag{53}$$
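This result is obtained by formulating the decomposition problem as an optimal linear prediction problem in the frame of the joint distribution of the observed variables $Y_i(A_{it})$ and of the unobserved variables $Y_{ih}(A_{it})$, $h = 1,\dots,H$, where $t = 1,\dots,T_i$, $i = 1,\dots,m$.
We already modeled in Section 4 (Equations (29) to (33)) the distribution of the observed variables $Y_i(A_{it})$, $t = 1,\dots,T_i$, $i = 1,\dots,m$. To model the latent variables $Y_{ih}(A_{it})$, $h = 1,\dots,H$, for $t = 1,\dots,T_i$, $i = 1,\dots,m$, we rely on the following relation:

$$Y_{ih}(A_{it}) = N_{ih}\,x_{ih}(A_{it})'\beta_h + \varepsilon_{ih}(A_{it}), \tag{54}$$

where

$$\varepsilon_{ih}(A_{it}) = N_{ih}\bigl(d(A_{it})e_{ih} + e_{ih}(A_{it})\bigr) \quad\text{and}\quad e_{ih}(A_{it}) = \sum_{\tau \in A_{it}} e_{ih}(\tau). \tag{55}$$

From this relation, we derive the matrix expression of the regression model explaining the time series of the vector of the $H$ uses of water by the household $i$, namely

$$\tilde y_i(A_{it}) = \tilde X_i(A_{it})\beta + \tilde\varepsilon_i(A_{it}),\quad t = 1,\dots,T_i, \tag{56}$$

where

$$\tilde y_i(A_{it}) = [Y_{ih}(A_{it})]_{h=1,\dots,H},\qquad \tilde X_i(A_{it}) = [\iota_h' \otimes N_{ih}x_{ih}(A_{it})']_{h=1,\dots,H} \tag{57}$$

and

$$\tilde\varepsilon_i(A_{it}) = [\varepsilon_{ih}(A_{it})]_{h=1,\dots,H} = N_i\bigl(d(A_{it})e_i + e_i(A_{it})\bigr), \tag{58}$$

where $N_i = \mathrm{diag}(n_i)$.
The symbol $\iota_h'$ used in (57) stands for row $h$ of the unit matrix of size $H$. The moments of the vector random disturbances $\tilde\varepsilon_i(A_{it})$ can be directly computed from the moments of the random vectors $e_i$ and $e_i(A_{it})$ obtained in Section 3. From formulas (19) to (23), we derive:

$$E(\tilde\varepsilon_i(A_{it})) = N_i\bigl(d(A_{it})E(e_i) + E(e_i(A_{it}))\bigr) = 0; \tag{59}$$

$$V(\tilde\varepsilon_i(A_{it})) = N_i\,V\bigl(d(A_{it})e_i + e_i(A_{it})\bigr)\,N_i = N_i\bigl[d(A_{it})^2 S + d(A_{it})\Sigma\bigr]N_i; \tag{60}$$

$$\mathrm{Cov}(\tilde\varepsilon_i(A_{it});\ \tilde\varepsilon_i(A_{i\tau})) = N_i\,\mathrm{Cov}\bigl(d(A_{it})e_i + e_i(A_{it});\ d(A_{i\tau})e_i + e_i(A_{i\tau})\bigr)\,N_i = d(A_{it})d(A_{i\tau})\,N_iSN_i,\quad t \ne \tau; \tag{61}$$

$$\mathrm{Cov}(\tilde\varepsilon_i(A_{it});\ \tilde\varepsilon_j(A_{j\tau})) = 0,\quad i \ne j. \tag{62}$$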
To specify the regression model of unobserved water consumption by uses in a compact form, we first write, for each household, their time series of water consumption (56) in a vector form:

$$\tilde y_i = \tilde X_i\beta + \tilde\varepsilon_i,\quad i = 1,\dots,m, \tag{63}$$

where
$$\tilde y_i = [\tilde y_i(A_{it})]_{t=1,\dots,T_i},\quad \tilde X_i = [\tilde X_i(A_{it})]_{t=1,\dots,T_i},\quad \tilde\varepsilon_i = [\tilde\varepsilon_i(A_{it})]_{t=1,\dots,T_i}.$$
These models display vector random disturbances

$$\tilde\varepsilon_i = (I_{T_i} \otimes N_i)(d_i \otimes e_i + e_{i\cdot}), \tag{64}$$

where $e_{i\cdot} = [e_i(A_{it})]_{t=1,\dots,T_i}$.
These random vectors are centered and non-intercorrelated with a variance-covariance matrix that can be written as

$$V(\tilde\varepsilon_i) = (I_{T_i} \otimes N_i)\,V(d_i \otimes e_i + e_{i\cdot})\,(I_{T_i} \otimes N_i) = (I_{T_i} \otimes N_i)\bigl[d_id_i' \otimes S + D_i \otimes \Sigma\bigr](I_{T_i} \otimes N_i) = d_id_i' \otimes N_iSN_i + D_i \otimes N_i\Sigma N_i = \tilde\Omega_i. \tag{65}$$

Finally, we obtain a compact matrix expression of the regression model by stacking together the time series models (63)

$$\tilde y = \tilde X\beta + \tilde\varepsilon, \tag{66}$$

with
$$\tilde y = [\tilde y_i]_{i=1,\dots,m},\quad \tilde X = [\tilde X_i]_{i=1,\dots,m},\quad \tilde\varepsilon = [\tilde\varepsilon_i]_{i=1,\dots,m}$$
and

$$E(\tilde\varepsilon) = 0,\quad V(\tilde\varepsilon) = \mathrm{diag}(\tilde\Omega_1,\dots,\tilde\Omega_m) = \tilde\Omega. \tag{67}$$

Following [Goldberger (1962)] we estimate the vector $\tilde y$ of water consumption by uses with a predictor $\hat{\tilde y}$ satisfying the following statistical properties:
• linearity with respect to the vector of observed water consumption $y$, namely of the form:

$$\hat{\tilde y} = Ay, \tag{68}$$

where $A$ is a nonstochastic matrix;
• unbiasedness, i.e. having prediction errors

$$\delta = \hat{\tilde y} - \tilde y = (AX - \tilde X)\beta + A\varepsilon - \tilde\varepsilon \tag{69}$$

with zero expectation, implying the restrictions:

$$AX = \tilde X \quad\text{and}\quad \delta = [A\ \ {-I}]\begin{bmatrix}\varepsilon \\ \tilde\varepsilon\end{bmatrix}; \tag{70}$$
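• efficiency, i.e. having, for any linear combination of the prediction errors $z'\delta$ (where $z$ stands for a nonzero vector of arbitrary weights), the following variance:

$$V(z'\delta) = z'V(\delta)z, \tag{65}$$

where
$$V(\delta) = [A\ \ {-I}]\;V\begin{bmatrix}\varepsilon \\ \tilde\varepsilon\end{bmatrix}\begin{bmatrix}A' \\ -I\end{bmatrix},$$
which does not exceed that of any other linear and unbiased predictor of $\tilde y$.
To find this best linear unbiased predictor (BLUP) for $\tilde y$, we rely on the prediction model

$$E\begin{bmatrix}\varepsilon \\ \tilde\varepsilon\end{bmatrix} = 0 \quad\text{and}\quad V\begin{bmatrix}\varepsilon \\ \tilde\varepsilon\end{bmatrix} = \begin{bmatrix}\Omega & R \\ R' & \tilde\Omega\end{bmatrix}, \tag{66}$$

where the covariance matrix $R$ is block-diagonal:

$$R = E(\varepsilon\tilde\varepsilon') = [E(\varepsilon_i\tilde\varepsilon_j')]_{i,j=1,\dots,m} = \mathrm{diag}(R_1,\dots,R_m). \tag{67}$$

Writing the disturbance vector $\varepsilon_i$ in the following compact form:

$$\varepsilon_i = \bigl[n_i'\bigl(d(A_{it})e_i + e_i(A_{it})\bigr)\bigr]_{t=1,\dots,T_i} = (I_{T_i} \otimes n_i')(d_i \otimes e_i + e_{i\cdot}), \tag{68}$$

it is easy to compute analytically the expression for the matrix $R_i$:

$$R_i = (I_{T_i} \otimes n_i')\,E\bigl((d_i \otimes e_i + e_{i\cdot})(d_i' \otimes e_i' + e_{i\cdot}')\bigr)(I_{T_i} \otimes N_i) = (I_{T_i} \otimes n_i')(d_id_i' \otimes S + D_i \otimes \Sigma)(I_{T_i} \otimes N_i) = d_id_i' \otimes n_i'SN_i + D_i \otimes n_i'\Sigma N_i. \tag{69}$$

For this prediction model, the BLUP for $\tilde y$ is

$$\hat{\tilde y} = \tilde X\hat\beta_{GLS} + \hat{\tilde\varepsilon}, \tag{70}$$

where $\hat{\tilde\varepsilon} = R'\Omega^{-1}\hat\varepsilon$ and $\hat\varepsilon = y - X\hat\beta_{GLS}$.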
As a consequence of the block-diagonality of the matrices $R$ and $\Omega$, we derive independent predictors for each vector $\tilde y_i$:

$$\hat{\tilde y}_i = \tilde X_i\hat\beta_{GLS} + \hat{\tilde\varepsilon}_i, \tag{71}$$

where $\hat{\tilde\varepsilon}_i = R_i'\Omega_i^{-1}\hat\varepsilon_i$ and $\hat\varepsilon_i = y_i - X_i\hat\beta_{GLS}$.
This decomposition formula is the sum of two terms, which can be interpreted, according to [Nasse (1973)], as a calibration term and an adjustment term, respectively. The calibration term $\tilde X_i\hat\beta_{GLS}$ provides a first estimate of water consumption by uses for the household $i$, whereas the adjustment term corrects the calibration term in order to restore the equality (53) between each total of observed water consumption and the sum of estimated water consumption by uses.
This intrinsic adding-up property of $\hat{\tilde y}_i$ can be checked by premultiplying the predictor by the matrix $(I_{T_i} \otimes \iota_H')$, which transforms the vector $\tilde y_i$ of unobserved water consumption by uses for the household $i$ into the vector of observed water consumption for this household, $y_i$. This leads to

$$(I_{T_i} \otimes \iota_H')\,\hat{\tilde y}_i = X_i\hat\beta_{GLS} + \hat\varepsilon_i = y_i, \tag{72}$$

because

$$(I_{T_i} \otimes \iota_H')\,\tilde X_i = [\iota_H'\tilde X_i(A_{it})]_{t=1,\dots,T_i} = [x_{it}']_{t=1,\dots,T_i} = X_i \tag{73}$$

and

$$(I_{T_i} \otimes \iota_H')\,R_i'\Omega_i^{-1} = (I_{T_i} \otimes \iota_H')(d_id_i' \otimes N_iSn_i + D_i \otimes N_i\Sigma n_i)\,\frac{1}{\sigma_i^2}\Bigl[D_i^{-1} - \frac{s_i^2}{\sigma_i^2 + s_i^2\delta_i}\,\iota_i\iota_i'\Bigr]$$
$$= \bigl(s_i^2 d_id_i' + \sigma_i^2 D_i\bigr)\frac{1}{\sigma_i^2}\Bigl[D_i^{-1} - \frac{s_i^2}{\sigma_i^2 + s_i^2\delta_i}\,\iota_i\iota_i'\Bigr]$$
$$= \frac{s_i^2}{\sigma_i^2}\,d_i\iota_i' + I_{T_i} - \frac{s_i^2}{\sigma_i^2 + s_i^2\delta_i}\Bigl(\frac{s_i^2}{\sigma_i^2}\,\delta_i\,d_i\iota_i' + d_i\iota_i'\Bigr) = I_{T_i}. \tag{74}$$
Formula (71) shows that this adjustment distributes the vector of residuals $\hat\varepsilon_i$ among the components of the calibration vector $\tilde X_i\hat\beta_{GLS}$, using as distribution coefficients the elements of the matrix $R_i'\Omega_i^{-1}$. The main weakness of the predictor $\hat{\tilde y}_i$ is that it cannot rule out negative estimates of water consumption by use.
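The calibration-plus-adjustment decomposition (71) and its adding-up property (72) can be sketched for one housing as follows. This is an illustrative sketch only: the function name is hypothetical, the covariance matrices are treated as known, and the calibration values and residuals are made-up numbers rather than results from the survey.

```python
import numpy as np

def decompose_by_use(calibration, resid, d, n, S, Sigma):
    """Predictor (71) for one housing.
    calibration: (T, H) array, calibration term by bill and by use;
    resid: (T,) GLS residuals of the observed totals; d: (T,) bill durations."""
    N = np.diag(n)
    s2, sigma2 = n @ S @ n, n @ Sigma @ n
    Omega = sigma2 * np.diag(d) + s2 * np.outer(d, d)           # equation (35)
    R = (np.kron(np.outer(d, d), (n @ S @ N)[None, :])          # equation (69), T x (T*H)
         + np.kron(np.diag(d), (n @ Sigma @ N)[None, :]))
    adjustment = R.T @ np.linalg.solve(Omega, resid)            # adjustment term, length T*H
    return calibration + adjustment.reshape(d.size, n.size)

# Hypothetical inputs for T = 2 bills and H = 2 uses (not taken from the paper)
d = np.array([30.0, 60.0])
n = np.array([3.0, 1.0])
S = np.array([[4.0, 1.0], [1.0, 9.0]])
Sigma = np.array([[1.0, 0.0], [0.0, 2.0]])
calibration = np.array([[20000.0, 3000.0], [41000.0, 5500.0]])
resid = np.array([1500.0, -2000.0])
by_use = decompose_by_use(calibration, resid, d, n, S, Sigma)
```

Summing each row of the result gives back the calibrated total plus the residual, that is the observed volume of the bill, which is the adding-up property. As noted above, individual entries are not constrained to be non-negative.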
Table 3
Distribution of water consumption by use predictions
Parameter estimates of model (52) according to imputed household income

| Distribution characteristic | Use | Income 1 | Income 2 | Income 3 |
|---|---|---|---|---|
| Mean valueᵃ | Essential usesᵇ | 215 | 214 | 210 |
| | Leisure usesᶜ | 52 | 54 | 59 |
| | All uses | 256 | 256 | 256 |
| Median valueᵃ | Essential usesᵇ | 175 | 177 | 175 |
| | Leisure usesᶜ | 16 | 15 | 18 |
| | All uses | 180 | 180 | 180 |
| Minimum valueᵃ | Essential usesᵇ | 25 | 25 | 25 |
| | Leisure usesᶜ | -512 | -527 | -477 |
| Maximum valueᵃ | Essential usesᵇ | 1371 | 1265 | 1287 |
| | Leisure usesᶜ | 2010 | 2075 | 2026 |

ᵃ In liters per person and per day.
ᵇ Computed for the full sample of 437 observations.
ᶜ Computed for the subsample of 340 observations of households using water to satisfy both essential and leisure needs.
Table 3 summarizes the results of the application of this approach to our sample of 437 water consumption observations, of which 340 observations were provided by households using water to satisfy both essential and leisure needs. Predictions of water consumption by use are carried out using the three FGLS estimates of the model (52) presented in Section 5. This makes it possible to assess the sensitivity of these predictions to the assumptions underlying the household income imputation model.
Not surprisingly, these model estimates lead to very close predictions of average and median water consumption by use, namely
• for the essential uses of water, between 210 and 215 liters/person/day on average and between 175 and 177 liters/person/day in the median;
• for the leisure uses of water, between 52 and 59 liters/person/day on average and between 15 and 18 liters/person/day in the median.
We conclude that the use of water for leisure needs (gardening and swimming pools) represents only about a quarter of the average consumption for essential uses and about a fifth of the average water consumption of the sample, which is 256 liters/person/day. Therefore, according to these figures, water consumption for leisure cannot explain the residential over-consumption of water in La Reunion with respect to that of continental France, estimated at 145 liters/person/day.
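The "quarter" and "fifth" orders of magnitude can be recomputed from the mean values of Table 3. The short sketch below only restates that arithmetic; the dictionary layout and function name are illustrative. Because the leisure means are computed on the subsample of 340 observations, the ratios are rough indications rather than exact shares.

```python
# Mean predicted consumption in liters/person/day, copied from Table 3
TABLE3_MEANS = {
    "Income 1": {"essential": 215, "leisure": 52, "all": 256},
    "Income 2": {"essential": 214, "leisure": 54, "all": 256},
    "Income 3": {"essential": 210, "leisure": 59, "all": 256},
}


def leisure_ratios(row):
    """Return (leisure / essential, leisure / all uses) for one column of Table 3."""
    return row["leisure"] / row["essential"], row["leisure"] / row["all"]


ratios = {name: leisure_ratios(row) for name, row in TABLE3_MEANS.items()}
for name, (to_essential, to_all) in ratios.items():
    print(f"{name}: leisure/essential = {to_essential:.2f}, leisure/all = {to_all:.2f}")
```

Across the three income imputations the first ratio stays between 0.24 and 0.29 and the second between 0.20 and 0.24.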
Comparing median and mean values sheds some light on the shape of the distribution of personal water consumption in La Reunion. Median water consumption is noticeably lower than mean water consumption and much closer to the average figure for continental France. This shows that the distribution of personal water consumption in La Reunion is strongly skewed towards high values, as illustrated by the maximum predicted consumption of 1265 to 1371 liters/person/day for essential uses and of 2010 to 2075 liters/person/day for leisure uses. Therefore, the average over-consumption of water by La Reunion inhabitants seems to be attributable to a fairly small fringe of heavy residential consumers.
8. Conclusion
The design of policy measures intended to foster a rational use of water in the residential sector must rely on quantitative knowledge of the uses that households make of this natural resource. In this paper we develop an econometric methodology aimed at providing such information to environmental decision makers. The application of this methodology to a data set collected from a sample of households shows that it can be an effective tool for providing economically and statistically reliable information to this end.
Regarding the terms of the mandate the study was intended to fulfill, our analysis points out the necessity to extend our database by collecting more detailed and reliable information allowing a more precise breakdown of household water consumption by uses. This is the main direction in which we are currently developing this research project.
Appendix
Imputing an Income Level to DIREN's Survey Households
The DIREN survey, conducted in 2004 on a sample of 2000 households, records household income level as an ordered qualitative variable, namely as belonging to one of the following five income intervals (in €/month): $I_1 = [0; 750]$, $I_2 = [750; 1500]$, $I_3 = [1500; 3000]$, $I_4 = [3000; 4500]$ and $I_5 = [4500; \infty)$. But to quantify a proxy for the size of the population using water for leisure, we need to measure household income levels as a quantity. Using the midpoint of these intervals to quantify this proxy proved to be inappropriate for several reasons. From a theoretical point of view, such an imputation method provides a biased estimate of the average income for any income interval in which the household income distribution is not uniform. Moreover, this method cannot be applied to the highest, unbounded income interval. From a practical point of view, applying this method led to non-positive definite estimates of the matrices $S$ and $\Sigma$, which prevented both the estimation of the model and the decomposition of water consumption observations by use.
Therefore, we decided to develop an imputation method based on an ordered polychotomous econometric model of the observed qualitative information on household income, in which the unobserved household income level is specified as a latent variable.
This model is specified as follows:
$$y_i = h \iff y_i^* \in I_h, \qquad (75)$$
where $y_i$ stands for the numerical encoding of the observed income interval of household $i$, $h$ for the index of income interval $I_h$, $h = 1,\dots,5$, and $y_i^*$ for the unobserved (latent) income level of household $i$ (in €/month).
We also assume that the unobserved incomes $y_i^*$ are distributed, within the household population, according to the lognormal random variable defined by the following regression model:
$$\ln y_i^* = x_i'\beta + \varepsilon_i, \qquad (76)$$
where $x_i$ refers to the vector of indicators of income level for household $i$, $\beta$ to the vector of parameters including the constant term, and $\varepsilon_i$ to the $N(0, \sigma^2)$ random variable, identically and independently distributed within the DIREN sample of households.
To impute an income level to household $i$, we use the minimum mean square error (MSE) predictor of $y_i^*$ given the available information $\mathfrak{I}_i$, namely
$$\hat{y}_i^* = E(y_i^* \mid \mathfrak{I}_i), \qquad (77)$$
where $E(y_i^* \mid \mathfrak{I}_i)$ stands for the expected value of $y_i^*$ computed according to its conditional density $f(y \mid \mathfrak{I}_i) = f_i(y)$.
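Two distinct cases should be considered, according to whether or not household $i$ has declared its income interval $y_i$. In the first case, the available information is given by $\mathfrak{I}_i = \{x_i, y_i, \beta, \sigma\}$ and the conditional density $f_i(y)$ is that of a truncated lognormal random variable, $LN(x_i'\beta, \sigma^2 \mid I_h)$,
$$f_i(y) = \frac{\dfrac{1}{\sigma y\sqrt{2\pi}}\exp\left\{-\dfrac{1}{2}\left(\dfrac{\ln y - x_i'\beta}{\sigma}\right)^2\right\}}{\Phi\left(\dfrac{\ln y_{h+1} - x_i'\beta}{\sigma}\right) - \Phi\left(\dfrac{\ln y_h - x_i'\beta}{\sigma}\right)}, \quad y \in I_h, \qquad (78)$$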
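where $\Phi(\cdot)$ stands for the standard normal cumulative distribution function and $y_h$, $y_{h+1}$ for the lower and upper bounds of income interval $I_h$, respectively. Hence, the MSE predictor (77) is given by
$$\hat{y}_i^* = \frac{\displaystyle\int_{y_h}^{y_{h+1}} \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{\ln y - x_i'\beta}{\sigma}\right)^2\right\}dy}{\Phi\left(\dfrac{\ln y_{h+1} - x_i'\beta}{\sigma}\right) - \Phi\left(\dfrac{\ln y_h - x_i'\beta}{\sigma}\right)}. \qquad (79)$$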
Notice that computing this predictor requires a numerical integration. When the household income interval $y_i$ is unknown, the available information is given by $\mathfrak{I}_i = \{x_i, \beta, \sigma\}$, and the conditional density $f_i(y)$ is simply that of an untruncated lognormal random variable, $LN(x_i'\beta, \sigma^2)$, leading to the MSE predictor
$$\hat{y}_i^* = \exp\left(x_i'\beta + \frac{\sigma^2}{2}\right). \qquad (80)$$
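The two predictors can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the integral in (79) is evaluated by Simpson's rule after the change of variable $u = \ln y$, and unbounded limits are truncated ten standard deviations away from the mode of the integrand. As a cross-check it also uses the standard closed-form expression for the mean of a truncated lognormal variable, $\exp(\mu + \sigma^2/2)\,[\Phi(z_{h+1} - \sigma) - \Phi(z_h - \sigma)]/[\Phi(z_{h+1}) - \Phi(z_h)]$ with $z_h = (\ln y_h - \mu)/\sigma$ and $\mu = x_i'\beta$, which does not appear in the paper.

```python
import math
from statistics import NormalDist


def _cdf_at_bound(bound, mu, sigma, shift=0.0):
    """Phi((ln(bound) - mu) / sigma - shift), with limits 0 and 1 at the ends."""
    if bound <= 0.0:
        return 0.0
    if math.isinf(bound):
        return 1.0
    return NormalDist().cdf((math.log(bound) - mu) / sigma - shift)


def unconditional_income(mu, sigma):
    """Predictor (80): mean of an untruncated lognormal variable."""
    return math.exp(mu + sigma ** 2 / 2.0)


def conditional_income_closed_form(mu, sigma, lower, upper):
    """Mean of a lognormal variable truncated to [lower, upper] (cross-check)."""
    num = (_cdf_at_bound(upper, mu, sigma, sigma)
           - _cdf_at_bound(lower, mu, sigma, sigma))
    den = _cdf_at_bound(upper, mu, sigma) - _cdf_at_bound(lower, mu, sigma)
    return unconditional_income(mu, sigma) * num / den


def conditional_income_numeric(mu, sigma, lower, upper, steps=2000):
    """Predictor (79) by Simpson's rule in u = ln y (steps must be even)."""
    center = mu + sigma ** 2  # mode of the integrand expressed in u
    lo = center - 10.0 * sigma if lower <= 0.0 else math.log(lower)
    hi = center + 10.0 * sigma if math.isinf(upper) else math.log(upper)

    def integrand(u):
        z = (u - mu) / sigma
        return math.exp(u - 0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for j in range(1, steps):
        total += (4.0 if j % 2 else 2.0) * integrand(lo + j * h)
    den = _cdf_at_bound(upper, mu, sigma) - _cdf_at_bound(lower, mu, sigma)
    return (total * h / 3.0) / den


# Constant-only specification with the rounded estimates reported in Table 4
mu_hat, sigma_hat = 7.265, 0.834
unconditional = unconditional_income(mu_hat, sigma_hat)
lowest_closed = conditional_income_closed_form(mu_hat, sigma_hat, 0.0, 750.0)
lowest_numeric = conditional_income_numeric(mu_hat, sigma_hat, 0.0, 750.0)
highest_closed = conditional_income_closed_form(mu_hat, sigma_hat, 4500.0, math.inf)
```

With these rounded estimates the results are close to the corresponding entries of Table 5 (2024 for the unconditional imputation, 498 and 7041 for the lowest and highest intervals). Small gaps are expected because the published parameter values are rounded.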
To implement these imputation formulas in practice, we need to estimate the unknown parameters $\beta$ and $\sigma$. We can obtain efficient estimates of these parameters by applying the maximum likelihood principle to the discrete outcome random variable $y_i$. This leads to maximizing the following logarithmic likelihood function of the random sample $y_i$, $i = 1,\dots,n$:
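$$\ln L(\beta, \sigma \mid y_1,\dots,y_n, x_1,\dots,x_n) = \sum_{i=1}^{n}\sum_{h=1}^{5} y_{ih}\,\ln P\{y_i = h \mid x_i\}, \qquad (81)$$
where
$$y_{ih} = \begin{cases} 1 & \text{if } y_i = h, \\ 0 & \text{if } y_i \neq h, \end{cases} \qquad (82)$$
and
$$\begin{aligned}
P\{y_i = h \mid x_i\} &= P\{\ln y_h < \ln y_i^* < \ln y_{h+1} \mid x_i\} \\
&= P\{\ln y_h - x_i'\beta < \varepsilon_i < \ln y_{h+1} - x_i'\beta \mid x_i\} \\
&= \Phi\left(\frac{\ln y_{h+1} - x_i'\beta}{\sigma}\right) - \Phi\left(\frac{\ln y_h - x_i'\beta}{\sigma}\right), \quad h = 1,\dots,5,
\end{aligned} \qquad (83)$$
where $\Phi\left(\dfrac{\ln y_1 - x_i'\beta}{\sigma}\right) = 0$ and $\Phi\left(\dfrac{\ln y_6 - x_i'\beta}{\sigma}\right) = 1$.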
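The log-likelihood (81) with interval probabilities (83) can be written compactly. The following is a minimal sketch, not the estimation code used in the study: the function names, the data layout (one list of regressors and one interval code per household) and the toy sample are assumptions made for illustration. The interval bounds are those of the DIREN survey.

```python
import math
from statistics import NormalDist

# Interval bounds y_1, ..., y_6 in EUR/month (y_1 = 0 and y_6 = +infinity)
BOUNDS = [0.0, 750.0, 1500.0, 3000.0, 4500.0, math.inf]


def cdf_at_bound(bound, xb, sigma):
    """Phi((ln(bound) - x'beta) / sigma), set to 0 and 1 at the extreme bounds."""
    if bound <= 0.0:
        return 0.0
    if math.isinf(bound):
        return 1.0
    return NormalDist().cdf((math.log(bound) - xb) / sigma)


def interval_probability(h, xb, sigma):
    """P{y_i = h | x_i} of formula (83), for h = 1, ..., 5."""
    return cdf_at_bound(BOUNDS[h], xb, sigma) - cdf_at_bound(BOUNDS[h - 1], xb, sigma)


def log_likelihood(beta, sigma, regressors, codes):
    """Log-likelihood (81): sum of ln P{y_i = h_i | x_i} over households."""
    total = 0.0
    for x_i, y_i in zip(regressors, codes):
        xb = sum(b * x for b, x in zip(beta, x_i))
        total += math.log(interval_probability(y_i, xb, sigma))
    return total


# Toy sample (hypothetical): constant-only regressors and declared intervals
toy_regressors = [[1.0]] * 5
toy_codes = [1, 2, 3, 3, 4]
ll_plausible = log_likelihood([7.265], 0.834, toy_regressors, toy_codes)
ll_implausible = log_likelihood([5.0], 0.834, toy_regressors, toy_codes)
```

Because only the observed interval of each household has $y_{ih} = 1$, the double sum in (81) reduces to one term per household. Maximizing `log_likelihood` over `beta` and `sigma` with any numerical optimizer would give the maximum likelihood estimates; the optimizer itself is left out of this sketch.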
Inserted in formulas (79) and (80), the maximum likelihood estimators of the parameters $\beta$ and $\sigma$ provide an asymptotically efficient estimator of the MSE predictor $\hat{y}_i^*$. We tested three specifications of this income imputation model.
• The first specification assumes that the form of the personal income distribution is not influenced by any household income indicator recorded by the DIREN survey. It is, therefore, a «naïve» specification used as a reference, with which the other two, more informative, specifications may be compared. This specification is written as
$$\ln y_i^* = \beta_1 + \varepsilon_i. \qquad (84)$$
• The second specification assumes that the form of the personal income distribution is influenced by an ordinal measure of household standard of living (HSL), stated by households on a four-level scale, namely
HSL=1 if «The household hasn't enough money to live. It cannot make ends meet».
HSL=2 if «The household has just enough to live, but by making great sacrifices».
HSL=3 if «The household has enough money to live without making great sacrifices».
HSL=4 if «The household makes no sacrifices, in any case nothing important».
This specification is written as
$$\ln y_i^* = \beta_1 + \sum_{k=2}^{4}\beta_k X_{ki} + \varepsilon_i, \qquad (85)$$
where
$$X_{ki} = \begin{cases} 1 & \text{if } HSL_i \geq k, \\ 0 & \text{otherwise.} \end{cases} \qquad (86)$$
• The third specification assumes that the form of the personal income distribution is influenced by the household head's occupation (HHO), recorded according to a seven-category classification, namely
HHO=1 if «Senior executive or profession».
HHO=2 if «Farmer, trader, or middle manager».
HHO=3 if «Others, student».
HHO=4 if «Employee».
HHO=5 if «Worker».
HHO=6 if «Foreman or independent profession».
HHO=7 if «Retired».
This specification is written as
$$\ln y_i^* = \beta_1 + \sum_{k=2}^{7}\beta_k Z_{ki} + \varepsilon_i, \qquad (87)$$
where
$$Z_{ki} = \begin{cases} 1 & \text{if } HHO_i = k, \\ 0 & \text{otherwise.} \end{cases} \qquad (88)$$
Table 4 presents the maximum likelihood estimates of these specifications, and Table 5 shows the imputed household income levels derived from these estimates using prediction formulas (79) and (80). The model estimates are based on the data set provided by the volunteers who participated in the mail survey aimed at collecting water consumption data from water bills.
We shall not comment on these empirical results, but simply notice that parameter estimates are economically plausible and statistically significant. Moreover, from an empirical point of view, specification 2, using stated household standard of living as an income level indicator, performs better than specification 3 based on household head occupation.
Table 4
Parameter estimates of income imputation model

| Parameter | Specification^a (84) | Specification^a (85)-(86) | Specification^a (87)-(88) |
|---|---|---|---|
| $\beta_1$ | 7.265 (0.072) | 6.312 (0.242) | 8.251 (0.144) |
| $\beta_2$ | | 0.486 (0.258) | -0.665 (0.225) |
| $\beta_3$ | | 0.778 (0.116) | -2.014 (0.224) |
| $\beta_4$ | | 0.760 (0.191) | -1.025 (0.213) |
| $\beta_5$ | | | -1.177 (0.234) |
| $\beta_6$ | | | -0.480 (0.245) |
| $\beta_7$ | | | -1.065 (0.171) |
| $\sigma$ | 0.834 (0.064) | 0.588 (0.044) | 0.606 (0.045) |
| Log likelihood | -233.31 | -187.96 | -189.36 |
| $\rho^2_{AIC}$^b | 0 | 0.181 | 0.162 |

a Figures in brackets are estimated asymptotic standard errors.
b Akaike information criterion (AIC) adjusted pseudo-R2.
Table 5
Imputed household income (€/month)

| Income level indicator | Conditional imputation, $I_1$ | Conditional imputation, $I_2$ | Conditional imputation, $I_3$ | Conditional imputation, $I_4$ | Conditional imputation, $I_5$ | Unconditional imputation |
|---|---|---|---|---|---|---|
| None: model specification (84) | 498 | 1100 | 2116 | 3632 | 7041 | 2024 |
| HSL^a: model specification (85)-(86) | | | | | | |
| 1 | 444 | 1006 | 1881 | 3449 | 5254 | 655 |
| 2 | 522 | 1061 | 1970 | 3506 | 5455 | 1065 |
| 3 | 599 | 1153 | 2142 | 3609 | 6044 | 2319 |
| 4 | 640 | 1232 | 2321 | 3717 | 7477 | 4957 |
| HHO^b: model specification (87)-(88) | | | | | | |
| 1 | 631 | 1217 | 2294 | 3705 | 7374 | 4603 |
| 2 | 593 | 1150 | 2146 | 3616 | 6149 | 2368 |
| 3 | 426 | 1003 | 1882 | 3454 | 5279 | 614 |
| 4 | 564 | 1110 | 2067 | 3569 | 5796 | 1651 |
| 5 | 548 | 1093 | 2035 | 3550 | 5684 | 1419 |
| 6 | 606 | 1170 | 2187 | 3640 | 6395 | 2848 |
| 7 | 560 | 1106 | 2058 | 3564 | 5765 | 1587 |

a Household standard of living.
b Household head's occupation.
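As a consistency check, the unconditional imputation column of Table 5 can be approximately reproduced from the rounded estimates of Table 4 using the untruncated lognormal mean, $\exp(x_i'\beta + \sigma^2/2)$. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes cumulative dummies (equal to 1 when HSL is at least $k$) for the second specification and category dummies with HHO=1 as the reference for the third.

```python
import math

# Rounded maximum likelihood estimates copied from Table 4
HSL_BETA = [6.312, 0.486, 0.778, 0.760]
HSL_SIGMA = 0.588
HHO_BETA = [8.251, -0.665, -2.014, -1.025, -1.177, -0.480, -1.065]
HHO_SIGMA = 0.606


def hsl_regressors(level):
    """Constant plus cumulative dummies: X_k = 1 if HSL >= k, for k = 2, 3, 4."""
    return [1.0] + [1.0 if level >= k else 0.0 for k in range(2, 5)]


def hho_regressors(category):
    """Constant plus category dummies: Z_k = 1 if HHO = k, for k = 2, ..., 7."""
    return [1.0] + [1.0 if category == k else 0.0 for k in range(2, 8)]


def unconditional_income(x, beta, sigma):
    """Mean of the untruncated lognormal variable with log-mean x'beta."""
    xb = sum(b * v for b, v in zip(beta, x))
    return math.exp(xb + sigma ** 2 / 2.0)


hsl_incomes = [unconditional_income(hsl_regressors(level), HSL_BETA, HSL_SIGMA)
               for level in range(1, 5)]
hho_incomes = [unconditional_income(hho_regressors(cat), HHO_BETA, HHO_SIGMA)
               for cat in range(1, 8)]
```

The computed values agree with the last column of Table 5 to within about two euros per month, the gap being due to the rounding of the published estimates. The cumulative coding means that each coefficient $\beta_k$ of the second specification measures the income gain of moving from level $k-1$ to level $k$ of the standard-of-living scale.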
References
Binet M. E., Carlevaro F., Durand S., Paul M. Etude des habitudes de consommation d'eau domestique potable a La Reunion., Etude réalisee pour la DIREN, CERESUR, Universite de La Reunion, Saint-Denis de La Reunion. 2005.
Carlevaro F. Regionalisation d'agrégats nationaux au moyen d'indicateurs: une methodeeconometrique., in: Modélisation spatiale. Theorie et applications, ed. B. Guenier et J. H. P. Paelinck, Institut de Mathématiques Economiques et Librairie de l'Université de Dijon, Dijon. 1987.
Carlevaro F. Decomposition d'agrégats au moyen d'indicateurs. Une méthode econométrique // Revue suisse d'Economie politique et de Statistique. 1994. № 130(3).
Carlevaro F., Bertholet J.-L. Indices de consommation electrique pour les services generaux d'immeuble et evaluation de gisements de negawatts., Revue suisse d'Economie politique et de Statistique. 1998. № 134(3).
Carlevaro F., Bertholet J.-L. Aggregating Time Unbalanced Panel Observations: Methodology and Applications to Oil Consumption of Boiler Rooms., Chapter 8 in: Panel Data Econometrics: Future Directions, ed. J. Krish-nakumar and E. Ronchetti, Elsevier Science B.V. 2000.
Chow G. C, Lin A. Best Linear Unbiased Interpolation, Distribution and Extrapolation of Time Series by Related Series // The Review of Economics and Statistics. 1971. LIII(4).
GoldbergerA. S. Best Linear Unbiased Prediction in the Generalized Linear Regression Model// Journal of the American Statistical Association. 1962. № 57. P. 369-375.
Nasse P. Le systéme des comptes nationaux trimestriels// Annales de l'INSEE. 1973. №14. P. 119-161.
SchlesserC. Analyse microeconometrique par usages de la consummation résidentielle d'eau a l'île de La Reunion, unpublished master thesis in econometrics, Department of Econometrics, University of Geneva. 2006.
Vonesh E.F,ChinchilliV. H. Linear and Nonlinear Models for the Analysis of Repeated Measurements, Marcel Dekker, New York. 1997.
100