A New Spatial Multiple Discrete-Continuous Modeling Approach to Land Use Change Analysis
Chandra R. Bhat*
The University of Texas at Austin, Dept of Civil, Architectural and Environmental Engineering
301 E. Dean Keeton St. Stop C1761, Austin TX 78712
Phone: 512-471-4535; Fax: 512-475-8744; Email: email@example.com
King Abdulaziz University, Jeddah 21589, Saudi Arabia
Subodh K. Dubey
The University of Texas at Austin, Dept of Civil, Architectural and Environmental Engineering
301 E. Dean Keeton St. Stop C1761, Austin TX 78712
Phone: 512-471-4535, Fax: 512-475-8744; E-mail: firstname.lastname@example.org
Mohammad Jobair Bin Alam
King Abdulaziz University, Department of Civil Engineering
P.O. Box 80204, Jeddah, 21589, Kingdom of Saudi Arabia
Phone: +966-2-6402000 (Ext.: 51339), Fax: +966-2-6952179, Email: email@example.com
Waleed H. Khushefati
King Abdulaziz University, Department of Civil Engineering
P.O. Box 80204, Jeddah, 21589, Kingdom of Saudi Arabia
Phone: +966-2-6402000 (Ext.: 51339), Fax: +966-2-6952179, Email: firstname.lastname@example.org
This paper formulates a multiple discrete-continuous probit (MDCP) land-use model within a spatially explicit economic structural framework for land-use change decisions. The spatial MDCP model is capable of predicting both the type and intensity of urban development patterns over large geographic areas, while also explicitly acknowledging geographic proximity-based spatial dependencies in these patterns. At a methodological level, the paper focuses on specifying and estimating a spatial MDCP model that allows the dependent variable to exist in multiple discrete states with an intensity associated with each discrete state. The formulation also accommodates spatial dependencies, as well as spatial heterogeneity and heteroscedasticity, in the dependent variable, and should be applicable in a wide variety of fields where social and spatial dependencies between decision agents (or observation units) lead to spillover effects in multiple discrete-continuous choices (or states). A simulation exercise is undertaken to evaluate the ability of the proposed maximum approximate composite marginal likelihood (MACML) approach to recover parameters from a cross-sectional spatial MDCP model. The results show that the MACML approach does well in recovering parameters. An empirical demonstration of the approach is undertaken using parcel-level land-use data from the city of Austin.
Keywords: spatial econometrics, multiple discrete-continuous model, random-coefficients, land use analysis, MACML approach.
This research was partially supported by the U.S. Department of Transportation through the Data-Supported Transportation Operations and Planning (D-STOP) Tier 1 University Transportation Center and the Southwest Region University Transportation Center. The first author would also like to acknowledge support from a Humboldt Research Award from the Alexander von Humboldt Foundation, Germany. Finally, the authors are grateful to Lisa Macias for her help in formatting this document, and Marlon Boarnet and three anonymous referees who provided useful comments on an earlier version of the paper.
Land-use change models are used in a variety of fields such as planning, urban science, ecological science, climate science, geography, watershed hydrology, environmental science, political science, and transportation to examine future land-use scenarios as well as to evaluate the potential effects of policies directed toward engendering a socially or economically or ecologically desirable pattern of future land-use that minimizes negative externalities. More recently, there has been substantial attention in the scientific literature on biodiversity loss, deforestation consequences, and carbon emissions increases caused by patterns of urban and rural land-use development, and associated climate change impacts (for example, see Lewis et al., 2011). This is not surprising, since one of the most important “habitat” elements characterizing Earth’s terrestrial and aquatic ecosystems is the land use pattern (another is climate pattern, which is increasingly becoming closely related to the land use pattern). In this paper, we contribute to the vibrant and interdisciplinary literature on land-use analysis by proposing a new econometric approach to specify and estimate a model of land-use change that is capable of predicting both the type and intensity of urban development patterns over large geographic areas, while also explicitly acknowledging geographic proximity-based spatial dependencies in these patterns. As such, the motivations of this paper stem both from a methodological perspective as well as an empirical perspective. At a methodological level, the paper focuses on specifying and estimating a spatial multiple discrete-continuous probit (MDCP) model that allows the dependent variable to exist in multiple discrete states with an intensity associated with each discrete state. 
The formulation also accommodates spatial heterogeneity and heteroscedasticity in the dependent variable, and should be applicable in a wide variety of fields where social and spatial dependencies between decision agents (or observation units) lead to spillover effects in multiple discrete-continuous choices (or states). At an empirical level, the paper models land-use in multiple discrete states, along with the area invested in each land-use discrete state, within each spatial unit in an entire urban region. The model is a hybrid of three different strands of model types used in the land-use analysis literature.
The next section discusses the econometric context for the current paper, while the subsequent section presents the empirical context.
The Econometric Context
In the past decade, there has been increasing interest and attention on recognizing and explicitly accommodating spatial (and social) dependence among decision-makers (or other observation units) in urban and regional modeling, agricultural and natural resource economics, public economics, geography, sociology, political science, and epidemiology. The reader is referred to a special issue of Regional Science and Urban Economics entitled “Advances in spatial econometrics” (edited by Arbia and Kelejian, 2010) and another special issue of the Journal of Regional Science entitled “Introduction: Whither spatial econometrics?” (edited by Partridge et al., 2012) for a collection of recent papers on spatial dependence, and to Elhorst (2010), Anselin (2010), Ferdous and Bhat (2013) and Brady and Irwin (2011) for overviews of recent developments in the spatial econometrics field. Within the past few years, there has particularly been an explosion in studies that recognize and accommodate spatial dependency in discrete choice models. The typical way this is achieved is by applying spatial lag and spatial error-type structures developed in the context of continuous dependent variables to the linear (latent) propensity variables underlying discrete choice dependent variables (see reviews of this literature in Fleming, 2004; Franzese and Hays, 2008; LeSage and Pace, 2009; Hays et al., 2010; Brady and Irwin, 2011; and Sidharthan and Bhat, 2012). The two dominant techniques, both based on simulation methods, for the estimation of such spatial discrete models are the frequentist recursive importance sampling (RIS) estimator (which is a generalization of the more familiar Geweke-Hajivassiliou-Keane or GHK simulator; see Beron and Vijverberg, 2004) and the Bayesian Markov Chain Monte Carlo (MCMC)-based estimator (see LeSage and Pace, 2009). 
However, both of these methods are confronted with multi-dimensional normal integration, and are cumbersome to implement in typical empirical contexts with even moderate estimation sample sizes (see Bhat, 2011 and Franzese et al., 2010). Recently, Bhat and colleagues have suggested a maximum approximate composite marginal likelihood (MACML) inference approach for estimating spatial multinomial probit (MNP) models and a composite marginal likelihood (CML) inference approach for estimating spatial binary/ordered probit models. The MACML approach uses the CML approach, but also makes an additional analytic approximation to evaluate the multivariate normal cumulative distribution (MVNCD) function during estimation. These methods are easy to implement, require no simulation, and involve only univariate and bivariate cumulative normal distribution function evaluations, regardless of the number of alternatives, or the number of choice occasions per observation unit, or the number of observation units, or the nature of social/spatial dependence structures.
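To make the pairwise idea behind the CML approach concrete, the following is a minimal sketch for a spatial binary probit, not the MACML estimator itself. It assumes unit-variance latent errors and pair correlations supplied by the analyst, and it evaluates each bivariate normal probability by one-dimensional quadrature rather than by the analytic approximation mentioned above. The function names (`bvn_cdf`, `pairwise_cml_loglik`) and the input layout are illustrative assumptions, not taken from the paper.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, n=2000, lower=-8.0):
    """P(X <= a, Y <= b) for a standard bivariate normal with correlation rho.

    Simpson's rule on the integral over [lower, a] of
    phi(x) * Phi((b - rho * x) / sqrt(1 - rho^2)); needs |rho| < 1 and even n.
    """
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    f = lambda x: norm_pdf(x) * norm_cdf((b - rho * x) / scale)
    h = (a - lower) / n
    total = f(lower) + f(a)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lower + i * h)
    return total * h / 3.0

def pairwise_cml_loglik(y, mean, pair_rho):
    """Pairwise composite marginal log-likelihood for binary outcomes.

    y        : list of 0/1 outcomes, one per observation unit
    mean     : latent propensity means (errors assumed to have unit variance)
    pair_rho : {(i, j): correlation between the latent errors of units i and j}
    """
    loglik = 0.0
    for (i, j), rho in pair_rho.items():
        si, sj = 2 * y[i] - 1, 2 * y[j] - 1  # +1 if y = 1, -1 if y = 0
        loglik += math.log(bvn_cdf(si * mean[i], sj * mean[j], si * sj * rho))
    return loglik
```

Each pair contributes one bivariate probability, so the cost grows with the number of pairs retained rather than with the dimension of the full joint distribution. In a spatial setting the pair correlations would themselves be functions of the spatial parameters being estimated.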
At the same time that spatial considerations are receiving widespread attention in the discrete choice literature, multiple discrete-continuous (MDC) models have also seen substantial application in different disciplines, including regional science (Kaza et al., 2012), transportation (Bhat, 2005; 2008; Bhat et al., 2013a), time use (Habib and Miller, 2008; Pinjari and Bhat, 2010), marketing and retailing (Kim et al., 2002; Allenby et al., 2010; Satomura et al., 2011), energy economics (Ahn et al., 2008), environmental economics (see von Haefen et al., 2004; Kuriyama et al., 2010), and tourism (LaMondia et al., 2008; Van Nostrand et al., 2013). In MDC situations, consumers may consume multiple alternatives at the same time, with a continuous dimension of the amount of consumption associated with each alternative. To be precise, there are multiple alternatives, with zero consumption of an alternative implying that the alternative is not consumed, and a positive consumption amount implying that the alternative is consumed. Examples of such MDC contexts include land-use type and intensity of land-use over a spatial unit, household vehicle type holdings and usage, consumer brand choice and purchase quantity, and recreational destination location choice and number of trips. While a variety of modeling approaches have been used in the literature to accommodate MDC choice contexts, the one that has dominated the recent literature is based on a utility maximization framework that assumes a non-linear (but increasing and continuously differentiable) utility structure to accommodate decreasing marginal utility (or satiation) with increasing investment in an alternative. Consumers are assumed to maximize this utility subject to a budget constraint. 
The optimal consumption quantities (including possibly zero investments in some alternatives) are obtained by writing the Karush-Kuhn-Tucker (KKT) first-order conditions of the utility function with respect to the investment quantities. Researchers from many disciplines have used such a KKT approach, and several additively separable and non-linear utility structures have been proposed. Of these, the general utility form proposed by Bhat (2008) subsumes other non-linear utility forms as special cases, and allows a clear interpretation of model parameters. In Bhat’s utility function form and other more restrictive utility forms, stochasticity is introduced in the baseline preference for each alternative to acknowledge the presence of unobserved (to the analyst) factors that may impact the utility of each alternative (the baseline preference is the marginal utility of each alternative at the point of zero consumption of the alternative). As in traditional discrete choice models, the most common distributions used for the kernel stochastic error term (across alternatives) are the generalized extreme value (GEV) distribution (see Bhat, 2008; Pinjari, 2011; Castro et al., 2012) and the multivariate normal distribution (see Kim et al., 2002 and Bhat et al., 2013b). The first distribution leads to a closed-form MDC generalized extreme value (or MDCGEV) model structure, while the second leads to an MDC probit (or MDCP) model structure. In both these structures, the analyst can further superimpose a mixing random distribution of coefficients in the baseline preference to accommodate unobserved heterogeneity across consumers (or observation units). 
Assuming a normal mixing error distribution, the use of a GEV kernel error term leads to a mixing of the normal distribution with a GEV kernel (leading to the mixed MDCGEV model or MMDCGEV structure), while the use of a probit kernel leads back to an MDCP model structure (because of the conjugate nature of the multivariate normal distribution in terms of addition). In this paper, we will use the MDCP structure because it allows us to use the MACML inference approach even in the presence of spatial dependence. This is the first such formulation and application of a spatial MDCP model in the econometric literature.1
The Empirical Context
There are several approaches to studying and modeling land-use change. Irwin and Geoghegan (2001) and Irwin (2010) provide a good taxonomy of these approaches. In the current paper, we derive our empirical discrete choice model based on drawing elements from three different types of models proposed and applied in the literature.
The first type of models, usually referred to as pattern-based models and developed by geographers and natural scientists, is well suited for land-use modeling over relatively large geographic extents (such as urban regions or entire states or even countries). The unit of analysis in these pattern-based models is typically an aggregated spatial unit (such as a large grid or a traffic analysis zone or a Census tract or a County or a State). One basis for these models originates from the mathematical representations of the discrete state of a cell (a very fine disaggregate unit of space) as a deterministic or probabilistic function of the states of neighboring cells in an earlier time period (see, for example, Wu and Webster, 1998; Clarke et al., 1997; Engelen and White, 2008). In these cellular automata-based models, the analyst hypothesizes the nature of the deterministic or probabilistic updating functions, simulates the states of cells over many “virtual” time periods, and aggregates up the states of the cells at the end to obtain land-use patterns. While such models may be able to “fit” the land-use patterns at the aggregated spatial unit level, the imposed updating functions are not based on actual data. Thus, there is no direct evidence linking the updating mechanism at the cell level to the spatial evolution of land-use patterns at the aggregate spatial unit level. Also, since such models do not use exogenous variables such as sociodemographic characteristics of spatial units, transportation network features, and other environmental features as the basis for explaining land-use, the policy value of these models is limited. 
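The cellular automata logic described above (a hypothesized updating rule, simulation over virtual time periods, then aggregation) can be sketched as follows. The rule used here, in which an undeveloped cell becomes developed once at least `threshold` of its eight neighbors are developed, is a purely hypothetical illustration and is not taken from any of the cited studies; the function names are likewise assumptions.

```python
def step(grid, threshold=2):
    """Apply one deterministic update to a 2-D grid of 0 (undeveloped) / 1 (developed)."""
    n_rows, n_cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] == 1:
                continue  # developed cells stay developed under this rule
            developed_neighbors = sum(
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if developed_neighbors >= threshold:
                new[r][c] = 1
    return new

def block_shares(grid, size):
    """Aggregate cell states to the developed share within size-by-size blocks."""
    shares = []
    for r0 in range(0, len(grid), size):
        row = []
        for c0 in range(0, len(grid[0]), size):
            cells = [
                grid[r][c]
                for r in range(r0, min(r0 + size, len(grid)))
                for c in range(c0, min(c0 + size, len(grid[0])))
            ]
            row.append(sum(cells) / len(cells))
        shares.append(row)
    return shares

# Two developed cells in the middle of a 4 x 4 area, updated for one virtual period.
start = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
after_one_period = step(start)
aggregate_pattern = block_shares(after_one_period, 2)
```

The sketch makes the limitation noted above visible: the aggregate pattern is driven entirely by the imposed rule and the initial states, with no role for exogenous variables.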
An alternative basis of pattern-based models is to use empirical models estimated at the aggregate spatial unit level that relates variables such as distance to urban center, pedoclimatic or biophysical factors of the land in the spatial unit (such as slope, water content, aeration, and elevation), and transportation network and accessibility variables to land-use patterns (see, for example, Landis and Zhang, 1998a, 1998b; Brown et al., 2000; Parker et al., 2003; Brown and Duh, 2004; Robinson and Brown, 2009). Once estimated, these models can be used in a simulation setting to predict land-use patterns in response to different exogenously imposed policy scenarios. Unfortunately, these empirical models have not been formulated in a manner that appropriately recognizes the multiple discrete-continuous nature of land-use patterns in the aggregated spatial units. Further, these models typically do not adequately consider population characteristics of spatial units in explaining land-use patterns within that unit.
The second type of models, usually referred to as process-based models and considered by economists, is based on explicitly modeling landowners’ decisions of land-use type choice for their parcels. The most important aspect of these process-based models is that they explicitly consider the human element in land-use modeling; that is, landowner decisions (regarding the type of land-use to invest their parcel in), as influenced by a suite of economic, biophysical, accessibility, and policy variables, are acknowledged as the fundamental drivers of land-use patterns. The emphasis is on using the land-owner as the unit of analysis, rather than a piece of land. To elucidate, landowners are considered as economic agents who make forward-looking inter-temporal land use decisions based on profit-maximizing behavior regarding the conversion of a parcel of land to some other economically viable land use (for example, see Capozza and Li, 1994 and Towe et al., 2008). The stream of returns from converting a parcel from the current land-use to some other land-use is weighed against the costs entailed in the conversion from the current land-use to some other land-use. The premise then is that the land use at any time will correspond to the land use type with the highest present discounted sum of future net returns (stream of returns minus the cost of conversion). Such process-based models allow for the analysis of a rich set of policy scenarios, by enabling the modeling of individual-level behavioral changes to exogenously imposed policy scenarios. However, in addition to difficulties associated with incorporating spatial considerations at this micro-level, the data and computing demands can be very high when using process-based models, especially when the analysis is being conducted at the level of entire urban regions or states in a country (see Kaza et al., 2012). 
Further, individual landowners may not have carte blanche authority to develop their land any way they want to, because of the presence of land-use and zoning regulations. Besides, multiple parcels in very close proximity tend to get similarly developed, because multiple parcels can be under the purview of a single decision-making agent such as a county board or a community board (see McMillen and McDonald, 1991; Mayer and Somerville, 2000; Munroe et al., 2005).
The third type of models, referred to as spatial-based models, puts emphasis on spatial dependence among spatial units (in pattern-based models) or among landowners (in process-based models), as caused by diffusion effects, or zoning and land-use regulation effects, or social interaction effects, or observed and unobserved location-related influences (see Jones and Bullen, 1994 and Miller, 1999). Indeed, as expressed by Tobler’s (1970) first law of geography, “everything is related to everything else, but close things more so”. While some of these proximity-based spatial effects may be accommodated through the appropriate construction of spatial variables (such as accessibility to city centers and market places), there will inevitably be unobserved spatial variables (such as, say, neighborhood soil quality or attitudes/politics) that will create unobserved dependencies in land-use patterns of proximally located spatial units. Several different spatial formulations have been considered in land-use modeling to accommodate such spatial dependencies, though the two most dominant remain the spatial lag and spatial error formulations. Of these, the spatial lag structure is more appealing.2 The spatial lag formulation also generates spatial heteroscedasticity. In addition to the spatial lag-based and resulting heteroscedasticity effects just discussed, it is also likely that there is spatial heterogeneity (i.e., differences in relationships between the dependent variable of interest and the independent variables across decision-makers or spatial units in a study region; see Fotheringham and Brunsdon, 1999; Bhat and Zhao, 2002; Bhat and Guo, 2004). Thus, it behooves the analyst to accommodate local variations (i.e., recognize spatial non-stationarity) in the relationship across a study region rather than settle for a single global relationship.
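The statement that a spatial lag structure generates heteroscedasticity can be checked with a small numerical sketch. It uses a generic linear spatial lag model, y = δWy + ε with independent unit-variance errors, rather than the MDCP model of this paper, so that y = Sε with S = (I − δW)^(-1) and Var(y_i) equals the sum of squares of row i of S. The one-dimensional coordinates, the inverse-distance weights, the value δ = 0.5, and all function names are assumptions chosen for illustration.

```python
def row_normalized_inverse_distance(coords):
    """Build a row-normalized inverse-distance weight matrix with a zero diagonal."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = 1.0 / abs(coords[i] - coords[j])
        total = sum(W[i])
        W[i] = [w / total for w in W[i]]
    return W

def matmul(A, B):
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def spatial_multiplier(W, delta, n_terms=200):
    """Approximate S = (I - delta * W)^(-1) by iterating S <- I + delta * W * S.

    This is the Neumann series, which converges for |delta| < 1 when W is
    row-normalized.
    """
    n = len(W)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    S = [row[:] for row in identity]
    for _ in range(n_terms):
        WS = matmul(W, S)
        S = [[identity[i][j] + delta * WS[i][j] for j in range(n)] for i in range(n)]
    return S

def implied_variances(S):
    """Var(y_i) = sum_j S_ij^2 when the errors are independent with unit variance."""
    return [sum(s * s for s in row) for row in S]

# Three equally spaced units on a line: the middle unit has closer neighbors.
W = row_normalized_inverse_distance([0.0, 1.0, 2.0])
S = spatial_multiplier(W, delta=0.5)
variances = implied_variances(S)
```

Even with homoscedastic errors, the middle unit ends up with a larger variance than the two end units, because the variance of each unit depends on its position in the weight matrix.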
In the current study, we adopt an aggregate spatial unit of analysis of a quarter-of-a-mile square grid cell to study land-use over an entire urban region of Austin, Texas, with each grid having the “option” of investing (and converting) from one package of land-uses to another alternative package of land-uses. In doing so, some grids can invest entirely in a single land-use. The grid-level land-use is obtained by aggregating underlying parcel-level land-use information. However, we supplement this pattern-based modeling view with a process-based modeling view. Specifically, while the clear linkage between parcels and their human landowners in typical process-based models is admittedly not present, we consider a rich set of population demographics of the citizenry of each aggregate grid to approximate a collective decision-making process for that grid. In addition, the land-use in a grid may also be determined by community or county boards through zoning regulations. Besides, by using a grid size that is not too aggregate, we retain some of the process-based model characteristics of having a connection between the spatial unit of analysis and human decision-makers. But since there is no clear label possible for the decision-maker of a grid, we will use the terminology of the “grid” both as a spatial unit of analysis as well as the decision-maker for the spatial unit of analysis. The hybrid model just discussed is further enhanced by considering all the spatial analysis aspects considered in spatial-based models. Thus, while Kaza et al. (2012) also consider a hybrid land-use model based on Bhat’s (2008) MDCEV model, we consider the important spatial issues of dependence and heterogeneity due to unobserved as well as observed factors, as well as the resulting spatial heteroscedasticity, in our modeling approach. We also accommodate a general covariance matrix for the utilities of grid investments in the land-use categories. 
In accommodating all these effects, we adopt an MDCP model rather than the MDCEV model, since it is next to impossible to incorporate global spatial issues within the MDCEV structure when dealing with even a moderate number of spatial units.
2. MODELING METHODOLOGY
We derive the spatial MDCP model in the empirical context of the type and intensity of land-use over a grid, though the same formulation can be used in the many other multiple discrete-continuous contexts identified in the section entitled “The Econometric Context”. Also, in the discussion in this section, we will assume that each grid has the potential to invest in all possible land-uses. The case when some grids cannot be developed for specific land-use purposes (say, due to zoning or hazard mitigation restrictions) poses no complications whatsoever, since the only change in such a case is that the dimensionality of the integration in the likelihood contribution changes from one grid to the next. The next section presents the set-up for the aspatial MDCP model in a way that makes it convenient to extend to the spatial MDCP set-up discussed in the subsequent section.
The Aspatial MDCP Model
Let $q$ be the index for grids and let $k$ ($k = 1, 2, \ldots, K$) be the index for land use types. In the empirical context of this paper, the alternative land use types include (1) residential land-use (including single family, duplexes, three/four-plexes, apartments, condominiums, mobile homes, group quarters, and retirement housing), (2) commercial land-use (including commercial, office, hospitals, government services, educational services, cultural services, and parking), (3) industrial land-use (including manufacturing, warehousing, resource extraction (mining), landfills, and miscellaneous industrial), and (4) undeveloped land-use (including open and undeveloped spaces, preserves, parks, golf courses, and agricultural open spaces). The last among these alternatives serves as an “essential outside good” in that all grid cells inevitably will have at least some of their land area that remains undeveloped.3
Following Bhat (2008), consider a vector $\mathbf{x}_q$ of dimension $K \times 1$ with elements $x_{qk}$, where $x_{qk}$ is a specific value of land-use investment in land-use type k in grid q. To determine grid q’s optimal allocation of its land area among the K alternative land-uses (that is, to obtain the optimal values $x_{qk}^{*}$), we consider the presence of a baseline utility $\psi_{qk}$ ($\psi_{qk} > 0$) for each land-use type k and grid q that represents the marginal utility at the point of zero parcel area under land use k in grid q. The idea is that the first unit of land in grid q will be assigned to the land-use type k that has the highest baseline utility. But the marginal utility accrued from investing in any land-use type k is assumed to decrease as the grid is invested more and more in the land-use type k. Thus, at some point, an additional unit of investment in land-use type k may provide less marginal value than the first unit of investment in some other land-use type k’, at which point there is investment in land-use type k’. Assume this process plays out until all the land area is allocated among the many land-uses. In that case, the marginal utility of the consumed alternatives at the optimal point will be equal, and will be greater than the baseline marginal utility of the non-consumed alternatives. This conceptualization forms the basis for the MDCP model. Formally, consider the following utility-maximizing function subject to the binding land area constraint:
$$\max \; U_q(\mathbf{x}_q) = \sum_{k=1}^{K-1} \gamma_k \psi_{qk} \ln\!\left(\frac{x_{qk}}{\gamma_k} + 1\right) + \psi_{qK} \ln x_{qK} \quad \text{s.t.} \quad \sum_{k=1}^{K} x_{qk} = E_q, \qquad (1)$$
where $E_q$ denotes the total land area of grid q and the utility function $U_q(\mathbf{x}_q)$ is quasi-concave, increasing and continuously differentiable.4 $\gamma_k$ is a parameter associated with land-use type k. The utility function form in Equation (1) allows corner solutions (i.e., zero consumptions) for the land-use alternatives 1 through $K-1$ through the parameters $\gamma_k$ (as long as $\gamma_k > 0$). On the other hand, the functional form for the final land-use alternative (the undeveloped land-use alternative) ensures that some land in each grid is in an undeveloped state (this can be easily proved given the specific utility structure used; see Bhat, 2008). The parameter $\gamma_k$ for the first $K-1$ land-use types also contributes to the rate at which the marginal utility of investing in land-use type k decreases with increasing investment in the land-use type, which can be seen by computing the marginal utility of consumption with respect to good $k$: $\partial U_q(\mathbf{x}_q)/\partial x_{qk} = \psi_{qk}\left(x_{qk}/\gamma_k + 1\right)^{-1}$. As long as $\gamma_k > 0$, the marginal utility decreases with increasing $x_{qk}$, as required. Further, if $\gamma_k$ increases, the rate of the marginal utility decrease itself decreases.
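The allocation logic described above can be illustrated with a small deterministic sketch. It assumes the utility form $\sum_{k=1}^{K-1} \gamma_k \psi_k \ln(x_k/\gamma_k + 1) + \psi_K \ln x_K$ with a binding total-area constraint, which is our reading of the structure in Bhat (2008), and it treats the baseline utilities as known numbers rather than random quantities. Under that assumption, the KKT conditions give $x_k = \gamma_k(\psi_k/\lambda - 1)$ when $\psi_k > \lambda$ and $x_k = 0$ otherwise, with $x_K = \psi_K/\lambda$, so the problem reduces to finding the multiplier $\lambda$ at which total allocated area equals the available area. The function name and the example numbers are hypothetical.

```python
def allocate_land(psi_inside, gamma, psi_outside, total_area):
    """Solve the KKT conditions of the assumed utility form by bisection on lambda.

    psi_inside  : baseline utilities of the K-1 "inside" land uses
    gamma       : translation parameters (all > 0) of the inside land uses
    psi_outside : baseline utility of the outside (undeveloped) land use
    total_area  : land area of the grid, fully allocated at the optimum
    """
    def demand(lam):
        inside = [g * max(p / lam - 1.0, 0.0) for p, g in zip(psi_inside, gamma)]
        return inside, psi_outside / lam

    # Total demand is decreasing in lambda; these bounds bracket the solution.
    lo = psi_outside / total_area
    hi = (psi_outside + sum(p * g for p, g in zip(psi_inside, gamma))) / total_area
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        inside, outside = demand(mid)
        if sum(inside) + outside > total_area:
            lo = mid  # too much land demanded: the multiplier must rise
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    inside, outside = demand(lam)
    return inside, outside, lam

# Hypothetical grid: three developed land uses plus the undeveloped outside good.
inside, outside, lam = allocate_land(
    psi_inside=[3.0, 1.5, 0.2], gamma=[1.0, 1.0, 1.0],
    psi_outside=1.0, total_area=4.0,
)
```

In this example the third land use has a baseline utility below the equilibrium multiplier and so receives no land (a corner solution), while the marginal utilities of the two developed land uses that do receive land both equal the multiplier, matching the verbal description above.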
To complete the model structure, the baseline utility, which has to be non-negative, is parameterized as follows for each alternative:
$$\psi_{qk} = \exp\!\left(\boldsymbol{\beta}_q' \mathbf{z}_{qk} + \xi_{qk}\right), \qquad (2)$$
where $\mathbf{z}_{qk}$ is a D-dimensional vector of attributes that characterizes land-use type k and grid q (including a dummy variable for each alternative except the last outside alternative, to capture intrinsic preferences for each alternative relative to the last alternative), $\boldsymbol{\beta}_q$ is a grid-specific vector of coefficients (of dimension $D \times 1$), and $\xi_{qk}$ captures the idiosyncratic (unobserved) characteristics that impact the baseline utility of land-use type k and grid q. We assume that the error term vector $\boldsymbol{\xi}_q = (\xi_{q1}, \xi_{q2}, \ldots, \xi_{qK})'$ is multivariate normally distributed for a given grid $q$: $\boldsymbol{\xi}_q \sim MVN_K(\mathbf{0}_K, \boldsymbol{\Lambda})$, where $MVN_K(\mathbf{0}_K, \boldsymbol{\Lambda})$ indicates a K-variate normal distribution with a mean vector of zeroes denoted by $\mathbf{0}_K$ and a covariance matrix $\boldsymbol{\Lambda}$. Further, to allow heterogeneity in responsiveness to exogenous variables across grids (i.e., spatial heterogeneity), we consider $\boldsymbol{\beta}_q$ as a realization from a multivariate normal distribution with mean vector