Data and Design of the Experiments
The activity diary data used in this study were collected in the municipalities of Hendrik-Ido-Ambacht and Zwijndrecht in the Netherlands (South Rotterdam region) to develop the Albatross model system (Arentze & Timmermans, 2004). The data consist of full activity diaries, meaning that both in-home and out-of-home activities were reported. The sample covered all seven days of the week, but individual respondents were requested to complete the diaries for two designated consecutive days. For each successive activity, respondents were asked to report the nature of the activity, the day, the start and end time, the location where the activity took place, the transport mode, the travel time, accompanying individuals and whether the activity was planned or not. A pre-coded scheme was used for activity reporting. After cleaning, a random sample of 1649 respondents was used in the experiments.
There are some general variables that are used for each choice facet of the Albatross model (i.e. each oval box). These include (among others) household and person characteristics that might be relevant for the segmentation of the sample. Each dimension also has its own extensive list of more specific variables, which are not described here in detail.
Design of the Experiments
The aim of this study is to examine both the predictive capabilities and the potential advantages of the BNT classifier. To this end, the predictive performance of this integration technique is compared with a decision tree learning algorithm (CHAID) and with original Bayesian network learning.
For the CHAID decision tree approach, experiments were conducted for the full set of decision agents of the Albatross system. First, decision trees were extracted from the activity-travel diaries. Subsequently, these decision trees were converted into decision tables. Finally, the decision tables were executed sequentially to predict the activity-travel patterns for the randomly selected sample of 1649 respondents.
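The conversion of a decision tree into a decision table, and the execution of that table, can be illustrated with a minimal sketch. The tree below, its attributes (`weekend`, `employed`) and its actions are purely hypothetical and are not taken from the Albatross system; for simplicity, each leaf is assumed to hold a single action rather than a distribution over choice alternatives.

```python
def tree_to_table(node, conditions=()):
    """Flatten a tree into decision-table rows: one (conditions, action) row per leaf."""
    if "action" in node:  # leaf node: emit one row with the conditions collected so far
        return [(dict(conditions), node["action"])]
    rows = []
    for value, child in node["branches"].items():
        rows.extend(tree_to_table(child, conditions + ((node["attribute"], value),)))
    return rows


def execute(table, case):
    """Return the action of the first row whose conditions all hold for the case."""
    for conditions, action in table:
        if all(case.get(attr) == value for attr, value in conditions.items()):
            return action
    return None  # no row matched


# Hypothetical toy tree (illustrative only, not an Albatross decision agent).
toy_tree = {
    "attribute": "weekend",
    "branches": {
        "yes": {"action": "leisure"},
        "no": {
            "attribute": "employed",
            "branches": {
                "yes": {"action": "work"},
                "no": {"action": "shopping"},
            },
        },
    },
}

table = tree_to_table(toy_tree)
prediction = execute(table, {"weekend": "no", "employed": "yes"})
```

Each root-to-leaf path becomes one row of the table, so the table has as many rows as the tree has leaves. Executing the tables of successive decision agents one after another would then amount to calling `execute` once per agent, with the outcome of earlier agents available as attributes of the case.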
For the Bayesian network approach, a Bayesian network was constructed for every decision agent using the structural learning algorithm developed by Cheng et al. (1997). The structure of the network was therefore not imposed on the basis of a priori domain knowledge, but was learned from the data. The structural learning algorithm was enhanced by adding a pruning stage, which aims to reduce the size of the network without a significant loss of relevant information or of accuracy on unseen test data. The aim is to find a favourable trade-off between the size of the network and its predictive accuracy, since excessive pruning ('overpruning') will damage the final accuracy results. Nodes that are not valuable for decision-making are thus pruned away. To decide which nodes in the network are suitable for pruning, the reduction in entropy between two nodes was calculated using Eq. (1), shown in Section 5. A large entropy reduction indicates a potentially important and useful node in the network. Nodes with an entropy reduction of less than 0.05 bits were pruned from the network. Once the pruned network has been constructed for every decision agent, the model can be used for prediction. To this end, the probability distributions of all variables in the networks have to be computed; a parameter learning algorithm developed by Lauritzen (1995) was used for this purpose. The last step is to transform the predictive model into the decision table formalism.
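The pruning criterion can be illustrated with a short sketch. It assumes that the entropy reduction of Eq. (1) is the standard quantity H(T) - H(T | X), i.e. the mutual information between a target node T and a candidate node X, estimated from observed frequencies; Eq. (1) itself is given in Section 5 and may differ in detail. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_reduction(candidate, target):
    """Estimate H(target) - H(target | candidate) in bits from paired observations."""
    n = len(target)
    h_target = entropy(Counter(target).values())
    # Group the target values by the value taken by the candidate node.
    groups = {}
    for x, t in zip(candidate, target):
        groups.setdefault(x, Counter())[t] += 1
    h_conditional = sum(
        sum(g.values()) / n * entropy(g.values()) for g in groups.values()
    )
    return h_target - h_conditional


def nodes_to_keep(nodes, target, threshold=0.05):
    """Keep nodes whose entropy reduction reaches the threshold (0.05 bits in the text)."""
    return [
        name for name, values in nodes.items()
        if entropy_reduction(values, target) >= threshold
    ]


# Hypothetical toy data: 'informative' determines the target, 'noise' is independent of it.
target = ["a", "b", "a", "b"] * 25
nodes = {
    "informative": [0, 1, 0, 1] * 25,
    "noise": [0, 0, 1, 1] * 25,
}
kept = nodes_to_keep(nodes, target)
```

In this toy example the node `informative` reduces the entropy of the target by one full bit and is kept, whereas `noise` yields no reduction and falls below the 0.05-bit threshold, so it would be pruned.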
To build the BNT classifier, a decision tree is derived not directly from the original data, but from the Bayesian networks that were built in the previous step. The procedure for doing this was explained in Section 5. Once again, the decision tables are then executed sequentially to predict activity-travel patterns.
In the next section, we report the results of detailed quantitative analyses that were conducted to evaluate the BNT classifier for every decision agent in the Albatross model. The results of the three alternative approaches are validated in terms of accuracy percentages. The techniques are compared at both the activity pattern level and the trip level.