7. MODELS AND RESULTS

Once our dataset is assembled, we train a variety of classification algorithms, including logistic regressions, support vector machines, and random forests. Here we present the results of these models, used both to find optimal parameters and to measure the value of different kinds of features.

By varying the number of years in the training period, we determined that approximately three years of training data is optimal (see Figure 10). Note that the training period determines which blood samples are seen by the model, but all training examples include spatio-temporal features that draw on the entire history (blood tests and inspections) of an address or tract.

By fitting the same model on an increasing set of features, we can observe the value added by those features. Figure 11 shows that as we refine the spatial scale of our features, the model improves dramatically, with address-level features (building age, condition, and history of lead poisoning and inspections) being especially important. We can also categorize features as they were presented in Section 5; Figure 12 shows that the spatial and spatio-temporal aggregations are very important.

We use l1-penalized logistic regression (inverse regularization coefficient C = .001) for feature selection. We examine the most important features as measured by the magnitude of their (normalized) coefficients. Figures 13 and 14 show the features with positive and negative coefficients, i.e. those corresponding to increased and reduced risk of lead poisoning, respectively. See the captions for descriptions of the features.
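The feature-selection step described above can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's actual pipeline: the features, sample sizes, and labels here are invented, and we use a weaker penalty (C = 0.1) than the paper's C = .001, which on this small toy problem would shrink every coefficient to zero. The idea is the same: fit an l1-penalized logistic regression on normalized features and rank the surviving features by the magnitude (and sign) of their coefficients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the assembled dataset: 1000 examples, 20 features,
# where only the first three features carry signal (sparse ground truth).
rng = np.random.default_rng(0)
n, p = 1000, 20
X = rng.normal(size=(n, p))
logits = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.8 * X[:, 2]
y = (logits + rng.normal(size=n) > 0).astype(int)

# Normalize so that coefficient magnitudes are comparable across features.
X_std = StandardScaler().fit_transform(X)

# The l1 penalty drives uninformative coefficients to exactly zero,
# leaving a sparse set of selected features.
model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
model.fit(X_std, y)

coefs = model.coef_.ravel()
selected = np.flatnonzero(coefs)
# Rank selected features by absolute coefficient; the sign indicates
# whether a feature corresponds to increased (+) or reduced (-) risk.
ranked = selected[np.argsort(-np.abs(coefs[selected]))]
print("selected features:", ranked.tolist())
print("signed coefficients:", np.round(coefs[ranked], 3).tolist())
```

On this synthetic data the three signal-carrying features dominate the ranking, with signs matching the generating weights, which is exactly the interpretation the coefficient figures rely on.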