The nine crop categories with a sufficient number of fields to be included in this study were alfalfa, almonds and pistachios, corn, cotton, other deciduous crops, other truck crops, subtropical, tomatoes, and vines. Other crops grown in the area but not studied included cucurbits, grains, pasture, safflower, and sugar beet. Because these “other” crops are not similar in structure or phenology, we did not attempt to group them into a combined category for classification. In agricultural regions where the less frequent crops are more similar to one another, adding an “other crops” class could be an appropriate way to decrease error. However, the field counts and total area for each crop show that the “other” crops together accounted for less than 1% of the total validated area in each of the three years, so we assume that the error introduced by omitting these crops from the classification and the crop area calculations is low.
The random forest classifier is an ensemble classification and regression technique that grows many decision trees, each trained on a bootstrap sample of the training data (drawn with replacement) and each split at every node on the best of a randomly selected subset of the input variables; the trees then vote, and the most popular class is assigned. Random forest was chosen for this study because of its computational efficiency and proven high performance. Five hundred trees were grown using 150,000 training pixels from nine crop classes with 172 spectral variables. To minimize accuracy inflation due to spatial autocorrelation, we randomly assigned whole fields from each year to either training or validation: 70% of fields were used as training data, while the remaining 30% were set aside for independent validation.
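The field-level 70/30 split can be sketched with Python's standard library. This is an illustration under stated assumptions: field identifiers are hashable and sortable, the split is repeated independently for each year, and the function name and seed are hypothetical. The key point is that whole fields, not individual pixels, are assigned to one set.

```python
import random

def split_fields(field_ids, train_frac=0.7, seed=0):
    """Randomly assign whole fields to training or validation (hypothetical helper).

    Splitting at the field level (rather than the pixel level) keeps pixels
    from one field out of both sets, limiting accuracy inflation from
    spatial autocorrelation.
    """
    ids = sorted(set(field_ids))   # deduplicate and sort so the seed is reproducible
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(train_frac * len(ids))
    return set(ids[:n_train]), set(ids[n_train:])


# Hypothetical field identifiers for one year of the validation layer.
train_fields, valid_fields = split_fields(range(100))
print(len(train_fields), len(valid_fields))  # 70 30
```

Running the same function once per year, with each year's field list, mirrors the per-year selection described above.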
From the training fields, 50,000 pixels per year were randomly selected from those containing ≥50% green vegetation (GV) and then combined, creating a training set of 150,000 pixels across the three dates.
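A minimal sketch of the per-year sampling step, assuming each pixel is a `(year, field_id, gv_fraction, label)` tuple and that training fields are identified by `(year, field_id)` pairs; the function name and the small sample sizes are hypothetical (the study drew 50,000 pixels per year). Plain random sampling of the eligible pool approximates that pool's class proportions, which is the intent of the representative sampling described in the text.

```python
import random

def sample_training_pixels(pixels, train_fields, n_per_year, gv_min=0.5, seed=0):
    """Draw n_per_year eligible pixels from each year and pool them.

    pixels: iterable of (year, field_id, gv_fraction, label) tuples.
    train_fields: set of (year, field_id) pairs assigned to training.
    A pixel is eligible if its field is a training field and gv >= gv_min.
    random.sample raises ValueError if a year has fewer than n_per_year
    eligible pixels.
    """
    rng = random.Random(seed)
    eligible = {}
    for px in pixels:
        year, field_id, gv, _label = px
        if (year, field_id) in train_fields and gv >= gv_min:
            eligible.setdefault(year, []).append(px)
    pooled = []
    for year in sorted(eligible):
        # Simple random sampling without replacement within each year.
        pooled.extend(rng.sample(eligible[year], n_per_year))
    return pooled


# Synthetic example: 3 years x 20 fields x 50 pixels, 14 training fields per year.
gen = random.Random(1)
years = (2013, 2014, 2015)
pixels = [(y, f, gen.random(), "corn" if f % 2 else "alfalfa")
          for y in years for f in range(20) for _ in range(50)]
train_fields = {(y, f) for y in years for f in range(14)}
training = sample_training_pixels(pixels, train_fields, n_per_year=100)
print(len(training))  # 300
```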
As suggested by Millard and Richardson, pixels were randomly sampled to create a training dataset representative of the true class proportions within the study area. A single random forest was generated from these 150,000 pixels and used to classify each of the three images. A 50% GV threshold was used as the cutoff for selecting pixels for training and validation in order to maximize accuracy while retaining validation data for the least frequent categories, particularly the tomato, cotton, and other truck crop classes, which had the fewest training fields. To determine the threshold, 10 trial random forests were run to estimate the classification accuracy of each crop class at GV thresholds ranging from 10% to 100% in 10% increments. Each run used 10,000 randomly selected pixels and 500 trees. We found that while most crop classes increased in accuracy as the threshold increased, the tomato class declined in accuracy above the 50% threshold because of a substantially reduced training sample, and no tomato training or validation data were available in our study area at GV thresholds ≥80%. Therefore, to include tomatoes in our classification, we chose a 50% GV threshold. Additionally, mixed pixels may not be spectrally similar to the pure pixels of the classes they contain, so training a classifier only on pure pixels can increase error in areas with a high proportion of mixed pixels. To this end, we aimed to choose a threshold that would capture the diversity within each crop category; a high GV threshold could be restrictive because it might exclude younger crops or certain species within each class. The 50% threshold was therefore a compromise between attaining high accuracy in all classes and fairly representing the diversity within each crop class.
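The threshold trial hinges on how many pixels of each class survive a given GV cutoff. The sketch below is a simplified illustration that does not reproduce the 10 trial random forests; it only tabulates eligible pixels per class at each threshold. With the hypothetical pixels shown, tomato has no eligible pixels at 80% GV, mirroring the situation described above. The function name and data are hypothetical.

```python
from collections import Counter

def eligible_counts(pixels, thresholds):
    """Count pixels per class that pass each green-vegetation (GV) threshold.

    pixels: iterable of (gv_fraction, label) pairs.
    Returns {threshold: Counter(label -> count)}; a class missing from a
    Counter has no eligible pixels at that threshold.
    """
    pixels = list(pixels)
    return {t: Counter(label for gv, label in pixels if gv >= t)
            for t in thresholds}


thresholds = [i / 10 for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0

# Hypothetical pixels: tomato canopies never exceed 75% GV here.
pixels = [(0.75, "tomato"), (0.55, "tomato"), (0.30, "tomato"),
          (0.95, "alfalfa"), (0.60, "alfalfa")]
counts = eligible_counts(pixels, thresholds)
for t in (0.5, 0.8):
    print(t, counts[t]["tomato"], counts[t]["alfalfa"])
# prints "0.5 2 2" then "0.8 0 1"
```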
Training a classifier on mixed pixels for agricultural applications has been shown to yield accuracy similar to training on pure pixels. For each year, a pixel-level classification was generated using the multi-year random forest, and independent validation of that classification was conducted using the 30% of fields that were not used for training.
From the validation fields, 10,000 pixels containing at least 50% green vegetation were randomly chosen from each image, for a total of 30,000 validation pixels over the three dates. Because multi-cropped and interplanted fields were excluded from training and validation, each field was assumed to grow only one crop type. Therefore, to improve the classifier results for analyzing changes in crop area, a majority filter was applied to the random forest classification, reclassifying every pixel of a field as the crop category to which the plurality of that field's pixels had been assigned. For example, if 10% of the pixels in a field were classified as tomato, 30% as alfalfa, and 60% as corn, all of the pixels in that field were reclassified as corn.
Only fields that met a minimum green vegetation threshold were included in the field-level reclassification, in order to remove fallow fields from the analysis. Two field-level GV thresholds, 25% and 50%, were tested to assess the impact of a field-level threshold on accuracy. Final crop planting assessments used the 50% field-level threshold to increase the accuracy of the analysis.
2.2.5. Accuracy Assessments
Three classification accuracies were computed and are discussed below. The first is the pixel-level out-of-bag (OOB) error calculated by the random forest. The OOB error is estimated through bootstrapping: each training pixel is predicted using only the trees of the random forest whose bootstrap samples did not include that pixel. The second is a pixel-level independent validation using classified pixels that were not used to train the random forest and did not come from any field that contributed pixels to the training set. The third is a field-level accuracy based on the majority reclassification of pixels in each field.
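The field-level majority filter described above (a plurality vote within each field, after dropping fields below the field-level GV threshold) can be sketched in plain Python. This is a minimal sketch assuming per-pixel predictions, a field identifier for each pixel, and a precomputed GV fraction per field; the function and argument names are hypothetical. Ties go to the class encountered first, which is an assumption because the text does not say how ties were handled.

```python
from collections import Counter, defaultdict

def majority_filter(pixel_labels, pixel_fields, field_gv, gv_min=0.5):
    """Assign each field the plurality class of its classified pixels.

    pixel_labels: predicted class for each pixel.
    pixel_fields: field id for each pixel, in the same order.
    field_gv: {field_id: fraction of green vegetation in the field}.
    Fields with GV below gv_min (e.g. fallow fields) are dropped.
    Returns {field_id: class}; every pixel in a field takes that class.
    """
    votes = defaultdict(Counter)
    for label, field in zip(pixel_labels, pixel_fields):
        votes[field][label] += 1
    # most_common(1) breaks ties by first insertion (assumed tie rule).
    return {field: counts.most_common(1)[0][0]
            for field, counts in votes.items()
            if field_gv.get(field, 0.0) >= gv_min}


# The worked example from the text: 10% tomato, 30% alfalfa, 60% corn.
labels = ["tomato"] + ["alfalfa"] * 3 + ["corn"] * 6 + ["alfalfa"] * 4
fields = ["A"] * 10 + ["B"] * 4
field_gv = {"A": 0.8, "B": 0.2}   # hypothetical: field B is nearly fallow
result = majority_filter(labels, fields, field_gv)
print(result)  # {'A': 'corn'}
```

Passing `gv_min=0.25` instead reproduces the looser of the two field-level thresholds that were compared.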
To assess the benefit of a 224-band spectrometer such as AVIRIS over more commonly available multispectral sensors, a random forest classifier was also run on simulated Landsat Operational Land Imager (OLI) and simulated Sentinel-2B data for accuracy comparison. AVIRIS images from all three dates were spectrally convolved to Landsat OLI bands 1–7 and 9 and to all bands of Sentinel-2B.
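The text does not state how the spectral convolution was done. One common approach, shown here as an assumption, weights each hyperspectral band by the broadband sensor's relative spectral response at that band's center wavelength and takes the weighted mean. For brevity the sketch approximates each broad band's response with a Gaussian defined by a center and full width at half maximum (FWHM); the 655/37 nm and 865/28 nm values are only approximate stand-ins for the OLI red and near-infrared bands, and a real simulation would use the published OLI and Sentinel-2B response tables instead. All names and the 10 nm wavelength grid are hypothetical.

```python
import math

def gaussian_srf(wavelength_nm, center_nm, fwhm_nm):
    """Gaussian stand-in for a broadband sensor's relative spectral response."""
    sigma = fwhm_nm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * ((wavelength_nm - center_nm) / sigma) ** 2)

def convolve_to_band(reflectance, wavelengths_nm, center_nm, fwhm_nm):
    """Response-weighted mean of hyperspectral reflectance for one broad band.

    Treats each hyperspectral band as a point sample at its center wavelength.
    """
    weights = [gaussian_srf(w, center_nm, fwhm_nm) for w in wavelengths_nm]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no hyperspectral bands fall under this broad band")
    return sum(w * r for w, r in zip(weights, reflectance)) / total


# Illustrative 10 nm sampling from 400 to 2500 nm (not the real AVIRIS band set).
wavelengths = [400 + 10 * i for i in range(211)]
# Hypothetical step spectrum: dark below 700 nm, bright above (a crude red edge).
spectrum = [0.05 if w < 700 else 0.45 for w in wavelengths]
red = convolve_to_band(spectrum, wavelengths, center_nm=655, fwhm_nm=37)
nir = convolve_to_band(spectrum, wavelengths, center_nm=865, fwhm_nm=28)
print(round(red, 3), round(nir, 3))
```

The red value lands slightly above 0.05 because the Gaussian tail reaches past 700 nm; a tabulated response function would confine that leakage to the band's true limits.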
The spatial resolution was kept constant at 18 m. The simulated images were then classified with random forest using 500 trees, the same nine crop categories, and the same 150,000 training pixels used for the AVIRIS classification. OOB accuracy was computed at both the pixel level and the field level for analysis.
Using random forest and majority-filtered reclassification with a 50% GV threshold, predictive crop maps were generated for each year. After random forest was run on each AVIRIS image, a multi-year field polygon layer was used to identify individual fields for majority reclassification. Classified fields were therefore not limited to the fields in a given year's validation layer, but could include any field in the study area containing 50% or more green vegetation, whether or not it was part of the validation layer. From these maps, crop area was assessed to analyze changes in cropping patterns within the study area over the course of the drought. We then used the maps to evaluate the hypothesis that higher-value perennial crops were prioritized during the drought by comparing water use, economic value, and crop lifespan against the change in planted area.
Independent validation produced an overall accuracy of 89.6% across all years and crops, around 4% lower than the OOB accuracy. This decline is likely because the independent validation accounted for spatial autocorrelation, which probably inflated the OOB results: the OOB assessment drew its validation pixels from the same pool of pixels, and therefore the same fields, used to train the classifier, whereas the independent validation relied on 30,000 pixels randomly selected from polygons separate from those used in training.
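The OOB mechanism can be made concrete with a short simulation. In this minimal sketch (names and sizes hypothetical), each tree draws a bootstrap sample, and each sample's OOB prediction would use only the trees that never saw it; about 37% of samples are out of bag for any one tree. Because those OOB pixels still come from the same training fields as the in-bag pixels, their spatially autocorrelated neighbors are usually in the bag, which is the inflation mechanism discussed above. If scikit-learn were used, `RandomForestClassifier(n_estimators=500, oob_score=True)` reports the OOB accuracy as the fitted attribute `oob_score_`; the text does not say which implementation the authors used.

```python
import random

def oob_trees(n_samples, n_trees, seed=0):
    """For each sample, list the trees whose bootstrap draw left it out.

    Each tree draws n_samples indices with replacement (a bootstrap sample).
    A sample's OOB prediction would use only the trees in its list.
    """
    rng = random.Random(seed)
    oob = [[] for _ in range(n_samples)]
    for tree in range(n_trees):
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        for i in range(n_samples):
            if i not in in_bag:
                oob[i].append(tree)
    return oob


oob = oob_trees(n_samples=1000, n_trees=500)
oob_fraction = sum(len(trees) for trees in oob) / (1000 * 500)
print(round(oob_fraction, 3))  # close to (1 - 1/1000) ** 1000, about 0.368
```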
These 30,000 pixels made up only 1.6% of the 1.87 million potential validation pixels (those with GV greater than 50% that were not used in the random forest). The tree crop categories (almond/pistachio, other deciduous, and subtropical) had the most consistent accuracies, with overall accuracies changing by less than 4% between years. Cotton and other truck crops varied more in accuracy from year to year than the other crop categories; this inconsistency may be due in part to these two classes having two of the three smallest numbers of training pixels in the random forest.
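The per-class results are reported as user's and producer's accuracies. User's accuracy divides the correct pixels of a class by all pixels mapped as that class (a row total); producer's accuracy divides them by all reference pixels of that class (a column total). A class whose user's accuracy is below its producer's accuracy has more pixels mapped to it than exist in the reference data (overmapped), and the reverse indicates undermapping. An NA value arises when a denominator is zero; that this is the cause of the NA entries reported here is an assumption. The sketch below uses a hypothetical matrix, not the study's data, and a hypothetical function name.

```python
def users_producers_accuracy(confusion, classes):
    """Per-class user's and producer's accuracy from a confusion matrix.

    confusion[i][j]: validation pixels of reference class j that were
    mapped as class i (rows = classified map, columns = reference data).
    Returns {class: (users, producers)}; None marks an undefined (NA) value,
    which occurs when no pixels were mapped as, or truly belong to, a class.
    """
    n = len(classes)
    result = {}
    for k, name in enumerate(classes):
        mapped = sum(confusion[k])                          # row total
        reference = sum(confusion[i][k] for i in range(n))  # column total
        correct = confusion[k][k]
        users = correct / mapped if mapped else None
        producers = correct / reference if reference else None
        result[name] = (users, producers)
    return result


# Hypothetical three-class matrix (not the study's data).
classes = ["almond/pistachio", "cotton", "other truck"]
confusion = [
    [90, 8, 5],   # mapped as almond/pistachio
    [2, 12, 0],   # mapped as cotton
    [0, 0, 0],    # nothing mapped as other truck -> user's accuracy is NA
]
acc = users_producers_accuracy(confusion, classes)
for name, (ua, pa) in acc.items():
    print(name, ua, pa)
```

In this toy matrix almond/pistachio is overmapped (user's below producer's accuracy) and cotton is undermapped, the same pattern the text reports for tree and non-tree classes.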
Accuracy for other truck crops in 2015 could not be computed because the user's accuracy for that class was undefined (NA) that year. Accuracy was also NA for tomato fields in 2013 because no tomato pixels were identified in the validation layer for that year. Despite this omission, tomatoes were included in the study because they are a major crop group in the area with a sufficient number of training and validation polygons in 2014 and 2015. Table 2.5 details the errors associated with each crop type. Alfalfa was classified with high accuracies, near or above 90%. Almond and pistachio trees showed consistently high user's and producer's accuracies of 94.0% and 97.2%, respectively. The random forest was more likely to misclassify other crop classes as almond and pistachio than to misclassify almond and pistachio trees as another class. This was likely because almond and pistachio pixels were very prevalent in the study area, producing a large number of randomly sampled training pixels and leading the classifier to favor this class over less frequent classes. Other deciduous crops and subtropical crops were most often confused with each other or with almond and pistachio, indicating that tree crops tended to be misclassified as another tree crop. Importantly, the three tree crop categories tended to be overmapped, while the six non-tree classes were all undermapped by the classifier. Of these, cotton and other truck crops were the most undermapped, with producer's accuracies of 49.8% and 56.7%, respectively. Overall, classes that were more prevalent and had more validation data tended to have higher accuracy than infrequent classes.
2.3.1.3. Field-Level Validation after Majority Filter
The final majority-filtered reclassification, which assigned a single crop type to each field, had the highest accuracy, at 94.4%.
Overall accuracy is computed as the percentage of all validation fields that were correctly classified using random forest and the majority filter. The higher field-level accuracy supports the assumption that a majority filter would smooth out stray pixels that fall between crop rows or that capture weeds or other plant matter growing near the crop, which can cause classification confusion. When the majority-filtered results were assessed by year, seven of the nine classes had accuracies over 80% in all three years; other truck crops and cotton were the exceptions. An important finding is that, with a 50% GV threshold at the field level, field-level accuracy exceeded pixel-level accuracy in most cases. Across all three years combined, the only accuracy that decreased from the pixel to the field level was the user's accuracy of almond and pistachio orchards: because this was the most common crop category, field-level assessment increased its overmapping. All other user's and producer's accuracies increased. When accuracies were separated by year and class, 22 of the 27 class-year combinations improved from the pixel to the field level. The field-level accuracies in Tables 3 and 6 were computed using a 50% GV threshold, meaning that only fields with at least 50% green vegetation were included in the accuracy assessment.
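The field-level overall accuracy defined above, restricted to fields that meet the GV threshold, reduces to a simple ratio. This minimal sketch assumes majority-filtered predictions and reference crops are available as dictionaries keyed by field; the function name and field data are hypothetical.

```python
def field_overall_accuracy(predicted, reference, field_gv, gv_min=0.5):
    """Percentage of eligible validation fields classified correctly.

    predicted: {field_id: majority-filtered class}.
    reference: {field_id: reference crop from the validation layer}.
    field_gv: {field_id: fraction of green vegetation in the field}.
    Only fields with GV >= gv_min enter the assessment.
    """
    eligible = [f for f in reference if field_gv.get(f, 0.0) >= gv_min]
    if not eligible:
        raise ValueError("no validation fields meet the GV threshold")
    correct = sum(1 for f in eligible if predicted.get(f) == reference[f])
    return 100.0 * correct / len(eligible)


# Hypothetical fields; D falls below the 50% GV threshold and is excluded.
reference = {"A": "corn", "B": "alfalfa", "C": "tomato", "D": "cotton"}
predicted = {"A": "corn", "B": "alfalfa", "C": "corn", "D": "cotton"}
field_gv = {"A": 0.8, "B": 0.6, "C": 0.55, "D": 0.3}
print(round(field_overall_accuracy(predicted, reference, field_gv), 1))  # 66.7
```

Lowering `gv_min` to 0.25 admits field D and changes the result, which is the kind of sensitivity the comparison of 25% and 50% field-level thresholds was meant to expose.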