Why are polygons important

Calculated estimates for user-created areas

ArcGIS Business Analyst Web App leverages the GeoEnrichment service, which uses the concept of study area to define the location of the point or area that you want to enrich with additional information. When one or more points are entered as a study area, the service creates a circular buffer of one mile around the point or points to collect and append enrichment data. Optionally, you can change the size of the ring buffer or create travel time catchment areas around a point.
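As a rough illustration of a point study area with a ring buffer, the sketch below only builds request parameters and does not call the service. The parameter names (`studyAreas`, `studyAreasOptions`, `areaType`, `bufferUnits`, `bufferRadii`) are assumptions based on the GeoEnrichment REST API and should be checked against the current API reference; the helper name and coordinates are hypothetical.

```python
import json


def build_study_area_params(x, y, radius_miles=1.0):
    """Build form parameters for one point study area with a ring buffer.

    Assumption: parameter names follow the GeoEnrichment REST API;
    verify them against the official reference before use.
    """
    study_areas = [{"geometry": {"x": x, "y": y}}]
    options = {
        "areaType": "RingBuffer",      # ring buffer around the point
        "bufferUnits": "esriMiles",    # radius expressed in miles
        "bufferRadii": [radius_miles], # one mile mirrors the default described above
    }
    return {
        "studyAreas": json.dumps(study_areas),
        "studyAreasOptions": json.dumps(options),
        "f": "json",
    }


# Hypothetical point (longitude, latitude) used only for illustration.
params = build_study_area_params(-117.1956, 34.0572)
```

Changing `radius_miles` corresponds to resizing the ring buffer. Travel time catchment areas would need different options, which are not shown here.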

The GeoEnrichment service uses a fine-grained geographic query methodology to aggregate data for rings and other polygons. A geographic query methodology determines how data is collected and summarized or aggregated for input features. For standard geographic units, such as federal states, cantons, districts, or postal codes, the link between a specified area and its attribute data is a simple one-to-one relationship. For example, if an input layer for a sales area contains a selection of postal codes, the data query is simply a matter of collecting the data from those areas.

Data mapping method

The data mapping method assigns block group data to user-defined areas. It examines where the population is located within a block group and determines what proportion of the block group's population overlaps the user-defined area. This method is used in the United States and, in similar form, in Canada. Population data for census blocks, a more detailed geographic level than block groups, is used to determine how the population is distributed within a block group. If the centroid of a block falls inside the user-defined area, that block's total population is counted when weighting the block group data. The geographic distribution of the population at the census block level determines the proportion of the census block group data that is assigned to the user-defined areas, as shown in the example.

Note:

Depending on the data, households, housing units, or businesses are used as the weight at the block group level. Using block centroids is advantageous because it accounts for the possibility that the population is not evenly distributed geographically throughout the block group.
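The weighting described in this section can be sketched as follows. This is a minimal illustration, not the service's implementation: the `Block` type, the function name, and the sample numbers are hypothetical, and whether a block centroid lies inside the user-defined area is assumed to have been determined beforehand.

```python
from dataclasses import dataclass


@dataclass
class Block:
    weight: int            # block population (or households, housing units, businesses)
    centroid_inside: bool  # True if the block centroid falls inside the user-defined area


def apportion_block_group(value, blocks):
    """Assign a share of a block group value to the user-defined area.

    The share equals the weight of the blocks whose centroids fall inside
    the area divided by the weight of all blocks in the block group.
    """
    total_weight = sum(b.weight for b in blocks)
    if total_weight == 0:
        return 0.0  # nothing to weight by, so nothing is assigned
    inside_weight = sum(b.weight for b in blocks if b.centroid_inside)
    return value * inside_weight / total_weight


# Hypothetical block group with three blocks and a value of 2000.
blocks = [Block(400, True), Block(100, False), Block(500, True)]
estimate = apportion_block_group(2000.0, blocks)
```

In this example, 900 of the 1,000 people live in blocks whose centroids are inside the area, so 90 percent of the block group value is assigned to it.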

How the data breakdown works

ArcGIS Business Analyst Web App uses a data breakdown algorithm to distribute demographic, business, economic, and landscape variables on input polygon features. The algorithm analyzes the individual polygons to be enriched relative to a point dataset and a detailed dataset of reporting unit polygons with attributes for the selected variables. Depending on how a polygon to be enriched overlays these datasets, the algorithm determines the appropriate amount to be assigned to each variable.

Depending on the country in which the enrichment polygon is located, the granular point dataset represents one of the following options:

  • Census block points: USA and Canada only. These points are initially generated as centroids of the most detailed census tabulation areas in these countries: census blocks in the USA and dissemination areas in Canada. In some cases, Esri has moved these points from primarily industrial or other non-residential areas to residential areas. Each point contains attributes for the number of people and households in the corresponding tabulation area.
  • Settlement points: Esri generates settlement points for most countries using a probabilistic settlement model based on Landsat 8 imagery and street intersections. Street intersections are particularly helpful where buildings cannot be seen because of dense forest cover. Settlement points are initially created as a dasymetric raster surface and therefore do not reference locations where people cannot live or that are uninhabited. This raster surface is generated with a resolution of 75 meters, which is roughly the size of a city block. The model assigns a settlement likelihood score to each cell or point, which indicates the likelihood of people living there.
  • Address-based settlement points: Switzerland and Netherlands only. Some countries directly publish the points of the residential addresses of their citizens. Esri aggregates the number of these address points on a grid with a resolution of 75 meters and converts the result into a point dataset that corresponds to settlement points.
  • Settlement points from building footprints: only for AIS Group data for Spain. The number of centroids of residential building footprints is summed on a grid with a resolution of 75 meters to generate a dataset of settlement points.
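The aggregation of address points or building footprint centroids onto a 75-meter grid can be sketched as below. This is only an illustration under stated assumptions: coordinates are taken to be in a projected system with meter units, the grid is anchored at the origin, and the function name is hypothetical.

```python
import math
from collections import Counter

CELL_SIZE_M = 75.0  # grid resolution named in the text


def aggregate_to_grid(points, cell_size=CELL_SIZE_M):
    """Count input points per grid cell.

    Returns one (center_x, center_y, count) tuple per non-empty cell,
    which plays the role of a settlement point with a count attribute.
    """
    counts = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )
    return [
        ((col + 0.5) * cell_size, (row + 0.5) * cell_size, n)
        for (col, row), n in sorted(counts.items())
    ]


# Hypothetical address points in meters.
addresses = [(10.0, 10.0), (70.0, 20.0), (80.0, 20.0), (160.0, 150.0)]
settlement_points = aggregate_to_grid(addresses)
```

The first two addresses share a cell, so the result contains three points, one of which carries a count of 2.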

The GeoEnrichment service uses geographies at the highest level of detail with the latest census data or reliable estimates available for commercial use by each country. For example, for 2018, the South Africa dataset has 85,483 features, the Hungary dataset has 3,177 features, and the Japan dataset has 217,201 features.

Most countries update the data every two years, but some countries update it annually because the data is readily available. Esri distributes the updates on a quarterly basis throughout the year. Country data represent the latest estimates available. In general, the data that Esri publishes represents demographic and economic conditions nine months prior to the publication date.

Breakdown method

The following illustration shows the relationship between the purple polygon to be enriched, the dark blue settlement points, and the gray-outlined detailed statistical polygons on which the enrichment is based. To enrich the purple ring with total population:

  1. Select the statistical polygons that are entirely within the ring polygon. These polygons are displayed in white. Calculate the sum of the total population variable for these polygons.
  2. Select the statistical polygons that partially intersect the ring polygon. They are shown in light green. For each of these polygons, do the following:
    1. Select all of the dark blue settlement points that are in it. Using the total population variable from the statistical polygon and the sum of the settlement likelihood scores, determine the ratio of people per settlement score unit.
    2. Calculate the sum of the settlement likelihood scores only for the points inside the purple ring and derive the number of people represented by these points.

      The dark blue settlement points represent two types of information. First, there is a grid of points arranged at regular 75-meter intervals, generated as described above. Second, because some reporting units are small enough to fall between the points of the 75-meter grid, the centroids of these units are added so that these areas are not left out.
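The steps above can be sketched in a few lines. This is a simplified illustration rather than the service's code: the input structure, field names, and sample values are hypothetical, and the spatial tests (polygon inside or partially intersecting the ring, point inside the ring) are assumed to have been done already.

```python
def enrich_total_population(polygons):
    """Estimate the total population of a ring from statistical polygons.

    Each polygon is a dict with:
      'population': total population of the statistical polygon
      'relation':   'inside' (entirely within the ring) or 'partial'
      'points':     list of (settlement_score, in_ring) tuples
    """
    total = 0.0
    for poly in polygons:
        if poly["relation"] == "inside":
            # Step 1: polygons entirely within the ring count in full.
            total += poly["population"]
        elif poly["relation"] == "partial":
            # Step 2.1: people per unit of settlement score in this polygon.
            score_sum = sum(score for score, _ in poly["points"])
            if score_sum == 0:
                continue  # no settlement points, nothing can be assigned
            people_per_score = poly["population"] / score_sum
            # Step 2.2: scores of the points inside the ring, converted to people.
            score_in_ring = sum(s for s, in_ring in poly["points"] if in_ring)
            total += people_per_score * score_in_ring
    return total


# Hypothetical data: one polygon fully inside, one partially intersecting.
polygons = [
    {"population": 1200, "relation": "inside", "points": []},
    {
        "population": 800,
        "relation": "partial",
        "points": [(0.5, True), (0.25, True), (0.25, False)],
    },
]
total_population = enrich_total_population(polygons)
```

Here the partial polygon contributes 600 of its 800 people because three quarters of its settlement score lies inside the ring, giving 1,800 in total.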

Variations in the breakdown method

The above description applies to most countries. For the USA and Canada, however, a simplified process applies because the points already carry an attribute with the resident population. Accordingly, determining the total population only requires summing the population attribute for the points within the enrichment polygon. The values of other variables are determined based on previously calculated means or proportions of the population or summaries.
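A minimal sketch of this simplified variant follows. The function name, the sample points, and the proportion used for the second variable are hypothetical, and how the service actually derives other variables is not specified beyond the description above.

```python
def sum_population_in_polygon(block_points):
    """Sum the population attribute of the points inside the enrichment polygon.

    block_points: list of (population, inside_polygon) tuples.
    """
    return sum(pop for pop, inside in block_points if inside)


# Hypothetical block points: (population attribute, inside the polygon?).
block_points = [(120, True), (80, True), (200, False)]
total_population = sum_population_in_polygon(block_points)

# Assumption for illustration: another variable is derived from a
# previously calculated proportion of the population.
share_of_population = 0.25
other_variable = share_of_population * total_population
```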

The information above describes the standard breakdown method, which is called BlockApportionment in the ArcGIS REST API for the GeoEnrichment service. If the service detects a very large polygon, it uses a faster and less computationally intensive method called CentroidsInPolygon. The name of the method that was used can be read from the metadata of the enrichment results.

With the CentroidsInPolygon method, coarser geographies and their centroids are used as the basis for the breakdown. For example, in the USA, US Census Bureau census tract boundaries are used instead of block group polygons, and tract centroids are used instead of block points. As the size of the polygons increases, the CentroidsInPolygon method uses progressively coarser polygon geographies and their centroids. The thresholds are based on buffer diameters:

  • The US uses the following diameter and polygon / point datasets:
    • Block groups and block points at 0 to 504 miles.
    • Census tracts and block points based on generalization level 2 at 505 to 786 miles.
    • Census tracts and block points based on generalization level 3 at 787 to 866 miles.
    • Census tracts and block points based on generalization level 4 at 867 to 954 miles.
    • Counties and block points based on generalization level 5 for even larger areas.
  • A global list of all breakdown settings can be found in this table.
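The US thresholds listed above amount to a simple lookup by buffer diameter, sketched below. The function and constant names are hypothetical. The list gives whole-mile ranges, so treating each upper bound as inclusive for fractional diameters is an assumption, and the sketch does not model the upper limit beyond which polygons are not processed.

```python
# (upper bound in miles, polygon dataset, point dataset), taken from the list above.
# "census tracts" is the US Census Bureau unit between block groups and counties.
US_THRESHOLDS = [
    (504, "block groups", "block points"),
    (786, "census tracts", "block points, generalization level 2"),
    (866, "census tracts", "block points, generalization level 3"),
    (954, "census tracts", "block points, generalization level 4"),
]
US_FALLBACK = ("counties", "block points, generalization level 5")


def select_us_datasets(diameter_miles):
    """Return the (polygon dataset, point dataset) pair for a buffer diameter."""
    if diameter_miles < 0:
        raise ValueError("diameter must not be negative")
    for upper_bound, polygon_dataset, point_dataset in US_THRESHOLDS:
        if diameter_miles <= upper_bound:  # assumption: upper bound is inclusive
            return polygon_dataset, point_dataset
    return US_FALLBACK


small = select_us_datasets(10)
medium = select_us_datasets(800)
large = select_us_datasets(1200)
```

Other countries use their own settings, so a lookup like this would need one table per country.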

It is therefore important to separate large polygons from smaller polygons when managing data as inputs for the "Enrich Layer" tools.

The differences between the results of the various methods are usually very small, but the coarser geographies may not produce optimal results. The GeoEnrichment service has a detailedAggregationMethod property that overrides the standard behavior described above. However, if detailedAggregationMethod is specified, only one large polygon can be enriched at a time.

Note:

Extremely large polygons that exceed the size limits described above for the CentroidsInPolygon method are not processed. A warning is issued for them.

In contrast, for tiny polygons (that is, polygons smaller than a city block in an urban area, or smaller than a square kilometer or square mile in a rural area), results may not be generated because the polygon does not intersect or contain any settlement points or census block points.

Additional considerations

One of the most common questions about GeoEnrichment is: How reliable are the results? Ultimately, it depends on the data available for each country. This varies even within most countries, depending on whether the area to be enriched is densely or less densely populated.

Each country has two reliability scores. The potential range is between 1.0 (best rating) and 5.0 (worst rating), but no country is rated at either of these extremes. The most important score affecting the reliability of the data breakdown is the ratio between the area of the population polygon and the estimated number of people who live there. A large polygon with only a few people reduces the likelihood that settlement points coincide with inhabited locations. In most countries there is a mix of circumstances. A good example is Saudi Arabia, where the cities are represented by many polygons that each cover a small area, while the vast desert areas are represented by only a few polygons.

A second reliability score relates to overall reliability. This value combines the ratio between the polygon area and the population with an assessment of the reliability of the respective country's census data and the complexity of the settlement footprint. The reliability of the census data takes into account the age of the last official census, the collection method and completeness of that census, and the other estimates and surveys used to derive the current estimate. The complexity of the footprint is important because the settlement points are created from a grid model that is sensitive to lower settlement probability values at the edges of settlements. The reason for this is resampling.

