maup package

Submodules

Module contents

exception maup.AssigmentWarning

Bases: UserWarning

Warning raised when some source geometries are not assigned to any target.

class maup.IndexedGeometries(geometries)

Bases: object

assign(targets)
covered_by(container)
enumerate_intersections(targets)
intersections(geometry)
query(geometry)

maup.adjacencies(geometries, adjacency_type='rook', output_type='geoseries', *, warn_for_overlaps=True, warn_for_islands=True)

Returns adjacencies between geometries. The default return type is a GeoSeries with a MultiIndex, whose (i, j)th entry is the pairwise intersection between geometry i and geometry j. We ensure that i < j always holds, so that any adjacency is represented just once in the series. If output_type == “geodataframe”, the return type is a range-indexed GeoDataFrame with a “neighbors” column containing the pair (i,j) for the geometry consisting of the intersection between geometry i and geometry j.
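To illustrate the difference between rook and queen adjacency and the i < j indexing convention, here is a minimal pure-Python sketch on unit grid cells. It is not how maup computes adjacencies (maup works on arbitrary geometries and returns their intersections); the helper `grid_adjacencies` and the cell layout are hypothetical.

```python
from itertools import combinations


def grid_adjacencies(cells, adjacency_type="rook"):
    """Return (i, j) pairs with i < j for adjacent unit grid cells.

    `cells` is a list of (column, row) integer positions of unit squares.
    Rook adjacency requires a shared edge; queen adjacency also accepts
    a shared corner.
    """
    pairs = []
    # combinations() yields each unordered pair once, with i < j.
    for i, j in combinations(range(len(cells)), 2):
        dx = abs(cells[i][0] - cells[j][0])
        dy = abs(cells[i][1] - cells[j][1])
        shares_edge = dx + dy == 1
        shares_corner = dx == 1 and dy == 1
        if shares_edge or (adjacency_type == "queen" and shares_corner):
            pairs.append((i, j))
    return pairs


# A 2x2 block of cells: 0 and 3 touch only at a corner, as do 1 and 2.
cells = [(0, 0), (1, 0), (0, 1), (1, 1)]
rook_pairs = grid_adjacencies(cells, "rook")
queen_pairs = grid_adjacencies(cells, "queen")
```

Because every pair is stored with i < j, each adjacency appears exactly once, which mirrors the convention described above.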

maup.assign(sources, targets)

Assign source geometries to targets. A source is assigned to the target that covers it, or, if no target covers the entire source, the target that covers the most of its area.

maup.close_gaps(geometries, relative_threshold=0.1, force_polygons=False)

Closes gaps between geometries by assigning the hole to the polygon that shares the greatest perimeter with the hole.

If the area of the gap is greater than relative_threshold times the area of the polygon, then the gap is left alone. The default value of relative_threshold is 0.1. This is intended to preserve intentional gaps while closing the tiny gaps that can occur as artifacts of geospatial operations. Set relative_threshold=None to attempt to close all gaps. Due to floating-point precision issues, some gaps may remain unclosed.

The optional force_polygons argument automatically filters out non-polygonal fragments during the operation.
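The selection rule described above can be sketched in plain Python. This is an illustration of the stated rule only, not maup's implementation; the function `choose_gap_owner` and its input format are hypothetical, and it assumes "the polygon" in the threshold test means the polygon sharing the greatest perimeter with the gap.

```python
def choose_gap_owner(gap_area, neighbors, relative_threshold=0.1):
    """Return the label of the polygon that should absorb a gap, or None.

    `neighbors` maps a polygon label to a
    (perimeter_shared_with_gap, polygon_area) tuple.
    """
    # Candidate owner: the polygon sharing the greatest perimeter with the gap.
    owner = max(neighbors, key=lambda label: neighbors[label][0])
    polygon_area = neighbors[owner][1]
    # Leave the gap alone if it is large relative to that polygon.
    if relative_threshold is not None and gap_area > relative_threshold * polygon_area:
        return None
    return owner


neighbors = {"A": (3.0, 100.0), "B": (1.0, 50.0)}
small_gap_owner = choose_gap_owner(2.0, neighbors)          # 2 <= 0.1 * 100
large_gap_owner = choose_gap_owner(20.0, neighbors)         # 20 > 0.1 * 100
forced_owner = choose_gap_owner(20.0, neighbors, None)      # threshold disabled
```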

maup.crop_to(source, target)

Crops the source geometries to the target geometries.

maup.doctor(source, target=None, silent=False, accept_holes=False)

Detects quality issues in a given set of source and target geometries. Quality issues include overlaps, gaps, invalid geometries, non-perfect tiling, and sources and targets that do not entirely overlap. If maup.doctor() returns True, votes should not be lost when prorating or assigning (beyond a few due to rounding, etc.). Passing a target to doctor is optional.

If silent is True, then print outputs are suppressed. (Default is silent = False.)

If accept_holes is True, then holes alone do not cause doctor to return a value of False. (Default is accept_holes = False.)

maup.expand_to(source, target, force_polygons=False)

Expands the source geometries to the target geometries.

maup.intersections(sources, targets, output_type='geoseries', area_cutoff=None)

Computes all of the nonempty intersections between two sets of geometries. By default, the returned geopandas.GeoSeries will have a MultiIndex, where the geometry at index (i, j) is the intersection of sources[i] and targets[j] (if it is not empty). If output_type == “geodataframe”, the return type is a range-indexed GeoDataFrame with “source” and “target” columns containing the indices i and j, respectively, for the intersection of sources[i] and targets[j].

Parameters:
  • sources (GeoSeries or GeoDataFrame) – geometries

  • targets (GeoSeries or GeoDataFrame) – geometries

  • area_cutoff – (optional) if provided, only return intersections with area greater than area_cutoff

Return type: GeoSeries
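The (i, j) indexing and the effect of area_cutoff can be sketched with axis-aligned rectangles in plain Python. This is a simplified illustration, not maup's implementation: the helpers below are hypothetical, rectangles are (minx, miny, maxx, maxy) tuples, and contact along an edge or at a point is treated as empty.

```python
def rect_intersection(a, b):
    """Intersect two rectangles given as (minx, miny, maxx, maxy)."""
    minx, miny = max(a[0], b[0]), max(a[1], b[1])
    maxx, maxy = min(a[2], b[2]), min(a[3], b[3])
    if minx >= maxx or miny >= maxy:
        return None  # no overlap with positive area
    return (minx, miny, maxx, maxy)


def rect_area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def rect_intersections(sources, targets, area_cutoff=None):
    """Map (i, j) to the overlap of sources[i] and targets[j], if nonempty."""
    result = {}
    for i, source in enumerate(sources):
        for j, target in enumerate(targets):
            overlap = rect_intersection(source, target)
            if overlap is None:
                continue
            # Keep only overlaps strictly larger than the cutoff, if given.
            if area_cutoff is not None and rect_area(overlap) <= area_cutoff:
                continue
            result[(i, j)] = overlap
    return result


sources = [(0, 0, 2, 1), (2, 0, 4, 1)]
targets = [(0, 0, 3, 1), (3, 0, 4, 1)]
all_pieces = rect_intersections(sources, targets)
big_pieces = rect_intersections(sources, targets, area_cutoff=1.5)
```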

maup.normalize(weights, level=0)

Takes a series of MultiIndexed weights and normalizes them with respect to one level (level 0 by default).
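As a rough illustration of normalizing with respect to one index level, the sketch below uses a plain dictionary keyed by (i, j) tuples in place of a MultiIndexed series. The function `normalize_weights` is hypothetical, and it assumes the weights in each group have a nonzero sum.

```python
from collections import defaultdict


def normalize_weights(weights, level=0):
    """Scale weights so that entries sharing the same key[level] sum to 1."""
    totals = defaultdict(float)
    for key, value in weights.items():
        totals[key[level]] += value
    return {key: value / totals[key[level]] for key, value in weights.items()}


# Keys are (source, target) pairs; values could be intersection areas.
weights = {(0, 0): 2.0, (1, 0): 1.0, (1, 1): 1.0}
by_source = normalize_weights(weights, level=0)
by_target = normalize_weights(weights, level=1)
```

With level=0, the weights of each source sum to 1, which is the form needed to split a source's data across the targets it intersects.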

maup.prorate(relationship, data, weights, aggregate_by='sum')

Prorate data from one set of geometries to another, using their maup.intersections() or an assignment.

Parameters:
  • relationship – the intersections() of the geometries you are getting data from (sources) and the geometries you are moving the data to; or, a series assigning sources to targets

  • data (pandas.Series or pandas.DataFrame) – the data you want to move (must be indexed the same as the source geometries)

  • weights (pandas.Series) – the weights to use when prorating from sources to intersections

  • aggregate_by (function) – (optional) the function to use for aggregating from intersections to targets. The default is "sum".
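The arithmetic behind prorating with sum aggregation can be sketched in plain Python. This is an illustration under stated assumptions rather than maup's implementation: `prorate_sum` is a hypothetical helper, the weights are keyed by (source, target) pairs, and each source's weights are assumed to already sum to 1.

```python
def prorate_sum(data, weights):
    """Split each source value across targets by weight, then sum per target.

    `data` maps source -> value; `weights` maps (source, target) -> share.
    """
    totals = {}
    for (source, target), share in weights.items():
        # The piece of this source that falls in this target.
        piece = data[source] * share
        totals[target] = totals.get(target, 0.0) + piece
    return totals


data = {0: 100, 1: 60}
weights = {(0, 0): 1.0, (1, 0): 0.5, (1, 1): 0.5}
prorated = prorate_sum(data, weights)
```

Because each source's shares sum to 1, the grand total is preserved: 160 units go in and 160 come out.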

maup.quick_repair(geometries, relative_threshold=0.1, force_polygons=False)

New name for the autorepair function from maup 1.x. Uses simplistic algorithms to repair most gaps and overlaps.

The default relative_threshold is 0.1. This default is chosen to include tiny overlaps that can be safely auto-fixed while preserving major overlaps that might indicate deeper issues and should be handled on a case-by-case basis. Set relative_threshold=None to attempt to resolve all overlaps. See resolve_overlaps() and close_gaps() for more.

For a more careful repair that takes adjacencies and higher-order overlaps between geometries into account, consider using smart_repair instead.

maup.resolve_overlaps(geometries, relative_threshold=0.1, force_polygons=False)

For any pair of overlapping geometries, assigns the overlapping area to the geometry that shares the greatest perimeter with the overlap. Returns the GeoSeries of geometries, which will have no overlaps.

If the ratio of the overlap’s area to either of the overlapping geometries’ areas is greater than relative_threshold, then the overlap is ignored. The default relative_threshold is 0.1. This default is chosen to include tiny overlaps that can be safely auto-fixed while preserving major overlaps that might indicate deeper issues and should be handled on a case-by-case basis. Set relative_threshold=None to attempt to resolve all overlaps. Due to floating-point precision issues, some overlaps may remain unresolved.

The optional force_polygons argument automatically filters out non-polygonal fragments during the operation.
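The threshold test for a single pair of overlapping geometries can be written out as a small predicate. This is only a sketch of the rule as stated above, not maup's code; `overlap_is_resolved` is a hypothetical name and the areas are assumed to be positive.

```python
def overlap_is_resolved(overlap_area, area_a, area_b, relative_threshold=0.1):
    """Return True if the overlap would be resolved rather than ignored."""
    if relative_threshold is None:
        return True  # no threshold: attempt to resolve every overlap
    # The overlap is ignored if it is too large relative to EITHER geometry,
    # so the larger of the two ratios decides.
    largest_ratio = max(overlap_area / area_a, overlap_area / area_b)
    return largest_ratio <= relative_threshold


tiny = overlap_is_resolved(1.0, 100.0, 50.0)         # ratios 0.01 and 0.02
major = overlap_is_resolved(8.0, 100.0, 50.0)        # ratio 0.16 vs. area_b
forced = overlap_is_resolved(8.0, 100.0, 50.0, None)
```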

maup.smart_repair(geometries_df, snapped=True, snap_precision=9, fill_gaps=True, fill_gaps_threshold=0.1, disconnection_threshold=0.0001, nest_within_regions=None, min_rook_length=None)

Repairs topology issues (overlaps, gaps, invalid polygons) in a geopandas GeoDataFrame or GeoSeries, with an emphasis on preserving intended adjacency relations between geometries as closely as possible.

Specifically, the algorithm:

  1. Applies shapely.make_valid to all polygon geometries.

  2. If snapped = True (default), snaps all polygon vertices to a grid of size no more than 10^(-snap_precision) times the max of width/height of the entire extent of the input. HIGHLY RECOMMENDED to avoid topological exceptions due to rounding errors. Default value for snap_precision is 9; if topological exceptions still occur, try reducing snap_precision (which must be integer-valued) to 8 or 7.

  3. Resolves all overlaps.

  4. If fill_gaps = True (default), closes all simply connected gaps with area less than fill_gaps_threshold times the largest area of all geometries adjoining the gap. Default threshold is 10%; if fill_gaps_threshold = None then all simply connected gaps will be filled.

  5. If nest_within_regions is a secondary GeoDataFrame/GeoSeries of region boundaries (e.g., counties in a state) then all of the above will be performed so that repaired geometries nest cleanly into the region boundaries; each repaired geometry will be contained in the region with which the original geometry has the largest area of intersection. Default value is None.

  6. If min_rook_length is given a numerical value, replaces all rook adjacencies with length below this value with queen adjacencies. Note that this is an absolute value and not a relative value, so make sure that the value provided is in the correct units with respect to the input’s CRS. Default value is None.

  7. Sometimes the repair process creates tiny fragments that are disconnected from the district that they are assigned to. A final cleanup step assigns any such fragments to a neighboring geometry if their area is less than disconnection_threshold times the area of the largest connected component of their assigned geometry. Default threshold is 0.01%, and this seems to work well in practice.
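Two of the thresholds above are simple arithmetic and can be checked by hand. The sketch below is illustrative only: both helper names are hypothetical, the first returns the stated upper bound on the snapping grid size, and the second applies the fragment cleanup test as described.

```python
def max_snap_grid_size(bounds, snap_precision=9):
    """Upper bound on the snapping grid size for an extent.

    `bounds` is (minx, miny, maxx, maxy) for the whole input.
    """
    minx, miny, maxx, maxy = bounds
    extent = max(maxx - minx, maxy - miny)
    return extent * 10 ** (-snap_precision)


def fragment_is_reassigned(fragment_area, largest_component_area,
                           disconnection_threshold=0.0001):
    """True if a disconnected fragment is small enough to be reassigned."""
    return fragment_area < disconnection_threshold * largest_component_area


# An extent 2000 units wide and 1000 units tall, in CRS units.
grid_size = max_snap_grid_size((0, 0, 2000, 1000))
sliver = fragment_is_reassigned(0.5, 10_000.0)    # 0.5 < 1.0
island = fragment_is_reassigned(50.0, 10_000.0)   # 50.0 >= 1.0
```

Reducing snap_precision from 9 to 8 makes the grid ten times coarser, which is why it can help when topological exceptions persist.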

maup.snap_to_grid(geometries, n=-7)

Snap the geometries to a grid by rounding to the nearest 10^n. Helps to resolve floating point precision issues in shapefiles.
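Rounding to the nearest 10^n is the same as rounding to -n decimal places, which the following plain-Python sketch applies to bare coordinate pairs. The helpers are hypothetical and operate on tuples, not on the geometries that maup.snap_to_grid accepts.

```python
def snap_coordinate(value, n=-7):
    """Round a coordinate to the nearest multiple of 10**n."""
    # round(x, k) keeps k decimal places, so k = -n gives a 10**n grid.
    return round(value, -n)


def snap_points(points, n=-7):
    """Snap a list of (x, y) pairs to the 10**n grid."""
    return [(snap_coordinate(x, n), snap_coordinate(y, n)) for x, y in points]


snapped = snap_points([(1.23456789012, 2.00000004)])
coarse = snap_coordinate(123.4, n=1)  # nearest multiple of 10
```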