Plotting maps
Here are examples to plot geographic data using plotly and matplotlib. Matplotlib is probably the choice if you need a rendered image. Plotly creates interactive plots and has a more modern interface.
To handle the different geo-types returned by elasticsearch we first look at conversion utilities. Skip it if you just want to see pretty images.
Coordinate conversion
A metric aggregation like
geo_centroid
already returns
latitude and
longitude
values.
Bucket-aggregations like
geotile_grid
and
geohash_grid
return keys that
can be mapped to geo-coordinates.
map-tiles
The geotile_grid
aggregation
uses map-tiles
(wikipedia) as bucket
keys. They represent zoom
/x
/y
as seen below:
from elastipy import Search
s = Search(index="elastipy-example-car-accidents")
agg = s.agg_geotile_grid("tiles", field="location", precision=6)
agg.execute().to_dict()
{'6/33/21': 131436,
'6/34/21': 36158,
'6/33/20': 35218,
'6/33/22': 32519,
'6/34/22': 19237,
'6/34/20': 13802}
To convert the keys to geo-coordinates we can use a helper function in elastipy:
from elastipy import geotile_to_lat_lon
{
geotile_to_lat_lon(key): value
for key, value in agg.items()
}
{(50.736455137010644, 8.4375): 131436,
(50.736455137010644, 14.0625): 36158,
(54.162433968067795, 8.4375): 35218,
(47.04018214480665, 8.4375): 32519,
(47.04018214480665, 14.0625): 19237,
(54.162433968067795, 14.0625): 13802}
Becaue the tiles are actually areas the latitude and longitude just
represent a single point within the area. The point can be defined as
the offset
parameter and defaults to (.5, .5)
which is the
center of the tile.
Here we print the top-left and bottom-right coordinates for each map-tile:
for key, value in agg.items():
tl = geotile_to_lat_lon(key, offset=(0, 1))
bl = geotile_to_lat_lon(key, offset=(1, 0))
print(f"{tl} - {bl}: {value}")
(48.92249926375824, 5.625) - (52.48278022207821, 11.25): 131436
(48.92249926375824, 11.25) - (52.48278022207821, 16.875): 36158
(52.48278022207821, 5.625) - (55.77657301866769, 11.25): 35218
(45.08903556483103, 5.625) - (48.92249926375824, 11.25): 32519
(45.08903556483103, 11.25) - (48.92249926375824, 16.875): 19237
(52.48278022207821, 11.25) - (55.77657301866769, 16.875): 13802
geohash
The geohash_grid
aggregation
returns geohash
(wikipedia) bucket keys.
from elastipy import Search
s = Search(index="elastipy-example-car-accidents")
agg = s.agg_geohash_grid("tiles", field="location", precision=2)
agg.execute().to_dict()
{'u1': 113676, 'u0': 85497, 'u3': 41653, 'u2': 27544}
The pygeohash package can be used to translate them:
import pygeohash
{
pygeohash.decode(key): value
for key, value in agg.items()
}
{(53.0, 6.0): 113676,
(48.0, 6.0): 85497,
(53.0, 17.0): 41653,
(48.0, 17.0): 27544}
For convenience the pygeohash function is wrapped by
elastipy.geohash_to_lat_lon
.
plotly backend
The plotly python library enables creating browser-based plots in python. It supports a range of map plots. In particular the mapbox based plots are interesting because they use WebGL and render quite fast even for a large number of items.
geo-centroid
Let’s plot an overview of the german car accidents (included in elastipy examples).
s = Search(index="elastipy-example-car-accidents")
agg = s.agg_terms("city", field="city", size=10000)
agg = agg.metric_geo_centroid("location", field="location")
df = agg.execute().df()
print(f"{df.shape[0]} cities")
df.head()
8451 cities
city | city.doc_count | location.lat | location.lon | |
---|---|---|---|---|
0 | München | 4979 | 48.145224 | 11.558930 |
1 | Köln | 4562 | 50.940086 | 6.961585 |
2 | Frankfurt am Main | 2639 | 50.117909 | 8.653241 |
3 | Bremen | 2459 | 53.091255 | 8.800806 |
4 | Düsseldorf | 2390 | 51.224550 | 6.799716 |
The geo_centroid
aggregation
above returns the center coordinate of all accidents within a city.
(It’s not necessarily the center of the city but the
centroid of all accidents
that are assigned to the city.)
Below we pass the pandas DataFrame to the plotly express function and tell it the names of the latitude and longitude columns. The number of accidents per city is also used for the color and size of the points.
import plotly.express as px
fig = px.scatter_mapbox(
df,
lat="location.lat", lon="location.lon",
color="city.doc_count", opacity=.5, size="city.doc_count",
zoom=4.8,
mapbox_style="carto-positron",
hover_data=["city"],
labels={"city.doc_count": "number of accidents"},
)
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
The most amazing thing we should notice is that the federal state Mecklenburg-Vorpommern does not have any accidents! 🍀
density heatmap
The plotly express tools are just lovely ♥ ❤️ ♥ ❤️
fig = px.density_mapbox(
df,
lat="location.lat", lon="location.lon",
z="city.doc_count",
zoom=4.8,
mapbox_style="carto-positron",
hover_data=["city"],
labels={"city.doc_count": "number of accidents"},
)
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
geohash_grid aggregation
Below is the same data-set but aggregated with the
geohash_grid
aggregation.
import plotly.graph_objects as go
import plotly.express as px
from elastipy import geotile_to_lat_lon
s = Search(index="elastipy-example-car-accidents")
agg = s.agg_geotile_grid("location", field="location", precision=10, size=1000)
df = agg.execute().df()
# put lat and lon columns into dataframe
df[["lat", "lon"]] = list(df["location"].map(geotile_to_lat_lon))
print(df.head())
fig = px.scatter_mapbox(
df,
lat="lat", lon="lon",
color="location.doc_count", opacity=.5, size="location.doc_count",
mapbox_style="carto-positron",
zoom=5,
labels={"location.doc_count": "number of accidents"},
)
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
location location.doc_count lat lon
0 10/550/335 6468 52.589701 13.535156
1 10/540/330 5817 53.644638 10.019531
2 10/544/355 5021 48.107431 11.425781
3 10/549/335 4314 52.589701 13.183594
4 10/531/340 4242 51.508742 6.855469
geotile_grid aggregation
Let’s see if we can do something with the
geotile_grid aggregation
. The
lengthy function in the middle builds a list of lines connecting each
corner in each returned map-tile.
fillcolor
in mapbox can only be one fixed color
and does not support color scaling (like the
marker).import plotly.graph_objects as go
import plotly.colors
from elastipy import Search, geotile_to_lat_lon
s = Search(index="elastipy-example-car-accidents")
agg = s.agg_geotile_grid(
"location",
field="location", precision=8, size=1000,
)
agg.execute()
lat, lon = [], []
for key, value in agg.items():
tl = geotile_to_lat_lon(key, offset=(0, 1))
tr = geotile_to_lat_lon(key, offset=(1, 1))
bl = geotile_to_lat_lon(key, offset=(0, 0))
br = geotile_to_lat_lon(key, offset=(1, 0))
lat += [tl[0], tr[0], br[0], bl[0], tl[0], None]
lon += [tl[1], tr[1], br[1], bl[1], tl[1], None]
fig = go.Figure(go.Scattermapbox(
lat=lat, lon=lon,
fill="toself",
fillcolor="rgba(0,0,0,.1)",
))
fig.update_layout(
mapbox=dict(
style="carto-positron",
zoom=5,
center=dict(lat=51., lon=10.3),
),
margin={"r": 0, "t": 0, "l": 0, "b": 0},
)
matplotlib backend
Matplotlib does not come with specific geo functionality out-of-the-box. Instead a couple of additional libraries must be used.
geotile_grid aggregation
Here is an example using
geopandas. It extends the
pandas.DataFrame
with the
geopandas.GeoDataFrame
class.
The GeoDataFrame will pick the "geometry"
column from a DataFrame by
default. The values must be
shapely geometries.
from shapely.geometry import Point
import geopandas
import matplotlib.pyplot as plt
import matplotlib.colors
from elastipy import Search, geotile_to_lat_lon
s = Search(index="elastipy-example-car-accidents")
agg = s.agg_geotile_grid("location", field="location", precision=10)
df = agg.execute().df()
# take hash from location column,
# convert to latitude and longitude
# and create a shapely.Point
# (which expects longitude, latitude)
df["geometry"] = df.pop("location").map(
lambda v: Point(geotile_to_lat_lon(v)[::-1])
)
# have a color for each point with matplotlib tools
cmap = plt.cm.magma
norm = matplotlib.colors.Normalize(
df["location.doc_count"].min(), df["location.doc_count"].max()
)
df["color"] = df["location.doc_count"].map(lambda v: cmap(norm(v))[:3] + (.5,))
gdf = geopandas.GeoDataFrame(df)
fig, ax = plt.subplots(figsize=(10, 10))
# plot a shapefile from https://biogeo.ucdavis.edu/data/gadm3.6
geopandas.read_file("cache/gadm36_DEU_1.shp").plot(ax=ax, color="#e0e0e0")
gdf.plot(
c=gdf["color"], markersize=gdf["location.doc_count"] / 3,
aspect=1.3,
ax=ax,
)