Extra material: Shortest path problem in real life#

import subprocess
import sys


class ColabInstaller:
    def __init__(self):
        reqs = subprocess.check_output([sys.executable, "-m", "pip", "freeze"])
        self.installed_packages = [r.decode().split("==")[0] for r in reqs.split()]

    def on_colab(self):
        return "google.colab" in sys.modules

    def install(self, package):
        if self.on_colab():
            subprocess.check_call(
                [sys.executable, "-m", "pip", "install", "-q", package]
            )
            self.installed_packages.append(package)

    def upgrade(self, package):
        if self.on_colab():
            subprocess.check_call(
                [sys.executable, "-m", "pip", "install", "-q", "--upgrade", package]
            )


if "google.colab" in sys.modules:
    colab = ColabInstaller()
    colab.upgrade("numpy")
    colab.upgrade("matplotlib")
    colab.install("geopandas")
    colab.install("geopy")
    colab.install("osmnx")
    colab.install("osmnet")
    colab.install("pandana")
    colab.upgrade("geopy")

Introduction#

Google brought with maps the world to our screens, including accurate geocoding and routing for several modalities. For the most, the usage of maps is interactive. As data and analytics professionals we often need a programmatically support for the services that maps offer us, preferably free. It also offers a plethora of development support, but unfortunately most is paid. That is even more so for maps.

Some background information and history#

Geoff Boeing is a true leader in demystifying urban data analytics, with a strong emphasis on street networks. His peer reviewed publications are open and accompanied by usable demonstrations using his own OSMnx package. Professor Peter Sanders, see also his Wikipedia page, has moved his interests to other areas but his route planning project shaped the world of truly scalable road routing algorithms. From his alumni I distinguish two persons:

Finally, I mention Fletcher Foti who gave us pandana.

Geocoding and map visualization#

The world is mapped with the geographic coordinate system but we have difficulties remembering latitudes and longitudes. We learn and remember the world better from addresses.

import osmnx as ox
import osmnet as osm
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import pandas as pd
import geopy
import folium
import networkx as nx
from IPython.display import display
def FreeLocator():
    return geopy.Photon(user_agent="myGeocoder")


def locate_geopy(description):
    location = FreeLocator().geocode(description)
    if location is not None:
        return location.latitude, location.longitude
    return None, None


pd.options.display.float_format = "{:.6f}".format

data = {
    "address": [
        "Centraal Station",
        "VU Hoofdgebouw",
        "Amsterdam Business School",
        "University of Amsterdam",
        "Johan Cruijff ArenA",
        "Ziggo Dome",
    ],
    "color": ["blue", "black", "orange", "green", "red", "purple"],
}

df = pd.DataFrame(data)
df["city"] = "Amsterdam"
df["country"] = "NL"

locations = [
    locate_geopy(",".join(row[["address", "city", "country"]]))
    for _, row in df.iterrows()
]
df["lat"] = [loc[0] for loc in locations]
df["lon"] = [loc[1] for loc in locations]
display(df)

Amsterdam = FreeLocator().geocode("Amsterdam, NL")

Map = folium.Map(location=(Amsterdam.latitude, Amsterdam.longitude), zoom_start=13)

for _, row in df.iterrows():
    folium.Marker(
        (row.lat, row.lon), icon=folium.Icon(color=row.color), tooltip=row.address
    ).add_to(Map)
Map
address color city country lat lon
0 Centraal Station blue Amsterdam NL 52.378901 4.900581
1 VU Hoofdgebouw black Amsterdam NL 52.334180 4.866901
2 Amsterdam Business School orange Amsterdam NL 52.365107 4.911718
3 University of Amsterdam green Amsterdam NL 52.368133 4.889804
4 Johan Cruijff ArenA red Amsterdam NL 52.312896 4.942167
5 Ziggo Dome purple Amsterdam NL 52.313793 4.938057
Make this Notebook Trusted to load map: File -> Trust Notebook
ox.settings.log_console = True
ox.settings.use_cache = True
%%time 
G_walk = ox.graph_from_place('Amsterdam, NL', network_type='walk')
CPU times: user 19.9 s, sys: 389 ms, total: 20.2 s
Wall time: 20.2 s
print(G_walk.number_of_nodes(), G_walk.number_of_edges())
43178 121686
node_positions = {
    node: (data["x"], data["y"]) for node, data in G_walk.nodes(data=True)
}
nx.draw_networkx_nodes(G_walk, pos=node_positions, node_size=1)
<matplotlib.collections.PathCollection at 0x2d4faae00>
../../_images/cd89dd2caf4098839eb73a758b2b601285c1a1c7a26c6be24aea269dd4305cbf.png
df["osmnx"] = ox.distance.nearest_nodes(G_walk, df.lon, df.lat)
df
address color city country lat lon osmnx
0 Centraal Station blue Amsterdam NL 52.378901 4.900581 5629072001
1 VU Hoofdgebouw black Amsterdam NL 52.334180 4.866901 294648098
2 Amsterdam Business School orange Amsterdam NL 52.365107 4.911718 46356661
3 University of Amsterdam green Amsterdam NL 52.368133 4.889804 8121695895
4 Johan Cruijff ArenA red Amsterdam NL 52.312896 4.942167 10967577086
5 Ziggo Dome purple Amsterdam NL 52.313793 4.938057 10967577086
%time route = nx.shortest_path(G_walk,df.iloc[0].osmnx,df.iloc[1].osmnx,weight='length')
print(route)
CPU times: user 68.4 ms, sys: 2.87 ms, total: 71.3 ms
Wall time: 70 ms
[5629072001, 5629072000, 5629071975, 4239313191, 4239313081, 4239313075, 4239312638, 4239312162, 6358902057, 3175792586, 3175727845, 10633464529, 10633464528, 9959600348, 9959600349, 9959600284, 9971788077, 9971788076, 9817686663, 9817686662, 10208267268, 6841663011, 6841663007, 6841654478, 8608994367, 8072948169, 6841636869, 6841636868, 6841645041, 6841680457, 6841680467, 6841680468, 6841680481, 8036446640, 8036446607, 8036446617, 8036446621, 8036446623, 8036446626, 9140788997, 8036446635, 8036446637, 3412535141, 3416675281, 3416675272, 3416675263, 3416675257, 6949114049, 6949114048, 10823781800, 6949114037, 6949114047, 6949114046, 5792613757, 46332984, 7191491361, 8852212652, 46331976, 46330373, 1976839698, 7191576329, 7191576325, 5805311952, 10073385072, 5805311930, 5805311950, 46321108, 46318613, 5805314256, 5805314261, 6653510565, 6653544388, 6653544407, 6653544420, 1976839614, 46308487, 46307158, 46305753, 46304949, 285675805, 7223761981, 7223772071, 7223772074, 7223772084, 7223772082, 7223772080, 7223772078, 1976839592, 1976839591, 1976839588, 7223803102, 7223803096, 7325122869, 7325122856, 7325122853, 7325122864, 46289809, 1976839543, 46287308, 1976839531, 1976839523, 46281244, 6653580391, 6653580392, 6653580398, 6653580397, 7476378298, 6653564474, 6653563585, 6653550884, 6653550875, 6653550880, 6653550883, 46255148, 46255078, 300941745, 46254897, 4870233192, 10975468937, 10975468947, 10975468934, 4485558056, 364677068, 3898433610, 4485558057, 4485558058, 3899501151, 1927808972, 1768493608, 1768493609, 1927808969, 1768493613, 1768498742, 353179998, 1768493612, 1768498738, 353179973, 295730948, 5160332347, 5160332346, 5160332350, 295730934, 437292535, 9038018126, 46232076, 7503879501, 46232112, 46223096, 880766217, 4485430209, 4485430210, 10886209159, 10886209156, 5918158593, 4311353914, 3406638501, 3406638499, 4457838803, 46215450, 10886156712, 10886156714, 10886156713, 294648098]
route_map = ox.plot_route_folium(G_walk, route)
display(route_map)
/var/folders/zb/k0pc77yj6c3gv6gr6jj41g1r0000gn/T/ipykernel_90780/3687851812.py:1: UserWarning: The `folium` module has been deprecated and will be removed in a future release. You can generate and explore interactive web maps of graph nodes, edges, and/or routes automatically using GeoPandas.GeoDataFrame.explore instead, for example like: `ox.graph_to_gdfs(G, nodes=False).explore()`. See the OSMnx examples gallery for complete details and demonstrations.
  route_map = ox.plot_route_folium(G_walk, route)
Make this Notebook Trusted to load map: File -> Trust Notebook

Dijkstra on steroids for road networks#

nodes = pd.DataFrame.from_dict(dict(G_walk.nodes(data=True)), orient="index")
edges = nx.to_pandas_edgelist(G_walk)
%%time
import pandana
network = pandana.Network(nodes['x'], nodes['y'], edges['source'], edges['target'], edges[['length']],twoway=True)
CPU times: user 2.75 s, sys: 44.6 ms, total: 2.8 s
Wall time: 2.98 s
network.nodes_df.head()
x y
6316199 4.888396 52.370173
25596455 4.923563 52.364840
25596477 4.906097 52.367000
25645989 4.925075 52.365727
25658579 4.930438 52.364544
network.edges_df.head()
from to length
0 6316199 46379627 42.497000
1 6316199 46389218 225.577000
2 6316199 391355271 62.907000
3 25596455 8383889398 1.791000
4 25596455 46356773 41.700000
df["pandana"] = network.get_node_ids(df.lon, df.lat).values
df
address color city country lat lon osmnx pandana
0 Centraal Station blue Amsterdam NL 52.378901 4.900581 5629072001 5629071974
1 Amsterdam Business School black Amsterdam NL 52.365107 4.911718 46356661 46356661
2 Artis green Amsterdam NL -30.401207 -56.481521 1768194163 1768194163
3 Arena red Amsterdam NL 52.315990 4.942931 4622542635 2928936658
4 Ziggo Dome purple Amsterdam NL 52.313629 4.938207 1925143759 4622542635
%time path_pandana = network.shortest_path(df.iloc[2].pandana, df.iloc[3].pandana)
CPU times: user 10.6 ms, sys: 5.56 ms, total: 16.1 ms
Wall time: 15.3 ms
%time path_nx = nx.shortest_path(G_walk,df.iloc[2].osmnx,df.iloc[3].osmnx,weight='length')
CPU times: user 268 ms, sys: 17.9 ms, total: 286 ms
Wall time: 291 ms
A = set(path_pandana)
B = set(path_nx)
(A | B) - (A & B)
{46232158,
 46237572,
 46240581,
 46241169,
 46241205,
 46242937,
 46242940,
 46245940,
 46247806,
 46255649,
 46260704,
 46264996,
 46265577,
 46265994,
 46266419,
 46267226,
 46267229,
 46274385,
 46274670,
 46275359,
 46275510,
 46278192,
 46278488,
 46280478,
 46280842,
 46281091,
 46281209,
 46281430,
 46281524,
 46281920,
 46283239,
 46283971,
 46284393,
 46284949,
 46285251,
 46285451,
 46288061,
 46291556,
 254477712,
 262890789,
 497304704,
 497304720,
 878805275,
 878806012,
 878806213,
 878808875,
 1101933078,
 1259406528,
 1545015322,
 1548111106,
 1688255243,
 1725810217,
 1725960722,
 1925156931,
 1925156934,
 2801224787,
 2928936658,
 3132312675,
 4382189126,
 4382189139,
 4538188789,
 4622542635,
 4684390284,
 4684390320,
 5133014954,
 5323513026,
 5394045965,
 5394053135,
 5438332847,
 5438332850,
 6614935661,
 6653983649,
 6654004829,
 6879899205,
 6879899216,
 6892142909,
 6892143052,
 6958932853,
 6958932854,
 6958932858,
 7124847108,
 7434739098,
 7434739102,
 8492030537,
 8492030539,
 8506404118,
 8518477120,
 8518477127,
 9355815528,
 9708635342}
origs = [o for o in df.pandana for d in df.pandana]
dests = [d for o in df.pandana for d in df.pandana]
%time distances = network.shortest_path_lengths(origs, dests)
CPU times: user 14.1 ms, sys: 3.95 ms, total: 18 ms
Wall time: 16.7 ms
import numpy as np

n = len(df)
pd.options.display.float_format = "{:.2f}".format
pd.DataFrame(
    np.array(list(distances)).reshape(n, n), index=df.address, columns=df.address
)
address Centraal Station Amsterdam Business School Artis Arena Ziggo Dome
address
Centraal Station 0.00 2218.20 11501.29 6274.28 6084.35
Amsterdam Business School 2218.20 0.00 11825.68 4130.58 4537.02
Artis 11501.29 11825.68 0.00 14651.95 13954.61
Arena 6274.28 4130.58 14651.95 0.00 4127.14
Ziggo Dome 6084.35 4537.02 13954.61 4127.14 0.00
np.random.seed(2023)
n = 500
sample = np.random.choice(
    np.array(network.nodes_df.index.values.tolist()), n, replace=False
)
origs = [o for o in sample for d in sample]
dests = [d for o in sample for d in sample]
%time distances = network.shortest_path_lengths(origs, dests)
%time table = pd.DataFrame(np.array(list(distances)).reshape(n,n),index=sample,columns=sample)
CPU times: user 5.23 s, sys: 65.9 ms, total: 5.3 s
Wall time: 5.33 s
CPU times: user 21.1 ms, sys: 297 µs, total: 21.4 ms
Wall time: 21.5 ms
departure = table.max(axis=1).idxmax()
arrival = table.loc[departure].idxmax()
%time path_pandana = network.shortest_path(departure, arrival)
%time path_nx = nx.shortest_path(G_walk,departure,arrival,weight='length')
A = set(path_pandana)
B = set(path_nx)
(A | B) - (A & B)
CPU times: user 4.6 ms, sys: 3.47 ms, total: 8.07 ms
Wall time: 7.24 ms
CPU times: user 594 ms, sys: 17.1 ms, total: 611 ms
Wall time: 617 ms
set()
%time paths = network.shortest_paths(origs,dests)
CPU times: user 12.9 s, sys: 549 ms, total: 13.5 s
Wall time: 13.5 s
sum(map(len, paths))
35377572
for u, v in zip(paths[1][:-1], paths[1][1:]):
    print(G_walk.get_edge_data(u, v)[0].get("name", ""))
Oostoever

Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat



Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Orteliuskade
Orteliuskade
Orteliuskade
Van Middellandtstraat
Orteliusstraat
Postjesweg
Van Spilbergenstraat
Postjesweg




Hoofdweg
Hoofdweg
Hoofdweg
Hoofdweg
Hoofdweg
Hoofdweg
Surinameplein
Surinameplein
Surinamestraat
Surinamestraat
Surinamestraat
['Surinamestraat', 'Overtoom']
Overtoom

Overtoom
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg


Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Laan der Hesperiden



Stadionplein
Laan der Hesperiden
Laan der Hesperiden
Laan der Hesperiden
Olympisch Stadion
Olympisch Stadion
Olympisch Stadion

Na Druk Gelukbrug


Burgerweeshuispad
route_map = ox.plot_route_folium(G_walk, paths[1], color="red", map=Map)
display(route_map)
Make this Notebook Trusted to load map: File -> Trust Notebook