Mobility

class pyspainmobility.mobility.mobility.Mobility(version: int = 2, zones: str = 'municipalities', start_date: str = None, end_date: str = None, output_directory: str = None, use_dask: bool = False)

This object handles the download and preprocessing of (i) daily origin-destination matrices, (ii) overnight stays and (iii) number of trips. The data is downloaded from the Spanish Ministry of Transport, Mobility and Urban Agenda (MITMA) Open Data portal. Additional information can be found at https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data. The data is available in two versions: version 1 (2020-02-14 to 2021-05-09) and version 2 (2022-01-01 onward). Data are available at three levels of granularity: districts (distritos), municipalities (municipios) and large urban areas (grandes áreas urbanas, LUA). In version 1, LUA data are not available. Overnight stays are also not available for version 1.
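
The availability rules above can be summarised as a small lookup. The following is only a sketch: the helper `is_available`, the `VERSION_RANGES` table and the dataset labels are hypothetical names introduced for illustration and are not part of pyspainmobility.

```python
from datetime import date

# Date ranges as stated in the class description.
# This table and the helper below are illustrative, not library code.
VERSION_RANGES = {
    1: (date(2020, 2, 14), date(2021, 5, 9)),
    2: (date(2022, 1, 1), None),  # None marks an open-ended range ("onward")
}


def is_available(version: int, zones: str, dataset: str, day: date) -> bool:
    """Return True if the description says this combination exists."""
    if version not in VERSION_RANGES:
        return False
    first, last = VERSION_RANGES[version]
    if day < first or (last is not None and day > last):
        return False
    if version == 1 and zones == "lua":
        return False  # no large-urban-area data in version 1
    if version == 1 and dataset == "overnight_stays":
        return False  # no overnight stays in version 1
    return True


print(is_available(1, "municipalities", "od", date(2020, 3, 1)))    # True
print(is_available(1, "lua", "od", date(2020, 3, 1)))               # False
print(is_available(2, "lua", "overnight_stays", date(2022, 1, 1)))  # True
```

A check like this could be run before instantiating `Mobility` to catch unsupported combinations early. How the library itself reacts to them is not documented here.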

Parameters:
  • version (int) – The version of the data to download. Default is 2. Version must be 1 or 2. Version 1 contains the data from 2020 to 2021. Version 2 contains the data from 2022 onwards.

  • zones (str) – The zones to download the data for. Default is municipalities. Zones must be one of the following: districts, dist, distr, distritos, municipalities, muni, municipal, municipios, lua, large_urban_areas, gau, gaus, grandes_areas_urbanas

  • start_date (str) – The start date of the data to download. Date must be in the format YYYY-MM-DD. A start date is required.

  • end_date (str) – The end date of the data to download. Default is None. Date must be in the format YYYY-MM-DD. If not specified, the end date will be the same as the start date.

  • output_directory (str) – The directory in which to save the raw data and the processed parquet files. Default is None. If not specified, the data will be saved in a folder named ‘data’ in the user’s home directory.

  • use_dask (bool) – Whether to use Dask for processing large datasets. Default is False. Requires dask to be installed.
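
The `zones` aliases and the date rules listed above can be illustrated with a short sketch. The function names `normalize_zone` and `resolve_dates`, and the choice of canonical zone names, are assumptions made for this example. The library's internal handling may differ.

```python
from datetime import datetime

# Aliases copied from the `zones` parameter description. The canonical
# keys are an assumption made for this sketch.
ZONE_ALIASES = {
    "districts": ["districts", "dist", "distr", "distritos"],
    "municipalities": ["municipalities", "muni", "municipal", "municipios"],
    "lua": ["lua", "large_urban_areas", "gau", "gaus", "grandes_areas_urbanas"],
}


def normalize_zone(zones: str) -> str:
    """Map any accepted alias to one canonical zone name."""
    for canonical, aliases in ZONE_ALIASES.items():
        if zones in aliases:
            return canonical
    raise ValueError(f"unknown zones value: {zones!r}")


def resolve_dates(start_date, end_date=None):
    """Parse YYYY-MM-DD strings; a missing end date defaults to the start date."""
    if start_date is None:
        raise ValueError("a start date is required")
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = start if end_date is None else datetime.strptime(end_date, "%Y-%m-%d").date()
    return start, end


print(normalize_zone("gau"))
print(resolve_dates("2022-01-01"))
```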

Examples

>>> from pyspainmobility import Mobility
>>> # instantiate the object
>>> mobility_data = Mobility(version=2, zones='municipalities', start_date='2022-01-01', end_date='2022-01-06', output_directory='/Desktop/spain/data/')
>>> # download and save the origin-destination data
>>> mobility_data.get_od_data(keep_activity=True)
>>> # download and save the overnight stays data
>>> mobility_data.get_overnight_stays_data()
>>> # download and save the number of trips data
>>> mobility_data.get_number_of_trips_data()

__init__(version: int = 2, zones: str = 'municipalities', start_date: str = None, end_date: str = None, output_directory: str = None, use_dask: bool = False)

get_od_data(keep_activity: bool = False, return_df: bool = False, social_agg: bool = False)

Function to download and save the origin-destination data.

Parameters:
  • keep_activity (bool) – Default value is False. If True, the columns ‘activity_origin’ and ‘activity_destination’ will be kept in the final dataframe. If False, the columns will be dropped. The columns contain the activity of the origin and destination zones. The possible values are: ‘home’, ‘work_or_study’, ‘other_frequent’, ‘other_non_frequent’. Note that keeping the activity columns significantly increases the size of the final dataframe and of the saved files.

  • return_df (bool) – Default value is False. If True, the function will return the dataframe in addition to saving it to a file.

  • social_agg (bool) – Default value is False. If True, adds a socio-demographic breakdown by income, age and gender.

Examples

>>> from pyspainmobility import Mobility
>>> # instantiate the object
>>> mobility_data = Mobility(version=2, zones='municipalities', start_date='2022-01-01', end_date='2022-01-06', output_directory='/Desktop/spain/data/')
>>> # download and save the origin-destination data
>>> mobility_data.get_od_data(keep_activity=True)
>>> # download and save the od data and return the dataframe
>>> df = mobility_data.get_od_data(keep_activity=False, return_df=True)
>>> print(df.head())
         date  hour id_origin id_destination  n_trips  trips_total_length_km
0  2023-04-01     0     01001          01001    5.006              19.878000
1  2023-04-01     0     01001       01009_AM   14.994              70.697000
2  2023-04-01     0     01001       01058_AM    9.268              87.698000
3  2023-04-01     0     01001          01059   42.835             512.278674
4  2023-04-01     0     01001          48036    2.750             147.724000
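
As a rough illustration of how the columns relate, the sketch below computes a mean trip length from the five sample rows printed above. It assumes that `trips_total_length_km` is the summed length of all trips in a row, which is an interpretation of the column name and is not confirmed by the documentation. Plain Python tuples stand in for the DataFrame so the snippet has no dependencies.

```python
# Rows copied from the sample output above:
# (id_origin, id_destination, n_trips, trips_total_length_km)
rows = [
    ("01001", "01001", 5.006, 19.878),
    ("01001", "01009_AM", 14.994, 70.697),
    ("01001", "01058_AM", 9.268, 87.698),
    ("01001", "01059", 42.835, 512.278674),
    ("01001", "48036", 2.750, 147.724),
]

# Total trips leaving origin 01001 in this sample, and their total length.
total_trips = sum(n_trips for _, _, n_trips, _ in rows)
total_km = sum(km for _, _, _, km in rows)

# Trip-weighted mean length (assumes total length / number of trips is meaningful).
mean_km = total_km / total_trips

print(f"trips: {total_trips:.3f}, mean length: {mean_km:.2f} km")
```

With the DataFrame returned by `get_od_data(return_df=True)`, the same quantity would be `df['trips_total_length_km'].sum() / df['n_trips'].sum()`.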

get_overnight_stays_data(return_df: bool = False)

Function to download and save the overnight stays data.

Parameters:
  • return_df (bool) – Default value is False. If True, the function will return the dataframe in addition to saving it to a file.

Examples

>>> from pyspainmobility import Mobility
>>> # instantiate the object
>>> mobility_data = Mobility(version=2, zones='municipalities', start_date='2022-01-01', end_date='2022-01-06', output_directory='/Desktop/spain/data/')
>>> # download and save the overnight stays data and return the dataframe
>>> df = mobility_data.get_overnight_stays_data(return_df=True)
>>> print(df.head())
         date residence_area overnight_stay_area    people
0  2023-04-01          01001               01001  2716.303
1  2023-04-01          01001            01009_AM    14.088
2  2023-04-01          01001            01017_AM     2.476
3  2023-04-01          01001            01058_AM    18.939
4  2023-04-01          01001               01059   144.118

get_number_of_trips_data(return_df: bool = False)

Function to download and save the number-of-trips data, i.e. how many people in each area made a given number of trips, broken down by demographic categories such as age and gender.

Parameters:
  • return_df (bool) – Default value is False. If True, the function will return the dataframe in addition to saving it to a file.

Examples

>>> from pyspainmobility import Mobility
>>> # instantiate the object
>>> mobility_data = Mobility(version=2, zones='municipalities', start_date='2022-01-01', end_date='2022-01-06', output_directory='/Desktop/spain/data/')
>>> # download and save the number of trips data and return the dataframe
>>> df = mobility_data.get_number_of_trips_data(return_df=True)
>>> print(df.head())
         date overnight_stay_area   age  gender number_of_trips   people
0  2023-04-01               01001  0-25    male               0  128.457
1  2023-04-01               01001  0-25    male               1   38.537
2  2023-04-01               01001  0-25    male               2  129.136
3  2023-04-01               01001  0-25    male              2+  129.913
4  2023-04-01               01001  0-25  female               0  188.744
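
A possible use of this table is to estimate what share of a demographic group made at least one trip. The sketch below does this for the male 0-25 rows shown above. It assumes that `people` counts the persons in each `number_of_trips` category and that the categories are mutually exclusive; neither point is stated in the documentation. Note that `number_of_trips` is categorical ('2+' is not a number), so it is kept as a string.

```python
# Rows copied from the sample output above: (age, gender, number_of_trips, people)
rows = [
    ("0-25", "male", "0", 128.457),
    ("0-25", "male", "1", 38.537),
    ("0-25", "male", "2", 129.136),
    ("0-25", "male", "2+", 129.913),
    ("0-25", "female", "0", 188.744),
]

# Keep only the group of interest.
group = [r for r in rows if r[0] == "0-25" and r[1] == "male"]

# People in the group overall, and those in any category other than "0" trips.
total_people = sum(people for _, _, _, people in group)
travellers = sum(people for _, _, trips, people in group if trips != "0")

share_travelling = travellers / total_people
print(f"share with at least one trip: {share_travelling:.3f}")
```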