Repository about computing heterogeneity indexes using Optimal Transport.

jolatechno/ot-heterogeneity

ot-heterogeneity

A project to compute optimal transport based heterogeneity indexes.

1 - Usage

The library can be installed with pip install oterogeneity, then imported and used as documented here :

import oterogeneity as oth
from oterogeneity import utils

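# lat, lon: 1d-arrays of latitudes and longitudes, one entry per location
unitary_direction_matrix, distance_mat = utils.compute_unitary_direction_matrix_polar(lat, lon)
# distrib_candidates: 2d-array of shape (num_categories, size)
results = oth.ot_heterogeneity_populations(
	distrib_candidates, distance_mat, unitary_direction_matrix=unitary_direction_matrix
)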

1.a - The result class

The ot_heterogeneity_results class contains all of the results of a computation of spatial heterogeneity based on optimal transport using our method.

It contains the following attributes (that may be None if not applicable) :

  • size (int): number of spatial units (towns, polling stations, etc.)
  • num_categories (int): number of distinct categories
  • num_dimensions (int): number of spatial dimensions (typically 2)
  • has_direction (bool): whether the result contains directionality fields or not
  • global_heterogeneity (float): global heterogeneity index
  • global_heterogeneity_per_category (np.array): 1d-array of length num_categories that contains the global heterogeneity index for each category.
  • local_heterogeneity (np.array): 1d-array of length size that contains the local heterogeneity index for each location.
  • local_signed_heterogeneity (np.array): either a 2d-array of shape (num_categories, size) when num_categories > 1, or a 1d-array of length size if num_categories = 1, that contains the signed heterogeneity index for each category and each location.
  • local_exiting_heterogeneity (np.array): 1d-array of length size that contains the heterogeneity index based only on exiting flux for each location.
  • local_entering_heterogeneity (np.array): 1d-array of length size that contains the heterogeneity index based only on entering flux for each location.
  • local_heterogeneity_per_category (np.array): 2d-array of shape (num_categories, size) that contains the heterogeneity index for each category and each location.
  • local_exiting_heterogeneity_per_category (np.array): 2d-array of shape (num_categories, size) that contains the heterogeneity index based only on exiting flux for each category and each location.
  • local_entering_heterogeneity_per_category (np.array): 2d-array of shape (num_categories, size) that contains the heterogeneity index based only on entering flux for each category and each location.
  • direction (np.array): 2d-array of shape (num_dimensions, size) representing the vector field of directionality.
  • direction_per_category (np.array): 3d-array of shape (num_categories, num_dimensions, size) representing the vector field of directionality for each category.
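
As an illustration of the shapes listed above, the following sketch turns a direction array of shape (num_dimensions, size) into a magnitude and an angle per location. The helper name direction_magnitude_and_angle and the example values are hypothetical, and reading the first row as the x component and the second as the y component is an assumption, not something the library documents here.

```python
import numpy as np

def direction_magnitude_and_angle(direction):
    """Hypothetical helper: per-location norm and angle of a 2d direction field.

    direction is expected to have shape (num_dimensions, size) with
    num_dimensions == 2, as described for the `direction` attribute.
    """
    direction = np.asarray(direction, dtype=float)
    magnitude = np.linalg.norm(direction, axis=0)    # 1d-array of length size
    # Assumption: row 0 is the x component and row 1 the y component.
    angle = np.arctan2(direction[1], direction[0])   # radians
    return magnitude, angle

# Made-up direction field for three locations.
direction = np.array([[1.0, 0.0, -3.0],
                      [0.0, 2.0,  4.0]])
magnitude, angle = direction_magnitude_and_angle(direction)
```

In practice the array would come from results.direction, after checking results.has_direction.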

1.b - Functions

1.b.1 - ot_heterogeneity_from_null_distrib

The ot_heterogeneity_from_null_distrib function is the most general function implementing our method for measuring spatial heterogeneity.

def ot_heterogeneity_from_null_distrib(
	distrib: np.array, null_distrib: np.array, distance_mat: np.array,
	transport_plane: np.array=None, return_transport_plane: bool=False,
	unitary_direction_matrix: np.array=None, local_weight_distrib: np.array=None,
	category_weights: np.array=None, epsilon_exponent: float=-1e-3,
	use_same_exponent_weight: bool=True, min_value_avoid_zeros: float=1e-5,
	ot_solve_kwargs : dict={}
)

The following parameters are passed to the function :

  • distrib (np.array): 2d-array of shape (num_categories, size) representing the population distribution, i.e. the population of each category in each location.
  • null_distrib (np.array): either a 2d-array of shape (num_categories, size) or a 1d-array of length size if every category has the same null distribution, representing the null distribution (distribution without heterogeneity), to which the distribution will be compared.
  • distance_mat (np.array): 2d-array of shape (size, size) representing the distance between each locality.

With the following parameters being optional :

  • transport_plane (np.array): either a 3d array of shape (num_categories, size, size) or a 2d array of shape (size, size) if the distribution is only 1d. The element of index (n, i, j) represents the flux of population n from locality i to locality j.
  • return_transport_plane (bool): if true, the function will also return the transport plane.
  • unitary_direction_matrix (np.array): 3d-array of shape (num_dimensions, size, size) representing the unit vector between each pair of locations.
  • local_weight_distrib (np.array): 1d-array of length size representing the weight for each location. By default this weight is simply the proportion of the total population located in each location.
  • category_weights (np.array): 1d-array of length num_categories representing the weight for each category. By default this weight is simply the proportion of the total population that belongs to each category.
  • epsilon_exponent (float): the distance matrix is exponentiated (element-wise) by an exponent 1+epsilon_exponent.
  • use_same_exponent_weight (bool): if true, the cost (i.e. the distance) is exponentiated by the same exponent as the one used for the cost matrix in the optimal-transport computation.
  • min_value_avoid_zeros (float): value below which a value is considered zero.
  • ot_solve_kwargs (dict): dictionary of additional named arguments to pass to the ot.solve (or ot.solve_batch) function that is used as a backend.
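
The default weights and the exponentiated cost described above can be sketched in plain NumPy. This is only an illustration of the documented behaviour, not the library's code: the helper names default_weights and exponentiated_cost and the example arrays are hypothetical.

```python
import numpy as np

def default_weights(distrib):
    """Hypothetical helper: population shares per location and per category."""
    distrib = np.asarray(distrib, dtype=float)
    total = distrib.sum()
    local_weight_distrib = distrib.sum(axis=0) / total   # length size
    category_weights = distrib.sum(axis=1) / total       # length num_categories
    return local_weight_distrib, category_weights

def exponentiated_cost(distance_mat, epsilon_exponent=-1e-3):
    """Hypothetical helper: element-wise distance ** (1 + epsilon_exponent)."""
    return np.asarray(distance_mat, dtype=float) ** (1.0 + epsilon_exponent)

# Made-up data: 2 categories, 3 locations.
distrib = np.array([[10.0, 30.0, 60.0],
                    [40.0, 40.0, 20.0]])
distance_mat = np.array([[0.0, 1.0, 2.0],
                         [1.0, 0.0, 1.0],
                         [2.0, 1.0, 0.0]])

local_w, cat_w = default_weights(distrib)
cost = exponentiated_cost(distance_mat)
```

With the default epsilon_exponent of -1e-3 the exponent is 0.999, so the cost is very slightly concave in the distance.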

The function returns a result as an object of class ot_heterogeneity_results.

If return_transport_plane is true, the function also returns the transport plane (np.array), which is either a 3d array of shape (num_categories, size, size) or a 2d array of shape (size, size) if the distribution is only 1d. The element of index (n, i, j) represents the flux of population n from locality i to locality j.

1.b.2 - ot_heterogeneity_populations

The ot_heterogeneity_populations function uses the total population distribution across all categories as the null distribution. It thus assumes that the null distribution is the one where the total population at each location is unchanged, and where the proportion of each category at every location equals its proportion in the global population.
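
A minimal sketch of that null distribution, assuming it is built exactly as the sentence above describes (total population per location multiplied by the global share of each category). The helper name population_null_distribution and the example data are hypothetical, and the library may compute this differently internally.

```python
import numpy as np

def population_null_distribution(distrib):
    """Hypothetical helper: null distribution with uniform category shares.

    distrib has shape (num_categories, size). The result has the same shape,
    the same total population per location and the same total per category.
    """
    distrib = np.asarray(distrib, dtype=float)
    total_population_distrib = distrib.sum(axis=0)          # per location
    category_share = distrib.sum(axis=1) / distrib.sum()    # global proportions
    return np.outer(category_share, total_population_distrib)

# Made-up data: 2 categories, 2 locations.
distrib = np.array([[10.0, 30.0],
                    [30.0, 30.0]])
null_distrib = population_null_distribution(distrib)
```

Here the global shares are 0.4 and 0.6 and the location totals are 40 and 60, so every location of the null distribution holds 40 % of the first category and 60 % of the second.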

def ot_heterogeneity_populations(
	distrib, distance_mat: np.array, total_population_distrib: np.array=None,
	unitary_direction_matrix: np.array=None, transport_plane: np.array=None,
	return_transport_plane: bool=False, epsilon_exponent: float=-1e-3,
	use_same_exponent_weight: bool=True, min_value_avoid_zeros: float=1e-5,
	ot_solve_kwargs : dict={}
)

The following parameters are passed to the function :

  • distrib (np.array): 2d-array of shape (num_categories, size) representing the population distribution, i.e. the population of each category in each location.
  • distance_mat (np.array): 2d-array of shape (size, size) representing the distance between each locality.

With the following parameters being optional :

  • transport_plane (np.array): either a 3d array of shape (num_categories, size, size) or a 2d array of shape (size, size) if the distribution is only 1d. The element of index (n, i, j) represents the flux of population n from locality i to locality j.
  • return_transport_plane (bool): if true, the function will also return the transport plane.
  • unitary_direction_matrix (np.array): 3d-array of shape (num_dimensions, size, size) representing the unit vector between each pair of locations.
  • epsilon_exponent (float): the distance matrix is exponentiated (element-wise) by an exponent 1+epsilon_exponent.
  • use_same_exponent_weight (bool): if true, the cost (i.e. the distance) is exponentiated by the same exponent as the one used for the cost matrix in the optimal-transport computation.
  • min_value_avoid_zeros (float): value below which a value is considered zero.
  • ot_solve_kwargs (dict): dictionary of additional named arguments to pass to the ot.solve (or ot.solve_batch) function that is used as a backend.

The function returns a result as an object of class ot_heterogeneity_results.

If return_transport_plane is true, the function also returns the transport plane (np.array), which is either a 3d array of shape (num_categories, size, size) or a 2d array of shape (size, size) if the distribution is only 1d. The element of index (n, i, j) represents the flux of population n from locality i to locality j.

1.b.3 - ot_heterogeneity_linear_regression

The ot_heterogeneity_linear_regression function will be documented later on.

def ot_heterogeneity_linear_regression(
	distrib: np.array, prediction_distrib: np.array, distance_mat: np.array,
	local_weight_distrib: np.array=None, transport_plane: np.array=None,
	return_transport_plane: bool=False, unitary_direction_matrix: np.array=None,
	fit_regression : bool=True, regression=sklearn.linear_model.LinearRegression(), 
	epsilon_exponent: float=-1e-3, use_same_exponent_weight: bool=True,
	min_value_avoid_zeros: float=1e-5, ot_solve_kwargs : dict={}
)

1.c - Utility functions

The utility functions are located in the utils package, so they should be used from this subpackage :

import oterogeneity as oth
from oterogeneity import utils

unitary_dir_mat, distance_mat = utils.compute_unitary_direction_matrix_polar(lat, lon)
# Or :
unitary_dir_mat, distance_mat = oth.utils.compute_unitary_direction_matrix_polar(lat, lon)

1.c.1 - compute_optimal_transport_flux

def compute_optimal_transport_flux(
	distributions_to: np.array, distributions_from: np.array, distance_mat: np.array,
	ot_solve_kwargs : dict={}, force_for_loop : bool=False
)

The compute_optimal_transport_flux function computes the optimal-transport flux (the transport plane) that carries distributions_from onto distributions_to. It takes the following parameters :

  • distributions_to (np.array): 2d-array of shape (num_categories, size) or 1d-array of length size representing the end distribution of population.
  • distributions_from (np.array): 2d-array of shape (num_categories, size) or 1d-array of length size representing the starting distribution of population that will be transported to distributions_to.
  • distance_mat (np.array): 2d-array of shape (size, size) filled with the distance between each location.

With the following parameters being optional :

  • ot_solve_kwargs (dict): dictionary of additional named arguments to pass to the ot.solve (or ot.solve_batch) function that is used as a backend.
  • force_for_loop (bool): force solving using ot.solve instead of ot.solve_batch.

It returns the transport plane (np.array), which is either a 3d array of shape (num_categories, size, size) or a 2d array of shape (size, size) if distributions_from is only 1d. The element of index (n, i, j) represents the flux of population n from locality i to locality j.
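
To make the (n, i, j) index convention concrete, the sketch below checks a transport plane against its two distributions: summing over j should give what leaves each locality i, and summing over i what arrives at each locality j. It assumes a balanced transport problem in which the plane's marginals match the distributions exactly; the helper name check_transport_plane and the example arrays are hypothetical, and the library may normalise or store fluxes differently.

```python
import numpy as np

def check_transport_plane(transport_plane, distributions_from, distributions_to,
                          atol=1e-8):
    """Hypothetical helper: verify the marginals of a transport plane.

    Works for a 2d plane of shape (size, size) and for a 3d plane of shape
    (n, size, size) paired with distributions of shape (n, size).
    """
    plane = np.asarray(transport_plane, dtype=float)
    leaving = plane.sum(axis=-1)    # total flux leaving each locality i
    arriving = plane.sum(axis=-2)   # total flux arriving at each locality j
    return (np.allclose(leaving, distributions_from, atol=atol)
            and np.allclose(arriving, distributions_to, atol=atol))

# Made-up example: all mass shifts one locality to the right.
distributions_from = np.array([0.5, 0.5, 0.0])
distributions_to = np.array([0.0, 0.5, 0.5])
plane = np.array([[0.0, 0.5, 0.0],
                  [0.0, 0.0, 0.5],
                  [0.0, 0.0, 0.0]])

is_consistent = check_transport_plane(plane, distributions_from, distributions_to)
```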

1.c.2 - compute_distance_matrix

The compute_distance_matrix function computes the distance between a list of coordinates.

def compute_distance_matrix(coordinates: np.array, exponent: float=2)

The compute_distance_matrix function takes the following parameters :

  • coordinates (np.array): 2d-array of shape (num_dimensions, size) representing the position of each location.

With the following parameters being optional :

  • exponent (float): the exponent used in the norm (2 is the Euclidean norm).

It returns the distance matrix filled with the distance between each location.
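
A pairwise distance matrix of this kind can be sketched with NumPy broadcasting. The code assumes that exponent is the order p of a p-norm (so 2 gives the Euclidean distance and 1 the Manhattan distance); the function name distance_matrix_sketch is hypothetical and the library's own implementation may differ.

```python
import numpy as np

def distance_matrix_sketch(coordinates, exponent=2.0):
    """Hypothetical helper: p-norm distance between every pair of locations.

    coordinates has shape (num_dimensions, size); the result has shape
    (size, size).
    """
    coordinates = np.asarray(coordinates, dtype=float)
    # diff has shape (num_dimensions, size, size)
    diff = coordinates[:, :, None] - coordinates[:, None, :]
    return (np.abs(diff) ** exponent).sum(axis=0) ** (1.0 / exponent)

# Three made-up points in the plane: (0, 0), (3, 0) and (0, 4).
coordinates = np.array([[0.0, 3.0, 0.0],
                        [0.0, 0.0, 4.0]])
euclidean = distance_matrix_sketch(coordinates)
manhattan = distance_matrix_sketch(coordinates, exponent=1.0)
```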

1.c.3 - compute_distance_matrix_polar

The compute_distance_matrix_polar function computes the distance between a list of coordinates given as polar coordinates on a sphere. With its default parameters it can be used for typical coordinates on Earth.

def compute_distance_matrix_polar(
	latitudes: np.array, longitudes: np.array,
	radius: float=6378137, unit: str="deg"
)

The compute_distance_matrix_polar function takes the following parameters :

  • latitudes (np.array): 1d-array of length size with the latitudes of each point.
  • longitudes (np.array): 1d-array of length size with the longitudes of each point.

With the following parameters being optional :

  • radius (float): radius of the sphere (by default 6378137 which is the radius of the earth in meters).
  • unit (str): a string defining the unit of the longitudes and latitudes, either "rad", "deg" (default), "arcmin", or "arcsec".

It returns the distance matrix filled with the distance between each location.
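
For reference, a great-circle distance matrix can be sketched with the haversine formula. Whether the library uses this exact formula is an assumption; the function name great_circle_distance_matrix is hypothetical, and the sketch only handles degrees, unlike the unit parameter described above.

```python
import numpy as np

def great_circle_distance_matrix(latitudes, longitudes, radius=6378137.0):
    """Hypothetical helper: haversine distance between every pair of points.

    latitudes and longitudes are 1d-arrays in degrees; the result has shape
    (size, size) and the same unit as radius (meters by default).
    """
    lat = np.radians(np.asarray(latitudes, dtype=float))
    lon = np.radians(np.asarray(longitudes, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
    # Clip to guard against rounding slightly outside [0, 1].
    return 2 * radius * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Two made-up points on the equator, a quarter turn apart.
dist = great_circle_distance_matrix([0.0, 0.0], [0.0, 90.0])
```

The two points are a quarter of a great circle apart, so the off-diagonal entries equal radius times pi / 2.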

1.c.4 - compute_unitary_direction_matrix

The compute_unitary_direction_matrix function computes the matrix of unit vectors used to compute direction in the main functions.

def compute_unitary_direction_matrix(
	coordinates: np.array, distance_mat: np.array=None,
	exponent: float=2
)

The compute_unitary_direction_matrix function takes the following parameters :

  • coordinates (np.array): 2d-array of shape (num_dimensions, size) representing the position of each location.

With the following parameters being optional :

  • distance_mat (np.array): you can optionally pass a 2d-array of shape (size, size) filled with the distance between each location. If not passed, it will be computed and returned.
  • exponent (float): the exponent used in the norm (2 is the Euclidean norm). If a distance matrix is passed, it must have been computed with the same exponent as the one passed to this function.

It returns the following values :

  • unitary_direction_matrix (np.array): 3d-array of shape (num_dimensions, size, size) representing the unit vector between each pair of locations.
  • distance_mat (np.array): returned only if it was not passed as a parameter (so that it does not have to be recomputed); a 2d-array of shape (size, size) filled with the distance between each location.
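
The unit vectors between locations can be sketched as follows for the Euclidean case. The sign convention (entry [:, i, j] pointing from location i towards location j), the zero vectors on the diagonal and the function name unitary_direction_sketch are all assumptions made for illustration; the library may use a different convention.

```python
import numpy as np

def unitary_direction_sketch(coordinates):
    """Hypothetical helper: unit vector from each location to every other one.

    coordinates has shape (num_dimensions, size). Returns an array of shape
    (num_dimensions, size, size) and the Euclidean distance matrix.
    """
    coordinates = np.asarray(coordinates, dtype=float)
    # diff[:, i, j] = position of j minus position of i (assumed convention)
    diff = coordinates[:, None, :] - coordinates[:, :, None]
    dist = np.sqrt((diff ** 2).sum(axis=0))
    safe = np.where(dist > 0, dist, 1.0)   # avoid dividing by zero on the diagonal
    return diff / safe, dist

# Three made-up points in the plane: (0, 0), (3, 0) and (0, 4).
coordinates = np.array([[0.0, 3.0, 0.0],
                        [0.0, 0.0, 4.0]])
unit, dist = unitary_direction_sketch(coordinates)
```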

1.c.5 - compute_unitary_direction_matrix_polar

The compute_unitary_direction_matrix_polar function computes the matrix of unit vectors used to compute direction in the main functions, between a list of coordinates given as polar coordinates on a sphere. With its default parameters it can be used for typical coordinates on Earth.

def compute_unitary_direction_matrix_polar(
	latitudes: np.array, longitudes: np.array,
	distance_mat: np.array=None, radius: float=6378137, unit: str="deg"
)

The compute_unitary_direction_matrix_polar function takes the following parameters :

  • latitudes (np.array): 1d-array of length size with the latitudes of each point.
  • longitudes (np.array): 1d-array of length size with the longitudes of each point.

With the following parameters being optional :

  • radius (float): radius of the sphere (by default 6378137 which is the radius of the earth in meters).
  • distance_mat (np.array): you can optionally pass a 2d-array of shape (size, size) filled with the distance between each location. If not passed, it will be computed and returned.
  • unit (str): a string defining the unit of the longitudes and latitudes, either "rad", "deg" (default), "arcmin", or "arcsec".

It returns the following values :

  • unitary_direction_matrix (np.array): 3d-array of shape (num_dimensions, size, size) representing the unit vector between each pair of locations.
  • distance_mat (np.array): returned only if it was not passed as a parameter (so that it does not have to be recomputed); a 2d-array of shape (size, size) filled with the distance between each location.

2 - License

"oterogeneity" (c) by @jolatechno - Joseph Touzet

"oterogeneity" is licensed under a
Creative Commons Attribution 4.0 International License.

You should have received a copy of the license along with this
work. If not, see <https://creativecommons.org/licenses/by/4.0/>.
