ot-heterogeneity

A project to compute optimal transport based heterogeneity indexes.

1 - Usage

The library can be installed with pip install oterogeneity, then imported and used as shown here:

import oterogeneity as oth
from oterogeneity import utils

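# lat, lon: 1d-arrays of length size with the latitude and longitude of each location
# distrib_candidates: 2d-array of shape (num_categories, size)
unitary_direction_matrix, distance_matrix = utils.compute_unitary_direction_matrix_polar(lat, lon)
results = oth.ot_heterogeneity_populations(
	distrib_candidates, distance_matrix, unitary_direction_matrix=unitary_direction_matrix
)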

1.a - The result class

The ot_heterogeneity_results class contains all of the results of a computation of spatial heterogeneity based on optimal transport using our method.

It contains the following attributes (which may be None if not applicable):

size (int): number of spatial units (towns, polling stations, etc.)
num_categories (int): number of distinct categories
num_dimensions (int): number of spatial dimensions (typically 2)
has_direction (bool): whether the result contains directionality fields or not
global_heterogeneity (float): global heterogeneity index
global_heterogeneity_per_category (np.array): 1d-array of length num_categories that contains the global heterogeneity index for each category.
local_heterogeneity (np.array): 1d-array of length size that contains the local heterogeneity index for each location
local_signed_heterogeneity (np.array): either a 2d-array of shape (num_categories, size) when num_categories > 1, or a 1d-array of length size if num_categories = 1, that contains the signed heterogeneity index for each category and each location.
local_exiting_heterogeneity (np.array): 1d-array of length size that contains the heterogeneity index based only on exiting flux for each location.
local_entering_heterogeneity (np.array): 1d-array of length size that contains the heterogeneity index based only on entering flux for each location.
local_heterogeneity_per_category (np.array): 2d-array of shape (num_categories, size) that contains the heterogeneity index for each category and each location.
local_exiting_heterogeneity_per_category (np.array): 2d-array of shape (num_categories, size) that contains the heterogeneity index based only on exiting flux for each category and each location.
local_entering_heterogeneity_per_category (np.array): 2d-array of shape (num_categories, size) that contains the heterogeneity index based only on entering flux for each category and each location.
direction (np.array): 2d-array of shape (num_dimensions, size) representing the vector field of directionality.
direction_per_category (np.array): 3d-array of shape (num_categories, num_dimensions, size) representing the vector field of directionality for each category.

1.b - Functions

1.b.1 - `ot_heterogeneity_from_null_distrib`

The ot_heterogeneity_from_null_distrib function is the most general function implementing our method for measuring spatial heterogeneity.

def ot_heterogeneity_from_null_distrib(
	distrib: np.array, null_distrib: np.array, distance_mat: np.array,
	transport_plane: np.array=None, return_transport_plane: bool=False,
	unitary_direction_matrix: np.array=None, local_weight_distrib: np.array=None,
	category_weights: np.array=None, epsilon_exponent: float=-1e-3,
	use_same_exponent_weight: bool=True, min_value_avoid_zeros: float=1e-5,
	ot_solve_kwargs : dict={}
)

The function takes the following parameters:

distrib (np.array): 2d-array of shape (num_categories, size) representing the population distribution, i.e. the population of each category in each location.
null_distrib (np.array): either a 2d-array of shape (num_categories, size) or a 1d-array of length size if every category has the same null distribution, representing the null distribution (the distribution without heterogeneity) to which the distribution will be compared.
distance_mat (np.array): 2d-array of shape (size, size) representing the distance between each pair of locations.

The following parameters are optional:

transport_plane (np.array): either a 3d array of shape (num_dimensions, size, size) or a 2d array of shape (size, size) if distributions_from (see compute_optimal_transport_flux) is only 1d. The element of index (n, i, j) represents the flux of population n from locality i to locality j.
return_transport_plane (bool): if true, the function will also return the transport plane.
unitary_direction_matrix (np.array): 3d-array of shape (num_dimensions, size, size) representing the unit vector between each pair of locations.
local_weight_distrib (np.array): 1d-array of length size representing the weight of each location. By default this weight is simply the proportion of the total population located in each location.
category_weights (np.array): 1d-array of length num_categories representing the weight of each category. By default this weight is simply the proportion of the total population that belongs to each category.
epsilon_exponent (float): the distance matrix is raised (element-wise) to the power 1+epsilon_exponent.
use_same_exponent_weight (bool): if true, the cost (i.e. the distance) is raised to the same exponent as the one used for the cost matrix in the optimal-transport computation.
min_value_avoid_zeros (float): threshold below which a value is considered zero.
ot_solve_kwargs (dict): dictionary of additional named arguments to pass to the ot.solve (or ot.solve_batch) function that is used as a backend.
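As a small illustration of the epsilon_exponent parameter, the sketch below raises a toy distance matrix element-wise to the power 1+epsilon_exponent using plain NumPy. The matrix values and the name cost_mat are made up for the example, and this only mirrors the description above; it is not taken from the library's source.

```python
import numpy as np

# Toy symmetric distance matrix between 3 locations (made-up values).
distance_mat = np.array([[0.0, 1.0, 4.0],
                         [1.0, 0.0, 2.0],
                         [4.0, 2.0, 0.0]])

# Default value documented above.
epsilon_exponent = -1e-3

# Element-wise power: zeros stay zero, a distance of 1 is unchanged,
# and larger distances shrink very slightly.
cost_mat = distance_mat ** (1 + epsilon_exponent)
```

With the default value the exponent is 0.999, so the resulting cost stays very close to the raw distance while being strictly concave in it.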

The function returns a result as an object of class ot_heterogeneity_results.

If return_transport_plane is true, the function also returns the transport plane (np.array), which is either a 3d array of shape (num_dimensions, size, size) or a 2d array of shape (size, size) if distributions_from (see compute_optimal_transport_flux) is only 1d. The element of index (n, i, j) represents the flux of population n from locality i to locality j.

1.b.2 - `ot_heterogeneity_populations`

The ot_heterogeneity_populations function uses the total population distribution across all categories as the null distribution. It thus assumes that the null distribution is the one in which the total population at each location is unchanged and the proportion of each category at every location equals its global proportion.
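The null distribution described in this paragraph can be sketched with plain NumPy as follows. The helper name null_distribution_from_populations and the toy data are hypothetical, and the code only follows the description above rather than the library's actual implementation.

```python
import numpy as np

def null_distribution_from_populations(distrib):
    """Hypothetical helper: keep each location's total population and give
    every location the global category proportions."""
    distrib = np.asarray(distrib, dtype=float)
    total_per_location = distrib.sum(axis=0)   # shape (size,)
    total_per_category = distrib.sum(axis=1)   # shape (num_categories,)
    global_proportions = total_per_category / distrib.sum()
    # null[c, i] = global proportion of category c * population of location i
    return np.outer(global_proportions, total_per_location)

# Two categories, three locations (made-up populations).
distrib = np.array([[30.0, 10.0, 20.0],
                    [10.0, 30.0, 20.0]])
null_distrib = null_distribution_from_populations(distrib)
```

In this toy case every location holds 40 people and each category makes up half of the total, so the null distribution assigns 20 people of each category to every location.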

def ot_heterogeneity_populations(
	distrib, distance_mat: np.array, total_population_distrib: np.array=None,
	unitary_direction_matrix: np.array=None, transport_plane: np.array=None,
	return_transport_plane: bool=False, epsilon_exponent: float=-1e-3,
	use_same_exponent_weight: bool=True, min_value_avoid_zeros: float=1e-5,
	ot_solve_kwargs : dict={}
)

The function takes the following parameters:

distrib (np.array): 2d-array of shape (num_categories, size) representing the population distribution, i.e. the population of each category in each location.
distance_mat (np.array): 2d-array of shape (size, size) representing the distance between each pair of locations.

The following parameters are optional:

transport_plane (np.array): either a 3d array of shape (num_dimensions, size, size) or a 2d array of shape (size, size) if distributions_from (see compute_optimal_transport_flux) is only 1d. The element of index (n, i, j) represents the flux of population n from locality i to locality j.
return_transport_plane (bool): if true, the function will also return the transport plane.
unitary_direction_matrix (np.array): 3d-array of shape (num_dimensions, size, size) representing the unit vector between each pair of locations.
epsilon_exponent (float): the distance matrix is raised (element-wise) to the power 1+epsilon_exponent.
use_same_exponent_weight (bool): if true, the cost (i.e. the distance) is raised to the same exponent as the one used for the cost matrix in the optimal-transport computation.
min_value_avoid_zeros (float): threshold below which a value is considered zero.
ot_solve_kwargs (dict): dictionary of additional named arguments to pass to the ot.solve (or ot.solve_batch) function that is used as a backend.

The function returns a result as an object of class ot_heterogeneity_results.

If return_transport_plane is true, the function also returns the transport plane (np.array), which is either a 3d array of shape (num_dimensions, size, size) or a 2d array of shape (size, size) if distributions_from (see compute_optimal_transport_flux) is only 1d. The element of index (n, i, j) represents the flux of population n from locality i to locality j.

1.b.3 - `ot_heterogeneity_linear_regression`

The ot_heterogeneity_linear_regression function will be documented later on.

def ot_heterogeneity_linear_regression(
	distrib: np.array, prediction_distrib: np.array, distance_mat: np.array,
	local_weight_distrib: np.array=None, transport_plane: np.array=None,
	return_transport_plane: bool=False, unitary_direction_matrix: np.array=None,
	fit_regression : bool=True, regression=sklearn.linear_model.LinearRegression(), 
	epsilon_exponent: float=-1e-3, use_same_exponent_weight: bool=True,
	min_value_avoid_zeros: float=1e-5, ot_solve_kwargs : dict={}
)

1.c - Utility functions

The utility functions are located in the utils subpackage and should be accessed through it:

import oterogeneity as oth
from oterogeneity import utils

unitary_dir_mat, distance_mat = utils.compute_unitary_direction_matrix_polar(lat, lon)
# Or :
unitary_dir_mat, distance_mat = oth.utils.compute_unitary_direction_matrix_polar(lat, lon)

1.c.1 - `compute_optimal_transport_flux`

def compute_optimal_transport_flux(
	distributions_to: np.array, distributions_from: np.array, distance_mat: np.array,
	ot_solve_kwargs : dict={}, force_for_loop : bool=False
)

The compute_optimal_transport_flux function computes the optimal-transport flux (the transport plane) that carries distributions_from onto distributions_to.

The compute_optimal_transport_flux function takes the following parameters:

distributions_to (np.array): 2d-array of shape (num_dimensions, size) or 1d-array of length size representing the end distribution of population.
distributions_from (np.array): 2d-array of shape (num_dimensions, size) or 1d-array of length size representing the starting distribution of population that will be transported to distributions_to.
distance_mat (np.array): 2d-array of shape (size, size) filled with the distance between each pair of locations.

The following parameters are optional:

ot_solve_kwargs (dict): dictionary of additional named arguments to pass to the ot.solve (or ot.solve_batch) function that is used as a backend.
force_for_loop (bool): force solving using ot.solve instead of ot.solve_batch.

It returns the transport plane (np.array), which is either a 3d array of shape (num_dimensions, size, size) or a 2d array of shape (size, size) if distributions_from is only 1d. The element of index (n, i, j) represents the flux of population n from locality i to locality j.
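To make the indexing of a transport plane concrete, the sketch below builds a small 2d plan by hand (it is not produced by the library) and checks it against its two distributions. It assumes, from the "flux from locality i to locality j" wording above, that rows index the origin and columns the destination; all names and values are illustrative.

```python
import numpy as np

# Made-up 1d distributions over 3 locations (same total mass).
distributions_from = np.array([3.0, 1.0, 0.0])
distributions_to = np.array([1.0, 1.0, 2.0])

# Made-up distances between locations placed on a line.
distance_mat = np.array([[0.0, 1.0, 2.0],
                         [1.0, 0.0, 1.0],
                         [2.0, 1.0, 0.0]])

# Hand-written feasible plan: transport_plan[i, j] is the mass moved
# from location i to location j (assumed convention).
transport_plan = np.array([[1.0, 1.0, 1.0],
                           [0.0, 0.0, 1.0],
                           [0.0, 0.0, 0.0]])

outgoing = transport_plan.sum(axis=1)   # mass leaving each location
incoming = transport_plan.sum(axis=0)   # mass arriving at each location

# Total transport cost of this plan: mass moved times distance travelled.
total_cost = (transport_plan * distance_mat).sum()
```

Under this convention the row sums reproduce distributions_from and the column sums reproduce distributions_to, which is a quick sanity check for any returned plan.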

1.c.2 - `compute_distance_matrix`

The compute_distance_matrix function computes the pairwise distances between a list of coordinates.

def compute_distance_matrix(coordinates: np.array, exponent: float=2)

The compute_distance_matrix function takes the following parameters:

coordinates (np.array): 2d-array of shape (num_dimensions, size) representing the position of each location.

The following parameters are optional:

exponent (float): the exponent used in the norm (2 is the Euclidean norm).

It returns the distance matrix filled with the distance between each location.
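For reference, a pairwise distance matrix of this kind can be sketched in plain NumPy as below. The function name pairwise_distance_matrix is hypothetical, and reading exponent as the order p of a p-norm is an assumption based on the parameter description; the library's own implementation may differ.

```python
import numpy as np

def pairwise_distance_matrix(coordinates, exponent=2):
    """Hypothetical stand-in: p-norm distance between every pair of points.

    coordinates has shape (num_dimensions, size).
    """
    coordinates = np.asarray(coordinates, dtype=float)
    # diff[d, i, j] = coordinates[d, i] - coordinates[d, j]
    diff = coordinates[:, :, None] - coordinates[:, None, :]
    return (np.abs(diff) ** exponent).sum(axis=0) ** (1 / exponent)

# Three points in the plane: (0, 0), (3, 0) and (0, 4).
coordinates = np.array([[0.0, 3.0, 0.0],
                        [0.0, 0.0, 4.0]])
distance_mat = pairwise_distance_matrix(coordinates)
manhattan_mat = pairwise_distance_matrix(coordinates, exponent=1)
```

The three points form a 3-4-5 right triangle, so the Euclidean distances are 3, 4 and 5, while exponent=1 gives 7 between the last two points.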

1.c.3 - `compute_distance_matrix_polar`

The compute_distance_matrix_polar function computes the pairwise distances between a list of points given by their polar coordinates (latitude and longitude) on a sphere. By default it can be used for typical coordinates on Earth.

def compute_distance_matrix_polar(
	latitudes: np.array, longitudes: np.array,
	radius: float=6378137, unit: str="deg"
)

The compute_distance_matrix_polar function takes the following parameters:

latitudes (np.array): 1d-array of length size with the latitude of each point.
longitudes (np.array): 1d-array of length size with the longitude of each point.

The following parameters are optional:

radius (float): radius of the sphere (by default 6378137, which is the equatorial radius of the Earth in meters).
unit (str): a string defining the unit of the longitudes and latitudes, either "rad", "deg" (default), "arcmin", or "arcsec".

It returns the distance matrix filled with the distance between each location.
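A great-circle distance matrix with the same inputs can be sketched as follows. This README does not state which formula the library uses, so the haversine formula here is an assumption, and great_circle_distance_matrix and TO_RADIANS are hypothetical names.

```python
import numpy as np

# Conversion factors from each supported unit to radians.
TO_RADIANS = {
    "rad": 1.0,
    "deg": np.pi / 180,
    "arcmin": np.pi / (180 * 60),
    "arcsec": np.pi / (180 * 3600),
}

def great_circle_distance_matrix(latitudes, longitudes, radius=6378137, unit="deg"):
    """Hypothetical stand-in: haversine distance between every pair of points."""
    lat = np.asarray(latitudes, dtype=float) * TO_RADIANS[unit]
    lon = np.asarray(longitudes, dtype=float) * TO_RADIANS[unit]
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against rounding slightly outside [0, 1] before the sqrt/arcsin.
    return 2 * radius * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Two points on the equator 90 degrees apart, plus the north pole.
distance_mat = great_circle_distance_matrix([0, 0, 90], [0, 90, 0])
```

Each pair of these three points is a quarter of a great circle apart, so every off-diagonal entry equals radius * pi / 2.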

1.c.4 - `compute_unitary_direction_matrix`

The compute_unitary_direction_matrix function computes the matrix of unit vectors used to compute direction in the main functions.

def compute_unitary_direction_matrix(
	coordinates: np.array, distance_mat: np.array=None,
	exponent: float=2
)

The compute_unitary_direction_matrix function takes the following parameters:

coordinates (np.array): 2d-array of shape (num_dimensions, size) representing the position of each location.

The following parameters are optional:

distance_mat (np.array): you can optionally pass a 2d-array of shape (size, size) filled with the distance between each pair of locations. If not passed, it will be computed and returned.
exponent (float): the exponent used in the norm (2 is the Euclidean norm). If a distance matrix is passed, it must have been computed with the same exponent as the one passed to this function.

It returns the following values:

unitary_direction_matrix (np.array): 3d-array of shape (num_dimensions, size, size) representing the unit vector between each pair of locations.
distance_mat (np.array): the distance matrix, a 2d-array of shape (size, size) filled with the distance between each pair of locations. It is returned if it was not passed as a parameter, so that it does not have to be recomputed.
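The idea of a unit-vector matrix can be sketched in plain NumPy as below, for the Euclidean case only. The function name unit_direction_matrix is hypothetical, and both the orientation (vector pointing from location i to location j) and the zero vectors on the diagonal are assumptions, not documented behaviour of the library.

```python
import numpy as np

def unit_direction_matrix(coordinates):
    """Hypothetical stand-in: unit vector from every location i to every location j.

    coordinates has shape (num_dimensions, size); the result has shape
    (num_dimensions, size, size).
    """
    coordinates = np.asarray(coordinates, dtype=float)
    # diff[d, i, j] = coordinates[d, j] - coordinates[d, i]
    diff = coordinates[:, None, :] - coordinates[:, :, None]
    distance_mat = np.sqrt((diff ** 2).sum(axis=0))
    # Avoid dividing by zero on the diagonal (a location paired with itself).
    safe_distance = np.where(distance_mat > 0, distance_mat, 1.0)
    return diff / safe_distance, distance_mat

# Three points in the plane: (0, 0), (3, 0) and (0, 4).
coordinates = np.array([[0.0, 3.0, 0.0],
                        [0.0, 0.0, 4.0]])
unit_mat, distance_mat = unit_direction_matrix(coordinates)
```

Here unit_mat[:, 0, 1] is (1, 0), the direction from the first point to the second, and every off-diagonal vector has norm 1.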

1.c.5 - `compute_unitary_direction_matrix_polar`

The compute_unitary_direction_matrix_polar function computes the matrix of unit vectors used to compute direction in the main functions, between a list of points given by their polar coordinates (latitude and longitude) on a sphere. By default it can be used for typical coordinates on Earth.

def compute_unitary_direction_matrix_polar(
	latitudes: np.array, longitudes: np.array,
	distance_mat: np.array=None, radius: float=6378137, unit: str="deg"
)

The compute_unitary_direction_matrix_polar function takes the following parameters:

latitudes (np.array): 1d-array of length size with the latitude of each point.
longitudes (np.array): 1d-array of length size with the longitude of each point.

The following parameters are optional:

distance_mat (np.array): you can optionally pass a 2d-array of shape (size, size) filled with the distance between each pair of locations. If not passed, it will be computed and returned.
radius (float): radius of the sphere (by default 6378137, which is the equatorial radius of the Earth in meters).
unit (str): a string defining the unit of the longitudes and latitudes, either "rad", "deg" (default), "arcmin", or "arcsec".

It returns the following values:

unitary_direction_matrix (np.array): 3d-array of shape (num_dimensions, size, size) representing the unit vector between each pair of locations.
distance_mat (np.array): the distance matrix, a 2d-array of shape (size, size) filled with the distance between each pair of locations. It is returned if it was not passed as a parameter, so that it does not have to be recomputed.

2 - License

"oterogeneity" (c) by @jolatechno - Joseph Touzet

"oterogeneity" is licensed under a
Creative Commons Attribution 4.0 International License.

You should have received a copy of the license along with this
work. If not, see <https://creativecommons.org/licenses/by/4.0/>.
