A project to compute optimal-transport-based heterogeneity indices.
The library can be installed with pip (pip install oterogeneity), then imported and used as follows:
import oterogeneity as oth
from oterogeneity import utils
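# lat, lon: 1d-arrays with the latitude and longitude (in degrees) of each location
unitary_direction_matrix, distance_matrix = utils.compute_unitary_direction_matrix_polar(lat, lon)
# distrib_candidates: 2d-array of shape (num_categories, size), the population of each category in each location
results = oth.ot_heterogeneity_populations(
    distrib_candidates, distance_matrix, unitary_direction_matrix=unitary_direction_matrix
)
The ot_heterogeneity_results class contains all the results of a spatial heterogeneity computation using our optimal-transport-based method.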
It has the following attributes (any of which may be None when not applicable):
size (int): number of spatial units (towns, polling stations, etc.).
num_categories (int): number of distinct categories.
num_dimensions (int): number of spatial dimensions (typically 2).
has_direction (bool): whether the result contains directionality fields.
global_heterogeneity (float): global heterogeneity index.
global_heterogeneity_per_category (np.array): 1d-array of length num_categories containing the global heterogeneity index of each category.
local_heterogeneity (np.array): 1d-array of length size containing the local heterogeneity index of each location.
local_signed_heterogeneity (np.array): signed heterogeneity index for each category and each location; a 2d-array of shape (num_categories, size) when num_categories > 1, or a 1d-array of length size when num_categories = 1.
local_exiting_heterogeneity (np.array): 1d-array of length size containing, for each location, the heterogeneity index computed only from the exiting flux.
local_entering_heterogeneity (np.array): 1d-array of length size containing, for each location, the heterogeneity index computed only from the entering flux.
local_heterogeneity_per_category (np.array): 2d-array of shape (num_categories, size) containing the heterogeneity index for each category and each location.
local_exiting_heterogeneity_per_category (np.array): 2d-array of shape (num_categories, size) containing the heterogeneity index computed only from the exiting flux, for each category and each location.
local_entering_heterogeneity_per_category (np.array): 2d-array of shape (num_categories, size) containing the heterogeneity index computed only from the entering flux, for each category and each location.
direction (np.array): 2d-array of shape (num_dimensions, size) representing the vector field of directionality.
direction_per_category (np.array): 3d-array of shape (num_categories, num_dimensions, size) representing the vector field of directionality for each category.
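The snippet below is a usage sketch. It works on plain NumPy arrays with the shapes documented above, not on a real ot_heterogeneity_results object. The helper names as_category_array and direction_strength are hypothetical and are not part of oterogeneity. The first helper normalizes local_signed_heterogeneity to a 2d-array. The second computes the magnitude of the direction fields.

```python
import numpy as np


def as_category_array(local_signed_heterogeneity: np.ndarray) -> np.ndarray:
    """Return the signed heterogeneity with shape (num_categories, size).

    The attribute is 1d (length size) when num_categories == 1, so a leading
    category axis is added in that case; 2d input is returned unchanged.
    """
    return np.atleast_2d(local_signed_heterogeneity)


def direction_strength(direction: np.ndarray) -> np.ndarray:
    """Euclidean length of the directionality vectors.

    Works for direction, shape (num_dimensions, size) -> (size,), and for
    direction_per_category, shape (num_categories, num_dimensions, size)
    -> (num_categories, size), since the spatial axis is second to last.
    """
    return np.linalg.norm(direction, axis=-2)


# Toy arrays standing in for result attributes (illustrative values only).
signed_single = np.array([0.1, -0.2, 0.1])
direction = np.array([[3.0, 0.0, 1.0],
                      [4.0, 2.0, 0.0]])
print(as_category_array(signed_single).shape)  # (1, 3)
print(direction_strength(direction))           # [5. 2. 1.]
```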
The ot_heterogeneity_from_null_distrib function is the most general function implementing our method for measuring spatial heterogeneity.
def ot_heterogeneity_from_null_distrib(
distrib: np.array, null_distrib: np.array, distance_mat: np.array,
transport_plane: np.array=None, return_transport_plane: bool=False,
unitary_direction_matrix: np.array=None, local_weight_distrib: np.array=None,
category_weights: np.array=None, epsilon_exponent: float=-1e-3,
use_same_exponent_weight: bool=True, min_value_avoid_zeros: float=1e-5,
ot_solve_kwargs : dict={}
)
It takes the following parameters:
distrib (np.array): 2d-array of shape (num_categories, size) representing the population distribution, i.e. the population of each category in each location.
null_distrib (np.array): the null distribution (distribution without heterogeneity) to which distrib is compared; either a 2d-array of shape (num_categories, size), or a 1d-array of length size if every category has the same null distribution.
distance_mat (np.array): 2d-array of shape (size, size) containing the distance between each pair of locations.
The following parameters are optional:
transport_plane (np.array): either a 3d-array of shape (num_categories, size, size), or a 2d-array of shape (size, size) if distrib is 1d. The element of index (n, i, j) represents the flux of population n from location i to location j.
return_transport_plane (bool): if true, the function also returns the transport plane.
unitary_direction_matrix (np.array): 3d-array of shape (num_dimensions, size, size) containing the unit vector between each pair of locations.
local_weight_distrib (np.array): 1d-array of length size giving the weight of each location. By default, this weight is the proportion of the total population located in each location.
category_weights (np.array): 1d-array of length num_categories giving the weight of each category. By default, this weight is the proportion of the total population belonging to each category.
epsilon_exponent (float): the distance matrix is raised (element-wise) to the power 1 + epsilon_exponent.
use_same_exponent_weight (bool): if true, the cost (i.e. the distance) is raised to the same exponent as the cost matrix used in the optimal-transport computation.
min_value_avoid_zeros (float): threshold below which a value is considered zero.
ot_solve_kwargs (dict): additional named arguments passed to the ot.solve (or ot.solve_batch) function used as the backend.
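The documented defaults for local_weight_distrib and category_weights, and the effect of epsilon_exponent, can be sketched with NumPy. This is a minimal illustration that assumes distrib has shape (num_categories, size). The function names are hypothetical, and the library's internal implementation may differ, for instance in how it handles zeros.

```python
import numpy as np


def default_weights(distrib: np.ndarray):
    """Documented default weights for a (num_categories, size) distrib (sketch).

    Returns (local_weight_distrib, category_weights).
    """
    total = distrib.sum()
    local_weight_distrib = distrib.sum(axis=0) / total  # share of the population in each location
    category_weights = distrib.sum(axis=1) / total      # share of the population in each category
    return local_weight_distrib, category_weights


def exponentiated_cost(distance_mat: np.ndarray, epsilon_exponent: float = -1e-3) -> np.ndarray:
    """Element-wise distance_mat ** (1 + epsilon_exponent), as described for epsilon_exponent."""
    return distance_mat ** (1.0 + epsilon_exponent)


distrib = np.array([[10.0, 0.0, 30.0],
                    [10.0, 40.0, 10.0]])
w_loc, w_cat = default_weights(distrib)
cost = exponentiated_cost(np.array([[0.0, 2.0], [2.0, 0.0]]))
```

With this toy input the location weights are 0.2, 0.4 and 0.4, and the category weights are 0.4 and 0.6.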
The function returns an ot_heterogeneity_results object.
If return_transport_plane is true, the function also returns the transport plane (np.array). This is either a 3d-array of shape (num_categories, size, size), or a 2d-array of shape (size, size) if distrib is 1d. The element of index (n, i, j) represents the flux of population n from location i to location j.
The ot_heterogeneity_populations function uses the total population distribution across all categories as the null distribution. In this null distribution, the total population of each location is unchanged, and the proportion of each category at every location equals its proportion in the overall population.
def ot_heterogeneity_populations(
distrib, distance_mat: np.array, total_population_distrib: np.array=None,
unitary_direction_matrix: np.array=None, transport_plane: np.array=None,
return_transport_plane: bool=False, epsilon_exponent: float=-1e-3,
use_same_exponent_weight: bool=True, min_value_avoid_zeros: float=1e-5,
ot_solve_kwargs : dict={}
)
It takes the following parameters:
distrib (np.array): 2d-array of shape (num_categories, size) representing the population distribution, i.e. the population of each category in each location.
distance_mat (np.array): 2d-array of shape (size, size) containing the distance between each pair of locations.
The following parameters are optional:
transport_plane (np.array): either a 3d-array of shape (num_categories, size, size), or a 2d-array of shape (size, size) if distrib is 1d. The element of index (n, i, j) represents the flux of population n from location i to location j.
return_transport_plane (bool): if true, the function also returns the transport plane.
unitary_direction_matrix (np.array): 3d-array of shape (num_dimensions, size, size) containing the unit vector between each pair of locations.
epsilon_exponent (float): the distance matrix is raised (element-wise) to the power 1 + epsilon_exponent.
use_same_exponent_weight (bool): if true, the cost (i.e. the distance) is raised to the same exponent as the cost matrix used in the optimal-transport computation.
min_value_avoid_zeros (float): threshold below which a value is considered zero.
ot_solve_kwargs (dict): additional named arguments passed to the ot.solve (or ot.solve_batch) function used as the backend.
The function returns an ot_heterogeneity_results object.
If return_transport_plane is true, the function also returns the transport plane (np.array). This is either a 3d-array of shape (num_categories, size, size), or a 2d-array of shape (size, size) if distrib is 1d. The element of index (n, i, j) represents the flux of population n from location i to location j.
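The null distribution described for ot_heterogeneity_populations can be sketched as follows. The sketch assumes that total_population_distrib defaults to the per-location sum of distrib. The helper name populations_null_distrib is hypothetical; the code only illustrates the definition and is not the library's own.

```python
import numpy as np


def populations_null_distrib(distrib: np.ndarray) -> np.ndarray:
    """Null distribution described for ot_heterogeneity_populations (sketch).

    distrib has shape (num_categories, size). Each location keeps its total
    population, split between categories according to their global shares.
    """
    total_population_distrib = distrib.sum(axis=0)            # (size,)
    global_proportions = distrib.sum(axis=1) / distrib.sum()  # (num_categories,)
    return np.outer(global_proportions, total_population_distrib)


distrib = np.array([[10.0, 0.0, 30.0],
                    [10.0, 40.0, 10.0]])
null_distrib = populations_null_distrib(distrib)
```

With this toy input, each column of null_distrib still sums to the location totals (20, 40, 40) and each row to the category totals (40, 60). Every location ends up with the same 40/60 split between the two categories.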
The ot_heterogeneity_linear_regression function will be documented later on.
def ot_heterogeneity_linear_regression(
distrib: np.array, prediction_distrib: np.array, distance_mat: np.array,
local_weight_distrib: np.array=None, transport_plane: np.array=None,
return_transport_plane: bool=False, unitary_direction_matrix: np.array=None,
fit_regression : bool=True, regression=sklearn.linear_model.LinearRegression(),
epsilon_exponent: float=-1e-3, use_same_exponent_weight: bool=True,
min_value_avoid_zeros: float=1e-5, ot_solve_kwargs : dict={}
)
The utility functions are located in the utils subpackage, so they must be accessed through it:
import oterogeneity as oth
from oterogeneity import utils
unitary_dir_mat, distance_mat = utils.compute_unitary_direction_matrix_polar(lat, lon)
# Or:
unitary_dir_mat, distance_mat = oth.utils.compute_unitary_direction_matrix_polar(lat, lon)
The compute_optimal_transport_flux function computes the optimal transport plane (the flux of population between locations) that transports distributions_from onto distributions_to.
def compute_optimal_transport_flux(
distributions_to: np.array, distributions_from: np.array, distance_mat: np.array,
ot_solve_kwargs : dict={}, force_for_loop : bool=False
)
It takes the following parameters:
distributions_to (np.array): 2d-array of shape (num_dimensions, size) or 1d-array of length size representing the target distribution of population.
distributions_from (np.array): 2d-array of shape (num_dimensions, size) or 1d-array of length size representing the starting distribution of population, which is transported to distributions_to.
distance_mat (np.array): 2d-array of shape (size, size) containing the distance between each pair of locations.
The following parameters are optional:
ot_solve_kwargs (dict): additional named arguments passed to the ot.solve (or ot.solve_batch) function used as the backend.
force_for_loop (bool): force solving with ot.solve instead of ot.solve_batch.
It returns the transport plane (np.array). This is either a 3d-array of shape (num_dimensions, size, size), or a 2d-array of shape (size, size) if distributions_from is 1d. The element of index (n, i, j) represents the flux of population n from location i to location j.
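The minimal sketch below illustrates what a transport plane contains, using the special case of locations on a line, sorted by position. In that case the monotone (north-west corner) coupling is an optimal plane for convex costs |x - y|^p with p >= 1. This is not how oterogeneity computes the plane, since it relies on ot.solve (or ot.solve_batch) with an arbitrary distance_mat. The function name is hypothetical.

```python
import numpy as np


def monotone_transport_plane_1d(distribution_from, distribution_to, tol=1e-12):
    """Optimal transport plane between two 1d distributions on a line (sketch).

    Assumes locations are sorted by position and both distributions have the
    same total mass. For a convex cost |x - y| ** p with p >= 1, the monotone
    (north-west corner) coupling built here is optimal.
    plane[i, j] is the mass moved from location i to location j.
    """
    supply = np.asarray(distribution_from, dtype=float).copy()
    demand = np.asarray(distribution_to, dtype=float).copy()
    size = len(supply)
    plane = np.zeros((size, size))
    i = j = 0
    while i < size and j < size:
        moved = min(supply[i], demand[j])
        plane[i, j] += moved
        supply[i] -= moved
        demand[j] -= moved
        if supply[i] <= tol:
            i += 1  # source i is exhausted
        else:
            j += 1  # destination j is full
    return plane


positions = np.array([0.0, 1.0, 2.0])
plane = monotone_transport_plane_1d([3.0, 1.0, 0.0], [0.0, 1.0, 3.0])
cost = (plane * np.abs(positions[:, None] - positions[None, :])).sum()
```

The rows of plane sum to distribution_from and its columns sum to distribution_to. Here plane[0, 2] = 2 means that two units move from location 0 to location 2. With the distance |x_i - x_j|, the total cost is 6.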
The compute_distance_matrix function computes the pairwise distances between a list of coordinates.
def compute_distance_matrix(coordinates: np.array, exponent: float=2)
The compute_distance_matrix function takes the following parameters:
coordinates (np.array): 2d-array of shape (num_dimensions, size) representing the position of each location.
The following parameters are optional:
exponent (float): the exponent of the norm (2 gives the Euclidean norm).
It returns the distance matrix, a 2d-array of shape (size, size) containing the distance between each pair of locations.
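A minimal NumPy sketch of this computation follows. It assumes that exponent is the p of a p-norm, so that exponent=2 gives Euclidean distances, and that coordinates has shape (num_dimensions, size). The name distance_matrix_sketch is hypothetical, and the library's implementation may differ.

```python
import numpy as np


def distance_matrix_sketch(coordinates, exponent: float = 2) -> np.ndarray:
    """Pairwise p-norm distances with p = exponent; returns shape (size, size)."""
    coordinates = np.asarray(coordinates, dtype=float)
    # diff[d, i, j] = coordinates[d, j] - coordinates[d, i]
    diff = coordinates[:, None, :] - coordinates[:, :, None]
    return (np.abs(diff) ** exponent).sum(axis=0) ** (1.0 / exponent)


coords = np.array([[0.0, 3.0],
                   [0.0, 4.0]])  # two points: (0, 0) and (3, 4)
euclidean = distance_matrix_sketch(coords)
manhattan = distance_matrix_sketch(coords, exponent=1)
```

For these two points, the Euclidean distance is 5 and the exponent=1 (Manhattan) distance is 7.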
The compute_distance_matrix_polar function computes the pairwise distances between points given in polar coordinates (latitude and longitude) on a sphere. By default, it is set up for typical coordinates on Earth.
def compute_distance_matrix_polar(
latitudes: np.array, longitudes: np.array,
radius: float=6378137, unit: str="deg"
)
The compute_distance_matrix_polar function takes the following parameters:
latitudes (np.array): 1d-array of length size with the latitude of each point.
longitudes (np.array): 1d-array of length size with the longitude of each point.
The following parameters are optional:
radius (float): radius of the sphere (by default 6378137, the equatorial radius of the Earth in meters).
unit (str): the unit of the latitudes and longitudes, either "rad", "deg" (default), "arcmin", or "arcsec".
It returns the distance matrix, a 2d-array of shape (size, size) containing the distance between each pair of points.
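One possible implementation uses the haversine formula for great-circle distances. This is an assumption: the library may use another spherical formula, which should agree up to rounding on a perfect sphere. The function name and the unit-conversion table below are hypothetical.

```python
import numpy as np

# Conversion factors to radians for the documented unit strings.
_UNIT_TO_RAD = {
    "rad": 1.0,
    "deg": np.pi / 180,
    "arcmin": np.pi / (180 * 60),
    "arcsec": np.pi / (180 * 3600),
}


def polar_distance_matrix_sketch(latitudes, longitudes, radius=6378137, unit="deg"):
    """Great-circle distances via the haversine formula; returns shape (size, size)."""
    factor = _UNIT_TO_RAD[unit]
    lat = np.asarray(latitudes, dtype=float) * factor
    lon = np.asarray(longitudes, dtype=float) * factor
    dlat = lat[None, :] - lat[:, None]
    dlon = lon[None, :] - lon[:, None]
    h = np.sin(dlat / 2) ** 2 + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2
    # Clipping guards against rounding pushing h slightly outside [0, 1].
    return 2 * radius * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))


# Two points on the equator, a quarter of a turn apart.
d = polar_distance_matrix_sketch([0.0, 0.0], [0.0, 90.0])
```

The distance between the two points is a quarter of the circumference, i.e. radius * pi / 2.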
The compute_unitary_direction_matrix function computes the matrix of unit vectors used to compute directions in the main functions.
def compute_unitary_direction_matrix(
coordinates: np.array, distance_mat: np.array=None,
exponent: float=2
)
The compute_unitary_direction_matrix function takes the following parameters:
coordinates (np.array): 2d-array of shape (num_dimensions, size) representing the position of each location.
The following parameters are optional:
distance_mat (np.array): 2d-array of shape (size, size) containing the distance between each pair of locations. If not passed, it is computed and returned.
exponent (float): the exponent of the norm (2 gives the Euclidean norm). If a distance matrix is passed, it must have been computed with the same exponent as the one passed to this function.
It returns the following values:
unitary_direction_matrix (np.array): 3d-array of shape (num_dimensions, size, size) containing the unit vector between each pair of locations.
distance_mat (np.array): 2d-array of shape (size, size) containing the distance between each pair of locations. It is returned if it was not passed as a parameter, to avoid recomputing it.
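The minimal sketch below makes two assumptions. First, directions[:, i, j] points from location i towards location j. Second, each vector is divided by the distance computed with the same exponent, which would explain why the two must match. The name and the sign convention are hypothetical. With an exponent other than 2, the vectors have unit length in that p-norm rather than in the Euclidean norm.

```python
import numpy as np


def unit_direction_matrix_sketch(coordinates, exponent: float = 2):
    """Unit vectors between every pair of locations (sketch).

    coordinates: (num_dimensions, size).
    Returns (directions, distance_mat), where directions[:, i, j] points from
    location i towards location j and is zero on the diagonal.
    """
    coordinates = np.asarray(coordinates, dtype=float)
    diff = coordinates[:, None, :] - coordinates[:, :, None]  # diff[:, i, j] = x_j - x_i
    distance_mat = (np.abs(diff) ** exponent).sum(axis=0) ** (1.0 / exponent)
    safe = np.where(distance_mat > 0, distance_mat, 1.0)       # avoid dividing by zero on the diagonal
    directions = diff / safe[None, :, :]
    return directions, distance_mat


coords = np.array([[0.0, 3.0],
                   [0.0, 4.0]])  # two points: (0, 0) and (3, 4)
directions, dist = unit_direction_matrix_sketch(coords)
```

Here directions[:, 0, 1] is (0.6, 0.8), and the reverse direction directions[:, 1, 0] is its opposite.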
The compute_unitary_direction_matrix_polar function computes the matrix of unit vectors used to compute directions in the main functions, for points given in polar coordinates on a sphere. By default, it is set up for typical coordinates on Earth.
def compute_unitary_direction_matrix_polar(
latitudes: np.array, longitudes: np.array,
distance_mat: np.array=None, radius: float=6378137, unit: str="deg"
)
The compute_unitary_direction_matrix_polar function takes the following parameters:
latitudes (np.array): 1d-array of length size with the latitude of each point.
longitudes (np.array): 1d-array of length size with the longitude of each point.
The following parameters are optional:
distance_mat (np.array): 2d-array of shape (size, size) containing the distance between each pair of points. If not passed, it is computed and returned.
radius (float): radius of the sphere (by default 6378137, the equatorial radius of the Earth in meters).
unit (str): the unit of the latitudes and longitudes, either "rad", "deg" (default), "arcmin", or "arcsec".
It returns the following values:
unitary_direction_matrix (np.array): 3d-array of shape (num_dimensions, size, size) containing the unit vector between each pair of points.
distance_mat (np.array): 2d-array of shape (size, size) containing the distance between each pair of points. It is returned if it was not passed as a parameter, to avoid recomputing it.
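On a sphere, one natural choice is the initial bearing of the great circle from point i to point j, expressed as (east, north) components in the local tangent plane. This choice is an assumption and is not confirmed by the documentation. The sketch below only handles inputs in degrees and omits radius and distance_mat. The name, the component order and the convention are hypothetical.

```python
import numpy as np


def polar_unit_direction_matrix_sketch(latitudes, longitudes):
    """Unit (east, north) vectors from each point towards every other point (sketch).

    Inputs are in degrees. directions[:, i, j] is the initial heading at point i
    of the great circle towards point j; the diagonal is zero.
    """
    lat = np.radians(np.asarray(latitudes, dtype=float))
    lon = np.radians(np.asarray(longitudes, dtype=float))
    lat_i, lat_j = lat[:, None], lat[None, :]
    dlon = lon[None, :] - lon[:, None]
    # Unnormalized sin(bearing) and cos(bearing) from the initial-bearing formula.
    east = np.sin(dlon) * np.cos(lat_j)
    north = np.cos(lat_i) * np.sin(lat_j) - np.sin(lat_i) * np.cos(lat_j) * np.cos(dlon)
    norm = np.hypot(east, north)
    safe = np.where(norm > 0, norm, 1.0)  # leave identical points as zero vectors
    return np.stack([east, north]) / safe


# Point 1 lies due east of point 0, point 2 due north of point 0.
u = polar_unit_direction_matrix_sketch([0.0, 0.0, 10.0], [0.0, 10.0, 0.0])
```

In this example, u[:, 0, 1] points east (1, 0) and u[:, 0, 2] points north (0, 1).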
"oterogeneity" (c) by @jolatechno - Joseph Touzet
"oterogeneity" is licensed under a
Creative Commons Attribution 4.0 International License.
You should have received a copy of the license along with this
work. If not, see <https://creativecommons.org/licenses/by/4.0/>.