From 0334ac51e2c3e65cf7c9484392afdcd5dd3b0555 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 24 Oct 2025 08:11:15 +0000 Subject: [PATCH 1/2] Initial plan From 9c96b3839db23c97d9ac82eccf91bce00a922f02 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 24 Oct 2025 08:19:20 +0000 Subject: [PATCH 2/2] Add NetworkX tutorials to exercise notebooks (ex-1-2, ex-3-4, ex-6) Co-authored-by: andreafailla <77196675+andreafailla@users.noreply.github.com> --- ex-1-2.ipynb | 171 +++++++++++++++++++++++++++++++++++++++- ex-3-4.ipynb | 214 ++++++++++++++++++++++++++++++++++++++++++++++++++- ex-6.ipynb | 146 ++++++++++++++++++++++++++++++++++- 3 files changed, 526 insertions(+), 5 deletions(-) diff --git a/ex-1-2.ipynb b/ex-1-2.ipynb index 40f2978..1cd9ecb 100644 --- a/ex-1-2.ipynb +++ b/ex-1-2.ipynb @@ -54,6 +54,82 @@ "# Part 1: Reading graphs from files / writing graphs to files + graph basics" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## NetworkX Tutorial: Graph I/O and Basics\n", + "\n", + "Before starting the exercises, let's review the key NetworkX concepts and functions you'll need.\n", + "\n", + "### Reading and Writing Graphs\n", + "\n", + "NetworkX supports various graph file formats:\n", + "\n", + "**Reading graphs:**\n", + "```python\n", + "# Read edge list formats\n", + "G = nx.read_edgelist('file.txt') # Simple edge list\n", + "G = nx.read_weighted_edgelist('file.txt') # With weights\n", + "\n", + "# Read other formats\n", + "G = nx.read_pajek('file.net') # Pajek format\n", + "G = nx.read_gml('file.gml') # GML format\n", + "G = nx.read_graphml('file.graphml') # GraphML format\n", + "```\n", + "\n", + "**Writing graphs:**\n", + "```python\n", + "nx.write_edgelist(G, 'output.txt')\n", + "nx.write_pajek(G, 'output.net')\n", + "nx.write_gml(G, 'output.gml')\n", + "```\n", + "\n", + "### Graph Types\n", + "\n", + "- 
`nx.Graph()` - Undirected graph\n", + "- `nx.DiGraph()` - Directed graph\n", + "- `nx.MultiGraph()` - Undirected multigraph (multiple edges between nodes)\n", + "- `nx.MultiDiGraph()` - Directed multigraph\n", + "\n", + "### Basic Graph Operations\n", + "\n", + "```python\n", + "# Graph information\n", + "G.number_of_nodes() # Count nodes\n", + "G.number_of_edges() # Count edges\n", + "G.nodes() # Get all nodes\n", + "G.edges() # Get all edges\n", + "\n", + "# Adding/removing nodes and edges\n", + "G.add_node(1)\n", + "G.add_edge(1, 2)\n", + "G.remove_edge(1, 2) # Remove the edge first: removing node 1 also drops its edges\n", + "G.remove_node(1)\n", + "\n", + "# Checking existence\n", + "G.has_node(1)\n", + "G.has_edge(1, 2)\n", + "\n", + "# Graph comparison\n", + "nx.is_isomorphic(G1, G2) # Check if graphs are structurally identical\n", + "```\n", + "\n", + "### Directed vs Undirected Graphs\n", + "\n", + "When reading a graph as directed vs undirected:\n", + "- **Undirected**: Each edge (u,v) is bidirectional\n", + "- **Directed**: Each edge has a specific direction from u to v\n", + "- Converting directed to undirected never increases the edge count: each directed edge becomes one undirected edge, and a reciprocal pair (u,v), (v,u) merges into a single edge\n", + "\n", + "**\ud83d\udcda References:**\n", + "- [NetworkX I/O Documentation](https://networkx.org/documentation/stable/reference/readwrite/index.html)\n", + "- [NetworkX Graph Types](https://networkx.org/documentation/stable/reference/classes/index.html)\n", + "- [NetworkX Tutorial](https://networkx.org/documentation/stable/tutorial.html)\n", + "\n", + "---" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -235,6 +311,99 @@ "# Part 2: Connected components, Giant Component & Subgraphs" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## NetworkX Tutorial: Connected Components and Subgraphs\n", + "\n", + "### Connected Components\n", + "\n", + "A connected component is a maximal set of nodes where every node is reachable from every other node.\n", + "\n", + "**For undirected graphs:**\n", + 
"```python\n", + "# Get all connected components\n", + "components = nx.connected_components(G) # Returns generator of sets\n", + "component_list = list(nx.connected_components(G))\n", + "\n", + "# Number of connected components\n", + "num_components = nx.number_connected_components(G)\n", + "\n", + "# Get the largest connected component (Giant Component)\n", + "largest_cc = max(nx.connected_components(G), key=len)\n", + "giant_component = G.subgraph(largest_cc).copy()\n", + "\n", + "# Check if graph is connected\n", + "nx.is_connected(G)\n", + "```\n", + "\n", + "**For directed graphs:**\n", + "```python\n", + "# Strongly connected components (path in both directions)\n", + "scc = nx.strongly_connected_components(G)\n", + "\n", + "# Weakly connected components (treating edges as undirected)\n", + "wcc = nx.weakly_connected_components(G)\n", + "```\n", + "\n", + "### Subgraphs\n", + "\n", + "```python\n", + "# Create a subgraph from a set of nodes\n", + "nodes_subset = [1, 2, 3, 4, 5]\n", + "subG = G.subgraph(nodes_subset) # Returns a view\n", + "subG_copy = G.subgraph(nodes_subset).copy() # Create independent copy\n", + "\n", + "# Induced subgraph (includes all edges between nodes in subset)\n", + "induced_subG = G.subgraph(nodes_subset)\n", + "\n", + "# Edge-based subgraph\n", + "edges_subset = [(1,2), (2,3)]\n", + "edge_subG = G.edge_subgraph(edges_subset)\n", + "```\n", + "\n", + "### Giant Component\n", + "\n", + "The giant component is the largest connected component in a graph. 
It's particularly important in network analysis:\n", + "\n", + "```python\n", + "# Extract giant component\n", + "largest_cc = max(nx.connected_components(G), key=len)\n", + "giant = G.subgraph(largest_cc).copy()\n", + "\n", + "# Size comparison\n", + "print(f\"Original graph: {G.number_of_nodes()} nodes\")\n", + "print(f\"Giant component: {giant.number_of_nodes()} nodes\")\n", + "print(f\"Fraction in giant: {giant.number_of_nodes()/G.number_of_nodes():.2%}\")\n", + "```\n", + "\n", + "### Network Resilience\n", + "\n", + "Studying how networks respond to node/edge removal:\n", + "\n", + "```python\n", + "# Random failures - remove random nodes\n", + "import random\n", + "nodes_to_remove = random.sample(list(G.nodes()), k=10)\n", + "G_failed = G.copy()\n", + "G_failed.remove_nodes_from(nodes_to_remove)\n", + "\n", + "# Targeted attacks - remove high-degree nodes\n", + "degree_sorted = sorted(G.degree(), key=lambda x: x[1], reverse=True)\n", + "top_nodes = [node for node, deg in degree_sorted[:10]]\n", + "G_attacked = G.copy()\n", + "G_attacked.remove_nodes_from(top_nodes)\n", + "```\n", + "\n", + "**\ud83d\udcda References:**\n", + "- [NetworkX Components](https://networkx.org/documentation/stable/reference/algorithms/component.html)\n", + "- [NetworkX Subgraphs](https://networkx.org/documentation/stable/reference/classes/generated/networkx.Graph.subgraph.html)\n", + "- [Network Resilience Analysis](https://networkx.org/documentation/stable/reference/algorithms/connectivity.html)\n", + "\n", + "---" + ] + }, { "cell_type": "markdown", "id": "94a45b3e", @@ -377,4 +546,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file diff --git a/ex-3-4.ipynb b/ex-3-4.ipynb index 2eca1c0..3cd45b3 100644 --- a/ex-3-4.ipynb +++ b/ex-3-4.ipynb @@ -48,6 +48,102 @@ "# Part 3. 
Plotting graphs" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## NetworkX Tutorial: Graph Visualization\n", + "\n", + "NetworkX provides several built-in layout algorithms and drawing functions to visualize graphs.\n", + "\n", + "### Basic Graph Drawing\n", + "\n", + "```python\n", + "import matplotlib.pyplot as plt\n", + "import networkx as nx\n", + "\n", + "# Simple drawing\n", + "nx.draw(G)\n", + "plt.show()\n", + "\n", + "# Drawing with labels\n", + "nx.draw(G, with_labels=True)\n", + "plt.show()\n", + "```\n", + "\n", + "### Layout Algorithms\n", + "\n", + "Different layouts position nodes differently in 2D space:\n", + "\n", + "```python\n", + "# Common layouts\n", + "pos = nx.spring_layout(G) # Force-directed (default)\n", + "pos = nx.circular_layout(G) # Nodes in a circle\n", + "pos = nx.random_layout(G) # Random positions\n", + "pos = nx.shell_layout(G) # Concentric circles\n", + "pos = nx.spectral_layout(G) # Based on graph spectrum\n", + "pos = nx.kamada_kawai_layout(G) # Force-directed variant\n", + "\n", + "# Use a layout\n", + "nx.draw(G, pos=pos, with_labels=True)\n", + "plt.show()\n", + "```\n", + "\n", + "### Customizing Visualizations\n", + "\n", + "```python\n", + "# Customize node appearance\n", + "nx.draw(G, \n", + " pos=pos,\n", + " node_color='lightblue', # Node color\n", + " node_size=500, # Node size\n", + " node_shape='o', # Node shape (o=circle, s=square, etc.)\n", + " with_labels=True, # Show node labels\n", + " font_size=10, # Label font size\n", + " font_color='black', # Label color\n", + " font_weight='bold') # Label weight\n", + "\n", + "# Customize edge appearance\n", + "nx.draw(G,\n", + " pos=pos,\n", + " edge_color='gray', # Edge color\n", + " width=2, # Edge width\n", + " style='solid') # Edge style (solid, dashed, dotted)\n", + "\n", + "# Color nodes by property (e.g., degree)\n", + "node_colors = [G.degree(n) for n in G.nodes()]\n", + "nx.draw(G, pos=pos, node_color=node_colors, cmap=plt.cm.Blues, 
with_labels=True)\n", + "plt.colorbar(plt.cm.ScalarMappable(norm=plt.Normalize(min(node_colors), max(node_colors)), cmap=plt.cm.Blues), ax=plt.gca()) # nx.draw registers no mappable, so pass one explicitly\n", + "```\n", + "\n", + "### Drawing Components Separately\n", + "\n", + "```python\n", + "# Draw nodes\n", + "nx.draw_networkx_nodes(G, pos, node_color='lightblue', node_size=500)\n", + "\n", + "# Draw edges\n", + "nx.draw_networkx_edges(G, pos, edge_color='gray', width=2)\n", + "\n", + "# Draw labels\n", + "nx.draw_networkx_labels(G, pos, font_size=10)\n", + "\n", + "# Draw edge labels (useful for weights)\n", + "edge_labels = nx.get_edge_attributes(G, 'weight')\n", + "nx.draw_networkx_edge_labels(G, pos, edge_labels)\n", + "\n", + "plt.axis('off') # Hide axes\n", + "plt.show()\n", + "```\n", + "\n", + "**\ud83d\udcda References:**\n", + "- [NetworkX Drawing Documentation](https://networkx.org/documentation/stable/reference/drawing.html)\n", + "- [NetworkX Layout Algorithms](https://networkx.org/documentation/stable/reference/generated/networkx.drawing.layout.spring_layout.html)\n", + "- [Matplotlib Color Maps](https://matplotlib.org/stable/tutorials/colors/colormaps.html)\n", + "\n", + "---" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -75,6 +171,118 @@ "# Part 4. Degree Analysis" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## NetworkX Tutorial: Degree Analysis\n", + "\n", + "The degree of a node is the number of edges connected to it. 
Degree analysis is fundamental in network science.\n", + "\n", + "### Basic Degree Functions\n", + "\n", + "```python\n", + "# Get degree of a single node\n", + "degree = G.degree(node)\n", + "\n", + "# Get degrees of all nodes (returns DegreeView)\n", + "degrees = G.degree() # Iterable view of (node, degree) pairs\n", + "degree_dict = dict(G.degree()) # Convert to dictionary\n", + "\n", + "# For directed graphs\n", + "in_degree = G.in_degree() # Number of incoming edges\n", + "out_degree = G.out_degree() # Number of outgoing edges\n", + "```\n", + "\n", + "### Degree Distribution\n", + "\n", + "The degree distribution P(k) is the probability that a randomly selected node has degree k:\n", + "\n", + "```python\n", + "from collections import Counter\n", + "import numpy as np\n", + "\n", + "# Calculate degree distribution\n", + "degrees = [d for n, d in G.degree()]\n", + "degree_count = Counter(degrees)\n", + "\n", + "# Normalize to get probabilities\n", + "total_nodes = G.number_of_nodes()\n", + "degree_dist = {k: count/total_nodes for k, count in degree_count.items()}\n", + "\n", + "# Plot degree distribution\n", + "import matplotlib.pyplot as plt\n", + "plt.bar(list(degree_dist.keys()), list(degree_dist.values()))\n", + "plt.xlabel('Degree')\n", + "plt.ylabel('Probability')\n", + "plt.title('Degree Distribution')\n", + "plt.show()\n", + "\n", + "# Log-log plot (useful for power-law distributions)\n", + "plt.loglog(list(degree_dist.keys()), list(degree_dist.values()), 'o')\n", + "plt.xlabel('Degree (log scale)')\n", + "plt.ylabel('Probability (log scale)')\n", + "plt.show()\n", + "```\n", + "\n", + "### Degree Statistics\n", + "\n", + "```python\n", + "degrees = [d for n, d in G.degree()]\n", + "\n", + "# Basic statistics\n", + "avg_degree = np.mean(degrees)\n", + "median_degree = np.median(degrees)\n", + "max_degree = max(degrees)\n", + "min_degree = min(degrees)\n", + "\n", + "# Find nodes with specific degree properties\n", + "max_degree_node = max(G.degree(), key=lambda x: 
x[1])[0]\n", + "high_degree_nodes = [n for n, d in G.degree() if d > threshold] # threshold: a degree cutoff you choose\n", + "```\n", + "\n", + "### The Friendship Paradox\n", + "\n", + "On average, your friends have more friends than you do. This happens because a high-degree node appears in many friend lists, so it is counted more often when averaging over friends:\n", + "\n", + "```python\n", + "# Average degree of a node\n", + "avg_degree = np.mean([G.degree(n) for n in G.nodes()])\n", + "\n", + "# Average degree of neighbors\n", + "avg_neighbor_degree = nx.average_neighbor_degree(G)\n", + "avg_of_neighbor_degrees = np.mean(list(avg_neighbor_degree.values()))\n", + "\n", + "print(f\"Average degree: {avg_degree:.2f}\")\n", + "print(f\"Average neighbor degree: {avg_of_neighbor_degrees:.2f}\")\n", + "# Typically: avg_of_neighbor_degrees > avg_degree\n", + "```\n", + "\n", + "### Rich Club Coefficient\n", + "\n", + "Measures the tendency of high-degree nodes to connect to each other:\n", + "\n", + "```python\n", + "# Calculate rich club coefficient\n", + "rich_club = nx.rich_club_coefficient(G, normalized=False)\n", + "\n", + "# Plot rich club coefficient vs degree\n", + "degrees = sorted(rich_club.keys())\n", + "coeffs = [rich_club[k] for k in degrees]\n", + "plt.plot(degrees, coeffs)\n", + "plt.xlabel('Degree k')\n", + "plt.ylabel('Rich Club Coefficient \u03c6(k)')\n", + "plt.show()\n", + "```\n", + "\n", + "**\ud83d\udcda References:**\n", + "- [NetworkX Degree Functions](https://networkx.org/documentation/stable/reference/classes/generated/networkx.Graph.degree.html)\n", + "- [NetworkX Assortativity](https://networkx.org/documentation/stable/reference/algorithms/assortativity.html)\n", + "- [Friendship Paradox Paper](https://www.jstor.org/stable/2781907)\n", + "\n", + "---" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -86,7 +294,7 @@ "- **Facebook Network** (undirected)\n", "- **Zachary's Karate Club**: (nx.karate_club_graph)\n", "- **Florentine Families**: Historical marriage ties between powerful families in Renaissance Florence. 
(nx.florentine_families_graph)\n", - "- **Les Misérables**: Co-occurrence network of characters in Victor Hugo's novel *Les Misérables*. (nx.les_miserables_graph)\n", + "- **Les Mis\u00e9rables**: Co-occurrence network of characters in Victor Hugo's novel *Les Mis\u00e9rables*. (nx.les_miserables_graph)\n", "\n", "Steps:\n", "1. Extract the degree of each node in the network.\n", @@ -109,7 +317,7 @@ "\n", "## Exercise 3: the Friendship Paradox \n", "\n", - "Friendship Paradox: “on average, your friends have more friends than you do” -> :(\n", + "Friendship Paradox: \u201con average, your friends have more friends than you do\u201d -> :(\n", "\n", "Compare the average degree of all nodes to the average degree of their neighbors.\n", "\n", @@ -224,4 +432,4 @@ }, "nbformat": 4, "nbformat_minor": 2 -} +} \ No newline at end of file diff --git a/ex-6.ipynb b/ex-6.ipynb index 3d4caaf..4a64b2d 100644 --- a/ex-6.ipynb +++ b/ex-6.ipynb @@ -56,6 +56,150 @@ "# Part 6. Assortativity, clustering, centrality, path length" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## NetworkX Tutorial: Assortativity, Clustering, Centrality, and Path Length\n", + "\n", + "This section covers advanced network analysis metrics that characterize the structure and organization of networks.\n", + "\n", + "### Assortativity\n", + "\n", + "Assortativity measures the tendency of nodes to connect to similar nodes. 
Degree assortativity measures whether high-degree nodes tend to connect to other high-degree nodes:\n", + "\n", + "```python\n", + "# Degree assortativity coefficient (-1 to 1)\n", + "# Positive: high-degree nodes connect to high-degree nodes\n", + "# Negative: high-degree nodes connect to low-degree nodes\n", + "# Zero: no correlation\n", + "assortativity = nx.degree_assortativity_coefficient(G)\n", + "\n", + "# Attribute assortativity (e.g., by node attribute)\n", + "attr_assortativity = nx.attribute_assortativity_coefficient(G, 'attribute_name')\n", + "\n", + "# Numeric attribute assortativity\n", + "numeric_assortativity = nx.numeric_assortativity_coefficient(G, 'numeric_attribute')\n", + "```\n", + "\n", + "### Clustering Coefficient\n", + "\n", + "Measures the degree to which nodes cluster together. High clustering means \"friends of friends are friends\":\n", + "\n", + "```python\n", + "# Local clustering coefficient (for a single node)\n", + "local_clustering = nx.clustering(G, node)\n", + "\n", + "# Clustering coefficients for all nodes\n", + "clustering_dict = nx.clustering(G)\n", + "\n", + "# Average clustering coefficient (global measure)\n", + "avg_clustering = nx.average_clustering(G)\n", + "\n", + "# Transitivity (alternative global clustering measure)\n", + "transitivity = nx.transitivity(G)\n", + "```\n", + "\n", + "The difference:\n", + "- **Average clustering**: average of local clustering coefficients\n", + "- **Transitivity**: ratio of triangles to connected triples (gives more weight to high-degree nodes)\n", + "\n", + "### Centrality Measures\n", + "\n", + "Centrality identifies the most important nodes in a network. 
Different measures capture different notions of importance:\n", + "\n", + "```python\n", + "# Degree Centrality: fraction of other nodes a node is connected to\n", + "degree_cent = nx.degree_centrality(G)\n", + "\n", + "# Betweenness Centrality: fraction of shortest paths passing through a node\n", + "betweenness_cent = nx.betweenness_centrality(G)\n", + "\n", + "# Closeness Centrality: inverse of average distance to other nodes\n", + "closeness_cent = nx.closeness_centrality(G)\n", + "\n", + "# Eigenvector Centrality: importance based on importance of neighbors\n", + "eigenvector_cent = nx.eigenvector_centrality(G)\n", + "\n", + "# PageRank: Google's algorithm (variant of eigenvector centrality)\n", + "pagerank = nx.pagerank(G)\n", + "\n", + "# Katz Centrality: considers all paths, not just shortest\n", + "katz_cent = nx.katz_centrality(G)\n", + "\n", + "# Find most central nodes\n", + "top_degree = sorted(degree_cent.items(), key=lambda x: x[1], reverse=True)[:5]\n", + "top_betweenness = sorted(betweenness_cent.items(), key=lambda x: x[1], reverse=True)[:5]\n", + "```\n", + "\n", + "**Which centrality to use?**\n", + "- **Degree**: local influence, direct connections\n", + "- **Betweenness**: control of information flow, bridges\n", + "- **Closeness**: speed of information spread\n", + "- **Eigenvector/PageRank**: influenced by important neighbors\n", + "\n", + "### Path Length\n", + "\n", + "Path length measures distance between nodes:\n", + "\n", + "```python\n", + "# Shortest path between two nodes\n", + "path = nx.shortest_path(G, source, target)\n", + "path_length = nx.shortest_path_length(G, source, target)\n", + "\n", + "# All shortest paths from a source\n", + "lengths = nx.single_source_shortest_path_length(G, source)\n", + "\n", + "# Average shortest path length (only for connected graphs)\n", + "if nx.is_connected(G):\n", + " avg_path_length = nx.average_shortest_path_length(G)\n", + "\n", + "# Diameter: maximum shortest path length\n", + "if nx.is_connected(G):\n", + " diameter = 
nx.diameter(G)\n", + "\n", + "# Eccentricity: maximum distance from a node to all others (requires a connected graph)\n", + "eccentricity = nx.eccentricity(G)\n", + "\n", + "# Radius: minimum eccentricity\n", + "if nx.is_connected(G):\n", + " radius = nx.radius(G)\n", + "```\n", + "\n", + "### Homophily\n", + "\n", + "Homophily is the tendency of similar nodes to be connected:\n", + "\n", + "```python\n", + "# Local homophily: fraction of neighbors with same attribute\n", + "def local_homophily(G, node, attribute):\n", + " node_attr = G.nodes[node][attribute]\n", + " neighbors = list(G.neighbors(node))\n", + " if not neighbors:\n", + " return 0\n", + " same_attr = sum(1 for n in neighbors if G.nodes[n][attribute] == node_attr)\n", + " return same_attr / len(neighbors)\n", + "\n", + "# Global homophily: fraction of edges joining nodes with the same attribute\n", + "def global_homophily(G, attribute):\n", + " total_edges = 0\n", + " same_attr_edges = 0\n", + " for u, v in G.edges():\n", + " total_edges += 1\n", + " if G.nodes[u][attribute] == G.nodes[v][attribute]:\n", + " same_attr_edges += 1\n", + " return same_attr_edges / total_edges if total_edges > 0 else 0\n", + "```\n", + "\n", + "**\ud83d\udcda References:**\n", + "- [NetworkX Assortativity](https://networkx.org/documentation/stable/reference/algorithms/assortativity.html)\n", + "- [NetworkX Clustering](https://networkx.org/documentation/stable/reference/algorithms/clustering.html)\n", + "- [NetworkX Centrality](https://networkx.org/documentation/stable/reference/algorithms/centrality.html)\n", + "- [NetworkX Shortest Paths](https://networkx.org/documentation/stable/reference/algorithms/shortest_paths.html)\n", + "\n", + "---" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -276,4 +420,4 @@ }, "nbformat": 4, "nbformat_minor": 2 -} +} \ No newline at end of file