Merged
29 commits
510cba0
Merge branch 'main' into dev
gitronald Mar 12, 2026
9402162
version [prerelease]: 0.3.1a0
gitronald Mar 12, 2026
73373b1
add type hints, docstrings, and modern syntax across all modules
gitronald Mar 12, 2026
aa7f980
add ruff linting, dev dependencies, and format all files
gitronald Mar 12, 2026
7cd1153
add test suite with pytest and coverage
gitronald Mar 12, 2026
2e1c836
add github actions ci workflow for testing
gitronald Mar 12, 2026
e7afdbb
update ignores
gitronald Mar 12, 2026
426a7f5
drop future annotations, bump requires-python to 3.11
gitronald Mar 12, 2026
ffafb21
replace pandas and numpy with polars
gitronald Mar 12, 2026
795954b
migrate source code from pandas to polars
gitronald Mar 12, 2026
2a02a6a
update demo script to use polars
gitronald Mar 12, 2026
2650099
update tests for polars api
gitronald Mar 12, 2026
1696770
add integration tests with abortion tree fixture data
gitronald Mar 12, 2026
5eae685
update readme for new add_metanodes api
gitronald Mar 12, 2026
3a305bb
version [prerelease]: 0.3.1a1
gitronald Mar 12, 2026
2a02e6e
replace deprecated str.concat with str.join
gitronald Mar 13, 2026
be7b9e9
add dev dependencies for network plotting
gitronald Mar 13, 2026
0fe8cfa
add plot_network with igraph layout and adjustText
gitronald Mar 13, 2026
b95dcad
add plot_network tests
gitronald Mar 13, 2026
22ca1f0
fix multi-parent concatenation in add_parent_nodes
gitronald Mar 13, 2026
6e346e9
Merge branch 'update/pandas-to-polars' into feature/network-py-plots
gitronald Mar 13, 2026
1035277
Merge pull request #11 from gitronald/update/pandas-to-polars
gitronald Mar 13, 2026
f7b5a0b
add spacing and label_alpha params to plot_network
gitronald Mar 21, 2026
53f1d25
update readme with python plot, add plot generation script
gitronald Mar 21, 2026
02e45f8
update todo for completed tasks
gitronald Mar 21, 2026
a91a9b1
Merge branch 'feature/network-py-plots' into dev
gitronald Mar 21, 2026
5a4ba56
fix bugs in requester, set_edge_attributes, and parse_bing_qry
gitronald Mar 21, 2026
b182f85
fix demo script for mixed-type google data
gitronald Mar 21, 2026
cfe3436
version [patch]: 0.3.1
gitronald Mar 21, 2026
29 changes: 29 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,29 @@
name: Tests

on:
  push:
    branches: [dev, main]
  pull_request:
    branches: [dev, main]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12", "3.13", "3.14"]

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5

      - name: Set up Python ${{ matrix.python-version }}
        run: uv python install ${{ matrix.python-version }}

      - name: Install dependencies
        run: uv sync --all-groups --python ${{ matrix.python-version }}

      - name: Run tests with coverage
        run: uv run pytest -v --cov=suggests --cov-report=term-missing
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,4 +1,6 @@
*__pycache__
.claude
.coverage
.venv
archive
build
10 changes: 7 additions & 3 deletions README.md
@@ -155,7 +155,7 @@ Reduce to new information obtained in suggestions. E.g. `abortion -> abortion la

```py
In [5]: edges = suggests.add_parent_nodes(edges)
In [6]: edges = edges.apply(suggests.add_metanodes, axis=1)
In [6]: edges = suggests.add_metanodes(edges)
In [7]: show_cols = ['source','target','grandparent','parent','source_add','target_add']
In [8]: edges[show_cols].head()
Out[9]:
@@ -172,9 +172,13 @@ Out[9]:
9 abortion laws 2019 abortion laws 2019 georgia NaN abortion laws 2019 georgia
```
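
The "new information" reduction described in this section can be illustrated with a small token diff. This is only a sketch under stated assumptions: `token_diff` is a hypothetical helper written for this example, not the package's `add_metanodes` implementation, and the lowercase comparison is an assumed choice.

```python
def token_diff(base: str, suggestion: str) -> str:
    """Return the tokens of `suggestion` that do not appear in `base`.

    Hypothetical helper for illustration; not part of the suggests API.
    Tokens are split on whitespace and compared case-insensitively.
    """
    base_tokens = set(base.lower().split())
    new_tokens = [t for t in suggestion.split() if t.lower() not in base_tokens]
    return " ".join(new_tokens)


if __name__ == "__main__":
    # Compare a query with one of its suggestions.
    print(token_diff("abortion laws 2019", "abortion laws 2019 georgia"))
```

Keeping the original token order in the result makes the added text readable as a phrase rather than as an unordered set.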

Plotted in [Gephi](https://gephi.org/). The size of nodes corresponds to their PageRank, and node colors indicate communities that were determined using Gephi's default community detection algorithm, the Louvain method:
Plotted in [Gephi](https://gephi.org/) from an older dataset that is no longer available. The size of nodes corresponds to their PageRank, and node colors indicate communities that were determined using Gephi's default community detection algorithm, the Louvain method:

![Abortion Association Network](img/abortion_plot_pagerank.png?raw=true "Abortion Association Network")
![Abortion Association Network (Gephi)](img/abortion_plot_pagerank_gephi.png?raw=true "Abortion Association Network (Gephi)")

The same network can be generated programmatically with `plot_network()`, using the test fixture dataset (`tests/fixtures/abortion-20260312-122801-edges.csv`). Generated by `scripts/plot_abortion_tree.py`. Nodes represent unique search suggestions, and directed edges connect each suggestion to the suggestions it produced. Node sizes are proportional to squared degree (emphasizing highly connected hubs), and colors indicate communities detected using the Louvain method. Only nodes above the 98th percentile of PageRank are labeled, with font sizes scaled by degree. Layout uses igraph's Fruchterman-Reingold algorithm:

![Abortion Association Network (Python)](img/abortion_plot_pagerank_python.png?raw=true "Abortion Association Network (Python)")
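
The sizing and labeling rules described above can be sketched without any plotting library. The code below is an illustration under assumptions, not the internals of `plot_network()`: the function names, the damping factor, the iteration count, and the nearest-rank quantile rule are all choices made for this example.

```python
def degree_and_pagerank(edges, damping=0.85, iterations=50):
    """Compute total degree and a basic PageRank for a directed edge list.

    Hypothetical helper for illustration; `edges` is a list of
    (source, target) pairs.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    degree = dict.fromkeys(nodes, 0)
    for source, target in edges:
        out_links[source].append(target)
        degree[source] += 1
        degree[target] += 1

    rank = dict.fromkeys(nodes, 1 / len(nodes))
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping + damping * dangling) / len(nodes)
        new_rank = dict.fromkeys(nodes, base)
        for source, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return degree, rank


def nodes_to_label(rank, quantile=0.98):
    """Select nodes whose PageRank reaches the given quantile (nearest rank)."""
    ordered = sorted(rank.values())
    cutoff = ordered[min(int(quantile * len(ordered)), len(ordered) - 1)]
    return {node for node, value in rank.items() if value >= cutoff}


if __name__ == "__main__":
    edges = [
        ("abortion", "abortion laws"),
        ("abortion", "abortion pill"),
        ("abortion laws", "abortion laws 2019"),
    ]
    degree, rank = degree_and_pagerank(edges)
    # Squared degree emphasizes hubs, as described for the node sizes.
    sizes = {node: d**2 for node, d in degree.items()}
    print(sizes)
    print(nodes_to_label(rank))
```

Squaring the degree widens the visual gap between hubs and leaves, while the quantile cutoff keeps the number of labels small on dense graphs.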

## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
9 changes: 9 additions & 0 deletions TODO.md
@@ -0,0 +1,9 @@
# TODO

- [ ] Fix UnboundLocalError in requester when request fails before response is assigned (no plan)
- [ ] Metanode token diff quality issues ([plan](.claude/plans/006-case-sensitive-token-diff.md))
- [x] Add network plot function to nets.py ([plan](.claude/plans/004-network-plot.md))
- [x] Migrate pandas to polars ([plan](.claude/plans/003-pandas-to-polars.md))
- [x] Fix metanode processing bugs ([plan](.claude/plans/005-fix-metanode-bugs.md))
- [x] Modernize project: docstrings, type hints, ruff, tests, CI ([plan](.claude/plans/002-modernize-project.md))
- [x] Add language parameter ([plan](.claude/plans/001-language-parameter.md))
File renamed without changes
Binary file added img/abortion_plot_pagerank_python.png
21 changes: 16 additions & 5 deletions pyproject.toml
@@ -1,16 +1,15 @@
[project]
name = "suggests"
version = "0.3.0"
version = "0.3.1"
description = "Algorithm auditing tools for search engine autocomplete"
license = "MIT"
readme = "README.md"
authors = [{ name = "Ronald E. Robertson", email = "rer@acm.org" }]
keywords = ["suggestions", "autocomplete", "google", "bing"]
requires-python = ">=3.10"
keywords = ["suggestions", "autocomplete", "google", "bing", "search engine", "search queries"]
requires-python = ">=3.11"
dependencies = [
"requests>=2.28",
"pandas>=2.0",
"numpy>=2.0",
"polars>=1.0",
"beautifulsoup4>=4.11",
]

@@ -20,6 +19,18 @@ homepage = "http://github.com/gitronald/suggests"
[project.scripts]
demo = 'scripts.demo:main'

[dependency-groups]
dev = [
"adjusttext>=1.3.0",
"igraph>=1.0.0",
"matplotlib>=3.7",
"networkx>=3.0",
"pytest>=6.2",
"pytest-cov>=4.0",
"ruff>=0.15.4",
"scipy>=1.11",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
32 changes: 19 additions & 13 deletions scripts/demo.py
@@ -1,29 +1,35 @@
import json
"""Demo script for suggests package."""

import datetime
import json

import polars as pl

import suggests
import pandas as pd

def main():

def main() -> None:
crawl_id = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
get_suggests_tree_args = {
'root': 'dog',
'source': 'bing',
'max_depth': 1,
'crawl_id': crawl_id,
'save_to': f'./data/tests/suggests-{crawl_id}.json',
'sesh_headers': {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0'
}
"root": "dog",
"source": "bing",
"max_depth": 1,
"crawl_id": crawl_id,
"save_to": f"./data/tests/suggests-{crawl_id}.json",
"sesh_headers": {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0"
},
}
print(json.dumps(get_suggests_tree_args, indent=2))
tree = suggests.get_suggests_tree(**get_suggests_tree_args)
tree_df = pd.DataFrame(tree)
tree_df = pl.DataFrame(tree, strict=False)
print(f"\nSuggestion Tree: ({tree_df.shape[0]:,}, {tree_df.shape[1]})")
print(tree_df.head())

edges = suggests.to_edgelist(tree)
print(f"Suggestion Network Edges: ({edges.shape[0]:,}, {edges.shape[1]})")
print(edges.head())


if __name__ == "__main__":
main()
main()
33 changes: 33 additions & 0 deletions scripts/plot_abortion_tree.py
@@ -0,0 +1,33 @@
#!/usr/bin/env python
"""Plot the abortion suggestion tree network from the test fixture."""

import sys
from pathlib import Path

import matplotlib
matplotlib.use("Agg")

import polars as pl
from suggests.nets import plot_network

FIXTURE_DIR = Path(__file__).parent.parent / "tests" / "fixtures"
EDGES_CSV = FIXTURE_DIR / "abortion-20260312-122801-edges.csv"
IMG_DIR = Path(__file__).parent.parent / "img"


def main(save_to: str = "") -> None:
    edges = pl.read_csv(EDGES_CSV)
    save_to = save_to or str(IMG_DIR / "abortion_plot_pagerank_python.png")
    fig = plot_network(
        edges,
        root="abortion",
        label_quantile=0.98,
        label_alpha=0.7,
        spacing=2.0,
        save_to=save_to,
    )
    print(f"Saved to {save_to} ({edges.shape[0]} edges, {len(fig.axes[0].texts)} labels)")


if __name__ == "__main__":
    main(save_to=sys.argv[1] if len(sys.argv) > 1 else "")
27 changes: 19 additions & 8 deletions suggests/__init__.py
@@ -1,11 +1,22 @@
__version__ = "0.3.0"
"""Algorithm auditing tools for search engine autocomplete."""

from .suggests import get_suggests
from .suggests import get_suggests_tree
__version__ = "0.3.1"

from .parsing import parse_bing
from .parsing import parse_google
from .parsing import to_edgelist
from .parsing import (
add_metanodes,
add_parent_nodes,
parse_bing,
parse_google,
to_edgelist,
)
from .suggests import get_suggests, get_suggests_tree

from .parsing import add_parent_nodes
from .parsing import add_metanodes
__all__ = [
"add_metanodes",
"add_parent_nodes",
"get_suggests",
"get_suggests_tree",
"parse_bing",
"parse_google",
"to_edgelist",
]
123 changes: 69 additions & 54 deletions suggests/logger.py
@@ -1,79 +1,94 @@
""" Configure a logger using a dictionary
"""
"""Configure a logger using a dictionary."""

import logging
import logging.config

# Formatters: change what gets logged
minimal = '%(message)s'
detailed = '%(asctime)s | %(process)d | %(levelname)s | %(name)s | %(message)s '
formatters = {
'minimal': {'format': minimal},
'detailed': {'format': detailed}
}
minimal = "%(message)s"
detailed = "%(asctime)s | %(process)d | %(levelname)s | %(name)s | %(message)s "
formatters = {"minimal": {"format": minimal}, "detailed": {"format": detailed}}

class Logger(object):
""" Get logger and set console and file outputs

Ex:
```
from logger import Summary
log = Logger('summary.log').get_logger('mylogger')

```
class Logger:
"""Get logger and set console and file outputs.

Args:
file_name: Path for file logging output
file_format: Format type for file output ('minimal' or 'detailed')
file_mode: File open mode
console: Whether to enable console logging
console_format: Format type for console output ('minimal' or 'detailed')
console_level: Logging level for console output
"""
def __init__(self,
file_name='', file_format='detailed', file_mode='w',
console=True, console_format='detailed', console_level='DEBUG'):


def __init__(
self,
file_name: str = "",
file_format: str = "detailed",
file_mode: str = "w",
console: bool = True,
console_format: str = "detailed",
console_level: str = "DEBUG",
) -> None:
# Handlers: change file and console logging details
handlers = {}
handlers: dict[str, dict] = {}
if console:
assert console_format in formatters.keys(), \
f'Must select formatting type from {list(formatters.keys())}'
assert console_format in formatters, (
f"Must select formatting type from {list(formatters.keys())}"
)

handlers['console_handle'] = {
'class': 'logging.StreamHandler',
'level': 'DEBUG',
'formatter': console_format,
handlers["console_handle"] = {
"class": "logging.StreamHandler",
"level": "DEBUG",
"formatter": console_format,
}

if file_name:
assert type(file_name) is str, 'Must provide name for file logging'
assert file_format in formatters.keys(), \
f'Must select formatting type from {list(formatters.keys())}'
assert isinstance(file_name, str), "Must provide name for file logging"
assert file_format in formatters, (
f"Must select formatting type from {list(formatters.keys())}"
)

handlers['file_handle'] = {
'class': 'logging.FileHandler',
'level': 'INFO',
'formatter': file_format,
'filename': file_name,
'mode': file_mode
handlers["file_handle"] = {
"class": "logging.FileHandler",
"level": "INFO",
"formatter": file_format,
"filename": file_name,
"mode": file_mode,
}

# Loggers: change logging options for root and other packages
loggers = {
# Package logger (not root)
'suggests': {
'handlers': list(handlers.keys()),
'level': 'DEBUG',
'propagate': False
"suggests": {
"handlers": list(handlers.keys()),
"level": "DEBUG",
"propagate": False,
},
# External loggers
'requests': {'level': 'WARNING'},
'urllib3': {'level': 'WARNING'},
'matplotlib': {'level': 'WARNING'},
'chardet.charsetprober': {'level': 'INFO'},
'parso': {'level': 'INFO'} # Fix for ipython autocomplete bug
"requests": {"level": "WARNING"},
"urllib3": {"level": "WARNING"},
"matplotlib": {"level": "WARNING"},
"chardet.charsetprober": {"level": "INFO"},
"parso": {"level": "INFO"}, # Fix for ipython autocomplete bug
}

self.log_config = {
'version': 1,
'disable_existing_loggers': False,
'formatters': formatters,
'handlers': handlers,
'loggers': loggers
self.log_config = {
"version": 1,
"disable_existing_loggers": False,
"formatters": formatters,
"handlers": handlers,
"loggers": loggers,
}

def start(self, name="suggests"):

def start(self, name: str = "suggests") -> logging.Logger:
"""Initialize and return a named logger.

Args:
name: Logger name

Returns:
Configured logger instance
"""
logging.config.dictConfig(self.log_config)
return logging.getLogger(name)
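
For readers unfamiliar with dictionary-based logging configuration, the pattern used by `Logger` can be reduced to the standalone sketch below. It uses only the standard library; the names `LOG_CONFIG`, `start_logger`, and `example_pkg` are assumptions made for this example and do not appear in the package.

```python
import logging
import logging.config

# Minimal config with one console handler attached to a package-level logger.
LOG_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"minimal": {"format": "%(message)s"}},
    "handlers": {
        "console_handle": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
            "formatter": "minimal",
        }
    },
    "loggers": {
        "example_pkg": {
            "handlers": ["console_handle"],
            "level": "DEBUG",
            # Stop records from also reaching the root logger's handlers.
            "propagate": False,
        }
    },
}


def start_logger(name: str = "example_pkg") -> logging.Logger:
    """Apply the config and return the named logger."""
    logging.config.dictConfig(LOG_CONFIG)
    return logging.getLogger(name)


if __name__ == "__main__":
    log = start_logger()
    log.info("logger configured")
```

Setting `propagate` to `False` on the package logger avoids duplicate output when the root logger has its own handlers.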