Austin Landlord Mapper

A data pipeline and interactive web application that identifies and maps residential landlords in Austin, TX—with a focus on corporate and "financialized" ownership—by integrating Travis County Appraisal District (TCAD) property records, Austin Open Data, Texas Comptroller business filings, and US Census socioeconomic data.

Live app: https://ontheseams.shinyapps.io/landlord_mapper_app/

Additional project materials: https://drive.google.com/drive/folders/1e2Ahq9sNNQ2K_Q-RuTrdkWzDL6FH_gAa?usp=sharing

What this repository does

The pipeline answers the question: who owns residential rental property in Austin, and how are those owners connected to one another? It produces a geocoded, enriched dataset that can be explored through an interactive Shiny application allowing users to:

Search for a property by address, owner name, corporate name, or ownership group ID
See all properties held by a single owner or corporate family on a map
Explore a network graph of landlord connections via shared ownership
Download filtered tabular data for further analysis

Repository structure

File / Folder	Language	Purpose
`_targets.R`	R	Pipeline orchestration (via the `targets` package). Defines every step as a reproducible target. Run `targets::tar_make()` to execute the full pipeline.
`TCAD_parse.py`	Python	Parses the large TCAD JSON export (a ZIP file with hundreds of thousands of records) into flat CSV files using streaming JSON parsing (`ijson`).
`target_helper_functions.R`	R	Core data-merging and property classification. Joins all CSV files into a single parcel dataset, derives owner-occupancy and financialization flags, and counts property units.
`scrape_helper_functions.R`	R	Downloads the TCAD special export from traviscad.org; scrapes Austin Open Data (code complaints) via headless Chrome + Selenium; queries the Texas Comptroller Franchise Tax API for corporate details.
`supplementary_scrape_helper_functions.R`	R	Downloads Census ACS data (used to build a Social Vulnerability Index), generates per-property owner "fingerprint" strings, computes pairwise cosine string-distance matrices, clusters properties into ownership groups, and geocodes parcel addresses via the Census geocoder.
`final_output_helper_functions.R`	R	Merges the geocoded parcel dataset with the Housing Hardship Index and SVI data to produce the final `owners_info_total` dataset consumed by the Shiny app.
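`austin_parcel_land_transactions.R`	R	Standalone tri-county script that produces an all-parcel Austin parcel-year file with current land value, deed/sale counts by year, and corporate-like transaction-party indicators where source records support them.
`standalone_corporate_parcels.R`	R	Self-contained alternative to the targets pipeline: streams the TCAD Special Export with jq, recovers parcel coordinates in stages, flags likely corporate-owned residential parcels, and writes `corporate_owned_parcels.csv`.
`williamson-parcel-pull.R`	R	Williamson County extension: builds an append-ready residential parcel file for the Austin portion of Williamson County from WCAD Socrata exports.
`hays-parcel-pull.R`	R	Hays County extension: builds an append-ready residential parcel file for the Austin portion of Hays County from Hays CAD export ZIPs and the Hays parcel ArcGIS feature service.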
`HHI Data 2024 United States.xlsx`	Data	Housing Hardship Index scores and ranks for US ZIP codes (2024).
`shinyApp/app.R`	R	Interactive Shiny dashboard for exploring the final dataset: property map (Leaflet), owner table (DT), and landlord network graph (networkD3).

Pipeline walkthrough

The pipeline is managed by the targets R package, which tracks dependencies between steps and re-runs only what has changed.

Step 1 – Download TCAD data (`tcad_data`)

download_tcad_austin() scrapes traviscad.org/publicinformation for the latest "Special Export (JSON)" ZIP file and downloads it if it is new.

Step 2 – Parse TCAD data (`tcad_parse`)

TCAD_parseYear() (Python, called via reticulate) streams through the TCAD JSON with ijson and writes five flat CSV files:

Output file	Contents
`austin_propertyChar_data.csv`	Zoning codes per parcel
`austin_propertyProf_data.csv`	Improvement/land state codes, area, stories, year built
`austin_situs_data.csv`	Property street address components
`austin_owner_data.csv`	Owner name, mailing address, exemptions, ownership percentage
`austin_deeds_data.csv`	Deed history: buyer/seller names and dates

Step 3 – Merge and classify parcels (`austin_parcel_data_merged`)

target_property_gen() joins the five CSV files on parcel ID (situs_pID), then:

Filters to residential parcels (state codes A*/B* or SF/MF zoning).
Estimates unit count from improvement state codes and total floor area.
Flags owner-occupancy by comparing the owner's mailing address to the situs address, or by checking for a homestead exemption (HS).
Flags financialization by matching owner names against a list of corporate entity markers (LLC, LP, LTD, INC, etc.) and real-estate-sector keywords (INVEST, MANAGE, REALT, HOLDING, …).
Derives is_target: non-owner-occupied residential parcels held by a financialized entity.
Derives is_mom_and_pop: owner-occupied residential parcels held by a non-financialized entity.
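The flag logic in Step 3 can be sketched in Python as below. This is a hypothetical illustration, not target_property_gen() itself: the record layout (state_code, zoning, exemptions, mailing_address, situs_address), the normalize_address() rules, and the abbreviation map are all assumptions, and is_financialized is taken as an input because it comes from the owner-name markers.

```python
import re

# Hypothetical abbreviation map; the pipeline's own normalization rules may differ.
STREET_TYPES = {"STREET": "ST", "AVENUE": "AVE", "DRIVE": "DR", "ROAD": "RD", "LANE": "LN"}


def normalize_address(addr: str) -> str:
    """Uppercase, strip punctuation and unit designators, abbreviate street types."""
    addr = re.sub(r"[^\w\s]", " ", addr.upper())            # punctuation -> spaces
    addr = re.sub(r"\b(?:APT|UNIT|STE)\s+\w+", " ", addr)    # drop unit designators
    return " ".join(STREET_TYPES.get(tok, tok) for tok in addr.split())


def classify(parcel: dict, is_financialized: bool) -> dict:
    """Derive the Step 3 flags for one parcel record (hypothetical field names)."""
    state = parcel["state_code"].upper()
    zoning = parcel["zoning"].upper()
    is_residential = state.startswith(("A", "B")) or "SF" in zoning or "MF" in zoning
    # Homestead exemption, or mailing address equal to situs address after normalization
    is_owner_occupied = "HS" in parcel["exemptions"] or (
        normalize_address(parcel["mailing_address"])
        == normalize_address(parcel["situs_address"])
    )
    return {
        "is_residential": is_residential,
        "is_owner_occupied": is_owner_occupied,
        "is_target": is_residential and not is_owner_occupied and is_financialized,
        "is_mom_and_pop": is_residential and is_owner_occupied and not is_financialized,
    }
```

Note that is_target and is_mom_and_pop are mutually exclusive by construction, and parcels that are neither (for example, owner-occupied but held by an LLC) fall into neither group.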

Step 4 – Append code-complaint counts (`austin_parcel_data_merged_code`)

code_compl_merge() downloads the Austin Code Complaint dataset from Austin Open Data via headless Selenium and counts complaints per parcel address (parallelised with doFuture).
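The per-parcel counting step can be sketched as an exact join on normalized addresses. This is a minimal illustration under assumed field names (address, pid, situs_address); it omits the Selenium download and the doFuture parallelism, and the normalize() rule is deliberately simpler than whatever the pipeline uses.

```python
from collections import Counter


def normalize(addr: str) -> str:
    """Minimal normalization: uppercase, drop periods/commas, collapse whitespace."""
    return " ".join(addr.upper().replace(".", " ").replace(",", " ").split())


def complaint_counts(parcels: list[dict], complaints: list[dict]) -> dict[str, int]:
    """Count complaints per parcel by exact match on normalized address."""
    counts = Counter(normalize(c["address"]) for c in complaints)
    return {p["pid"]: counts.get(normalize(p["situs_address"]), 0) for p in parcels}


parcels = [{"pid": "101", "situs_address": "100 Congress Ave"},
           {"pid": "102", "situs_address": "200 Lavaca St"}]
complaints = [{"address": "100 CONGRESS AVE."}, {"address": "100 congress ave"},
              {"address": "9 Other Rd"}]
print(complaint_counts(parcels, complaints))  # {'101': 2, '102': 0}
```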

Step 5 – Scrape Texas Comptroller data (`austin_parcel_data_merged_owner`)

owner_scrape_actual() looks up each financialized owner's name in the Texas Comptroller Franchise Tax API, retrieving:

Legal business name, Texas Taxpayer Number (TTN), mailing address
State of formation, SOS registration status
Registered agent name and address
Officer / owner names and addresses

Step 6 – Cluster properties into ownership groups (`situs_group_assignments`)

situs_owner_string_gen() builds a composite "fingerprint" string for each parcel that concatenates all known names and addresses (owner, corporation, registered agent, scraped officer names). situs_owner_string_dist_matrix() then computes a pairwise cosine string-distance matrix over these fingerprints for all target/large-building parcels. situs_neighor_gen() translates close distances plus exact-match links (shared owner name, mailing address, corporate name, registered agent) into connected ownership groups, iterating until transitive closure is reached.
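A compact sketch of the grouping idea follows. It computes a cosine distance between q-gram count profiles (similar in spirit to stringdist's cosine method) and then links parcels with a union-find structure. Union-find is a swapped-in technique: it reaches the same transitive closure in one pass instead of iterating. The q = 2 profile, the 0.1 threshold, and the (field, value) exact-key format are hypothetical choices, not the pipeline's settings.

```python
from collections import Counter
from math import sqrt


def qgrams(s: str, q: int = 2) -> Counter:
    s = s.upper()
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def cosine_distance(a: str, b: str, q: int = 2) -> float:
    """1 - cosine similarity of the two strings' q-gram count vectors."""
    pa, pb = qgrams(a, q), qgrams(b, q)
    na = sqrt(sum(v * v for v in pa.values()))
    nb = sqrt(sum(v * v for v in pb.values()))
    if na == 0 or nb == 0:
        return 0.0 if pa == pb else 1.0
    dot = sum(pa[g] * pb[g] for g in pa.keys() & pb.keys())
    return 1.0 - dot / (na * nb)


def group_parcels(fingerprints: dict, exact_keys: dict, threshold: float = 0.1) -> dict:
    """Union-find over fuzzy (cosine) links and exact shared-key links."""
    ids = list(fingerprints)
    parent = {pid: pid for pid in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Fuzzy links: O(n^2) pairwise comparison, acceptable for a sketch
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine_distance(fingerprints[a], fingerprints[b]) <= threshold:
                union(a, b)

    # Exact links: parcels sharing any (field, value) key, e.g. ("agent", "CT CORP")
    first_seen = {}
    for pid in ids:
        for key in exact_keys.get(pid, ()):
            if key in first_seen:
                union(first_seen[key], pid)
            else:
                first_seen[key] = pid

    # Relabel each root as a consecutive integer group ID
    labels = {}
    return {pid: labels.setdefault(find(pid), len(labels) + 1) for pid in ids}
```

Keying exact links by (field, value) keeps, say, an owner name from accidentally matching an identical registered-agent string.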

Step 7 – Geocode parcels (`owners_info_total`)

parcel_geolocate() sends unique situs addresses to the US Census geocoder (in batches of 10,000) and joins back latitude/longitude and census block/tract/ZCTA geography identifiers.
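The deduplicate, batch, and join-back pattern can be sketched as follows. The helper names are hypothetical and the HTTP submission to the Census geocoder is omitted; the sketch only shows why each unique address is sent once and how results flow back to every parcel at that address.

```python
def unique_batches(addresses, batch_size=10_000):
    """Deduplicate addresses (keeping first-seen order) and split into batches."""
    unique = list(dict.fromkeys(addresses))
    return [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]


def join_back(parcels, geocoded):
    """Attach coordinates from an {address: (lat, lon)} lookup; unmatched -> None."""
    out = []
    for p in parcels:
        lat, lon = geocoded.get(p["situs_address"], (None, None))
        out.append({**p, "situs_lat": lat, "situs_long": lon})
    return out
```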

Step 8 – Enrich with socioeconomic context (`owners_data_total_supp`)

final_data_merge() joins the parcel dataset with:

Housing Hardship Index (HHI Data 2024 United States.xlsx) – overall score and rank by ZIP code.
Social Vulnerability Index built from Census ACS 5-year estimates (via censusapi): poverty, education, unemployment, housing cost burden, insurance coverage, age, disability, minority population, housing type, vehicle access, and limited English proficiency, organised into four SVI themes following the CDC/ATSDR methodology.
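The CDC/ATSDR ranking scheme can be sketched as: rank each variable as a percentile across geographic units using (rank - 1) / (n - 1), sum the ranks within each theme and re-rank, and sum all variable ranks and re-rank for the overall score (rpl_themes). This sketch assumes every variable is already oriented so that higher means more vulnerable, and the variable and theme names in the example are placeholders, not the pipeline's ACS variables.

```python
from bisect import bisect_left


def percent_rank(values):
    """Inclusive percentile rank in [0, 1]: (# values strictly below x) / (n - 1)."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    ordered = sorted(values)
    return [bisect_left(ordered, x) / (n - 1) for x in values]


def svi(units, themes):
    """units: list of {variable: estimate}; themes: {theme: [variables]}.
    Returns per-theme percentile ranks and the overall rank."""
    n = len(units)
    var_rank = {v: percent_rank([u[v] for u in units])
                for variables in themes.values() for v in variables}
    theme_rank = {
        theme: percent_rank([sum(var_rank[v][i] for v in variables) for i in range(n)])
        for theme, variables in themes.items()
    }
    overall = percent_rank([sum(r[i] for r in var_rank.values()) for i in range(n)])
    return theme_rank, overall


units = [{"poverty": 0.1, "unemployment": 0.02, "disability": 0.05},
         {"poverty": 0.3, "unemployment": 0.08, "disability": 0.15},
         {"poverty": 0.2, "unemployment": 0.05, "disability": 0.10}]
themes = {"socioeconomic": ["poverty", "unemployment"], "household": ["disability"]}
theme_rank, overall = svi(units, themes)
print(overall)  # [0.0, 1.0, 0.5]
```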

Step 9 – Shiny application (`shinyApp/app.R`)

Reads owners_info_total.csv and owners_info_d3graph.rds and provides four dashboard panels, two of which are still planned:

Property Search – Leaflet map with cosine-matched address search; table with full parcel/owner/corporate metadata; CSV download.
Landlord Network Analysis – Force-directed network graph (networkD3) showing properties connected by common ownership.
Tenant Stress (planned) – Heatmap of rent burden and evictions.
Property Quality (planned) – Heatmap of code violations, fines, and building age.

Setup and usage

Prerequisites

R packages (restore them with renv::restore() if a renv.lock lockfile is present; otherwise install them manually):

```
install.packages(c(
  "targets", "tarchetypes", "crew",
  "tibble", "dplyr", "purrr", "readr", "lubridate",
  "rvest", "selenider", "selenium", "httr", "httr2",
  "reticulate", "stringdist", "tidygeocoder",
  "censusapi", "acs", "tidycensus",
  "forecast", "xgboost", "doFuture", "doRNG", "foreach", "future",
  "qs2", "readxl",
  # Shiny app
  "shiny", "shinydashboard", "leaflet", "DT", "plotly",
  "networkD3", "igraph", "sp", "stringi", "listviewer"
))
```

Python packages (used via reticulate):

```
pip install pandas numpy ijson
```

ijson requires the yajl2_c backend. Install the yajl C library for your OS (e.g., brew install yajl on macOS or apt install libyajl-dev on Ubuntu).

Other tools:

Google Chrome + ChromeDriver (for headless Selenium scraping)
A Census API key

Configuration

Open _targets.R and replace YOUR OWN CENSUS API KEY GOES HERE with your Census API key:
```
Sys.setenv(CENSUS_KEY = "your_key_here")
```
If Chrome downloads files to a non-default location, update download_location in austin_open_data_dl(), defined in supplementary_scrape_helper_functions.R.

Running the pipeline

```
library(targets)
tar_make()        # run the full pipeline
tar_visnetwork()  # inspect the dependency graph
tar_read(owners_data_total_supp)  # inspect the final output
```

Intermediate files (CSV, RDS) are cached in the working directory and in the _targets/ store. Only targets whose upstream inputs have changed will be re-run.

Running the Shiny app locally

```
shiny::runApp("shinyApp")
```

The app expects owners_info_total.csv and owners_info_d3graph.rds to be present inside shinyApp/. Copy or symlink these files from the pipeline output directory before launching.

Key derived fields in the output dataset

Field	Description
`is_residential`	Property has a residential improvement or zoning code
`is_owner_occupied`	Owner mailing address matches situs address, or homestead exemption present
`is_financialized`	Owner name contains a corporate entity marker or real-estate keyword
`is_target`	Non-owner-occupied + financialized + residential
`is_mom_and_pop`	Owner-occupied + non-financialized + residential
`property_units`	Estimated number of housing units (derived from state codes and floor area)
`is_owner_out_of_state`	Owner's mailing state differs from property state
`group_assign`	Numeric group ID linking properties to a common ownership cluster
`veneer_owner`	Shell entity name on the TCAD record
`corp_business_name`	Legal business name from Texas Comptroller
`corp_TTN`	Texas Taxpayer Number
`corp_registered_agent_name`	Registered agent name
`situs_lat` / `situs_long`	Geocoded coordinates (Census geocoder)
`HHI_score` / `HHI_rank`	Housing Hardship Index for the parcel's ZIP code
`rpl_themes`	Overall Social Vulnerability Index percentile rank

Standalone script: `standalone_corporate_parcels.R`

For users who want to produce a filtered dataset of likely corporate-owned parcels without running the full targets pipeline, standalone_corporate_parcels.R provides a self-contained alternative. It requires only R, jq (a command-line JSON processor), and the TCAD Special Export ZIP file.

TCAD Special Export structure

The TCAD Special Export is a large JSON file (several GB uncompressed) distributed as a ZIP archive from traviscad.org/publicinformation. It contains a top-level JSON array where each element represents a single tax account (parcel). Each parcel object has:

Top-level scalar fields — pID (parcel ID), propType, inactive, geometry (see below), and others.
Nested arrays — owners, situses, propertyProfile, propertyCharacteristics, deeds, taxingunits, valuations, sales, permits, appeals, and more.

Because the file is too large to load into memory wholesale, the script uses a streaming approach: jq is invoked via shell pipelines to expand each nested section into newline-delimited JSON (NDJSON), which jsonlite::stream_in() then reads page-by-page (PAGE_SIZE rows at a time). Peak memory usage is roughly proportional to a single page, not the full file.
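The page-at-a-time reading pattern used with jsonlite::stream_in() can be sketched in Python for readers working with the same NDJSON. This is an analogy, not the script's code: read_ndjson_pages() is a hypothetical helper, and page_size plays the role of PAGE_SIZE.

```python
import io
import json
from itertools import islice


def read_ndjson_pages(stream, page_size=500):
    """Yield lists of parsed records, reading at most page_size lines at a time."""
    while True:
        lines = list(islice(stream, page_size))
        if not lines:
            return
        yield [json.loads(line) for line in lines if line.strip()]


ndjson = io.StringIO('{"pID": 1}\n{"pID": 2}\n{"pID": 3}\n')
for page in read_ndjson_pages(ndjson, page_size=2):
    print([rec["pID"] for rec in page])  # [1, 2] then [3]
```

In practice the stream would be the text-mode stdout of a jq process run with -c (compact, one JSON value per line), so only one page of parsed records is held at a time.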

Parcel geometry and coordinates

Each parcel object contains a geometry field holding a JSON-encoded string of the form "[lat, lon]" — for example, "[30.2545186553, -97.7620645363]". This must be parsed with jq's fromjson filter before the coordinate values can be extracted:

```
(try (.geometry | fromjson) catch null) as $g
| {
    pID: .pID,
    lat: (if ($g|type)=="array" then ($g[0] // null) else null end),
    lon: (if ($g|type)=="array" then ($g[1] // null) else null end)
  }
```
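For anyone post-processing the extracted records outside jq, the same decoding rule can be mirrored in Python. This is an illustration, not part of the script; parse_geometry() is a hypothetical helper name.

```python
import json


def parse_geometry(raw):
    """Decode the JSON-encoded "[lat, lon]" string; return (None, None) when the
    value is missing, unparseable, or not an array (mirrors the jq try/catch)."""
    try:
        g = json.loads(raw) if isinstance(raw, str) else None
    except json.JSONDecodeError:
        g = None
    if not isinstance(g, list):
        return None, None
    lat = g[0] if len(g) > 0 else None
    lon = g[1] if len(g) > 1 else None
    return lat, lon


print(parse_geometry("[30.2545186553, -97.7620645363]"))  # (30.2545186553, -97.7620645363)
print(parse_geometry("[null, null]"))                      # (None, None)
```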

Many parcel records have geometry = "[null, null]". These missing records are not simply bad rows: they are often condominium units, apartment-style accounts, multi-parcel ownership records, utility/common-area records, or other tax accounts where TCAD represents the taxable account separately from the physical parcel geometry. In other words, the JSON is closer to an appraisal-account export than a clean one-feature-per-polygon GIS layer.

The standalone script therefore uses a staged coordinate workflow. Each stage only fills records that are still missing coordinates, and the final output includes coord_source so approximate fills can be audited or filtered out later:

JSON geometry: parse top-level geometry into lat and lon.
JSON geoID siblings: propagate coordinates among JSON records sharing propertyIdentification[0].geoID.
JSON links.linkedPID: use TCAD-linked account relationships when a linked parcel already has coordinates.
JSON repeated situs address: when multiple JSON records share the same situs street number, street name, and ZIP, use the median known coordinate for that address.
Parcel polygon exact ID fallback: if data/Parcel_poly.zip exists, join coords_df$geoID to Parcel_poly$PID_10, compute a representative point with st_point_on_surface(), and fill from that polygon point.
Address point exact fallback: if data/Addresses.zip exists, join by exact street number, normalized street name, normalized street type, and ZIP. Blank/missing street suffixes are allowed to match each other.
Unique missing-ZIP address fallback: for remaining records with street number/name/type but no ZIP, fill only when that address key maps to exactly one address point in Addresses.zip.
Address-point parcel-ID fallback: use Addresses.zip parcel identifiers (PID / PARCEL_ID) only when they match JSON geoID and the address point also agrees with the TCAD situs street number and street name.
Nearest address point fallback: for remaining numeric addresses, use nearby address points on the same street, street type, and ZIP within ADDRESS_NEAREST_MAX_DELTA house numbers. If the missing address falls between two known points, the script uses the median of those two coordinates; otherwise it uses the nearest point.

The JSON-native stages are preferred because they do not require any external GIS file. The parcel polygon fallback is the next safest option because it uses an identifier match (geoID to PID_10). Address-point matching is looser, so the script moves from exact address matches to carefully constrained missing-ZIP and parcel-ID matches before using nearest-address interpolation. Nearest-address matching is explicitly approximate, but it is useful for records where the physical location is clear from neighboring address points even though TCAD did not publish a direct geometry for the account.
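The nearest-address-point fallback can be sketched as below. It assumes the candidate points have already been restricted to the same street, street type, and ZIP, and that max_delta stands in for ADDRESS_NEAREST_MAX_DELTA; nearest_address_fill() is a hypothetical name, not the script's function. With exactly two bracketing points, their median is simply their midpoint.

```python
from statistics import median


def nearest_address_fill(number, known, max_delta):
    """Approximate a missing coordinate from address points on the same street,
    street type, and ZIP. known: list of (house_number, lat, lon)."""
    near = [k for k in known if abs(k[0] - number) <= max_delta]
    if not near:
        return None
    exact = [k for k in near if k[0] == number]
    if exact:
        return exact[0][1], exact[0][2]
    lower = [k for k in near if k[0] < number]
    upper = [k for k in near if k[0] > number]
    if lower and upper:
        # Bracketed: median of the closest known point on each side
        lo = max(lower, key=lambda k: k[0])
        hi = min(upper, key=lambda k: k[0])
        return median([lo[1], hi[1]]), median([lo[2], hi[2]])
    # Otherwise use the single nearest house number within range
    best = min(near, key=lambda k: abs(k[0] - number))
    return best[1], best[2]


known = [(100, 30.0, -97.0), (110, 30.002, -97.002), (300, 30.1, -97.1)]
print(nearest_address_fill(120, known, max_delta=20))  # (30.002, -97.002)
print(nearest_address_fill(200, known, max_delta=20))  # None
```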

For example, the geoID sibling fill is:

```
library(dplyr)

# Median known coordinate per geoID
geoid_lookup <- coords_df |>
  filter(!is.na(lat), !is.na(lon), !is.na(geoID)) |>
  group_by(geoID) |>
  summarise(
    lat_fill = median(lat),
    lon_fill = median(lon),
    .groups = "drop"
  )

# Fill only missing coordinates from geoID siblings, then drop helper columns
coords_df <- coords_df |>
  left_join(geoid_lookup, by = "geoID") |>
  mutate(
    lat = dplyr::coalesce(lat, lat_fill),
    lon = dplyr::coalesce(lon, lon_fill)
  ) |>
  select(-lat_fill, -lon_fill)
```

The median is used as a conservative representative point when several records match the same key. In most cases there is only one coordinate, so the median is identical to the source value. When there are several nearby points, the median is less sensitive to an outlier than a mean.

The script writes output/coords_summary.csv and output/missing_coord_diagnostics.csv during development runs. These files show how many records each stage recovered and why any remaining records could not be matched. Remaining unmatched records usually fall into one of two categories:

They have no usable geoID, so neither JSON sibling matching nor parcel polygon matching can work.
They have a geoID, but that ID is not present in Parcel_poly$PID_10, which suggests the record is an account/unit/common-area representation rather than a standalone polygon feature in the parcel layer.

Because some remaining records are residential and can be corporate-owned, they should not be dismissed as harmless junk by default. They can be excluded from spatial filtering only with the understanding that the mapped result may undercount some condo, multifamily, or account-level records.

Address aliases and external geocoding audit trail

After the local TCAD/Travis County GIS coordinate recovery stages, a meaningful number of records can still have complete-looking situs addresses but no coordinates. The remaining clusters are not all the same kind of problem. Some are straightforward street-name convention mismatches between TCAD and Addresses.zip; others are true missing address-point coverage, missing ZIPs, new subdivision streets, or bad/ambiguous address strings.

To keep this auditable, the standalone workflow uses two additional data artifacts rather than hard-coding one-off fixes in the script:

File	Purpose
`data/address_aliases.csv`	Manual, high-confidence street/address aliases used before address matching. Examples include ordinal street expansion (`12 ST` to `12TH ST`), legacy street spelling (`MENCHACA RD` to `MANCHACA RD`), route normalization (`RANCH RD 2222` to `2222 RD`), and selected type splits (`NORTH PLAZA` to `NORTH PLZ`).
`output/missing_coords_geocodable_clusters.csv`	Groups still-missing, structurally geocodable addresses by street name, suffix, and ZIP. This is the main review file for deciding whether additional aliases are safe.
`output/missing_coords_geocodable_addresses.csv`	Row-level still-missing geocodable addresses, used to build external geocoding requests.
`output/geocoding_candidates.csv`	Unique address strings prepared for external geocoding. Multiple parcel records at the same address are collapsed to one query with `n_records`.
`output/geocoding_results_arcgis.csv`	Raw ArcGIS Pro geocoding export. This is not merged directly because it can contain low-quality or out-of-area matches.
`output/geocoding_arcgis_accepted_queries.csv`	Query-level ArcGIS results that passed the acceptance rules.
`output/geocoding_arcgis_review_queries.csv`	Query-level ArcGIS results rejected or held for manual review. This file is useful for spotting errors such as matches in Houston, Dallas, the Northeast, or `(0, 0)` coordinates.
`output/geocoding_arcgis_accepted_lookup.csv`	Parcel-level lookup derived from accepted ArcGIS results. If this file exists, `standalone_corporate_parcels.R` applies it as a final coordinate fill with `coord_source = "arcgis_geocoder"`.

The alias table is deliberately conservative. It should contain only transformations that are supported by the local address reference and that can be explained row-by-row. Ambiguous subdivision clusters should stay out of the alias table even if they are high-volume, because an alias would hide uncertainty rather than resolve it.
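A conservative alias lookup can be sketched as a whole-key dictionary match, so an alias only fires when the entire normalized street string equals a reviewed row. The column headers from_street and to_street are hypothetical, since the actual layout of data/address_aliases.csv is not documented here; the example rows are taken from the alias examples above.

```python
import csv
import io


def norm(s):
    """Uppercase and collapse whitespace."""
    return " ".join(s.upper().split())


def load_aliases(fh):
    """Read a two-column alias table (hypothetical headers from_street,to_street)."""
    return {norm(row["from_street"]): norm(row["to_street"]) for row in csv.DictReader(fh)}


def apply_alias(street, aliases):
    """Whole-key lookup only: no partial or pattern-based rewriting."""
    key = norm(street)
    return aliases.get(key, key)


ALIAS_CSV = """from_street,to_street
12 ST,12TH ST
MENCHACA RD,MANCHACA RD
RANCH RD 2222,2222 RD
"""
aliases = load_aliases(io.StringIO(ALIAS_CSV))
print(apply_alias("Menchaca  Rd", aliases))  # MANCHACA RD
print(apply_alias("LAMAR BLVD", aliases))    # LAMAR BLVD (no alias, unchanged)
```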

The ArcGIS geocoder results are also filtered conservatively before merging. The accepted lookup currently requires:

Status == "M"
Score >= 95
Addr_type is one of PointAddress, Subaddress, StreetAddress, or StreetAddressExt
RegionAbbr == "TX"
Subregion == "Travis County"
Coordinates fall inside a broad Austin/Travis bounding box

This rejects obvious false positives and coarse matches, including records geocoded to other states, other Texas metros, ZIP centroids, locality centroids, or unmatched (0, 0) points. In one development run, output/geocoding_candidates.csv contained 11,561 unique address queries representing 16,345 parcel rows. The strict ArcGIS acceptance pass kept 10,894 queries and produced output/geocoding_arcgis_accepted_lookup.csv with 15,242 parcel-level coordinate fills. After applying that lookup, output/coords_summary.csv showed 479,416 of 486,859 coordinate rows filled, with 7,443 still missing.
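The acceptance rules listed above can be expressed as a single row-level predicate. This is a sketch: the X/Y coordinate field names and the bounding-box limits are hypothetical placeholders for whatever the script actually uses.

```python
ACCEPTED_ADDR_TYPES = {"PointAddress", "Subaddress", "StreetAddress", "StreetAddressExt"}
# Hypothetical "broad" Austin/Travis box; the script's actual limits may differ.
LAT_MIN, LAT_MAX = 29.9, 30.8
LON_MIN, LON_MAX = -98.3, -97.2


def accept_arcgis_match(row: dict) -> bool:
    """Apply the acceptance rules to one geocoder result row."""
    try:
        score = float(row["Score"])
        lon, lat = float(row["X"]), float(row["Y"])  # hypothetical coordinate fields
    except (KeyError, TypeError, ValueError):
        return False
    return (
        row.get("Status") == "M"
        and score >= 95
        and row.get("Addr_type") in ACCEPTED_ADDR_TYPES
        and row.get("RegionAbbr") == "TX"
        and row.get("Subregion") == "Travis County"
        and LAT_MIN <= lat <= LAT_MAX
        and LON_MIN <= lon <= LON_MAX
    )
```

The bounding-box check is what catches (0, 0) results even when every categorical field looks acceptable.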

The important principle is that every coordinate has a provenance. Downstream analysis can keep all coordinates, exclude approximate sources, or inspect particular stages using coord_source.

Identifying corporate ownership

The script classifies each parcel along three dimensions:

Residential — improvement or land state code begins with A (single-family) or B (multi-family), or the zoning code contains SF or MF.
Owner-occupied — the owner's mailing address matches the situs (property) address, or a homestead exemption (HS) is present. Address matching uses the clean_address() function, which normalises street type abbreviations, removes punctuation, and strips unit designators before comparison.
Financialized / corporate-owned — the owner name is matched against a regular expression covering:
- Formal entity suffixes: LLC, LP, LTD, INC, LC, LLLP (including spaced and punctuated variants)
- Real-estate sector keywords: INVEST, MANAGE, HOLDING, DEVELOP, REALT, ASSET, EQUITY, PARTNER, VENTURE, and others
- Numeric characters in the owner name (a proxy for numbered holding companies)

A parcel is flagged as is_target = TRUE when it is residential, not owner-occupied, and financialized. These are the records most likely to represent corporate landlord activity.
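The owner-name test can be sketched as three regular expressions applied after stripping periods and commas. The patterns are an abbreviated, hypothetical version of the script's regex (it covers more keywords), and like any name-based heuristic they can misfire on surnames that happen to contain a keyword stem.

```python
import re

# Abbreviated, hypothetical patterns; the script's regex covers more keywords.
ENTITY = re.compile(r"\b(?:LLLP|L ?L ?C|LTD|LP|LC|INC)\b")
KEYWORDS = re.compile(r"\b(?:INVEST|MANAGE|HOLDING|DEVELOP|REALT|ASSET|EQUITY|PARTNER|VENTURE)")
DIGIT = re.compile(r"\d")


def is_financialized(owner_name: str) -> bool:
    """Flag corporate-looking owner names. Periods and commas are removed first
    so that punctuated suffixes such as L.L.C. reduce to LLC."""
    name = re.sub(r"[.,]", "", owner_name.upper())
    return bool(ENTITY.search(name) or KEYWORDS.search(name) or DIGIT.search(name))


print(is_financialized("Sunbelt Holdings, L.L.C."))  # True
print(is_financialized("PAULINE LLOYD"))             # False
```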

The script then spatially filters results to the City of Austin boundary using the Census Bureau's Places layer (via the tigris package) and writes the final dataset to corporate_owned_parcels.csv.

Tri-county Austin parcel files for hex aggregation

Austin spans Travis, Williamson, and Hays counties. The Travis workflow in standalone_corporate_parcels.R produces the main hex-ready residential parcel universe at output/residential_parcels_for_hex.csv. Two county extension scripts add the smaller Austin portions outside Travis:

Script	Output	Notes
`williamson-parcel-pull.R`	`output/williamson_residential_parcels_for_hex.csv`	Downloads/reuses WCAD Socrata exports, spatially filters WCAD parcel geometry to City of Austin `FULL` jurisdiction, joins WCAD property and owner records, and writes an append-ready parcel file.
`hays-parcel-pull.R`	`output/hays_residential_parcels_for_hex.csv`	Uses Hays CAD property export ZIPs plus the public Hays parcel ArcGIS feature service. The Hays CAD export is nested ZIPs (`PROPERTY`, `OWNER`, `LAND`, `IMPROVEMENT`, `SEGMENT`, etc.), so the script recursively unpacks them before normalizing.

Both county extension outputs intentionally use the same 25-column schema as output/residential_parcels_for_hex.csv, including parcel coordinates, owner-name rollups, residential/corporate flags, parcel/unit/square-foot denominators, and corporate numerator fields. Parcel IDs are prefixed with WILLIAMSON: or HAYS: to avoid collisions with Travis parcel IDs. Coordinates come from representative points on county parcel geometry and have county-specific coord_source values.

In the current development run:

File	Residential parcels	Corporate-owned parcels	Estimated units	Corporate estimated units
`output/williamson_residential_parcels_for_hex.csv`	13,482	782	20,587	6,811
`output/hays_residential_parcels_for_hex.csv`	298	1	298	1

These files can be appended directly for citywide hex aggregation:

```
travis <- readr::read_csv("output/residential_parcels_for_hex.csv", show_col_types = FALSE)
williamson <- readr::read_csv("output/williamson_residential_parcels_for_hex.csv", show_col_types = FALSE)
hays <- readr::read_csv("output/hays_residential_parcels_for_hex.csv", show_col_types = FALSE)

austin_parcels_for_hex <- dplyr::bind_rows(travis, williamson, hays)
```

Unit estimates are source-dependent. Travis uses TCAD state codes plus square footage, Williamson uses broader WCAD property type and square-footage rules, and Hays uses Hays state codes where available. The resulting property_units field is suitable for hex-level rates, but should be described as an estimated unit denominator rather than a directly reported count.

Standalone script: `austin_parcel_land_transactions.R`

austin_parcel_land_transactions.R creates a citywide parcel-year dataset for displacement modeling. Unlike the corporate-ownership hex files above, this script uses all parcels inside the City of Austin FULL jurisdiction, including commercial parcels, because nonresidential land transactions may also be relevant to neighborhood change.

Primary output:

output/austin_parcel_year_land_transactions.csv

The output has one row per combination of parcel_id and transaction_year, with current parcel attributes repeated on each row for modeling convenience. Parcel IDs are prefixed by county (TRAVIS:, WILLIAMSON:, HAYS:) to avoid collisions across appraisal districts.

Sources and county treatment

County	Land value source	Transaction source	Notes
Travis	TCAD `owners[].ownerValue[]`, field `ownerLandValue`, from `tcad_special_export.zip`	TCAD `deeds` records from `tcad_special_export.zip`	Streams the large TCAD JSON with `jq`, caches extracted owner-value/deed tables in `output/`, and counts deeds by deed year. Buyer and seller names are classified with the same corporate/financialized marker regex used elsewhere.
Williamson	`data/wcad/wcad_property_certified.csv`, field `TotalLandMktValue`	Not available in the current WCAD exports	Spatially filters `data/wcad/wcad_parcels.rds` to Austin `FULL`, then joins property records. Transaction fields are `NA` and `transaction_source = "not_available_in_current_wcad_exports"`.
Hays	Hays property export, field `CurrLandValue`	Nested Hays `SALES` export	Recursively reads the nested Hays ZIP export and counts sales by `DeedDate` where available, otherwise `SaleDate`. `PrevOwnerName` supports a corporate seller/previous-owner signal, but buyer identity is not inferred.

The script also writes a QA summary:

output/austin_parcel_year_land_transactions_summary.csv

Travis extraction uses cached flat files after the first successful run:

output/travis_owner_values.csv
output/travis_deeds.csv
output/travis_land_transaction_selected_fields.csv

The script normalizes ZIP codes to valid five-digit strings, treats 00000 as missing, and handles malformed TCAD deed dates by falling back to recorded deed dates when available. This avoids spurious transaction years such as 21, 201, or 222 from malformed source dates.
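The two cleaning rules can be sketched as small helpers. These are illustrations, not the script's R code: they assume ISO-style YYYY-MM-DD date strings, treat nine-digit values as ZIP+4, and use a hypothetical 1800 floor to reject years such as 201 or 222.

```python
import re
from datetime import date


def normalize_zip(raw):
    """Five-digit ZIP string, or None for missing, malformed, or '00000' values."""
    if raw is None:
        return None
    digits = re.sub(r"\D", "", str(raw))
    if len(digits) == 9:  # assumed ZIP+4; keep the five-digit prefix
        digits = digits[:5]
    if len(digits) != 5 or digits == "00000":
        return None
    return digits


def transaction_year(deed_date, recorded_date, min_year=1800):
    """Year from the deed date if plausible, otherwise from the recorded date."""
    max_year = date.today().year
    for raw in (deed_date, recorded_date):
        m = re.match(r"\s*(\d{4})-\d{2}-\d{2}", raw or "")
        if m and min_year <= int(m.group(1)) <= max_year:
            return int(m.group(1))
    return None


print(transaction_year("0201-06-01", "2019-06-03"))  # 2019 (malformed deed date skipped)
```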

For quick testing without the long Travis stream, set AUSTIN_LAND_TX_COUNTIES:

```
AUSTIN_LAND_TX_COUNTIES=Williamson,Hays Rscript austin_parcel_land_transactions.R
```

Subset runs write subset-named outputs, such as:

output/austin_parcel_year_land_transactions_williamson_hays.csv
output/austin_parcel_year_land_transactions_summary_williamson_hays.csv
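One way to reproduce this naming convention is sketched below. It is hypothetical: the script's actual parsing of AUSTIN_LAND_TX_COUNTIES may differ, for example in how it orders or validates county names.

```python
import os

COUNTIES = ("Travis", "Williamson", "Hays")


def selected_counties(env=None):
    """Parse AUSTIN_LAND_TX_COUNTIES (comma-separated); default to all counties."""
    env = os.environ if env is None else env
    raw = env.get("AUSTIN_LAND_TX_COUNTIES", "")
    requested = {c.strip().title() for c in raw.split(",") if c.strip()}
    unknown = requested - set(COUNTIES)
    if unknown:
        raise ValueError(f"Unknown counties: {sorted(unknown)}")
    # Keep a canonical county order so file names are deterministic
    return [c for c in COUNTIES if c in requested] or list(COUNTIES)


def output_path(counties, stem="austin_parcel_year_land_transactions"):
    """Full runs keep the base name; subset runs append lowercase county names."""
    if set(counties) == set(COUNTIES):
        return f"output/{stem}.csv"
    return f"output/{stem}_{'_'.join(c.lower() for c in counties)}.csv"


subset = selected_counties({"AUSTIN_LAND_TX_COUNTIES": "Williamson,Hays"})
print(output_path(subset))  # output/austin_parcel_year_land_transactions_williamson_hays.csv
```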

Corporate transaction fields

The transaction fields represent deed or sale records involving corporate-like names, not a definitive annual ownership panel. Travis supports buyer and seller indicators from buyerLine and sellerLine. Hays supports a seller/previous-owner indicator from PrevOwnerName. Williamson transaction fields remain unavailable unless a separate public WCAD sales/deed source is added later.

The standard output schema is:

county, parcel_id, source_property_id, situs_address, situs_city, situs_state,
situs_zip, lat, lon, coord_source, current_land_value, land_value_tax_year,
transaction_year, transaction_count, corporate_buyer_transaction_count,
corporate_seller_transaction_count, corporate_party_transaction_count,
transaction_source, land_value_source
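The transaction-count fields can be sketched as a group-by over deed or sale records keyed by (parcel_id, year). This sketch assumes corporate_party_transaction_count counts records where either party looks corporate, and it uses an abbreviated, hypothetical marker regex and field names (year, buyer, seller). A missing buyer, as in the Hays sales data, simply never counts as corporate.

```python
import re
from collections import defaultdict

# Abbreviated, hypothetical marker regex; the script reuses its fuller corporate regex.
CORP = re.compile(r"\b(?:LLC|LP|LTD|INC)\b|INVEST|HOLDING|\d")


def is_corp(name):
    """True for corporate-looking names; missing names are treated as non-corporate."""
    if not name:
        return False
    return bool(CORP.search(re.sub(r"[.,]", "", name.upper())))


def parcel_year_counts(deeds):
    """deeds: dicts with parcel_id, year, buyer, seller (hypothetical field names)."""
    acc = defaultdict(lambda: [0, 0, 0, 0])  # total, corp buyer, corp seller, corp party
    for d in deeds:
        if d.get("year") is None:
            continue
        buyer, seller = is_corp(d.get("buyer")), is_corp(d.get("seller"))
        counts = acc[(d["parcel_id"], d["year"])]
        counts[0] += 1
        counts[1] += buyer
        counts[2] += seller
        counts[3] += buyer or seller
    return [
        {
            "parcel_id": pid,
            "transaction_year": year,
            "transaction_count": c[0],
            "corporate_buyer_transaction_count": c[1],
            "corporate_seller_transaction_count": c[2],
            "corporate_party_transaction_count": c[3],
        }
        for (pid, year), c in sorted(acc.items())
    ]
```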
