diff --git a/AGENTS.md b/AGENTS.md
index e36b2ef..1899af9 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -53,3 +53,76 @@ For multi-step tasks, state a brief plan:
 2. [Step] → verify: [check]
 3. [Step] → verify: [check]
 ```
+
+## 5. Project Architecture
+
+**Flask + vanilla JS + D3. No ORM, no front-end framework.**
+
+```
+simple_org_chart/   ← Python package (Flask app)
+  app_main.py       ← All Flask routes, filter parsers, API endpoints
+  msgraph.py        ← Graph API helpers: fetch_all_employees, probe_graph_capabilities, etc.
+  reports.py        ← Pure filter functions (apply_tagpicker_filters, apply_last_login_filters, …)
+  data_update.py    ← Sync orchestration: token → probe caps → fetch employees → write caches
+  config.py         ← Path constants (DATA_DIR, *_FILE paths)
+  settings.py       ← load_settings / save_settings helpers
+  auth.py           ← @require_auth decorator
+static/
+  app.js            ← D3 org chart, node click/drag, employee detail panel
+  reports.js        ← Report filter UI, tagpicker/toggle renderers, REPORT_CONFIGS
+  reports.css       ← Report filter layout styles
+  locales/en-US.json← All user-facing strings (i18n keys)
+data/               ← JSON caches written at sync time (git-ignored)
+```
+
+## 6. Adding a Report Filter
+
+Every filter touches all of these layers — miss one and it silently does nothing:
+
+1. **`msgraph.py`** — add the field to `$select` in `fetch_all_employees` / `collect_last_login_records`; populate it on each record dict.
+2. **`reports.py`** — add the parameter to every `apply_*_filters` function signature; implement the filtering logic.
+3. **`app_main.py`** — parse the query-string param in `_parse_standard_toggle_args` or `_parse_tagpicker_args`; forward it to the filter function in every relevant route (GET + export).
+4. **`static/reports.js`** — add the filter object to `_standardToggleFilters()` or `TAGPICKER_FILTERS`; add `requiredCapability` if the filter needs a Graph permission beyond `User.Read.All`.
+5. **`static/locales/en-US.json`** — add label and (for tagpickers) placeholder/mode keys.
+
+## 7. Graph Permission & Capability Gating
+
+Filters that depend on Graph permissions beyond the base `User.Read.All` must be gated:
+
+| Filter group | Required permission | Capability key |
+|---|---|---|
+| Mailbox type (User / Shared / Room) | `MailboxSettings.Read` | `mailbox_settings_read` |
+| GAL visibility (Hidden / Visible) | `MailboxSettings.Read` | `mailbox_settings_read` |
+| Inactivity day-range | `AuditLog.Read.All` + Entra P1/P2 | `audit_log_read_all` |
+
+**How it works:**
+- `msgraph.probe_graph_capabilities(token)` decodes the JWT access token's `roles` claim (no extra API calls) at the start of every sync and writes `data/graph_capabilities.json`.
+- `GET /api/graph-capabilities` serves that file to the front end.
+- `reports.js` fetches capabilities before first render; `_isFilterCapable(filter)` checks `filter.requiredCapability` against the loaded flags.
+- Incapable filter buttons get `aria-disabled` + class `filter-chip--unavailable` (greyed out, tooltip explains missing permission).
+
+When adding a new filter that needs a Graph permission:
+1. Add a probe call in `probe_graph_capabilities()` if the capability isn't already detected.
+2. Set `requiredCapability: ''` on the filter object in `reports.js`.
+
+## 8. Data Flow: Sync → Cache → API → UI
+
+```
+data_update.run_data_update()
+  ├─ probe_graph_capabilities()    → data/graph_capabilities.json
+  ├─ fetch_all_employees()         → data/employee_data.json, missing_manager, filtered_users, …
+  ├─ collect_last_login_records()
+  │     └─ enrich with managerId from employee list
+  │                                → data/last_login_records.json
+  └─ collect_disabled_users()      → data/disabled_user_records.json
+
+GET /api/reports/ → load cache → apply_*_filters() → JSON response
+GET /api/graph-capabilities → data/graph_capabilities.json → JSON response
+```
+
+## 9. i18n Rules
+
+- Every user-visible string must have a key in `static/locales/en-US.json`.
+- The translator `t(key)` is available in `reports.js` via `getTranslator()`.
+- Never hardcode English strings in JS templates.
+- When adding filters: add `labelKey`, `placeholderKey`, `resetLabelKey`, `modeIncludeLabelKey`, `modeExcludeLabelKey` references and the matching JSON entries.
diff --git a/README.md b/README.md
index f3faf14..7034652 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ SimpleOrgChart is a Flask application backed by Azure Active Directory (Entra ID
 - Hardened security defaults: strict Content Security Policy, sanitized redirects, login isolation, and placeholder-secret protection.
 - Modular front end: no inline scripts or styles; shared CSS variables power `configure`, `reports`, and org chart experiences.
 - Daily automation: background scheduler refreshes Azure AD data (20:00 local time) and persists JSON caches under `data/`.
-- Admin reporting: missing managers, filtered users, and last-login inactivity insights—each with one-click XLSX export.
+- Admin reporting: missing managers, filtered users, last-login inactivity, missing profile pictures, data quality issues, and recently hired users—each with one-click XLSX export.
 - Export tooling: SVG/PNG/PDF org chart capture and server-backed XLSX generation for the current chart tree.
 - Deployment ready: ships with Docker Compose and a Gunicorn configuration (`deploy/gunicorn.conf.py`) for containerized hosting.
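Reviewer note on section 7 of the AGENTS.md addition above: it states that `probe_graph_capabilities(token)` reads the access token's `roles` claim and makes no extra API calls. The sketch below shows one way that idea could work using only the standard library. It is not the project's implementation: `decode_jwt_payload`, `sketch_probe_capabilities`, the `ROLE_TO_CAPABILITY` mapping, and the unsigned demo token are all assumptions made for illustration. Only the permission names and capability keys come from the table in section 7.

```python
import base64
import json

# Hypothetical mapping from Graph application roles to capability keys,
# taken from the table in AGENTS.md section 7.
ROLE_TO_CAPABILITY = {
    "MailboxSettings.Read": "mailbox_settings_read",
    "AuditLog.Read.All": "audit_log_read_all",
}


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def sketch_probe_capabilities(token: str) -> dict:
    """Map the token's `roles` claim to boolean capability flags."""
    roles = set(decode_jwt_payload(token).get("roles", []))
    # Note: a role being present says nothing about tenant licensing
    # (e.g. Entra P1/P2), which cannot be read from the token.
    return {cap: role in roles for role, cap in ROLE_TO_CAPABILITY.items()}


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Build a fake, unsigned token purely to exercise the decoder.
demo_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"roles": ["User.Read.All", "AuditLog.Read.All"]}),
    "signature",
])
demo_caps = sketch_probe_capabilities(demo_token)
```

The sketch skips signature verification on the assumption that the token was just obtained from the token endpoint by the app itself and is only being inspected, never trusted for authorization.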
@@ -39,8 +39,8 @@ SimpleOrgChart is a Flask application backed by Azure Active Directory (Entra ID - `User.Read.All` - `Presence.Read.All` *(enables live Teams presence status indicators on org chart cards)* - `LicenseAssignment.Read.All` *(required for licensing insights and admin reports)* - - `AuditLog.Read.All` *(required for last sign-in metrics and disabled-user audit timestamps)* - - `MailboxSettings.Read` *(enables mailbox-type metadata used by last sign-in filters; without it, all mailboxes are treated as standard users)* + - `AuditLog.Read.All` *(required for last sign-in metrics and disabled-user audit timestamps; also enables the inactivity day-range filters on the Last Logins report — without it those filters are greyed out)* + - `MailboxSettings.Read` *(enables mailbox-type metadata used by mailbox-type filters; without it the User/Shared/Room mailbox filters are greyed out)* - Grant admin consent for the tenant. 3. **Create a Client Secret** @@ -125,7 +125,7 @@ docker compose up -d ``` - Default port: `APP_PORT` (defaults to `5000`). Override it in `.env` to change container and host bindings. -- Persistent data resides in the `orgchart_data` volume. Remove it to rebuild caches from scratch. +- Persistent data resides in the `./data` and `./config` bind-mounted directories. Remove their contents to rebuild caches from scratch. - Local execution outside Docker is not supported; use the provided container workflow for development and production. 
## Key Features @@ -139,7 +139,12 @@ docker compose up -d - Users by last sign-in activity - Employees hired in the last 365 days - Users hidden by filters + - Users without profile picture + - Users without a hiring date + - Users with data quality issues (whitespace, uppercase emails) - User Scanner (OSINT) — individual and organization-wide email presence scans +- **Rich Report Filters**: Toggle groups (mailbox type, account status, license status, user type, GAL visibility, mailbox presence, manager presence) plus tagpicker filters for title, department, country, and state/province. Filters that require Graph permissions not currently granted are automatically greyed out with a tooltip explaining what is missing. +- **Permission-Gated Filters**: On each data sync the app probes which Graph API capabilities are available and persists the result. Filters that require `AuditLog.Read.All` (inactivity ranges), `MailboxSettings.Read` (mailbox type), or Exchange-backed `showInAddressList` (GAL visibility) are disabled in the UI until the corresponding permission is granted and a sync is run. - **Automated Email Reports**: Schedule daily, weekly, or monthly reports sent via SMTP after data synchronization. - **Export Options**: SVG/PNG/PDF snapshots and XLSX exports for reports and chart data. - **Caching & Scheduling**: JSON caches regenerate nightly; manual refresh endpoints keep data current on demand. @@ -291,12 +296,17 @@ Organization scans use file-based state (`data/full_scan_state.json`) and a canc - `data/employee_data.json` – Full org hierarchy. - `data/missing_manager_records.json` – Missing manager snapshot. +- `data/filtered_user_records.json` – Users hidden by org chart filters. - `data/disabled_user_records.json` – Disabled users enriched with license and sign-in metadata. - `data/last_login_records.json` – Active users with last sign-in timestamps. +- `data/recently_hired_employees.json` – Users hired in the last 365 days. 
+- `data/missing_photo_records.json` – Users without a profile picture. +- `data/missing_hire_date_records.json` – Users without a hiring date (`employeeHireDate`). +- `data/dirty_data_records.json` – Users with data quality issues (whitespace, uppercase emails). +- `data/graph_capabilities.json` – Graph API capability flags written on each sync; controls which report filters are enabled in the UI. - `data/user_scanner_results.json` – Cached results from the most recent organization-wide OSINT scan. - `data/user_scanner_history.json` – Last five scan run metadata entries. - `data/user_scanner_exports/` – Downloaded XLSX workbooks for completed scans. -- Additional files exist for filtered/disabled-with-license/hiring reports. If a cache is missing or stale, hit **Refresh Data** on the reports page or start the app with `RUN_INITIAL_UPDATE=true`. diff --git a/simple_org_chart/app_main.py b/simple_org_chart/app_main.py index 33a2526..87ec2dc 100644 --- a/simple_org_chart/app_main.py +++ b/simple_org_chart/app_main.py @@ -92,16 +92,21 @@ def flock(fd, op): ) from simple_org_chart.reports import ( ReportCacheManager, + apply_dirty_data_filters, apply_disabled_filters, apply_filtered_user_filters, apply_last_login_filters, + apply_missing_hire_date_filters, apply_missing_manager_filters, apply_missing_photo_filters, apply_tagpicker_filters, + calculate_license_totals, + load_dirty_data, load_disabled_users_data, load_filtered_license_data, load_filtered_user_data, load_last_login_data, + load_missing_hire_date_data, load_missing_manager_data, load_missing_photo_data, load_recently_hired_data, @@ -292,8 +297,21 @@ def add_security_headers(response): RECENTLY_DISABLED_FILE = str(app_config.RECENTLY_DISABLED_FILE) RECENTLY_HIRED_FILE = str(app_config.RECENTLY_HIRED_FILE) MISSING_PHOTO_FILE = str(app_config.MISSING_PHOTO_FILE) +GRAPH_CAPABILITIES_FILE = str(app_config.GRAPH_CAPABILITIES_FILE) +DIRTY_DATA_FILE = str(app_config.DIRTY_DATA_FILE) +MISSING_HIRE_DATE_FILE = 
str(app_config.MISSING_HIRE_DATE_FILE) DATA_UPDATE_STATUS_FILE = os.path.join(DATA_DIR, 'data_update_status.json') +# Always delete the capabilities file on startup — it is regenerated on the +# next sync via JWT-decode. This ensures no stale data from previous probe +# implementations (API-call based) persists across restarts. +try: + if os.path.exists(GRAPH_CAPABILITIES_FILE): + os.remove(GRAPH_CAPABILITIES_FILE) + logger.info("Removed graph_capabilities.json on startup; will regenerate on next sync.") +except Exception as _cap_startup_err: + logger.warning("Could not remove graph_capabilities.json on startup: %s", _cap_startup_err) + logger.info(f"DATA_DIR set to: {DATA_DIR}") TENANT_ID = os.environ.get('AZURE_TENANT_ID') @@ -838,15 +856,39 @@ def get_metadata_options(): job_titles = collect_unique_field_values(employees, 'title') departments = collect_unique_field_values(employees, 'department') countries = collect_unique_field_values(employees, 'country') + states = collect_unique_field_values(employees, 'state') employee_options = collect_employee_option_labels(employees) return jsonify({ 'jobTitles': job_titles, 'departments': departments, 'countries': countries, + 'states': states, 'employees': employee_options }) + +@app.route('/api/graph-capabilities') +@require_auth +def get_graph_capabilities(): + """Return the Graph API capability flags detected during the last sync.""" + if not os.path.exists(GRAPH_CAPABILITIES_FILE): + resp = jsonify({'available': False, 'reason': 'No sync has run yet'}) + resp.headers['Cache-Control'] = 'no-cache, no-store, must-revalidate' + return resp + try: + with open(GRAPH_CAPABILITIES_FILE, 'r') as cap_file: + caps = json.load(cap_file) + caps['available'] = True + resp = jsonify(caps) + resp.headers['Cache-Control'] = 'no-cache, no-store, must-revalidate' + return resp + except Exception as exc: + logger.warning('Failed to read graph capabilities file: %s', exc) + resp = jsonify({'available': False, 'reason': 'Failed to read 
capabilities file'}) + resp.headers['Cache-Control'] = 'no-cache, no-store, must-revalidate' + return resp + @app.route('/api/set-top-user', methods=['POST']) @limiter.limit(RATE_LIMIT_SETTINGS) def set_top_user(): @@ -1744,6 +1786,223 @@ def export_missing_photo_report(): return jsonify({'error': 'Failed to export report'}), 500 +@app.route('/api/reports/missing-hire-date') +@require_auth +def get_missing_hire_date_report(): + try: + refresh = _parse_bool_arg(request.args.get('refresh'), default=False) + scope = request.args.get('scope', 'orgChart') + toggles = _parse_standard_toggle_args(request.args) + tp = _parse_tagpicker_args(request.args) + + records = load_missing_hire_date_data(report_cache, force_refresh=refresh) + records = _apply_scope_filter(records, scope) + filtered_records = apply_missing_hire_date_filters(records, **toggles) + filtered_records = apply_tagpicker_filters(filtered_records, **tp) + generated_at = None + if os.path.exists(MISSING_HIRE_DATE_FILE): + generated_at = datetime.fromtimestamp(os.path.getmtime(MISSING_HIRE_DATE_FILE)).isoformat() + + return jsonify({ + 'records': filtered_records, + 'count': len(filtered_records), + 'generatedAt': generated_at, + 'appliedFilters': toggles, + }) + except Exception as e: + logger.error(f"Error loading missing hire date report: {e}") + return jsonify({'error': 'Failed to load report data'}), 500 + + +@app.route('/api/reports/missing-hire-date/export') +@require_auth +def export_missing_hire_date_report(): + if not Workbook: + return jsonify({'error': 'XLSX export not available - openpyxl not installed'}), 500 + + try: + refresh = _parse_bool_arg(request.args.get('refresh'), default=False) + scope = request.args.get('scope', 'orgChart') + toggles = _parse_standard_toggle_args(request.args) + tp = _parse_tagpicker_args(request.args) + + records = load_missing_hire_date_data(report_cache, force_refresh=refresh) + records = _apply_scope_filter(records, scope) + filtered_records = 
apply_missing_hire_date_filters(records, **toggles) + filtered_records = apply_tagpicker_filters(filtered_records, **tp) + + wb = Workbook() + ws = wb.active + ws.title = "Missing Hire Date" + + headers = [ + ('name', 'Name'), + ('title', 'Title'), + ('department', 'Department'), + ('email', 'Email'), + ('country', 'Country'), + ] + + for column_index, (_, header_text) in enumerate(headers, 1): + cell = ws.cell(row=1, column=column_index, value=header_text) + cell.font = Font(bold=True, color="FFFFFF") + cell.fill = PatternFill(start_color="366092", end_color="366092", fill_type="solid") + cell.alignment = Alignment(horizontal="center") + + for row_index, record in enumerate(filtered_records, start=2): + for column_index, (key, _) in enumerate(headers, 1): + ws.cell(row=row_index, column=column_index, value=record.get(key)) + + for col in range(1, len(headers) + 1): + column_letter = get_column_letter(col) + ws.column_dimensions[column_letter].width = 22 + + filename = f"missing-hire-date-{datetime.now().strftime('%Y-%m-%d')}.xlsx" + + add_metadata_sheet( + wb, + filename=filename, + sheet_title=ws.title, + item_count=len(filtered_records), + data_export_option=format_export_filters(scope, toggles, tp), + ) + + output = BytesIO() + wb.save(output) + output.seek(0) + + return send_file( + output, + as_attachment=True, + download_name=filename, + mimetype='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' + ) + except Exception as e: + logger.error(f"Error exporting missing hire date report: {e}") + return jsonify({'error': 'Failed to export report'}), 500 + + +@app.route('/api/reports/dirty-data') +@require_auth +def get_dirty_data_report(): + try: + refresh = _parse_bool_arg(request.args.get('refresh'), default=False) + scope = request.args.get('scope', 'orgChart') + + include_enabled = _parse_bool_arg(request.args.get('includeEnabled'), default=True) + include_disabled = _parse_bool_arg(request.args.get('includeDisabled'), default=False) + 
include_licensed = _parse_bool_arg(request.args.get('includeLicensed'), default=True) + include_unlicensed = _parse_bool_arg(request.args.get('includeUnlicensed'), default=True) + include_members = _parse_bool_arg(request.args.get('includeMembers'), default=True) + include_guests = _parse_bool_arg(request.args.get('includeGuests'), default=False) + + records = load_dirty_data(report_cache, force_refresh=refresh) + records = _apply_scope_filter(records, scope) + filtered_records = apply_dirty_data_filters( + records, + include_enabled=include_enabled, + include_disabled=include_disabled, + include_licensed=include_licensed, + include_unlicensed=include_unlicensed, + include_members=include_members, + include_guests=include_guests, + ) + + generated_at = None + if os.path.exists(DIRTY_DATA_FILE): + generated_at = datetime.fromtimestamp(os.path.getmtime(DIRTY_DATA_FILE)).isoformat() + + return jsonify({ + 'records': filtered_records, + 'count': len(filtered_records), + 'generatedAt': generated_at, + }) + except Exception as e: + logger.error(f"Error loading dirty data report: {e}") + return jsonify({'error': 'Failed to load report data'}), 500 + + +@app.route('/api/reports/dirty-data/export') +@require_auth +def export_dirty_data_report(): + if not Workbook: + return jsonify({'error': 'XLSX export not available - openpyxl not installed'}), 500 + + try: + refresh = _parse_bool_arg(request.args.get('refresh'), default=False) + scope = request.args.get('scope', 'orgChart') + + include_enabled = _parse_bool_arg(request.args.get('includeEnabled'), default=True) + include_disabled = _parse_bool_arg(request.args.get('includeDisabled'), default=False) + include_licensed = _parse_bool_arg(request.args.get('includeLicensed'), default=True) + include_unlicensed = _parse_bool_arg(request.args.get('includeUnlicensed'), default=True) + include_members = _parse_bool_arg(request.args.get('includeMembers'), default=True) + include_guests = 
_parse_bool_arg(request.args.get('includeGuests'), default=False) + + records = load_dirty_data(report_cache, force_refresh=refresh) + records = _apply_scope_filter(records, scope) + filtered_records = apply_dirty_data_filters( + records, + include_enabled=include_enabled, + include_disabled=include_disabled, + include_licensed=include_licensed, + include_unlicensed=include_unlicensed, + include_members=include_members, + include_guests=include_guests, + ) + + wb = Workbook() + ws = wb.active + ws.title = "Data Quality Issues" + + headers = [ + ('name', 'Name'), + ('title', 'Title'), + ('department', 'Department'), + ('email', 'Email'), + ('issueCount', 'Issue Count'), + ('issueFields', 'Affected Fields'), + ] + + for column_index, (_, header_text) in enumerate(headers, 1): + cell = ws.cell(row=1, column=column_index, value=header_text) + cell.font = Font(bold=True, color="FFFFFF") + cell.fill = PatternFill(start_color="366092", end_color="366092", fill_type="solid") + cell.alignment = Alignment(horizontal="center") + + for row_index, record in enumerate(filtered_records, start=2): + for column_index, (key, _) in enumerate(headers, 1): + ws.cell(row=row_index, column=column_index, value=record.get(key)) + + for col in range(1, len(headers) + 1): + column_letter = get_column_letter(col) + ws.column_dimensions[column_letter].width = 22 + + filename = f"data-quality-issues-{datetime.now().strftime('%Y-%m-%d')}.xlsx" + + add_metadata_sheet( + wb, + filename=filename, + sheet_title=ws.title, + item_count=len(filtered_records), + data_export_option=format_export_filters(scope, {}, {}), + ) + + output = BytesIO() + wb.save(output) + output.seek(0) + + return send_file( + output, + as_attachment=True, + download_name=filename, + mimetype='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' + ) + except Exception as e: + logger.error(f"Error exporting dirty data report: {e}") + return jsonify({'error': 'Failed to export report'}), 500 + + 
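Reviewer note: the two dirty-data routes above parse their six `include*` toggles one at a time, while the missing-hire-date routes delegate to `_parse_standard_toggle_args`. The sketch below illustrates the shared-parser pattern with per-report default overrides. `parse_bool` and `parse_toggles` are hypothetical stand-ins written for this note (the app's real `_parse_bool_arg` may accept different spellings), and a plain dict stands in for `request.args`.

```python
from typing import Mapping, Optional

# Query-string name -> (keyword name, default), mirroring the defaults
# used by the dirty-data routes above.
TOGGLES = {
    "includeEnabled": ("include_enabled", True),
    "includeDisabled": ("include_disabled", False),
    "includeLicensed": ("include_licensed", True),
    "includeUnlicensed": ("include_unlicensed", True),
    "includeMembers": ("include_members", True),
    "includeGuests": ("include_guests", False),
}

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_bool(raw: Optional[str], default: bool) -> bool:
    """Interpret a query-string value as a bool, falling back to `default`."""
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    return default  # unrecognised input keeps the default


def parse_toggles(
    args: Mapping[str, str],
    defaults: Optional[Mapping[str, bool]] = None,
) -> dict:
    """Return keyword arguments for a filter function from request args."""
    overrides = defaults or {}
    return {
        kwarg: parse_bool(args.get(param), overrides.get(kwarg, fallback))
        for param, (kwarg, fallback) in TOGGLES.items()
    }


demo = parse_toggles({"includeGuests": "true", "includeEnabled": "0"})
```

Assuming the filter function accepts exactly these keyword arguments, as the calls above suggest, the result could be passed as `apply_dirty_data_filters(records, **toggles)` in both the GET and export routes, which would also give the export a dict to hand to `format_export_filters`.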
@app.route('/api/reports/disabled-users') @require_auth def get_disabled_users_report(): @@ -2129,6 +2388,11 @@ def _extract_list(key): if filter_countries_mode not in ('include', 'exclude'): filter_countries_mode = 'exclude' + filter_states = _extract_list('filterStates') + filter_states_mode = (args.get('filterStatesMode') or 'exclude').strip().lower() + if filter_states_mode not in ('include', 'exclude'): + filter_states_mode = 'exclude' + return { 'filter_titles': filter_titles, 'filter_titles_mode': filter_titles_mode, @@ -2136,6 +2400,8 @@ def _extract_list(key): 'filter_departments_mode': filter_departments_mode, 'filter_countries': filter_countries, 'filter_countries_mode': filter_countries_mode, + 'filter_states': filter_states, + 'filter_states_mode': filter_states_mode, } @@ -2156,6 +2422,12 @@ def _parse_standard_toggle_args(args, defaults=None): 'include_unlicensed': _parse_bool_arg(args.get('includeUnlicensed'), default=d.get('include_unlicensed', True)), 'include_members': _parse_bool_arg(args.get('includeMembers'), default=d.get('include_members', True)), 'include_guests': _parse_bool_arg(args.get('includeGuests'), default=d.get('include_guests', False)), + 'include_hidden_from_address_list': _parse_bool_arg(args.get('includeHiddenFromAddressList'), default=d.get('include_hidden_from_address_list', True)), + 'include_visible_in_address_list': _parse_bool_arg(args.get('includeVisibleInAddressList'), default=d.get('include_visible_in_address_list', True)), + 'include_with_mailbox': _parse_bool_arg(args.get('includeWithMailbox'), default=d.get('include_with_mailbox', True)), + 'include_without_mailbox': _parse_bool_arg(args.get('includeWithoutMailbox'), default=d.get('include_without_mailbox', True)), + 'include_with_manager': _parse_bool_arg(args.get('includeWithManager'), default=d.get('include_with_manager', True)), + 'include_without_manager': _parse_bool_arg(args.get('includeWithoutManager'), default=d.get('include_without_manager', True)), } @@ 
-2195,6 +2467,12 @@ def get_last_logins_report(): include_user_mailboxes = _parse_bool_arg(request.args.get('includeUserMailboxes'), default=True) include_shared_mailboxes = _parse_bool_arg(request.args.get('includeSharedMailboxes'), default=True) include_room_equipment_mailboxes = _parse_bool_arg(request.args.get('includeRoomEquipmentMailboxes'), default=True) + include_hidden_from_address_list = _parse_bool_arg(request.args.get('includeHiddenFromAddressList'), default=True) + include_visible_in_address_list = _parse_bool_arg(request.args.get('includeVisibleInAddressList'), default=True) + include_with_mailbox = _parse_bool_arg(request.args.get('includeWithMailbox'), default=True) + include_without_mailbox = _parse_bool_arg(request.args.get('includeWithoutMailbox'), default=True) + include_with_manager = _parse_bool_arg(request.args.get('includeWithManager'), default=True) + include_without_manager = _parse_bool_arg(request.args.get('includeWithoutManager'), default=True) tp = _parse_tagpicker_args(request.args) inactive_days_raw = request.args.get('inactiveDays') @@ -2222,7 +2500,13 @@ def get_last_logins_report(): include_shared_mailboxes=include_shared_mailboxes, include_room_equipment_mailboxes=include_room_equipment_mailboxes, inactive_days=inactive_days, - inactive_days_max=inactive_days_max + inactive_days_max=inactive_days_max, + include_hidden_from_address_list=include_hidden_from_address_list, + include_visible_in_address_list=include_visible_in_address_list, + include_with_mailbox=include_with_mailbox, + include_without_mailbox=include_without_mailbox, + include_with_manager=include_with_manager, + include_without_manager=include_without_manager, ) filtered_records = apply_tagpicker_filters(filtered_records, **tp) @@ -2246,6 +2530,12 @@ def get_last_logins_report(): 'includeUserMailboxes': include_user_mailboxes, 'includeSharedMailboxes': include_shared_mailboxes, 'includeRoomEquipmentMailboxes': include_room_equipment_mailboxes, + 
'includeHiddenFromAddressList': include_hidden_from_address_list, + 'includeVisibleInAddressList': include_visible_in_address_list, + 'includeWithMailbox': include_with_mailbox, + 'includeWithoutMailbox': include_without_mailbox, + 'includeWithManager': include_with_manager, + 'includeWithoutManager': include_without_manager, 'inactiveDays': inactive_days, 'inactiveDaysMax': inactive_days_max } @@ -2275,6 +2565,12 @@ def export_last_logins_report(): include_user_mailboxes = _parse_bool_arg(request.args.get('includeUserMailboxes'), default=True) include_shared_mailboxes = _parse_bool_arg(request.args.get('includeSharedMailboxes'), default=True) include_room_equipment_mailboxes = _parse_bool_arg(request.args.get('includeRoomEquipmentMailboxes'), default=True) + include_hidden_from_address_list = _parse_bool_arg(request.args.get('includeHiddenFromAddressList'), default=True) + include_visible_in_address_list = _parse_bool_arg(request.args.get('includeVisibleInAddressList'), default=True) + include_with_mailbox = _parse_bool_arg(request.args.get('includeWithMailbox'), default=True) + include_without_mailbox = _parse_bool_arg(request.args.get('includeWithoutMailbox'), default=True) + include_with_manager = _parse_bool_arg(request.args.get('includeWithManager'), default=True) + include_without_manager = _parse_bool_arg(request.args.get('includeWithoutManager'), default=True) inactive_days_raw = request.args.get('inactiveDays') inactive_days = None if inactive_days_raw not in (None, '', 'null', 'None'): @@ -2302,7 +2598,13 @@ def export_last_logins_report(): include_shared_mailboxes=include_shared_mailboxes, include_room_equipment_mailboxes=include_room_equipment_mailboxes, inactive_days=inactive_days, - inactive_days_max=inactive_days_max + inactive_days_max=inactive_days_max, + include_hidden_from_address_list=include_hidden_from_address_list, + include_visible_in_address_list=include_visible_in_address_list, + include_with_mailbox=include_with_mailbox, + 
include_without_mailbox=include_without_mailbox, + include_with_manager=include_with_manager, + include_without_manager=include_without_manager, ) filtered_records = apply_tagpicker_filters(filtered_records, **tp) @@ -3140,8 +3442,14 @@ def user_scanner_full_scan(): include_unlicensed = body.get('includeUnlicensed', False) include_members = body.get('includeMembers', True) include_guests = body.get('includeGuests', False) - - # Tagpicker filters (Title / Department / Country) + include_hidden_from_address_list = body.get('includeHiddenFromAddressList', True) + include_visible_in_address_list = body.get('includeVisibleInAddressList', True) + include_with_mailbox = body.get('includeWithMailbox', True) + include_without_mailbox = body.get('includeWithoutMailbox', True) + include_with_manager = body.get('includeWithManager', True) + include_without_manager = body.get('includeWithoutManager', True) + + # Tagpicker filters (Title / Department / Country / State) def _parse_filter_values(value): if isinstance(value, list): parsed = [] @@ -3158,6 +3466,8 @@ def _parse_filter_values(value): filter_departments_mode = body.get('filterDepartmentsMode', 'exclude') filter_countries = _parse_filter_values(body.get('filterCountries')) filter_countries_mode = body.get('filterCountriesMode', 'exclude') + filter_states = _parse_filter_values(body.get('filterStates')) + filter_states_mode = body.get('filterStatesMode', 'exclude') # Load all non-guest users (includes disabled, filtered, etc.) 
employees = _load_all_scannable_users() @@ -3181,9 +3491,15 @@ def _parse_filter_values(value): include_unlicensed=include_unlicensed, include_members=include_members, include_guests=include_guests, + include_hidden_from_address_list=include_hidden_from_address_list, + include_visible_in_address_list=include_visible_in_address_list, + include_with_mailbox=include_with_mailbox, + include_without_mailbox=include_without_mailbox, + include_with_manager=include_with_manager, + include_without_manager=include_without_manager, ) - # Apply tagpicker filters (Title / Department / Country) + # Apply tagpicker filters (Title / Department / Country / State) employees = apply_tagpicker_filters( employees, filter_titles=filter_titles, @@ -3192,6 +3508,8 @@ def _parse_filter_values(value): filter_departments_mode=filter_departments_mode, filter_countries=filter_countries, filter_countries_mode=filter_countries_mode, + filter_states=filter_states, + filter_states_mode=filter_states_mode, ) if not employees: @@ -3543,6 +3861,60 @@ def trigger_update(): mark_data_update_finished(success=False, error=str(e), source='manual') return jsonify({'error': 'Update failed'}), 500 +@app.route('/api/clear-data', methods=['POST']) +@require_auth +@limiter.limit(RATE_LIMIT_REFRESH) +def clear_cached_data(): + """Delete all cached dataset files (keeping settings/config) and trigger a fresh sync.""" + current_status = load_data_update_status() + if current_status.get('state') == 'running': + return jsonify({'error': 'Update already in progress'}), 409 + + cache_files = [ + DATA_FILE, + MISSING_MANAGER_FILE, + EMPLOYEE_LIST_FILE, + DISABLED_LICENSE_FILE, + FILTERED_LICENSE_FILE, + FILTERED_USERS_FILE, + DISABLED_USERS_FILE, + LAST_LOGIN_FILE, + RECENTLY_DISABLED_FILE, + RECENTLY_HIRED_FILE, + MISSING_PHOTO_FILE, + GRAPH_CAPABILITIES_FILE, + DIRTY_DATA_FILE, + MISSING_HIRE_DATE_FILE, + ] + + removed = [] + for path in cache_files: + try: + if os.path.exists(path): + os.remove(path) + 
removed.append(os.path.basename(path)) + except OSError as remove_error: + logger.warning(f"Failed to delete cached file {path}: {remove_error}") + + logger.info( + "Cleared %s cached data file(s) by user %s: %s", + len(removed), session.get('username'), removed, + ) + + try: + worker = threading.Thread( + target=update_employee_data, + kwargs={'source': 'manual'}, + daemon=True, + ) + worker.start() + except Exception as e: + logger.error(f"Error triggering update after clearing data: {e}") + mark_data_update_finished(success=False, error=str(e), source='manual') + return jsonify({'cleared': removed, 'error': 'Sync failed to start'}), 500 + + return jsonify({'cleared': removed, 'message': 'Data cleared; fresh sync started'}), 200 + @app.route('/search-test') def search_test(): return render_template_string(get_template('search_test.html')) diff --git a/simple_org_chart/config.py b/simple_org_chart/config.py index 267c058..a580a4c 100644 --- a/simple_org_chart/config.py +++ b/simple_org_chart/config.py @@ -15,7 +15,7 @@ STATIC_DIR = BASE_DIR / "static" TEMPLATE_DIR = BASE_DIR / "templates" -REPO_DIR = BASE_DIR / "repositories" +REPO_DIR = DATA_DIR / "repositories" SETTINGS_FILE = CONFIG_DIR / "app_settings.json" DATA_FILE = DATA_DIR / "employee_data.json" MISSING_MANAGER_FILE = DATA_DIR / "missing_manager_records.json" @@ -28,6 +28,9 @@ RECENTLY_DISABLED_FILE = DATA_DIR / "recently_disabled_employees.json" RECENTLY_HIRED_FILE = DATA_DIR / "recently_hired_employees.json" MISSING_PHOTO_FILE = DATA_DIR / "missing_photo_records.json" +GRAPH_CAPABILITIES_FILE = DATA_DIR / "graph_capabilities.json" +DIRTY_DATA_FILE = DATA_DIR / "dirty_data_records.json" +MISSING_HIRE_DATE_FILE = DATA_DIR / "missing_hire_date_records.json" def ensure_directories() -> None: @@ -63,6 +66,9 @@ def as_posix_env(mapping: Dict[str, Path]) -> Dict[str, str]: "RECENTLY_DISABLED_FILE", "RECENTLY_HIRED_FILE", "MISSING_PHOTO_FILE", + "GRAPH_CAPABILITIES_FILE", + "DIRTY_DATA_FILE", + 
"MISSING_HIRE_DATE_FILE", "ensure_directories", "as_posix_env", ] diff --git a/simple_org_chart/data_update.py b/simple_org_chart/data_update.py index 65e13cf..c067289 100644 --- a/simple_org_chart/data_update.py +++ b/simple_org_chart/data_update.py @@ -19,6 +19,7 @@ fetch_all_employees, get_access_token, parse_graph_datetime, + probe_graph_capabilities, _enrich_mailbox_metadata, ) from simple_org_chart.settings import ( @@ -47,6 +48,9 @@ FILTERED_LICENSE_FILE = str(app_config.FILTERED_LICENSE_FILE) FILTERED_USERS_FILE = str(app_config.FILTERED_USERS_FILE) MISSING_PHOTO_FILE = str(app_config.MISSING_PHOTO_FILE) +GRAPH_CAPABILITIES_FILE = str(app_config.GRAPH_CAPABILITIES_FILE) +DIRTY_DATA_FILE = str(app_config.DIRTY_DATA_FILE) +MISSING_HIRE_DATE_FILE = str(app_config.MISSING_HIRE_DATE_FILE) DATA_UPDATE_STATUS_FILE = os.path.join(DATA_DIR, 'data_update_status.json') _DATA_UPDATE_STATUS_LOCK = threading.Lock() @@ -316,6 +320,15 @@ def update_employee_data(source: str = 'unknown') -> None: error_message = "Access token retrieval failed" return + # Probe Graph capabilities early so filters can be correctly gated. 
+ try: + capabilities = probe_graph_capabilities(token) + with open(GRAPH_CAPABILITIES_FILE, 'w') as cap_file: + json.dump(capabilities, cap_file, indent=2) + logger.info("Graph capabilities probed and saved: %s", GRAPH_CAPABILITIES_FILE) + except Exception as cap_error: + logger.warning("Failed to probe/save Graph capabilities: %s", cap_error) + settings = load_settings() months_threshold = settings.get('newEmployeeMonths', 3) @@ -344,6 +357,7 @@ def update_employee_data(source: str = 'unknown') -> None: 'department': emp.get('department') or '', 'email': emp.get('email') or '', 'country': emp.get('country') or '', + 'state': emp.get('state') or '', 'accountEnabled': emp.get('accountEnabled', True), 'userType': emp.get('userType') or '', 'licenseCount': emp.get('licenseCount', 0), @@ -360,6 +374,52 @@ def update_employee_data(source: str = 'unknown') -> None: except Exception as report_error: logger.error(f"Failed to write missing photo report cache: {report_error}") + try: + missing_hire_date_records = [ + { + 'id': emp.get('id'), + 'name': emp.get('name') or '', + 'title': emp.get('title') or '', + 'department': emp.get('department') or '', + 'email': emp.get('email') or '', + 'country': emp.get('country') or '', + 'state': emp.get('state') or '', + 'accountEnabled': emp.get('accountEnabled', True), + 'userType': emp.get('userType') or '', + 'licenseCount': emp.get('licenseCount', 0), + 'licenseSkus': emp.get('licenseSkus', []), + 'licenseSkuIds': emp.get('licenseSkuIds', []), + 'mailboxType': emp.get('mailboxType'), + 'isSharedMailbox': emp.get('isSharedMailbox'), + 'managerId': emp.get('managerId'), + 'managerName': emp.get('managerName') or '', + 'hasManager': emp.get('hasManager', True), + 'hasMailbox': emp.get('hasMailbox', True), + 'hiddenFromAddressLists': emp.get('hiddenFromAddressLists', False), + } + for emp in (list(employees) + (filtered_users or [])) + if not (emp.get('hireDate') or emp.get('employeeHireDate')) + ] + with 
open(MISSING_HIRE_DATE_FILE, 'w') as report_file: + json.dump(missing_hire_date_records, report_file, indent=2) + logger.info( + f"Updated missing hire date report cache with {len(missing_hire_date_records)} records" + ) + except Exception as report_error: + logger.error(f"Failed to write missing hire date report cache: {report_error}") + + try: + from simple_org_chart.reports import detect_dirty_data_records + all_users_for_dirty = list(employees) + (filtered_users or []) + dirty_records = detect_dirty_data_records(all_users_for_dirty) + with open(DIRTY_DATA_FILE, 'w') as report_file: + json.dump(dirty_records, report_file, indent=2) + logger.info( + f"Updated dirty data report cache with {len(dirty_records)} records" + ) + except Exception as report_error: + logger.error(f"Failed to write dirty data report cache: {report_error}") + ignored_employee_set = parse_ignored_employees(settings) ignored_department_set = parse_ignored_departments(settings) @@ -495,6 +555,17 @@ def update_new_status(node): try: last_login_records = collect_last_login_records(token=token) + # Enrich with managerId from the employee list so the "has manager" + # filter works on this report. 
+ _employee_id_map = {str(e.get('id')): e for e in employees if e.get('id')} + for _rec in last_login_records: + _emp = _employee_id_map.get(str(_rec.get('id') or '')) + if _emp is not None: + _rec['managerId'] = _emp.get('managerId') + _rec['hasManager'] = bool(_emp.get('managerId')) + else: + _rec.setdefault('managerId', None) + _rec.setdefault('hasManager', False) with open(LAST_LOGIN_FILE, 'w') as report_file: json.dump(last_login_records, report_file, indent=2) logger.info( diff --git a/simple_org_chart/hierarchy.py b/simple_org_chart/hierarchy.py index b212029..4572e67 100644 --- a/simple_org_chart/hierarchy.py +++ b/simple_org_chart/hierarchy.py @@ -225,6 +225,7 @@ def traverse(node): 'businessPhone': emp.get('businessPhone'), 'location': emp.get('location') or emp.get('officeLocation') or '', 'country': emp.get('country') or '', + 'state': emp.get('state') or '', 'managerName': manager_name, 'reason': effective_reason, 'missingReason': reason, diff --git a/simple_org_chart/msgraph.py b/simple_org_chart/msgraph.py index 3be08b3..a8a607c 100644 --- a/simple_org_chart/msgraph.py +++ b/simple_org_chart/msgraph.py @@ -2,6 +2,8 @@ from __future__ import annotations +import base64 +import json as _json import logging import os import time @@ -31,6 +33,93 @@ FallbackLoader = Callable[[], EmployeeTriple] +# --------------------------------------------------------------------------- +# Graph capability probing +# --------------------------------------------------------------------------- + +# Map of capability key → what it means. +# Capability keys and the Graph app permissions they require: +# user_read_all - User.Read.All (core: required for everything) +# audit_log_read_all - AuditLog.Read.All (sign-in activity / last logins) +# mailbox_settings_read - MailboxSettings.Read (mailbox type + GAL visibility) + + +def _decode_jwt_roles(token: str) -> frozenset: + """Extract the roles claim from a JWT access token without verifying the signature. 
+ + Safe for informational use only — we are reading our own token to discover + which application permissions were granted, not for authentication. + """ + try: + parts = token.split('.') + if len(parts) < 2: + return frozenset() + payload = parts[1] + # Restore base64 padding (no padding added when already aligned) + payload += '=' * (-len(payload) % 4) + decoded = base64.urlsafe_b64decode(payload.encode()) + claims = _json.loads(decoded) + roles = claims.get('roles', []) + return frozenset(roles) if isinstance(roles, list) else frozenset() + except Exception as exc: + logger.debug("Failed to decode JWT roles: %s", exc) + return frozenset() + + +def _has_exchange_mailbox(user: dict) -> bool: + """Return True when the user has a provisioned Exchange Online mailbox. + + Reads the ``assignedPlans`` collection returned by Graph and looks for an + enabled ``exchange`` service plan. This mirrors what the Exchange admin + center "Manage mailboxes" view reflects, and is far more reliable than the + ``mail`` attribute (which is populated on many mail-enabled objects that + have no mailbox). Requires no extra Graph permission or API call. + """ + assigned_plans = user.get("assignedPlans") or [] + if not isinstance(assigned_plans, list): + return False + for plan in assigned_plans: + if not isinstance(plan, dict): + continue + if (plan.get("service") or "").lower() == "exchange" and \ + (plan.get("capabilityStatus") or "").lower() == "enabled": + return True + return False + + +def probe_graph_capabilities(token: str) -> dict: + """Detect which Graph API capabilities are available with the supplied token. + + Reads permission grants directly from the JWT access token's ``roles`` + claim — no extra API calls needed. For client-credentials (app-only) flow, + a permission appears in ``roles`` only when it has been admin-consented. 
+ + Returns a dict of capability flags, e.g.:: + + { + "user_read_all": True, + "audit_log_read_all": False, + "mailbox_settings_read": True, + "probed_at": "2026-06-14T10:00:00+00:00", + } + """ + roles = _decode_jwt_roles(token) + logger.info("Decoded %d JWT role(s) from access token", len(roles)) + + result: dict = { + "user_read_all": "User.Read.All" in roles, + "audit_log_read_all": "AuditLog.Read.All" in roles, + "mailbox_settings_read": "MailboxSettings.Read" in roles, + "probed_at": datetime.now(timezone.utc).isoformat(), + } + + logger.info( + "Graph capability probe results: %s", + {k: v for k, v in result.items() if k != "probed_at"}, + ) + return result + + def _enrich_mailbox_metadata( headers: dict, records: Iterable[dict], @@ -93,7 +182,38 @@ def _enrich_mailbox_metadata( payload = response.json() or {} mailbox_purpose_raw = (payload.get("userPurpose") or "").strip() + + # Always attempt to refresh showInAddressList with a targeted per-user + # call. The bulk /users query returns null for some Exchange-managed + # shared mailboxes even when HiddenFromAddressListsEnabled=True in + # Exchange. We use the beta endpoint with strong consistency (no + # ConsistencyLevel: eventual) for the best chance of getting the + # Exchange-synced value. 
+ try: + gal_headers = { + "Authorization": headers.get("Authorization", ""), + "Content-Type": "application/json", + } + gal_resp = requests.get( + f"{GRAPH_API_BETA_ENDPOINT}/users/{user_id}?$select=showInAddressList", + headers=gal_headers, + timeout=10, + ) + if gal_resp.status_code == 200: + show_in_al = gal_resp.json().get("showInAddressList") + if show_in_al is not None: + for record in record_group: + record["hiddenFromAddressLists"] = show_in_al is False + else: + logger.debug( + "showInAddressList refresh returned status %s for %s", + gal_resp.status_code, user_id, + ) + except Exception as exc: + logger.debug("Failed to refresh showInAddressList for %s: %s", user_id, exc) + if not mailbox_purpose_raw: + lookups_performed += 1 continue mailbox_purpose = mailbox_purpose_raw.lower() @@ -102,6 +222,9 @@ def _enrich_mailbox_metadata( for record in record_group: record["mailboxType"] = mailbox_purpose_raw record["isSharedMailbox"] = is_shared_mailbox + # A resolved mailbox purpose (shared/room/equipment/user) means the + # mailbox exists, regardless of licensing/assignedPlans. 
+ record["hasMailbox"] = True lookups_performed += 1 @@ -301,7 +424,7 @@ def fetch_all_employees( select_fields = ( "id,displayName,jobTitle,department,mail,userPrincipalName,mobilePhone," "businessPhones,officeLocation,city,state,country,usageLocation,streetAddress," - "postalCode,employeeHireDate,accountEnabled,userType,assignedLicenses" + "postalCode,employeeHireDate,accountEnabled,userType,assignedLicenses,assignedPlans,showInAddressList" ) users_url = ( f"{GRAPH_API_ENDPOINT}/users?$select={select_fields}" @@ -399,7 +522,9 @@ def fetch_all_employees( "licenseSkuIds": license_sku_ids, "mailboxType": None, "isSharedMailbox": None, - "managerId": user.get("manager", {}).get("id") if user.get("manager") else None, + "hiddenFromAddressLists": user.get("showInAddressList") is False, + "hasMailbox": _has_exchange_mailbox(user), + "hasManager": bool(user.get("manager", {}).get("id") if user.get("manager") else None), "children": [], } filtered_users.append(base_record) @@ -471,6 +596,9 @@ def fetch_all_employees( "licenseSkuIds": list(license_sku_ids), "mailboxType": None, "isSharedMailbox": None, + "hiddenFromAddressLists": user.get("showInAddressList") is False, + "hasMailbox": _has_exchange_mailbox(user), + "hasManager": bool(user.get("manager", {}).get("id") if user.get("manager") else None), } ) users_url = data.get("@odata.nextLink") @@ -561,7 +689,7 @@ def collect_last_login_records(*, token: Optional[str] = None) -> list[dict]: base_fields = ( "id,displayName,jobTitle,department,mail,userPrincipalName," - "signInActivity,accountEnabled,userType,assignedLicenses,country" + "signInActivity,accountEnabled,userType,assignedLicenses,assignedPlans,country,state,showInAddressList" ) def build_users_url(select_fields: str) -> str: @@ -664,6 +792,7 @@ def _map_licenses(license_entries: Optional[Iterable[dict]]) -> Tuple[list[str], "department": user.get("department") or "No Department", "email": user.get("mail") or user.get("userPrincipalName") or "", "country": 
user.get("country") or "", + "state": user.get("state") or "", "accountEnabled": user.get("accountEnabled", True), "userType": (user.get("userType") or "").lower(), "licenseCount": len(sku_ids), @@ -678,6 +807,8 @@ def _map_licenses(license_entries: Optional[Iterable[dict]]) -> Tuple[list[str], "lastNonInteractiveSignIn": _format_datetime(last_non_interactive), "daysSinceNonInteractiveSignIn": int((now_utc - last_non_interactive).days) if last_non_interactive else None, "neverSignedIn": not observed_dates, + "hiddenFromAddressLists": user.get("showInAddressList") is False, + "hasMailbox": _has_exchange_mailbox(user), } records.append(record) @@ -706,7 +837,7 @@ def _collect_disabled_users(*, token: Optional[str] = None) -> list[dict]: select_fields = ( "id,displayName,jobTitle,department,mail,userPrincipalName,mobilePhone," "businessPhones,officeLocation,city,state,country,usageLocation,streetAddress," - "postalCode,employeeHireDate,employeeLeaveDateTime,accountEnabled,userType,assignedLicenses" + "postalCode,employeeHireDate,employeeLeaveDateTime,accountEnabled,userType,assignedLicenses,assignedPlans,showInAddressList" ) users_url = f"{GRAPH_API_ENDPOINT}/users?$select={select_fields}&$filter=accountEnabled eq false" @@ -774,6 +905,8 @@ def _collect_disabled_users(*, token: Optional[str] = None) -> list[dict]: "hireDate": datetime_to_iso(hire_date) if hire_date else None, "disabledDate": disabled_iso, "disabledDays": calculate_days_since(disabled_at), + "hiddenFromAddressLists": user.get("showInAddressList") is False, + "hasMailbox": _has_exchange_mailbox(user), } ) users_url = data.get("@odata.nextLink") diff --git a/simple_org_chart/reports.py b/simple_org_chart/reports.py index 108130b..17a076c 100644 --- a/simple_org_chart/reports.py +++ b/simple_org_chart/reports.py @@ -23,6 +23,8 @@ FILTERED_LICENSE_FILE = str(app_config.FILTERED_LICENSE_FILE) FILTERED_USERS_FILE = str(app_config.FILTERED_USERS_FILE) MISSING_PHOTO_FILE = str(app_config.MISSING_PHOTO_FILE) 
+DIRTY_DATA_FILE = str(app_config.DIRTY_DATA_FILE) +MISSING_HIRE_DATE_FILE = str(app_config.MISSING_HIRE_DATE_FILE) class ReportCacheManager: @@ -92,6 +94,14 @@ def load_missing_photo_data(cache: ReportCacheManager, *, force_refresh: bool = ) +def load_missing_hire_date_data(cache: ReportCacheManager, *, force_refresh: bool = False): + return cache.load_json( + MISSING_HIRE_DATE_FILE, + refresh=force_refresh, + description="missing hire date report cache", + ) + + def load_disabled_license_data(cache: ReportCacheManager, *, force_refresh: bool = False): return cache.load_json( DISABLED_LICENSE_FILE, @@ -241,6 +251,12 @@ def apply_last_login_filters( include_never_signed_in: bool = True, inactive_days: Optional[str] = None, inactive_days_max: Optional[str] = None, + include_hidden_from_address_list: bool = True, + include_visible_in_address_list: bool = True, + include_with_mailbox: bool = True, + include_without_mailbox: bool = True, + include_with_manager: bool = True, + include_without_manager: bool = True, ): if not records: return [] @@ -311,6 +327,33 @@ def apply_last_login_filters( if days_since is None or days_since > inactive_max_threshold: continue + # A shared/room/equipment mailbox is a mailbox by definition. + has_mailbox = bool(record.get("hasMailbox", True)) or is_shared_mailbox or is_room_equipment_mailbox + + # GAL visibility is a mailbox concept; the two populations are filtered independently. + # Mailbox users: apply the hidden/visible toggles directly. + # Non-mailbox users: they have no GAL status; exclude them when either GAL toggle is + # active (narrowing to a specific group) so they don't bleed into GAL-specific results. 
+ if has_mailbox: + hidden = bool(record.get("hiddenFromAddressLists")) + if hidden and not include_hidden_from_address_list: + continue + if not hidden and not include_visible_in_address_list: + continue + elif not include_hidden_from_address_list or not include_visible_in_address_list: + continue + + if has_mailbox and not include_with_mailbox: + continue + if not has_mailbox and not include_without_mailbox: + continue + + has_manager = bool(record.get("hasManager", True)) + if has_manager and not include_with_manager: + continue + if not has_manager and not include_without_manager: + continue + filtered.append(record) return filtered @@ -328,6 +371,12 @@ def apply_filtered_user_filters( include_unlicensed: bool = True, include_members: bool = True, include_guests: bool = True, + include_hidden_from_address_list: bool = True, + include_visible_in_address_list: bool = True, + include_with_mailbox: bool = True, + include_without_mailbox: bool = True, + include_with_manager: bool = True, + include_without_manager: bool = True, ): if not records: return [] @@ -363,6 +412,33 @@ def apply_filtered_user_filters( if user_type == "member" and not include_members: continue + # A shared/room/equipment mailbox is a mailbox by definition. + has_mailbox = bool(record.get("hasMailbox", True)) or is_shared_mailbox or is_room_equipment_mailbox + + # GAL visibility is a mailbox concept; the two populations are filtered independently. + # Mailbox users: apply the hidden/visible toggles directly. + # Non-mailbox users: they have no GAL status; exclude them when either GAL toggle is + # active (narrowing to a specific group) so they don't bleed into GAL-specific results. 
+ if has_mailbox: + hidden = bool(record.get("hiddenFromAddressLists")) + if hidden and not include_hidden_from_address_list: + continue + if not hidden and not include_visible_in_address_list: + continue + elif not include_hidden_from_address_list or not include_visible_in_address_list: + continue + + if has_mailbox and not include_with_mailbox: + continue + if not has_mailbox and not include_without_mailbox: + continue + + has_manager = bool(record.get("hasManager", True)) + if has_manager and not include_with_manager: + continue + if not has_manager and not include_without_manager: + continue + filtered.append(record) return filtered @@ -380,6 +456,12 @@ def apply_missing_manager_filters( include_unlicensed: bool = True, include_members: bool = True, include_guests: bool = True, + include_hidden_from_address_list: bool = True, + include_visible_in_address_list: bool = True, + include_with_mailbox: bool = True, + include_without_mailbox: bool = True, + include_with_manager: bool = True, + include_without_manager: bool = True, ): return apply_filtered_user_filters( records, @@ -392,6 +474,51 @@ def apply_missing_manager_filters( include_unlicensed=include_unlicensed, include_members=include_members, include_guests=include_guests, + include_hidden_from_address_list=include_hidden_from_address_list, + include_visible_in_address_list=include_visible_in_address_list, + include_with_mailbox=include_with_mailbox, + include_without_mailbox=include_without_mailbox, + include_with_manager=include_with_manager, + include_without_manager=include_without_manager, + ) + + +def apply_missing_hire_date_filters( + records: Optional[Sequence[dict]], + *, + include_user_mailboxes: bool = True, + include_shared_mailboxes: bool = True, + include_room_equipment_mailboxes: bool = True, + include_enabled: bool = True, + include_disabled: bool = True, + include_licensed: bool = True, + include_unlicensed: bool = True, + include_members: bool = True, + include_guests: bool = True, + 
include_hidden_from_address_list: bool = True, + include_visible_in_address_list: bool = True, + include_with_mailbox: bool = True, + include_without_mailbox: bool = True, + include_with_manager: bool = True, + include_without_manager: bool = True, +): + return apply_filtered_user_filters( + records, + include_user_mailboxes=include_user_mailboxes, + include_shared_mailboxes=include_shared_mailboxes, + include_room_equipment_mailboxes=include_room_equipment_mailboxes, + include_enabled=include_enabled, + include_disabled=include_disabled, + include_licensed=include_licensed, + include_unlicensed=include_unlicensed, + include_members=include_members, + include_guests=include_guests, + include_hidden_from_address_list=include_hidden_from_address_list, + include_visible_in_address_list=include_visible_in_address_list, + include_with_mailbox=include_with_mailbox, + include_without_mailbox=include_without_mailbox, + include_with_manager=include_with_manager, + include_without_manager=include_without_manager, ) @@ -407,6 +534,12 @@ def apply_missing_photo_filters( include_unlicensed: bool = True, include_members: bool = True, include_guests: bool = True, + include_hidden_from_address_list: bool = True, + include_visible_in_address_list: bool = True, + include_with_mailbox: bool = True, + include_without_mailbox: bool = True, + include_with_manager: bool = True, + include_without_manager: bool = True, ): return apply_filtered_user_filters( records, @@ -419,6 +552,12 @@ def apply_missing_photo_filters( include_unlicensed=include_unlicensed, include_members=include_members, include_guests=include_guests, + include_hidden_from_address_list=include_hidden_from_address_list, + include_visible_in_address_list=include_visible_in_address_list, + include_with_mailbox=include_with_mailbox, + include_without_mailbox=include_without_mailbox, + include_with_manager=include_with_manager, + include_without_manager=include_without_manager, ) @@ -431,17 +570,20 @@ def 
apply_tagpicker_filters( filter_departments_mode: str = "exclude", filter_countries: Optional[List[str]] = None, filter_countries_mode: str = "exclude", + filter_states: Optional[List[str]] = None, + filter_states_mode: str = "exclude", ) -> List[dict]: - """Apply optional title/department/country include/exclude filters.""" + """Apply optional title/department/country/state include/exclude filters.""" if not records: return [] title_set = {v.strip().lower() for v in (filter_titles or []) if v and v.strip()} dept_set = {v.strip().lower() for v in (filter_departments or []) if v and v.strip()} country_set = {v.strip().lower() for v in (filter_countries or []) if v and v.strip()} + state_set = {v.strip().lower() for v in (filter_states or []) if v and v.strip()} # Nothing to filter - if not title_set and not dept_set and not country_set: + if not title_set and not dept_set and not country_set and not state_set: return list(records) filtered: List[dict] = [] @@ -449,6 +591,7 @@ def apply_tagpicker_filters( title_val = (record.get("title") or "").strip().lower() dept_val = (record.get("department") or "").strip().lower() country_val = (record.get("country") or "").strip().lower() + state_val = (record.get("state") or "").strip().lower() if title_set: matched = title_val in title_set @@ -471,6 +614,153 @@ def apply_tagpicker_filters( if filter_countries_mode == "exclude" and matched: continue + if state_set: + matched = state_val in state_set + if filter_states_mode == "include" and not matched: + continue + if filter_states_mode == "exclude" and matched: + continue + + filtered.append(record) + + return filtered + + + +# --------------------------------------------------------------------------- +# Dirty data detection +# --------------------------------------------------------------------------- + +_DIRTY_DATA_FIELDS = [ + ("name", "Name"), + ("title", "Title"), + ("department", "Department"), + ("email", "Email"), + ("phone", "Mobile Phone"), + ("businessPhone", 
"Business Phone"), + ("location", "Office Location"), + ("city", "City"), + ("state", "State / Province"), + ("country", "Country"), + ("usageLocation", "Usage Location"), +] + + +def _check_field_whitespace(value: object) -> Optional[str]: + """Return a human-readable issue string if *value* has whitespace problems.""" + if not isinstance(value, str) or not value: + return None + if value != value.strip(): + return "leading/trailing spaces" + if "  " in value: + return "consecutive spaces" + return None + + +def _check_email(value: object) -> list: + """Return a list of issue strings for email-specific problems.""" + if not isinstance(value, str) or not value: + return [] + domain = value.lower().rsplit("@", 1)[-1] + if domain == "onmicrosoft.com" or domain.endswith(".onmicrosoft.com"): + return [] + problems = [] + ws = _check_field_whitespace(value) + if ws: + problems.append(ws) + if value != value.lower(): + problems.append("uppercase letters") + return problems + + +def detect_dirty_data_records(employees: Iterable[dict]) -> List[dict]: + """Scan employee records for fields with whitespace data-quality issues. + + Returns records that have at least one issue, each augmented with an + ``issues`` list of ``{"field":