...is a prototype headless-browser orchestration server (and improper Latin for "seize the data").
I spent 5 months with a process-automation team. They ran an amazing piece of flowcharting software on a farm of VMs, to simulate mouse-clicks and keystrokes - and run processes automatically (eg data-entry into web-based business systems), without paying humans to perform them. This system was an at-once beguiling and stupefying conflation of numerous bizarre and anachronistic technologies: Win32-spying, remote DOM-inspection, VB6-style expression functions, .NET Remoting, and even the obscure "Visual J#"!
But it was precarious. And slow. Browsers would become detached during execution. It would completely & inexplicably freeze up during use. It couldn't effectively handle multiple browser-windows. XPath-mapped elements would become unfindable after UI-updates. Subtle differences in environment would cause certain unattended executions to fail, and it would be near-impossible to catch what went wrong. Data would be perilously plucked from inconsistently-formatted Excel spreadsheets, and run through fragile type-coercion. I wanted to do better.
→ This prototype proved that a 5-minute Blue Prism process could be executed in under 20 seconds with JavaScript.
Imagine: instead of 50 VMs; one server, and 50 headless browsers.
An easy-to-use UI with a library of pre-defined baps (browser-automated processes - eg entering business-data and scraping some output), and live-previewing of & interaction with the browser-pool. An HTTP API for triggering & scheduling bap executions on browsers from the pool. Robustly-implemented processes with watertight JavaScript, using Playwright to manipulate the DOM directly, instead of prodding at the UI from above. Execution traces capturing precise screenshots & DOM-state at each stage. Consistent, schema-validated process input- and output-data. (Oh, and you'd save ~£500,000 on Blue Prism licensing costs too.)
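As a rough illustration of what "schema-validated" input-data could mean, here is a minimal sketch. The schema format below (`BapSchema`, a map of field names to primitive types), the `validateAgainstSchema` helper, and the `searchQuery` field are all hypothetical stand-ins; the repository's real schema definitions may look quite different.

```typescript
// Hypothetical, deliberately tiny schema format: field name -> expected primitive type.
type FieldType = "string" | "number" | "boolean";
type BapSchema = Record<string, FieldType>;

// Returns a list of problems; an empty list means the data matches the schema.
function validateAgainstSchema(schema: BapSchema, data: Record<string, unknown>): string[] {
  const problems: string[] = [];
  // Every field named in the schema must be present and of the right type.
  for (const [field, expected] of Object.entries(schema)) {
    if (!(field in data)) {
      problems.push(`missing field "${field}"`);
    } else if (typeof data[field] !== expected) {
      problems.push(`field "${field}" should be ${expected}, got ${typeof data[field]}`);
    }
  }
  // Fields the schema does not know about are rejected too.
  for (const field of Object.keys(data)) {
    if (!(field in schema)) problems.push(`unexpected field "${field}"`);
  }
  return problems;
}

// Illustrative schema for a search-style bap; the field name is an assumption.
const googleSearchInputSchema: BapSchema = { searchQuery: "string" };
```

Rejecting bad input before a browser is ever touched is the point: a process that starts with well-formed data has one fewer way to fail halfway through.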
If only this one had come up in the A-level...
- For a process to run as robustly as possible, it needs to interface with the system at the lowest available layer; JavaScript enables direct manipulation of the DOM underlying the UI.
- To use Blue Prism most effectively, you need in-depth programming knowledge (ie Visual Basic, webpage structuring, and HTTP API mechanisms). But if you have this, why remain tied to Blue Prism? You could escape the sluggishness, precarity, and extortionate cost - in exchange for free, unfettered, democratised code.
- Suffice it to say, there exists a skills-gap between the disciplines of blue-prism-operation and playwright-scripting; a skills-gap likely to take some time to bridge in most working environments.
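To make the first point a little more concrete, here is a minimal sketch of a bap step that works on the page's elements by selector, rather than simulating clicks at screen coordinates. `PageLike`, `runSearchStep`, and the selectors are hypothetical names for illustration; `fill`, `click`, and `textContent` mirror methods that exist on Playwright's `Page`, so a real `Page` object could be passed in.

```typescript
// Hypothetical subset of Playwright's Page API; a real `Page` satisfies it structurally.
interface PageLike {
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
  textContent(selector: string): Promise<string | null>;
}

// Hypothetical bap step: type a query, submit it, and scrape one value.
// The selectors are illustrative assumptions, not taken from the repository.
async function runSearchStep(page: PageLike, query: string): Promise<string> {
  await page.fill("input[name=q]", query);
  await page.click("button[type=submit]");
  const text = await page.textContent("#result-stats");
  if (text === null) {
    // Fail loudly, so an unattended run leaves a clear error rather than bad data.
    throw new Error("result element not found");
  }
  return text.trim();
}
```

Depending only on the three methods it needs also means the step can be exercised against a stub object, without launching a browser at all.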
Amongst the most important code is...
- cd-server entrypoint: main.ts
- bap-execution logic: bap-execution.ts
- http API endpoints: api/index.ts
- this tool for playwright development: npmjs.com/playwright-live-interpreter
- download Node.js and a zip of this repository
- run `npm i` in `src/` and `cd-base/bap-library/`
- download chromium binaries, and put `chrome.exe` et cetera in `cd-base/chromium/bin/`
- then run, using... `cd src/carpe-datum-service/ && npx tsx main.ts`
- run trigger-bap-execution-demo.js to run the google-search-demo bap
- A cd-server runs the `carpe-datum-service`, which listens for bap-execution requests on an HTTP API.
- The server maintains a pool of headless chromium instances, which are `comandeer()`ed and `relinquish()`ed as required.
- The server has a bap-library (a folder of playwright-scripts and process-data schema definitions, for different browser-based processes).
- A client somewhere makes a `*start-new` execution POST request; this contains execution-parameters (eg whether to use a headed/headless browser) and process-input-data (eg the string to inject into the google-search box). The client can then make a `*wait-for-exit` long-polling request, to determine when the bap-execution has finished.
- On receipt of a `*start-new` execution request (eg `POST /api/baps/google-search-demo/executions/*start-new`), the server validates the input-data against the schema defined for the specified bap, and creates a new execution folder, with an `.execution-in-progress` flag file. A bap-execution-worker process is instantiated (eg `bap-execution-worker --cd-base-dir="..." --bap-name="google-search-demo" --execution-id="67f36fb23c9e" --target-browser-cdp-endpoint="http://localhost:9294"`), and the server vigilantly captures this child-process's stdout/err and exit-code.
- After execution, the execution endpoint (eg `GET /api/baps/google-search-demo/executions/67f36fb23c9e`) returns an object describing the execution-duration, -exit-reason, -error-state, and any process-output-data (eg a value scraped from the webpage).
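The server-side steps above could be sketched roughly as follows. This is a simplified, in-memory illustration and not the code in bap-execution.ts: `PooledBrowser`, `BrowserPool`, and `buildWorkerArgs` are hypothetical names, and only the `comandeer()`/`relinquish()` method names and the worker's flags come from the description above.

```typescript
// Hypothetical shape for one pooled browser, identified by its CDP endpoint.
interface PooledBrowser {
  cdpEndpoint: string;
  busy: boolean;
}

class BrowserPool {
  private readonly browsers: PooledBrowser[];

  constructor(cdpEndpoints: string[]) {
    this.browsers = cdpEndpoints.map((cdpEndpoint) => ({ cdpEndpoint, busy: false }));
  }

  // Spelling follows the `comandeer()` name used above.
  // Returns undefined when every browser in the pool is busy.
  comandeer(): PooledBrowser | undefined {
    const free = this.browsers.find((b) => !b.busy);
    if (free) free.busy = true;
    return free;
  }

  // Marks the browser as available for the next execution.
  relinquish(browser: PooledBrowser): void {
    browser.busy = false;
  }
}

// Builds the worker's argument list, in the style of the example command line.
function buildWorkerArgs(
  cdBaseDir: string,
  bapName: string,
  executionId: string,
  browser: PooledBrowser,
): string[] {
  return [
    `--cd-base-dir=${cdBaseDir}`,
    `--bap-name=${bapName}`,
    `--execution-id=${executionId}`,
    `--target-browser-cdp-endpoint=${browser.cdpEndpoint}`,
  ];
}
```

These arguments could then be handed to something like Node's `child_process.spawn`, with the browser relinquished once the worker process exits.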
In other words, this prototype provides an interface for a process's input- and output-data, which is completely abstracted from the nitty-gritty of the process's execution. You don't have to see the process - and it doesn't even have to run on your computer; as long as it's robustly implemented in JavaScript, it can heedfully process as much data as you fancy, without you touching it once.
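From the client's side, that interface might look something like the sketch below. Only the `*start-new` path and the execution endpoint come from the examples above; the request-body shape, the `executionId` response field, the `*wait-for-exit` URL layout, and the `runBap` and `FetchLike` names are assumptions made for illustration.

```typescript
// Minimal fetch-like signature, so the sketch can run against a stub or the global `fetch`.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function runBap(
  fetchFn: FetchLike,
  baseUrl: string,
  bapName: string,
  inputData: Record<string, unknown>,
): Promise<unknown> {
  const executionsUrl = `${baseUrl}/api/baps/${bapName}/executions`;

  // Start the execution. The body shape and `executionId` field are assumptions.
  const startRes = await fetchFn(`${executionsUrl}/*start-new`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ executionParameters: { headless: true }, inputData }),
  });
  if (!startRes.ok) throw new Error(`start-new failed with status ${startRes.status}`);
  const { executionId } = (await startRes.json()) as { executionId: string };

  // Long-poll until the worker exits. This URL layout is an assumption.
  const waitRes = await fetchFn(`${executionsUrl}/${executionId}/*wait-for-exit`);
  if (!waitRes.ok) throw new Error(`wait-for-exit failed with status ${waitRes.status}`);

  // Fetch the finished execution's summary (duration, exit-reason, output-data).
  const resultRes = await fetchFn(`${executionsUrl}/${executionId}`);
  if (!resultRes.ok) throw new Error(`result fetch failed with status ${resultRes.status}`);
  return resultRes.json();
}
```

The caller supplies input-data and gets output-data back; which browser ran the process, and on which machine, never enters into it.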
Do not watch Ben's javascript-in-rpa video. He is very embarrassed about it. Watch these marginally better ones instead.
Ben Mullan 2024