diff --git a/README.md b/README.md index 72ab50a..ec7ea2d 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,6 @@ High-performance rust CLI that connects to an existing Chrome browser via the De [![crates.io](https://img.shields.io/crates/v/chrome-devtools-cli.svg)](https://crates.io/crates/chrome-devtools-cli) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE) -[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/aeroxy/chrome-devtools-cli) ## Installation diff --git a/skill/chrome-devtools/CUSTOM_SCRIPTING.md b/skill/chrome-devtools/CUSTOM_SCRIPTING.md new file mode 100644 index 0000000..606e720 --- /dev/null +++ b/skill/chrome-devtools/CUSTOM_SCRIPTING.md @@ -0,0 +1,138 @@ +# Custom Scripting & Adapters Guide + +This guide details how to create and execute custom JavaScript scripts (`run-script`) and custom domain-aware adapters (`adapter`) using the Chrome DevTools CLI. + +--- + +## 1. Custom Scripts (`run-script`) + +`run-script` reads a local JavaScript file, wraps it inside an Immediately Invoked Function Expression (IIFE), and evaluates it directly inside the target browser's page context. + +### Flexible Argument Syntax +Dynamic arguments passed to the script can be specified in several styles and are automatically parsed and made available inside `ctx.args`. Note that raw positional values (styles 1 & 2 below) must come after a literal `--`, and any options like `--output`/`--track-navigation` must be given *before* it: + +1. **Named Style via `-a`/`--arg` (Recommended):** + Pass one or more `key=value` pairs with the repeatable `-a`/`--arg` flag. This form doesn't need a `--` separator: + ```bash + chrome-devtools run-script search_hn.js -a query="Rust" + ``` +2. **Pure Positional Style:** + Append raw positional strings after `--`. A single trailing positional argument is automatically mapped to `ctx.args.query` (as well as `ctx.args._0`): + ```bash + chrome-devtools run-script search_hn.js -- "Rust" + ``` +3. **Hybrid Style (Positional + Named, after `--`):** + ```bash + chrome-devtools run-script search_hn.js -- "Rust" limit=10 safeSearch=true + ``` + +### Comment-based Auto-Navigation +By declaring a standard `// @url ` or `// @navigate ` comment marker at the top of your script file, the CLI will check the active tab's current URL before executing your script. + +If the active tab is not currently on a domain matching the target URL, **the CLI will automatically navigate the tab to the target URL first**, wait for the page to load, and then execute your script. You can use `{arg_name}` placeholders inside the `@url` template to interpolate CLI arguments dynamically: +```javascript +// @url https://hn.algolia.com/?query={query} +``` + +--- + +## 2. Custom Domain-Aware Adapters (`adapter`) + +`adapter` reads a local custom JS adapter file, parses the target `@domain` JSDoc markers, and ensures the browser is on a matching domain before invoking a specific named function inside the script. + +### Domain Protection and Auto-Navigation +By declaring standard `@domain` markers at the top of your adapter file, the CLI checks the active page URL before executing your function. If the active tab is not on the target domain, **it automatically navigates the tab to the first target domain**, waits for it to load, and then runs your adapter. + +```javascript +// ==UserAdapter== +// @name Hacker News Search Adapter +// @domain hn.algolia.com +// ==/UserAdapter== +``` + +--- + +## 3. Injected Helper Context (`ctx`) + +Both `run-script` and `adapter` functions are passed an injected `ctx` context containing standard helper utilities: + +* `ctx.args`: Object containing typed key-value arguments. +* `ctx.wait(ms)`: Sleep/delay utility (`await ctx.wait(1000)`). +* `ctx.waitForText(text, timeout_ms)`: Polls the page body text until the string is present (defaults to 30s). +* `ctx.waitForSelector(selector, timeout_ms)`: Polls until an element matching the CSS selector exists in the DOM. +* `ctx.click(selector)`: DOM clicking helper. +* `ctx.fill(selector, value)`: DOM value input helper. Highly compatible with stateful frameworks (like React, Vue, and Angular) as it overrides standard value setters and fires appropriate events. + +--- + +## 4. Real-World SPA Example (Hacker News Search) + +These real-world examples work on `hn.algolia.com`. + +### Script file (`skill/chrome-devtools/examples/search_hn.js`) +```javascript +// @url https://hn.algolia.com/?query={query} + +// search_hn.js +// Run with: chrome-devtools run-script skill/chrome-devtools/examples/search_hn.js -a query="Rust" +// +// run-script injects `ctx` and runs this file inside an async context. +// Setting `@url` above tells the CLI to automatically navigate to the pre-rendered query URL first! + +const query = ctx.args.query; +if (!query) { + throw new Error("Query argument is required. Pass it with '-a query=...'"); +} + +// Wait for results to update/load +await ctx.waitForSelector("article.Story", 10000); + +// Extract results +const results = Array.from(document.querySelectorAll("article.Story")).map(el => { + const titleEl = el.querySelector(".Story_title a"); + const metaEl = el.querySelector(".Story_meta"); + return { + title: titleEl?.innerText.trim() || "", + meta: metaEl?.innerText.trim() || "", + url: titleEl?.href || "" + }; +}); + +return results; +``` + +### Adapter file (`skill/chrome-devtools/examples/hn_adapter.js`) +```javascript +// ==UserAdapter== +// @name Hacker News Search Adapter +// @domain hn.algolia.com +// ==/UserAdapter== + +// Run with: chrome-devtools adapter skill/chrome-devtools/examples/hn_adapter.js search -a query="Rust" + +async function search(ctx) { + const query = ctx.args.query; + if (!query) throw new Error("query argument is required"); + + // Fill search input (the SPA will fetch and render results dynamically) + await ctx.fill("input.SearchInput", query); + + // Wait a brief moment for React and the network request to resolve and update the DOM + await ctx.wait(1500); + + // Wait for results to update/load + await ctx.waitForSelector("article.Story", 10000); + + const results = Array.from(document.querySelectorAll("article.Story")).map(el => { + const titleEl = el.querySelector(".Story_title a"); + const metaEl = el.querySelector(".Story_meta"); + return { + title: titleEl?.innerText.trim() || "", + meta: metaEl?.innerText.trim() || "", + url: titleEl?.href || "" + }; + }); + + return results; +} +``` diff --git a/skill/chrome-devtools/SKILL.md b/skill/chrome-devtools/SKILL.md index c6f9b49..3a9335f 100644 --- a/skill/chrome-devtools/SKILL.md +++ b/skill/chrome-devtools/SKILL.md @@ -56,7 +56,7 @@ chrome-devtools --page 0 navigate https://example.com - **Navigation**: `navigate`, `navigate --back`, `navigate --forward`, `navigate --reload` - **Page management**: `list-pages`, `new-page`, `close-page`, `select-page` -- **Extraction**: `screenshot`, `snapshot` (accessibility tree), `evaluate` (JavaScript), `read-page` (page content as markdown) +- **Extraction**: `screenshot`, `snapshot` (accessibility tree), `evaluate` (JavaScript), `read-page` (page content as markdown), `run-script` (run local JS file), `adapter` (run site adapter) - **Interaction**: `click`, `fill`, `type-text`, `press-key`, `hover`, `click-at` - **Emulation**: `emulate` (viewport, mobile, geolocation, URL blocking) - **Inspection**: `console` (logs), `network` (requests), `sw-logs` (extension service workers) @@ -311,6 +311,28 @@ chrome-devtools --target warm-squid read-page --json - `read-page` — you want the page's textual content as readable markdown (articles, docs, wiki pages). Best for summarization, extraction, or feeding content to an LLM. - `snapshot` — you need the full accessibility tree with element IDs, roles, and interactive elements. Best for understanding page structure and finding elements to click/fill. +### Pattern 13: Local JS Scripting (run-script) + +Evaluate a local JavaScript file inside the page context. Dynamic arguments can be passed as raw positional values at the end of the command or via `-a/--arg` keys, and are automatically typed and injected into the execution context as `ctx.args`. Supports comment-based `@url` auto-navigation. + +See the dedicated [Custom Scripting Guide](./CUSTOM_SCRIPTING.md) for full documentation on script creation, argument parsing, and auto-navigation. + +```bash +# Run a script with trailing positional arguments (auto-navigates if @url is present) +chrome-devtools --target warm-squid run-script skill/chrome-devtools/examples/search_hn.js -- "Rust" +``` + +### Pattern 14: Custom Domain-Aware Adapters (adapter) + +Run site-specific adapter actions. If the browser is not currently on a matching domain (as defined by `@domain` comments in the JSDoc header), the CLI auto-navigates to that domain first. + +See the dedicated [Custom Scripting Guide](./CUSTOM_SCRIPTING.md) for full documentation on custom adapters, domain protection, and argument parsing. + +```bash +# Run an adapter function with positional args (auto-navigates if target domain is mismatch) +chrome-devtools --target warm-squid adapter skill/chrome-devtools/examples/hn_adapter.js search -- "Rust" +``` + ## Complete Command Reference ### Navigation @@ -366,6 +388,12 @@ chrome-devtools --target list-3p-tools chrome-devtools --target execute-3p-tool '' ``` +### Custom Scripting & Adapters +```bash +chrome-devtools --target run-script [--arg key=value] [--output ] [--track-navigation] +chrome-devtools --target adapter [--arg key=value] [--output ] [--track-navigation] +``` + ### Daemon ```bash chrome-devtools kill-daemon # stop the background daemon process diff --git a/skill/chrome-devtools/examples/hn_adapter.js b/skill/chrome-devtools/examples/hn_adapter.js new file mode 100644 index 0000000..8c05a62 --- /dev/null +++ b/skill/chrome-devtools/examples/hn_adapter.js @@ -0,0 +1,32 @@ +// ==UserAdapter== +// @name Hacker News Search Adapter +// @domain hn.algolia.com +// ==/UserAdapter== + +// Run with: chrome-devtools adapter skill/chrome-devtools/examples/hn_adapter.js search -a query="Rust" + +async function search(ctx) { + const query = ctx.args.query; + if (!query) throw new Error("query argument is required"); + + // Fill search input (the SPA will fetch and render results dynamically) + await ctx.fill("input.SearchInput", query); + + // Wait a brief moment for React and the network request to resolve and update the DOM + await ctx.wait(1500); + + // Wait for results to update/load + await ctx.waitForSelector("article.Story", 10000); + + const results = Array.from(document.querySelectorAll("article.Story")).map(el => { + const titleEl = el.querySelector(".Story_title a"); + const metaEl = el.querySelector(".Story_meta"); + return { + title: titleEl?.innerText.trim() || "", + meta: metaEl?.innerText.trim() || "", + url: titleEl?.href || "" + }; + }); + + return results; +} diff --git a/skill/chrome-devtools/examples/search_hn.js b/skill/chrome-devtools/examples/search_hn.js new file mode 100644 index 0000000..7cb11fc --- /dev/null +++ b/skill/chrome-devtools/examples/search_hn.js @@ -0,0 +1,28 @@ +// @url https://hn.algolia.com/?query={query} + +// search_hn.js +// Run with: chrome-devtools run-script skill/chrome-devtools/examples/search_hn.js -a query="Rust" +// +// run-script injects `ctx` and runs this file inside an async context. +// Setting `@url` above tells the CLI to automatically navigate to the pre-rendered query URL first! + +const query = ctx.args.query; +if (!query) { + throw new Error("Query argument is required. Pass it with '-a query=...'"); +} + +// Wait for results to update/load +await ctx.waitForSelector("article.Story", 10000); + +// Extract results +const results = Array.from(document.querySelectorAll("article.Story")).map(el => { + const titleEl = el.querySelector(".Story_title a"); + const metaEl = el.querySelector(".Story_meta"); + return { + title: titleEl?.innerText.trim() || "", + meta: metaEl?.innerText.trim() || "", + url: titleEl?.href || "" + }; +}); + +return results; diff --git a/src/commands/evaluate.rs b/src/commands/evaluate.rs index cd719ef..d8ce78c 100644 --- a/src/commands/evaluate.rs +++ b/src/commands/evaluate.rs @@ -2,6 +2,7 @@ use anyhow::Result; use serde_json::json; use crate::cdp::CdpClient; +use crate::constants::POLL_INTERVAL_MS; use crate::format::{format_structured, OutputFormat}; use crate::result::CommandResult; @@ -43,16 +44,14 @@ pub async fn evaluate( let desc = exception["exception"]["description"] .as_str() .unwrap_or(text); - anyhow::bail!( - "{desc}\n\n[HINT: To explore the page DOM, use the `snapshot` command instead of `evaluate`. To interact with elements, use `click` or `fill`.]" - ); + anyhow::bail!("{desc}"); } let value = &result["result"]; let val_type = value["type"].as_str().unwrap_or("undefined"); let output_hint = if format.is_text() { - let mut text = match val_type { + let text = match val_type { "undefined" => "undefined".to_string(), "string" => value["value"].as_str().unwrap_or("").to_string(), _ => { @@ -64,13 +63,6 @@ pub async fn evaluate( } }; - if expression.contains("querySelector") - || expression.contains("document.body") - || expression.contains("getElementById") - || expression.contains("getElementsBy") - { - text.push_str("\n\n[HINT: Avoid using `evaluate` for DOM traversal. Use the `snapshot` command to get a clean accessibility tree of the page, then use `click` or `fill`.]"); - } text } else { let v = value.get("value").unwrap_or(value); @@ -91,3 +83,760 @@ pub async fn evaluate( .await?) } } + +/// Build the injected `ctx` automation-helper object shared by `run-script` and +/// `adapter`. +/// +/// The returned snippet declares `const ctx = {...}` and is meant to be embedded +/// at the top of an async IIFE, before user code runs. Both call sites reuse it +/// so the helper surface stays in lockstep. +fn build_ctx_object(args_str: &str) -> String { + format!( + r#"const ctx = {{ + args: {args_str}, + wait: async (ms) => new Promise(r => setTimeout(r, ms)), + waitForText: async (text, timeout = 30000) => {{ + const start = Date.now(); + while (true) {{ + if (document.body && document.body.innerText.includes(text)) return; + if (timeout > 0 && Date.now() - start >= timeout) break; + await new Promise(r => setTimeout(r, {POLL_INTERVAL_MS})); + }} + throw new Error("Timeout waiting for text: " + text); + }}, + waitForSelector: async (selector, timeout = 30000) => {{ + const start = Date.now(); + while (true) {{ + if (document.querySelector(selector)) return; + if (timeout > 0 && Date.now() - start >= timeout) break; + await new Promise(r => setTimeout(r, {POLL_INTERVAL_MS})); + }} + throw new Error("Timeout waiting for selector: " + selector); + }}, + click: async (selector) => {{ + const el = document.querySelector(selector); + if (!el) throw new Error("Element not found: " + selector); + el.click(); + }}, + fill: async (selector, value) => {{ + const el = document.querySelector(selector); + if (!el) throw new Error("Element not found: " + selector); + // Look up the native property setter from the element's own + // prototype chain (rather than assigning directly) so that + // React/Vue-style frameworks, which override the setter to + // track state, still observe the update. + const setNativeProp = (element, prop, val) => {{ + let setter; + let proto = Object.getPrototypeOf(element); + while (proto) {{ + const desc = Object.getOwnPropertyDescriptor(proto, prop); + if (desc) {{ + setter = desc.set; + break; + }} + proto = Object.getPrototypeOf(proto); + }} + if (setter) {{ + setter.call(element, val); + }} else {{ + element[prop] = val; + }} + }}; + if (el.type === 'checkbox' || el.type === 'radio') {{ + // Checkboxes/radios toggle via `checked`, not `value`. A + // boolean (or "true"/"false") sets the state directly; any + // other value selects the input whose `value` it matches. + let checkedVal; + if (value === true || value === false) {{ + checkedVal = value; + }} else if (value === 'true' || value === 'false') {{ + checkedVal = value === 'true'; + }} else {{ + checkedVal = String(value) === el.value; + }} + setNativeProp(el, 'checked', checkedVal); + }} else if (el.isContentEditable) {{ + el.innerText = value; + }} else {{ + setNativeProp(el, 'value', value); + }} + el.dispatchEvent(new Event('input', {{ bubbles: true }})); + el.dispatchEvent(new Event('change', {{ bubbles: true }})); + }} + }};"# + ) +} + +fn url_encode(input: &str) -> String { + let mut encoded = String::new(); + for b in input.bytes() { + match b { + b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => { + encoded.push(b as char); + } + _ => { + encoded.push_str(&format!("%{:02X}", b)); + } + } + } + encoded +} + +/// Extract the `@url` / `@navigate` auto-navigation target from a script's +/// leading comment block, if present. +/// +/// Only lines at the very top of the file that are comments (`//`, `/*`, or a +/// `*` JSDoc continuation line) are considered; scanning stops at the first +/// blank-then-non-comment line. A trailing `*/` on single-line block comments +/// (e.g. `/* @url https://example.com */`) is stripped so it isn't captured +/// as part of the URL. +fn parse_script_url_marker(content: &str) -> Option { + for line in content.lines() { + let trimmed = line.trim(); + if trimmed.is_empty() { + continue; + } + let comment = trimmed + .strip_prefix("//") + .or_else(|| trimmed.strip_prefix("/*")) + .or_else(|| trimmed.strip_prefix('*'))?; + + let mut comment = comment.trim(); + if let Some(stripped) = comment.strip_suffix("*/") { + comment = stripped.trim(); + } + + if let Some(rest) = comment.strip_prefix("@url") { + return Some(rest.trim().to_string()); + } + if let Some(rest) = comment.strip_prefix("@navigate") { + return Some(rest.trim().to_string()); + } + } + None +} + +/// Run a local JavaScript file inside the page context +pub async fn run_script( + client: &mut CdpClient, + session_id: &str, + file_path: &str, + script_args: &serde_json::Value, + format: OutputFormat, + output: Option<&str>, + track_navigation: bool, +) -> Result { + let script_content = tokio::fs::read_to_string(file_path) + .await + .map_err(|e| anyhow::anyhow!("Failed to read script file '{}': {}", file_path, e))?; + + // Perform auto-navigation if @url or @navigate comments exist at the top of the file + let target_url = parse_script_url_marker(&script_content); + + if let Some(ref url) = target_url { + // Interpolate {arg_name} placeholders from script_args + let mut interpolated_url = url.clone(); + if let Some(obj) = script_args.as_object() { + for (key, val) in obj { + let placeholder = format!("{{{}}}", key); + let val_str = match val { + serde_json::Value::String(s) => s.clone(), + other => other.to_string(), + }; + let encoded_val = url_encode(&val_str); + interpolated_url = interpolated_url.replace(&placeholder, &encoded_val); + } + } + + let nav_url = if interpolated_url.starts_with("http://") || interpolated_url.starts_with("https://") { + interpolated_url.clone() + } else if is_local_host(&interpolated_url) { + format!("http://{}", interpolated_url) + } else { + format!("https://{}", interpolated_url) + }; + + let current_url = client.current_url(session_id).await?; + if current_url.trim_end_matches('/') != nav_url.trim_end_matches('/') { + eprintln!("[script] Current URL '{}' does not match target URL '{}'. Auto-navigating...", current_url, nav_url); + + crate::commands::navigate::navigate( + client, + session_id, + Some(&nav_url), + false, + false, + false, + None, + None, + ) + .await?; + + let post_nav_url = client.current_url(session_id).await?; + if post_nav_url.trim_end_matches('/') != nav_url.trim_end_matches('/') { + // Not a hard failure: sites commonly redirect (www., trailing + // slashes, locale/auth redirects, SPA router normalization), + // and `navigate()` already surfaces real navigation failures + // (CDP errors, load timeouts) before we get here. + eprintln!( + "[script] Warning: auto-navigation to '{}' resulted in URL '{}'. Continuing anyway...", + nav_url, post_nav_url + ); + } + } + } + + let args_str = serde_json::to_string(script_args)?; + let ctx = build_ctx_object(&args_str); + + let iife = format!( + r#"(async () => {{ + {ctx} + + {script_content} + }})()"# + ); + + evaluate(client, session_id, &iife, format, output, track_navigation).await +} + +/// Extract `@domain` JSDoc comments from a script. +/// +/// Only genuine metadata comment lines (`// @domain ...` or the `* @domain ...` +/// JSDoc continuation form) are honored. Matching a bare `@domain` substring +/// would otherwise pick up the marker from string literals or prose elsewhere +/// in the adapter source. +fn parse_adapter_domains(content: &str) -> Vec { + let mut domains = Vec::new(); + for line in content.lines() { + let trimmed = line.trim(); + if trimmed.is_empty() { + continue; + } + let comment = match trimmed + .strip_prefix("//") + .or_else(|| trimmed.strip_prefix("/*")) + .or_else(|| trimmed.strip_prefix('*')) + { + Some(rest) => rest.trim_start(), + None => { + // Leading ES module imports and "use strict" directives are + // common before the metadata comment block and shouldn't stop + // the scan; any other code still ends it (see + // test_parse_adapter_domains_block_comments_and_early_break). + if trimmed.starts_with("import ") || trimmed.starts_with("import(") { + continue; + } + if matches!(trimmed, "\"use strict\";" | "'use strict';" | "\"use strict\"" | "'use strict'") { + continue; + } + break; + } + }; + + let mut comment = comment; + if let Some(stripped) = comment.strip_suffix("*/") { + comment = stripped.trim(); + } + + if let Some(rest) = comment.strip_prefix("@domain") { + if rest.is_empty() || rest.starts_with(char::is_whitespace) { + let domain = rest.split_whitespace().next().unwrap_or(""); + if !domain.is_empty() { + domains.push(domain.to_string()); + } + } + } + } + domains +} + +/// Strip scheme, path, and port from a raw URL/host string, returning the bare +/// lowercased hostname. +fn normalize_host(raw: &str) -> String { + let lower = raw.trim().to_lowercase(); + let without_scheme = lower + .strip_prefix("https://") + .or_else(|| lower.strip_prefix("http://")) + .unwrap_or(&lower); + let host = without_scheme.split('/').next().unwrap_or(without_scheme); + let host = host.split('@').last().unwrap_or(host); + let host = if host.starts_with('[') { + if let Some(idx) = host.rfind(']') { + &host[..=idx] + } else { + host + } + } else if host.matches(':').count() > 1 { + host + } else { + host.split(':').next().unwrap_or(host) + }; + host.to_string() +} + +/// Detect loopback / local-dev hosts that should default to plain HTTP during +/// auto-navigation, since they typically don't serve HTTPS. +fn is_local_host(domain: &str) -> bool { + let host = normalize_host(domain); + host == "localhost" + || host == "127.0.0.1" + || host == "0.0.0.0" + || host == "[::1]" + || host == "::1" + || host.ends_with(".localhost") +} + +/// Check if a URL matches a domain pattern. +/// +/// Both sides are normalized to a bare hostname first, so an adapter `@domain` +/// written as `https://example.com` or `example.com/path` still matches the +/// page host instead of forcing a spurious auto-navigation. +fn url_matches_domain(url: &str, domain: &str) -> bool { + let host = normalize_host(url); + let domain = normalize_host(domain); + if domain.is_empty() { + return false; + } + + host == domain || host.ends_with(&format!(".{}", domain)) +} + +/// True for characters allowed inside a JavaScript identifier (after the first). +fn is_js_ident_char(c: char) -> bool { + c.is_ascii_alphanumeric() || c == '_' || c == '$' +} + +/// Validate that `name` is a plain JavaScript identifier. +/// +/// The adapter function name is interpolated directly into the injected IIFE, so +/// rejecting anything that isn't an identifier prevents both syntax errors and +/// code injection through a crafted `function_name`. +fn is_valid_js_identifier(name: &str) -> bool { + let mut chars = name.chars(); + match chars.next() { + // A leading digit (or any non-identifier-start char) is invalid. + Some(c) if c.is_ascii_alphabetic() || c == '_' || c == '$' => {} + _ => return false, + } + chars.all(is_js_ident_char) +} + +/// Normalize ES-module `export` keywords out of adapter source. +/// +/// Adapters are injected as statements into an async IIFE, where a top-level +/// `export` is a SyntaxError. The supported adapter format is plain function +/// declarations; this strips a leading `export` / `export default` so the common +/// authoring habit parses instead of failing before the function-existence check. +/// +/// The prefix is only stripped when it directly precedes a declaration keyword. +/// A bare same-line `export { ... };` list or `export default ;` +/// re-exports an already-declared binding rather than declaring anything new, +/// so the whole line is dropped instead. This avoids corrupting `export *` / +/// `export { ... } from '...'` re-export blocks (which this tool can't resolve +/// anyway) or stray `export` text inside multi-line strings/comments. +fn strip_export_keywords(content: &str) -> String { + const DECL_KEYWORDS: [&str; 6] = ["function", "async", "class", "const", "let", "var"]; + let declaration_follows = |rest: &str| { + let rest = rest.trim_start(); + DECL_KEYWORDS.iter().any(|kw| match rest.strip_prefix(kw) { + // The keyword must end at a non-identifier boundary so `constant` + // is not mistaken for `const`. + Some(after) => match after.chars().next() { + Some(c) => !is_js_ident_char(c), + None => true, + }, + None => false, + }) + }; + + // A same-line `export { a, b as c };` list or `export default ident;` + // only re-exports bindings that are already declared elsewhere in the + // file, so it's safe to drop entirely rather than rewrite. + let is_bare_reexport = |trimmed: &str| -> bool { + if let Some(rest) = trimmed.strip_prefix("export {") { + let rest = rest.trim_end(); + return rest.ends_with('}') || rest.ends_with("};"); + } + if let Some(rest) = trimmed.strip_prefix("export default ") { + let ident = rest.trim_end().trim_end_matches(';').trim(); + return is_valid_js_identifier(ident); + } + false + }; + + content + .lines() + .map(|line| { + let trimmed = line.trim_start(); + let indent = &line[..line.len() - trimmed.len()]; + if is_bare_reexport(trimmed) { + return String::new(); + } + if let Some(rest) = trimmed.strip_prefix("export default ") { + if declaration_follows(rest) { + return format!("{indent}{rest}"); + } + } else if let Some(rest) = trimmed.strip_prefix("export ") { + if declaration_follows(rest) { + return format!("{indent}{rest}"); + } + } + line.to_string() + }) + .collect::>() + .join("\n") +} + +/// Run a structured custom adapter function inside the page context +pub async fn run_adapter( + client: &mut CdpClient, + session_id: &str, + file_path: &str, + function_name: &str, + script_args: &serde_json::Value, + format: OutputFormat, + output: Option<&str>, + track_navigation: bool, +) -> Result { + // `function_name` is interpolated straight into the injected IIFE, so reject + // anything that isn't a plain identifier before touching Chrome or the disk. + if !is_valid_js_identifier(function_name) { + anyhow::bail!( + "Invalid adapter function name '{}': must be a valid JavaScript identifier", + function_name + ); + } + + let script_content = tokio::fs::read_to_string(file_path) + .await + .map_err(|e| anyhow::anyhow!("Failed to read adapter file '{}': {}", file_path, e))?; + + // Perform domain protection + let domains = parse_adapter_domains(&script_content); + if !domains.is_empty() { + let current_url = client.current_url(session_id).await?; + let matched = domains.iter().any(|domain| url_matches_domain(¤t_url, domain)); + + if !matched { + let target_domain = &domains[0]; + // Preserve the host exactly as declared in `@domain`; only supply a + // scheme when one is missing. Forcing a `www.` subdomain breaks apex + // hosts and adapters that target an existing subdomain + // (e.g. `creator.xiaohongshu.com`). + let target_url = if target_domain.starts_with("http://") || target_domain.starts_with("https://") { + // An explicit scheme always wins, so authors can force http/https + // by writing it in `@domain` (e.g. `@domain http://localhost:3000`). + target_domain.clone() + } else if is_local_host(target_domain) { + // Local dev servers generally speak http, not https. + format!("http://{}", target_domain) + } else { + format!("https://{}", target_domain) + }; + eprintln!("[adapter] Current URL '{}' does not match adapter domains {:?}. Auto-navigating to '{}'...", current_url, domains, target_url); + + crate::commands::navigate::navigate( + client, + session_id, + Some(&target_url), + false, + false, + false, + None, + None, + ) + .await?; + + let post_nav_url = client.current_url(session_id).await?; + let post_matched = domains.iter().any(|domain| url_matches_domain(&post_nav_url, domain)); + if !post_matched { + anyhow::bail!( + "Auto-navigation to '{}' resulted in URL '{}' which does not match adapter domains {:?}", + target_url, + post_nav_url, + domains + ); + } + } + } + + // Normalize away `export` so module-style adapter declarations parse when + // injected as statements below. Domain parsing above used the raw source, + // which is unaffected (domains live in comments). + let script_content = strip_export_keywords(&script_content); + + let args_str = serde_json::to_string(script_args)?; + let ctx = build_ctx_object(&args_str); + + let iife = format!( + r#"(async () => {{ + {ctx} + + {script_content} + + if (typeof {function_name} !== 'function') {{ + throw new Error("Function '{function_name}' not found in adapter"); + }} + return await {function_name}(ctx); + }})()"# + ); + + evaluate(client, session_id, &iife, format, output, track_navigation).await +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_parse_adapter_domains() { + let content = r#" + // ==UserAdapter== + // @name Xiaohongshu Custom Adapter + // @domain xiaohongshu.com + // @domain creator.xiaohongshu.com + // ==/UserAdapter== + "#; + let domains = parse_adapter_domains(content); + assert_eq!(domains, vec!["xiaohongshu.com", "creator.xiaohongshu.com"]); + } + + #[test] + fn test_parse_adapter_domains_jsdoc_block() { + // The `* @domain` JSDoc continuation form is also honored. + let content = "/**\n * @domain example.com\n */"; + assert_eq!(parse_adapter_domains(content), vec!["example.com"]); + } + + #[test] + fn test_parse_adapter_domains_ignores_non_metadata() { + // Only genuine comment metadata lines count: string literals, prose, and + // tokens like `@domainname` must not be picked up. + let content = r#" + // @domain real.com + const note = "send mail to user@domain.com"; + // contact foo@domain.org for help + // @domainname not-a-real-marker.com + const x = "@domain inside-string.com"; + "#; + assert_eq!(parse_adapter_domains(content), vec!["real.com"]); + } + + #[test] + fn test_parse_adapter_domains_block_comments_and_early_break() { + // Block comment on a single line + let content = "/* @domain block-one.com */\nconst x = 1;\n// @domain ignored.com"; + assert_eq!(parse_adapter_domains(content), vec!["block-one.com"]); + + // Multi-line block comment with early break on code + let content = "/*\n * @domain block-two.com\n */\n\nconst y = 2;\n// @domain ignored2.com"; + assert_eq!(parse_adapter_domains(content), vec!["block-two.com"]); + } + + #[test] + fn test_parse_adapter_domains_skips_leading_imports_and_use_strict() { + // Leading ES imports / "use strict" directives shouldn't stop the scan + // for the metadata comment block that follows them. + let content = "import { foo } from 'bar';\n\"use strict\";\n// @domain example.com\nconst x = 1;\n// @domain ignored.com"; + assert_eq!(parse_adapter_domains(content), vec!["example.com"]); + } + + #[test] + fn test_parse_script_url_marker_line_comment() { + let content = "// @url https://example.com\nconst x = 1;"; + assert_eq!( + parse_script_url_marker(content), + Some("https://example.com".to_string()) + ); + } + + #[test] + fn test_parse_script_url_marker_navigate_alias() { + let content = "// @navigate https://example.com\nconst x = 1;"; + assert_eq!( + parse_script_url_marker(content), + Some("https://example.com".to_string()) + ); + } + + #[test] + fn test_parse_script_url_marker_single_line_block_comment() { + // A single-line block comment's trailing `*/` must not be captured as + // part of the URL. + let content = "/* @url https://example.com */\nconst x = 1;"; + assert_eq!( + parse_script_url_marker(content), + Some("https://example.com".to_string()) + ); + } + + #[test] + fn test_parse_script_url_marker_jsdoc_block() { + let content = "/**\n * @url https://example.com\n */\nconst x = 1;"; + assert_eq!( + parse_script_url_marker(content), + Some("https://example.com".to_string()) + ); + } + + #[test] + fn test_parse_script_url_marker_skips_leading_blank_lines() { + let content = "\n\n \n// @url https://example.com\nconst x = 1;"; + assert_eq!( + parse_script_url_marker(content), + Some("https://example.com".to_string()) + ); + } + + #[test] + fn test_parse_script_url_marker_stops_at_first_non_comment_line() { + // The marker only counts if it's part of the leading comment block; + // once code starts, scanning stops even if a later comment has one. + let content = "const x = 1;\n// @url https://example.com"; + assert_eq!(parse_script_url_marker(content), None); + } + + #[test] + fn test_parse_script_url_marker_absent() { + let content = "// just a regular comment\nconst x = 1;"; + assert_eq!(parse_script_url_marker(content), None); + } + + #[test] + fn test_strip_export_keywords() { + let src = "export async function ask(ctx) {}\n export function read() {}\nexport const helper = 1;\nexport default function main() {}\nconst x = \"export inside string\";"; + let out = strip_export_keywords(src); + assert_eq!( + out, + "async function ask(ctx) {}\n function read() {}\nconst helper = 1;\nfunction main() {}\nconst x = \"export inside string\";" + ); + } + + #[test] + fn test_strip_export_keywords_preserves_non_declarations() { + // `export * from` re-exports (which this tool can't resolve) and + // prose that merely starts with the word must be left untouched + // (only declarations and bare re-exports are handled). + let src = "export * from './x';\nexport const ok = 1;\nexport constants = 2;"; + let out = strip_export_keywords(src); + assert_eq!(out, "export * from './x';\nconst ok = 1;\nexport constants = 2;"); + } + + #[test] + fn test_strip_export_keywords_drops_bare_reexports() { + // `export { ... };` lists and `export default ;` only + // re-export an already-declared binding, so the whole line can be + // dropped rather than left behind as invalid top-level `export` syntax. + let src = "async function search(ctx) {}\nexport { search };\nexport default search;"; + let out = strip_export_keywords(src); + assert_eq!(out, "async function search(ctx) {}\n\n"); + } + + #[test] + fn test_is_valid_js_identifier() { + assert!(is_valid_js_identifier("ask")); + assert!(is_valid_js_identifier("_private")); + assert!(is_valid_js_identifier("$dollar")); + assert!(is_valid_js_identifier("readWiki2")); + assert!(!is_valid_js_identifier("")); + assert!(!is_valid_js_identifier("2fast")); + assert!(!is_valid_js_identifier("foo.bar")); + assert!(!is_valid_js_identifier("foo(); evil")); + assert!(!is_valid_js_identifier("foo bar")); + } + + #[test] + fn test_normalize_host() { + assert_eq!(normalize_host("http://user:pass@example.com/some/path"), "example.com"); + assert_eq!(normalize_host("http://user:pass@[::1]:8080"), "[::1]"); + assert_eq!(normalize_host("http://user:pass@127.0.0.1:8080"), "127.0.0.1"); + assert_eq!(normalize_host("https://foo:bar@localhost"), "localhost"); + assert_eq!(normalize_host("example.com"), "example.com"); + assert_eq!(normalize_host("http://example.com:3000/"), "example.com"); + } + + #[test] + fn test_is_local_host() { + assert!(is_local_host("localhost")); + assert!(is_local_host("localhost:3000")); + assert!(is_local_host("127.0.0.1:8080")); + assert!(is_local_host("[::1]")); + assert!(is_local_host("[::1]:8080")); + assert!(is_local_host("::1")); + assert!(is_local_host("app.localhost")); + assert!(is_local_host("http://localhost:5173/path")); + assert!(!is_local_host("example.com")); + assert!(!is_local_host("notlocalhost.com")); + } + + #[test] + fn test_url_matches_domain() { + assert!(url_matches_domain("https://www.xiaohongshu.com/explore", "xiaohongshu.com")); + assert!(url_matches_domain("http://creator.xiaohongshu.com", "creator.xiaohongshu.com")); + assert!(url_matches_domain("https://xiaohongshu.com:8080/path", "xiaohongshu.com")); + assert!(url_matches_domain("http://[::1]:3000", "[::1]")); + assert!(!url_matches_domain("https://google.com", "xiaohongshu.com")); + } + + #[test] + fn test_url_matches_domain_normalizes_domain() { + // `@domain` written with a scheme and/or path still matches the host. + assert!(url_matches_domain("https://www.example.com/page", "https://example.com")); + assert!(url_matches_domain("https://example.com/explore", "example.com/path")); + assert!(url_matches_domain("https://example.com", "http://example.com:443/")); + assert!(!url_matches_domain("https://example.com", "")); + } + + #[test] + fn test_url_encode() { + assert_eq!(url_encode("hello world"), "hello%20world"); + assert_eq!(url_encode("foo+bar"), "foo%2Bbar"); + assert_eq!(url_encode("a-z_A-Z_0-9"), "a-z_A-Z_0-9"); + } + + #[test] + fn test_build_ctx_object_embeds_args_and_helpers() { + let ctx = build_ctx_object(r#"{"query":"hi"}"#); + assert!(ctx.starts_with("const ctx = {")); + assert!(ctx.contains(r#"args: {"query":"hi"}"#)); + for helper in ["wait:", "waitForText:", "waitForSelector:", "click:", "fill:"] { + assert!(ctx.contains(helper), "missing helper: {helper}"); + } + // fill must special-case checkable inputs instead of setting `value`. + assert!(ctx.contains("el.type === 'checkbox' || el.type === 'radio'")); + assert!(ctx.contains("setNativeProp(el, 'checked'")); + // fill must support contenteditable elements. + assert!(ctx.contains("el.isContentEditable")); + assert!(ctx.contains("el.innerText =")); + // fill must look up the native property setter from the element's own + // prototype (not a hardcoded HTMLInputElement/etc. chain, which would + // throw on