jsdom

jsdom is a pure-JavaScript implementation of many web standards, notably the WHATWG DOM and HTML Standards, for use with Node.js. In general, the goal of the project is to emulate enough of a subset of a web browser to be useful for testing and scraping real-world web applications.

The latest versions of jsdom require Node.js v18 or newer. (Versions of jsdom below v23 still work with previous Node.js versions, but are unsupported.)

Basic usage

const jsdom = require("jsdom");
const { JSDOM } = jsdom;

To use jsdom, you will primarily use the JSDOM constructor, which is a named export of the jsdom main module. Pass the constructor a string. You will get back a JSDOM object, which has a number of useful properties, notably window:

const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
console.log(dom.window.document.querySelector("p").textContent); // "Hello world"

(Note that jsdom will parse the HTML you pass it just like a browser does, including implied <html>, <head>, and <body> tags.)

The resulting object is an instance of the JSDOM class, which contains a number of useful properties and methods besides window. In general, it can be used to act on the jsdom from the "outside," doing things that are not possible with the normal DOM APIs. For simple cases, where you don't need any of this functionality, we recommend a coding pattern like:

const { window } = new JSDOM(`...`);
// or even
const { document } = (new JSDOM(`...`)).window;

Full documentation on everything you can do with the JSDOM class is below, in the section "JSDOM object API".

Customizing jsdom

The JSDOM constructor accepts a second parameter which can be used to customize your jsdom in the following ways.

Simple options

const dom = new JSDOM(``, {
  url: "https://example.org/",
  referrer: "https://example.com/",
  contentType: "text/html",
  includeNodeLocations: true,
  storageQuota: 10000000
});

  • url sets the value returned by window.location, document.URL, and document.documentURI, and affects things like resolution of relative URLs within the document and the same-origin restrictions and referrer used while fetching subresources. It defaults to "about:blank".

  • referrer just affects the value read from document.referrer. It defaults to no referrer (which reflects as the empty string).

  • contentType affects the value read from document.contentType, as well as how the document is parsed: as HTML or as XML. Values that are not an HTML MIME type or an XML MIME type will throw. It defaults to "text/html". If a charset parameter is present, it can affect binary data processing.

  • includeNodeLocations preserves the location info produced by the HTML parser, allowing you to retrieve it with the nodeLocation() method (described below). It also ensures that line numbers reported in exception stack traces for code running inside <script> elements are correct. It defaults to false to give the best performance, and cannot be used with an XML content type since our XML parser does not support location info.

  • storageQuota is the maximum size in code units for the separate storage areas used by localStorage and sessionStorage. Attempts to store data larger than this limit will cause a DOMException to be thrown. By default, it is set to 5,000,000 code units per origin, as inspired by the HTML specification.
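
To make "code units" concrete: JavaScript strings are sequences of UTF-16 code units and length counts them, so a character outside the Basic Multilingual Plane counts as two. The sketch below estimates how much of the quota a set of entries would consume. It assumes that both keys and values count toward the limit, which the text above does not state, and estimateStorageUsage is a hypothetical helper, not part of jsdom.

```javascript
// Hypothetical helper: sums UTF-16 code units across keys and values.
// Assumption: keys and values both count toward storageQuota.
function estimateStorageUsage(entries) {
  let total = 0;
  for (const [key, value] of Object.entries(entries)) {
    // String#length reports UTF-16 code units, not user-visible characters.
    total += key.length + String(value).length;
  }
  return total;
}

// "greeting" (8) + "hello" (5) + "emoji" (5) + "😀" (2 code units)
const usage = estimateStorageUsage({ greeting: "hello", emoji: "😀" });
console.log(usage); // 20
```

Comparing such an estimate against the configured storageQuota can help pick a limit for tests that store large values.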

Note that both url and referrer are canonicalized before they're used, so e.g. if you pass in "https:example.com", jsdom will interpret that as if you had given "https://example.com/". If you pass an unparseable URL, the call will throw. (URLs are parsed and serialized according to the URL Standard.)
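
The canonicalization described here can be previewed without jsdom, because Node.js ships a global URL class that implements the same URL Standard. This is a minimal sketch under the assumption that Node's parser and jsdom's agree for these inputs; canonicalize is a hypothetical helper name.

```javascript
// Hypothetical helper: parse and re-serialize a URL per the URL Standard.
function canonicalize(input) {
  // new URL() throws a TypeError when the input cannot be parsed.
  return new URL(input).href;
}

console.log(canonicalize("https:example.com")); // "https://example.com/"

try {
  canonicalize("not a url");
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```

Running candidate values through a check like this before passing them as url or referrer makes parsing failures surface early.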

Executing scripts

jsdom's most powerful ability is that it can execute scripts inside the jsdom. These scripts can modify the content of the page and access all the web platform APIs jsdom implements.

However, this is also highly dangerous when dealing with untrusted content. The jsdom sandbox is not foolproof, and code running inside the DOM's <script>s can, if it tries hard enough, get access to the Node.js environment, and thus to your machine. As such, the ability to execute scripts embedded in the HTML is disabled by default:

const dom = new JSDOM(`<body>
<div id="content"></div>
<script>document.getElementById("content").append(document.createElement("hr"));</script>
</body>`);

// The script will not be executed, by default:
console.log(dom.window.document.getElementById("content").children.length); // 0

To enable executing scripts inside the page, you can use the runScripts: "dangerously" option:

const dom = new JSDOM(`<body>
<div id="content"></div>
<script>document.getElementById("content").append(document.createElement("hr"));</script>
</body>`, { runScripts: "dangerously" });

// The script will be executed and modify the DOM:
console.log(dom.window.document.getElementById("content").children.length); // 1

Again we emphasize to only use this when feeding jsdom code you know is safe. If you use it on arbitrary user-supplied code, or code from the Internet, you are effectively running untrusted Node.js code, and your machine could be compromised.

If you want to execute external scripts, included via <script src="">, you'll also need to ensure that jsdom loads them. To do this, add the option resources: "usable" as described below. (You'll likely also want to set the url option, for the reasons discussed there.)

Event handler attributes, like <div onclick="">, are also governed by this setting; they will not function unless runScripts is set to "dangerously". (However, event handler properties, like div.onclick = ..., will function regardless of runScripts.)

If you are simply trying to execute script "from the outside", instead of letting <script> elements and event handler attributes run "from the inside", you can use the runScripts: "outside-only" option, which enables fresh copies of all the JavaScript spec-provided globals to be installed on window. This includes things like window.Array, window.Promise, etc. It also, notably, includes window.eval, which allows running scripts, but with the jsdom window as the global:

const dom = new JSDOM(`<body>
<div id="content"></div>
<script>document.getElementById("content").append(document.createElement("hr"));</script>
</body>`, { runScripts: "outside-only" });

// run a script outside of JSDOM:
dom.window.eval('document.getElementById("content").append(document.createElement("p"));');

console.log(dom.window.document.getElementById("content").children.length); // 1
console.log(dom.window.document.getElementsByTagName("hr").length); // 0
console.log(dom.window.document.getElementsByTagName("p").length); // 1

This is turned off by default for performance reasons, but is safe to enable.

Note that in the default configuration, without setting runScripts, the values of window.Array, window.eval, etc. will be the same as those provided by the outer Node.js environment. That is, window.eval === eval will hold, so window.eval will not run scripts in a useful way.
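
What "fresh copies" of the spec-provided globals means can be seen with the built-in Node.js vm module alone, which (as noted in the getInternalVMContext() section) underpins jsdom's script running. The sketch below uses a plain vm context rather than a jsdom window, so it illustrates the general behavior of separate global environments and is not a description of jsdom internals.

```javascript
const vm = require("vm");

// Evaluate expressions in a brand-new context that has its own globals.
const innerArray = vm.runInNewContext("Array");
const innerList = vm.runInNewContext("[1, 2, 3]");

console.log(innerArray === Array);       // false: a separate copy of Array
console.log(innerList instanceof Array); // false: built by the inner Array
console.log(Array.isArray(innerList));   // true: still a genuine array
```

This is also why identity and instanceof checks across the two environments can behave unexpectedly, and why Array.isArray() is the safer test.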

We strongly advise against trying to "execute scripts" by mashing together the jsdom and Node global environments (e.g. by doing global.window = dom.window), and then executing scripts or test code inside the Node global environment. Instead, you should treat jsdom like you would a browser, and run all scripts and tests that need access to a DOM inside the jsdom environment, using window.eval or runScripts: "dangerously". This might require, for example, creating a browserify bundle to execute as a <script> element, just like you would in a browser.

Finally, for advanced use cases you can use the dom.getInternalVMContext() method, documented below.

Pretending to be a visual browser

jsdom does not have the capability to render visual content, and will act like a headless browser by default. It provides hints to web pages through APIs such as document.hidden that their content is not visible.

When the pretendToBeVisual option is set to true, jsdom will pretend that it is rendering and displaying content. It does this by:

  • Changing document.hidden to return false instead of true

  • Changing document.visibilityState to return "visible" instead of "prerender"

  • Enabling window.requestAnimationFrame() and window.cancelAnimationFrame() methods, which otherwise do not exist

const window = (new JSDOM(``, { pretendToBeVisual: true })).window;

window.requestAnimationFrame(timestamp => {
  console.log(timestamp > 0);
});

Note that jsdom still does not do any layout or rendering, so this is really just about pretending to be visual, not about implementing the parts of the platform a real, visual web browser would implement.

Loading subresources

Basic options

By default, jsdom will not load any subresources such as scripts, stylesheets, images, or iframes. If you'd like jsdom to load such resources, you can pass the resources: "usable" option, which will load all usable resources. Those are:

  • Frames and iframes, via <frame> and <iframe>

  • Stylesheets, via <link rel="stylesheet">

  • Scripts, via <script>, but only if runScripts: "dangerously" is also set

  • Images, via <img>, but only if the canvas npm package is also installed (see "Canvas support" below)

When attempting to load resources, recall that the default value for the url option is "about:blank", which means that any resources included via relative URLs will fail to load. (The result of trying to parse the URL /something against the URL about:blank is an error.) So, you'll likely want to set a non-default value for the url option in those cases, or use one of the convenience APIs that do so automatically.
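
The parsing failure mentioned in the parenthetical can be reproduced with the global URL class in Node.js, which follows the same URL Standard. This is a small sketch for illustration; resolve is a hypothetical helper, and the https://example.org/ base is just an example value.

```javascript
// Hypothetical helper: resolve a relative URL against a base, or return null.
function resolve(relative, base) {
  try {
    return new URL(relative, base).href;
  } catch {
    return null; // the URL parser reported a failure
  }
}

console.log(resolve("/something", "about:blank"));          // null
console.log(resolve("/something", "https://example.org/")); // "https://example.org/something"
```

With a hierarchical base URL the same relative reference resolves normally, which is why setting the url option matters when subresources are referenced relatively.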

Advanced configuration

To more fully customize jsdom's resource-loading behavior, you can pass an instance of the ResourceLoader class as the resources option value:

const resourceLoader = new jsdom.ResourceLoader({
  proxy: "http://127.0.0.1:9001",
  strictSSL: false,
  userAgent: "Mellblomenator/9000",
});
const dom = new JSDOM(``, { resources: resourceLoader });

The three options to the ResourceLoader constructor are:

  • proxy is the address of an HTTP proxy to be used.

  • strictSSL can be set to false to disable the requirement that SSL certificates be valid.

  • userAgent affects the User-Agent header sent, and thus the resulting value for navigator.userAgent. It defaults to Mozilla/5.0 (${process.platform || "unknown OS"}) AppleWebKit/537.36 (KHTML, like Gecko) jsdom/${jsdomVersion}.
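
The default userAgent above is written as a template, so it may help to see what it expands to on a given machine. The sketch below simply evaluates that template; the jsdomVersion value is a placeholder assumption, since the real value is whatever jsdom version is installed.

```javascript
// Placeholder assumption: jsdom substitutes its own package version here.
const jsdomVersion = "0.0.0";

// Same template as the documented default, evaluated in this process.
const defaultUserAgent =
  `Mozilla/5.0 (${process.platform || "unknown OS"}) ` +
  `AppleWebKit/537.36 (KHTML, like Gecko) jsdom/${jsdomVersion}`;

console.log(defaultUserAgent);
```

The platform segment comes from process.platform, so the string differs between, say, Linux and macOS hosts.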

You can further customize resource fetching by subclassing ResourceLoader and overriding the fetch() method. For example, here is a version that overrides the response provided for a specific URL:

class CustomResourceLoader extends jsdom.ResourceLoader {
  fetch(url, options) {
    // Override the contents of this script to do something unusual.
    if (url === "https://example.com/some-specific-script.js") {
      return Promise.resolve(Buffer.from("window.someGlobal = 5;"));
    }

    return super.fetch(url, options);
  }
}

jsdom will call your custom resource loader's fetch() method whenever it encounters a "usable" resource, per the above section. The method takes a URL string, as well as a few options which you should pass through unmodified if calling super.fetch(). It must return a promise for a Node.js Buffer object, or return null if the resource is intentionally not to be loaded. In general, most cases will want to delegate to super.fetch(), as shown.

One of the options you will receive in fetch() will be the element (if applicable) that is fetching a resource.

class CustomResourceLoader extends jsdom.ResourceLoader {
  fetch(url, options) {
    if (options.element) {
      console.log(`Element ${options.element.localName} is requesting the url ${url}`);
    }

    return super.fetch(url, options);
  }
}

Virtual consoles

Like web browsers, jsdom has the concept of a "console". This records both information directly sent from the page, via scripts executing inside the document, as well as information from the jsdom implementation itself. We call the user-controllable console a "virtual console", to distinguish it from the Node.js console API and from the inside-the-page window.console API.

By default, the JSDOM constructor will return an instance with a virtual console that forwards all its output to the Node.js console. To create your own virtual console and pass it to jsdom, you can override this default by doing:

const virtualConsole = new jsdom.VirtualConsole();
const dom = new JSDOM(``, { virtualConsole });

Code like this will create a virtual console with no behavior. You can give it behavior by adding event listeners for all the possible console methods:

virtualConsole.on("error", () => { ... });
virtualConsole.on("warn", () => { ... });
virtualConsole.on("info", () => { ... });
virtualConsole.on("dir", () => { ... });
// ... etc. See https://console.spec.whatwg.org/#logging

(Note that it is probably best to set up these event listeners before calling new JSDOM(), since errors or console-invoking script might occur during parsing.)

If you simply want to redirect the virtual console output to another console, like the default Node.js one, you can do:

virtualConsole.sendTo(console);

There is also a special event, "jsdomError", which will fire with error objects to report errors from jsdom itself. This is similar to how error messages often show up in web browser consoles, even if they are not initiated by console.error. So far, the following errors are output this way:

  • Errors loading or parsing subresources (scripts, stylesheets, frames, and iframes)

  • Script execution errors that are not handled by a window onerror event handler that returns true or calls event.preventDefault()

  • Not-implemented errors resulting from calls to methods, like window.alert, which jsdom does not implement, but installs anyway for web compatibility

If you're using sendTo(c) to send errors to c, by default it will call c.error(errorStack[, errorDetail]) with information from "jsdomError" events. If you'd prefer to maintain a strict one-to-one mapping of events to method calls, and perhaps handle "jsdomError"s yourself, then you can do:

virtualConsole.sendTo(c, { omitJSDOMErrors: true });

Cookie jars

Like web browsers, jsdom has the concept of a cookie jar, storing HTTP cookies. Cookies that have a URL on the same domain as the document, and are not marked HTTP-only, are accessible via the document.cookie API. Additionally, all cookies in the cookie jar will impact the fetching of subresources.

By default, the JSDOM constructor will return an instance with an empty cookie jar. To create your own cookie jar and pass it to jsdom, you can override this default by doing:

const cookieJar = new jsdom.CookieJar(store, options);
const dom = new JSDOM(``, { cookieJar });

This is mostly useful if you want to share the same cookie jar among multiple jsdoms, or prime the cookie jar with certain values ahead of time.

Cookie jars are provided by the tough-cookie package. The jsdom.CookieJar constructor is a subclass of the tough-cookie cookie jar which by default sets the looseMode: true option, since that matches better how browsers behave. If you want to use tough-cookie's utilities and classes yourself, you can use the jsdom.toughCookie module export to get access to the tough-cookie module instance packaged with jsdom.

Intervening before parsing

jsdom allows you to intervene in the creation of a jsdom very early: after the Window and Document objects are created, but before any HTML is parsed to populate the document with nodes:

const dom = new JSDOM(`<p>Hello</p>`, {
  beforeParse(window) {
    window.document.childNodes.length === 0;
    window.someCoolAPI = () => { /* ... */ };
  }
});

This is especially useful if you want to modify the environment in some way, for example by adding shims for web platform APIs jsdom does not support.

JSDOM object API

Once you have constructed a JSDOM object, it will have the following useful capabilities:

Properties

The property window retrieves the Window object that was created for you.

The properties virtualConsole and cookieJar reflect the options you pass in, or the defaults created for you if nothing was passed in for those options.

Serializing the document with serialize()

The serialize() method will return the HTML serialization of the document, including the doctype:

const dom = new JSDOM(`<!DOCTYPE html>hello`);

dom.serialize() === "<!DOCTYPE html><html><head></head><body>hello</body></html>";

// Contrast with:
dom.window.document.documentElement.outerHTML === "<html><head></head><body>hello</body></html>";

Getting the source location of a node with nodeLocation(node)

The nodeLocation() method will find where a DOM node is within the source document, returning the parse5 location info for the node:

const dom = new JSDOM(
  `<p>Hello
    <img src="foo.jpg">
  </p>`,
  { includeNodeLocations: true }
);

const document = dom.window.document;
const bodyEl = document.body; // implicitly created
const pEl = document.querySelector("p");
const textNode = pEl.firstChild;
const imgEl = document.querySelector("img");

console.log(dom.nodeLocation(bodyEl)); // null; it's not in the source
console.log(dom.nodeLocation(pEl)); // { startOffset: 0, endOffset: 39, startTag: ..., endTag: ... }
console.log(dom.nodeLocation(textNode)); // { startOffset: 3, endOffset: 13 }
console.log(dom.nodeLocation(imgEl)); // { startOffset: 13, endOffset: 32 }

Note that this feature only works if you have set the includeNodeLocations option; node locations are off by default for performance reasons.

Interfacing with the Node.js vm module using getInternalVMContext()

The built-in vm module of Node.js is what underpins jsdom's script-running magic. Some advanced use cases, like pre-compiling a script and then running it multiple times, benefit from using the vm module directly with a jsdom-created Window.

To get access to the contextified global object, suitable for use with the vm APIs, you can use the getInternalVMContext() method:

const { Script } = require("vm");

const dom = new JSDOM(``, { runScripts: "outside-only" });
const script = new Script(`
  if (!this.ran) {
    this.ran = 0;
  }

  ++this.ran;
`);

const vmContext = dom.getInternalVMContext();

script.runInContext(vmContext);
script.runInContext(vmContext);
script.runInContext(vmContext);

console.assert(dom.window.ran === 3);

This is somewhat-advanced functionality, and we advise sticking to normal DOM APIs (such as window.eval() or document.createElement("script")) unless you have very specific needs.

Note that this method will throw an exception if the JSDOM instance was created without runScripts set, or if you are using jsdom in a web browser.

Reconfiguring the jsdom with reconfigure(settings)

The top property on window is marked [Unforgeable] in the spec, meaning it is a non-configurable own property and thus cannot be overridden or shadowed by normal code running inside the jsdom, even using Object.defineProperty.

Similarly, at present jsdom does not handle navigation (such as setting window.location.href = "https://example.com/"); doing so will cause the virtual console to emit a "jsdomError" explaining that this feature is not implemented, and nothing will change: there will be no new Window or Document object, and the existing window's location object will still have all the same property values.

However, if you're acting from outside the window, e.g. in some test framework that creates jsdoms, you can override one or both of these using the special reconfigure() method:

const dom = new JSDOM();

dom.window.top === dom.window;
dom.window.location.href === "about:blank";

dom.reconfigure({ windowTop: myFakeTopForTesting, url: "https://example.com/" });

dom.window.top === myFakeTopForTesting;
dom.window.location.href === "https://example.com/";

Note that changing the jsdom's URL will impact all APIs that return the current document URL, such as window.location, document.URL, and document.documentURI, as well as the resolution of relative URLs within the document, and the same-origin checks and referrer used while fetching subresources. It will not, however, perform navigation to the contents of that URL; the contents of the DOM will remain unchanged, and no new instances of Window, Document, etc. will be created.

Convenience APIs

fromURL()

In addition to the JSDOM constructor itself, jsdom provides a promise-returning factory method for constructing a jsdom from a URL:

JSDOM.fromURL("https://example.com/", options).then(dom => {
  console.log(dom.serialize());
});

The returned promise will fulfill with a JSDOM instance if the URL is valid and the request is successful. Any redirects will be followed to their ultimate destination.

The options provided to fromURL() are similar to those provided to the JSDOM constructor, with the following additional restrictions and consequences:

  • The url and contentType options cannot be provided.

  • The referrer option is used as the HTTP Referer request header of the initial request.

  • The resources option also affects the initial request; this is useful if you want to, for example, configure a proxy (see above).

  • The resulting jsdom's URL, content type, and referrer are determined from the response.

  • Any cookies set via HTTP Set-Cookie response headers are stored in the jsdom's cookie jar. Similarly, any cookies already in a supplied cookie jar are sent as HTTP Cookie request headers.

fromFile()

Similar to fromURL(), jsdom also provides a fromFile() factory method for constructing a jsdom from a filename:

JSDOM.fromFile("stuff.html", options).then(dom => {
  console.log(dom.serialize());
});

The returned promise will fulfill with a JSDOM instance if the given file can be opened. As usual in Node.js APIs, the filename is given relative to the current working directory.

The options provided to fromFile() are similar to those provided to the JSDOM constructor, with the following additional defaults:

  • The url option will default to a file URL corresponding to the given filename, instead of to "about:blank".

  • The contentType option will default to "application/xhtml+xml" if the given filename ends in .xht, .xhtml, or .xml; otherwise it will continue to default to "text/html".
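
The two defaults above can be approximated with Node.js built-ins, which is handy for predicting what a given filename will produce. This is a sketch only: fromFileDefaults is a hypothetical helper written from the description above, not jsdom's own code, and its extension matching is assumed to be case-sensitive.

```javascript
const path = require("path");
const { pathToFileURL } = require("url");

// Hypothetical helper mirroring the documented fromFile() defaults.
function fromFileDefaults(filename) {
  const extension = path.extname(filename);
  const isXML = [".xht", ".xhtml", ".xml"].includes(extension);
  return {
    // pathToFileURL resolves relative paths against the current working directory.
    url: pathToFileURL(filename).href,
    contentType: isXML ? "application/xhtml+xml" : "text/html"
  };
}

console.log(fromFileDefaults("stuff.html").contentType); // "text/html"
console.log(fromFileDefaults("doc.xhtml").contentType);  // "application/xhtml+xml"
```

Either default can still be overridden by passing url or contentType explicitly in the options.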

fragment()

For the very simplest of cases, you might not need a whole JSDOM instance with all its associated power. You might not even need a Window or Document! Instead, you just need to parse some HTML, and get a DOM object you can manipulate. For that, we have fragment(), which creates a DocumentFragment from a given string:

const frag = JSDOM.fragment(`<p>Hello</p><p><strong>Hi!</strong>`);

frag.childNodes.length === 2;
frag.querySelector("strong").textContent === "Hi!";
// etc.

这里 frag 是一个 DocumentFragment 实例,其内容是通过解析提供的字符串创建的。解析是使用 <template> 元素完成的,因此你可以在其中包含任何元素(包括具有奇怪解析规则的元素,如 <td>)。同样重要的是要注意,生成的 DocumentFragment 将没有 关联的浏览上下文:也就是说,元素的 ownerDocument 将具有空的 defaultView 属性,资源将不会加载等。

¥Here frag is a DocumentFragment instance, whose contents are created by parsing the provided string. The parsing is done using a <template> element, so you can include any element there (including ones with weird parsing rules like <td>). It's also important to note that the resulting DocumentFragment will not have an associated browsing context: that is, elements' ownerDocument will have a null defaultView property, resources will not load, etc.

所有对 fragment() 工厂的调用都会导致 DocumentFragment 共享相同的模板所有者 Document。这允许多次调用 fragment(),而无需额外开销。但这也意味着对 fragment() 的调用无法使用任何选项进行自定义。

¥All invocations of the fragment() factory result in DocumentFragments that share the same template owner Document. This allows many calls to fragment() with no extra overhead. But it also means that calls to fragment() cannot be customized with any options.

Note that serialization is not as easy with DocumentFragments as it is with full JSDOM objects. If you need to serialize your DOM, you should probably use the JSDOM constructor more directly. But for the special case of a fragment containing a single element, it's pretty easy to do through normal means:

const frag = JSDOM.fragment(`<p>Hello</p>`);
console.log(frag.firstChild.outerHTML); // logs "<p>Hello</p>"

Other noteworthy features

Canvas support

jsdom includes support for using the canvas package to extend any <canvas> elements with the canvas API. To make this work, you need to include canvas as a dependency in your project, as a peer of jsdom. If jsdom can find the canvas package, it will use it, but if it's not present, then <canvas> elements will behave like <div>s. Since jsdom v13, version 2.x of canvas is required; version 1.x is no longer supported.

Encoding sniffing

In addition to supplying a string, the JSDOM constructor can also be supplied binary data, in the form of a Node.js Buffer or a standard JavaScript binary data type like ArrayBuffer, Uint8Array, DataView, etc. When this is done, jsdom will sniff the encoding from the supplied bytes, scanning for <meta charset> tags just like a browser does.

If the supplied contentType option contains a charset parameter, that encoding will override the sniffed encoding, unless a UTF-8 or UTF-16 BOM is present, in which case the BOM takes precedence. (Again, this is just like a browser.)

This encoding sniffing also applies to JSDOM.fromFile() and JSDOM.fromURL(). In the latter case, any Content-Type headers sent with the response will take priority, in the same fashion as the constructor's contentType option.

Note that in many cases supplying bytes in this fashion can be better than supplying a string. For example, if you attempt to use Node.js's buffer.toString("utf-8") API, Node.js will not strip any leading BOMs. If you then give this string to jsdom, it will interpret it verbatim, leaving the BOM intact. But jsdom's binary data decoding code will strip leading BOMs, just like a browser; in such cases, supplying buffer directly will give the desired result.

Closing down a jsdom

Timers in the jsdom (set by window.setTimeout() or window.setInterval()) will, by definition, execute code in the future in the context of the window. Since there is no way to execute code in the future without keeping the process alive, outstanding jsdom timers will keep your Node.js process alive. Similarly, since there is no way to execute code in the context of an object without keeping that object alive, outstanding jsdom timers will prevent garbage collection of the window on which they are scheduled.

If you want to be sure to shut down a jsdom window, use window.close(), which will terminate all running timers (and also remove any event listeners on the window and document).

Debugging the DOM using Chrome DevTools

In Node.js you can debug programs using Chrome DevTools. See the official documentation for how to get started.

By default, jsdom elements are formatted as plain old JS objects in the console. To make debugging easier, you can use jsdom-devtools-formatter, which lets you inspect them like real DOM elements.

Caveats

Asynchronous script loading

People often have trouble with asynchronous script loading when using jsdom. Many pages load scripts asynchronously, but there is no way to tell when they're done doing so, and thus when it's a good time to run your code and inspect the resulting DOM structure. This is a fundamental limitation; we cannot predict what scripts on the web page will do, and so cannot tell you when they are done loading more scripts.

This can be worked around in a few ways. The best way, if you control the page in question, is to use whatever mechanisms are given by the script loader to detect when loading is done. For example, if you're using a module loader like RequireJS, the code could look like:

// On the Node.js side:
const window = (new JSDOM(...)).window;
window.onModulesLoaded = () => {
  console.log("ready to roll!");
};

<!-- Inside the HTML you supply to jsdom -->
<script>
requirejs(["entry-module"], () => {
  window.onModulesLoaded();
});
</script>

If you do not control the page, you could try workarounds such as polling for the presence of a specific element.

For more details, see the discussion in #640, especially @matthewkastor's insightful comment.

Unimplemented parts of the web platform

Although we enjoy adding new features to jsdom and keeping it up to date with the latest web specs, it has many missing APIs. Please feel free to file an issue for anything missing, but we're a small and busy team, so a pull request might work even better.

Some features of jsdom are provided by our dependencies. Notable documentation in that regard includes the list of supported CSS selectors for our CSS selector engine, nwsapi.

Beyond just features that we haven't gotten to yet, there are two major features that are currently outside the scope of jsdom. These are:

  • Navigation: the ability to change the global object, and all other objects, when clicking a link or assigning location.href or similar.

  • Layout: the ability to calculate where elements will be visually laid out as a result of CSS, which impacts methods like getBoundingClientRect() or properties like offsetTop.

Currently jsdom has dummy behaviors for some aspects of these features, such as sending a "not implemented" "jsdomError" to the virtual console for navigation, or returning zeros for many layout-related properties. Often you can work around these limitations in your code, e.g. by creating new JSDOM instances for each page you "navigate" to during a crawl, or using Object.defineProperty() to change what various layout-related getters and methods return.

Note that other tools in the same space, such as PhantomJS, do support these features. On the wiki, we have a more complete writeup about jsdom vs. PhantomJS.

Supporting jsdom

jsdom is a community-driven project maintained by a team of volunteers. You could support jsdom by:

  • Getting professional support for jsdom as part of a Tidelift subscription. Tidelift helps make open source sustainable for us while giving teams assurances for maintenance, licensing, and security.

  • Contributing directly to the project.

Getting help

If you need help with jsdom, please feel free to use any of the following venues: