Skip to content

Fix for keeping and using custom Cheerio options (Issue #273)#274

Closed
Torthu wants to merge 2 commits into
cheeriojs:masterfrom
Torthu:master
Closed

Fix for keeping and using custom Cheerio options (Issue #273)#274
Torthu wants to merge 2 commits into
cheeriojs:masterfrom
Torthu:master

Conversation

@Torthu

@Torthu Torthu commented Sep 19, 2013

Copy link
Copy Markdown
Contributor

Cheerio.prototype.options are now overwritten if Cheerio is
instantiated with an options object.

cheerioinstace.options is now passed to parser when adding new dom
nodes to an existing Cheerio instance.

Cheerio.prototype.options are now overwritten if Cheerio is
instantiated with an options object.

cheerioinstace.options is now passed to parser when adding new dom
nodes to an existing Cheerio instance.
Comment thread lib/static.js Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicks:

  • include a space after if
  • use single quotes

@davidchambers

Copy link
Copy Markdown
Contributor

Nice patch!

Comment thread test/xml.js

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant, "should force lowercase for new nodes".

But really, the functionality you've implemented forwards all parsing options. We should write the tests to verify this a little more generally. Mind adding another assertion to each of these tests for the ignoreWhiteSpace option? Then we'd want to name these something like, 'respects parsing options defined when the context was created'.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, and I agree.

@jugglinmike

Copy link
Copy Markdown
Member

This approach has potentially-surprising behavior--it modifies Cheerio globally. For instance, the following test will fail:

it('should maintain the parsing options of distinct contexts independently', function() {
  var str = '<g><someElem someAttribute="something">hello</someElem></g>';
  var $x = cheerio.load('', { xmlMode: false });
  var $h = cheerio.load('', { xmlMode: true });

  expect($x(str).html()).to.equal('<someelem someattribute="something">hello</someelem>');
});

I've implemented a fix for this and opened a pull request against your branch: Torthu#1

@davidchambers What do you think about this approach? It adds a bit of overhead, but I think it's important to keep contexts separate.

@fb55

fb55 commented Sep 21, 2013

Copy link
Copy Markdown
Member

This pull request is a hack and definitely shouldn't be merged. It introduces circular dependencies, which is probably the worst way to do this.

Instead, patching .make() to maintain the options object will not only be cleaner, but also a lot faster. There are a couple of other places where cheerio is instantiated, which would have to be updated, too.

@fb55

fb55 commented Sep 21, 2013

Copy link
Copy Markdown
Member

Generally speaking, keeping an options object is definitely a needed feature, not only for #275 (which is only a nicety), but also for passing the options to CSSselect.

@jugglinmike

Copy link
Copy Markdown
Member

@Torthu Cheerio has changed a bit since you first opened this pull request, but I'd be happy to work with you to update (and incorporate @fb55's feedback). We could really use this functionality--are you up for it? If I don't hear from you, I'll pick up from where you left off.

@Torthu

Torthu commented Nov 28, 2013

Copy link
Copy Markdown
Contributor Author

Sure, I'm up for it, but I must admit that my understanding of the codebase is minimal at best.

@jugglinmike

Copy link
Copy Markdown
Member

That's fine! I'll start off with our high-level goals, then get into specifics
about how you can accomplish them technically.

Goals

Parsing options aren't really handled in a consistent way right now, but as far
as I can tell, this is what we want:

  • Per-instance options that control how a Cheerio instance (and its
    descendents) parse HTML
  • Global options that can be configured to change how all subsequent
    Cheerio instances behave

I want to draw attention to the "subsequent" part, since one might argue that
changing the global options should also change the behavior of existing Cheerio
objects. For example:

Cheerio.prototype.options.lowerCaseTags = false;
var $ = Cheerio.load('<div></div>');

Cheerio.prototype.options.lowerCaseTags = true;

// Should the behavior of this operation reflect the current state of the
// options?
$.append('<H1>Hello</H1>');

If we want this, we'll have to constantly reference (and merge) instance- and
global- options. I personally don't think that the behavior is worth the
performance overhead, but I'm curious if you (or the project maintainers) have
differing opinions.

Code

Basically, we need to be able to specify parsing options at instantiation time.
This means the Cheerio constructor needs an additional argument:

diff --git a/lib/cheerio.js b/lib/cheerio.js
index 2c847e1..91a8008 100644
--- a/lib/cheerio.js
+++ b/lib/cheerio.js
@@ -31,8 +29,10 @@ var $ = require('./static');
  * Instance of cheerio
  */

-var Cheerio = module.exports = function(selector, context, root) {
-  if (!(this instanceof Cheerio)) return new Cheerio(selector, context, root);
+var Cheerio = module.exports = function(selector, context, root, options) {
+  if (!(this instanceof Cheerio)) return new Cheerio(selector, context, root, options);
+
+  this.options = _.defaults(options || {}, Cheerio.prototype.options);

   // $(), $(null), $(undefined), $(false)
   if (!selector) return this;

...then update all the calls to parse to pass along this.options. With this
in place, Cheerio#make can be patched so that all sub-selections share the
same parsing options:

 Cheerio.prototype.make = function(dom) {
-  return new Cheerio(dom);
+  return new Cheerio(dom, undefined, undefined, this.options);
 };

And the static method $.load will need to change the way it invokes both
parse and the Cheerio constructor.


Now that I've written this out, it's clear that my approach requires some API
changes, and not all of it is particularly clean. Some potential
simplifications:

  • We could only maintain global parsing options. This seems to clash with the
    (currently unimplemented) API though: the Cheerio.load(str, options)
    signature implies per-instance options.
  • We could remote the root parameter of the Cheerio constructor since it is
    essentially a less-powerful alias for $.load (and the project Readme
    describes $.load as "the preferred method" for loading HTML).

I'd like to get some feedback from @matthewmueller, @fb55, and @davidchambers
before I advise you to implement this! What do you all think? Have I missed a
simpler/more elegant solution?

@davidchambers

Copy link
Copy Markdown
Contributor
Cheerio.prototype.options.lowerCaseTags = false;
var $ = Cheerio.load('<div></div>');

We should avoid global state. I'd prefer to see the following:

var $ = Cheerio.load('<div></div>', {lowerCaseTags: false});

If cheerio's API provided a way to create an independent constructor function, mutating its prototype would be okay. This could, perhaps, work as follows:

var init = function() {
  function Cheerio(...) {
    ...
  }
  Cheerio.prototype.find = function(...) {
    ...
  };
  ...
  return Cheerio;
};
// a.js
var Cheerio = require('cheerio').init();
Cheerio.prototype.foo = 42
// b.js
var Cheerio = require('cheerio').init();
Cheerio.prototype.foo // => undefined

@jugglinmike

Copy link
Copy Markdown
Member

We should avoid global state.

Generally speaking, I agree with you. But I think this might make sense for people that want to concisely configure Cheerio for their entire module (no need to specify the same options everywhere).

If cheerio's API provided a way to create an independent constructor function, mutating its prototype would be okay.

This sounds a little like what I proposed to @Torthu in GH-1: Create a unique prototype for each Cheerio context

@jugglinmike

Copy link
Copy Markdown
Member

I'm bumping this issue because we could really use some feedback before building this out. Does the proposal I made above seem alright? Should we forget about setting options globally, as @davidchambers recommended?

It's important for the library to consistently handle these options so that it behaves in a predictable way. Note that this issue also blocks #261.

@stevenvachon

Copy link
Copy Markdown
Contributor

The code is kind of a mess. We have to pass options to, like, every function. Each module should've been prototyped to run as instances so that this.options could be used everywhere with much less passing around.

Even with the above mentioned init wrapper, options won't reach manipulation.js without crazy amounts of var passing. Unless I am mistaken, I think the only solution is a rewrite or a global options var.

@fb55

fb55 commented Feb 25, 2014

Copy link
Copy Markdown
Member

@stevenvachon Most functions are attached to cheerio's prototype and can access instance properties. Just modifying _make should do the trick. As far as I understand, init is only required to have an independent prototype, which can be modified to set default options. Or did I miss something?

@stevenvachon

Copy link
Copy Markdown
Contributor

@fb55 They are, but they break away from it in each module. For example, cheerio.js has this:

var api = ['attributes', 'traversing', 'manipulation', 'css'];

api.forEach(function(mod) {
  _.extend(Cheerio.prototype, require('./api/' + mod));
});

but then manipulation.js has this:

var makeDomArray = function(elem) {
  if (elem == null) {
    return [];
  } else if (elem.cheerio) {
    return elem.toArray();
  } else if (_.isArray(elem)) {
    return _.flatten(elem.map(makeDomArray));
  } else if (_.isString(elem)) {
    return evaluate(elem);
  } else {
    return [elem];
  }
};

var _insert = function(concatenator) {
  return function() {
    var elems = slice.call(arguments),
        dom = makeDomArray(elems);

    return this.each(function(i, el) {
      if (_.isFunction(elems[0])) {
        dom = makeDomArray(elems[0].call(el, i, this.html()));
      }
      updateDOM(concatenator(dom, el.children || (el.children = [])), el);
    });
  };
};

var append = exports.append = _insert(function(dom, children) {
  return children.concat(dom);
});

We'll have to pass this.options to everything.

@fb55

fb55 commented Feb 25, 2014

Copy link
Copy Markdown
Member

@stevenvachon Or, these methods could be added to the prototype (prefixed with an underscore). But I get your point.

@stevenvachon

Copy link
Copy Markdown
Contributor

Anything happening here? I've had to special case this in my code and I'd much prefer it to use something that won't break in the future.

@fb55

fb55 commented Apr 6, 2014

Copy link
Copy Markdown
Member

@stevenvachon As far as I know, there is currently nobody working on this. Feel free to send a pull request, though :)

fb55 added a commit that referenced this pull request Apr 8, 2014
fb55 added a commit that referenced this pull request Apr 8, 2014
fb55 added a commit that referenced this pull request Apr 8, 2014
as proposed by @jugglinmike in #274, but using `this.options` instead
of the prototype to enable `Cheerio.call` calls to work properly
@fb55 fb55 mentioned this pull request Apr 8, 2014
@fb55

fb55 commented Apr 8, 2014

Copy link
Copy Markdown
Member

I'm closing this in favor of #437.

@fb55 fb55 closed this Apr 8, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

5 participants