Refactoring & optimizing large BinaryPolynomial construction#992
mhlr wants to merge 11 commits into dwavesystems:main
Conversation
import numpy as np

import cytoolz as tl
While cytoolz is a cool package, I don't think we can add it as a core Ocean dependency. It seems supported-ish but, for instance, it only started releasing wheels in August/September.
If/when it matures more, then we can consider adding it.
Fair enough. Cytoolz has been in conda since June 2016 though.
Many science/AI/ML packages are conda only/first.
I am also using these PRs to explore ideas even if they don't get merged. Hope that is ok.
More than ok! It's much easier to have these conversations on a PR.
For conda, we actually have an issue about this: dwavesystems/dwave-ocean-sdk#144. I do think conda support in addition would be nice, but we also need to support PyPI installs.
Great! I would love to have D-Wave code in conda. Outside D-Wave I use conda almost exclusively. One advantage is that the official repos are curated, so the quality tends to be higher than PyPI. Also, the conda (and now also mamba) package manager maintains global environment consistency, so one install does not break previously installed packages; conda will (up/down)grade previously installed packages if necessary. Conda is also language agnostic, so you can distribute C, R, Julia, and Rust packages with it. The main drawback is that it cannot install directly from git.
The reason I brought conda up, though, was more that cytoolz is more active/mature/maintained than its PyPI history would suggest. I believe dask was also in conda for a long time before it got released on PyPI.
Otherwise I think cytoolz and conda are independent issues.
Also, conda gives you MKL-compiled numerical libraries by default, which in turn means that native vectorised numpy/scipy operations run multithreaded by default (without the GIL).
It also looks like cytoolz has been on PyPI since 2014: https://pypi.org/project/cytoolz/#history
Yes, though they only released an sdist until September, which means that users without python-dev installed would not be able to install Ocean.
We have to be extremely careful about introducing new dependencies, especially ones with compiled code, because they need to work across all of our supported OS/Python versions.
Ok, I did not realize that wheel availability was the issue; I thought the main concern was general maturity & support. If they are releasing wheels now, what is the criterion for acceptance? Some period of time, or some number of wheel releases?
There isn't a hard algorithm. But experience over the last few years has taught us to be extremely cautious when introducing new dependencies to Ocean, especially compiled dependencies. A significant fraction of our support requests come from dependency management. Another place this comes up: each time Python releases a new version, we can only release a supporting version of Ocean once all of our dependencies also update. Some packages, like NumPy, are so fundamental that we cannot possibly avoid using them. But otherwise we definitely err, pretty far, on the side of keeping our dependency tree as small as possible.
Fair enough. I do think that (cy)toolz is the kind of universal nuts-and-bolts library, like numpy, scipy & pandas, that is worth considering, since it can be used everywhere and can reduce the need for resorting to C & Cython.
# not a polynomial
return False

not sure why these lines needed changes?
Looks like an accidental whitespace change. Will fix. I am not seeing any semantic changes.
Do we normally have whitespace in blank lines up to the indentation level? Seems like that is what I removed.
No, in fact removing the whitespace is correct. It's more about keeping the blame/commit history clean. No big deal either way.
if vartype is Vartype.BINARY
else frozenset(
    var
    for var, power in tl.frequencies(term).items()
Seems like a big dependency to bring in for a 4 line function.
I also suspect that in typical use cases, i.e. relatively low-degree terms, just doing tuple.count() will be faster than constructing a new defaultdict.
In [7]: term = tuple('abcde')
In [8]: %timeit for _ in frequencies(term): pass
1.31 µs ± 9.02 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [9]: %timeit for _ in ((v, term.count(v)) for v in term): pass
979 ns ± 7.79 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
That result seems to be from the interpreted toolz version of frequencies. The compiled cytoolz version is >2x faster than the .count version. The count version needs to iterate over set(term) rather than term, which slows it down further, making it closer to 3x slower than compiled cytoolz.
Counter is around the same speed as the interpreted version of frequencies and provides the same syntactic benefit.
%timeit for _ in toolz.frequencies(term).items(): pass
1.16 µs ± 23.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit for _ in ((v, term.count(v)) for v in term): pass
826 ns ± 46.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit for _ in cytoolz.frequencies(term).items(): pass
347 ns ± 10.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit for _ in ((v, term.count(v)) for v in set(term)): pass
1.01 µs ± 49.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit for _ in Counter(term).items(): pass
1.08 µs ± 20.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Seems like a big dependency to bring in for a 4 line function.
Cytoolz offers two benefits: clean code and speed; the original interpreted toolz library only provides the code-quality part.
The speed cytoolz brings is often sufficient to avoid needing to cythonize or port code to C, leading to further code-quality benefits as well.
Depending on it brings in a moderately large number of small cythonized utility functions, only a few of which are likely to be used in any given PR. In each case it is easy to either reimplement an equivalent function or find a workaround. In the long run, though, that means either having most of the library reimplemented, possibly multiple times, throughout our codebase, and/or a lot of workarounds. Individual reimplementations are unlikely to get cythonized, so the codebase misses out on the performance benefit, in some cases leading to unnecessary C ports.
This could be demonstrated by refactoring a large piece of code that is a performance bottleneck. I will do more refactoring and optimization in this file for now to see how many more cytoolz uses turn up. So far I have been doing it method by method to keep the PRs small, but a combined PR would demonstrate the cumulative effect.
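For illustration, here is a minimal dependency-free sketch of the kind of term reduction the diff does with tl.frequencies, using collections.Counter instead. The reduce_term name and the odd-power filter for SPIN (based on s**2 == 1) are assumptions; the full function is not shown in this hunk.

from collections import Counter

from dimod import Vartype


def reduce_term(term, vartype):
    # Sketch only: collapse repeated variables in a term.
    # BINARY: x**n == x, so duplicates can simply be dropped.
    # SPIN (assumed): s**2 == 1, so only variables with odd multiplicity survive.
    return (
        frozenset(term)
        if vartype is Vartype.BINARY
        else frozenset(
            var
            # Counter stands in for tl.frequencies / cytoolz.frequencies
            for var, power in Counter(term).items()
            if power % 2
        )
    )

For example, reduce_term(('a', 'b', 'a'), Vartype.SPIN) gives frozenset({'b'}), while reduce_term(('a', 'b', 'a'), Vartype.BINARY) gives frozenset({'a', 'b'}).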
I am thinking about making all the operations keep the polynomials zero free.
That would be a backwards compatibility break. I am not opposed philosophically, but it would need to wait until dimod 0.11.
Fair enough. I think that could significantly improve scalability, both memory and time, though.
Does anything really rely on the 0-weighted terms being stored explicitly?
Strictly speaking:

In [3]: p = dimod.BinaryPolynomial({}, 'BINARY')
In [4]: p['abc'] = 0
In [5]: p
Out[5]: BinaryPolynomial({frozenset({'b', 'c', 'a'}): 0}, 'BINARY')
In [6]: list(p)
Out[6]: [frozenset({'a', 'b', 'c'})]

whether that "break" actually matters to anyone? 🤷 FWIW, it is consistent with the BQM/QM behavior, where we do explicitly keep 0-bias variables/interactions.
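As a rough sketch of the zero-free idea being discussed (strip_zero_terms is a hypothetical helper, not an existing dimod function):

import dimod


def strip_zero_terms(poly):
    # Hypothetical helper: rebuild the polynomial without zero-bias terms.
    return dimod.BinaryPolynomial(
        {term: bias for term, bias in poly.items() if bias},
        poly.vartype,
    )

With the example above, strip_zero_terms(p) would come back empty: BinaryPolynomial({}, 'BINARY').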
Should we make a backlog entry for 0.11?
@arcondello The last set of commits is the make_quadratic refactoring.
Continuing the #988 explorations.
This produces a >8x speedup in the construction of large BinaryPolynomial instances; see the notebook.
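A rough, illustrative way to time construction of a comparable polynomial (the sizes and setup here are guesses, not the notebook's actual benchmark):

import random
import timeit

import dimod

# Build a large random cubic polynomial over 1000 binary variables.
variables = range(1000)
terms = {frozenset(random.sample(variables, 3)): random.random()
         for _ in range(100_000)}

seconds = timeit.timeit(lambda: dimod.BinaryPolynomial(terms, dimod.BINARY),
                        number=5)
print(f"mean construction time: {seconds / 5:.3f} s")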