Significant dependency footprint #40

@quasi-coherent

Description

Problem

Admittedly, I have not examined this library closely enough to know whether this is even possible, so please close the issue if it is not. I am also using this crate for a single purpose: taking a plain .sql file and splitting it on ;. The real benefit is not having to deal personally with annoying cases like select /* a comment; here */ a from... or where x = 'asdf--some--special;characters', and I only need to do that a handful of times. I could probably get by with lifting (with attribution, of course) the relevant parts.
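To make that use case concrete, here is a minimal sketch of the kind of splitting meant above. It is a hypothetical standalone function (`split_statements` is a made-up name), not code from sql-splitter, and it assumes only `--` line comments, `/* */` block comments, and single-quoted strings with `''` escapes. It does not handle MySQL `#` comments, backslash escapes, double-quoted identifiers, or dollar quoting, which is exactly the dialect-specific work the crate is useful for.

```rust
/// Hypothetical helper, not part of sql-splitter: split SQL text on `;`,
/// ignoring semicolons inside comments and single-quoted strings.
fn split_statements(sql: &str) -> Vec<String> {
    #[derive(Clone, Copy)]
    enum State {
        Normal,
        LineComment,
        BlockComment,
        Quoted,
    }

    let mut statements = Vec::new();
    let mut current = String::new();
    let mut state = State::Normal;
    let mut chars = sql.chars().peekable();

    while let Some(c) = chars.next() {
        match state {
            State::Normal => match c {
                ';' => {
                    // Statement boundary: emit what we have, minus whitespace.
                    let stmt = current.trim();
                    if !stmt.is_empty() {
                        statements.push(stmt.to_string());
                    }
                    current.clear();
                    continue;
                }
                '-' if chars.peek() == Some(&'-') => state = State::LineComment,
                '/' if chars.peek() == Some(&'*') => {
                    // Consume the `*` too, so `/*/` is not read as open + close.
                    current.push(c);
                    current.push(chars.next().unwrap());
                    state = State::BlockComment;
                    continue;
                }
                '\'' => state = State::Quoted,
                _ => {}
            },
            State::LineComment => {
                if c == '\n' {
                    state = State::Normal;
                }
            }
            State::BlockComment => {
                if c == '*' && chars.peek() == Some(&'/') {
                    current.push(c);
                    current.push(chars.next().unwrap());
                    state = State::Normal;
                    continue;
                }
            }
            State::Quoted => {
                if c == '\'' {
                    if chars.peek() == Some(&'\'') {
                        // `''` is an escaped quote; stay inside the string.
                        current.push(c);
                        current.push(chars.next().unwrap());
                        continue;
                    }
                    state = State::Normal;
                }
            }
        }
        current.push(c);
    }

    // Keep a final statement that has no trailing `;`.
    let tail = current.trim();
    if !tail.is_empty() {
        statements.push(tail.to_string());
    }
    statements
}
```

Comments are kept in the returned statements rather than stripped, and a trailing comment with no statement after it comes back as its own entry, so a caller may want to filter those out.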

That aside, I thought it was worth noting that this crate brings in an enormous number of dependencies, and there are no cargo features that let users avoid building all of them and everything that comes with them.

Just at a glance (cargo tree -p sql-splitter has 581 lines, so I didn't look very hard):

  • Depending on sql-splitter means depending on duckdb, which to me seems pretty niche. It also means that compiling anything that uses sql-splitter has to build and link libduckdb-sys against a C++ runtime. I discovered this because a build of my own fails with "libstdc++.so.6: cannot open shared object file: No such file or directory", and after a long search I traced it here.
  • The duckdb crate brings in the entirety of arrow.
  • Everything for file compression is a dependency: xz2, bzip2, zstd, flate. I am probably misunderstanding the intended use case here (I guess it's for parsing e.g. pg_dump output, where the database returns a very large number of statements?). If I needed to compress SQL files, which seems unlikely to me at least, I would reach for one algorithm.
    • Since these are largely bindings to C or C++ libraries, they bring in even more building and linking against external runtimes.
  • fake is probably for some testing resources? Could be in dev-dependencies.

Proposed Solution

Allow users to avoid building things that are not relevant.

How should it work?

It's hard to do something like this in a way that wouldn't be considered breaking, but there's probably a way to gate a lot of things behind features. To do it in a way that can avoid a major release, I guess you'd have a feature "all" that enables everything and the default feature default = ["all"].
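A rough sketch of what that could look like in this crate's Cargo.toml. The feature names (`all`, `duckdb`, `compression`) are illustrative assumptions, the `"*"` versions are placeholders rather than the crate's real requirements, and moving fake to dev-dependencies assumes it really is only used for tests:

```toml
# Hypothetical layout; feature names and versions are placeholders.
[dependencies]
duckdb = { version = "*", optional = true }
xz2    = { version = "*", optional = true }
bzip2  = { version = "*", optional = true }
zstd   = { version = "*", optional = true }

[dev-dependencies]
fake = "*"  # assumes it is only needed for tests

[features]
# Default builds keep today's behavior, so this is not a breaking change.
default = ["all"]
all = ["duckdb", "compression"]
duckdb = ["dep:duckdb"]
compression = ["dep:xz2", "dep:bzip2", "dep:zstd"]
```

A downstream user who only wants the splitting logic could then opt out with `sql-splitter = { version = "*", default-features = false }` and enable individual features as needed.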

Alternatives Considered

Maybe some core part (the splitting part) could be its own crate and then depended on here? Dealing with compression could be separated?

The thing is, as far as I can tell, there are only two crates that claim to be able to split a single file. I don't need to split terabytes of SQL, just one file with not very many lines. The other crate says it's only for sqlite. This one appears to be dialect-aware, so, for instance, a comment starting with # in mysql is handled correctly by the parser, whereas ignoring the flavor of SQL syntax would lead to failures.

I dunno. I'm doing this out of order, but I was going to look next to see how reasonable the things I'm suggesting actually are. Maybe now that I know what this crate is for, it makes less sense.

Additional Context

cargo-tree.log

Labels: enhancement