How to use wids on a local dataset #6

@WudiJoey

I am using webdataset==0.2.86 and trying to read a local shard with wids:

import wids

# One local tar shard, declared with its sample count
shards = [dict(url="lines-000000.tar", nsamples=250)]
ds = wids.ShardListDataset(shards, keep=True)
print(ds[0])

Indexing the dataset then fails with this error:

File "/usr/local/lib/python3.6/dist-packages/wids/wids.py", line 515, in __getitem__
    shard, inner_idx, desc = self.get_shard(index)
  File "/usr/local/lib/python3.6/dist-packages/wids/wids.py", line 510, in get_shard
    shard = self.cache.get_shard(url)
  File "/usr/local/lib/python3.6/dist-packages/wids/wids.py", line 348, in get_shard
    itf = IndexedTarSamples(downloaded, source=url)
  File "/usr/local/lib/python3.6/dist-packages/wids/wids.py", line 205, in __init__
    self.reader = MMIndexedTar(tar_file)
  File "/usr/local/lib/python3.6/dist-packages/wids/wids_mmtar.py", line 50, in __init__
    self._build_index()
  File "/usr/local/lib/python3.6/dist-packages/wids/wids_mmtar.py", line 58, in _build_index
    name = header.name.decode("utf-8").strip("\x00")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 2: invalid start byte
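
The traceback fails while decoding a tar header name as UTF-8, so one thing worth checking is whether lines-000000.tar contains a member whose name is not valid UTF-8 (for example, a name written in a Windows code page). The following is a minimal sketch of such a check using only the standard-library tarfile module. The helper name find_non_utf8_names is hypothetical and not part of wids, and this is not a confirmed diagnosis of the error above.

```python
import tarfile


def find_non_utf8_names(tar_path):
    """Return the raw bytes of every member name that is not valid UTF-8.

    Hypothetical helper for diagnosis only; it is not part of wids.
    """
    bad_names = []
    # mode="r:" accepts only uncompressed tar files.
    # errors="surrogateescape" lets tarfile read undecodable name bytes
    # instead of raising, so they can be inspected afterwards.
    with tarfile.open(
        tar_path, mode="r:", encoding="utf-8", errors="surrogateescape"
    ) as tar:
        for member in tar:
            try:
                # Strict encoding fails if the name holds escaped raw bytes.
                member.name.encode("utf-8")
            except UnicodeEncodeError:
                bad_names.append(member.name.encode("utf-8", "surrogateescape"))
    return bad_names
```

Calling find_non_utf8_names("lines-000000.tar") returns an empty list if every member name is valid UTF-8. If tarfile.open raises tarfile.ReadError instead, the file is not a plain uncompressed tar archive, which would be a separate problem to look into.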
