Description
Please describe the problem you'd like to be solved
The Package.files() method iterates over all files in a package using queryset.iterator(). For very large packages (thousands of files), this can cause memory spikes, long-running database queries, OperationalErrors (MySQL error 2006, “MySQL server has gone away”) or other dropped-connection errors, and job failures during ingestion.
Describe the solution you'd like to see implemented
Add a configurable chunk_size parameter to the Package.files() method so that users can reduce the number of rows fetched at a time.
packages.py#L663
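A minimal sketch of what the requested parameter could look like, not the actual MCPServer code. It assumes the value is simply passed through to the iterator() call shown in the traceback; Django's QuerySet.iterator() accepts a chunk_size argument, with a documented default of 2000. The StubQuerySet class, the simplified Package class, and the DEFAULT_CHUNK_SIZE name are hypothetical stand-ins so the example runs without a database.

```python
from typing import Iterator, List, Optional

# Assumption: mirrors Django's documented default for QuerySet.iterator().
DEFAULT_CHUNK_SIZE = 2000


class StubQuerySet:
    """Hypothetical stand-in for a Django QuerySet (no database needed)."""

    def __init__(self, rows: List[str]) -> None:
        self.rows = rows
        self.fetch_sizes: List[int] = []  # size of each simulated fetch

    def iterator(self, chunk_size: int = DEFAULT_CHUNK_SIZE) -> Iterator[str]:
        # Simulate fetching rows in batches of at most chunk_size.
        for start in range(0, len(self.rows), chunk_size):
            batch = self.rows[start:start + chunk_size]
            self.fetch_sizes.append(len(batch))
            yield from batch


class Package:
    """Hypothetical, heavily simplified version of the Package class."""

    def __init__(self, queryset: StubQuerySet) -> None:
        self._queryset = queryset

    def files(self, chunk_size: Optional[int] = None) -> Iterator[str]:
        # Fall back to the default when the caller does not override it.
        if chunk_size is None:
            chunk_size = DEFAULT_CHUNK_SIZE
        if chunk_size <= 0:
            raise ValueError("chunk_size must be a positive integer")
        # Pass the caller's value through to the underlying iterator.
        for file_obj in self._queryset.iterator(chunk_size=chunk_size):
            yield file_obj


# Usage: iterate a 5,000-file package in batches of 500 rows.
queryset = StubQuerySet([f"file-{i}" for i in range(5000)])
package = Package(queryset)
names = list(package.files(chunk_size=500))
```

Keeping chunk_size optional means existing callers of files() would see no change in behaviour, while deployments with very large packages could lower the value.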
Describe alternatives you've considered
Keeping the default chunk size still causes issues for large packages (i.e. more than 20,000 files).
Additional context
Example error encountered when iterating over very large packages:
Traceback shows failure in package.files() during queryset iteration:
  File "/usr/lib/archivematica/MCPServer/server/jobs/client.py", line 215, in submit_tasks
    for file_replacements in self.package.files(
  File "/usr/lib/archivematica/MCPServer/server/packages.py", line 663, in files
    for file_obj in queryset.iterator():
  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/django/db/models/query.py", line 516, in _iterator
    yield from iterable
  File "/usr/share/archivematica/virtualenvs/archivematica/lib64/python3.9/site-packages/MySQLdb/cursors.py", line 95, in _discard
    while con.next_result() == 0:  # -1 means no more data.
MySQLdb.OperationalError: (2006, '')
For Artefactual use:
Before you close this issue, you must check off the following:
- All pull requests related to this issue are properly linked
- All pull requests related to this issue have been merged
- A testing plan for this issue has been implemented and passed (testing plan information should be included in the issue body or comments)
- Documentation regarding this issue has been written and merged
- Details about this issue have been added to the release notes