memory footprint #16

@perryn

I was generating a CSV of about 300,000 rows and seeing a lot of memory usage.

Probably the best approach is to support 'streaming' (writing one row at a time) for large files, rather than generating the whole document in memory and then writing it out. I may have a crack at this when I get time, and send a pull request.

In the meantime I was in a hurry, so I made a very hacky monkey-patch to reduce the memory footprint. It is pretty nasty, but I thought you might be interested.

module CsvShaper
  class Encoder
    # Monkey patch to reduce memory usage while generating the CSV.
    # - Each CSV::Row is converted to a string immediately, because CSV::Row objects have a large footprint.
    # - A single map! replaces the nested calls to map. Nasty side effect: @rows is overwritten with
    #   strings, so this Encoder instance cannot be reused afterwards.
    # This is a quick and nasty solution. A better approach would be to write out the rows as they are
    # generated rather than build the whole CSV document in memory.
    def to_csv(local_config = nil)
      # NOTE: csv_options is computed but not applied below, so the output always uses default CSV formatting.
      csv_options = options.merge(local_options(local_config))
      cols = @header.mapped_columns
      @rows.map! do |row|
        padded_row = CSV::Row.new(row.cells.keys, row.cells.values).values_at(*@header.columns)
        CSV::Row.new(cols, padded_row, false).to_csv
      end
      # Array#to_csv quotes header names that contain commas or quotes and appends the newline.
      cols.to_csv + @rows.join
    end
  end
end
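
For comparison, here is a rough sketch of what the streaming approach might look like, using only Ruby's standard `csv` library. The `stream_csv` helper and its parameters are hypothetical and not part of csv_shaper. It assumes each row can be presented as a Hash keyed by column name. The point is only that each line is written to the IO as soon as it is built, so the full document never sits in memory.

```ruby
require "csv"
require "stringio"

# Hypothetical helper, not part of csv_shaper: writes the header line, then
# one line per row, straight to `io`. Only the current row is held in memory.
def stream_csv(io, columns, rows, **csv_options)
  csv = CSV.new(io, **csv_options)
  csv << columns
  rows.each do |row|
    # Hash#values_at returns nil for missing keys, which pads short rows
    # so every line has one field per column.
    csv << row.values_at(*columns)
  end
  io
end

# A lazy enumerator stands in for the real row source, so rows are
# produced on demand instead of being collected into an array first.
rows = Enumerator.new do |yielder|
  3.times { |i| yielder << { "id" => i, "name" => "row #{i}" } }
end

out = StringIO.new
stream_csv(out, %w[id name email], rows)
puts out.string
```

In real use `io` would be a `File` or a response stream rather than a `StringIO`, and the rows would come from something like a batched database query.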