I was generating a CSV of about 300,000 rows and seeing a lot of memory usage.
Probably the best approach is to support 'streaming' (writing one row at a time) for large files, rather than generating the whole document in memory and then writing it out. I may have a crack at this when I get time, and send a pull request.
In the meantime I was in a hurry, so I wrote a very hacky monkey-patch to reduce the memory footprint. It is pretty nasty, but I thought you might be interested:
    module CsvShaper
      class Encoder
        # Monkey patch to reduce memory usage while generating the CSV.
        # - Each CSV::Row is converted to a string immediately, as CSV::Row objects have a large footprint.
        # - A single map! replaces nested calls to map (note: this has the nasty side effect of making this instance un-reusable).
        # This is a quick and nasty solution. A better approach would be to write out the rows as they are generated
        # rather than build the whole CSV document in memory.
        # Caveat: csv_options is computed but not applied below, and the header line is joined with a plain comma (no quoting).
        def to_csv(local_config = nil)
          csv_options = options.merge(local_options(local_config))
          cols = @header.mapped_columns
          @rows.map! do |row|
            # Pad each row so its values line up with the header columns.
            padded_row = CSV::Row.new(row.cells.keys, row.cells.values).values_at(*@header.columns)
            CSV::Row.new(cols, padded_row, false).to_csv
          end
          cols.join(",") + "\n" + @rows.join
        end
      end
    end
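
For what it's worth, here is a rough sketch of what the 'streaming' idea could look like. It uses only Ruby's standard `CSV` library and is not CsvShaper's API: `stream_csv`, its `headers` argument and the hash-per-row input are all made-up names for illustration. The point is just that each row is written to the IO as soon as it is produced, so the whole document never has to sit in memory.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper (not part of CsvShaper): writes one row at a time to
# any IO-like object, so only the current row is held in memory.
def stream_csv(io, headers, rows, **csv_options)
  csv = CSV.new(io, **csv_options)
  csv << headers
  rows.each do |row|
    # Emit values in header order; columns missing from the row become empty fields.
    csv << headers.map { |h| row[h] }
  end
  io
end

# Usage: rows can be any Enumerable, including a lazy one.
rows = (1..3).lazy.map { |i| { "id" => i, "name" => "user#{i}" } }
out = stream_csv(StringIO.new, %w[id name email], rows)
puts out.string
```

In a real app the `StringIO` would presumably be swapped for a `File` or the response stream, which is where the memory saving would actually come from.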