add log abstraction to store records in disk#15
add log abstraction to store records in disk#15hongchaodeng wants to merge 5 commits intogo-distributed:masterfrom
Conversation
db/message/record.go
Outdated
There was a problem hiding this comment.
There is no need to have rev number persisted on disk.
There was a problem hiding this comment.
No needs to have bufio here since OS already buffers IO for you. Double buffer is evil in database.
There was a problem hiding this comment.
That's interesting. We need to investigate more regarding buffering behavior.
It currently uses bufio to write a single request: (length | record).
There was a problem hiding this comment.
Buffer should only be done in kernel space.
In kernel space, OS's page cache will buffer any incoming writes until you fsync on the file or until the flusher thread kicks in. There is no point buffering them in user space unless you want to speed up the reads for buffered data.
There was a problem hiding this comment.
This isn't about buffering. It's just a way to construct a message to be written into disk.
There was a problem hiding this comment.
You can still construct the message with io.writer.
There was a problem hiding this comment.
How about we don't buffer stuff until we know there is a performance issue without buffering? : ) The code will be much simpler.
There was a problem hiding this comment.
Nope. You can keep a downstream branch without buffer. I don't understand this part in OS so I will keep it in upstream.
There was a problem hiding this comment.
This is not upstream, this is just a pull request towards upstream. I'm just pointing out things that doesn't make sense in the pull request.
There was a problem hiding this comment.
Just a follow up on this. According to KAFKA doc, "As a result of these factors using the filesystem and relying on pagecache is superior to maintaining an in-memory cache or other structure". It sounds OS page-cache reliance is a better option. And thanks for the discussion here.
There was a problem hiding this comment.
Thanks for doing the due diligence before adding a feature :)
To clarify, most of stuff we were discussing here is buffering instead of caching. Caching is a super set of buffering in the sense that you can read from a cache but cannot read from a bufio writer.
bea727b to
5fc79ea
Compare
@xiangli-cmu
This adds log and message abstraction.
I haven't thought about how to batch encoding yet. It had better be considered in snapshotting.