10 Commits

Author SHA1 Message Date
Rafa de la Torre
d7e5c1383f Update v1.2.0-carto.1 release date 2018-06-11 13:30:43 +02:00
Rafa de la Torre
cb2227d159 Merge pull request #1 from CartoDB/performance-tune-copy-to
Improve performance of COPY TO
2018-06-11 13:29:23 +02:00
Rafa de la Torre
7930d1b8dd Add entry to changelog 2018-06-11 13:27:44 +02:00
Rafa de la Torre
e94fefe902 Merge branch 'v1.2-carto' into performance-tune-copy-to 2018-06-11 13:24:09 +02:00
Rafa de la Torre
9293926047 Add a NEWS.carto.md with the changelog 2018-06-11 13:20:53 +02:00
Rafa de la Torre
fd3cc95573 Remove unused var buffer_sent 2018-06-11 12:17:39 +02:00
Rafa de la Torre
922627daaf Small refactor 2018-06-11 12:14:28 +02:00
Rafa de la Torre
61bc713e0c Improve performance of COPY TO #56
Under some circumstances, the COPY TO streaming can be CPU-bound,
particularly when PG holds the result set in memory buffers and the
rows are much smaller than the chunk size (64 KB on my Linux box).

This commit improves the situation by creating a buffer of `chunk`
size and fitting in as many rows as it can before pushing them. This
results in more balanced reads and writes (similar in size, and in
bigger chunks) as well as more frequent calls to the callback, thus
freeing the main loop for other events to be processed and avoiding
starvation.
2018-06-08 15:04:42 +02:00
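
The batching idea this commit message describes can be illustrated outside the library. Below is a minimal, self-contained sketch, not the actual patch: the newline-delimited row format and the `batcher` name are invented for illustration, while the real change parses CopyData protocol messages and also carries a remainder between chunks.

```js
// Sketch of the batching idea (illustrative assumptions: rows are
// newline-delimited; remainder handling for rows split across chunks
// is omitted for brevity).
var Transform = require('stream').Transform

var batcher = new Transform({
  transform: function (chunk, enc, cb) {
    // Copy every complete row from this chunk into one chunk-sized
    // buffer and push once, instead of calling push() once per row.
    var buffer = Buffer.alloc(chunk.length)
    var offset = 0
    var start = 0
    for (var i = 0; i < chunk.length; i++) {
      if (chunk[i] === 0x0a) { // in this sketch, '\n' terminates a row
        var row = chunk.slice(start, i + 1)
        row.copy(buffer, offset)
        offset += row.length
        start = i + 1
      }
    }
    if (offset > 0) this.push(buffer.slice(0, offset)) // one push per chunk
    cb() // returning quickly frees the event loop for other work
  }
})

process.stdin.pipe(batcher).pipe(process.stdout)
```

The point is the single `push()` per incoming chunk: reads and writes stay comparable in size, and the transform callback fires once per chunk instead of the loop stalling on many tiny pushes.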
jeromew
e15feb199a README.md: copy-from error and vacuum #26 2016-08-23 12:10:25 +02:00
jeromew
191a4ec16a Fix documentation of copy-from completion 2016-08-23 11:32:10 +02:00
3 changed files with 37 additions and 3 deletions

NEWS.carto.md (new file)

@@ -0,0 +1,12 @@
+# CARTO's Changelog
+## v1.2.0-carto.1
+Released 2018-06-11
+Bug fixes:
+* Improves performance of COPY TO by sending bigger chunks through low level `push()`. See https://github.com/CartoDB/node-pg-copy-streams/pull/1
+## v1.2.0
+Released 2016-08-22
+Vanilla version v1.2.0 from upstream repository. See https://github.com/CartoDB/node-pg-copy-streams/releases/tag/v1.2.0

README.md

@@ -42,10 +42,15 @@ pg.connect(function(err, client, done) {
   var stream = client.query(copyFrom('COPY my_table FROM STDIN'));
   var fileStream = fs.createReadStream('some_file.tsv')
   fileStream.on('error', done);
-  fileStream.pipe(stream).on('finish', done).on('error', done);
+  stream.on('error', done);
+  stream.on('end', done);
+  fileStream.pipe(stream);
 });
 ```
+*Important*: Even if `pg-copy-streams.from` is used as a Writable (via `pipe`), you should not listen for the 'finish' event and expect that the COPY command has already been correctly acknowledged by the database. Internally, a duplex stream is used to pipe the data into the database connection and the COPY command should be considered complete only when the 'end' event is triggered.
 ## install
 ```sh
@@ -56,7 +61,10 @@ $ npm install pg-copy-streams
 This module __only__ works with the pure JavaScript bindings. If you're using `require('pg').native` please make sure to use normal `require('pg')` or `require('pg.js')` when you're using copy streams.
-Before you set out on this magical piping journey, you _really_ should read this: http://www.postgresql.org/docs/9.3/static/sql-copy.html, and you might want to take a look at the [tests](https://github.com/brianc/node-pg-copy-streams/tree/master/test) to get an idea of how things work.
+Before you set out on this magical piping journey, you _really_ should read this: http://www.postgresql.org/docs/current/static/sql-copy.html, and you might want to take a look at the [tests](https://github.com/brianc/node-pg-copy-streams/tree/master/test) to get an idea of how things work.
+Take note of the following warning in the PostgreSQL documentation:
+> COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will already have received earlier rows in a COPY FROM. These rows will not be visible or accessible, but they still occupy disk space. This might amount to a considerable amount of wasted disk space if the failure happened well into a large copy operation. You might wish to invoke VACUUM to recover the wasted space.
 ## contributing
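
The two README notes in this diff, completion being signalled by 'end' rather than 'finish' and VACUUM after a failed COPY FROM, combine naturally. A hedged sketch follows, reusing the `my_table` / `some_file.tsv` names from the README example; the VACUUM call is an application-level illustration of the quoted PostgreSQL advice, not part of the library:

```js
var fs = require('fs')
var pg = require('pg')
var copyFrom = require('pg-copy-streams').from

pg.connect(function (err, client, done) {
  if (err) return console.error(err)
  var stream = client.query(copyFrom('COPY my_table FROM STDIN'))
  var fileStream = fs.createReadStream('some_file.tsv')
  fileStream.on('error', done)
  stream.on('end', done) // COPY acknowledged by the database only here
  stream.on('error', function (copyErr) {
    // Rows already received by the failed COPY occupy disk space until
    // vacuumed (see the PostgreSQL warning quoted above).
    client.query('VACUUM my_table', function () { done(copyErr) })
  })
  fileStream.pipe(stream)
})
```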

copy-to.js

@@ -42,6 +42,16 @@ CopyStreamQuery.prototype._transform = function(chunk, enc, cb) {
   var messageCode;
   var needPush = false;
+  var buffer = Buffer.alloc(chunk.length);
+  var buffer_offset = 0;
+  this.pushBufferIfneeded = function() {
+    if (needPush && buffer_offset > 0) {
+      this.push(buffer.slice(0, buffer_offset))
+      buffer_offset = 0;
+    }
+  }
   while((chunk.length - offset) >= (Byte1Len + Int32Len)) {
     var messageCode = chunk[offset]
@@ -70,6 +80,7 @@ CopyStreamQuery.prototype._transform = function(chunk, enc, cb) {
       case code.ErrorResponse:
       case code.CopyDone:
+        this.pushBufferIfneeded();
         this._detach()
         this.push(null)
         return cb();
@@ -84,7 +95,8 @@ CopyStreamQuery.prototype._transform = function(chunk, enc, cb) {
       if (needPush) {
         var row = chunk.slice(offset, offset + length - Int32Len)
         this.rowCount++
-        this.push(row)
+        row.copy(buffer, buffer_offset);
+        buffer_offset += row.length;
       }
       offset += (length - Int32Len)
     } else {
@@ -93,6 +105,8 @@ CopyStreamQuery.prototype._transform = function(chunk, enc, cb) {
     }
   }
+  this.pushBufferIfneeded();
   if(chunk.length - offset) {
     var slice = chunk.slice(offset)
     this._remainder = slice