I am using the Couchbase Java SDK and need to update a field in a large number of documents (100k–200k).
The update depends on three fields, and I must update a fourth field inside a transactional context.
I am considering several approaches but I am unsure which is the most efficient and scalable.
Use Case
Couchbase, Java SDK
Update condition: based on fieldA, fieldB, fieldC
Update target: fieldD
Estimated affected documents: 100k–200k
Must run updates in a transaction
Approaches I am considering
1. Keyset pagination + batch update
Fetch document IDs in chunks using keyset pagination.
In each chunk, either:
Run a N1QL UPDATE ... USE KEYS [...] query
Perform KV mutateIn operations
I tried this, but it did not perform any better than a plain N1QL update, so I must have done something wrong.
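A minimal, JDK-only sketch of the keyset-pagination loop in approach 1, so the paging logic can be checked without a cluster. An in-memory TreeSet of IDs stands in for the index scan, and the hypothetical fetchPage mimics what PAGE_QUERY would return. The bucket name and the parameter names ($a, $b, $c, $lastId, $pageSize, $ids, $value) are illustrative placeholders, not taken from a real schema. In real code each statement would presumably be sent through the SDK's query API with named parameters (e.g. cluster.query(stmt, QueryOptions.queryOptions().parameters(...))).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class KeysetPaginationSketch {

    // One page of matching IDs. Keyset pagination filters on the last ID already
    // seen instead of using OFFSET, so later pages are no more expensive than early ones.
    // Placeholder parameters: $a, $b, $c, $lastId, $pageSize (assumed names).
    public static final String PAGE_QUERY =
          "SELECT RAW META(d).id FROM `bucket` d "
        + "WHERE d.fieldA = $a AND d.fieldB = $b AND d.fieldC = $c "
        + "AND META(d).id > $lastId "
        + "ORDER BY META(d).id LIMIT $pageSize";

    // Updates exactly the IDs of one page ($ids and $value are placeholders).
    public static final String BATCH_UPDATE =
          "UPDATE `bucket` USE KEYS $ids SET fieldD = $value";

    // Hypothetical stand-in for running PAGE_QUERY: up to pageSize IDs
    // strictly greater than lastId, in ascending order.
    public static List<String> fetchPage(NavigableSet<String> index, String lastId, int pageSize) {
        List<String> page = new ArrayList<>(pageSize);
        for (String id : index.tailSet(lastId, false)) {
            if (page.size() == pageSize) break;
            page.add(id);
        }
        return page;
    }

    // Walks every page and returns the page sizes so the loop can be verified.
    public static List<Integer> paginate(NavigableSet<String> index, int pageSize) {
        List<Integer> sizes = new ArrayList<>();
        String lastId = ""; // sorts before every non-empty ID
        while (true) {
            List<String> page = fetchPage(index, lastId, pageSize);
            if (page.isEmpty()) break;
            // Real code would run BATCH_UPDATE here with $ids bound to page.
            sizes.add(page.size());
            lastId = page.get(page.size() - 1);
        }
        return sizes;
    }

    public static void main(String[] args) {
        NavigableSet<String> index = new TreeSet<>();
        for (int i = 0; i < 2500; i++) index.add(String.format("doc::%06d", i));
        System.out.println(paginate(index, 1000)); // [1000, 1000, 500]
    }
}
```

Because each page is keyed on META().id rather than on an offset, documents that were already updated are not revisited even if they still match the fieldA/fieldB/fieldC filter.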
2. Query to fetch keys + multi-threaded batch updates
Run SELECT META().id ... WHERE <conditions> to get all matching document IDs.
Execute N1QL update batches across multiple threads.
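A JDK-only sketch of approach 2: split the collected IDs into fixed-size batches and submit each batch to a bounded thread pool. The updateBatch callback is hypothetical and stands in for one UPDATE ... USE KEYS $ids call; the batch size and thread count are arbitrary assumptions, not tuned values. Each batch runs independently here, so this pattern on its own gives no atomicity across batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ParallelBatchSketch {

    // Splits ids into consecutive sublists of at most batchSize elements.
    public static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            batches.add(ids.subList(from, Math.min(from + batchSize, ids.size())));
        }
        return batches;
    }

    // Runs updateBatch for every batch on a fixed pool and sums the returned
    // counts. A failure in any batch is rethrown from Future.get().
    public static int runInParallel(List<String> ids, int batchSize, int threads,
                                    Function<List<String>, Integer> updateBatch) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> batch : partition(ids, batchSize)) {
                futures.add(pool.submit(() -> updateBatch.apply(batch)));
            }
            int total = 0;
            for (Future<Integer> f : futures) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 10_500; i++) ids.add("doc::" + i);
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical stand-in for: UPDATE `bucket` USE KEYS $ids SET fieldD = $value
        int updated = runInParallel(ids, 1000, 4, batch -> {
            calls.incrementAndGet();
            return batch.size();
        });
        System.out.println(updated + " docs in " + calls.get() + " batches"); // 10500 docs in 11 batches
    }
}
```

A fixed pool caps the number of in-flight statements; whether more threads actually help depends on how much concurrent work the query service can absorb.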
3. Single N1QL update statement
UPDATE bucket SET fieldD = <value> WHERE fieldA = ... AND fieldB = ... AND fieldC = ...;
Simplest approach, but I am unsure about its performance and transactional behavior for large datasets.
Questions
Which approach is recommended for updating 100k–200k Couchbase documents in a transaction?
Is keyset pagination + USE KEYS faster or safer than a single large N1QL update?
Does multi-threading improve performance inside a transaction?
Are there best practices for bulk transactional updates using the Java SDK?
