Advanced Options
The Migrate Records step includes several advanced features for handling complex migration scenarios. This article covers each feature in detail.
Mapping Variables
Mapping variables are intermediate calculated fields that are processed during migration but not written to the target table. They allow you to break complex transformations into smaller, reusable steps.
How They Work
- Create a mapping variable with the Add Mapping Variable button in the toolbar above the field mapping table (on narrow panes this action is in the vertical ellipsis menu)
- Define a mapping for the variable (e.g., a formula that extracts a value)
- Reference the mapping variable by name in other field mappings
Use Cases
- Reusable calculations: Compute a value once and reference it in multiple target field mappings, avoiding duplicated logic
- Multi-step transformations: Break a complex transformation into stages — e.g., first extract a JSON value into a mapping variable, then format it in the target field mapping
- Conditional routing: Calculate a flag in a mapping variable, then use it in multiple field mappings to determine output values
Example
Suppose you need to extract a user's department from a JSON payload and use it in two different target fields:
- Create mapping variable
deptwith formula:json_path('$.department.name', [user_data]) - Target field
department_name: Mapping Variable →dept - Target field
dept_code: Formula →dept[:3].upper()
Validation Rules
Validation rules check each record before it is inserted or updated in the target system. They provide a safety net to catch data quality issues during migration.
Configuring Rules
- Open the Migrate Records step and navigate to the validation rules section
- Add a new rule with a condition (formula expression that evaluates to
TrueorFalse) - Set the failure action
Failure Actions
| Action | Behaviour |
|---|---|
| Skip Record | The record is skipped (not inserted/updated). A log entry is created. Migration continues. |
| Log Warning | The record is processed normally, but a warning is logged. Useful for flagging potential issues without blocking migration. |
| Log Error | An error is logged against the record. Depending on your error handling configuration, this may stop the step or be recorded against the record and execution continues. Use for critical data integrity violations that must be investigated. |
Example Rules
-
Required field:
[email] is not None and [email] != ''— Skip Record when no email is set -
Value range:
int([priority]) >= 1 and int([priority]) <= 5— Log Error on invalid priority values -
Format check:
'@' in str([email])— Log Warning on emails without an @ symbol
Formula Validation
Rule conditions are checked for broken variable references in the editor. If a rule references a procedure variable that has since been renamed or removed, the Migrate Records step reports a validation error identifying the affected rule, so the problem is caught before the migration runs rather than at execution time. Update the formula to reference a valid variable to clear the error.
Attachment Migration
When migrating records that have associated file attachments (common in ServiceNow, Jira, and other systems), you can enable automatic attachment migration.
How It Works
- Enable the attachment migration toggle on the Migrate Records step
- When a record is migrated, PB checks for attachments on the source record
- Attachments are downloaded from the source system and uploaded to the corresponding target record
Considerations
- Performance: Attachment migration is significantly slower than record migration due to file download/upload operations. Large attachments (10+ MB) may take several seconds each.
- Storage: Ensure the target system has sufficient storage for migrated attachments.
- Supported systems: Attachment migration is available for ServiceNow and other systems with attachment APIs.
Batch Sizing
Batch sizing controls how many records are sent to the target system in each API call or database operation.
Trade-offs
| Setting | Effect |
|---|---|
| Smaller batches (e.g., 10-50) | More API calls, less memory usage, easier to pinpoint errors, slower overall |
| Larger batches (e.g., 200-500) | Fewer API calls, more memory usage, faster overall, harder to isolate errors |
Guidelines
- Default: The default batch size is appropriate for most migrations
- API rate limits: If you're hitting rate limits (429 errors), reduce batch size
- Memory constraints: If you're running out of memory on large records (e.g., records with big text fields), reduce batch size
- Database migrations: Database-to-database migrations can typically handle larger batch sizes (500+)
Keyset Pagination
Keyset pagination is an alternative to traditional offset-based pagination for extracting records from the source system.
Purpose
Traditional offset pagination (OFFSET 1000 LIMIT 100) can produce inconsistent results if records are added or removed from the source table during extraction. Keyset pagination avoids this by using the source table's primary key to track position.
How It Works
Instead of "skip N records, take M", keyset pagination uses "get records where key > last_seen_key, take M". This is reliable even when the source data changes during extraction.
When to Use
- Large datasets: 100,000+ records where extraction takes minutes or hours
- Active source systems: When records may be created/modified during extraction
- Reliable ordering needed: When you need deterministic, repeatable extraction
Configuration
Keyset pagination is a single toggle on the Migrate Records step — there is no separate "key field" selector. PB uses the source table's pre-configured primary key (e.g. sys_id for ServiceNow, the primary key column for databases) as the keyset cursor.
The toggle is only available when the source table's adaptor declares keyset pagination support in its metadata. If the toggle is disabled or absent, the adaptor or the specific table does not support keyset pagination.
Sampling Validation
Sampling validation checks the accuracy of a migration by re-querying a random sample of the already-migrated records from the target system and comparing each retrieved value against what PB expected to write. It is a post-migration safety check; for a pre-migration dry-run of the same step against a sample of source records (no writes), see Previewing a Migration.
To limit how many records get processed, use the Limit option on the Filter tab, or apply a tightly-scoped filter (see Common Filter Patterns) to extract a small subset for testing.
How to Use
- Open the Migrate Records step and go to the Validation tab
- Enable Sampling Target Check
- Set the Sampling Percentage (1–100). 5–10% is a sensible starting point; higher percentages produce more confidence but cost more time and API calls
- Run the step. After the records are migrated, PB queries the target for a random sample and reports any mismatches in the execution report
You can also exclude individual field mappings from sampling validation (e.g. fields the target system normalises or transforms server-side) by toggling Exclude from sampling validation on the field mapping.
Benefits
- End-to-end confirmation: Detects drift between what PB sent and what the target system actually stored — including silent server-side transforms, truncation, or type coercion.
- Per-field reporting: Mismatches are reported field-by-field so you can pinpoint mapping errors.
- Configurable scope: Tune the percentage to balance confidence against migration time.
Per-Chunk Scripts
Per-chunk scripts run a user-authored Python script in an isolated subprocess once per chunk of records during a Migrate Records step. The script receives the chunk's source and target records, can mutate the target records, and can write artifact files to the execution.
This is the same script runner as the Run Script step, with two differences:
- The script is invoked once per chunk instead of once per execution.
-
pb.batchis populated with the chunk being processed, andpb.set_batchwrites (optionally mutated) target records back into the step.
A Migrate Records step can carry at most one script per timing slot (zero, one, or two scripts in total).
Timing Slots
| Timing | When it runs | Mutations |
|---|---|---|
Pre-Insert (per_chunk_pre_insert) |
After target records have been assembled by the field mappings, before the chunk is upserted to the target. | Mutations written via pb.set_batch flow through into what is sent to the target. |
Post-Insert (per_chunk_post_insert) |
After the chunk has been written to the target. Target primary keys assigned by the target system are populated on the records. | Mutations are accepted but only affect in-memory state used by downstream validation, attachments, and output collection. They do not write back to the target. |
Use pre-insert to apply last-mile transformations a formula can't express (for example, a transformation that needs a Python library, or a cross-record calculation against the full chunk). Use post-insert to observe the post-upsert state of records, write per-chunk artifact files, or accumulate summary data across chunks.
Records Visible to the Script
Records that errored or were skipped earlier in the step (by validation rules, key mapping rules, or field mapping failures) are excluded from pb.batch. The script cannot see them and cannot resurrect them. Scripts that return a batch must return exactly the records they were given — inserting or removing records from the chunk is not supported.
The Batch Payload
The chunk is exposed to the script as pb.batch:
pb.batch = {
"chunk_index": 0, # int, zero-based
"total_chunks": 5, # int | None, when knowable
"source_records": [
{"data": {...}, "primary_key_value": ...},
...
],
"target_records": [
{"data": {...}, "primary_key_value": ...},
...
],
}
To mutate target records, modify target_records and call pb.set_batch:
for record in pb.batch["target_records"]:
record["data"]["full_name"] = (
f"{record['data']['first_name']} {record['data']['last_name']}"
)
pb.set_batch({"target_records": pb.batch["target_records"]})
If the script does not call pb.set_batch, the in-memory records are left untouched and the chunk proceeds with whatever the field mappings produced.
Configuration
- Open the Migrate Records step and go to the Scripts tab on the Advanced area.
- Add a script and choose a timing slot (Pre-Insert or Post-Insert).
- Edit the script body in the in-product code editor. Each timing slot opens with a starter template documenting the available
pb.*API and the batch shape. - Set the Timeout (seconds) for each invocation (default 60, max one hour). The script is killed if any single chunk invocation exceeds the timeout, which fails the step.
Failure Handling
A failed per-chunk script aborts the step. The chunk does not proceed to upsert (for pre-insert) or to downstream handling (for post-insert), and the execution status is set to failed. Stdout, stderr, and any pb.log output from the script are captured in the execution log so you can diagnose the failure.
Artifacts
Per-chunk scripts have access to both artifact surfaces exposed by the script runner:
-
pb.artifact_path(filename)writes into this step's artifact folder, surfaced under Step Artifacts for this step in the execution viewer. -
pb.shared_artifact_path(filename)writes into the procedure-level shared folder, surfaced under Procedure Artifacts at the top of the execution's Artifacts page. Use this when several steps in the same procedure contribute to one output (for example, a running summary appended by both pre-insert and post-insert scripts), or when a downstream step needs to read a file produced by an earlier one without hard-coding the per-step layout.
The viewer deduplicates by filename and reflects the latest on-disk size, so an append-only summary written by post-insert grows safely across chunks. See Step Type: Run Script for the full pb API.
Related Documentation
- Migrate Records Overview — introduction to the Migrate Records step
- Configuring Field Mappings — setting up field mappings
- Formula Language — formula syntax for validation rules and mapping variables
-
Step Type: Run Script — stand-alone version of the same script runner, including the full
pbmodule API - Performance Optimization — tuning batch size and pagination
Comments
0 comments
Please sign in to leave a comment.