prosync — utility to replay changes to a source database on a destination database of a different type
prosync config generate [option...]
prosync init [option...]
prosync run [option...]
prosync complete [option...]
prosync is a utility to perform the change data capture (CDC) of a source database and replay the changes on a destination database of a different type.
prosync is not responsible for:
Database schema transfer.
Transferring data that is missing from log files.
Data verification after the data transfer.
prosync participates in the hot DB migration (refer to Section 1.3 for more details).
To prepare for the migration of a hot DB, perform the following steps:
Prepare and set up the destination DB for the migration:
Create the necessary schemas and tables.
Remove or turn off triggers.
Generate the configuration file:
prosync config generate -o filename
Update the configuration file with appropriate values.
Initialize prosync with the command:
prosync init -f filename
For an Oracle source, prosync will store the current SCN to start the subsequent replication with this value. For a PostgreSQL/Postgres Pro source, prosync will create a logical replication slot for subsequent reading of changes.
Perform the migration as follows:
Launch the procopy load:
procopy load -f filename
When it finishes, launch the prosync replay of changes:
prosync run -f filename
Wait until the difference between Parsing SCN and
Oracle latest SCN decreases to an admissible value.
For the migration of a hot DB, it is critical for this difference
to decrease:
Parsing SCN 3671057 Oracle latest SCN 4995972
Disable the load on the source DB and wait until Parsing SCN
and Oracle latest SCN virtually stop changing.
Stop prosync.
Changes always occur in Oracle even if
nobody changes the data, so Oracle latest SCN
is continuously growing. Therefore, the difference between
Parsing SCN and Oracle latest SCN
never reaches zero.
Table Type #
When executing tasks of the TABLE type (refer to the section called “Types of Tasks” for details) during data migration
involving prosync, do not exclude (through
exclude_columns
or include_columns)
columns that are used by LogMiner as a
row key. Otherwise, executing DELETE or UPDATE
commands can change more than one row.
For tables without keys (heap), do not exclude any columns as LogMiner uses the whole row for identification.
These rules do not apply for data migrations that are only performed by procopy.
When migrating the PostgreSQL/Postgres Pro DB, a replication slot is needed for running prosync replay of changes. Therefore:
This section explains how prosync works with Shardman as a source.
For example, in the YAML configuration file, specify:
source:
driver_name: shardman
shards:
shard-1:
dsn: postgresql://host1:5432,host2:5432/postgres?user=postgres&password=fqs291dbb&target_session_attrs=prefer-standby
shard-2:
dsn: postgresql://host3:5432,host4:5432/postgres?user=postgres&password=fqs291dbb&target_session_attrs=prefer-standby
Where shard-1 and shard-2 are
shard names and dsn parameters specify BiHA nodes in
these shards.
When source.driver_name
is shardman, configuration of each shard must
be explicitly specified. No topology autodetection takes place.
However, refer to the section called “Partially Specifying Shards”
for more details.
Possible values of target_session_attrs
in dsn are as follows:
read-write, primary —
the connection is only to the master. Reconnects to master will occur
(when the master changes or is inaccessible).
read-only — the connection is always to a
standby. Reconnects to other standbys will occur (if the current node
was promoted to master or is inaccessible).
prefer-standby — the preferred connection
is to a standby. If the BiHA cluster has more standbys, reconnects
to them will occur. If only the master is left, a reconnect to it
will occur.
Configure prosync_options as follows:
Set use_replica_auto_sync_slots
to true if you need failover replication slots and replication
from a standby in the BiHA cluster.
Set use_failover_slots
to true if you need failover replication slots and replication
only from the master.
It is recommended to set use_replica_auto_sync_slots
to true and in dsn, set target_session_attrs
to prefer-standby.
By default, exclusion of shards from the configuration is prohibited as
it can cause data loss in the destination when working with sharded and
local tables. However, running prosync
init or run with the
--allow-partial-shardman-read option allows you to exclude
some shards. With this option, prosync will
work without specifying all the shards in the configuration file. This is
useful when:
Data is manually distributed across shards and is located, for example, in specific local tables.
Data is located only in global tables, or data from these tables is only needed.
Global tables from only one shard are processed. The shard is chosen by sorting shard names in the configuration file.
Two replication slots are created on the chosen shard:
A slot for global tables only.
A slot for sharded and local tables.
On the other shards, replication slots for sharded and local tables are only created.
By default, processing of local table is prohibited as the result of this
processing in the destination is unpredictable. However, the
prosync init or
run command with the
--allow-shardman-local-tables option processes local
tables. This is useful when:
At the business logic level, local tables on all shards contain different data.
The local table is on one of the shards.
Tasks of the Schema or Table types
are supported. For more details, refer to the section called “Types of Tasks”.
If a local table located on one of the shards is specified in the configuration file, a replication slot that includes this table will automatically be created on this shard.
So for different local tables, replication slots with different names and different table lists will be created on different shards.
prosync run outputs Shardman-specific statistics. Refer to the section called “Output Statistics” for details.
This section describes prosync commands. Optional parameters are enclosed in square brackets.
prosync config generate [-f|--formatjson|yaml] [-o|--outputfilename] [-c|--source_config] [global_options]
Generates a prosync configuration file.
-f json|yaml--format json|yamlSpecifies the format of the configuration file: JSON or YAML. The default is YAML.
-o filename--output filenameSpecifies the filename of the configuration file. By default, the configuration file will be output to stdout.
-c--source-configUse the procopy configuration file as the basis of the prosync configuration file and add missing options there.
prosync init -f|--filefilename[-u|--update-config] [--allow-partial-shardman-read] [--allow-shardman-local-tables] [global_options]
Gets the current SCN from the source database. All the hosts specified in the configuration file must be accessible.
-f filename--file filenameName of the configuration file, where the connection options are specified.
-u--update-configUse the procopy configuration file as the basis and add missing options there.
--allow-partial-shardman-readRun without the specification of all cluster shards in the configuration file. This may result in data loss.
--allow-shardman-local-tablesProcess local tables of a Shardman cluster. This may result in inconsistent data in the destination.
prosync run -f|--filefilename[--log-levelerror|warn|info|debug] [--dry-run|--read-only] [-c|--clear] [--with-table-stats] [--with-dev-stats] [--allow-key-exclude] [--allow-partial-shardman-read] [--allow-shardman-local-tables] [--final-scn] [--skip-scn-save] [global_options]
Launches the replay of changes to the source DB on the destination DB. Outputs statistics on the replay.
-f filename--file filenameName of the configuration file, where the connection options are specified.
--log-level error|warn|info|debug
Logging level. The default is info.
--dry-runTurns off writing to the destination. prosync will perform all the operations except writing. This flag is useful for checking the performance of writing.
--read-onlyTurns off all the prosync logic except reading from the source. This flag is useful for checking the performance of reading.
-c--clearClear the screen before each next output of the statistics. When enabled, it is recommended to output logs to a file, as follows:
prosync run ... -c 2>prosync.log
--with-table-statsCollect and report the progress per tables and operations. Refer to the section called “Output Statistics” for details.
--with-dev-statsCollect and report the statistics for development. Refer to the section called “Output Statistics” for details.
--allow-key-excudeAllows excluding key columns from replication. If enabled, prosync may not work correctly for tables with unique constraints and heap tables when including or excluding certain columns. Set to true with caution and at the risk of data loss.
--allow-partial-shardman-readRun without the specification of all cluster shards in the configuration file. This may result in data loss.
--allow-shardman-local-tablesProcess local tables of a Shardman cluster. This may result in inconsistent data in the destination.
--final-scn stringSCN to stop the syncronization. The default is “0”.
--skip-scn-saveSkip saving the SCN.
The following statistics is output regardless of the options:
Total Measures — general statistics as a table:
Read — number of operations read from
the source.
Write — number of operations written to
the destination.
Transactions — number of transcations read.
Read bytes — number of bytes read.
Write bytes — number of bytes written.
Shards — statistics on shards as a table (for Shardman source):
Shard name — name of a shard. If global tables are transferred,
a row with the shard name and the “_global” postfix
is added with the statistics on transferring global tables from
this shard.
State — name of the current status: receiveLogs
— reading or waiting for reading data from the source;
processLogs — processing the read data;
reconnecting — reconnection.
Time lag — time of the destination lagging behind the source.
Computed as the difference between the completion time of the
last transferred operation in the source and the current
time on the source server.
Bytes lag — destination lagging behind
the source in bytes.
Computed as the difference between the LSN of the last transferred
operation in the source and the current
LSN on the source server. The operations on all the tables on
the source server are taken into account.
Avg 60 sec, speed — the average speed of
writing bytes to the destination for 60 seconds.
Additional — more general statistics:
Confirmed data time — time of the transferred
data on the source server. Computed as the current source time
minus the maximum Time lag. This value can be
useful to check data in the source and destination.
App time — current time in the
prosync application.
App duration — time of
the prosync operation since the
process start.
If --with-table-stats is specified, more statistics
on tables is output:
Tables Measures — more statistics on tables as a table:
<schema>.<table_name> Insert —
number of transferred Insert operations, where
<schema>.<table_name> is the name of the schema and table
in the destination and Insert is the type of the
transferred operation. Each transferred operation is in a separate
row.
<schema>.<table_name> Write bytes
— number of bytes transferred.
If --with-dev-stats is specified, more statistics
is output:
Total Measures:
Read time — Time spent reading data.
Process time — Time spent processing data.
Write time — Time spent writing data.
Write batch count — Number of batches
written to the destination.
Write, current applying batch importers —
Number of loaders that are currently writing batches.
Additional
Latest batch insert time —
Time the last batch was written.
prosync complete [-ffilename] [global_options]
Clears the replication slot used by run.
The slot name is determined from the configuration file,
whose default name is config.yaml.
--log-level string
Log level of the command. Possible values are error,
warn, info, and debug.
The default is info.