Cross-Database Migration in One Command: A Practical Guide to ingestr Data Pipelines

22 views 0 likes 0 comments 19 minutesOriginalTutorial

Learn how to use ingestr to migrate data between databases (e.g., MySQL to DuckDB) with a single CLI command. This tutorial covers installation, full & incremental sync strategies, automatic type mapping, and troubleshooting for building lightweight ETL pipelines.

#Data Migration # ETL # ingestr # Data Pipeline # DuckDB # MySQL # CLI Tools # Data Engineering
Cross-Database Migration in One Command: A Practical Guide to ingestr Data Pipelines

Cross-Database Migration in One Command: A Practical Guide to ingestr Data Pipelines

Last week, a data analyst from our team approached me: "Hey, could you help me pull the user behavior table from our production MySQL to my local machine? I want to run some analytical queries with DuckDB."

I paused for a moment. Cross-database data migration is one of the most common tasks in data engineering, but it's always a bit of a hassle. Write a custom script? You have to install drivers, handle connection pooling, batch reads, and manage type mapping. Use Airflow? Too heavy for a one-off task. Dbt? It's great for transformation, not for moving data. If you just need to occasionally move a table from point A to point B, isn't there a lighter solution?

This week, I discovered a tool called ingestr. It allowed me to perform a full migration from MySQL to DuckDB with a single command. Even incremental sync only required adding one extra parameter. Today, I'm documenting the complete hands-on workflow. Follow along, and you'll have your first data pipeline up and running in under 15 minutes.


Prerequisites

Before we begin, ensure you have the following:

  • OS: macOS / Linux / WSL (This guide uses macOS, but Linux commands are identical)
  • Python: 3.9+ (We'll use pip for the simplest installation path)
  • A Source Database: Can be a local MySQL/PostgreSQL or a remote instance. Later in this guide, I'll show you how to quickly spin up a MySQL container with test data using Docker.
  • A Target Database: We'll use DuckDB as an example (a local file-based database with zero configuration, perfect for demos).

If you already have a MySQL or Postgres instance, skip to the next step. If not, spend just 2 minutes running a Docker container.


Step 1: Install ingestr

The official documentation provides two installation methods. I highly recommend using pip. Why? Because Python environments are readily available on almost all developer machines, and ingestr relies on dlt (data load tool) under the hood. pip will handle all dependencies automatically.

bash 复制代码
pip install ingestr

After installation, verify it:

bash 复制代码
ingestr --version

If you see the version number, you're good to go. If you prefer Homebrew or want a standalone binary, you can also run their install script: curl -LsSf https://getbruin.com/install/ingestr | sh. The result is the same.

Why pip instead of compiling from source? Although ingestr is written in Go, you don't need to worry about it. The pip package already bundles the compiled binary and the Python glue layer. For end users, pip install is the path of least resistance.


Step 2: Prepare the Source Data

To make this tutorial reproducible, we'll use Docker to spin up a MySQL instance and populate it with some test data. This command launches MySQL 8.0 and automatically runs the initialization SQL:

bash 复制代码
docker run -d --name demo-mysql -e MYSQL_ROOT_PASSWORD=root123 -e MYSQL_DATABASE=appdb -p 3306:3306 mysql:8.0

Once it's running, connect to it and create a table with sample data:

bash 复制代码
docker exec -i demo-mysql mysql -uroot -proot123 appdb <<EOF
CREATE TABLE users (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(200),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO users (name, email) VALUES
  ('Zhang San', 'zhangsan@example.com'),
  ('Li Si', 'lisi@example.com'),
  ('Wang Wu', 'wangwu@example.com'),
  ('Zhao Liu', 'zhaoliu@example.com'),
  ('Qian Qi', 'qianqi@example.com');
EOF

Now we have an appdb.users table with 5 records. This is precisely what we'll migrate.


Practice: One-Command Migration (MySQL → DuckDB)

The core command of ingestr is ingestr ingest. Its design philosophy is beautifully straightforward: use URIs to tell it where to pull from and where to push to, specify a table name, and you're done.

bash 复制代码
ingestr ingest \
  --source-uri 'mysql://root:root123@127.0.0.1:3306/appdb' \
  --source-table 'appdb.users' \
  --dest-uri 'duckdb:///mydata.duckdb' \
  --dest-table 'appdb.users'

Breaking down the parameters:

Parameter Purpose
--source-uri Source database connection string (SQLAlchemy compatible format)
--source-table Table to migrate, formatted as database.table
--dest-uri Target connection string. duckdb:/// indicates a local DuckDB file
--dest-table Target table name

Hit Enter, and you'll see output similar to this (actual time depends on data volume):

复制代码
2026-06-04 10:15:32 INFO Starting ingestion from mysql to duckdb
2026-06-04 10:15:33 INFO Copied 5 rows in 0.8s

Your data is now sitting inside mydata.duckdb. Let's verify it:

bash 复制代码
## The simplest way is to use DuckDB CLI or a quick Python script:
python -c "import duckdb; conn = duckdb.connect('mydata.duckdb'); print(conn.execute('SELECT * FROM users').fetchall())"

You'll see all 5 complete records, including the created_at timestamp field that was automatically migrated. ingestr handled the type conversion seamlessly, so you didn't have to write any mapping logic.


Advanced: How to Handle Incremental Syncs?

Full migration is just the first step. In real-world production environments, you definitely don't want to re-pull an entire table every single time. ingestr supports three incremental strategies:

  • append: Appends new data (ideal when the source table has auto-incrementing IDs or timestamps)
  • merge: Upserts based on primary keys (useful when duplicate data needs updating)
  • delete+insert: Deletes and re-inserts (suitable for refreshing small tables completely)

Let's say we add two new records to the users table and only want to sync the incremental part:

bash 复制代码
ingestr ingest \
  --source-uri 'mysql://root:root123@127.0.0.1:3306/appdb' \
  --source-table 'appdb.users' \
  --dest-uri 'duckdb:///mydata.duckdb' \
  --dest-table 'appdb.users' \
  --incremental-strategy append \
  --incremental-key id

Here, we added two parameters:

  • --incremental-strategy specifies the append mode.
  • --incremental-key tells ingestr which field to track for increments. Since id is an auto-incrementing primary key, ingestr remembers the maximum ID from the last sync and only pulls rows larger than that next time.

Pro Tip / Pitfall Alert: --incremental-key must be a monotonically increasing field (like an auto-incrementing primary key or created_at). If your source table uses UUIDs as primary keys, incremental sync won't work. You'll need to switch to the merge strategy or use a timestamp field as the incremental key.


Troubleshooting & FAQs

1. Can't connect to the database?

Double-check the port and password in your connection string. Since ingestr uses SQLAlchemy-style URIs, remember to URL-encode special characters in passwords (e.g., @, /). For example, if your password is p@ss, encode it as p%40ss.

2. Large table migration hangs?

ingestr batches data pulls by default, but if a single table contains millions of rows, use the --batch-size parameter to control batch size. For example: --batch-size 10000. Also, ensure your source database has sufficient connection limits and read permissions.

3. How to authenticate with the target database?

For cloud data warehouses like BigQuery or Snowflake, you can pass credential file paths or environment variables in the connection string. For BigQuery, for example:

复制代码
--dest-uri 'bigquery://my-project?credentials_path=/path/to/service-account.json'

ingestr supports 20+ data sources and targets. Check the official docs for specific connection string formats.

4. Can I automate and schedule this?

ingestr is a CLI tool and doesn't include a built-in scheduler. The simplest approach is using cron:

cron 复制代码
0 2 * * * /path/to/python -m ingestr ingest --source-uri ... --dest-uri ...

For more complex workflows, integrate it with Airflow, Prefect, or Bruin (ingestr's companion product from the same company).


Summary

Today, we accomplished three key tasks:

  1. Installed ingestr in one step with pip install ingestr
  2. Migrated a MySQL users table to DuckDB completely with a single command
  3. Understood how to use --incremental-strategy and --incremental-key for incremental syncs

The core value of ingestr lies in encapsulating ETL complexity into a single command. You don't need to write Python scripts, manage connection pools, or handle type mappings manually. Just tell it where to get the data and where to put it, and it moves it for you. For backend developers and data engineers who frequently need cross-database syncs for analytics, this tool delivers significant efficiency gains.

Next Steps:

  • Try importing a local CSV file (--source-uri 'file:///path/to/data.csv') into PostgreSQL.
  • Explore the --dest-write-disposition parameter to understand replace / append / merge write strategies.
  • Check out the official ingestr documentation for a complete list of supported sources, targets, and connection string formats.

When data migration becomes this simple, you can focus your energy on extracting value from the data itself. May your data pipelines run smoothly.

Last Updated:2026-06-04 10:04:47

Comments (0)

Post Comment

Loading...
0/500
Loading comments...