
Software Practice Lead
You've created tables, evolved their schemas, and partitioned them for performance. Now let's talk about one of Iceberg's killer features: time travel.
Every write operation in Iceberg creates a snapshot—an immutable record of the table's state at that moment. These snapshots aren't just for show. You can query them. You can roll back to them. You can use them to answer questions like "what did this table look like yesterday?" or "what was the customer's balance before that accidental UPDATE?"
In this exercise, you'll make changes to a table, explore its snapshot history, and query previous versions of the data.
By the end of this exercise, you will be able to:
Start the environment if needed:
docker compose up -dRun Spark SQL:
docker compose exec -it spark-iceberg spark-sql --conf "spark.hadoop.hive.cli.print.header=true"Let's create a fresh table for this exercise. We'll use a smaller dataset so the snapshots are easier to follow:
CREATE NAMESPACE IF NOT EXISTS demo.ecommerce;
USE demo.ecommerce;
CREATE TABLE customers (
customer_id BIGINT,
name STRING,
email STRING,
balance DECIMAL(10, 2),
status STRING
)
USING iceberg
TBLPROPERTIES ('format-version'='2', 'write.format.default'='parquet');Insert some initial data:
INSERT INTO customers VALUES
(1, 'Alice Chen', 'alice@example.com', 1500.00, 'active'),
(2, 'Bob Smith', 'bob@example.com', 2300.50, 'active'),
(3, 'Carol Jones', 'carol@example.com', 890.25, 'active'),
(4, 'David Lee', 'david@example.com', 4200.00, 'active'),
(5, 'Eva Martinez', 'eva@example.com', 150.75, 'inactive');Check your starting point:
SELECT * FROM customers ORDER BY customer_id;Now let's make changes. In a real system, these might happen over days or weeks. We'll compress that timeline a bit.
Update 1: Bob pays off some of his balance:
UPDATE customers
SET balance = 1800.50
WHERE customer_id = 2;Update 2: Carol makes a big purchase:
UPDATE customers
SET balance = 2890.25
WHERE customer_id = 3;Uh oh: Someone runs a bad update—accidentally zeroing out all balances:
UPDATE customers
SET balance = 0.00;Check the damage:
SELECT * FROM customers ORDER BY customer_id;All balances are now zero. In a traditional database, you'd be scrambling for backups. With Iceberg, we have options.
Let's see every version of this table:
SELECT
snapshot_id,
parent_id,
committed_at,
operation
FROM demo.ecommerce.customers.snapshots
ORDER BY committed_at;You should see four snapshots:
Note the snapshot_id values as you'll need them in a moment. Also note the timestamps in committed_at.
Let's look at the data before the disaster. Find the snapshot ID from just before the mass update (the second overwrite in the snapshots table result from above), and query it:
-- Replace <SNAPSHOT_ID> with the actual ID from your output
SELECT *
FROM customers VERSION AS OF <SNAPSHOT_ID>
ORDER BY customer_id;customer_id name email balance status
1 Alice Chen alice@example.com 1500.00 active
2 Bob Smith bob@example.com 1800.50 active
3 Carol Jones carol@example.com 2890.25 active
4 David Lee david@example.com 4200.00 active
5 Eva Martinez eva@example.com 150.75 inactiveThere's your data, intact. Alice still has $1,500. Bob has $1,800.50 (after his payment). Carol has $2,890.25 (after her purchase).
You can also query by timestamp. This is often more practical—you might not know the snapshot ID, but you know "the data was correct at 2pm yesterday":
-- Replace with a timestamp between snapshot 3 and 4
SELECT *
FROM customers TIMESTAMP AS OF '2026-01-07 12:54:50'
ORDER BY customer_id;Adjust the timestamp based on your committed_at values. Any timestamp after snapshot 3 but before snapshot 4 will show you the pre-disaster state.
Looking at old data is nice, but sometimes you need to actually undo the damage. Iceberg supports rolling back to a previous snapshot.
In Spark SQL, you use the rollback_to_snapshot procedure:
CALL system.rollback_to_snapshot('demo.ecommerce.customers', SNAPSHOT_ID);Replace SNAPSHOT_ID with the snapshot before the mass update.
Now check the current state:
SELECT * FROM customers ORDER BY customer_id;The balances are restored. The bad update is undone.
Important: Rollback updates the table's current snapshot pointer to point back to the previous snapshot—it doesn't delete the bad snapshot or create a new one. You can still time travel to see the disaster if you need to. Check:
SELECT
snapshot_id,
committed_at,
operation
FROM demo.ecommerce.customers.snapshots
ORDER BY committed_at;You'll still see the same four snapshots as before, but the table now uses snapshot 3 as its current state.
You can verify that by running:
SELECT * FROM demo.ecommerce.customers.refs;See how main branch points to the snapshot you rolled back to.
Time travel is great for debugging. Let's compare what changed between two snapshots.
SELECT
curr.customer_id,
curr.name,
hist.balance AS balance_before,
curr.balance AS balance_after,
curr.balance - hist.balance AS difference
FROM customers curr
JOIN customers VERSION AS OF FIRST_SNAPSHOT_ID AS hist
ON curr.customer_id = hist.customer_id
ORDER BY curr.customer_id;You should see:
customer_id name balance_before balance_after difference
1 Alice Chen 1500.00 1500.00 0.00
2 Bob Smith 2300.50 1800.50 -500.00
3 Carol Jones 890.25 2890.25 2000.00
4 David Lee 4200.00 4200.00 0.00
5 Eva Martinez 150.75 150.75 0.00This pattern is invaluable for auditing changes or figuring out exactly what went wrong.
Exit Spark SQL:
exit;If you're done for the day:
docker compose down -vIn Exercise 5, we'll explore tagging and branching—creating named references to snapshots and implementing Write-Audit-Publish workflows.
We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.