Gilles Philippart profile picture  (round 128px)

Gilles Philippart

Software Practice Lead

Exercise 4: Time Travel

Overview

You've created tables, evolved their schemas, and partitioned them for performance. Now let's talk about one of Iceberg's killer features: time travel.

Every write operation in Iceberg creates a snapshot—an immutable record of the table's state at that moment. These snapshots aren't just for show. You can query them. You can roll back to them. You can use them to answer questions like "what did this table look like yesterday?" or "what was the customer's balance before that accidental UPDATE?"

In this exercise, you'll make changes to a table, explore its snapshot history, and query previous versions of the data.

Learning Objectives

By the end of this exercise, you will be able to:

  • View a table's snapshot history
    • Query data as it existed at a specific point in time
    • Use both snapshot IDs and timestamps for time travel
    • Roll back a table to a previous state

Prerequisites

  • Completed Exercise 3 (or have the environment running)
    • About 20 minutes

Step 1: Set Up

Start the environment if needed:

docker compose up -d

Run Spark SQL:

docker compose exec -it spark-iceberg spark-sql --conf "spark.hadoop.hive.cli.print.header=true"

Let's create a fresh table for this exercise. We'll use a smaller dataset so the snapshots are easier to follow:

CREATE NAMESPACE IF NOT EXISTS demo.ecommerce;
USE demo.ecommerce;

CREATE TABLE customers (
    customer_id BIGINT,
    name STRING,
    email STRING,
    balance DECIMAL(10, 2),
    status STRING
)
USING iceberg
TBLPROPERTIES ('format-version'='2', 'write.format.default'='parquet');

Insert some initial data:

INSERT INTO customers VALUES
    (1, 'Alice Chen', 'alice@example.com', 1500.00, 'active'),
    (2, 'Bob Smith', 'bob@example.com', 2300.50, 'active'),
    (3, 'Carol Jones', 'carol@example.com', 890.25, 'active'),
    (4, 'David Lee', 'david@example.com', 4200.00, 'active'),
    (5, 'Eva Martinez', 'eva@example.com', 150.75, 'inactive');

Check your starting point:

SELECT * FROM customers ORDER BY customer_id;

Step 2: Create Some History

Now let's make changes. In a real system, these might happen over days or weeks. We'll compress that timeline a bit.

Update 1: Bob pays off some of his balance:

UPDATE customers
SET balance = 1800.50
WHERE customer_id = 2;

Update 2: Carol makes a big purchase:

UPDATE customers
SET balance = 2890.25
WHERE customer_id = 3;

Uh oh: Someone runs a bad update—accidentally zeroing out all balances:

UPDATE customers
SET balance = 0.00;

Check the damage:

SELECT * FROM customers ORDER BY customer_id;

All balances are now zero. In a traditional database, you'd be scrambling for backups. With Iceberg, we have options.


Step 3: Explore Snapshot History

Let's see every version of this table:

SELECT
    snapshot_id,
    parent_id,
    committed_at,
    operation
FROM demo.ecommerce.customers.snapshots
ORDER BY committed_at;

You should see four snapshots:

  1. The initial append (our INSERT)
  2. An overwrite (Bob's update)
  3. Another overwrite (Carol's update)
  4. The final overwrite (the accidental mass update)

Note the snapshot_id values as you'll need them in a moment. Also note the timestamps in committed_at.


Step 4: Query a Previous Snapshot

Let's look at the data before the disaster. Find the snapshot ID from just before the mass update (the second overwrite in the snapshots table result from above), and query it:

-- Replace <SNAPSHOT_ID> with the actual ID from your output
SELECT *
FROM customers VERSION AS OF <SNAPSHOT_ID>
ORDER BY customer_id;
customer_id	name	email	balance	status
1	Alice Chen	alice@example.com	1500.00	active
2	Bob Smith	bob@example.com	1800.50	active
3	Carol Jones	carol@example.com	2890.25	active
4	David Lee	david@example.com	4200.00	active
5	Eva Martinez	eva@example.com	150.75	inactive

There's your data, intact. Alice still has $1,500. Bob has $1,800.50 (after his payment). Carol has $2,890.25 (after her purchase).

You can also query by timestamp. This is often more practical—you might not know the snapshot ID, but you know "the data was correct at 2pm yesterday":

-- Replace with a timestamp between snapshot 3 and 4
SELECT *
FROM customers TIMESTAMP AS OF '2026-01-07 12:54:50'
ORDER BY customer_id;

Adjust the timestamp based on your committed_at values. Any timestamp after snapshot 3 but before snapshot 4 will show you the pre-disaster state.


Step 5: Roll Back the Table

Looking at old data is nice, but sometimes you need to actually undo the damage. Iceberg supports rolling back to a previous snapshot.

In Spark SQL, you use the rollback_to_snapshot procedure:

CALL system.rollback_to_snapshot('demo.ecommerce.customers', SNAPSHOT_ID);

Replace SNAPSHOT_ID with the snapshot before the mass update.

Now check the current state:

SELECT * FROM customers ORDER BY customer_id;

The balances are restored. The bad update is undone.

Important: Rollback updates the table's current snapshot pointer to point back to the previous snapshot—it doesn't delete the bad snapshot or create a new one. You can still time travel to see the disaster if you need to. Check:

SELECT
    snapshot_id,
    committed_at,
    operation
FROM demo.ecommerce.customers.snapshots
ORDER BY committed_at;

You'll still see the same four snapshots as before, but the table now uses snapshot 3 as its current state.

You can verify that by running:

SELECT * FROM demo.ecommerce.customers.refs;

See how main branch points to the snapshot you rolled back to.


Step 6: Compare Versions

Time travel is great for debugging. Let's compare what changed between two snapshots.

SELECT
    curr.customer_id,
    curr.name,
    hist.balance AS balance_before,
    curr.balance AS balance_after,
    curr.balance - hist.balance AS difference
FROM customers curr
JOIN customers VERSION AS OF FIRST_SNAPSHOT_ID AS hist
    ON curr.customer_id = hist.customer_id
ORDER BY curr.customer_id;

You should see:

customer_id	name	balance_before	balance_after	difference
1	Alice Chen	1500.00	1500.00	0.00
2	Bob Smith	2300.50	1800.50	-500.00
3	Carol Jones	890.25	2890.25	2000.00
4	David Lee	4200.00	4200.00	0.00
5	Eva Martinez	150.75	150.75	0.00

This pattern is invaluable for auditing changes or figuring out exactly what went wrong.


Step 7: Clean Up

Exit Spark SQL:

exit;

If you're done for the day:

docker compose down -v

What You Learned

  • Every write creates an immutable snapshot
    • FOR VERSION AS OF lets you query by snapshot ID
    • FOR TIMESTAMP AS OF lets you query by point in time
    • rollback_to_snapshot restores a table to a previous state (without losing history)
    • Time travel is possible because data files are immutable—Iceberg just changes which files each snapshot references

Practical Use Cases

  • Debugging: "What did this row look like before that ETL job ran?"
    • Auditing: "Show me exactly what data was visible on December 31st for compliance"
    • Recovery: "Undo that accidental DELETE"
    • Reproducibility: "Run this analysis on the same data we used last quarter"

Next Up

In Exercise 5, we'll explore tagging and branching—creating named references to snapshots and implementing Write-Audit-Publish workflows.

Do you have questions or comments? Join us in the #developer-confluent-io community Slack channel to engage in discussions with the creators of this content.

Be the first to get updates and new content

We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.