November 13, 2024

Database Refactoring: How to Safely Move a Database Column

Maintaining uptime and data consistency during database schema changes is essential. This example shows how to safely update a database to support new requirements without downtime or data loss. Using triggers and incremental data migration, database changes can occur in parallel with application updates, ensuring both old and new versions work seamlessly. Additionally, rollback mechanisms ensure quick recovery if issues arise. Tools like Harness Database DevOps integrate these changes into CI/CD pipelines for efficient, reliable deployments.

It is crucial to maintain uptime and data consistency while implementing database schema changes. This blog post provides a real-world example of how to safely alter a database schema to support new requirements with zero downtime, full data integrity, and the ability to roll back if needed. It expands on a larger scenario, but dives deep into the specifics of how that database migration works.

Our scenario involves an application that tracks inventory across multiple warehouses. Each warehouse is identified by a location field that simply has the name of the city. We’ve recently added a new warehouse in Boston, GA, but our database already contains Boston – the one in Massachusetts. To accommodate this, we need to update our Warehouses table to distinguish between cities and states.
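
For context, here is a minimal sketch of what the existing Warehouses table might look like before the change (the real schema lives in the GitHub repo; the data types and NOT NULL constraint here are assumptions for illustration, while the ID column is referenced by the migration trigger later in this post):

CREATE TABLE Warehouses (
    ID int IDENTITY(1,1) PRIMARY KEY, -- surrogate key; the data-migration trigger joins on ID
    Location nvarchar(50) NOT NULL    -- city name only, e.g. 'Boston'
);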

These days, uptime is critical. We cannot afford even one second of downtime, which makes this change more complicated. To maximize uptime, we will ensure our database schema can work with the old and new versions of the application simultaneously while preserving data integrity. Since certain information is tracked in different places before and after the migration, maintaining data integrity requires database triggers to ensure that if one location is updated, both are updated. Each change has also been developed with rollback logic so that if something goes wrong, we can fully revert to the previous stable state, with no data loss.

Breaking Down the Changes

The full changes for this scenario can be found in this GitHub repo. These changes are packaged in change-sets, and each change-set can be independently deployed or rolled back. In this blog, we will talk through each of these change-sets and discuss how each ensures safety during the database migration. These changes can be applied using Harness Database DevOps or Liquibase.
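
When these change-sets are applied with Liquibase, each one is recorded in the DATABASECHANGELOG tracking table, which is what makes independent deployment and rollback of each change-set possible. A quick way to see what has been applied (a sketch; the column list is trimmed for readability):

SELECT ID, AUTHOR, DATEEXECUTED, EXECTYPE
FROM DATABASECHANGELOG
ORDER BY DATEEXECUTED;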

Changeset: Refactor-1

This change-set introduces the new columns to track city and state. It also updates the Location column so that if new rows are added without specifying a location, a unique ID is generated for them. If we did not care about zero downtime, the ability to roll back while maintaining database consistency, or copying over the existing data, then this would be the entire change.

Full Changeset:

- changeSet:
    id: refactor-1
    author: stephenatwell
    changes:
    - addColumn:
            tableName: Warehouses
            columns:
              - column:
                  name: City
                  type: nvarchar(50)
              - column:
                  name: State
                  type: nvarchar(50)
    - addDefaultValue:
            columnName: Location
            tableName: Warehouses
            defaultValueComputed: newId()   
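
On SQL Server, this change-set amounts to roughly the following statements (a sketch for illustration; Liquibase generates the actual DDL, and the default-constraint name shown here is assumed):

ALTER TABLE Warehouses ADD City nvarchar(50), State nvarchar(50);
ALTER TABLE Warehouses ADD CONSTRAINT DF_Warehouses_Location DEFAULT NEWID() FOR Location;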

Changeset: Refactor-1-data-migrate

The use of triggers and transactions in this change-set is what ensures that data remains consistent.

This change-set creates a trigger that automatically populates the new City field with the existing Location data. If the old version of the application writes to Location midway through the migration, the trigger updates the City field with the new value, keeping the data consistent even while the old and new versions of the application run in parallel.
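
For example, once this trigger is in place, a write from the old application that only knows about Location is automatically mirrored into City. A quick sketch (the values are made up for illustration):

-- The old application inserts a row and only populates Location.
INSERT INTO Warehouses (Location) VALUES ('Denver');

-- Because City is NULL on the inserted row, the trigger copies Location into City.
SELECT Location, City FROM Warehouses WHERE Location = 'Denver'; -- both now return 'Denver'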

This change-set also copies data from the old ‘location’ column into the new ‘City’ column. This copy is performed in small, incremental batches to minimize locking and performance impact on the production database. During each batch, a set of rows in the table is locked, and the application will not be able to modify those rows until that batch completes. By using small batches, we ensure that only a small percentage of the table is locked at once so that locks are short-lived and do not noticeably impact the performance of our application.

This change-set also disables transactions (runInTransaction: false). If we did not do this, we would lose the benefit of performing our small copies in batches, as each batch's transaction would be a child of the change-set's outer transaction. This nesting prevents locks from being released until the outermost transaction is committed.
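
To see why this nesting matters, here is a small SQL Server sketch (not part of the change-set; for illustration only) of what would happen if the batched copy ran inside an outer transaction:

BEGIN TRANSACTION;          -- outer transaction that would wrap the whole change-set
    BEGIN TRANSACTION;      -- one batch; @@TRANCOUNT goes from 1 to 2
    UPDATE TOP (2) Warehouses SET City = Location WHERE City IS NULL;
    COMMIT TRANSACTION;     -- only decrements @@TRANCOUNT; the batch's locks are NOT released
COMMIT TRANSACTION;         -- locks are released only here, after the entire copy finishes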

Full Changeset:

- changeSet:
    id: refactor-1-data-migrate
    author: stephenatwell
    ignore: false
    # disable transactions so that we can do small/incremental locks during the data copy.
    runInTransaction:  false
    changes:
    - sql:
            sql: |
              CREATE TRIGGER trg_Warehouses_Location_Update
              ON Warehouses
              AFTER INSERT, UPDATE
              AS 
              BEGIN
                  SET NOCOUNT ON;
                  
                  IF EXISTS (SELECT * FROM Inserted WHERE City IS NULL)
                  BEGIN
                      UPDATE Warehouses
                      SET City = i.Location
                      FROM Inserted i
                      WHERE Warehouses.ID = i.ID;
                  END
              END;  
    - sql:
            sql: |
              WHILE 1 = 1
              BEGIN

                  -- Run each batch in its own short transaction so its locks are released as soon as it commits
                  BEGIN TRANSACTION;

                  -- Update the City column in small batches to minimize lock time
                  -- (TOP (2) is a tiny batch size for demonstration; a real migration would use a larger batch)
                  UPDATE TOP (2) Warehouses
                  SET 
                      City = Location
                  WHERE 
                      City IS NULL;

                  IF @@ROWCOUNT = 0
                  BEGIN
                      -- Nothing left to copy; commit the empty batch and stop
                      COMMIT TRANSACTION;
                      BREAK;
                  END

                  -- Commit this batch and release its locks before starting the next one
                  COMMIT TRANSACTION;
              END
    rollback:
    - sql:
            sql: IF OBJECT_ID('dbo.trg_Warehouses_Location_Update', 'TR') IS NOT NULL DROP TRIGGER dbo.trg_Warehouses_Location_Update;

Changeset: boston-georgia

This change-set updates the reference data in the Warehouses table by adding a new row for the new warehouse in Boston, GA. 

Full Changeset:

- changeSet:
    id: boston-georgia
    author: stephenatwell
    ignore: false
    preConditions:
       - onFail: MARK_RAN
       - sqlCheck:
          expectedResult: 0
          sql: SELECT COUNT(*) FROM Warehouses WHERE City='Boston' AND State='GA'
    changes:
    - insert:
            tableName: Warehouses
            columns:
              - column:
                  name: City
                  value: Boston
              - column:
                  name: State
                  value: GA
    rollback:
    - delete:
            tableName: Warehouses
            where: City='Boston' AND State='GA'
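
In plain SQL Server terms, deploying and rolling back this change-set amounts to roughly the following (a sketch; the precondition above skips the insert if a Boston, GA row already exists, and Location falls back to the newId() default added in refactor-1):

-- deploy:
INSERT INTO Warehouses (City, State) VALUES ('Boston', 'GA');

-- rollback:
DELETE FROM Warehouses WHERE City = 'Boston' AND State = 'GA';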

Conclusion

By carefully planning and implementing changes with considerations for consistency, uptime, and rollback, you can ensure smooth transitions and maintain integrity in your data systems. Database changes like these can be safely deployed as part of your CI/CD process using tools like Harness so that database changes no longer slow down your application delivery.

If you encounter similar scenarios and are interested in a seamless DB migration with minimal disruptions, reach out to Harness to learn more about how we can help you realize the benefits of Database DevOps.

Learn More

This blog discusses part of a larger scenario that you can learn more about in our overview blog, Database DevOps: managing databases inside your CI/CD pipeline. This scenario leverages Harness Database DevOps and Harness Continuous Delivery to orchestrate this database change alongside a change to application code as part of a larger CI/CD pipeline. If you are attending KubeCon 2024, a talk on this topic will also occur on Thursday.
