Automating the Data Migration Process: Successes, Struggles, Advice
For a 3 year-long implementation, one can expect a journey that includes many complexities and challenges. And data migrations proved to be equally as challenging a journey. Not only were the initial project requirements to completely alter and transform much of their uncleansed data, but as the project scope grew, the data migration work continued to evolve and grow as well. This blog post will discuss the problem with manual data migration for this specific project, the solution of developing an automated conversion tool, and the successes, struggles, and takeaways from the experience.
Our original data migration process was for the technical team to receive data pulled by the business leads into an excel file and push this file into D365 manually. However, data that was pulled and validated for Test Pilot 1 ended up almost completely useless by the time TP2 rolled around. As a team, we could not keep up with updates to configurations and data changes/dirtiness in their legacy system. After Test Pilot 2, data migrations became the biggest risk of the entire implementation. We knew, as a team, that we needed a tool that could remain constant and reliable but allow the source data and transformation logic to change.
After acknowledging the need for an improved data migration process, the team got to work on an automated conversion tool in hopes to cut down data transformation and data loading time. Developed in SSIS (SQL Server Integration Services) and imported via DMF (D365’s Data Management Framework), there was a 1:1 mapping between the DMF data project and SSIS Data Package object. The SSIS package extracted from the source database, transformed/cleansed/enriched/validated the data and then sent the successful records to a file, which was zipped up as a DMF data package and sent to D365 via Package API.
Once the project level package development was complete, we had the ability to chain the individual packages together to construct multiple orchestrator packages. Within the orchestrator package, we could build up entity dependencies, for example, ensuring customer accounts were loaded before starting to import customer addresses and contacts. And having multiple orchestrator packages versus just one allowed us to break the data migration when other cutover tasks needed to take place. In our scenario, all master data was loaded in one orchestration run, but there were business validations, integrations, and other manual efforts that needed to take place before we loaded transactional data.
Overall, this automation conversion tool was a success, in both pre-cutover data testing and go-live. By the end of cutover, data conversions were noted as one of the top go-live successes when, as mentioned before, it was earlier identified as the biggest risk. To strip down the entire effort and think about key factors of success, there were three areas that stood out: reliability and consistency, improved data migration timing, and error logging.
Success: Reliability and Consistency
Once the initial development work was complete, I could perform an entire data migration exercise, from start to finish, automatically. All that was needed to take source data to the target D365 system was a push of a button and some time spent monitoring and occasionally intervening. Even the newest source data had the ability to be cleaned and transformed without changing any part of the data migration automation tool. Since the SSIS package relied on rules put in place by the business to map and validate data, each record was not being manually altered. Once development was thoroughly tested and approved, this threw human error out the window during the data extract and transform parts of the ETL process. Being able to use Package API to send both the data file and the mapping information also took away further potential mapping errors that can be made when manually importing a file via DMF. There were still manual processes done by the team, such as updating mapping and validation files in Excel, but overall, the consistency and accuracy in which data was being converted into the system was much higher than previous manual methods we had tried earlier in the implementation.
Success: Improved Data Migration Timing
With data project execution triggering one after another, and sometimes multiple DMF jobs happening at once, there was no gap in the start times between entity loading. This sped up the conversion timing and we could perform a full data migration in a matter of a day or two versus several weeks. While the time to load decreased, additional time was cut in other areas. With error logging in place (which I talk about next), issues were identified early, and manual intervention decreased with every conversion run. Cutting down the time it took to convert data made it possible to do a full data migration load more frequently, which in turn allowed for more testing, resulting in more reliable rules and cleaner data. It was a cycle that continued speeding up the process with every migration effort. The timing was also further decreased by adding multi-threading to our larger entity datasets that would allow it.
Success: Error Logging
The error logging system was an important piece of the data migration tool. Since the tool pulled data from a production environment, there was new data continually being added and changed. This meant that we would run into records that did not pass currently defined data validations or could not be mapped to valid D365 target environment values. To avoid losing sight of a record, error logging was put in place for the business to review.
When a record was going through the ETL process and failed at some node, rather than trying to push that record into D365 (as we knew already it will fail), the record is marked as an error with an included error message and was diverted to an excel sheet containing all the record information as well as the error message. This way, the business could review the record and determine the necessary action to take. Error logging information was crucial during cutover as we did not want to omit records without having some way to go back and recover.
Discernibly, there were many challenges to this approach. To start, the client's data migration requirements made this automation tool a very large development effort. While the client wanted all entities to go through this automated ETL process, for some entities it took more effort to develop in SSIS than it was worth due to the complexity of the logic for the small number of records it was transforming, costing time and money.
SSIS itself had some not-so-user-friendly moments that can make updates to the package difficult, such as having to go through every single SSIS 'Union All' nodes and update the field list when additional fields are added to the conversion requirements.
The ability to handle flexibility in data changes was one of the biggest successes, but design changes proved to be quite a challenge. This can be mitigated with better and more thorough requirements gathering from the start, however, changes in design are bound to occur throughout the project's lifetime.
Considering both the successes and struggles, the automated conversion tool may or may not fit a project's data migration needs. While the convenience and reliability sound appealing, one should account for the budget, timeline, and actual need for an intensive data migration tool. Some projects may have data that requires little effort to transform and cleanse or a scale small enough to outweigh development costs. Regardless, I believe every D365 FO data migration project will benefit from the following pieces of advice:
- Loading data is most consistently done with data packages. Human error can be made when manually uploading a file in the DMF by messing up mappings from the file to the staging table. Often fields need to be auto-generated or defaulted and having all the information set in a Manifest file will automatically create the correct mapping. A data package format also concretely shows what was imported and can be used as a historical reference. A DMF data package is a zipped file containing the following files:
- Data file(s) to import (Excel, CSV, txt, etc.)
- Package Header
- The data file fields used to import should match the staging table field names. This will also help to mitigate mapping errors.
- Avoid, if possible, entity collision with integrations. Often there is custom code being used on an entity for integration purposes. There also may be a need for custom code for data migration purposes. If that is the case, it is best to have separate entities to avoid changes in code breaking one process or the other.
- Multi-thread large datasets. In fact, multi-thread as many entities as able. Not all entities can be multi-threaded and for good reason. However, in cases like customers, you have several options for data entity loading.
- With a large customer dataset, using two "multi-thread-able" entities (Customer Definitions and Customer Details) can save you loading time over using the Customers V3 entity.
- Have an error-handling or data audit system in place so data is not lost in the migration process.
- Custom record-level error messaging before sending records to the DMF
- Broader count analysis to help with testing, auditing, and progress tracking
- Work closely with data owners, if possible. They know their data the most and a good working relationship with them will more quickly align your development efforts to their expectations.
Overall, this experience has sparked ideas and possibilities for future data migration efforts that I will take part in. I have personally learned from the key successes, failures, and advice mentioned above and my aim was to provide you with the benefit of this knowledge as well. Knowing that every implementation is unique, I expect the automated data conversion process to evolve. This is not a tool ready to be passed from one project to the next, but rather a recipe for one project’s successful data migration effort, and I hope you have found some components that may work for you.