Tuesday, October 14, 2008

Adventures with SSIS 2005

Okay, before I get started on a rant, let me first be fair: SSIS, or SQL Server Integration Services, is a very cool and powerful new product included with SQL Server 2005 and newer. It allows you to visually design data transformations, and "ETL" (Extract, Transform, and Load) in a way that I had never thought possible.

My run-in with it is based around a major database fix I'm working on whereby a database that I've inherited from another developer has a massive no-no right in the middle of it... In a fit of "mad genious" (and no doubt under the gun of a deadline) the developer decided to attempt to circumvent the problems of a composite primary key in an important relational table of the db by creating a "derived key" by means of appending one tables pk (TripNo) to that of another (ClientNo). The resultant key was called "TripClient", and that key was then used in a multitude of other tables as a foreign key.

Like I say, there was some genious to this, but in fact there was far more madness: In order to accomplish this, both TripNo and ClientNo were created as Text values, rather than identity columns, and so the MS Access database forms were required to manage the assignment of new values using VBA. Furthermore, there was just no way in hell that the database could maintain the TripClient values, and the code that was created to manage the keys had terrible bugs which ended up in massive data integrity issues. None of these columns except for TripNo could be converted "in place" to actual numbers due to various issues, such as the presence of spaces, letters, you name it, and where these errors existed in the ClientNo, they were appended into the TripClient value, causing to be ever more invalid than ever and perpetuated through the entire database!

Worse, because the original developer had trouble assiging unique values, he resorted to adding and subtracting the number 10,000 to and from the attempted new ClientNo for new clients, and sometimes this would fail on the subtraction side, causing the ClientNo to grow in intervals of 10,000 at a time! Crapola I tell you, crapola!!!!

So, all that said, there was no possible way to upgrade/fix the database in place.

My answer? To create a new, empty, shell of the database, correct/normalize all tables using identity columns (and tossing out the whole composite key issue anyhow, because it violates the business rules! ie, far more madness than genius, once again...) and then to transform all existing data into the new database.

Enter SSIS...

After writing several lines of code and testing the performance of my efforts, I realized that I was never going to finish the job... Due to the nature of the changing primary key values of two of the most critical tables in the database, the transform that I was trying to create was just not working. The very nature of the transforms did not map well to the RBAR (row by agonizing row) code I was writing, and did not allow me to visualize the processes.

The minute I began using SSIS, however, I realized that this was the ultimate place to be performing this transformation, and was sure that I was destined for almost immediate success...

Have I been successful yet, though? No, not yet... as it turns out, SSIS is completely ignorant of "blob data fields", ie text and ntext, which are critical fields commonly used throughout the database for storing notes about the "people, places, and things" documented within the database. Because of this a whole host of work arounds are suggested by other afflicted users, and these things.

But before I go into what went wrong, I'm tempted to ask: Am I still doing this wrong??? Maybe I should post before and after diagrams of my schema and seek the advice of those with greater experience and training in this field.

What do you think?

No comments: