Thursday, October 16, 2008

SSIS and the Perils of Embedded SQL

The perils of embedded sql are mainly due to the fact that the system in which the sql is embedded into does not have any mechanism also embedded into it such that it can parse and understand the sql itself. The reason why this is bad is that if the underlying database schema changes, then the system has no way of knowing that the embedded sql no longer matches that schema, and can therefore cause the developer to spend a lot of time attempting to debug the wrong problem.

So, to use this information in the context of SSIS, let's say that you intend to use a temporary table to handle your staging area, as per this blog post, here:

http://blogs.conchango.com/jamiethomson/archive/2006/11/19/SSIS_3A00_-Using-temporary-tables.aspx

In that post, according to steps 3 and 4, it is suggested to add an "Execute SQL Task" to handle the create statement -- it is this task that I would call "unmanaged", because it required you to either manually write the create statement, or else generate it with some other tool, such as Management Studio.

If that statement was badly formed, even if it parsed correctly, would SSIS know what the problem was? What if, for instance, you accidentally created a text field with a length of 50, but it was supposed to be 150? What kind of results would you get back from SSIS? And what if there was a new field created or remobed in/from the schema, well now you must manually hunt down all occurences of embedded sql and manually figure out how to correct the problem. (Granted the 'hacker' in me might think to use search and replace on the ASCII nature of those files, but who knows what other problems that could lead to...)

Whereas the programmer in me says that all of the schema information needed to create that temp table already exists and is accessible to SSIS, and if any modification such as special restrictions etc. are needed, they could be handled with a properties screen; errors would be avoided upfront, and if the schema changed, the "management process" could easily detect and either correct it automatically, or steer the developer to correct it with accurate information about the problem.

An example of where SOME management does already exist in embedded sql within SSIS is with the SCD component -- it generates an Ole DB Destination, and an Ole DB Command; if the inputs to the command object change, invalidating the underlying embedded sql, a warning/error icon is displayed on the object, and when you double click that object, it immediately comes up and tells you which fields are in error, and offers choices for how to handle the problem.

Finally, let me just give you a little perspective on where I'm coming from... I hold a Bachelors Degree in Information Technology -- that is to say that I am not, and am not intended to be, the best programmer on the block... My focus is the accurate flow of information. Despite this, I know so much about the problem, the schema, and programming in general, that I should be able to use SSIS to do almost anything "simple" without having to resort to work-arounds that involve the nitty-gritty details of "hard-core sql programming". Yes, I also know enough to know how to learn those nitty-gritties, but that is not what my job is, and diverting from what my job actually is will dillute my abilities to get my regular work done.

No comments: