Thursday, October 16, 2008

SSIS and the Perils of Embedded SQL

The perils of embedded SQL stem mainly from the fact that the system the SQL is embedded in has no mechanism of its own to parse and understand that SQL. This is bad because if the underlying database schema changes, the system has no way of knowing that the embedded SQL no longer matches the schema, and the developer can end up spending a lot of time debugging the wrong problem.

So, to use this information in the context of SSIS, let's say that you intend to use a temporary table to handle your staging area, as per this blog post, here:

http://blogs.conchango.com/jamiethomson/archive/2006/11/19/SSIS_3A00_-Using-temporary-tables.aspx

In that post, steps 3 and 4 suggest adding an "Execute SQL Task" to handle the CREATE statement -- it is this task that I would call "unmanaged", because it requires you either to write the CREATE statement by hand, or to generate it with some other tool, such as Management Studio.

If that statement were badly formed, even if it parsed correctly, would SSIS know what the problem was? What if, for instance, you accidentally created a text field with a length of 50 when it was supposed to be 150? What kind of results would you get back from SSIS? And what if a field were added to or removed from the schema? Now you must manually hunt down every occurrence of embedded SQL and figure out how to correct it. (Granted, the 'hacker' in me might think to use search-and-replace on the ASCII nature of those files, but who knows what other problems that could lead to...)

The programmer in me, on the other hand, says that all of the schema information needed to create that temp table already exists and is accessible to SSIS. If any modifications, such as special restrictions, were needed, they could be handled with a properties screen; errors would be avoided up front, and if the schema changed, the "management process" could detect it and either correct it automatically or steer the developer toward a fix with accurate information about the problem.
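To illustrate the idea, here is a rough Python sketch of generating the CREATE statement from schema metadata rather than writing it by hand. The column list is hard-coded for the example; in a real implementation it would be pulled from INFORMATION_SCHEMA.COLUMNS (the table and column names here are hypothetical):

```python
# Hypothetical sketch: build a temp-table CREATE statement from schema
# metadata instead of hand-writing it. The column info is hard-coded here,
# but in practice would come from a query against INFORMATION_SCHEMA.COLUMNS.

def create_temp_table_sql(table, columns):
    """columns: list of (name, sql_type, max_length or None) tuples."""
    defs = []
    for name, sql_type, length in columns:
        if length is not None:
            defs.append(f"[{name}] {sql_type}({length})")
        else:
            defs.append(f"[{name}] {sql_type}")
    return f"CREATE TABLE #{table} (\n    " + ",\n    ".join(defs) + "\n)"

columns = [
    ("ClientNo", "int", None),
    ("Notes", "varchar", 150),   # length pulled from the schema, not guessed
]
print(create_temp_table_sql("Staging", columns))
```

Because the lengths come from the live schema, the "50 instead of 150" mistake simply cannot happen.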

An example of where SOME management already exists in embedded SQL within SSIS is the SCD component -- it generates an OLE DB Destination and an OLE DB Command; if the inputs to the command object change, invalidating the underlying embedded SQL, a warning/error icon is displayed on the object, and when you double-click that object it immediately tells you which fields are in error and offers choices for how to handle the problem.

Finally, let me just give you a little perspective on where I'm coming from... I hold a Bachelor's degree in Information Technology -- which is to say that I am not, and am not meant to be, the best programmer on the block... My focus is the accurate flow of information. Even so, I know enough about the problem, the schema, and programming in general that I should be able to use SSIS to do almost anything "simple" without having to resort to workarounds that involve the nitty-gritty details of "hard-core SQL programming". Yes, I also know enough to learn those nitty-gritties, but that is not my job, and diverting from what my job actually is will dilute my ability to get my regular work done.

Tuesday, October 14, 2008

Adventures with SSIS 2005

Okay, before I get started on a rant, let me first be fair: SSIS, or SQL Server Integration Services, is a very cool and powerful new product included with SQL Server 2005 and newer. It allows you to visually design data transformations, and "ETL" (Extract, Transform, and Load) in a way that I had never thought possible.

My run-in with it is based around a major database fix I'm working on, whereby a database that I've inherited from another developer has a massive no-no right in the middle of it... In a fit of "mad genius" (and no doubt under the gun of a deadline) the developer decided to circumvent the problems of a composite primary key in an important relational table of the db by creating a "derived key": appending one table's PK (TripNo) to another's (ClientNo). The resultant key was called "TripClient", and that key was then used as a foreign key in a multitude of other tables.

Like I say, there was some genius to this, but in fact there was far more madness: in order to accomplish it, both TripNo and ClientNo were created as text values rather than identity columns, so the MS Access database forms had to manage the assignment of new values using VBA. Furthermore, there was just no way in hell that the database could maintain the TripClient values, and the code created to manage the keys had terrible bugs which ended up causing massive data integrity issues. None of these columns except TripNo could be converted "in place" to actual numbers, due to issues such as the presence of spaces, letters, you name it -- and wherever these errors existed in the ClientNo, they were appended into the TripClient value, making it ever more invalid and perpetuating the corruption through the entire database!

Worse, because the original developer had trouble assigning unique values, he resorted to adding and subtracting 10,000 to and from the attempted new ClientNo for new clients, and sometimes the subtraction side would fail, causing the ClientNo to grow in intervals of 10,000 at a time! Crapola I tell you, crapola!!!!

So, all that said, there was no possible way to upgrade/fix the database in place.

My answer? To create a new, empty shell of the database, correct/normalize all tables using identity columns (tossing out the whole composite-key scheme anyhow, because it violates the business rules! i.e., far more madness than genius, once again...), and then to transform all existing data into the new database.
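The heart of such a transform is mapping each dirty old text key to a clean new identity value, so that child rows can be re-pointed at the corrected parents. Here is an illustrative Python sketch of that idea (not the actual migration code; names like old_trips are hypothetical):

```python
# Illustrative sketch: assign a clean sequential identity to each distinct
# old text key, tolerating the stray spaces the old data is known to contain.

def assign_new_keys(old_rows, key_field):
    """Return a dict mapping each distinct (trimmed) old key to a new id."""
    mapping = {}
    next_id = 1
    for row in old_rows:
        old_key = row[key_field].strip()   # old keys may contain stray spaces
        if old_key not in mapping:
            mapping[old_key] = next_id
            next_id += 1
    return mapping

old_trips = [{"TripNo": " 1001"}, {"TripNo": "1002 "}, {"TripNo": "1001"}]
trip_map = assign_new_keys(old_trips, "TripNo")
print(trip_map)   # {'1001': 1, '1002': 2}
```

Child tables would then be loaded by looking up their old foreign-key text in this map instead of trusting the corrupted "TripClient" values.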

Enter SSIS...

After writing quite a bit of code and testing the performance of my efforts, I realized that I was never going to finish the job... Because the primary key values of two of the most critical tables in the database were changing, the transform I was trying to create was just not working. The very nature of the transforms did not map well to the RBAR (row by agonizing row) code I was writing, and that code did not allow me to visualize the processes.

The minute I began using SSIS, however, I realized that this was the ultimate place to be performing this transformation, and was sure that I was destined for almost immediate success...

Have I been successful yet, though? No, not yet... As it turns out, SSIS is completely ignorant of "blob" data fields, i.e., text and ntext, which are critical fields commonly used throughout the database for storing notes about the "people, places, and things" documented within it. Because of this, a whole host of workarounds has been suggested by other afflicted users.

But before I go into what went wrong, I'm tempted to ask: Am I still doing this wrong??? Maybe I should post before and after diagrams of my schema and seek the advice of those with greater experience and training in this field.

What do you think?

Sunday, July 20, 2008

FPS: References and Credits

Something that I looked at, but Didn't Use...

I saw this system in Database Journal, but didn't like it, even though it seemed pretty fast:
http://www.databasejournal.com/features/msaccess/article.php/3513061

The basic idea that he's working off of is what I call "turd files", which I HATE! I suppose for one thing, I feel like they represent some sort of security risk, but moreover you must manage them... delete them? overwrite them? save only newer? No thanks, it's hard enough to keep a clean site without additional clutter.

What I Did Use...

All FTP transactions are handled through a class object -- this is good because an additional abstraction layer could be placed on top of it, allowing the application (or object user) to select which transfer method to use, dynamically, to achieve the needed results/compatibility.
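The layering described above might look something like this in Python (class and method names are illustrative, not taken from the original clsFTP):

```python
# Sketch of a protocol-abstraction layer: callers work against a small
# interface, so swapping FTP for a more secure mechanism later requires
# no changes in the calling code.
from abc import ABC, abstractmethod

class FileTransfer(ABC):
    @abstractmethod
    def upload(self, local_path, remote_path): ...
    @abstractmethod
    def list_files(self, remote_dir): ...

class FtpTransfer(FileTransfer):
    """One concrete backend; an SftpTransfer could be dropped in beside it."""
    def upload(self, local_path, remote_path):
        return f"FTP upload {local_path} -> {remote_path}"
    def list_files(self, remote_dir):
        return [f"{remote_dir}/example.txt"]

def publish(transfer: FileTransfer, files):
    # the publishing logic never mentions FTP -- only the interface
    return [transfer.upload(f, "/site/" + f) for f in files]

print(publish(FtpTransfer(), ["index.html"]))
```

The point of the design is that `publish` depends only on the interface, never on the protocol.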

The class object, "clsFTP", was sourced from here:
http://bytes.com/forum/post810718-12.html
The link above also gives a little further source that requires Google groups permissions.

Creating a File Publishing System (FPS?)

File Publishing System? Never heard of it...

This post will be the first in a series, where I will share my methods and madness of trying to build up a new system, which can combine the best of all of these systems we've looked at today and build a system which features none of the worst elements!

I'm presenting a File Publishing System, or FPS, as a high-level application object, which could either be an application all unto itself, or be included as part of another application which must replicate files back and forth.

File Transfer Protocol (FTP)

One of the goals and benefits of an FPS is to abstract the actual protocol used underneath the file transfers, and to assure future compatibility with protocol changes that come along later. An example is the ever-so-common FTP protocol, which is so completely insecure that there are calls on the Internet to discontinue its use entirely on publicly accessible servers -- which pretty much means every single server on the Internet. The replacement for FTP is either SFTP or another secure-shell mechanism that channels all data across a completely encrypted pipeline.

The REAL problem with FTP, however, is performance based:
  • Each transaction made with the remote server has overhead: it takes time to send a request to the remote server, have it do what you want, and then send you the results.
  • For very small files, the time consumed by that per-transaction overhead is GREATER than the time needed to transfer the files themselves!

"Pure" FTP programs compound the problem by also using FTP to collect and find the files to be transferred, which adds a great number of additional transactions. Furthermore, each time FTP needs the contents of a directory, it executes the equivalent of a DOS "DIR" command (or Linux console "ls" command), and this system call has a measurable impact on the server and thus a delay in returning results to the client. This is all part of the overhead of FTP -- not just per file, but "per job".
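The per-transaction cost is easy to see with some back-of-the-envelope arithmetic (the numbers below are illustrative assumptions, not measurements):

```python
# Rough model: total time = per-file round-trip overhead + actual payload
# transfer time. For many small files, the round trips dominate.

def transfer_time(n_files, avg_size_bytes, bandwidth_bps, rtt_seconds):
    overhead = n_files * rtt_seconds                  # one round trip per file
    payload = n_files * avg_size_bytes / bandwidth_bps
    return overhead, payload

overhead, payload = transfer_time(
    n_files=500, avg_size_bytes=2_000,       # a site full of small files
    bandwidth_bps=1_000_000, rtt_seconds=0.2)
print(f"overhead: {overhead:.0f}s, payload: {payload:.0f}s")
# 100 seconds of waiting on round trips to move 1 second's worth of data
```

With these (assumed) numbers, a hundredfold more time goes to protocol chatter than to the files themselves.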

Microsoft FrontPage Server Extensions (FPSE)
Another "popular" method of file transfer that tries to overcome some of the shortcomings of FTP is Microsoft's FrontPage Server Extensions. It actually does accomplish this, but it has some drawbacks, which we'll look at first, just to explain why we don't desire to use this method anymore.

CONS: FrontPage == Bad, FPSE == Bad
First off, FrontPage is a terrible webpage-editing program -- it munges the code of the page, adds tons of additional meta-data, and so on. I will admit, however, that I know people who use it successfully and often, but I can never edit their pages with DreamWeaver. Shoot, I use it myself sometimes, because it is the easiest way I know of to copy and paste the entire content of a page, pictures and all, to a local location (or even a remote location, thanks to the FPSE).

And the server extensions themselves have built-in curses, because they litter the entire site with folders named "_vti_cnf", which contain XML-ish meta-data about each file in the folder that contains them. If you've ever tried to use FTP to completely upload or download an entire website with the FrontPage extensions installed, then you know all the more about the overhead of FTP as it tries to navigate into these extra folders and transfer them -- most clients, like WS_FTP, will time out and die before they can ever finish, and then they have no memory of where to pick back up!

Another potential detriment to the FPSE is that it modifies file permissions, although this is undoubtedly touted somewhere as a security enhancement. Either way, the possibility for conflicts does exist and I have seen many cases whereby FTP users were blocked access to areas of the site for no particularly good reason.

Finally, Microsoft no longer makes FrontPage (it's been replaced with a whole suite of programs which highlight "Silverlight"), and they no longer distribute (or are planning to discontinue distributing) the FPSE.

So What's Good About the FPSE? FrontPage Publishing!

The FrontPage publishing system is AWESOME! It splits the process up into a client/server architecture which has intrigued me from the very first day that I saw it. I mean, let's face it -- if you knew how to ask the server to list every page in the entire site, along with the file date/time fields and so forth, all in one breath, how long do you think it would take?

Milliseconds, I tell you, milliseconds! Okay, for giant sites, it might actually take seconds, but if you did a comparable script which enumerated that same site via FTP it would take MINUTES, instead... possibly even hours, if the site has FPSE installed and the client is not smart enough to ignore those.

So How Do We Move Forward?

That's the big question! As an information technologist, I feel pretty strongly that the protocol has changed before and will change again, so it must be virtualized -- applications need something to build on that will not be affected so harshly by the vulnerabilities of underlying protocols.

Client / Server Forever!

I've developed a rather simple script, first in ASP, but soon in PHP and ASP.Net, which acts sort of like a webservice, but without the "noise of xml"; given authentication, it returns a comma delimited list of all files, like this:

FOLDER,,Folder
FILE,/activate.asp,ASP File,2211,6/5/2008 6:24:24 AM,7/20/2008 2:00:47 AM,6/5/2008 6:24:25 AM
FILE,/bc.asp,ASP File,2289,6/5/2008 6:25:27 AM,7/20/2008 2:00:47 AM,6/5/2008 6:25:27 AM
FILE,/calendar.gif,GIF Image,171,6/5/2008 6:25:27 AM,7/20/2008 2:00:47 AM,6/5/2008 6:25:28 AM
FILE,/calendar.html,HTML Document,839,6/5/2008 6:25:28 AM,7/20/2008 2:00:47 AM,6/5/2008 6:25:28 AM
FILE,/calendar.js,JScript Script File,28841,6/5/2008 6:25:29 AM,7/20/2008 2:00:47 AM,6/5/2008 6:25:30 AM
FILE,/cart.asp,ASP File,37291,6/5/2008 6:25:30 AM,7/20/2008 2:00:47 AM,6/5/2008 6:25:31 AM

This list can then be easily downloaded and parsed by a desktop application which uses additional logic to intelligently transfer the files.
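A minimal sketch of such parsing, in Python, working from the sample listing above (the field layout -- type, path, description, size, then three timestamps -- is inferred from the sample; which timestamp is which is an assumption):

```python
# Parse the comma-delimited site listing into records a sync tool could use.
import csv, io

listing = """FILE,/activate.asp,ASP File,2211,6/5/2008 6:24:24 AM,7/20/2008 2:00:47 AM,6/5/2008 6:24:25 AM
FILE,/bc.asp,ASP File,2289,6/5/2008 6:25:27 AM,7/20/2008 2:00:47 AM,6/5/2008 6:25:27 AM"""

files = []
for row in csv.reader(io.StringIO(listing)):
    if row and row[0] == "FILE":
        files.append({
            "path": row[1],
            "type": row[2],
            "size": int(row[3]),
            "modified": row[5],   # assumed ordering of the three timestamps
        })

print(files[0]["path"], files[0]["size"])
```

Once the list is in memory, deciding which files to transfer is a simple comparison against a local cache -- no round trips required.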

To test and demonstrate the effectiveness of this, I have built a prototype Microsoft Access Database Application which completely illustrates the downloading of an entire website using only a single script uploaded to the website, and a local database which works with a cached list of site pages, downloaded and parsed from that website.

Next...

The prototype that I've built uses a lot of code and components that I've collected from around the Internet, so I'll need to make up a little manifest of people who need to be notified that their code and solution information has been included in this project. Partly for copyright reasons, but mostly because I hope they may find this project useful!

Next Week...

Next week we'll look at the server-side script where we'll talk about the security mechanisms needed to protect this, and how those can be accomplished. For the ASP script, at least, no additional permissions are needed for it because it does not write or change any information, it only enumerates all files and returns their names and properties for use by a client side application.
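In the meantime, here is a rough idea of what such an enumeration script does, translated to Python for illustration (the real script is classic ASP; the output mimics the FILE,... listing shown earlier, with the timestamps omitted for brevity):

```python
# Read-only site enumerator: walks a directory tree and emits one
# comma-delimited line per file, never writing or changing anything.
import os, tempfile

def enumerate_site(root):
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = "/" + os.path.relpath(full, root).replace(os.sep, "/")
            size = os.path.getsize(full)   # only stat calls, no writes
            ext = os.path.splitext(name)[1]
            lines.append(f"FILE,{rel},{ext} File,{size}")
    return "\n".join(lines)

# quick demonstration against a throwaway directory
demo = tempfile.mkdtemp()
with open(os.path.join(demo, "index.html"), "w") as f:
    f.write("<html></html>")
print(enumerate_site(demo))
```

Because the script only reads file properties, it needs no write permissions on the server, just as described above.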

Wednesday, January 23, 2008

Smile For the Birdie!

Okay, this is pretty cool... I just stumbled upon (but not by using "StumbleUpon", mind you, although that is a favorite toy!) a new windows freebie called "Windows Live Writer", and was able to connect this desktop application directly to my Blogger blog!

Okay, yeah, you know, I'm a developer, so I can see how this would work, it's not earth shattering, but interesting. But then I see "Insert Map", and I'm thinking "Cool! I've been meaning to draw some new maps!"

But check this out! They've got new "birds eye" views. Here's my office:

Map image

We also noticed that you can rotate the views, and get different time frames on the different photos!

Cool stuff!

Saturday, April 08, 2006

Milt the Talking Musky

Check out Milt the Talking Musky!

This is a cool story that my daughter loves! On the surface, it teaches good values such as honesty, trustworthiness, and others, but at the root of it is an information technology called "sight words", which is taught to first and second graders and helps them read and spell more fluently by memorizing very common words and patterns.

Visit Milt at www.Miltthetalkingmusky.com. A stimulating multi-media educational tool for 1st and 2nd graders!

Saturday, April 01, 2006

Coastal Data Enterprises

Providing Custom Software Solutions to the Low Country and Surrounding Area based on Microsoft Application technologies including Microsoft Access, SQL Server, and Exchange Server using the latest development techniques including ASP.Net, VB.Net, and C#.

Friday, March 31, 2006

Here at TECH CORNER you can find free tips and information as well as a comprehensive and searchable links database.

As part of my work at South University, I have developed software and documentation that I believe could be a valuable research resource for other programmers as they learn and grow, so I will be posting HTML versions of these along with corresponding downloads!

Additionally, I have been working with 3rd-party components from Interakt; these components are nothing short of revolutionary in their comprehensiveness, and yet they can also be difficult to navigate until you get the feel for them. Read reviews, tech reports, and how-to information!

As other new products come to the forefront of technology, I'll report on these, as well!