Sunday, July 20, 2008

Creating a File Publishing System (FPS?)

File Publishing System? Never heard of it...

This post is the first in a series in which I'll share the methods and madness of trying to build a new system: one that combines the best of the systems we'll look at today and carries none of their worst elements!

I'm presenting a File Publishing System, or FPS, as a high-level application object, which could either be an application all unto itself, or be included as part of another application which must replicate files back and forth.

File Transfer Protocol (FTP)

One of the goals and benefits of an FPS is to abstract away the protocol used underneath the actual file transfers, and to assure future compatibility with protocol changes that come about later. An example of this is the ever-so-common FTP, which is so completely insecure that there is a call on the Internet to discontinue its use on publicly accessible servers, which pretty much means every single server on the Internet. The replacement for FTP is either SFTP or another secure shell mechanism that channels all data across a fully encrypted pipeline.

The REAL problem with FTP, however, is performance-based:
  • Each transaction made with the remote server has overhead: it takes time to send a request to the remote server, have it do what you want, and then send you the results.
  • For very small files, the amount of time required for the overhead of transactions is GREATER than the amount of time needed to transfer the files!

"Pure" FTP programs, however, compound the problem by also using FTP to find and collect the files to be transferred, which adds a great number of additional transactions. Furthermore, each time FTP needs the contents of a directory, it executes the equivalent of a DOS "DIR" command (or Linux console "ls" command), and this system call has a measurable impact on the server and thus delays the results coming back to the client. This is all part of the overhead of FTP -- not just per file, but "per job".

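To put rough numbers on the overhead argument above, here is a small back-of-the-envelope sketch. Every figure in it (file count, file size, round trips per file, latency, bandwidth) is an assumption picked for illustration, not a measurement of any real server.

```python
def estimate_seconds(num_files, avg_bytes, round_trips_per_file, rtt_s, bytes_per_s):
    """Split a transfer job into protocol overhead and actual payload time."""
    overhead = num_files * round_trips_per_file * rtt_s   # time spent waiting on requests
    payload = num_files * avg_bytes / bytes_per_s         # time spent moving file data
    return overhead, payload

# Assumed job: 1000 small files of 2 KB each, 3 round trips per file,
# 100 ms per round trip, and a link that moves 1 MB per second.
overhead, payload = estimate_seconds(1000, 2048, 3, 0.1, 1_000_000)
print(f"overhead: {overhead:.0f} s, payload: {payload:.1f} s")
```

With these made-up numbers the job spends roughly 300 seconds on transaction overhead and only about 2 seconds actually moving data, which is the imbalance described above.
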
Microsoft FrontPage Server Extensions (FPSE)
Another "popular" method of file transfer, one that tries to overcome some of the shortcomings of FTP, is Microsoft's FrontPage Server Extensions. It actually does accomplish this, but it has some drawbacks, which we'll look at first to explain why we no longer want to use this method.

CONS: FrontPage == Bad, FPSE == Bad
First off, FrontPage is a terrible webpage editing program -- it munges the code of the page, adds tons of additional meta-data to the page, and so on. I will admit, however, that I know people who use it successfully and often, but I can never edit their pages with Dreamweaver. Shoot, I use it myself sometimes because it is the easiest way I know of to copy and paste the entire content of a page, pictures and all, to a local location (or even a remote location, thanks to the FPSE).

And the server extensions themselves have built-in curses, because they litter the entire site with folders named "_vti_cnf", which contain some xml-ish meta-data about each file in their parent folder. If you've ever tried to use FTP to upload or download an entire website which has the FrontPage extensions installed, then you already know all about the overhead of FTP trying to navigate into these "_vti_cnf" folders and transfer them -- most clients, like WS_FTP, will time out and die before they can ever finish, and then they have no memory of where to pick back up!

Another potential detriment to the FPSE is that it modifies file permissions, although this is undoubtedly touted somewhere as a security enhancement. Either way, the possibility for conflicts does exist, and I have seen many cases where FTP users were blocked from accessing areas of the site for no particularly good reason.

Finally, Microsoft no longer makes FrontPage (it's been replaced with a whole suite of programs which highlight "Silverlight") and they no longer distribute (or are planning to stop distributing) the FPSE.

So What's Good About the FPSE? FrontPage Publishing!

The FrontPage publishing system is AWESOME! It splits the process up into a client/server architecture, which has intrigued me from the very first day that I saw it. I mean, let's face it -- if you knew how to ask the server to list every page in the entire site, along with the file date/time fields and so forth, all in one breath, how long do you think it would take?

Milliseconds, I tell you, milliseconds! Okay, for giant sites, it might actually take seconds, but if you ran a comparable script that enumerated that same site via FTP, it would take MINUTES instead... possibly even hours, if the site has FPSE installed and the client is not smart enough to ignore the "_vti_cnf" folders.

So How Do We Move Forward?

That's the big question! As an information technologist, I feel pretty strongly that the protocol has changed before and will change again, so it must be virtualised, giving applications something to build on that is not so harshly affected by the vulnerabilities of the underlying protocols.
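One way to picture a "virtualised" protocol is a small interface that the application codes against, with the real protocol hidden behind it. The sketch below is only an illustration of that idea: the names `Transport`, `LocalCopyTransport`, and `download_all` are hypothetical, and the one concrete transport simply copies from a local folder as a stand-in for FTP, SFTP, or HTTP.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class Transport(ABC):
    """Hypothetical protocol-neutral interface the application depends on."""

    @abstractmethod
    def fetch(self, remote_path: str, local_path: Path) -> None:
        """Copy one remote file to local_path."""


class LocalCopyTransport(Transport):
    """Stand-in transport: treats a local folder as the 'remote' site."""

    def __init__(self, root: Path):
        self.root = root

    def fetch(self, remote_path: str, local_path: Path) -> None:
        local_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(self.root / remote_path.lstrip("/"), local_path)


def download_all(transport: Transport, remote_paths, destination: Path) -> None:
    """Application logic: knows which files it wants, not which protocol moves them."""
    for remote_path in remote_paths:
        transport.fetch(remote_path, destination / remote_path.lstrip("/"))
```

Swapping protocols would then mean writing another `Transport` subclass, while `download_all` and anything built on it stays unchanged.
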

Client / Server Forever!

I've developed a rather simple script, first in ASP, but soon in PHP and ASP.Net, which acts sort of like a web service, but without the "noise of xml"; given authentication, it returns a comma-delimited list of all files, like this:

FOLDER,,Folder
FILE,/activate.asp,ASP File,2211,6/5/2008 6:24:24 AM,7/20/2008 2:00:47 AM,6/5/2008 6:24:25 AM
FILE,/bc.asp,ASP File,2289,6/5/2008 6:25:27 AM,7/20/2008 2:00:47 AM,6/5/2008 6:25:27 AM
FILE,/calendar.gif,GIF Image,171,6/5/2008 6:25:27 AM,7/20/2008 2:00:47 AM,6/5/2008 6:25:28 AM
FILE,/calendar.html,HTML Document,839,6/5/2008 6:25:28 AM,7/20/2008 2:00:47 AM,6/5/2008 6:25:28 AM
FILE,/calendar.js,JScript Script File,28841,6/5/2008 6:25:29 AM,7/20/2008 2:00:47 AM,6/5/2008 6:25:30 AM
FILE,/cart.asp,ASP File,37291,6/5/2008 6:25:30 AM,7/20/2008 2:00:47 AM,6/5/2008 6:25:31 AM

This list can then be easily downloaded and parsed by a desktop application which uses additional logic to intelligently transfer the files.
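As a rough idea of what the parsing step could look like, here is a minimal Python sketch that reads lines in the format shown above. The `Entry` type, the field names, and `parse_listing` are hypothetical. The meaning of the three date/time columns is not spelled out here, so the sketch keeps them as an ordered tuple rather than guessing which is which.

```python
import csv
from datetime import datetime
from typing import NamedTuple, Optional

# Matches stamps such as "6/5/2008 6:24:24 AM" from the sample listing.
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


class Entry(NamedTuple):
    kind: str             # "FOLDER" or "FILE"
    path: str             # empty for the first FOLDER row in the sample
    type_name: str        # e.g. "ASP File"
    size: Optional[int]   # None when the row carries no size (folders)
    stamps: tuple         # date/time columns, in the order they appear


def parse_listing(text):
    """Turn the comma-delimited listing into a list of Entry records."""
    entries = []
    for row in csv.reader(text.splitlines()):
        if not row:
            continue                       # skip blank lines
        row += [""] * (3 - len(row))       # tolerate short rows
        size = int(row[3]) if len(row) > 3 else None
        stamps = tuple(datetime.strptime(s, STAMP_FORMAT) for s in row[4:7])
        entries.append(Entry(row[0], row[1], row[2], size, stamps))
    return entries


SAMPLE = """FOLDER,,Folder
FILE,/activate.asp,ASP File,2211,6/5/2008 6:24:24 AM,7/20/2008 2:00:47 AM,6/5/2008 6:24:25 AM
FILE,/calendar.gif,GIF Image,171,6/5/2008 6:25:27 AM,7/20/2008 2:00:47 AM,6/5/2008 6:25:28 AM"""

entries = parse_listing(SAMPLE)
for entry in entries:
    print(entry.kind, entry.path, entry.size)
```

One caveat: a file name that itself contains a comma would break this format unless the server quotes the field, which is something to keep in mind when choosing a delimiter.
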

To test and demonstrate the effectiveness of this, I have built a prototype Microsoft Access Database Application which completely illustrates the downloading of an entire website using only a single script uploaded to the website, and a local database which works with a cached list of site pages, downloaded and parsed from that website.
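The prototype itself lives in Access, but the core idea of working from a cached list can be sketched in a few lines of Python. This is an assumption about how such logic might look, not the prototype's actual code: the function names are hypothetical, and each list is reduced to a dictionary mapping a path to its size and a modified stamp.

```python
from typing import Dict, List, Tuple

# path -> (size in bytes, modified stamp as text); a simplified stand-in
# for the rows of the cached and freshly downloaded listings.
Listing = Dict[str, Tuple[int, str]]


def plan_downloads(cached: Listing, remote: Listing) -> List[str]:
    """Paths that are new on the server or whose size/stamp has changed."""
    return sorted(path for path, meta in remote.items() if cached.get(path) != meta)


def find_deleted(cached: Listing, remote: Listing) -> List[str]:
    """Paths we have cached that the server no longer lists."""
    return sorted(set(cached) - set(remote))


cached = {
    "/activate.asp": (2211, "6/5/2008 6:24:25 AM"),
    "/old.asp": (100, "1/1/2008 1:00:00 AM"),
}
remote = {
    "/activate.asp": (2211, "6/5/2008 6:24:25 AM"),   # unchanged, skipped
    "/bc.asp": (2289, "6/5/2008 6:25:27 AM"),         # new on the server
}

print(plan_downloads(cached, remote))
print(find_deleted(cached, remote))
```

Only the files returned by `plan_downloads` need to cross the wire, so an unchanged site costs one listing request and nothing more.
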

Next...

The prototype that I've built uses a lot of code and components that I've collected over the Internet, so I'll need to make up a little manifest of people who need to be notified that their code and solution information has been included in this project. Partly for copyright reasons, but mostly because I hope they may find this project useful!

Next Week...

Next week we'll look at the server-side script and talk about the security mechanisms needed to protect it, and how those can be accomplished. For the ASP script, at least, no additional permissions are needed, because it does not write or change any information; it only enumerates all files and returns their names and properties for use by a client-side application.
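Until then, the read-only nature of the enumeration can be illustrated with a short Python stand-in. This is not the ASP script, just a sketch of the same idea under simplifying assumptions: the function name `list_site` is hypothetical, the output has fewer columns than the real listing (kind, path, size, modified time), and authentication is left out entirely.

```python
import os
from datetime import datetime


def list_site(root):
    """Walk root and return one comma-delimited line per folder and file.

    Only reads directory entries and file metadata; nothing is written.
    """
    lines = []
    for folder, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a predictable order
        relative = os.path.relpath(folder, root)
        web_folder = "" if relative == "." else "/" + relative.replace(os.sep, "/")
        lines.append(f"FOLDER,{web_folder}")
        for name in sorted(filenames):
            info = os.stat(os.path.join(folder, name))
            modified = datetime.fromtimestamp(info.st_mtime).isoformat(
                sep=" ", timespec="seconds"
            )
            lines.append(f"FILE,{web_folder}/{name},{info.st_size},{modified}")
    return lines
```

Calling `print("\n".join(list_site("some/site/folder")))` on a local folder produces the whole listing in one pass, which is the "all in one breath" behavior the client relies on.
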
