Archive for the ‘Database’ Category

One too many Tiers

Wednesday, January 12th, 2011

Something has been nagging me lately about the three tier architecture–quite simply, it has too many tiers. If you subscribe to the full three tier architecture, you have an application that, at the end of the day, looks like this:

Yet, if you are using that architecture, you are almost certainly using it with an object oriented programming language–and if both things are true, there is a problem. It’s nature may not be immediately obvious, but it is there nonetheless: this flavor of the n-tier architecture defeats the entire point of object oriented programming.

To review, one of the upside of object orientation is that data and the operations performed on it are encapsulated into a single structure. When so-called business rules (operations, really) are split into ancillary classes (the BL classes), encapsulation is broken. In effect, we are using object oriented techniques to implement procedural programming with dumb C-style structs.

The true value in the multitiered architecture is actually far simpler than this birthday-cake methodology that has been faithfully copied into so many projects: keep presentation and logic separate. Any good methodology gets this much right (like MVC).

In conclusion, the remedy is simple: if you have or are building an application with a multitiered architecture, make your code base cleaner and more intuitive by merging the BO and BL layers.

ZenCart–how NOT to do Upgrades

Friday, November 26th, 2010

I am currently upgrading ZenCart. Why and where are not important. Suffice it to say, the more time I’ve spent with ZenCart, the more I realize that, open source or not, the project manages to do everything wrong.

It all started when I looked at the upgrade instructions. We were upgrading from version 1.3.8 to 1.3.9h. The essence of the instructions is to put a copy of your current install (with template modifications and all) in one directory, an unmodified version of your original install in another, and a fresh install of the new version in a third. Then, you do a diff of the installed version versus the unmodified version of the same version and manually copy your changes into the new directory. Finally, you run the automated database upgrade.

That is way to much work, especially when you consider the fact that those instructions are what you do for minor upgrades.

The process should be very simple. Backup the current setup, unpack the new files, and run the database upgrade script.

A large part of the reason this is the fact that Zen Cart also does templating wrong. Rather than stashing all of the files somewhere simple (/includes/templates/TEMPLATE, using their organization scheme), they are scattered across the entire install in the form of little overrides. Keeping track of the changes made to an install is unpleasant to begin with (source control helps, but it does not make it at all clear which files of the overly-many .php files are original and which are modifications).

When you also add the horrific security bugs that existed in the 1.3.8 line, you get an ecommerce system that I would definitely advise against using.

Regarding ORMs

Tuesday, August 10th, 2010

I cannot say that I have extensive experience with ORMs. After a recent bout or two with them, I found that there was something about them that nagged at me. Tonight, I realized what that something was. ORMs exist as a middle layer between an object oriented programming language and an entity-relational database. ORM advocates refer to what they do as solving the issue of “impedance mismatch”[1].

Herein lies the rub. ORM advocates are not solving the problem wrong, but they are solving the wrong problem altogether. They have, correctly, noted that an ER database is not an object store. However, they write the ORM as a large hack so that one can treat an ER database as an object store. They would do far better to write or use an object store–something actually designed to have a 1-1 correspondence between a “record” and an instance of a class.

There has been quite a stir lately with the rising of “NoSQL” solutions. These are, I think, little more than the old object-oriented databases coming back from the dead. While most are key-value stores and there fore not strictly speaking object-oriented, it is not hard to realize that most adapt themselves more readily to use as object stores. I think that this is where the future is heading for ORMs. In due time, the developers who today write and use ORMs will move their work over to document oriented NoSQL like databases, as these are faster (I would argue inherently, since you are not wasting time copying excess data back and forth and all about) and easier still than the ORM layer itself.

This leads to the obvious question: will a convergence of the NoSQL camp and ORMs result in the death of relational databases? I do not think so. ER databases are not bad because they are not object stores. They have their own advantages and I think they will be with us far into the future. The model has not survived forty years on account of having a poor foundation. ORMs, however, I think we can do without. Rather than solving an impedance mismatch, they are a crutch to use the wrong tool for the wrong job.

References

  1. http://www.agiledata.org/essays/impedanceMismatch.html

MySQL Hatred

Thursday, February 4th, 2010

Further anti-MySQL bile. I hate MySQL, this morning. While doing some nice sysadmin type stuff, I wanted to either lock a database down or (better yet) take it offline completely, while leaving everything else untouched. A quick command in MS SQL Server or a few clicks of the mouse, if you are so inclined. Easy. MySQL does not have this basic, basic admin feature. Hacks from the intrawebs include:

  • FLUSH TABLES WITH READ LOCK
  • Changing user permissions.

What the heck? I have to tinker with user permissions to TAKE A DATABASE OFFLINE? And flush tables locks all right–every table in every database. If you’re running one database, that’s fine. Me, I’ve got closer to 30. Sure, in our case, this is because we have a couple of apps that are badly designed.  But still. What if I had two? Say, a blog and a wiki? Same problem. Take one down, take the other down. Or fiddle with permissions. I’m sorry, this is just wrong.

Unixing away from phpMyAdmin

Tuesday, December 8th, 2009

Here at the ol’ job, we use MySQL (something that I have blogged about before) and, naturally, have phpMyAdmin installed. I seldom use it as I prefer a a nice, CLI interface. It does provide a few amenities that have log me in even when I don’t strictly speaking need it. Some of these are the editing feature (yes, I am lazy enough that sometimes I would rather not sit down and write out an UPDATE query), the printing (which is much nicer than out-of-the-box lp or lpr on *nix machines), and dumping stuff to CSV or Excel (which is nice for one-off reports that I occasionally have to run).

The last couple of days, while working on some reconciliation type reports that get a little involved, I decided to take advantage of the Unix philosophy (a tool for every job, do one thing and do it well, etc.) and make my life quicker and easier from the MySQL command line. So, here is a look at the various tweaks I’ve made.

The first thing to look at is paging. The client doesn’t do any out of the box. After jacking off with a handful of pagers (less, more, most, and w3m) I decided on w3m for reasons that will soon become fairly clear. To make mysql page, simply run the command:

pager w3m

or more, less, most, etc. Whatever command you want to be the pager. This setting can be made permanent

Next up, we have printing. This is why I chose w3m. less and most provided no way that I could see to pass the piped-in text off to a printer. If some pager coniusseur would care to correct me on this score, I am all ears.

Set:

keymap C-p SAVE_SCREEN “| a2ps -r test.txt -1 -r -f 7pts”
keymap q EXIT
keymap C-p SAVE_SCREEN "| a2ps -1 -r -f 7pts"
keymap q EXIT

The second item maps q to exit without confirmation. Out of the box, w3m always prompts. I hate being prompted. Remove at your liking. The first line maps the sequence Control + P to a SAVE_SCREEN command (which is used to dump pages to files) and then pipes it to a2ps. You can look up the options for a2ps, but the end result is that, since no output file was specified, a2ps prettifies the text handed it and sends it off to the printer.

Finally, we have that little problem of dumping to Excel. We do not have to dump straight to .xls or .xlsx format. CSV will do, despite being a poor format in general. MySQL can do this part natively by running a query like so:

select * from foo
into outfile 'someplace.csv'
fields terminated by ',' lines terminated by '\n';

This is nice, but, speaking for myself, I usually review the results before dumping them out, just to be sure it looks roughly the way I want or expect. Another good way to do this, is to simply put the query into a file and run it like this:

mysql -uuser database < query.sql

When in noninteractive mode, the mysql client outputs the records in a tab delimited format. Piping this through sed and into a text file will create a simplistic CSV or opening it in tab-delimited form with a spreadsheet app (like Excel or OpenOffice Calc) will allow it to be exported to a more friendly format.

Levenshtein Rocks

Friday, October 23rd, 2009

The company I work for is running a project in which various numbers are getting scanned. Often, the barcodes were missing or illegible and had to be typed by hand. On the backend, we found that a great many of them were subtly wrong. For example, O (letter oh) and 0 (number zero) were swapped. Well, it’s pretty easy to drop in a quick AJAX callback that checks the barcode number to make sure it is on file. I thought it would be cool, though, to have the program suggest the correct number to the user. If they were right and it was just something we hadn’t seen yet, then they could just leave it be. If not, the system would give them a much better idea where they were messing up.

Meet the Levenshtein distance. I had heard of it before (it is commonly used in spellcheckers), but never had a reason to use it. A quick googling showed gave a blog post in which the writer implemented the dynamic programming implementation of an algorithm to find the Levenshtein distance as a MySQL UDF. It worked beautifully.

All I really want in a web framework…

Saturday, May 16th, 2009

As of late, there has been a great deal of attention paid to web frameworks and these huge time savings that are supposed to be based on the framework selected. Ruby on Rails is, perhaps, the most famous and the one with the most devoted base. I played with Rails for a bit and the honest truth was that I liked the Ruby part a lot better than the Rails part. Rails tried to do a lot, to “help”–and I didn’t like it. I have written similar dislikes for CakePHP which is, by its own admission, an attempt to bring Rails to PHP. Cake is not the only attempt to give PHP users the bliss of Rails. There is also PHP on Trax and Akelos and heaven knows how many others.

It’s not just the PHP boys hopping at the Rails hype. The BBC created a Perl on Rails. Heck, on cliki I even found a project named Lisp on Lines that endeavors to provide a “Rails-like” environment in Common Lisp.

ASP.NET is a little better. It doesn’t try to help with ORM or a lot else, but I can’t help but feel that it has a tendency to be a little stiff at times. Though it is definitely one of the better frameworks out there (and, yes, it is only a framework; contrary to all-too-popular belief, you can write ASP.NET in any .NET language).

What I am saying is that I, personally, want a minimalist framework. I want to be able to assemble a toolkit based on the project at hand, not what is encouraged by the framework. I don’t want to write a plugin to be able to use a bit of JavaScript effectively. Here is what a web framework should provide:

  • Spackling over request details. I need the contents of the GET and the POST, but I sure as heck don’t want to write a GET and POST parser for every application. Preferably, this would also spackle over the exact source of the request. When writing an application, I don’t care whether the request came over CGI, FastCGI, an IIS filter, or mod_lang. I want to the variables and that’s it.
  • A templating engine. A simple, minimalist templating engine. Nothing too fancy and, if I so desire, I should be able to ignore it with impunity or even substitute another. This should be one of the most loosely coupled pieces of the framework. As time goes on, I also expect a minimum of widgets and gadgets to be provided as we get an increasingly large amount of sites and apps whose UI is AJAX/Flash/Insert-RIA-here driven. I really like Smarty for PHP work (which has a plugin API, but not controls in the ASP.NET sense). ASP.NET’s templating engine is good, but its server-side controls are biased towards a method of development that is either outdated (controls rendered and handled serverside) or opaque (how does that callback work, again?).
  • Pretty URL support. Yes, there is mod_rewrite and friends, but it makes things much cleaner to have only a couple of mod_rewrite rules and handle the rest application side.
  • Session management. HTTP is stateless, but web applications, as we all know, are not. Session management is not difficult so much as it is an annoyance to write with every app.

Data access should not, as a rule, be involved. Particularly, data access in which it is difficult to perform raw queries against the database. Generic, all encompassing data access layers are hard to use because they make a number of assumptions (in the name of simplicity) that simply do not hold true for everything. Worst of all are the schema assumptions that frameworks like Cake make.

Django is the exception to the rule, though. Their DAL is excellent. You write an object model that inherits from some base classes. This is used as a very thin layer and you can still easily (yes, easily is the key word; I know you can write raw queries in Cake and Rails, but they try to hide it as much as possible) drop down into plain old SQL. This way, the developer is not boxed in by such things as the dictionary included in the framework.

One time, I wrote an app in CakePHP and I used a term for an entity, I cannot remember what, that had a plural not recognized by Cake; now Cake requires that all tables have plural names. The plural for this word was not recognized, even though it was the correct English usage. This caused the DAL to go haywire and cost me an hour or two of debugging.

The overwhelming philosophy is to provide basic support for that which needs to be done in the act of serving up a web application and leave everything else to the user. The new wave of frameworks tries to do as much as possible for the user. In this sense, a framework is like any other tool or any other application: when you try to do the user’s work for them and are overzealous about making things “easy”, you get underfoot and make things difficult. RoR has made the “blog in 10 minutes” example almost mandatory for a new framework. Well, it’s great that you can throw together a simple blog in 10 minutes, but who wants to? Therein lies yet another rub: RoR and its kin are tuned towards mass producing almost-clones of certain types of applications. I, for one, do not like writing lots of almost-clones. I’d rather download Drupal or any one of a thousand other open source packages and configure that than write a custom almost-clone. What I want in a web framework is small. So, can I have that framework, yet?

Your Own Private Database Server with VMWare

Thursday, March 22nd, 2007

As the title implies (I hope) this post is a quick guide on how to set up a database server in a virtual machine with the aid of VMWare Server. The nice thing about Server is that it runs as a service, so our virtual server can continue running even after we shut down the console. In this example, I will use FreeBSD though you can use anything you want from Linux distro du jour, to OpenBSD, to Solaris, or even to Windows. On top of this fine OS, you will need to run some database system. I can’t make an example out of nothing, so I chose PostgreSQL. If you have never heard of it, it is, featurewise, the most advanced OSS RDBMS.

What is the motivation behind this little excercise? Well, using VMWare Server in this manner can help with testing so that you can run your apps against a server and get a little closer to real-life. You can also use it to run web apps that you may want to use personally (like a personal bug tracker), but don’t want the world to see. Finally, it is an opportunity to try something new if you haven’t done it before. So, let’s get to it, then!

The first thing you will need to do, if you haven’t done it already, is install VMWare Server. It is a free download, though it will need to register (again, free) for a serial number. So, if applicable, saunter over to http://www.vmware.com/ and let them be your guide.

Done? Okay. Next, we will need to spawn off a new virtual machine. Log into VMWare and, on the home tab, select “Create New Virtual Machine”. While your mileage may vary, I chose the following options:

OS: Other -> FreeBSD
RAM: 160 MB
Hard Drive: 8GB, Not allocated, split into 2GB (mandated by the fact that I put the VM on a FAT32 external hard drive)
Network Card: NAT (this will change later).

and defaults for the rest. The VM we just created is, obviously, completely blank. So, go to http://www.freebsd.org/ and download the disc 1 ISO (we shall not need disc 2 for this tutorial).

Once that is finished, go back to the VMWare Server Console. Select the FreeBSD machine, then click VM->Settings. Set the CD drive to point to the ISO you just downloaded. Click OK and power up the virtual machine. The machine should boot straight up to the FreeBSD installer.

From here, do a standard FreeBSD install. I will leave the explanation of that to the FreeBSD project’s quite good documentation (or, perhaps at a later date, a different tutorial). As an FYI, due to the limited use towards this VM will be put, I elected not to install ports. While useful in general, it is hard drive space wasted here. If you kept it simple, the install should finish pretty quickly. When the prompt shows up asking you if wish to reboot, switch out of the machine (Ctrl+Alt) and switch the CD back to the physical drive so that the VM will not boot to the installer again. Then choose yes. The system will reboot and bring you to a simple login prompt. Login as root with no password. Now the fun begins.

The software that we will want on this box is simple: SSH and Postgres. Postgres was the whole point in doing this and having SSH for “remote” administration is both cool and useful. While, in eventuality, we want to make this a private server, we will need to download the software off the web. Keep the networking card set to NAT (or whatever you use to connect directly to the web). Once logged in, you can determine what device equates to your “card” by looking at /var/run/dmesg.boot In the case of my VM (and therefore, probably yours as well), the card was lnc0 and to get an IP run:

# dhclient /dev/lnc0

In FreeBSD, there are two “software worlds”: ports and packages. Ports is similar to Gentoo’s portage (in fact, it is the progenitor of it) in that it is an automated build system. Packages is more similar to a basic Debian or RPM system: it downloads compressed binaries and installs them on the system. For simplicity and speed, I will use packages here. The basic way to add packages is:

# pkg_add package.tbz

In order for this to work, package.tbz must exist in the current path. Fortunately, for some packages, there is the nifty little short cut

# pkg_add -r package

Which will both download and install the specified package. Like I said, this doesn’t work, out of the box, for everything. So, let’s just use it to get what we need to do the rest. Wget is a cool utility that runs on *NIX systems. It can be used to download files from the command line. So, grab it as above:

# pkg_add -r wget

This will download and install wget in /usr/local/bin. Add this to your path by adding the following to your .cshrc:

set PATH=${PATH}”:/usr/local/bin”

Then run:

source .cshrc

Now we can use wget in all its glory. Go to FreeBSD’s website and go to ports. Search for postgresql. A quick look at the latest version tells us that, in addition to postgresql-server, we will also need gettext, gmake, libiconv, postgresql-client. Fortunately, libiconv and gettext were installed with wget. So, look at the URL for the package and, for each of the remaining, run:

# wget URL

When done, you should have TBZ’s for all of the packages you need. Install gmake, then postgresql-client, and finally postgresql-server.

Almost home free, we just need to do some more configuration on both the client side and in VMWare to finish up. First we need to initialize Postgres’s various settings. We do this with the following command:

# /usr/local/etc/rc.d/postgresql initdb

Then we start the server itself:

# /usr/local/etc/rc.d/postgresql start

We will want Postgres to start up whenever we start the machine, so, using good or not so good old vi add the following line to /etc/rc.conf:

postgresql_enable=”YES”

We will also need to create a database and db users. To create a DB nice and quick, log in as pgsql (a user installed by Postgres) and run

# createdb xyz

and voila! you have a database you can log in to.

Test everything to make sure it is working right, but at this point you should have FreeBSD running PostgreSQL just fine. The next step is to put this on a private subnet and make sure that we can access it from our host system (note: to help test, installing the PostgreSQL client on the host is recommended).

Now for the host-only part. Go to VM->Removable Devices->Ethernet Card 1->Edit and change the setting from NAT to Host Only. Then return to the guest machine and rerun dhclient. If successful, the VM will have acquired an IP on the private subnet. In the prompt, enter this command:

# ifconfig

Your card should be shown on the list with a fresh IP. Next, we need to make sure that PostgreSQL will actually listen on the port. First, edit the file /usr/local/pgsql/data/postgresql.conf. Set the following lines accordingly:

listen_addresses = ‘*’

port = 5432

This tells PostgreSQL to listen on all addresses (note that this does not have to be so, it just makes life easier in this scenario) on port 5432 (which is the client’s default). Next, edit the file /usr/local/pgsql/data/pg_hba.conf (HBA = Host Based Authentication). This file acts as Postgres’s private firewall (kind of). The rules in this file are evaluated to determine whether a given incoming connection will be permitted. I added this line to mine:

host all all 0.0.0.0 0.0.0.0 password

host refers to any SSL or non SSL port, the second field is the databases to be allowed (which can be set to a delimited list), the users that will be allowed (which again, can be specified) the IP address and mask (setting to 0.0.0.0 allows anything; think of a zero in any given portion of the address as a wildcard), and finally the type of authentication (password sets it to the good old fashioned password-based authentication; PostgreSQL has many options). Finally, go ahead and restart the PostgreSQL server with this command:

# /usr/local/etc/rc.d/postgresql restart

That’s it! Fire up a client on the host’s side and run it, specifying the IP address of the VM.

A few parting notes. I intentionally set security quite lax. Why? This is a virtual machine, not a real one, first, and secondly the only permitted connections will be off the private host machine. Be much more conscientious if this is for something remotely related to production! Secondly, while this article goes into detail only on the installation of PostgreSQL, the same lessons and principles can be applied to just about anything else.

References:

FreeBSD Handbook

PostgreSQL Manual