Archive for the ‘Programming Languages’ Category

Literature Review: PEGs

Thursday, June 16th, 2011

Parsing Expression Grammars, or PEGs, are a syntax-oriented formalism for specifying parsers, meant to ease the task of building rich programming languages. I had had the opportunity to tinker with PEGs sparingly and, finally, I got around to reading the original paper (available here: http://pdos.csail.mit.edu/~baford/packrat/popl04/). My reading notes from the paper can be downloaded here:

http://www.mad-computer-scientist.com/blog/wp-content/uploads/2011/06/peg.html

I am fully aware that this is not, as it were, a new paper. It came up originally in my searches for a good parsing library in Common Lisp. For the project that it was intended for, I ultimately moved on to using Ometa. While Ometa is a fine system, it actually did not win on power grounds because, quite simply, I do not need the extra expressiveness for what I am working on. It won out because the implementation was better than the PEG library I had tried.

As it is kind of old territory, my review has little to say. In reality, when I first ran across PEGs I felt strangely out of the loop, but here goes anyway:

PEGs are a powerful mechanism for defining parsing grammars. The form of the language itself is similar to standard EBNF in its general layout, but allows native creation of grammars. It avoids the ambiguities inherent to Context Free Grammars by using prioritized selection of paths through the grammar: alternatives are tried in order and the first one that matches wins. As a result, it can express some languages that traditional CFGs cannot, while being simpler to use.
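To make the prioritized selection concrete, here is a minimal sketch in Python. The helper names (lit, seq, choice) are my own and purely illustrative; they are not taken from the paper or from any particular PEG library. Each parser takes the text and a position and returns the new position, or None on failure.

```python
def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Match every parser in order; fail as soon as one fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """Prioritized choice: commit to the first alternative that succeeds."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

# Two grammars that differ only in the order of their alternatives.
a_first = choice(lit("a"), lit("ab"))   # A <- "a" / "ab"
ab_first = choice(lit("ab"), lit("a"))  # B <- "ab" / "a"

print(a_first("ab", 0))                    # 1: "a" wins, "ab" is never tried
print(ab_first("ab", 0))                   # 2: "ab" wins
print(seq(a_first, lit("c"))("abc", 0))    # None: the choice does not back up to try "ab"
print(seq(ab_first, lit("c"))("abc", 0))   # 3
```

A CFG with the rule A -> "a" | "ab" would accept "abc" in the third case; the PEG does not, because the order of alternatives is part of the grammar's meaning. That is what removes the ambiguity.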

While PEGs seem to have caught on a lot better than their predecessors (discussed in the paper), they seem to receive less notice than Ometa, which further builds on PEGs.

Polymorphism, Multiple Inheritance, & Interfaces…Pick 2.

Saturday, February 5th, 2011

The title for this post comes from a statement that a coworker brought up as having been said to him. The overall point of this post is simple: given that choice, your answer should be obvious. You want polymorphism and multiple inheritance, because there is nothing that you can do with interfaces that you cannot do with multiple inheritance.

Interfaces provide two things, depending on their use: a form of multiple inheritance in languages that do not otherwise support it and design-by-contract capabilities. Clearly, in the former case, you are better off with multiple inheritance, as you receive the full power of the feature. In the latter case, it is trivial to create an almost-empty class that acts as an interface, if that is the effect you are after.

The main objection raised was the counterexample: what if you have a class Animal and another class Plant? Surely you do not want a programmer to inherit from both? That would not make sense. To which I would answer: why not? If it makes sense to whoever wrote it, why prevent it? They might, after all, be creating something for the Little Shop of Horrors.

Largely, I think the belief that interfaces are somehow superior to multiple inheritance comes from never having used multiple inheritance in a system built from the ground up to support it (like CLOS in Common Lisp), as multiple inheritance strictly supersedes interfaces.
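As a rough illustration of the argument, here is a small sketch in Python, which supports multiple inheritance natively. The class names (Comparable, Audrey) are hypothetical and chosen only for this example. The almost-empty class plays the role of an interface, while Animal and Plant contribute real behavior.

```python
class Comparable:
    """An almost-empty class used as an interface: it only states a contract."""
    def compare_to(self, other):
        raise NotImplementedError

class Animal:
    def eat(self):
        return "eats"

class Plant:
    def photosynthesize(self):
        return "photosynthesizes"

class Audrey(Animal, Plant, Comparable):
    """Inherits behavior from two classes and fulfils the contract of a third."""
    def __init__(self, size):
        self.size = size

    def compare_to(self, other):
        # Negative if self is smaller, zero if equal, positive if larger.
        return self.size - other.size

small, large = Audrey(1), Audrey(5)
print(small.eat(), small.photosynthesize())  # eats photosynthesizes
print(isinstance(small, Comparable))         # True
print(small.compare_to(large))               # -4
```

Everything an interface would have given us (the contract and the type check) is still there, and the inherited implementations come along for free.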

One too many Tiers

Wednesday, January 12th, 2011

Something has been nagging me lately about the three tier architecture–quite simply, it has too many tiers. If you subscribe to the full three tier architecture, you have an application that, at the end of the day, looks like this:

Yet, if you are using that architecture, you are almost certainly using it with an object oriented programming language, and if both things are true, there is a problem. Its nature may not be immediately obvious, but it is there nonetheless: this flavor of the n-tier architecture defeats the entire point of object oriented programming.

To review, one of the upsides of object orientation is that data and the operations performed on it are encapsulated into a single structure. When so-called business rules (operations, really) are split into ancillary classes (the BL classes), encapsulation is broken. In effect, we are using object oriented techniques to implement procedural programming with dumb C-style structs.

The true value in the multitiered architecture is actually far simpler than this birthday-cake methodology that has been faithfully copied into so many projects: keep presentation and logic separate. Any good methodology gets this much right (like MVC).

In conclusion, the remedy is simple: if you have or are building an application with a multitiered architecture, make your code base cleaner and more intuitive by merging the BO and BL layers.
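A minimal sketch of what that merge looks like, in Python. The invoice classes and their fields are hypothetical, invented only to show the shape of the change.

```python
from dataclasses import dataclass

# Split style: a dumb data holder (BO) plus a separate rules class (BL).
@dataclass
class InvoiceBO:
    subtotal: float
    tax_rate: float

class InvoiceBL:
    @staticmethod
    def total(invoice):
        # The rule lives apart from the data it operates on.
        return invoice.subtotal * (1 + invoice.tax_rate)

# Merged style: the data and the operation on it sit in one class.
@dataclass
class Invoice:
    subtotal: float
    tax_rate: float

    def total(self):
        return self.subtotal * (1 + self.tax_rate)

print(InvoiceBL.total(InvoiceBO(100.0, 0.25)))  # 125.0
print(Invoice(100.0, 0.25).total())             # 125.0
```

Both versions compute the same thing. The difference is that in the merged version a reader looking at Invoice sees everything an invoice can do, and the presentation layer still never needs to know how the total is computed.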

The “Business Perspective” is a False Canard

Thursday, February 4th, 2010

Flipping through the C++ FQA, the phrase “from a business perspective” popped up a number of times and it occurred to me how often I have heard that phrase or something like it to refer to the needs of the management as opposed to the needs of a programmer. In fairness, I must also add that the author of that fine document is semi-quoting from the C++ FAQ. As I stared at those words, something jumped out at me: when it comes to tech, there really isn’t any such thing as a “business need” because the geeks and the suits ultimately want the same thing.

What kinds of things do we find in the “business perspective”? Well, how about these:

  • Economy of price – we need to keep costs down in order to increase our margin
  • Economy of time – closely intertwined with economy of price, but still separate in that we want to get our product to market ASAP, even aside from price, to help grab up marketshare
  • Capability – it must do whatever it is that we need it to do

That should pretty much cover it. The friction between the two groups does not come from these basic wants. Developers do not want to work longer on a project than is necessary. By and large, they want to do it and move on.

About the only time I see a real collision is when developers try to make their own jobs a little more interesting. Even here, we see that this is mostly unconscious. The developer trying to interestify the job usually believes consciously that they are solving some problem that is stalking the whole project. This is hardly unique to the developer side of the equation as we (or, at least I) have seen the business types getting all distracted by shiny little trinkets.

So, then, at the end of the day, the friction seems to come less from the core concerns (which are, more or less, shared by both parties) than from how they are perceived. But the phrase “business perspective” is a lead-in to a pack of nonsense.

Ruminations on Literate Programming

Thursday, November 12th, 2009

First off, I would like to begin by saying that this post will be a little different than usual. It is not so much an explanation, a tutorial, or the asserting of an opinion (and you all know what an unopinionated fellow I am), as it is a monologue-like discussion. I will be running through possibilities and tossing out ideas, but the post is not likely to present any firm conclusions. So, here we go.

I recently read Donald Knuth’s paper on WEB, a literate programming system that he wrote with others for their own use. The paper is listed in the references section. At first glance, literate programming makes perfect sense in academia. Code is not written that is not intended to be published as either a paper or a book. Using literate programming makes the task of doing both easier. As the content of the code changes, the commentary itself is readily changed to match.

The question that comes to mind, though, is whether or not literate programming has potential for the working programmer. In academia, the real work is generally not the programming. The programming itself is merely a way to try to prove whatever the hypothesis is. It is the equivalent of a test in a physics laboratory. The working programmer is not using his code as a mere test. It is the final product and it has to work. Moreover, it must also be delivered in a reasonable (or, more often, unreasonable) amount of time. In this different atmosphere, does literate programming still have a place? Would it work as well for someone writing code to track truckloads as it does for Knuth when he writes his books? At first glance, it would be easy to say no, but Knuth extols the methodology for reasons that most programmers would find appealing. He intimates that LP makes maintenance easier.

If there is anything the working code monkey would love to see, it is an easier job in maintenance. Most of us have had that experience of looking at a screenful of code and wondering what he (or I) was thinking when this was written. If we are writing down our reasoning with the code, then the questions go away. We may not agree with the reasoning, but at least we would understand the angle from which the problem was hit. Naturally, most people would do as poor a job of maintaining an essay as they would the comments (there are virtually no comments in my production code). As with any methodology, its utility stands on its practitioners, not on its non-practitioners.

On StackOverflow, several users run down the idea as being outdated or outmoded, being suited to the dark days of when we were limited to two-character variable names. While the utility may be increased under such conditions, they have missed the point. Literate programming is not about writing a lot of comments–it is about writing a book or article on the problem, side by side with the problem’s solution. Literate programming is not an idea confined to a specific time. It is not a hack (as intimated). It is a way of looking at programming that turns the whole process on its head. The machine becomes auxiliary, the human audience becomes primary. It may be that this approach does not hold practical utility–but it is not something to be as lightly shoved aside as the idea of starting a completely new and independent piece of software in RPG III.

These rambling thoughts led me to look into some present day tools (even Knuth’s own WEB has been superseded, it seems). The one with the biggest following is noweb, which is language agnostic. My biggest complaint, as I fished through the tools I could find, is that they almost universally use TeX as their typesetting format. Historically speaking, this makes sense. Knuth wrote WEB and TeX and, more specifically, he wrote WEB for TeX. I, however, do not want to compose text in TeX or LaTeX. As I have written before, it is just too cluttering. There are a few out there that rely on something else. I found one that used wiki syntax. At least noweb supports HTML mode which, while still imperfect for composition (as an interchange and basic display format, it is excellent), is at least usable.

Any value that LP has will largely rest on the fact that it forces the programmer to think a little bit more about what he is doing as he is doing it. In this way, it is not unlike Haskell’s type system (which also makes it unsurprising that the Haskell community is one of the more vibrant outposts for LP).

A lot of questions still remain. Most LP tools are usable for standard write-compile-test cycles. For languages like Lisp, a separate tool would have to be created (not that a lot of weekend warrior projects do not already exist). On StackOverflow, a few users expressed concern for how you would use LP in a collaborative environment. Personally, I would suspect that it would work much the way most technical writing teams work: divide and conquer. Distributed source control systems like git or darcs make this even easier.

So what is it then? Academian pipe dream or underused tool? There is only one way to find out. Try it.

References

What is the Point of this?

Sunday, November 8th, 2009

I recently stumbled across some articles on WS-BPEL. BPEL stands for Business Process Execution Language. At first this caught my attention because, well, it sounded like some potentially slick DSL that would help describe business rules and execute them. Slapped in front of a good domain-specific API, something like this could help slash development time. Of course, such things are usually little more than pipe dreams, but today’s pipe dream is tomorrow’s brave new world. So, it is always better to keep an eye on things.

Perhaps the first tip off that this had nothing new to offer is that BPEL is based on XML. Seriously, how can much good come from XML? Even the few times where the end result is cool (like WSDL and SOAP), a better interchange format could have been chosen. Imagine, for example, a YAML or JSON based web services platform. With wider support that would just rock. But I digress.

Here is a tutorial of sorts on WS-BPEL. When you get past the buzz words and the fancy terminology, you have an XML based scripting language to tie basic web services together. Pretty disappointing. After looking at the examples, I do not see any way that this wins out over using Java, C#, or PHP. It is quite a stretch to refer to what this thing does as having anything to do with “business processes”. Even an IBM reference on the subject just shows a few simple control mechanisms joined up with the ability to call web services.

So, if you have seen this used in the wild to an efficacy above and beyond typical programming or scripting languages, please drop me a line or a comment—because this looks like buzzword tag soup.

Levenshtein Rocks

Friday, October 23rd, 2009

The company I work for is running a project in which various numbers are getting scanned. Often, the barcodes were missing or illegible and had to be typed by hand. On the backend, we found that a great many of them were subtly wrong. For example, O (letter oh) and 0 (number zero) were swapped. Well, it’s pretty easy to drop in a quick AJAX callback that checks the barcode number to make sure it is on file. I thought it would be cool, though, to have the program suggest the correct number to the user. If they were right and it was just something we hadn’t seen yet, then they could just leave it be. If not, the system would give them a much better idea where they were messing up.

Meet the Levenshtein distance. I had heard of it before (it is commonly used in spellcheckers), but never had a reason to use it. A quick googling turned up a blog post in which the writer implemented the dynamic programming algorithm for finding the Levenshtein distance as a MySQL UDF. It worked beautifully.
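For the curious, here is a short sketch of the same dynamic programming idea in Python. This is not the MySQL UDF from that blog post; it is a stand-in to show how the distance is computed and how a suggestion could be picked. The suggest helper, the max_distance threshold, and the sample barcode values are all assumptions made up for this example.

```python
def levenshtein(a, b):
    """Edit distance between a and b, computed row by row."""
    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def suggest(typed, known_codes, max_distance=2):
    """Return the closest known code within max_distance, or None."""
    best = min(known_codes, key=lambda code: levenshtein(typed, code), default=None)
    if best is not None and levenshtein(typed, best) <= max_distance:
        return best
    return None

known = ["A10045", "B20077", "C30012"]
print(levenshtein("kitten", "sitting"))  # 3
print(suggest("A1OO45", known))          # A10045 (two letter-oh/zero swaps)
print(suggest("ZZZZZZ", known))          # None
```

The threshold matters: set it too high and the system will "correct" numbers that are legitimately new, which is exactly the case where the user should be left alone.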

A Quick PL Thought

Wednesday, September 30th, 2009

I hate programming languages that make me do a lot of typing at a stretch. If I can type for a long time, it means I don’t have to think while I’m doing it and if I don’t have to think about it, it can be automated–and if it can be automated, the blasted machine should be doing it anyway.

Lots of Insipid Stupid Parentheses

Tuesday, September 22nd, 2009

For a bit of private research, I was reading some papers on MLisp, a Lisp dialect (technically a pre-processor, as it simply compiles its input into normal, S-expression Lisp code) based on M-expressions. Given that the first paper I read was published in 1968, it seems that people have been griping about Lisp’s parentheses for almost as long as there has been a Lisp to complain about. Of course, as Bjarne Stroustrup said, “There are only two kinds of languages: the ones people complain about and the ones nobody uses.”

Some of the original motivations behind MLisp have fallen away. For example, the MLisp User’s Manual mentions three motivations (page 2):

  1. The flow of control is very difficult to follow. Since comments are not permitted, the programmer is completely unable to provide written assistance.
  2. An inordinate amount of time is spent balancing parentheses. It is frequently a non-trivial task just to determine which expressions belong to which other expressions.
  3. The notation of LISP is far from the most natural or mnemonic for a language, making the understanding of most routines fairly difficult.

Both Scheme and Common Lisp (pretty much the only remaining living variants of Lisp) provide comments. Since R6RS, Scheme includes multiline comments as well as single line, so this motivation is clearly gone. Two and Three really have no business being separate. They both say that Lisp is hard to read, something that was still being said more than thirty years later, to the point where Peter Seibel’s 2003 book, Practical Common Lisp, briefly addresses this objection near the beginning.

Here is a snippet of the result, from Enea, Horace (1968) MLISP

RR:=#:acAD(),READ()>;
A:=DO I :=I+1 UNTIL FN(1);
B:=COLLECT <I:=FN(I)>UNTIL I EQ 'END;
WHILE ,((A:=READ()) EQ 'END) DO INPUT(A);
C:=WHILE 7((A:=READ()) EQ 'END) COLLECT 4.b;
FOR I ON L DO FN(1);
J:=FOR I IN L DO FN(I) UNTIL QN(1);
FOR I IN 1 BY 4 TO 13 DO FN(1);
FOR I IN 1 TO 10 DO FN(1);
J:=FOR I IN L COLLECT FN(1);
J:=FN(FUNCTION(+), FUNCTION(TIMES));
J:=<g,2>,<4,<6,8>>> SUB ~,l>;
J:=q(a Y 2 ., 3 9 4 ? 5 Y 6Y 7 Y 8 Y gY o>);
OFF;
END.
(Input follows end.)

This MLisp, which looks like an evil union between Pascal and Basic, is the result of one of (if not the) earliest attempts to solve that problem of those pesky parentheses. So, over forty years ago, we get to see two traditions established:

  1. People whining about the parentheses in Lisp
  2. People using Lisp to build DSLs

And the world ever is as it always was.

Web Servers in the Language du Jour

Tuesday, September 22nd, 2009

Has anyone besides me noticed an increased tendency for people to write new web servers in their language du jour? For example, we’ve got the WebServer CodePlex project to write one in C# .NET. Django packages one written in Python, for development purposes. Ruby has Mongrel. There is Hunchentoot for Common Lisp. Heck, I even found a Perl one on SourceForge whose last file release date was in 2000.

The height of absurdity comes with nanoweb, a web server written in PHP. That just seems wrong, like the programming gods should strike someone down for even thinking about it. That’s right. It’s not enough to watch the world blow security holes in PHP web applications, now they get to do it in PHP web servers, too. That’s just great.

Whatever happened to good old C-based web servers, like Apache? About the only one in that list I can really see is Django’s. It really does simplify development by allowing you to push deployment details off until you are ready to deploy. Visual Studio does the same thing when you are testing ASP.NET applications. The other ones, though, actually want to be production web servers. Django warns you against deploying on the development web server. About the only way you could use Visual Studio’s (which, dollars to donuts, is probably just a stripped down version of IIS) is to run the project in debug mode on the server in an instance of Visual Studio–which would be just plain stupid. Hunchentoot is also nice, because few web servers have good tools to integrate with Common Lisp. About the best you’ll do is straight CGI or mod_lisp–and, with mod_lisp, you will still have to interact with the module at a fairly low level (which I found disappointing).

If you are running a web application for the whole world to see, then you are far better off with a larger-scale HTTP server, like Apache, IIS, or Lighttpd. If you are using embedded applications, use one of the micro C-based servers; you’ll need those precious ounces of resources that C can save even more if you are embedding the thing in a printer or something like that.