Eccentric Flower:201005/use useless


use useless;

Those of you who have actually read my various effusions on programming stuff over the years may recall that I am not a big fan of standard code libraries.

(Those of you who have always skipped my codeish bits, or have tried to read them but found your eyes glazing over, are cordially invited to skip this entry. However, I'll try to interject a couple of bits of explanation where I can, to give you a fighting chance if you stick around.)

For the uninitiated, a library is a body of code that someone else wrote to do something that you need to do and you don't see the point of writing again from scratch. I am skeptical of libraries. Sure, I use them. There are problems they solve which I am actually incapable of solving, and problems they solve which were such a pain in the ass to solve that I'd be foolish to try to do it again myself. I am all for saving labor.

But I've noticed that there are plenty of libraries where the tradeoff between "complexity avoided" and "operational clarity" is not a good swap. My favorite example is CGI handling. (Novices: calm down, I'll explain it in a moment.) There are people who have been writing CGI-based web interactions for years and who still have absolutely no idea how GET or POST requests work, because they've never looked at the guts of how they work. Furthermore, given the stock Perl CGI library, which is four thousand lines of absolute gibberish, you could hardly blame them for not wanting to dig in.

I have a routine I wrote years ago which parses input to a Perl script from a GET or POST request in a standard HTTP environment. I've been reusing it verbatim for years. I just grab some other piece of code where I already used it and copy and paste it.

(Explanation for novices: CGI is a method of transmitting data from web pages. Say you're on Amazon. You place an order and press some form of "submit" button. The data - who you are, what you've ordered - is sent via CGI. Some script on their server receives the info, parses it, does things with it, and then probably responds by generating HTTP output - that is, sending you a fresh web page which says "Thank you for your order" or "Is this your card?" or some such.)

This routine is the incoming part - the part that takes parameters/data being sent to the script and takes them apart into meaningful variables (name = eccentricflower, purchasetotal = 79.50, and so forth). It's one of two bits of vital code you need to "do CGI." I paste it here not because I expect you to read and understand it, but so you will get a feel for the size and simplicity. This is it. This is all of it. There are no hidden dependencies, other than a global hash called %data to store all the crap you get:

sub get_CGI
{
	my @pairs;

	if ($ENV{'REQUEST_METHOD'} eq 'GET')
	{
		@pairs = split(/&/, $ENV{'QUERY_STRING'});
	}

	elsif ($ENV{'REQUEST_METHOD'} eq 'POST')
	{
		my $buffer;
		read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
		@pairs = split(/&/, $buffer);
	}

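	foreach my $pair (@pairs)
	{
		# Limit of 2 so a value that itself contains "=" is not truncated
		my ($cmd, $val) = split(/=/, $pair, 2);
		next unless (defined $val);

		# Decode hex munging, force lowercase
		$val =~ tr/+/ /;
		$val =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
		$cmd =~ tr/A-Z/a-z/;

		# Test length, not truth, so a legitimate value of "0" is kept
		$data{$cmd} = $val if (length $val);
	}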
}

OK, I admit that if you're handling multipart form data (the multipart/form-data encoding, which you need for things like uploading entire files), it gets a little more complicated. Say, another ten lines. Tops.

This is the only thing that nine-tenths of the people who include CGI.pm in their code are using it for. Four thousand lines.

Furthermore, if I need to special-case, I can tinker with this code easily. Suppose that if I get a variable called "action" I don't want to throw that in with all the other %data for some reason, but do something else. That's another two or three lines, not complicated ones. If I wanted to do some special-casing with CGI.pm, it would be a complete pain in the ass.
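To make the "action" case concrete, here is a rough sketch of what those two or three extra lines might look like. It is not the routine above verbatim: so that it can run on its own, this version takes the raw string as an argument and returns its results instead of filling a global %data, and the name parse_pairs is made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical variant of the parsing loop: takes the raw query string as an
# argument and returns (action, hashref) rather than filling a global %data.
sub parse_pairs
{
	my ($raw) = @_;
	my (%data, $action);

	foreach my $pair (split(/&/, $raw))
	{
		my ($cmd, $val) = split(/=/, $pair, 2);
		next unless (defined $val && length $val);

		# Decode hex munging, force lowercase
		$val =~ tr/+/ /;
		$val =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
		$cmd = lc($cmd);

		# The special case: "action" is kept out of the general pile
		if ($cmd eq 'action')
		{
			$action = $val;
			next;
		}

		$data{$cmd} = $val;
	}

	return ($action, \%data);
}

my ($action, $data) = parse_pairs('Action=delete&name=eccentric+flower&total=79%2E50');
print "action: $action\n";
print "$_ = $data->{$_}\n" foreach (sort keys %$data);
```

The special case is the one if block in the middle; everything else is the same decoding as before.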

That's the input side. Now consider the output side.

If you want to send basic HTTP content - say that you're not really doing anything more complicated than squirting HTML out to the user's web browser - and you're already working in an established HTTP environment (i.e. Apache or some other web server is calling your script) - you have the same sort of choice. You can grab all or part of the idiotically-named LWP libraries (it's from "libwww-perl"), which are so huge that many people have written alternate "lightweight" HTTP libraries; you can grab one of those so-called lightweight libraries which will still be a few thousand lines; or you can write one line of code by hand.

That's right. One line.

print "Content-type: text/html\n\n";

That's it. Don't forget the two carriage returns* at the end, they're very important. You have just initiated an HTTP stream. Everything you print to standard output from then on goes, in effect, straight to the user's web browser. Organize your code so you do all your processing before you're prepared to generate any HTML output. Then send that line, and start sending your output. When the script ends, your web server will close the connection and turn out the lights for you.

Hey, maybe you just want to redirect the user to a standard response page somewhere - like a fixed "thank you" page you don't have to send dynamically?

print "Location: http://www.eccentricflower.com/SomeFixedPage.html\n\n";

Remember, when you're running under a web server, anything to standard output goes to the web server, which is expecting some sort of HTTP response from you. The web server knows nothing about your script, but it knows how to read HTTP responses, and so it knows exactly what to do with that Location line. Just remember the two carriage returns. The second one - a blank line, basically - is the cue that "HTTP headers are done now," and if you don't send it, your web server will wait forever for you to send it more cryptic HTTP header messages.
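If you are worried about forgetting that blank line, one option is a tiny helper that always appends it. This is only a sketch, and header_block is a name invented for the example, not something from any library:

```perl
use strict;
use warnings;

# Hypothetical helper: joins the header lines and appends the blank line
# that tells the web server the headers are finished.
sub header_block
{
	my (@headers) = @_;
	return join("\n", @headers) . "\n\n";
}

# A complete HTML response: one header, the blank line, then the page itself
my $page = header_block('Content-type: text/html')
	. "<html><body><p>Thank you for your order.</p></body></html>\n";

# A redirect: just the Location header and the blank line, no body at all
my $redirect = header_block('Location: http://www.eccentricflower.com/SomeFixedPage.html');

# A real script would print exactly one of these to standard output
print $page;
```

Either string is the entire response; there is nothing else for the script to send.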

There are people who have been writing HTTP response scripts for years, cargo-cult style, who do not understand this is all they're doing when they call a header-initiating routine somewhere deep in the heart of darkness that is LWP. I feel it is important to understand what you do. Otherwise, one day, LWP is going to fail you and you're not going to have the slightest idea how to solve a problem that someone is likely paying you a salary to solve. When they hired you they thought you knew what you were doing, and now you are scrod.

That's why I distrust libraries. And I dislike when I have to use them blindly, not just because of the huge overhead waste (four thousand lines!!) but because they are a black box. When I use one I'm admitting that I don't have the time or energy to look into that box. And that makes me a little disappointed in myself, as well as worried about the day that it breaks.

I have to do something in code I'm writing right now that I don't understand how to do. And it's possible, if the time constraints get bad, that I'll just pull in a library and call a routine in that library. But that will be the lazy way, and moreover, for me, it will be the less-maintainable way.

Because, you see, if I actually do the research and put in the lines of code to do it in the raw myself, I'll understand what I'm doing ... and as importantly, five years from now, when I have to come back to this code cold and make changes (as you do), I'll understand what I did and why I did it. Whereas if I throw in a black box, five years from now it will still be a black box.

Remember this next time someone tells you that use of stock libraries makes for more maintainable, more long-lived code. That may be true when your code is going to pass through the hands of future morons who can't read code well enough to figure out what you did and why. It is completely untrue if you are the sole programmer and you depend on self-documenting code to remind yourself what's what in code that you only look at once every few years at most. When I write something, I most likely will not see it again for a long time, by which time it will have long since been pushed off the memory shelf.

But I know I will always see it again. In my business no project ever fully dies.


* Isn't "carriage return" hilarious? I'm not the only programmer who uses it, and yet it is terminology that dates from the era of the manual typewriter. I've noticed that younger hackers don't use it nearly as much; they prefer "newline" (which, by the by, is what \n means). So it's possible that one of these days, long after its sell-by date, "carriage return" will finally vanish from our lexicon.


<< older | © 2010 columbina | newer >>




Jweader:

If you put two carriage returns ('\c\c') at the end of your line the server will wait forever. It's not that people prefer newline (or more properly, line feed, but you know that) - it's the proper term. You can return the carriage to the beginning of the line without feeding the paper forward.

And this piece is presumably written by someone who spends most of his time writing functional code, not object-oriented code. Your disdain for functional encapsulation wouldn't get you very far in a C++ or Java shop. Concise, well-written APIs (and the functionality behind them) are stock in trade here. In fact, the project I'm working on right now is explicitly cleaning up an existing set of libraries (for controlling autonomous undersea vehicles) for more general re-use. The functionality already exists, from years of development over many projects. It's my job to help turn the whole mess into something that's accessible and understandable - which I think is really your complaint in the first place. Libraries are wonderful things, if they're well-written and the API is sensible. Sounds like LWP is neither.

-- 21:48, 5 May 2010 (BST)


Columbina:

Actually in Perl the code for a carriage return is \r, and yes, I know the difference. But I work in Unix, where we consider \r an oddity of other, less-well-developed OSes. Since we only ever encounter \r's when we're having to strip them from @#$! alien-format files, we can safely call newlines "carriage returns" without much danger of cosmic retribution. So there.

Again, I am not arguing against libraries per se. For example, speaking as someone who once was foolish enough to write his own date/time handling library, I can now say that the Perl Date.pm is well written, has good documentation, and nice, lucid function names. It is, in short, a sound API. But date libraries really ARE that complex. If you included an entire library/API for one function call that you could write yourself in twenty lines, I would wonder what your motivations were ... unless you were on a team and you wanted to make sure that no one went cowboy and tried to do it themselves and possibly failed miserably. ("Why isn't this code working? It's the same API as everyone else ... oh.") But I never work on a team. The last time I tried to be part of a programming team, my alleged superior nearly ended up at the bottom of the Charles with a slit throat.

If I worked in a Java or C++ shop, I would slit my own throat. Preemptively.

-- 21:57, 5 May 2010 (BST)


Columbina:

Incidentally, I was oversimplifying when I said the server would wait forever. What will actually happen is one of two things:

1. You'll try to send some output that's not legal HTTP header (for example, the content of the web page you intended to send), and the server will fuss at you and the user will see an error 500.

2. You'll never finish sending a header, and your code will end, and when the server falls off the end it will say, "Hey, I never got a full header from you, what's the deal?" ... and the user will see an error 500.

-- 21:59, 5 May 2010 (BST)


Ysabel:

There's lots I want to say, but I will stick to this one:

Libraries are not the cause of cargo cult programming, and if you have to deal with cargo cult programmers (some of us do), libraries make their code MUCH MUCH MUCH easier to untangle and fix later.

Unlibraried cargo cult programmers make truly frightening stuff. I've had to deal with it before.

-- 22:17, 5 May 2010 (BST)


DanLyke:

Is it too pedantic to suggest that for a "Location:" header element you really should include a "Status: 3xx", where "3xx" follows the appropriate semantics for the sort of redirect you want?

As someone who's used an ethernet sniffer to find a bug in router firmware that was causing a SOAP call to fail, I think you're being a bit hard on CGI.pm. Alongside having the pretty worthless output stuff (although some of the form handling escape code is useful), it also silently handles the differences in semantics between running as a CGI or in an Apache mod_perl environment. Which can be handy; I've never had to learn mod_perl beyond that and a "BEGIN" block.

And it ain't SOAP.

Having said that, I'm pretty sure my next web app will be written in roughly bare metal C++, because my feeble little brain really doesn't have the neurons to keep up with all these silly newfangled application frameworks and languages.


-- 22:20, 5 May 2010 (BST)


Settsimaksimin:

i see the phrase 'carriage return' and i immediately hear the "ding!" my mother's old manual typewriter used to make.

-- 22:25, 5 May 2010 (BST)


Columbina:

Ysabel: While I agree that an unlibraried cargo cult programmer is far worse than a libraried one, my point was that libraries can encourage cargo cult programming in those who lack the impetus to look deeper. I didn't mean to imply an absolute link.

Dan: My thoughts on mod_perl will have to wait for another day; they are long and nasty. I have fought through many, many misguided mod_perl installations, and in fact, at my current job, I have turned it off and never looked back. If that sounds like a stupid decision, that's because I don't have the time for 6000 words right now to explain why, in my particular environment and situation, it was not. Suffice to say that my stance on mod_perl is akin to my stance on libraries; often useful, but just as often gets in the way of useful.

I have absolutely no use for SOAP or similar frameworks. Again, maybe useful somewhere; for my jobs and my environment, overkill that gets in the way.


-- 23:16, 5 May 2010 (BST)


Columbina:

I was thinking about this on the way home and I realized that I endorse two purposes of libraries, and that nearly all the libraries I like fit one of those two purposes.

The first is a library as the body of code and data which embodies an "object." I am not rigidly opposed to object-oriented code, any more than I am rigidly opposed to libraries; sometimes it can be very useful. In the case of Date.pm, the object is obvious - the data, the physical object as it were, is a single date. You then have a set of functions which set the date, all or in part ("set the year to ..."), which alter or manipulate the date ("advance one month"), or return/access the date, in whole or part, in various ways ("tell me what day of the week the date is" or "tell me this date in this format"). Combining these functions allows you to do very complicated operations intuitively. The Date.pm calls for "go to the date thirty days in the future, back up to the preceding Friday if that date happens to be on a weekend, and tell me what calendar month the date is now in" are only slightly less human-readable than the descriptive text I have just typed.
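For a feel of how closely such calls can track the English, here is a sketch of that thirty-days operation. It uses the core Time::Piece module as a stand-in, not the date library discussed above, and the function name and the sample start date are made up for the illustration.

```perl
use strict;
use warnings;
use Time::Piece;

my $DAY = 24 * 60 * 60;    # one day, in seconds

# Hypothetical helper: $start is a YYYY-MM-DD string
sub month_after_thirty_days
{
	my ($start) = @_;

	my $date = Time::Piece->strptime($start, '%Y-%m-%d');
	$date = $date + 30 * $DAY;                    # thirty days in the future

	# day_of_week: 0 is Sunday, 6 is Saturday
	my $dow = $date->day_of_week;
	$date = $date - 1 * $DAY if ($dow == 6);      # Saturday: back up one day
	$date = $date - 2 * $DAY if ($dow == 0);      # Sunday: back up two days

	return ($date->ymd, $date->fullmonth);
}

my ($ymd, $month) = month_after_thirty_days('2010-04-01');
print "$ymd falls in $month\n";
```

Thirty days after 1 April 2010 is Saturday 1 May, so backing up to the Friday lands on 30 April and the answer comes back as April, not May.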

Similarly, I use a library for a database-handle object every single day of my working life. As it happens, I wrote this one myself because I didn't like the existing DAOs that were lying around, but it is nonetheless a library and I use it constantly. This makes sense not just because there is a set of database functions which seem to belong together (connect, disconnect, select, update, insert, delete, commit, rollback) but because, at least in Perl, the database handle is a persistent data construct which must be explicitly opened and which hangs around until it is explicitly closed (yes, Perl will do automatic closing if you leave it open as you fall off the end, but that's sloppy).

Meanwhile, the problem with CGI.pm or LWP is that there isn't really any sort of "object," yet in each case they insist on trying to pretend there is one. Each of those is basically a library of possible transactions or signals - they're protocols, they're not objects. If they handled themselves like protocols, I might like them more. But I don't want to be tied to an imaginary CGI object which pretends to have persistence when it doesn't (CGI is legendarily, remorselessly, heartbreakingly stateless). I just want the call for "Gimme the data" and another call for "Here's data comin' at ya."

The other library purpose I endorse is a group of essentially standalone, small utility functions grouped by purpose. Dan, you mentioned the escapement code for HTML entities. Yes, those are very useful utility functions. I eventually carved out a set of my own so I wouldn't have to include LWP just to get ten lines of (however useful) code. Where did I put my &escape_url and my &unescape_url functions? Why, in my standard web utility library, of course, which also has useful functions for URL dissection, host name munging, and many other small tasks related to the general purpose of "web/HTTP mongering" which otherwise I'd have to write fifty times in fifty places. As I say, I am not at all opposed to efficiency. If someone had put a Perl library on CPAN that had all those little pesky web-related functions, and ONLY those little pesky web-related functions, then I'd probably be using it right now. But it doesn't exist, because people seem to insist on grouping functions together in a way that doesn't make sense to me, or making me buy into some phony object paradigm just to get to the stuff that's actually useful. I figure that about once a year, I carve a useful function out of someone else's library, and move it to a place where it will actually be useful to me. I am in favor of using good code when I find it. The problem is digging through ten thousand lines of crap to find it.

-- 23:33, 5 May 2010 (BST)


Spc476:

I, for one, would be interested in seeing those 10 lines that implement form/multipart (why yes, I am calling your bluff).

In defense of CGI.pm (not that I use Perl (ack-ptui) but that I like playing Devil's advocate), the code you presented doesn't handle form/multipart; nor does it handle name/value pairs where the name part repeats (or rather, the code overwrites earlier values). You also don't handle the case where you POST to http://www.example.net/blah.pl?foo=1&bar=2. CGI.pm has to handle all those oddball corner cases (to be fair, my own C code (which I've been using since the mid-90s) doesn't handle QUERY_STRING in a POST either).

I think what you are slowly getting to is that all abstractions are leaky ( http://www.joelonsoftware.com/articles/LeakyAbstractions.html ) and that using a library without knowing all the details will come back and bite you.

I, too, tend to avoid external libraries, because there's already too many moving parts to troubleshoot (just today I found out why my custom written syslogd was hanging---I was using sendmail's sendmail instead of Postfix's sendmail when sending email notifications which was working about 99.9% of the time; it only took me four hours to track that particular problem down once I could reproduce it on demand).


-- 00:56, 6 May 2010 (BST)


Columbina:

This sample is more than ten lines but that's because it accepted many different bits of data in the POST request:

# The POST method uses the mixed content type because it may have a
# fileblob to post. This is called from the image browsing page.
elsif ($ENV{'REQUEST_METHOD'} eq 'POST')
{
	my $buffer;
	my $delimiter;
	my @params;

	read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    	
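	# The first token of the body is the boundary delimiter
	$buffer =~ /^(\S+)\s+Content-Disposition/;
	$delimiter = $1;
	# \Q...\E so the boundary is matched literally, not as a regex
	@params = split(/\Q$delimiter\E/, $buffer);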

	foreach my $param (@params)
	{
		my $paramname;

		# A param that does not begin with Content-Disposition is garbage
		next if ($param !~ /^\s+Content-Disposition:/);
		$param =~ s/^\s+Content-Disposition: form-data; name="([^\"]+)";*\s+//;
		$paramname = $1;
		$paramname = lc($paramname);

		if ($paramname eq 'type')
		{
			$param =~ s/\s*$//;
			$filetype = lc($param);
		}
		elsif ($paramname eq 'stockno')
		{
			$param =~ s/\s*$//;
			$stockno = $param;
		}
		elsif ($paramname eq 'imgfile')
		{
			$param =~ s/\s*$//;
			$filename_from_form = $param;
		}
		elsif ($paramname =~ /^delete_(\w+\.(jpg|gif))/)
		{
			$deletion = $1;
		}
		elsif ($paramname eq 'confirmation')
		{
			# If we send this then we also need to send the filename to delete
			# as 'imgfile' param; this is the second stage of deletion.
			$deletion = "confirmed";
		}
		elsif ($paramname =~ /^select_(\w+\.(jpg|gif))/)
		{
			$filename_from_form = $1;
		}
		# The fileblob requires special measures before hashing
		elsif ($paramname eq 'fileblob')
		{
			$param =~ s/^filename="([^\"]*)"\s+Content-Type: (\S+)\s+//;
			$filename_from_blob = $1;
			$fileblob = $param;
		}
	}
}

That code's several years old and I wouldn't necessarily write it quite that way now, but it's what I had handy.

URLs in the form url?param=1&param=2 are not POSTs, they are GETs, and the first code sample above handles them quite well, thank you.

If I encounter code where a parameter repeats (such as a multiple-selection listbox) I special case it. I also often special-case implied parameters, like when you want to pass in url?12345 and assume that 12345 is a value, not a variable name.
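As a sketch of what those two special cases could look like (this is an illustration, not the code from any real project; parse_multi and the "_bare" key are invented names):

```perl
use strict;
use warnings;

# Hypothetical helper: every name maps to an array ref of its values, so a
# repeated name accumulates instead of overwriting the earlier value.
sub parse_multi
{
	my ($raw) = @_;
	my %data;

	foreach my $pair (split(/&/, $raw))
	{
		my ($cmd, $val) = split(/=/, $pair, 2);

		# Implied parameter: a bare "12345" with no "=" is treated as a
		# value and filed under the made-up name "_bare"
		($cmd, $val) = ('_bare', $cmd) unless (defined $val);
		next unless (defined $val && length $val);

		# Decode hex munging
		$val =~ tr/+/ /;
		$val =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

		push(@{ $data{lc($cmd)} }, $val);
	}

	return \%data;
}

my $multi = parse_multi('color=red&color=blue&size=L');
print join(',', @{ $multi->{color} }), "\n";

my $bare = parse_multi('12345');
print $bare->{_bare}[0], "\n";
```

The cost is that every value is now a list, even the single ones, which is part of why it makes sense to do this only where a parameter is known to repeat.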

That all said:

I think what you are slowly getting to is that all abstractions are leaky and that using a library without knowing all the details will come back and bite you.

about summarizes my feelings.


-- 01:31, 6 May 2010 (BST)


Kymmz:

I am one of those people who generally skip the codey entries because beyond 1998-era html, code is a mystery to me, but I must say, that was a very interesting entry!

-- 03:54, 6 May 2010 (BST)


Andy:

Someone was telling me about a rant, which I think we'd both be in agreement with, about how programming has changed, which was part of an explanation of why MIT got rid of the old 6.001. Roughly, it said that programming used to be writing good code that did a task, while now programming is putting a little bit of glue between a bunch of ill-designed, poorly-documented, mostly-working libraries that do something close to what you want to do. And that old fogies like us hate this change.

I'm totally with you on libraries. On SNL many years ago, Guido Sarducci did a bit selling a new product he had invented called "Mr. Tea". It was sort of like Mr. Coffee, but for Tea. You just supply a cup, boiling water, and a tea bag, and Mr. Tea does all the rest! Far too many libraries are like Mr. Tea. The work to convert what you have into the format the library wants, and to convert what comes back into the format you want, is more work than the work the library actually does.


-- 19:58, 6 May 2010 (BST)


DanLyke:

Re Andy's "...now programming is putting a little bit of glue between a bunch of ill-designed, poorly-documented, mostly-working libraries that do something close to what you want to do.", I have spent the past few weeks porting a bunch of C# code to Visual C++, and thence to the Macintosh, and am now looking at the iPad as a target.

This. This is .NET, this is OS/X Cocoa, this is iPhone/iPad OS: I just wanna draw some freakin' dots on the goldarned screen, and have the hardware tell me where the user put their mouse or finger. That's all.

Instead I'm constantly in some recent CompSci grad's idea of "the one true way" of architecting my application, and it's super telling that none of those recent (as of when they wrote this stuff) CS grads agrees on what "the one true way" is.

The number of neurons I have devoted to figuring out what some other moron was thinking versus those devoted to actual algorithms to solve what should be the hard bits of all of these problems is wrong.

And there are many days when I yearn for the simplicity of the Apple ][.

-- 18:02, 7 May 2010 (BST)
