Shades of the IOCCC and the original Bourne shell a-spectering here.
It was just a bit of fun. I've never used it.
I thought the original pipe operator was '^'.
I'm very interested in Unix history, and I read about this in the past week or so. It was intended, I suppose, to work exactly like all other shell redirection, except that the > on the right would indicate "write to stdout" or the equivalent, since otherwise the previous > would imply "write to this file" instead of "write to stdin on this program". All context was based on the extra >. So, if I follow, something like ls >pr> would have meant what we now write as ls | pr, with the trailing > standing in for stdout.
Speaking of which, from what I have seen of your language, it looks like it's strictly left-associative with no precedence. Am I right?
Completely.
Not at all. I'll grumble, but I'll go along with what the language mandates. awk is 1-indexed, for instance.
I hadn't considered that, but yes I suppose it is! My LISPer friend even compared it to awk (he's far too kind).
However, having your language be 1-indexed and the underlying extension language 0-indexed is, to me, a very jagged split. It's something I would avoid.
That's sound design advice. But the language is designed around doing what the project's philosophy considers "optimal" first, affecting at most about 100 features-- I mean 100 is arbitrary, but it's also a 10x10 grid. I used to have a big poster with the entire language on it, just for fun.
Python, for example, does split and join in a way that I find really obnoxious (I'm confident it wasn't arbitrary), but here, it works this way:
p "Hello there everybody" | split p " " | join p " "
I'm really not a fan of the way that Python reverses one of these, so I fixed it to show it could be done. I have a strong preference here as well, but I don't take making "improvements" like that lightly, and I do try to avoid them.
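For contrast, this is the Python pair in question-- split is a method on the string being split, while join flips it around and is a method on the separator:

    words = "Hello there everybody".split(" ")  # the string is the receiver here
    text = " ".join(words)                      # but the separator is the receiver here
    print(text)                                 # Hello there everybody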
It is assumed that most users will not bother using Python, but if they do they will have to learn how it works. I think it's a useful lesson to learn about 0-vs-1 indexing, but a lesson that can be safely put off until after the early stuff.
The goal isn't to teach everything in the language, it's to teach the concepts and then if someone wants to play with the language, actually have a non-useless set of features. Full mastery is considered possible for someone without previous experience.
Another thing is the 100 or so "commands" (like PSET?) you mention. My preference, here also, tends towards minimalism: few keywords, with the rest pushed out into libraries.
Yes, though this was modeled on BASIC, which has such commands up front. Strictly speaking, there are only two graphics commands in this language which actually produce graphics: line and pset.
Interfacing with libraries is a useful feature (to say the least) but not a core feature. Why make libraries for a language with only 100 commands, when you've already added a way to interface with Python libraries?
I mean any subset of those libraries is going to be vastly inferior to the existing set, and any "native" way of bringing import into the language is only going to save the first step; that native feature would then be several times more complex than any other feature, which defeats the purpose of making it "native" at all. import truly might as well be done with inline Python then.
My feeling is that if you're ready for import or anything like it-- you're probably ready for Python, and have fun. By then they may not wish to use the extension at all; they may wish to simply put it aside and learn Python rather than use it as an extension, but we are nice enough to offer a choice. By that time they may have found that the shell provides whatever they need.
Think of it like the first stage in a rocket. You can have a single-stage rocket, but most people are going to drop the first stage and move forward.
It's intended to make the ride along the first stage as smooth as possible. There are (deliberately) few instances of it straying far from the implementation language's design-- changing indexing or parameter order, or any naughty things like that. Those are only done when it's considered worth the inconsistency. Obviously that comes down to opinion-- as a rule, I would say yours is the good one.
On the contrary, unless you're a language guru and can hand-roll recursive descent parsers correctly, parser generators are the only guarantee that you're not shooting yourself in the foot.
If I do manage to hit my foot here, it's only with a nerf dart.
I don't dispute the advantages that you get from doing it the way you (and indeed the industry) consider "correct"-- it is less error-prone, and it is more testable from a comp-sci perspective. If you want proofs, if you want to chase efficiency, the advantages you named are indeed features of doing it "correctly".
But if anybody ever extended the syntax of the language considerably (at which point the smart thing to do would be to rewrite the parser "correctly"), then you would have every right to say "I told you so". In that context, you would be 100% right. Perhaps the thing to compare this to is the "Seasoned professional" here: http://www.ariel.com.au/jokes/The_Evolution_of_a_Programmer.html
I'm deliberately taking the "First year in college" approach. I have reasons for doing this other than laziness, and in a way my justification is just as circular as "you should do it correctly so that it's correct".
I wanted to make a language that was so ridiculously simple, someone inexperienced could learn to write a parser for it (and understand the most fundamental ideas behind parsing text) even before they ever cared (or truly needed to care) about parser generators.
As a result, someone has already (hand)written a better parser, but they use it in their own implementation of their own very simple language. Perhaps I'd feel bad, except that for that sort of hobby project, handwritten parsers are extremely common. I'm not lowering the bar, I'm simply not raising it.
Otherwise, you could end up with a parser which cannot parse valid inputs as defined in its own grammar. This is bad not only for the user, but for the guy who implemented the language as well.
The biggest problem I've found with it (there's a workaround, and it wasn't a problem in the design originally) is that I bolted on syntax checking later in the game. I'm really pretty happy with it, except for the part that checks for references to undefined (otherwise non-existent) identifiers.
This is imperfectly (but effectively) solved with a simple workaround: a single line with two tokens, when you want to call a function that hasn't yet been defined.
It could also be solved more automatically with more granular checking (ignore errors for UDFs), or it could be perfectly and ideally solved with another transpiler pass (there are two; it could possibly be integrated into the first pass). It simply hasn't been enough of a bother to fix so far. I would call it the biggest flaw so far, but it was introduced relatively recently. And it is fixable pretty easily without changing the language (there's a toy sketch of the checker's behaviour after these two lists):
- function a is defined
- function a calls function b <- error happens here
- function b is defined
- function b calls function a <- no problem
Compiling that may give you an error. Alright:
- b = 0
- function a is defined
- function a calls function b <- no problem
- function b is defined
- function b calls function a <- no problem
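Here's that toy sketch: a hypothetical single-pass checker (none of these names are from the actual implementation) that flags the first program but accepts the pre-declared version:

    # Hypothetical single-pass identifier checker, for illustration only.
    def check(lines):
        known = set()
        errors = []
        for n, line in enumerate(lines, 1):
            if line.startswith("function "):
                known.add(line.split()[1])             # a definition introduces a name
            elif "=" in line:
                known.add(line.split("=")[0].strip())  # so does an assignment like b = 0
            elif line.startswith("call "):
                name = line.split()[1]
                if name not in known:                  # forward reference gets flagged
                    errors.append("line %d: unknown identifier %s" % (n, name))
        return errors

    print(check(["function a", "call b", "function b", "call a"]))
    # ['line 2: unknown identifier b']
    print(check(["b = 0", "function a", "call b", "function b", "call a"]))
    # []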
So, is it:
- Elegant? No.
- Simple? Yes.
- Entirely and elegantly fixable? Yes!
- Worth it? Ehh... updates are very infrequent.
- Less trouble to fix than writing a more formal parser? Yes! Depending on how often you do that already.
Would making the parser more formal make this easier to fix? Not necessarily, but either way it's more trouble.
I likely can't formally prove the parser works, because it isn't formally specified. But such formality befits a language with many more users, not a simple toy used to do what is essentially tutoring.
It's not even recursive. I mean you can have recursion in the program you're writing for it, but the parser doesn't need it. It's very, very simple.
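To give the flavour (this is a hypothetical sketch, not the actual implementation): a statement like the pipeline shown earlier can be parsed flat, left to right, with no recursion and no precedence:

    import shlex

    # Hypothetical flat parse: split the pipeline at each |, then tokenize
    # each step. A | inside a quoted string would need more care than this.
    def parse_line(line):
        return [shlex.split(segment) for segment in line.split("|")]

    print(parse_line('p "Hello there everybody" | split p " " | join p " "'))
    # [['p', 'Hello there everybody'], ['split', 'p', ' '], ['join', 'p', ' ']]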
The other problem I have with bespoke languages is their lack of diagnostics. The most they do is, you know, "syntax error at line n; foo unexpected". Producing proper diagnostics (nay, suggestions for improvement, even, nowadays) is a heck of a lot of work in itself.
If a truly serious debugging tool is needed, it's probably better to run it on the Python code that it outputs.
I know I risk sounding arrogant, but you're making arguments for formality and precision; I'm only trying to tell you that it would be overengineering. If one of us is foolish about this, we both know it's not you. But I am arguing that it's neither of us-- which is to say I'm defending the choice of a sub-par parser.
I could actually get someone to replace it with a better (more formal) one. It simply wouldn't change anything in real terms. At best, it would be like adding a third (inline) wheel to a bicycle.
You can see why I'm no language designer, though I've written plenty of parsers.
And I am confident your parsers are better; I only dispute that a better parser is truly necessary for this application. I don't dispute that they are as good as necessary in a typical application. For now, I will insist that there are applications where there is no serious or practical need for a formally defined parser, and this is almost certainly one of those.
I picked up the tarball from sunsite.unc.edu--if you remember that site.
It comes up in things I've read, it seems to redirect to ibiblio.org now.
It's always possible I'm wrong. It's also worth mentioning that I've thought about these sorts of questions for years, and I thought about them even before other people brought them up as feedback. Which isn't to say I haven't gotten anything from the feedback, but a lot of it is stuff I expected to hear when I was making these decisions.
They were very deliberate; they were thought over at length, and they considered the practicality of the typical, formal approaches, which I could have gone to the effort of if necessary.
There were iterations of this, but not as many as are even typical for this sort of thing. I'd done toy languages and toy parsers before, but this is the first one that was useful enough for me to actually do real work with-- though that's only incidental, because if it weren't useful I would have simply used something else.
All it does is compile to Python. You can't run that code without instantly gaining all the features Python has, for better or worse. The worst thing it can do is output broken Python code, but a user can do that anyway if that's the goal. Maybe that sounds like passing the buck, though I think of it more like having Python as a second line of defence.
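To illustrate that second line of defence (the emitted code here is hypothetical, just the shape such output might take): whatever the transpiler emits still has to get past Python's own compiler before it runs:

    # Hypothetical emitted output for the split/join pipeline shown earlier.
    emitted = (
        'p = "Hello there everybody"\n'
        'p = p.split(" ")\n'
        'p = " ".join(p)\n'
    )
    # If the transpiler ever emitted broken code, this raises SyntaxError
    # instead of silently running garbage:
    code = compile(emitted, "<emitted>", "exec")
    ns = {}
    exec(code, ns)
    print(ns["p"])  # Hello there everybody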