A pluggable compiler and virtual machine in Php. Dogfood yourself with your own plugins!

1. Download the compiler with plugin architecture

 

 

Note: Contrary to what many people seem to believe, the term dogfooding is not derogatory. Follow the link to Wikipedia to see where the term comes from.

 

 

2. Benefits of a plugin architecture

A plugin architecture allows multiple developers to work on the same application and lets yet another person assemble their work later on. It allows plugin developers to understand just their plugin API without having to understand all the details of the entire application.

Source control (git, bazaar, subversion, ...) is not a substitute for a pluggable architecture. Source control still leaves lots of serious co-ordination issues unsolved. A developer may still have to understand the entire application even to make the smallest change to it. A plugin architecture, on the contrary, clearly delineates the boundaries of particular features and allows the developer to view the remainder of the application through a simplified plugin API. External teams can contribute plugins without co-ordinating explicitly with the core team.
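
To make the idea of a "simplified plugin API" concrete, here is a minimal sketch of a hook registry. The class name HookRegistry, its two methods, and the hook name render_title are all hypothetical and not taken from any of the applications discussed here. It only illustrates how two plugin authors can extend the same extension point without knowing about each other or about the rest of the application.

```php
<?php
// Hypothetical plugin API: every name here is illustrative, not a real application's API.
final class HookRegistry
{
    /** @var array<string, callable[]> callbacks registered per hook name */
    private array $hooks = [];

    // A plugin calls this to attach its callback to a named extension point.
    public function register(string $hook, callable $callback): void
    {
        $this->hooks[$hook][] = $callback;
    }

    // The core calls this at an extension point; each plugin may transform the value in turn.
    public function apply(string $hook, $value)
    {
        foreach ($this->hooks[$hook] ?? [] as $callback) {
            $value = $callback($value);
        }
        return $value;
    }
}

$registry = new HookRegistry();

// Plugin A, written by one team.
$registry->register('render_title', fn(string $title): string => strtoupper($title));
// Plugin B, written by another team that never talked to the first one.
$registry->register('render_title', fn(string $title): string => $title . '!');

echo $registry->apply('render_title', 'hello'), "\n"; // prints HELLO!
```

The only thing a plugin author needs to know in this sketch is the hook name and the shape of the value passed through it.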

The management of pluggable application projects is also a lot easier. A plugin architecture allows for a workable and realistic subdivision of the project into tasks, namely creating, modifying, or replacing plugins, which in turn tremendously simplifies estimating duration, budget, and ETA (delivery date). In practice, however, projects tend to be started with just a scope description or with requirements. The project manager rarely focuses on the plugin architecture.

The budget, duration, and ETA of a pluggable and a non-pluggable architecture diverge exponentially. It is pointless to manage cost and effort if the starting point will already cause the budget and delivery time to escalate. And yes, source control systems are seriously overrated. Again, they cannot substitute for a good plugin architecture.

 

3. Examples of pluggable architectures

Good examples of pluggable architectures are the popular CMS web site systems. Joomla has an elaborate plugin system and thousands of external teams contributing extensions. Wordpress also has thousands of externally contributed plugins. Drupal too. The software powering Wikipedia, that is, MediaWiki, is also a real pluginzilla. Pluggability is the main reason for an application's lasting success. The existence of thousands of externally contributed plugins is the hallmark of it.

In the realm of fully-fledged web applications, SugarCRM and its extensions, and Moodle with its extensions, are other nice examples of how pluggability can lead to an entire ecosystem of outside developers improving and extending the application. Co-ordination between developers is done through explicit API specifications and not through source control. It would simply be impossible to achieve that level of co-ordination between so many people with source control alone. Source control simply does not scale to that level.

If you think of it, operating systems (OS) are simply programs that accept programs as plugins. In that way, Linux, Windows, Mac OS, iPhone, iPad, and Android are also plugin fiestas. The usefulness of an OS is directly proportional to the availability of such plugins (that is, programs). An application can neither become nor remain the leader of its kind without a plugin structure and a substantial number of outside developers contributing plugins to it.

 

4. What's wrong with frameworks?

The two main architectural concerns for an application are:

  1. Extensibility (plugin architecture)
  2. Separation of Concerns (SOC)

Extensibility trumps every other possible concern. For example, there is simply no point in implementing SOC guidelines such as MVC, no matter how commendable MVC could be, if the application is not pluggable. Developers really waste a lot of time looking at the wrong things. Frameworks and things like MVC do not matter in the long run. By focusing on things that do not matter, developers lose sight of the essential issues. To the extent that frameworks invariably impose their own architecture, frameworks tend to lead to incorrectly pluggable or even unpluggable architectures.

The plugin architecture must reflect how outside developers want to extend the application. Therefore, it must reflect the problem domain for the application. A general-purpose framework will never be able to do that. On the contrary, it will simply impose constraints that remove the flexibility to create the right plugin architecture. There is a very good reason why none of the successful applications mentioned above uses a general-purpose framework. If they had done so, they would simply never have become successful.

For the same reason, grand upfront engineering does not work. It amounts to creating a complicated framework without ever letting the users or outside developers have a say about where they want to go. Linus Torvalds summarizes this point succinctly: "I think the real issue about adoption of open source is that nobody can really ever “design” a complex system. That’s simply not how things work: people aren’t that smart - nobody is. And what open source allows is to not actually “design” things, but let them evolve, through lots of different pressures in the market, and having the end result just continually improve."

 

5. So, why not pluggable compilers?

LLVM looks like a first attempt at breaking through as a pluggable compiler platform. However, its core libraries are still closed to plugins. Compilers and virtual machines may be considered to be too hard to have a real plugin architecture.

I do not think they are. By the way, operating systems are as hard, or even harder, to build than compilers. This does not mean that outsiders cannot contribute device drivers, applications, and other plugins. The real reason for the lack of pluggability of compilers and virtual machines is that compilers and virtual machines are badly understood. The core mechanisms, that is, the lexer, the parser, and the stack machine, are indeed not necessarily trivial. But again, I will demonstrate that you do not need to fully master the compiler's core to contribute useful plugins to it.
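
As a rough illustration of that last claim, here is a minimal sketch of a stack machine whose opcodes are registered as plugins. It is not the architecture of the downloadable compiler. The class StackMachine, the method addOpcode, the opcode names, and the program format are all assumptions made up for this example. The point is only that the author of the square opcode needs to know the handler signature and nothing about the run loop.

```php
<?php
// Illustrative sketch only: opcode names and the program format are assumptions.
final class StackMachine
{
    /** @var array<string, callable> opcode name => handler that receives the stack by reference */
    private array $opcodes = [];

    // Plugins register new opcodes here without touching the run loop.
    public function addOpcode(string $name, callable $handler): void
    {
        $this->opcodes[$name] = $handler;
    }

    // Runs a program given as a list of [opcode, operand...] and returns the final stack.
    public function run(array $program): array
    {
        $stack = [];
        foreach ($program as $instruction) {
            $name = array_shift($instruction);
            if (!isset($this->opcodes[$name])) {
                throw new RuntimeException("Unknown opcode: $name");
            }
            $handler = $this->opcodes[$name];
            $handler($stack, ...$instruction);
        }
        return $stack;
    }
}

$vm = new StackMachine();

// "Core" opcodes.
$vm->addOpcode('push', function (array &$stack, $value): void {
    $stack[] = $value;
});
$vm->addOpcode('add', function (array &$stack): void {
    $b = array_pop($stack);
    $a = array_pop($stack);
    $stack[] = $a + $b;
});

// An opcode contributed by a plugin author who only knows the handler signature.
$vm->addOpcode('square', function (array &$stack): void {
    $x = array_pop($stack);
    $stack[] = $x * $x;
});

$result = $vm->run([['push', 2], ['push', 3], ['add'], ['square']]);
echo end($result), "\n"; // prints 25
```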

 

6. How not to do it

In the case of GCC, bundling compilers indeed turns them into plugins of the bundle. But then again, who wants a bundle of compilers? Most GNU programs that I know of have a very unpluggable architecture: GCC, glibc, and so on. They had better do something about it, or they will all get replaced one day or another.

These projects pretty much look like one-man shows built by people who are capable of dealing with extreme complexity but whose results are hermetically closed to others, even to those working on the same project, let alone to external teams. These guys are very intelligent, but obviously not good at communicating with other people. There seems to be lots of obfuscation going on, premature optimizations, and cryptic, unexplained names everywhere in the source code. The worst problem is the deep pointer arithmetic that shows up everywhere in the source code:


 
//The horror story ...
z1==(*xmany)->(*f)(x,(*zt)->vt[&xsome][3])->greeting=NULL:1000L;
 

By the way, this expression probably moves the cursor back one position in vi when you are in typing "mode". In another "mode" it exits the editor, with or without saving your changes depending on the configuration settings in "xmany" and "xsome".

Expressions like this one work too. Even though there is no particular requirement in C to write programs that way, it is done very often. This is what gives C its reputation for being difficult. Inherently, however, there is nothing difficult about C, especially if you use a garbage collector and a small additional library that gives you string buffers, array lists, and hash tables. Glibc actually implements those too, but again with difficult-to-use APIs.

Another secret behind the ease of scripting is that every data structure can easily be represented in memory as a tree of lists and hash tables, like in JSON, with a hash table just being a particularly efficient type of list that allows for fast lookup by key. So, the whole world consists of lists of lists. The fact that this view works rather well is one of the many reasons for JSON's popularity and for scripting's popularity in general. As Paul Graham said, progress in the field of programming languages brings us closer and closer to LISP, and one day, we will finally understand what the point actually was.
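
A small sketch of that "tree of lists and hash tables" view in Php, where both lists and hash tables are plain arrays. The sample data is made up for illustration. The calls json_encode(), json_decode(), array_filter(), and array_column() are standard Php functions.

```php
<?php
// A nested structure built only from lists and hash tables (both are Php arrays).
// The contents are hypothetical sample data.
$project = [
    'name'    => 'pluggable-compiler',
    'plugins' => [                               // a list ...
        ['name' => 'lexer',  'enabled' => true], // ... of hash tables
        ['name' => 'parser', 'enabled' => false],
    ],
];

// The same tree round-trips through JSON without any custom classes.
$json  = json_encode($project);
$again = json_decode($json, true); // true: decode JSON objects as associative arrays

echo $json, "\n";
var_dump($again === $project); // bool(true)

// Walking the tree needs nothing more than generic list functions.
$enabled = array_column(
    array_filter($again['plugins'], fn(array $plugin): bool => $plugin['enabled']),
    'name'
);
print_r($enabled); // a list containing only 'lexer'
```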

For efficiency reasons, however, it still makes quite a bit of sense to model a specialized data structure, and create the functions that operate on it, in C. This need will not disappear any time soon. You do not need to write C like in the example above, but you definitely can. That is why you will often run into C programs that were written like that. Remember, the guy who wrote it like that did so only because he could and not because he had to. He is also very proud of the fact that nobody else understands his program, and that nobody else wants to maintain it.

The GNU projects are all a bit like that. By making available important programs early on in the life of free software, however, the GNU programs have important historical merits. Richard Stallman has truly earned his place in the annals of the history of computer science, as the father of the GPL, that is, the father of the nation, the pater patriae. His true glory is not to have written 40 programs...GCC's unreadable source code and unpluggability will erase the memory of so many victories. ... But...what will live forever, is his GPL.

Even today, GNU remains a closed shop: no forums, no remarks, no user feedback. No addressing of user concerns, while the source code is written in extreme cryptographical style.

The easiest solution that organizations started implementing for the problem of unreadable source code is to require the use of programming languages in which it is not easy to write unreadable source code, not even for the previous guy on the project who has now left. Not using C has pretty much become an issue of damage control. The same holds true for not using Perl. Even though Php is not fundamentally more readable than C or Perl, removing the capability of doing pointer arithmetic and other such abusable features helps cut short attempts to write things that nobody wants to read. By the way, if your organization needs to improve its source code management infrastructure and procedures for Php applications, feel free to contact me.

The problem is, in fact, much more one of source code review than anything else. It is, indeed, a managerial issue. But then again, in order to institute a budget for source code review people would first have to understand why it is needed. They usually come to understand this much too late, or never at all.

A counterexample of C source code that is highly readable, highly pluggable, and highly maintainable is the Linux kernel. The project manager, Linus Torvalds, is clearly everything that the GNU project managers are not. This alone would already be enough to explain why Linux has become so popular. Therefore, as far as I am concerned, it is not GNU/Linux. It could be GPL/Linux, or simply Linux, but definitely not GNU/Linux. By the way, how could anybody exercise their freedom 1 rights, if the source code is totally unpluggable and written in a cryptographical, outer-space, Martian alphabet? When the bytecode or machine code generated for a program is more readable than the source code itself, I tend to think to myself: Houston, we have a problem.

 

7. Why Php?

Php is considered to be an easy programming language. Like any scripting language, Php is just flexible, even if it is not as flexible as Javascript. I pretty much adhere to John Ousterhout's view, as he wrote in his seminal paper, Scripting: Higher Level Programming for the 21st Century.

If we over-simplify Ousterhout's view, there are only two languages: C and scripting. We should not write a program in C if we could also write it in scripting, unless there are very good reasons why we must write the program in C. The programming language 'scripting' can be general-purpose (e.g. Php, Javascript), database-only (e.g. SQL), data-only (e.g. XPath, BNF, regular expressions), shell (e.g. Bash), or specialized in any other domain. The parts that need to be blazingly fast must be done in C while the remainder must be done in scripting.

Do not get me started on why I do not consider Java, C++, or C# to be acceptable options for a programming language. A programming language is either blazingly fast or else it is extremely malleable. To cut a long story short, Java, C++, and C# are neither. They are the opposite of both. I am now fighting the desire to start a long rant on why Java, C++, and C# are so lousy, but I think I am still able to restrain myself, if only just.

 

Java is as dumb as its original promotional video:

 

 

And C# is not one ounce better.

Does a compiler or virtual machine need to be written in C? At some point, time permitting, I would like to have a working example ready to show that it does not. Only a very small part of a virtual machine needs to be done in C, while the compiler itself can definitely be done entirely in scripting.

We are almost back to the point at which McCarthy demonstrated on paper how easy and convenient it is to implement a LISP compiler in LISP. Php is on the road to LISP but is not there yet. Javascript is already getting rather close. We are almost capable of doing what McCarthy did in 1958. We are slowly but surely catching up with him. I believe that some day we will finally get it.

So, yes, my point of view is that 90% of a compiler and a virtual machine should be implemented in their own scripting language. Only 10% of the virtual machine should be written in C. In the part to be done in C, I want to demonstrate clearly that C can be as readable as Php or Javascript. Really, it will support my argument that if a C program is unreadable, it was written that way on purpose. The mentality in the C programmers' community is simply like that: the more unreadable, the more brownie points.

I am currently using Php as a substitute for "their own scripting language". The demo programming language I am creating should look very much like Php even if I have something closer to Javascript in mind.

 

8. Does Object-Oriented Programming (OOP) contribute to achieving a plugin infrastructure?

The short answer is: no. Most object-oriented programming languages, including Php, are not capable of adding methods to a class on the fly. Imagine we have a class MyClass1:


 
class MyClass1
{
        function myMethod1()
        {
        }
}
 

We can instantiate the class and use it in the following way:


 
$myObject = new MyClass1();
$myObject->myMethod1();
 

A plugin developer wants to add myMethod2:


 
class MyClass2 extends MyClass1
{
        function myMethod2()
        {
        }
}
 

Another plugin developer wants to add myMethod3:


 
class MyClass3 extends MyClass1
{
        function myMethod3()
        {
        }
}
 

Now you must choose: You can either use MyClass2 or MyClass3 in your program but not both. This is not how plugins should work. You should be able to use both. Inheritance can, therefore, not be used as a plugin mechanism.
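
The conflict can be checked directly with a short sketch. The classes are the ones from above, except that the methods return a string here, which is an assumption added only so that there is something to observe. The function method_exists() is standard Php.

```php
<?php
class MyClass1
{
    function myMethod1() { return 'one'; }
}

// Plugin A extends the base class ...
class MyClass2 extends MyClass1
{
    function myMethod2() { return 'two'; }
}

// ... and so does plugin B, independently.
class MyClass3 extends MyClass1
{
    function myMethod3() { return 'three'; }
}

$viaPluginA = new MyClass2();
$viaPluginB = new MyClass3();

// Each object only knows the methods of the one subclass it was created from.
var_dump(method_exists($viaPluginA, 'myMethod2')); // bool(true)
var_dump(method_exists($viaPluginA, 'myMethod3')); // bool(false)
var_dump(method_exists($viaPluginB, 'myMethod2')); // bool(false)
```

No single object ends up with both myMethod2 and myMethod3 unless one plugin is rewritten to extend the other, which requires exactly the co-ordination that plugins are supposed to avoid.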

Is there another way to add a method to a class, such as MyClass1->addMethod()? No. Only the experimental RunKit extension can do it. There are other (rather ugly) workarounds. Php is simply not yet as flexible as Javascript. But then again, OOP is not a goal in itself. It is just one possible way of making programming easier. Internally, the expression $myObject->myMethod($args) amounts to calling the function MyClass1_myMethod($myObject, $args). The only realistic way to solve the problem is to do the same manually by defining an extension function:


 
function myClass1Method2($myObject, $otherArgs) { }
 

Therefore, under the existing rules of OOP not everything can be a class or an object. Plugin functions that extend an existing class cannot be methods. Impossible. If we carefully look at how $myObject->myMethod($args) truly works under the hood it amounts to doing:


 
$myClass = lookupClass($myObject);
$function = OOLookup($myClass, $myMethod);
$function($myObject, $args);
 

Everything in OOP, therefore, revolves around the OOLookup() function. The OOLookup() function is a function that returns a function, that is, a higher-order function. This only works in a language that treats functions as first-class values, meaning that functions can be passed as arguments and returned as results. It is obvious that OOLookup() is just one particular way to look up a function. We could conceivably use any higher-order function to do this. An example of another lookup function is the Currying lookup function. Saying that the only interesting lookup function is OOLookup() amounts to seeing the entire world as consisting of only nails, with OOP being the proverbial hammer. Therefore, saying that everything is an object is not pure but dumb and very limiting. Hence a first explanation for the dumbness of the Java and C# programming languages.
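
The following sketch spells this out with ordinary Php closures. It is not how the Php engine implements method dispatch. The function table, the Point example, and the names $ooLookup and $boundLookup are all hypothetical. The second lookup uses partial application, a close relative of currying, rather than currying in the strict sense.

```php
<?php
// Hypothetical table of functions, named ClassName_method as in the text above.
// "Objects" are plain arrays that carry their class name.
$functions = [
    'Point_describe' => fn(array $self): string => "({$self['x']}, {$self['y']})",
    'Point_scale'    => fn(array $self, int $k): array =>
        ['class' => 'Point', 'x' => $self['x'] * $k, 'y' => $self['y'] * $k],
];

// An OOLookup-style function: given a class name and a method name, return a function.
$ooLookup = function (string $class, string $method) use (&$functions): callable {
    return $functions["{$class}_{$method}"];
};

// A different lookup strategy: return the function with its first argument already bound.
$boundLookup = function (array $object, string $method) use ($ooLookup): callable {
    $function = $ooLookup($object['class'], $method);
    return fn(...$args) => $function($object, ...$args);
};

$p = ['class' => 'Point', 'x' => 1, 'y' => 2];

// What $p->describe() amounts to under this model.
echo $ooLookup($p['class'], 'describe')($p), "\n"; // prints (1, 2)

// A plugin adds a "method" later simply by adding an entry to the table.
$functions['Point_sum'] = fn(array $self): int => $self['x'] + $self['y'];

$scale = $boundLookup($p, 'scale');
echo $ooLookup('Point', 'describe')($scale(3)), "\n"; // prints (3, 6)
echo $boundLookup($p, 'sum')(), "\n";                 // prints 3
```

Because both lookups are just functions, a plugin could replace or wrap them as easily as it adds a table entry.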

Relaxing the rules for OOP to allow for adding methods dynamically would be useful. But then again, it would not help that much. The true power lies in the capacity to implement any lookup function that you can think of. We simply need first-class functions for that. And this is exactly where Javascript shines brightly. It also explains the miracle of jQuery.
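
Php does offer one hook that lets a class supply its own lookup function: the __call() magic method, which is invoked when an undefined or inaccessible method is called on an object. Here is a minimal sketch, assuming a hypothetical class Pluggable with a hypothetical addMethod() registration function. The features __call() and Closure::call() are standard Php. It is a workaround under these assumptions, not a replacement for real first-class lookup.

```php
<?php
// Hypothetical class; only __call() and Closure::call() are standard Php.
class Pluggable
{
    /** @var array<string, Closure> "methods" added at run time */
    private static array $extensions = [];

    public string $name = 'world';

    // Plugins register extra "methods" here.
    public static function addMethod(string $method, Closure $body): void
    {
        self::$extensions[$method] = $body;
    }

    // The custom lookup function: find the closure by name and run it with $this bound.
    public function __call(string $method, array $args)
    {
        if (!isset(self::$extensions[$method])) {
            throw new BadMethodCallException("No such method: $method");
        }
        return self::$extensions[$method]->call($this, ...$args);
    }
}

// Two plugins add methods independently; neither subclasses Pluggable.
Pluggable::addMethod('greet', function (string $greeting): string {
    return "$greeting, {$this->name}!";
});
Pluggable::addMethod('shout', function (): string {
    return strtoupper($this->name);
});

$object = new Pluggable();
echo $object->greet('Hello'), "\n"; // prints Hello, world!
echo $object->shout(), "\n";        // prints WORLD
```

Unlike the inheritance approach, both added methods are available on the same object.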

 

9. Why write a compiler or a virtual machine?

Concerning pluggability, there is nothing particularly special about compilers or virtual machines besides the fact that all programmers -- who routinely create programs -- use these programs but hardly ever create one. So, most programmers cannot program the one single program that they need to do their work. They usually only know how to create programs to assist other people in their work, but not to assist themselves. When you think of it, this is a silly situation, isn't it? So, let's dogfood ourselves and build a pluggable compiler and virtual machine.

In my next blog post I will describe the plugin architecture for the pluggable compiler and virtual machine that you can already download at the top of this blog post.