Why statically typing defeats its purpose
Written by erik
Tuesday, 10 November 2009 07:22

One could think that the debate pitting statically typed languages against dynamically typed scripting languages is merely a silly theoretical exercise, until one takes into account how much money goes around in software development and maintenance. One paper that mentions dollar amounts says:

The National Institute of Standards and Technology (NIST) estimates that software errors cost the U.S. economy $59.5 billion a year and software sales accounted for $180 billion.[...]Boeing spent roughly $800 million on software for the 777, and they might need to spend five times that on the 787.

If there is a substantial difference in software quality between software built using statically typed languages and software built using dynamically typed scripting languages, the consequences can be measured in the billions, if not trillions, on a worldwide basis.

 

Quality criteria for software

  • Correctness. This is the main criterion that trumps all others. There is no point in comparing a correctly working system to a system containing multiple bugs. We invariably prefer the correctly working system. Unfortunately, determining software correctness is a non-trivial endeavour. In spite of multiple research efforts, there is still no effective mathematical model of software correctness that would allow effective verification of software.
  • Performance. Ceteris paribus, we prefer software systems that run faster and consume fewer resources.
  • Simplicity. Meaning: source code understandability. The number one cause of errors in software is that the person modifying it does not sufficiently understand its source code, including source code he wrote himself. Even though there are various measures of complexity, they cannot readily be applied to source code, because the issue is mostly cognitive: it is the perceived complexity that matters, which often eludes any objective measure of complexity.
  • Flexibility. We are continuously modifying software to fit changed or new circumstances. This can be easy or hard to achieve. The easier we can modify software, the better.

 

Quality of programming languages

Since any Turing-complete language can exhibit exactly the same behaviour as any other Turing-complete language, any programming problem that can be programmed correctly in one Turing-complete language can also be programmed correctly in any other Turing-complete language.

Therefore, any statement of the form "This particular language is more powerful than other languages", made about Turing-complete languages, is necessarily wrong. It is never true.

However, one Turing-complete language can still be simpler than another.

Grammar simplicity. The smaller the grammar for the language, the more likely people will understand source code written in it. The grammar of a programming language can usually be captured in Backus-Naur form (BNF). Ceteris paribus, we favour programming languages with a smaller BNF representation.

Source code simplicity. The fewer statements required to represent a program, without resorting to anonymous nested containers and while naming all program constructs explicitly and unambiguously, the more people will understand it.

For example, the following construct is not fully named:

function f(items)
{
    print_header();
    foreach(item in items)
    {
        print_item(item);
    }
    print_footer();
}

It contains an anonymous foreach block. Even though explicitly naming an anonymous block may increase the number of statements required to represent the program, it reduces (cognitive) complexity instead of increasing it:

function f(items)
{
    print_header();
    print_body(items);
    print_footer();
}

function print_body(items)
{
    foreach(item in items)
    {
        print_item(item);
    }
}

Therefore, before comparing the number of statements required to represent a program in two different languages, both representations must first fully name all their anonymous blocks.

 

Scripting languages require fewer statements to represent the same program

When considering the general object-oriented method call:

y=(find_function(f,x1)) (x1,x2,...,xn)

We can see that there are up to n function argument types involved, and one return type. In the absence of generics (templates), and given m data types in the software system, each of these n+1 positions can independently hold any of the m types, so a statically-typed language will need to implement the function f up to m^(n+1) times:

<type n+1>=(find_function(f,<type 1>)) (<type 1>,<type 2>,...,<type n>)

In the absence of generics, the one dynamically-typed method will be represented by up to m^(n+1) statically-typed methods. Therefore, in terms of source code complexity, statically-typed program representations can be up to m^(n+1) times more complex than dynamically-typed program representations.
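
To make the m^(n+1) count concrete, here is a minimal sketch that enumerates every possible typed signature for a small case. The three type names and the helper name enumerate_signatures are assumptions made up for this illustration; they do not come from any particular language.

```python
from itertools import product


def enumerate_signatures(types, n_args):
    """Return every (arg type 1, ..., arg type n, return type) combination."""
    # Each of the n argument positions plus the return position
    # can independently hold any of the m types.
    return list(product(types, repeat=n_args + 1))


types = ["int", "float", "str"]                 # m = 3 hypothetical types
signatures = enumerate_signatures(types, 2)     # n = 2 arguments

print(len(signatures))   # 27, that is 3^(2+1)
print(signatures[0])     # ('int', 'int', 'int')
```

With only three types and two arguments there are already 27 distinct typed signatures that the single untyped method stands in for. This is an upper bound: in practice only some of those combinations are meaningful.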

 

Exponentially growing complexity in Java and C#

Before the introduction of generics in Java and C#, the only viable way to address the runaway, exponentially growing complexity resulting from static typing was to use the root object type for all types involved in the function:

object=(find_function(f,object)) (object,object,...,object)

This is exactly what dynamically-typed programming languages do.
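
The root-object call above can be sketched in a dynamically-typed language as follows. This is only an illustration: find_function mirrors the name used in the formula, while the Circle and Square classes and the describe method are hypothetical.

```python
def find_function(name, receiver):
    """Pick the implementation of `name` from the receiver's runtime type."""
    return getattr(type(receiver), name)


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def describe(self, unit):
        return f"circle with radius {self.radius} {unit}"


class Square:
    def __init__(self, side):
        self.side = side

    def describe(self, unit):
        return f"square with side {self.side} {unit}"


def describe_all(shapes, unit):
    # y = (find_function(f, x1))(x1, x2): the first argument selects the
    # function at run time and is then passed to it as an ordinary argument.
    return [find_function("describe", shape)(shape, unit) for shape in shapes]


descriptions = describe_all([Circle(2), Square(3)], "cm")
print(descriptions)
# ['circle with radius 2 cm', 'square with side 3 cm']
```

No signature in this sketch names a type. Which describe runs is decided per element at run time, which is what writing everything in terms of the root object amounts to.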

Consequently, in the absence of generics, static typing leads either to exponentially runaway complexity or to the introduction of dynamic typing.

In other words, as soon as a non-trivial algorithm needs to be represented in source code, statically-typed programming languages such as Java and C# have no other option than to switch to dynamically-typed programming.

Ousterhout writes: It might seem that the typeless nature of scripting languages could allow errors to go undetected, but in practice scripting languages are just as safe as system programming languages.

There is, of course, a reason for this. Except in the simplest programs, statically-typed languages resort to dynamic typing anyway. So, how could there be a difference in type safety between dynamically and statically typed languages?

 

Generics do not solve the problem

Of course, the Java and C# world refused to concede defeat.

The fact that any non-trivial program in Java and C# had to resort to dynamic typing invalidated the very reason for the existence of these languages. If Java and C# programmers end up scripting anyway, they might as well use a scripting language instead of wasting their time statically typing every variable in their programs.

Generics were introduced in Java and C# in order to stop the practice of having to write programs in terms of the root object and, therefore, of resorting to dynamic typing.

However, generics have led to runaway complexity in the grammars of Java and C#. These languages have become substantially more difficult to use. In addition to grammar bloat, generics lead to code bloat. Furthermore, debugging a non-trivial generic program is notoriously difficult, and can reasonably be considered beyond the capabilities of the average Java and C# programmer:

There are three primary drawbacks to the use of templates: compiler support, poor error messages, and code bloat.[...]So the indiscriminate use of templates can lead to code bloat, resulting in excessively large executables.[...]Almost all compilers produce confusing, long, or sometimes unhelpful error messages when errors are detected in code that uses templates. This can make templates difficult to develop.[...]Unfortunately, compilers historically generate somewhat esoteric, long, and unhelpful error messages for this sort of error.[...]Ensuring that a certain object adheres to a method protocol can alleviate this issue.

Nearly everybody who uses generics agrees that the approach is in need of serious fixing, while it has become obvious that the existing implementations are already too complex for the average programmer to use correctly.

The Java and C# worlds are now caught between a rock and a hard place. Expanding the language grammar further is a non-starter. At the same time, leaving the numerous show-stopping issues with generics unfixed is not an option either.

 

The use of C versus scripting

In order to use a scripting language, we first need a scripting engine. Therefore, we need to program the scripting engine in a language that does not need one itself, such as C. Consequently, we will always need C, if only to create the scripting engine that allows us to develop our programs in a scripting language instead of C.

Note that a polymorphic C program is not more efficient than a script. As soon as a C program requires flexibility in the following style:

object=(find_function(f,object)) (object,object,...,object)

it will be as slow as a script. The performance penalty is not the result of the language used, but of the need for polymorphic flexibility. Therefore, such a polymorphic C program will not be any faster than the corresponding script, just harder to read and maintain.

Even the Linux kernel is subject to this trend. The tremendous success in re-exposing internal kernel interfaces to user space, such as in the FUSE project, is to an important extent motivated by the desire to implement parts of the kernel in a scripting language. Re-exposing the internal kernel interfaces for other device drivers would make it possible to implement many device drivers in a scripting language. This would undoubtedly take much of the cost out of developing them. Even the Linux kernel increasingly has strong polymorphic needs that can only be satisfied through scripting.

The scripting engine itself consists of five major parts, one of which is optional:

  • A lexer, which groups source code characters into words, also called "tokens".
  • A parser, which groups tokens into sentence trees, which ultimately constitute a single program tree (AST or Abstract Syntax Tree).
  • A code generator, which translates the program tree into bytecode or machine language.
  • An optional virtual machine, which executes the bytecode.
  • A standard library, which gives full access to the kernel's services, through the libc library or even directly.
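
The first four parts listed above can be sketched end to end for a toy language of integer sums and products. Everything here is an assumption made for illustration: the grammar, the function names lex, parse, generate and run, and the two-instruction stack bytecode. Error handling and the standard library are left out.

```python
import re


def lex(source):
    """Lexer: group characters into tokens."""
    return re.findall(r"\d+|[+*()]", source)


def parse(tokens):
    """Parser: build an AST of nested dicts (sums of products)."""
    pos = 0

    def atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expr()
            pos += 1  # skip the closing ")"
            return node
        return {"op": "num", "value": int(tok)}

    def term():
        nonlocal pos
        node = atom()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = {"op": "*", "left": node, "right": atom()}
        return node

    def expr():
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = {"op": "+", "left": node, "right": term()}
        return node

    return expr()


def generate(node, code=None):
    """Code generator: flatten the AST into stack-machine bytecode."""
    code = [] if code is None else code
    if node["op"] == "num":
        code.append(("PUSH", node["value"]))
    else:
        generate(node["left"], code)
        generate(node["right"], code)
        code.append(("ADD",) if node["op"] == "+" else ("MUL",))
    return code


def run(code):
    """Virtual machine: execute the bytecode on a value stack."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr[0] == "ADD" else left * right)
    return stack.pop()


bytecode = generate(parse(lex("2 + 3 * (4 + 1)")))
result = run(bytecode)
print(result)  # 17
```

A real engine differs in scale rather than in shape: more token kinds, a larger grammar, a richer instruction set, and usually generated tables instead of hand-written recursive functions.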

 

The lexer and parser are simply programs that navigate a lexer table and a parser table, in the manner of a Turing machine's state-transition table. The tables themselves are matrices containing essentially numbers.
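
As a rough sketch of what navigating such a table means, the lexer below is driven entirely by a small matrix of numbers. The states, character classes and token names are assumptions chosen for this example; a generated table for a real language would be far larger but would be walked in the same way.

```python
# Character classes (columns of the table).
DIGIT, LETTER, SPACE, OTHER = 0, 1, 2, 3

# States (rows): 0 = start, 1 = inside a number, 2 = inside an identifier.
# TABLE[state][char_class] is the next state; -1 means "the token ends here".
TABLE = [
    [1, 2, 0, -1],    # start: whitespace keeps us in the start state
    [1, -1, -1, -1],  # number: only digits continue it
    [2, 2, -1, -1],   # identifier: letters and digits continue it
]
TOKEN_KIND = {1: "NUMBER", 2: "IDENT"}


def char_class(ch):
    if ch.isdigit():
        return DIGIT
    if ch.isalpha() or ch == "_":
        return LETTER
    if ch.isspace():
        return SPACE
    return OTHER


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        state, start = 0, pos
        while pos < len(text):
            nxt = TABLE[state][char_class(text[pos])]
            if nxt == -1:
                break
            if nxt == 0:
                start = pos + 1  # still skipping leading whitespace
            state = nxt
            pos += 1
        if state != 0:
            tokens.append((TOKEN_KIND[state], text[start:pos]))
        elif pos < len(text):
            # No table entry matched: emit a one-character symbol token.
            tokens.append(("SYMBOL", text[pos]))
            pos += 1
    return tokens


tokens = tokenize("x1 = 42+y")
print(tokens)
# [('IDENT', 'x1'), ('SYMBOL', '='), ('NUMBER', '42'), ('SYMBOL', '+'), ('IDENT', 'y')]
```

The driver loop never changes when the language changes. Only the numbers in TABLE do, which is why the quality of the generated table matters more than the speed of the program that generated it.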

The lexer-parser generator can easily be re-implemented in scripting. Performance of the lexer-parser depends on the correct generation of the numbers in their tables, and not on the speed of the generator.

The result of their work is an AST, which is a tree of embedded hashes. The hash (associative array) is an essential data structure, which is usually implemented in C. Implementing the hash class in a scripting language does not make much sense, because the associative array is usually a prerequisite built-in data structure for scripting languages.

The code generator may or may not be essentially polymorphic. If it is, it could be re-implemented in a scripting language without an associated performance penalty. It would, however, need to use a pre-generated version of its own code. The virtual machine would probably not be able to execute its own bytecode. However, it could consist of a pre-generated translation of its own bytecode into machine language.

Consequently, it would definitely be possible to implement the scripting engine's code generator and virtual machine in a scripting language. But then again, it would only be meaningful to do so if their code is highly polymorphic.

The standard scripting library can usually be implemented in the scripting language itself. Most of the resources are consumed by the kernel functions themselves, and relatively little by the wrapper functions calling them.
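
A minimal sketch of such wrapper functions, using Python's ctypes module to call straight into the C library. It assumes a POSIX system such as Linux, where loading the current process with CDLL(None) exposes the libc symbols. The wrapper names getpid and string_length are chosen for this example only.

```python
import ctypes
import os

# Assumption: POSIX system; CDLL(None) exposes the symbols of the
# already-loaded C library. This does not work on Windows.
libc = ctypes.CDLL(None)

# Declare the C signature of strlen so ctypes converts values correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def getpid():
    """Script-level wrapper around the C library's getpid()."""
    return libc.getpid()


def string_length(text):
    """Wrapper around strlen(); C expects bytes, so encode first."""
    return libc.strlen(text.encode("utf-8"))


print(getpid() == os.getpid())   # True
print(string_length("hello"))    # 5
```

Each wrapper is a line or two of script. The actual work happens inside the C library and the kernel, which is the point made above about where the resources are consumed.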

The better the construction of the scripting engine is optimized for performance, the more it becomes possible to re-implement parts of it in the scripting language itself, without noticeable performance penalties.

The same holds true for other native libraries. The more efficient the scripting engine and the more polymorphic the library, the more it makes sense to re-implement it, or parts of it, in a scripting language.

Therefore, it is the need for polymorphic flexibility, in conjunction with the availability of faster scripting engines, that will make it increasingly desirable to move system libraries, or parts of them, from native language to scripting languages.

 

More scripting and less C

One issue holding this process back is the incompatibility between scripting engines, as well as the bloat and performance penalty incurred when using more than one scripting engine simultaneously.

A typical contemporary Linux system contains one or more shell scripting engines, including at least one bash implementation, as well as a Python, a Perl, a PHP, and several JavaScript engines, and usually also at least one Java virtual machine. Each comes with essentially yet another re-implementation of the standard system libraries, in order to expose the kernel mechanisms as well as other application libraries typically used in scripting.

In addition to that, quite a few large applications implement their own incompatible scripting engine with associated libraries.

How many new, incompatible scripting engines do we need, before all our needs for new, incompatible scripting engines are completely satisfied?

I estimate that at least 75% of the footprint of a typical Linux system currently consists of efforts that are duplicated in one way or another. This massive duplication is almost invariably the result of incompatibilities between programming languages and essential libraries.

A project like Parrot, therefore, does make sense. But then again, we are still far away from a minimal, common scripting engine library in which only the bare scripting engine essentials are implemented in unpolymorphic C, and which can truly be reused by any scripting system. In my opinion, Parrot is too ambitious and makes too many debatable choices for scripting engine teams other than the one working on Perl to consider porting their language to Parrot.

Given the difficulty of creating common scripting engines that are optimal for all scripting languages, it is more likely that one dominant scripting language will emerge, pushing the others aside.

Since JavaScript has a near-monopoly on scripting in the ubiquitous browser, combined with the advantage of being a highly flexible prototype-based language, JavaScript stands the best chance of emerging as the winner and, in the future, of displacing Python, Perl, PHP, and eventually even shell scripting.

The massive performance improvements in the Google V8 and Mozilla TraceMonkey JavaScript engine implementations are now reviving intense interest in using JavaScript for more than just the browser. The impetus for server-side JavaScript and desktop JavaScript will undoubtedly continue.

Conclusion

The proportion of native C code running inside the typical Linux system will keep decreasing as faster scripting engines emerge. However, native C code will never completely disappear.

At the same time, I do not see any legitimate place for C++, Java, or C# in well-designed systems. As argued above, these languages are truly evolutionary dead ends.

The system of the future has its essential mechanisms written in unpolymorphic C, while increasingly implementing everything else in JavaScript.

 

