Wednesday, July 29, 2015

Multilingual Mania

I have written parts of a project in two or more programming languages for as long as I can remember. The decision to use two or more languages is motivated by my desire to use the best tool for each job, and the nature of the work that has been my stock in trade for more than 25 years. That is how long I have been marshaling applications published by various software companies into software systems that appear to the casual observer, and to their intended users, to be one complete application.

Benefits of Integrated Applications

This type of integrated application has many benefits for its users, and for the people who pay for them.

Benefits for Users

  • There are fewer applications to learn.
  • Everything is entered once, and distributed behind the scenes to other parts of the application that need it.
  • Entering everything only once saves time on input and editing.
  • Since everything is entered just once, there is only one chance to make an input error, and each entry needs checking just once.
  • Subsequent processes are easier to set up and run.
  • No time is wasted locating and loading inputs for subsequent steps, because the system knows how to find them.
  • The system can prevent processing the same data twice, or failing to process a batch.

Benefits for Managers

  • Assembling a custom application from purchased off the shelf software is usually much less expensive than writing one from scratch.
  • Each part performs the task at which it excels. I think it is safe to say that no software is best at everything.
  • Components can be replaced when a better product is found for its job. Frequently, such replacements happen behind the scenes, and end users are unaware of the change.
  • The time from conception to deployment is significantly shorter; the savings are usually measured in years.
  • Significant maintenance costs are shared among users of the purchased software, so that everyone gets high quality updates for less.
  • Some of the burden of correcting design and programming errors falls on the shoulders of the publishers, relieving your IT budget to some extent.
Given these advantages, why would anybody write software from scratch? There are many reasons, but an important one is that it wasn’t always this way. The first computer software was written from scratch because there were no commercial packages. Over the last 25 years or so, demand for an increasing number of specialty applications has inspired companies to build a business around a commercial package to meet it. Contributing to this development is the steady improvement in the quality and robustness of the tools available to programmers. These improvements are the result of more capable hardware and a better understanding of the needs of programmers.

One Project, Many Languages

Just as a carpenter needs a variety of tools and materials to build a house, so it is with programmers and software. While the distinction is a bit more blurry, I like to think of the languages as materials and the editors, compilers, interpreters, and so forth as the tools. The principle is pretty much the same, though; when you build a kitchen, you use different materials for the counter top, cabinets, pantry, and floor, just as you use different programming languages to write the parts of a Web site that run on the Web server and the parts that run in the visitor’s Web browser.
Just as using two or more spoken languages presents challenges, so it is in software, although there are a few differences.

Syntax

Every language has grammar and syntax rules, whether it is spoken, written, or fed into a computer as operating instructions.
Thankfully, like the written languages upon which they were modeled, programming languages fall into groups that share common elements of grammar, syntax, and even vocabulary. Hence, just as a person who speaks German can figure out a lot of Danish, Dutch, Flemish, Norwegian, and Swedish text, or an Italian speaker can grasp Romanian and Spanish, so a person who knows C or C++ has relatively little trouble with JavaScript, Perl, PHP, and Python. While the similarities won’t make a programmer an overnight expert in the new language, they give him or her a head start.

Objects and their Names

Usually, a more significant hurdle than the syntax, vocabulary, or grammar of a new language is learning the names of the objects that inhabit the application domain, and how they are related. The usual term for this is Object Model. The Object Model is a map of the territory. Whether you use JavaScript, VBScript, PerlScript, or some other scripting language to manipulate them, the objects that you manipulate inside a Web browser and the ways in which they are related to one another are similar, and they usually have the same names: Windows, Text Boxes, List Boxes, Combo Boxes, Buttons, Forms, Frames, Toolbars, etc. Likewise, when you manipulate a Microsoft Word document, you work with Documents, Stories, Sentences, Tables, Characters, and so forth, whether you use Visual Basic for Applications (VBA), C#, or C++ to do the manipulating.
This brings us to the point of this essay, the naming of instances of these objects. There are two major aspects of object naming.
  • Naming Convention: A naming convention is a generalized plan for the naming of objects and other variables manipulated by a program.
  • Naming Scheme: A naming scheme is a plan, ideally based on standard practices observed by the organization, for naming objects of like kind, such as different Text Box objects in a form, or Person objects in the business logic of a data base driven application.
Naming schemes are beyond the scope of this article; they deserve an article of their own.

Naming Conventions: Benefits and Limitations

While there is nothing magic about naming conventions, they play an important role in communicating vital information about the moving parts of a software application. Nevertheless, it is essential to understand what a set of naming conventions is, and, equally significant, what they are not.

What They Are

The best way to characterize a naming convention is by listing and briefly describing its key features, which can be summarized in one word, ACES.
  • Adaptable: It is easy to adapt it to the requirements of your environment. Any number of circumstances might require adaptation; one good example is that the primary spoken language of your group is not English.
  • Consistent: It is internally consistent. For example, it differentiates individual items of a kind from collections or arrays of items of the same kind consistently for all types of objects, so that a quick glance at a name tells you whether it refers to an individual object, a collection of like objects, or an array of like objects.
  • Extensible: For most projects, I found that assigning dedicated tags to the objects in your data model was more trouble than it was worth. However, that general statement in no way prohibits extending the conventions by defining a set of tags for a group of closely related objects that play a significant role in your application domain. For example, if your application revolves around airplane parts, it might make sense to create tags to identify large classes of parts, such as engine parts or airframe parts.
  • Simple: Consistent application of simple naming conventions contributes far more than complex conventions that go unused because they are too complex for daily use.

What They Are Not

One word, AIMS, nicely summarizes what naming conventions are not.
  • Absolute: Naming conventions are recommendations to guide you, not rules to slavishly follow.
  • Inflexible: It is appropriate from time to time to deviate from a naming convention. For example, while I advocate designating function arguments with a prefix, a public interface that is distributed only in binary form should almost certainly dispense with them.
  • Mechanical: Applying a naming convention should almost never be done mechanically. This goes double for retroactive application to an established code base. If you choose to do so, use a proper refactoring tool, such as those built into Visual Studio. Resist with all your strength the temptation to use the Find and Replace feature of your text editor, which cannot distinguish symbol names that should be changed from comments that use the same text in ordinary words that may be better left unchanged.
  • Scheme: Naming conventions and a naming scheme are symbiotic pairs; they work together to produce a consistent set of names for the objects that store, transport, and transform your data.
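The warning about Find and Replace can be illustrated with a small Python sketch. It is an illustration only, not a substitute for a real refactoring tool: the function names, the sample source, and the intTotal name are all made up for the example, and the approach (renaming only the tokens that the standard tokenize module classifies as names) knows nothing about scope.

```python
import io
import tokenize

# Hypothetical sample: the word "total" appears in a comment, as a
# variable name, and inside the longer name "subtotal".
SAMPLE = (
    "# total is recomputed on every pass\n"
    "total = 0\n"
    "subtotal = total + 1\n"
)


def naive_rename(source, old, new):
    """Plain find and replace: rewrites every occurrence of the text."""
    return source.replace(old, new)


def rename_symbol(source, old, new):
    """Rename only NAME tokens that match `old` exactly.

    Comments, string literals, and longer names that merely contain
    `old` are left alone. Scope is NOT considered, so this is still far
    short of a real refactoring tool.
    """
    lines = source.splitlines(keepends=True)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    hits = [tok.start for tok in tokens
            if tok.type == tokenize.NAME and tok.string == old]
    # Work from the end of the text backwards so that earlier column
    # offsets stay valid while later ones are rewritten.
    for row, col in sorted(hits, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)
```

With this sample, the naive version also rewrites the comment and turns subtotal into subintTotal, while the token-based version touches only the two uses of the variable itself.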

What Do Naming Conventions Encode

A naming convention encodes attributes of a variable that affect how code interacts with it, so that you can focus on the work to be done, without constantly referring to the variable definitions. Table 1 (below) gives examples of the attributes typically encoded into a naming convention, accompanied by a brief explanation of how each affects the code.
Table 1 gives examples of attributes commonly encoded by a naming convention.
Basic Type: The most important single bit of information you need to know about a variable is its type, which determines what operations may legally refer to it, other types to which it is functionally equivalent, and the types into which it can be transformed (cast). If the type is an object, it identifies the code, called methods, that is attached to it, that enable the variable to “do” things.
Early programming languages, such as FORTRAN and the original versions of BASIC, appended a special character to a name for this purpose. That worked well when there were only a handful of variable types, but it is insufficient for modern applications that employ dozens of types. Even FORTRAN IV had begun to outgrow the limited set of suffixes when it added double precision and complex numbers to its list of basic types.
Number: In this context, number has a slightly different meaning than it does in the grammar of spoken languages. Number differentiates singular from plural, but it goes a tad further. While English, and most other spoken languages, stop at differentiating one versus many, programming needs a more fine grained differentiation of the notion of plural (many) things.
  • Arrays: An array of things has upper and lower bounds and a count, usually called its size or length. Each member, called an element, has a subscript that uniquely identifies it.
    • Arrays have a fixed size, though most programming languages support enlarging an array by increasing its upper bound.
    • New items can be inserted into an array at any position, and in any order.
    • The elements of an array are usually enumerated by incrementing or decrementing the subscript until a bound is reached.
  • Collections: A collection is a more loosely organized group of like objects. The essential difference between an array of objects and a collection of them is that collections are dynamic; a collection grows as needed to accommodate new items being added to it, although it may have an initial capacity, which the machine treats as an estimate of how many items you expect to put into it.
    • While a collection may have memory reserved for it based on an estimate of its ultimate size, the estimate is optional, since, by default, new items are appended to the end.
    • The usual method of adding items to a collection is its Add or Append method, although some collections permit items to be inserted into the middle. New items added by the default method go at the end of the list.
    • The members of a collection are usually enumerated by means of an Enumerator object that uses its knowledge about the contents of the associated collection to return its members one by one, typically in the order in which each was appended.
Scope or Lifetime: Scope and lifetime are closely related and are treated together here; scope is where a variable is visible, and lifetime is how long it and its value exist.
  • Local: The variable and its value exist for the lifetime of the routine in which it is defined. When the routine exits, the variable ceases to exist.
  • Parameter or Argument: The variable belongs to the argument list of the routine in which it is declared. Its value and location existed before the routine was called (within the scope of the calling routine), and it continues to exist when the routine returns, unless the call is the last statement of the calling routine.
  • Return Value: The variable is created by the routine that defines it, but its value is returned to the calling routine as the value of the function procedure. Though its value survives the routine, the location where it is stored is usually inaccessible. By convention, most functions return a value by storing it in a CPU register, from which the calling routine promptly retrieves it.
  • Class or Module: The variable was defined in a class or module, but outside all of its routines. When the module is a class, its variables exist for the lifetime of an instance of the class; variables defined in ordinary modules (for example, a Visual Basic module) exist for the lifetime of the application.
  • Static: A static variable is a special kind of class variable, which is marked as Static, conferring on it the lifetime of a Visual Basic module variable.
  • Global or Application: A global or application variable is defined outside the boundaries of all of its routines and modules, endowing it with a lifetime of the entire run of the application, from the time it loads into memory until it terminates and is unloaded. The usual method of making a variable global in scope is by marking it as Public, except in C and C++, which treat any variable defined and initialized outside of a function as public, unless it is marked otherwise.
Usage: Usage is a specialized attribute, primarily applied to array and collection indices to indicate whether they store the First, Last, or Current position in the associated array or collection. Unlike the other attributes discussed here, usage is usually conveyed by a suffix.
A set of naming conventions typically permits all four properties, or any combination of them, to be applied to a variable.
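To make the idea of encoding several attributes in one name concrete, here is a minimal Python sketch that decodes a name back into its four attributes. Every tag in it (the scope prefixes, the a and col number prefixes, the int, str, and dbl type tags, and the usage suffixes) is invented for this sketch and should not be read as the tag set of any published convention; the describe function is likewise hypothetical.

```python
import re

# Hypothetical tag tables, invented for this sketch only.
SCOPES = {"g": "global", "m": "class or module",
          "p": "parameter", "r": "return value"}
NUMBERS = {"a": "array", "col": "collection"}
TYPES = {"int": "integer", "str": "string", "dbl": "floating point"}
USAGES = {"First": "first position", "Last": "last position",
          "Cur": "current position"}

NAME_PATTERN = re.compile(
    r"^(?:(?P<scope>[gmpr])_)?"        # optional scope prefix, e.g. m_
    r"(?P<number>a|col)?"              # optional array/collection prefix
    r"(?P<type>int|str|dbl)"           # required basic type tag
    r"(?P<base>[A-Z][A-Za-z0-9]*?)"    # descriptive base name
    r"(?P<usage>First|Last|Cur)?$"     # optional usage suffix
)


def describe(name):
    """Decode a variable name into the attributes it encodes."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the sketch convention")
    return {
        # A missing prefix means the default: a local, single item.
        "scope": SCOPES.get(match["scope"], "local"),
        "number": NUMBERS.get(match["number"], "single item"),
        "type": TYPES[match["type"]],
        "base": match["base"],
        "usage": USAGES.get(match["usage"]),
    }
```

Under these made-up tags, describe("m_aintScores") reports a class or module array of integers, and describe("intRowCur") reports a local integer that holds a current position. Only the type tag is required, which mirrors the point that any combination of the four attributes may be applied.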

Why Encode All This Into the Name

Since every bit of this information is already encoded into the definition, why repeat it in the variable name, you may ask. The one-word answer is accessibility. Though most of these attributes are part of the definition, the object of this exercise is to put the information where you need it, in the variable name. For a local variable, parameter, or return value, well written code usually makes the definition fairly accessible. Nevertheless, there are situations in which the definition is less accessible.
  • Occasionally, a really long switch or Select Case block is unavoidable, leaving the definition hundreds of lines, or several pages, away from the place where it is used.
  • Class variables, especially protected variables defined by a base class, are usually defined in a different module, which may belong to another project, and is, therefore, relatively inaccessible. The same holds for public (global) variables owned by the module that defines the main entry point routine.
  • When you are working from a code listing, such as during a code review, the hints that would be available from IntelliSense if you were working in your code editor, are unavailable. The same is true when you are working with side by side listings in a Diff tool, such as IDM UltraCompare or the difference viewer of your source code control system.

The Reddick-Gray Unified Naming Conventions, version 2.0

The Reddick-Gray Unified Naming Conventions, available at http://www.wizardwrx.com/RGUNC_Resources/, are the result of over 20 years of experience developing multilingual software. When I started extending the Reddick VBA Naming Conventions to support the multilingual applications that I was developing in the middle 1990’s, nobody was talking about multilingual development, because it was comparatively rare. There might be a bit of JavaScript in your Web applications, but that code represented islands dotting a sea of static HTML, even if that HTML was being generated on the fly by a Web server. The code consisted of a handful of lines that performed a very specific task, and it was treated as a black box.
Since then, the applications have grown bigger, more complex and dynamic, and must render as nicely on the screen of an Android or Apple phone, any number of Android and iOS tablets of various sizes and shapes, and, oh, by the way, a 24” computer monitor that has an aspect ratio of 4:3, which may be running at one of a number of screen resolutions, typically starting at 1024 by 768 and going up. Look at any current job description on Dice, Monster, Stack Overflow, or anywhere else, and it is evident that multilingual programming has become the norm.
Multilingual programming is here to stay. With Java, Sun Microsystems tried to implement one language that could do it all, and run on anything. They failed, in large measure because the lowest common denominator presentation layer was too low to be usable. Microsoft tried again with Silverlight, aiming a bit higher, but the outcome was pretty much the same, though for a different reason. Nobody uses Silverlight or takes it seriously. Now comes HTML5, which makes no such one size fits all claims. The new mantra is “mobile first,” tacitly acknowledging that the small screen and the desktop need separate presentation layers.
This realization brings the concept of the n-tier application into sharp relief, and has given rise to such “novel” concepts as the Model-View-Controller application model, in which the data model, data access, presentation, and business logic are four distinct layers, implemented in at least two programming languages (e.g., model, data layer, and controller in C#, and view in some variation of JavaScript). I call MVC “novel” because it is a logical evolution of the old Client/Server model that was all the rage in the early 1990’s.
The needs of multilingual programming include uniform naming conventions applicable to all programming languages. Since programming languages are more alike than different, so should their variable naming conventions be.
The general objective of the RGUNC is to define a single set of variable naming conventions that can be applied with little or no modification to any programming language and application domain. This required one huge concession, about which I have said nothing up to now: most scripting languages are loosely typed. This means that a huge number of the programming languages that appear in today’s multilingual mix are languages that play fast and loose with variable types. This affects the design of a convention in two ways.
  1. The role played by variable types in their application is diminished, though not eliminated.
  2. Scope and lifetime of variables are usually unaffected, except in those increasingly rare scripting languages that play fast and loose with variable scope.
The roles played by variable number and Usage are the same.

Diminished Role of Variable Type

Most scripting languages make no pretense about supporting variable types, and the interpreters that implement them perform little, if any, type checking. This statement is literally true with respect to the primitive types: integer, floating point, and string. Although the script engine performs no type checking on any of its variables, the objects, themselves, never relax their standards. Pass a Worksheet object that came from Microsoft Excel to a method that expects a Rowset object that came from SQL Server, and watch how fast your application crashes. But it will be the object that initiates the crash, not the script interpreter, which is left to clean up the mess, if it can.
Accordingly, scripting languages require type tracking that is more relaxed in some respects, but every bit as strict in others. These conventions accommodate that with simplified type tags for primitive types, such as numbers and strings, and optional simplification of object tagging. Moreover, you have the option of using exactly the same tags throughout, regardless of language. Just be aware that most script interpreters won’t enforce them, so you must code carefully and test thoroughly. In my own work, I dispense with differentiating signed from unsigned integers, single versus double precision floating point numbers, and so forth.
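The failure mode described in this section can be sketched in Python, used here as a stand-in for a loosely typed scripting language. The Rowset and Worksheet classes below are toy placeholders written for the example, not the Excel or SQL Server objects, and fetch_all, count_rows, and try_count are hypothetical names. In Python the error surfaces only when the missing method is looked up at run time, which is the closest analogue to the object, and not the interpreter, rejecting the call.

```python
class Rowset:
    """Toy stand-in for a database rowset (not a real data access API)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def fetch_all(self):
        return list(self._rows)


class Worksheet:
    """Toy stand-in for a spreadsheet object; it has no fetch_all method."""

    def __init__(self, cells):
        self.cells = cells


def count_rows(source):
    # Nothing checks the type of `source` when the call is made. The
    # mismatch is discovered only when fetch_all is actually requested.
    return len(source.fetch_all())


def try_count(source):
    """Call count_rows and report, rather than propagate, a mismatch."""
    try:
        return count_rows(source)
    except AttributeError as err:
        return f"rejected: {err}"
```

Calling try_count with a Rowset returns the row count, while calling it with a Worksheet returns a message that starts with "rejected". Nothing flags the wrong argument until the code runs, which is why a type tag in the name, plus careful testing, has to do the work that a compiler would otherwise do.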

Role of Variable Scope

Most modern scripting languages understand scope, and build fences around local functions that hide their private variables from the main script, and vice versa. There is one glaring exception, exhibited by many scripting languages, which is that variables defined in the main script are visible to all of its local subroutines. Alas, Visual Basic Script (VBScript) is guilty of this offense, and I suspect this is at least partially responsible for the exploitability of some of the security flaws that surface from time to time in it. In any event, it is cold comfort to the security conscious programmer that a local function or subroutine can see and change any variable declared in the main routine.
In this respect, VBScript has plenty of company, including Perl, one of my favorite scripting languages. Thankfully, there is a relatively simple way around this dilemma, which can be applied to any scripting language that exhibits this behavior, including both VBScript and Perl, and is considered a Good Design anyway. If the main routine follows the design pattern of a classic C program, in which the main routine makes a few basic decisions, and calls functions that do the real work, and defines no variables of its own, then the global namespace is empty, and every variable lives behind a good fence.
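The two layouts can be compared in a short Python sketch. Python stands in here for the scripting languages named above; its rules differ in detail (rebinding a script-level name inside a function needs a global statement, although reading or mutating it does not), and all of the names below are hypothetical.

```python
# Leaky layout: a variable defined by the main script is visible to
# every function in the file.
pending_batches = ["monday.csv", "tuesday.csv"]


def leaky_count():
    # The list is not in the parameter list, yet it can be reached,
    # and changed, from here.
    pending_batches.pop()
    return len(pending_batches)


# Fenced layout: the main routine owns its data as locals and hands
# each function only what that function needs.
def count_batches(batches):
    return len(batches)


def main():
    batches = ["monday.csv", "tuesday.csv"]  # local to main
    return count_batches(batches)


if __name__ == "__main__":
    main()
```

In the fenced layout, nothing that main defines is reachable from another function unless it is passed explicitly, so the only names left at script level are the functions themselves.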

Conclusion

How you use these or any other naming conventions is largely a matter of personal or group preference. They are here to help you, not to confine you.
Use them the way they were intended to be used, and write solid code.
