Thought Snippets
This is a collection of notes and thoughts that I hope will benefit my fellow software wizards.
Thursday, July 30, 2015
The Collapse of the Pass By Value versus Pass by Reference Distinction
Traditionally, there were two ways to pass information between parts of a
program, variously called functions, procedures, routines, and subroutines.
1) By Value: The subroutine was told the value of a variable, but the caller
kept the location where it was stored to itself.
2) By Reference: The calling routine told the subroutine where to find the
value.
Even then, arrays were always passed by reference. That is, the memory
location where an array's first element was stored was handed off to the
subroutine. Combined with information about the type of value that was
stored in the array and the number of elements it contained (the latter
usually passed separately, by value, except in BASIC), the subroutine could
extract individual elements from it. This worked well enough for BASIC, and
C even pretended to play along, by passing pointers to structures and
objects by value. At the very least, the subroutine could mess with the
data, but it couldn't change the caller's pointer to it.
This approach worked pretty well when a good percentage of everyday programs
were written in some dialect of BASIC, leaving the serious work, including
most of the plumbing, to be written in C or C++ by "real programmers," who
were assumed to know what they were doing. Mind you that it isn't entirely
their fault that a lot of buggy C and C++ code made it into production.
Pressure from upper management to get things done faster, with fewer people,
played a significant role, too, but I digress.
In 2002 came the Microsoft .NET Framework, which was supposed to put an end
to all of that with its managed heap. With the .NET Framework came something
else that has received little coverage: the distinction between pass by
value and pass by reference has pretty much collapsed into a heap of rubble.
But nobody noticed.
Before I go further, allow me to illustrate with an example from some of my
own working C# code.
private static void RecordInitialStateInLog (
Dictionary<StreamID , StreamStateInfo> pdctStreamStates ,
StateManager psmTheApp ,
string pstrBOJMessage )
The code snippet shown above is the signature of a subroutine,
RecordInitialStateInLog, that takes three arguments.
1) pdctStreamStates is a Dictionary (associative array) of StreamStateInfo
objects, indexed by StreamID, an enumerated type.
2) psmTheApp is a reference to a StateManager, a Singleton object that
exposes properties and methods to support operations commonly required of
character mode applications.
3) pstrBOJMessage is a garden variety string, if you can say there is such a
thing in a .NET application.
The first and simpler of the two complex objects, pdctStreamStates,
exposes the data that the subroutine needs through its Keys and Values
properties, both of which can be enumerated. A Count property tells us right
off how many items we can expect to find in each of those collections.
Likewise, the psmTheApp argument exposes, through its properties,
AppErrorMessages, an array of strings that we can read, but cannot change;
AppExceptionLogger, an instance of WizardWrx.DLLServices2.ExceptionLogger
that we can use to report and log exceptions; AppReturnCode, a read/write
integer that holds the exit code returned by the program when one of several
methods on the StateManager is called; and others.
Nowhere in the method signature does the word "reference" appear, or any
similar word. Neither does the phrase "By Value" appear anywhere.
Watch what happens when the main program calls the routine. Below are the
machine instructions that make it happen.
RecordInitialStateInLog (
    dctStreamStates ,
    s_smTheApp ,
    strBOJMessage );
0047400E push dword ptr [ebp-44h]
00474011 mov edx,dword ptr ds:[3567254h]
00474017 mov ecx,dword ptr [ebp-48h]
0047401A call 0040C6B0
The first instruction is as follows.
push dword ptr [ebp-44h]
This instruction pushes the machine address where string strBOJMessage, a
counted Unicode string, is stored.
The next instruction puts a pointer to the state manager into CPU register
EDX, where the CLR looks for the second argument when the subroutine needs
it.
mov edx,dword ptr ds:[3567254h]
The third step in the setup of the call stores the location of the
Dictionary object, dctStreamStates, in another machine register, ECX.
mov ecx,dword ptr [ebp-48h]
Finally, the subroutine is called.
call 0040C6B0
Without delving further into the machine code and getting off topic, what
just happened here? The main routine just told a subroutine,
RecordInitialStateInLog, where, within its memory, it can find three bits of
information that the subroutine needs to do its work. Armed with this
information, the subroutine can do anything with those objects that each
permits. That last phrase is significant; I shall return to it shortly.
1) Starting with the string, about all it can do with strBOJMessage is take
its length, copy some or all of it into a new string, and convert it to all
upper or lower case characters, also yielding a new string.
2) Dictionary dctStreamStates allows its values to be enumerated and copied.
Since the dictionary isn't marked as read only (which the called routine can
ascertain by evaluating its IsReadOnly property), the subroutine can even
append items to it, replace existing items, and delete items.
3) Finally, StateManager psmTheApp is a mixed bag; some of its properties
(e. g., AppReturnCode) can be changed to inform the main routine that the
program should report an error when it ends, while AppErrorMessages is a read
only array of strings, and AppRootAssemblyFileDirName is a read only
string.
You have probably realized by now that StateManager is a custom class, and
its design determines what users are allowed to do with its properties.
Which properties are read/write and which are read only are the result of
deliberate decisions about which properties a consuming assembly should be
allowed to change, and which should be protected against changes. The
AppReturnCode is fair game for the application to change at will, but you
wouldn't want the application to be able to change the text of the error
messages or the name of the program directory.
Elsewhere in the code, an exception handler sets the error code to
MagicNumbers.ERROR_RUNTIME (+1), a nonzero value, to signal that a run-time
exception has been caught and reported, causing the task to fail.
The final statement executed by the main program makes a decision based on
the value of the s_smTheApp.AppReturnCode property.
Environment.Exit (
s_smTheApp.AppReturnCode > MagicNumbers.ERROR_SUCCESS
? s_smTheApp.AppReturnCode
: MagicNumbers.ERROR_SUCCESS );
Astute observers will notice that this could be simplified by eliminating
the decision, and passing the value of the AppReturnCode property straight
into the Environment.Exit routine. The reason that I didn't do so is that
this example came from a work in progress, and the final version will
substitute a different routine that makes better use of that decision to set
one of its arguments. I wrote it this way to remind me to make the
replacement in the final version.
There are two noteworthy things about this example.
First, the subroutine got a reference to each of its three arguments. Within
the limits imposed by the objects, themselves, it can plunder their
properties more or less at will.
The second consequence follows from the first: when the caller regains
control, some or all of the properties may have been changed.
The exception is the string, strBOJMessage, which is immutable. Assigning a
new value to it, whether in the subroutine or in the main routine, creates a
brand new string. When the subroutine makes the assignment, the caller's
original is left intact. Conversely, when the new value is assigned in the
main routine, the caller's variable refers to the new string, and the old
one is lost. Strings are always the odd duck on the pond.
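The behavior of the string can be seen in a few lines of C#. This is only a minimal sketch: the StringDemo class and its Reassign and Run methods are hypothetical names that I made up for illustration, and are not part of the program discussed above.

```csharp
using System;

public static class StringDemo
{
    // Hypothetical subroutine. Assigning to the parameter rebinds the
    // subroutine's own copy of the reference; the caller's variable is
    // untouched, because ToUpperInvariant returns a brand new string.
    public static string Reassign ( string pstrMessage )
    {
        pstrMessage = pstrMessage.ToUpperInvariant ( );
        return pstrMessage;
    }

    // Returns the caller's string, followed by the string that the
    // subroutine created, so that the two can be compared.
    public static string [ ] Run ( )
    {
        string strBOJMessage = "Begin job";
        string strFromSubroutine = Reassign ( strBOJMessage );
        return new string [ ] { strBOJMessage , strFromSubroutine };
    }
}
```

Calling StringDemo.Run ( ) yields "Begin job" in the first element and "BEGIN JOB" in the second; the caller's string survives the call unchanged.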
The .NET Framework specification really muddies things, especially if you
are accustomed to thinking of structures the way C and C++ programmers use
the term. According to "Value Types (C# Reference)," at
https://msdn.microsoft.com/en-us/library/s1ax56ch.aspx, a Value Type is
either an Enumeration or a Struct. Wait a minute, you say, I can understand
how an enumerated type can be a value type, because they boil down to an
integer, but how can a Struct be a value type? I thought the other value
types were the simple numeric types (integer, long, float, double). A closer
look reveals that the intrinsic value types do, indeed, include those four,
plus enumerations. Call it Microsoft Magic; all four are classified as Structs!
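You can check this classification for yourself with reflection. The following is a rough sketch, not a definitive test; the ValueTypeProbe class is a hypothetical helper, and the members of the StreamID enumeration shown here are invented, since all that matters is that it is an enumerated type.

```csharp
using System;

// Hypothetical members; only the fact that StreamID is an enum matters.
public enum StreamID { StandardOutput , StandardError }

public static class ValueTypeProbe
{
    // The C# keyword int is an alias for the System.Int32 structure.
    public static bool IntIsInt32 ( )
    {
        return typeof ( int ) == typeof ( Int32 );
    }

    // True for structures and enumerations, false for classes.
    public static bool IsValueType ( Type ptypToTest )
    {
        return ptypToTest.IsValueType;
    }

    // Builds a one-line summary of a type, suitable for Console.WriteLine.
    public static string Describe ( Type ptypToTest )
    {
        return ptypToTest.FullName
               + ": IsValueType = " + ptypToTest.IsValueType
               + ", IsEnum = " + ptypToTest.IsEnum;
    }
}
```

For example, ValueTypeProbe.Describe ( typeof ( long ) ) returns "System.Int64: IsValueType = True, IsEnum = False", while ValueTypeProbe.IsValueType ( typeof ( string ) ) returns false.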
I have a theory about why this is so, but my proof is limited to
semi-scientific observation of the machine code that implements my .NET code
running in the Visual Studio debugger. If my guess is correct, though,
Microsoft has successfully future-proofed all four basic numeric types by
making their internal representation opaque, for which there is ample
precedent. For example, a C implementation of almost any major encryption
algorithm is made more portable by specifying variables that represent
integers of a specific bit width (usually 32 or 64), as a typedef, and a
Win32 HANDLE is an opaque value.
There is nothing in the definition of struct that requires it to contain two
or more members. Obviously, to be useful, it needs one member, but that's
all it really needs. Hence, the following structure is legal.
struct _int32 {
    int Value ;
} int32 ;
Since the machine address of the first member of a structure and the address
of the structure, itself, are the same, defining a value type as a structure
hides the implementation details without affecting user code. If I pass my
int32 structure to a routine that knows how to handle such a thing, it can
find the other members, if any, without anything from me beyond the address
of the first member. Concrete examples of this abound, even in the Win32
API. For example, both long (64 bit) integers and floating point numbers are
structures. Integers store the lower and upper 32 bits in machine word sized
chunks, while floating point numbers store a mantissa and an exponent, which
are passed around and mostly treated as a unit, until it comes time to
format the number for printing or use it in a mathematical operation. Those chores
fall to routines in system libraries that you can safely treat as black
boxes, whether your code is written in C#, VB.NET, C++, or something else.
But why 32 bit integers, too? What happens when the processor architecture
is 64 bits? Your 32 bit integer occupies only half a machine register, a
detail that matters only at the very lowest levels of the code, in the
native code generated behind the scenes by the just-in-time (JIT) compiler
or by NGEN, the Native Image Generator. Since it's a structure, the 64 bit
runtime just handles it,
transparently, and you carry on. That's why you see System.Int32 in the
Locals window of your debugger, and in the argument lists displayed in a
stack trace. This simple device abstracts away the hardware dependency.
Intermediate Language sees only System.Int32, and the native code generator
knows exactly what to do with it, whether your CPU architecture is 32 bits,
64 bits, 128 bits, or more. Your code just works, without any changes.
Even this is not entirely new; for years, we have had 16 bit integers, known
by various names (WORD, Short, and so on) that were treated in much the same
way by 32 bit hardware, in which a 16 bit integer occupies only half of the
32 bit register. Indeed, the present day assemblers still recognize 16 bit
registers AX, BX, CX, DX, DI, and SI, which correspond to the lower half of
32 bit registers EAX, EBX, ECX, EDX, EDI, and ESI.
As an aside, code that manipulates ANSI characters uses the subdivisions of
the original 16 bit registers, AH, AL, BH, BL, CH, CL, DH, and DL, all of
which behave like 8 bit registers.
WHAT PRACTICAL USE IS THIS?
The topic "Main Features of Value Types" says "Variables that are based on
value types directly contain values. Assigning one value type variable to
another copies the contained value." Ignore the first sentence; it's the one
that stirs up the mud. Treat value types AS IF they directly
contain values, because, in truth, a structure cannot directly contain a
value. Only a structure member can do that. The issue is that value types
are sufficiently small that making a copy is computationally cheap, whereas
copying a reference type is neither computationally cheap, nor good
engineering, since it defeats the purpose of defining the object in the
first place.
The preceding statement yields the following practical distinctions.
1) Value types are effectively passed by value, period. With respect to
value types, what changes in the subroutine stays in the subroutine.
2) Reference types are effectively passed by reference, period. With respect
to reference types, there are no secrets. The calling routine sees all
changes made to the properties of the reference types that it passed into
the subroutine.
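The two distinctions above can be sketched in C# as follows. Treat this as an illustration under stated assumptions, not as code from the application discussed above: PointValue, PointObject, and PassingDemo are hypothetical types invented for the purpose. The sketch also shows one limit on the second distinction, which matches what C did with pointers: without the ref keyword, a subroutine that assigns a whole new object to its parameter leaves the caller's variable alone.

```csharp
using System;
using System.Collections.Generic;

public struct PointValue { public int X; }      // hypothetical value type
public class PointObject { public int X; }      // hypothetical reference type

public static class PassingDemo
{
    // Receives a copy of the structure; the caller never sees the change.
    public static void ChangeStruct ( PointValue pValue ) { pValue.X = 99; }

    // Receives the location of the caller's object; the change is visible.
    public static void ChangeObject ( PointObject pObject ) { pObject.X = 99; }

    // Rebinds only the subroutine's copy of the reference.
    public static void ReplaceObject ( PointObject pObject )
    {
        pObject = new PointObject { X = -1 };
    }

    // A collection behaves like any other reference type.
    public static void AddItem ( Dictionary<string , int> pdctItems )
    {
        pdctItems [ "added" ] = 1;
    }

    // Returns the struct member, the object member, and the item count
    // as the caller sees them after all four calls.
    public static int [ ] Run ( )
    {
        PointValue valPoint = new PointValue { X = 1 };
        PointObject objPoint = new PointObject { X = 1 };
        Dictionary<string , int> dctItems = new Dictionary<string , int> ( );

        ChangeStruct ( valPoint );      // valPoint.X is still 1
        ChangeObject ( objPoint );      // objPoint.X is now 99
        ReplaceObject ( objPoint );     // objPoint.X is still 99
        AddItem ( dctItems );           // dctItems.Count is now 1

        return new int [ ] { valPoint.X , objPoint.X , dctItems.Count };
    }
}
```

PassingDemo.Run ( ) returns 1, 99, and 1, in that order: what changed in the structure stayed in the subroutine, while the changes to the object and the dictionary came back to the caller.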
Value types represent a very small subset of the objects that inhabit a
typical .NET assembly. Enumerations, integers, floating point numbers,
decimal numbers, a relatively small number of system types (e. g.,
System.DateTime, System.TimeSpan, System.Guid, and a few others) and user
defined structures are value types. Everything else is a reference type or a
string.
This second statement has significant consequences for both robustness and
security of applications.
You have heard it said that one of the tenets of good object oriented design
is data hiding. The example above should make abundantly clear why this is
important; the read/write properties of any object that is visible to a
routine can be changed by it. From this precept, I draw two rules.
1) Unless consumers of an object _must_ be able to change the value of a
property, make it read only. If the value must be updateable, consider a
method, instead of a write property. Methods offer two advantages over
properties; they can take arguments that can be used to supply additional
information that can be used by the method to decide whether to allow the
update, and it is considered acceptable to allow a method to fail by
returning a distinct exit code or raising an exception.
2) Unless a routine needs access to most or all of the properties of an
object, consider passing in only the properties that it needs as individual
arguments. This also decouples the routine from the object.
In security terms, the preceding two rules implement the Need To Know
principle. In addition to making the routine more secure by reducing its
attack surface, they reduce the risk of unintended changes to object
properties that may not surface until the application is in production.
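Here is one way that the two rules might look in C#. This is a hedged sketch; AppState is a hypothetical stand-in that I invented for this example, not the real StateManager class, and the rule that rejects negative return codes is an arbitrary assumption chosen to show a method refusing an update.

```csharp
using System;

// Hypothetical stand-in for a state object; NOT the real StateManager.
public class AppState
{
    private readonly string _strRootDirName;
    private int _intReturnCode;

    public AppState ( string pstrRootDirName )
    {
        _strRootDirName = pstrRootDirName;
    }

    // Rule 1: properties are read only unless consumers must change them.
    public string RootDirName { get { return _strRootDirName; } }
    public int ReturnCode { get { return _intReturnCode; } }

    // Rule 1, continued: updates go through a method, which can refuse.
    // Rejecting negative codes is an assumption made for illustration.
    public bool TrySetReturnCode ( int pintNewCode )
    {
        if ( pintNewCode < 0 )
            return false;

        _intReturnCode = pintNewCode;
        return true;
    }
}

public static class Reporter
{
    // Rule 2: this routine needs one integer, so that is all it gets.
    // It cannot touch the state object, because it never sees it.
    public static string Describe ( int pintReturnCode )
    {
        return pintReturnCode == 0
            ? "success"
            : "failure " + pintReturnCode;
    }
}
```

A caller would write something like Reporter.Describe ( appState.ReturnCode ), handing over a copy of the integer rather than the whole object, so that nothing the reporting routine does can alter the state.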
HOW DOES THIS AFFECT THE OVERALL DESIGN?
Rigorous application of the Need To Know principle is mostly old school
design. The main routine makes a few key decisions, and calls one or more
subroutines that do the real work. Each of those subroutines makes a few
more decisions, and calls more specialized subroutines to perform the
required tasks. This process continues until no more decisions remain to be
made, and the routines that comprise the leaves of the program's process
flow are pretty much drop-through routines that perform a series of actions,
with few, if any, decisions. When objects are brought into these routines
as, and only when, needed, every routine conforms to the Need To Know principle,
and the overall attack surface is minimized by design.
Wednesday, July 29, 2015
Multilingual Mania
Benefits of Integrated Applications
This type of integrated application has many benefits for its users, and for the people who pay for them.
Benefits for Users
- There are fewer applications to learn.
- Everything is entered once, and distributed behind the scenes to other parts of the application that need it.
- Entering everything only once saves time for input and editing.
- Since everything is entered just once, there is only one chance to make an input error, and the input needs checking just once.
- Subsequent processes are easier to set up and run.
- No time is wasted locating and loading inputs for subsequent steps, because the system knows how to find them.
- The system can prevent processing the same data twice, or failing to process a batch.
Benefits for Managers
- Assembling a custom application from purchased off the shelf software is usually much less expensive than writing one from scratch.
- Each part performs the task at which it excels. I think it is safe to say that no software is best at everything.
- A component can be replaced when a better product is found for its job. Frequently, such replacements happen behind the scenes, and end users are unaware of the change.
- The time from conception to deployment is significantly shorter (the savings are usually measured in years).
- Significant maintenance costs are shared among users of the purchased software, so that everyone gets high quality updates for less.
- Some of the burden of correcting design and programming errors falls on the shoulders of the publishers, relieving your IT budget to some extent.
One Project, Many Languages
Just as a carpenter needs a variety of tools and materials to build a house, so it is with programmers and software. While the distinction is a bit more blurry, I like to think of the languages as materials and the editors, compilers, interpreters, and so forth as the tools. The principle is pretty much the same, though; when you build a kitchen, you use different materials for the counter top, cabinets, pantry, and floor, just as you use different programming languages to write the parts of a Web site that run on the Web server and the parts that run in the visitor’s Web browser.
Just as using two or more spoken languages presents challenges, so it is in software, although there are a few differences.
Syntax
Every language has grammar and syntax rules, whether it is spoken, written, or fed into a computer as operating instructions.
Thankfully, like the written languages upon which they were modeled, programming languages fall into groups that share common elements of grammar, syntax, and even vocabulary. Hence, just as a person who speaks German can figure out a lot of Danish, Dutch, Flemish, Norwegian, and Swedish text, or an Italian speaker can grasp Romanian and Spanish, so a person who knows C or C++ has relatively little trouble with JavaScript, Perl, PHP, and Python. While the similarities won’t make a programmer an overnight expert in the new language, they give him or her a head start.
Objects and their Names
Usually, a more significant hurdle than the syntax, vocabulary, or grammar of a new language is learning the names of the objects that inhabit the application domain, and how they are related. The usual term for this is Object Model. The Object Model is a map of the territory. Whether you use JavaScript, VBScript, PerlScript, or some other scripting language to manipulate them, the objects that you manipulate inside a Web browser, and how they are related to one another, are similar, and they usually have the same names: Windows, Text Boxes, List Boxes, Combo Boxes, Buttons, Forms, Frames, Toolbars, etc. Likewise, when you manipulate a Microsoft Word document, you work with Documents, Stories, Sentences, Tables, Characters, and so forth, whether you use Visual Basic for Applications (VBA), C#, or C++ to do the manipulating.
This brings us to the point of this essay, the naming of instances of these objects. There are two major aspects of object naming.
- Naming Convention: A naming convention is a generalized plan for the naming of objects and other variables manipulated by a program.
- Naming Scheme: A naming scheme is a plan, ideally based on standard practices observed by the organization, for naming objects of like kind, such as different Text Box objects in a form, or Person objects in the business logic of a data base driven application.
Naming Conventions: Benefits and Limitations
While there is nothing magic about naming conventions, they play an important role in communicating vital information about the moving parts of a software application. Nevertheless, it is essential to understand what a set of naming conventions is, and, equally significant, what they are not.
What They Are
The best way to characterize a naming convention is by listing and briefly describing its key features, which can be summarized in one word, ACES.
- Adaptable: It is easy to adapt it to the requirements of your environment. Any number of circumstances might require adaptation; one good example is that the primary spoken language of your group is not English.
- Consistent: It is internally consistent. For example, it differentiates individual items of a kind from collections or arrays of items of the same kind consistently for all types of objects, so that a quick glance at a name tells you whether it refers to an individual object, a collection of like objects, or an array of like objects.
- Extensible: For most projects, I found that assigning dedicated tags to the objects in your data model was more trouble than it was worth. However, that general statement in no way prohibits extending the conventions by defining a set of tags for a group of closely related objects that play a significant role in your application domain. For example, if your application revolves around airplane parts, it might make sense to create tags to identify large classes of parts, such as engine parts or airframe parts.
- Simple: Consistent application of simple naming conventions contributes far more than complex conventions that go unused because they are too complex for daily use.
What They Are Not
One word, AIMS, nicely summarizes what naming conventions are not.
- Absolute: Naming conventions are recommendations to guide you, not rules to slavishly follow.
- Inflexible: It is appropriate from time to time to deviate from a naming convention. For example, while I advocate designating function arguments with a prefix, a public interface that is distributed only in binary form should almost certainly dispense with them.
- Mechanical: Applying a naming convention should almost never be done mechanically. This goes double for retroactive application to an established code base. If you choose to do so, use a proper refactoring tool, such as those built into Visual Studio. Resist with all your strength the temptation to use the Find and Replace feature of your text editor, which cannot distinguish symbol names that should be changed from comments that use the same text in ordinary words that may be better left unchanged.
- Scheme: A naming convention is not a naming scheme. The two form a symbiotic pair; they work together to produce a consistent set of names for the objects that store, transport, and transform your data.
What Do Naming Conventions Encode
A naming convention encodes attributes of a variable that affect how code interacts with it, so that you can focus on the work to be done, without constantly referring to the variable definitions. Table 1 (below) gives examples of the attributes typically encoded into a naming convention, accompanied by a brief explanation of how each affects the code.
Table 1 gives examples of attributes commonly encoded by a naming convention.
Attribute | Explanation
Basic Type | The most important single bit of information you need to know about a variable is its type, which determines what operations may legally refer to it, other types to which it is functionally equivalent, and the types into which it can be transformed (cast). If the type is an object, it identifies the code, called methods, that is attached to it and enables the variable to “do” things. Early programming languages, such as FORTRAN and the original versions of BASIC, appended a special character to a name for this purpose. That worked well when there were only a handful of variable types, but it is insufficient for modern applications that employ dozens of types. Even FORTRAN IV had begun to outgrow the limited set of suffixes when it added double precision and complex numbers to its list of basic types.
Number | In this context, number has a slightly different meaning than it does in the grammar of spoken languages. Number differentiates singular from plural, but it goes a tad further. While English, and most other spoken languages, stop at differentiating one versus many, programming needs a more fine grained differentiation of the notion of plural (many) things.
Scope or Lifetime | Scope and lifetime are synonyms that indicate the visibility of the variable.
Usage | Usage is a specialized attribute, primarily applied to array and collection indices to indicate whether they store the First, Last, or Current position in the associated array or collection. Unlike the other attributes discussed here, usage is usually conveyed by a suffix.
Why Encode All This Into the Name
Since every bit of this information is already encoded into the definition, why repeat it in the variable name, you may ask. The one-word answer is accessibility. Though most of these attributes are part of the definition, the object of this exercise is to put the information where you need it, in the variable name. For a local variable, parameter, or return value, well written code usually makes the definition fairly accessible. Nevertheless, there are situations in which the definition is less accessible.
- Occasionally, a really long switch or Select Case block is unavoidable, leaving the definition hundreds of lines, or several pages, away from the place where it is used.
- Class variables, especially protected variables defined by a base class, are usually defined in a different module, which may belong to another project, and is, therefore, relatively inaccessible. The same holds for public (global) variables owned by the module that defines the main entry point routine.
- When you are working from a code listing, such as during a code review, the hints that would be available from IntelliSense if you were working in your code editor, are unavailable. The same is true when you are working with side by side listings in a Diff tool, such as IDM UltraCompare or the difference viewer of your source code control system.
The Reddick-Gray Unified Naming Conventions, version 2.0
The Reddick-Gray Unified Naming Conventions, available at http://www.wizardwrx.com/RGUNC_Resources/, are the result of over 20 years of experience developing multilingual software. When I started extending the Reddick VBA Naming Conventions to support the multilingual applications that I was developing in the middle 1990’s, nobody was talking about multilingual development, because it was comparatively rare. There might be a bit of JavaScript in your Web applications, but that code represented islands dotting a sea of static HTML, even if that HTML was being generated on the fly by a Web server. The code consisted of a handful of lines that performed a very specific task, and it was treated as a black box.
Since then, the applications have grown bigger, more complex, and more dynamic, and must render as nicely on the screen of an Android or Apple phone as on any number of Android and iOS tablets of various sizes and shapes, and, oh, by the way, a 24” computer monitor that has an aspect ratio of 4:3, which may be running at one of a number of screen resolutions, typically starting at 1024 by 768 and going up. Look at any current job description on Dice, Monster, Stack Overflow, or anywhere else, and it is evident that multilingual programming has become the norm.
Multilingual programming is here to stay. With Java, Sun Microsystems tried to implement one language that could do it all, and run on anything. They failed, in large measure because the lowest common denominator presentation layer was too low to be usable. Microsoft tried again with Silverlight, aiming a bit higher, but the outcome was pretty much the same, though for a different reason. Nobody uses Silverlight or takes it seriously. Now comes HTML5, which makes no such one size fits all claims. The new mantra is “mobile first,” tacitly acknowledging that the small screen and the desktop need separate presentation layers.
This realization brings the concept of the n-tier application into sharp relief, and has given rise to such “novel” concepts as the Model-View-Controller application model, in which the data model, data access, presentation, and business logic are four distinct layers, implemented in at least two programming languages (e. g., model, data layer, and controller in C#, and view in some variation of JavaScript). I call MVC “novel” because it is a logical evolution of the old Client/Server model that was all the rage in the early 1990’s.
The needs of multilingual programming include uniform naming conventions applicable to all programming languages. Since programming languages are more alike than different, so should the accompanying variable naming conventions be.
The general objective of the RGUNC is to define a single set of variable naming conventions that can be applied with little or no modification to any programming language and application domain. This required one huge concession, about which I have said nothing up to now: most scripting languages are loosely typed. This means that a huge number of the programming languages that appear in today’s multilingual mix are languages that play fast and loose with variable types. This affects the design of a convention in two ways.
- The role played by variable types in their application is diminished, though not eliminated.
- Scope and lifetime of variables are usually unaffected, except in those increasingly rare scripting languages that play fast and loose with variable scope.
Diminished Role of Variable Type
Most scripting languages make no pretense about supporting variable types, and the interpreters that implement them perform little, if any, type checking. This statement is literally true with respect to the primitive types: integer, floating point, and string. Although the script engine performs no type checking on any of its variables, the objects, themselves, never relax their standards. Pass a Worksheet object that came from Microsoft Excel to a method that expects a Rowset object that came from SQL Server, and watch how fast your application crashes. But it will be the object that initiates the crash, not the script interpreter, which is left to clean up the mess, if it can.
Accordingly, scripting languages require type tracking that is more relaxed in some respects, but every bit as strict in others. These conventions accommodate that with simplified type tags for primitive types, such as numbers and strings, and optional simplification of object tagging. Moreover, you have the option of using exactly the same tags throughout, regardless of language. Just be aware that most script interpreters won’t enforce them, so you must code carefully and test thoroughly. In my own work, I dispense with differentiating signed from unsigned integers, single versus double precision floating point numbers, and so forth.
Role of Variable Scope
Most modern scripting languages understand scope, and build fences around local functions that hide their private variables from the main script, and vice versa. There is one glaring exception, exhibited by many scripting languages, which is that variables defined in the main script are visible to all of its local subroutines. Alas, Visual Basic Script (VBScript) is guilty of this offense, and I suspect this is at least partially responsible for the exploitability of some of the security flaws that surface from time to time in it. In any event, it is cold comfort to the security conscious programmer that a local function or subroutine can see and change any variable declared in the main routine.
In this respect, VBScript has plenty of company, including Perl, one of my favorite scripting languages. Thankfully, there is a relatively simple way around this dilemma, which can be applied to any scripting language that exhibits this behavior, including both VBScript and Perl, and is considered good design anyway. If the main routine follows the design pattern of a classic C program, in which the main routine makes a few basic decisions, calls functions that do the real work, and defines no variables of its own, then the global namespace is empty, and every variable lives behind a good fence.
Conclusion
How you use this or any naming convention is largely a matter of personal or group preference. They are here to help you, not to confine you. Use them the way they were intended to be used, and write solid code.
Tuesday, February 4, 2014
A Command Extension Conundrum
So far as I know, the documented way to test whether command extensions are enabled is something like the following.
echo %0, Version 1.00 Starting
if CMDEXTVERSION 1 goto CHK_REQ1
echo.
echo This script requires command extensions to be enabled. Since they are
echo enabled, by default, they have been disabled, directly or by GPO.
goto ERR_DONE
:CHK_REQ1
For the first time in a long while, I temporarily disabled command extensions to perform a special test. To verify that command extensions were actually disabled, I ran a command script that begins by verifying that command extensions are enabled. I expected the error message listed above to be displayed. This is what actually happened.
C:\bin>CopyOneToAll.CMD
CopyOneToAll.CMD, Version 1.00 Starting
1 was unexpected at this time.
C:\bin>
This isn’t exactly what I expected.
Since I can complete the test without further investigation, the mystery has been put aside for now.
David A. Gray
Chief Wizard
WizardWrx
Email: dgray@wizardwrx.com
WWW: www.wizardwrx.com
Cell: +1 (817) 298-0867
Land: +1 (817) 812-3041
4014 Double Tree Trail
Irving, TX 75061-3936
Saturday, January 25, 2014
Retrieving Previous Version of File from Windows Recycle Bin
Today, when I went in search of a previous version of a file that I had just sent to the Recycle Bin, I noticed that the context menu has a Cut option. Since I had already reused the file name in the directory from which I deleted the other version, I decided to see what happened when I cut the file.
Much to my delight, I was able to paste the file into a different directory. This feat enabled me to preserve the good copy that I had just assigned, while allowing me to temporarily restore the discarded file, so that I could retrieve a snippet of text from it.
Saturday, April 14, 2012
Basic for the 8080
Tuesday, August 23, 2011
If (!expression) Considered Harmful
Let me begin by saying that I don’t feel any need to apologize for adapting the well-known aphorism, “GOTO Considered Harmful,” because I think this little essay ranks with it in importance.
Since I began programming as an almost daily activity in 1977, I have learned and used, one way or another, and for an equally wide variety of reasons, many programming languages, including the following (in no particular order): Fortran, COBOL, IBM 370 Assembler, Intel/Microsoft Macro Assembler, Perl, Basic (everything from Bill Gates’ original Basic for the Intel 8080 through current day Visual Basic .NET 2010), C, C++, C#, SQL, the DataEase Query Language, and the Microsoft batch language in all its forms. Due to the nature of my work, I often use two or more languages in the span of a single work day, and they can run the gamut from basic DOS batch files to advanced C, back to back.
The advantage of this huge diversity is that I’ve seen and used dozens of idioms and design patterns, some of which appear in only one or a handful of the languages cited above. The case of interest today is the unless idiom, found only in Perl, so far as I know. There is a simple example in the section titled “and, or, & not” of Perl Idioms. Programming Perl, the Bible of Perl, covers it in 4.3. if and unless Statements.
In terms of demonstrating the power for clarification of the unless idiom, the examples usually cited to illustrate it are pretty lame. Nevertheless, since their aim is to demonstrate the idiom, itself, their simplicity is justified. Following are some examples from production Perl scripts.
if ( !$gmatches )
if ( !$_ ) { # Is line blank?
if ( !$paramisok ) { # Warn about invalid lines.
if ( !$appending )
if ( $_ =! /^.U / ) # Skip detail lines that list unencrypted files.
The last one is especially troublesome, because it joins the result of two expressions, both expressed in negative terms, with a logical and operator (&&), which has positive semantics.
Here are a few in C, which has a similar grammar.
if ( !CloseClipboard ( ) )
if ( !IsTextUnicode ( pBOMInfoP6C->lpCharBuff , ( int ) pBOMInfoP6C->dwNBytes , &intUnicodeTestMask ) )
if ( !( utpOutFile = fopen ( pszTestCipherTextFQFN , "wb" ) ) ) {
Wait a minute, I hear you saying. The whole idea of an if statement is to do something if a condition is true.
Exactly! That’s why the not operator (!) is so harmful.
Enter the unless Idiom
Now, let’s rewrite the above expressions, starting with the Perl examples.
unless ( $gmatches )
unless ( $_ ) { # Is line blank?
unless ( $paramisok ) { # Warn about invalid lines.
unless ( $appending )
if ( $_ =! /^.U / ) # Skip detail lines that list unencrypted files.
Next, let’s give the C statements the same treatment.
Unless ( CloseClipboard ( ) )
Unless ( IsTextUnicode ( pBOMInfoP6C->lpCharBuff , ( int ) pBOMInfoP6C->dwNBytes , &intUnicodeTestMask ) )
Unless ( ( utpOutFile = fopen ( pszTestCipherTextFQFN , "wb" ) ) ) {
Hold that thought. None of the above three statements is valid C. Indeed, out of the box, they are invalid in every C compiler of which I am aware, which includes Microsoft Visual C++ 6.0 through Visual C++ 2010, in addition to the open source standard, GCC.
Unless in C
Please excuse the pun, but the unless idiom can be implemented in C (and C++).
All you need is a C preprocessor capable of expanding and substituting into a one-line macro.
Let’s have the drum roll, please. Here it comes.
#define Unless(pexpr) if ( !(pexpr) )
Make this one-line macro visible to all of your C and C++ code by putting it in a header that you always include, and you can clarify the intent of your code, unless you really need all those obtuse if ( !expressions.
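To see the macro at work, here is a minimal sketch. The two helper functions and the file name used to exercise them are hypothetical, invented only for illustration; the macro itself is the one shown above.

```c
#include <stdio.h>

/* The one-line macro from the text: the guarded statement runs only
   when the expression is false. */
#define Unless(pexpr) if ( !(pexpr) )

/* Hypothetical helper: reports whether a file can be opened for reading.
   Returns 1 on success, 0 on failure. */
int can_open_for_reading ( const char *pszFileName )
{
    FILE *pFile = fopen ( pszFileName , "rb" );

    Unless ( pFile ) {
        return 0;               /* fopen returned NULL. */
    }

    fclose ( pFile );
    return 1;
}

/* Hypothetical helper: because the macro expands to a plain if,
   it pairs with else exactly as if does. */
const char *describe_count ( int intCount )
{
    Unless ( intCount )
        return "none";
    else
        return "some";
}
```

Since the preprocessor rewrites Unless ( x ) as if ( !(x) ) before the compiler sees it, the generated code should be identical to the hand-written negation; only the reader benefits.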
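Wait, you say, my compiler doesn’t support macros. There is another way, which is nearly as straightforward, still fits on one line, and goes into that header that you always include. Because a function cannot stand in for the if keyword, the call goes inside an ordinary if statement, as in if ( Unless ( expression ) ), and, as written, it accepts only integer expressions.
__inline int Unless ( int pexpr ) { return ( !( pexpr ) ); }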
Fight coding horrors, one expression at a time.
Wednesday, March 17, 2010
Better Batch Files Through Command Extensions
The first batch files arrived alongside the very first versions of MS-DOS and PC-DOS. If you have been around the personal computer for many years, you undoubtedly remember, perhaps not fondly, AUTOEXEC.BAT. Power users probably had two or more of them, and some means by which to choose which one to use the next time they engaged their DOS boot diskette. Later came such features as batch files that displayed menus, read inputs from the beloved DOS prompt, and made processing decisions based on those inputs. More sophisticated batch jobs accepted command line arguments, just as did the commands that were built into DOS itself, and the dozens of utility programs, such as XCOPY, that came with it.
The Primitive Command Line Interface
While those early batch files accepted arguments, sometimes called parameters, the command parser was very picky. Testing the value of an argument was extremely limited, and followed a syntax that was familiar to C programmers, but alien to most other computer users, as shown in Listing 1.
IF "%1" == "DE4DATA" GOTO CHECK_1
IF "%1" == "de4data" GOTO CHECK_1
IF "%1" == "De4data" GOTO CHECK_1
IF "%1" == "De4Data" GOTO CHECK_1
Listing 1 shows the only way you could have any degree of case insensitivity in your command line argument tests in MS-DOS and PC-DOS.
Listing 2, below, shows a set of tests that gives more complete coverage.
- IF "%1" == "DE4DATA" GOTO CHECK_1
IF "%1" == "de4data" GOTO CHECK_1
IF "%1" == "DE4data" GOTO CHECK_1
IF "%1" == "DE4Data" GOTO CHECK_1
IF "%1" == "DE4DAta" GOTO CHECK_1
IF "%1" == "DE4DATa" GOTO CHECK_1 - IF "%1" == "dE4DATA" GOTO CHECK_1
IF "%1" == "de4DATA" GOTO CHECK_1
IF "%1" == "de4dATA" GOTO CHECK_1
IF "%1" == "de4daTA" GOTO CHECK_1
IF "%1" == "de4datA" GOTO CHECK_1 - IF "%1" == "De4Data" GOTO CHECK_1
- IF "%1" == "De4DaTa" GOTO CHECK_1
IF "%1" == "De4DatA" GOTO CHECK_1 - IF "%1" == "DE4data" GOTO CHECK_1
- IF "%1" == "DE4Data" GOTO CHECK_1
Listing 2 requires 16 tests to provide partial case insensitivity.
That is 16 lines of code to provide partial case insensitivity for one command line argument! Although the example above is extreme, it illustrates why batch programmers usually confined themselves to terse parameter values.
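To put those 16 tests in perspective, consider how many it would take to cover every spelling. DE4DATA contains six letters and one digit; each letter can be upper or lower case, so there are 2 to the 6th power, or 64, distinct spellings. The sketch below, written in C with a hypothetical function name, does the same arithmetic for any word.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the distinct upper/lower case spellings
   of a word. Each letter doubles the count; digits and punctuation
   have only one form. Very long words would overflow the counter. */
unsigned long count_case_variants ( const char *pszWord )
{
    unsigned long ulVariants = 1UL;
    size_t i;

    for ( i = 0 ; pszWord [ i ] != '\0' ; i++ ) {
        if ( isalpha ( ( unsigned char ) pszWord [ i ] ) ) {
            ulVariants *= 2UL;
        }
    }

    return ulVariants;
}
```

Called with "DE4DATA", the function returns 64, four times the number of tests in Listing 2.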
The Command Parser Grows Up
Windows NT brought with it a new command processor, CMD.EXE. Unlike its ancestor, COMMAND.COM, CMD.EXE is a full fledged 32 bit console mode Windows program. It can do everything that COMMAND.COM can do, and much more. One of its most useful new capabilities arises from command extensions, which are enabled by default. You can turn them off, but why?
IF "%~1" EQU "" GOTO DO_ALL
IF /I "%~1" EQU "DE4Data" GOTO CHECK_1
IF /I "%~1" EQU "Home_Office" GOTO CHECK_2
IF /I "%~1" EQU "Remote_IncrEase" GOTO CHECK_3
IF /I "%~1" EQU "Home_Office_IncrEase" GOTO CHECK_4
IF /I "%~1" EQU "UTIL" GOTO CHECK_5
IF /I "%~1" EQU "QB4_Source" GOTO CHECK_6
Listing 3 does, in its second of only 7 lines, much more than the 16 lines shown in Listing 2, because it is fully case insensitive.
Listing 3, taken from a production batch file, illustrates several of the capabilities provided by command extensions.
- The second through seventh lines modify the IF command with a new /I switch, making its tests fully case insensitive.
- All seven lines take advantage of another command extension, the tilde (~), to strip away quotation marks around the command line argument ("%~1"). I'll say more about this shortly.
- The third command extension illustrated in Listing 3 is the mnemonic relational operator, EQU. Thanks to command extensions, the IF command now supports a complete set of relational operators.
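For readers who think in C, the effect of the /I switch can be sketched as a comparison that folds both strings to the same case before testing them. This is only an illustration of the outcome, not a description of how CMD.EXE is implemented, and the function name is hypothetical. Standard C has no case-insensitive comparison of its own, so the sketch builds one from tolower.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 when the two strings match,
   ignoring case, and 0 otherwise. */
int equals_ignore_case ( const char *pszLeft , const char *pszRight )
{
    while ( *pszLeft != '\0' && *pszRight != '\0' ) {
        if ( tolower ( ( unsigned char ) *pszLeft )
             != tolower ( ( unsigned char ) *pszRight ) ) {
            return 0;           /* Characters differ even after folding. */
        }
        pszLeft++;
        pszRight++;
    }

    /* Equal only if both strings ended at the same position. */
    return *pszLeft == *pszRight;
}
```

One call, such as equals_ignore_case ( argv [ 1 ] , "DE4Data" ), covers every spelling that Listings 1 and 2 enumerate by hand.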
Who Can Use Command Extensions?
By default, command extensions are enabled, and are available in all versions of Windows that derive from the Windows NT code base, starting with Windows NT 4. In addition to NT 4, this includes Windows 2000, Windows XP, Windows Server 2003 and 2008, Windows Vista, and Windows 7.
Why Keep the Quotation Marks?
Despite many improvements, the command parser still has a few quirks, one of which is that it sometimes gets confused by bare words that are neither internal commands, nor recognizable program or batch file names. Thus, it is usually best to keep the quotation marks around the text against which an argument is being compared. However, if the argument is enclosed in quotation marks because it contains embedded spaces, and the test, itself, is also enclosed in quotation marks, the result is that the comparison string is enclosed in two sets of quotation marks, as shown in Table 1.
Command Line Argument | "This is the Document.doc" |
Old Style Comparison | If "%1" == "This is the Document.doc" |
Outcome of Old Style Comparison | If ""This is the Document.doc"" == "This is the Document.doc" |
New Style Comparison | If "%~1" == "This is the Document.doc" |
Outcome of New Style Comparison | If "This is the Document.doc" == "This is the Document.doc" |
Table 1 illustrates the behavior of old style comparisons, done without the benefit of command extensions, and the new style, which leverages them to make the test work as you would expect.
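The effect of the tilde can be sketched in C as a function that drops one pair of surrounding quotation marks before the comparison is made. The function name and the truncation behavior are assumptions made for the sake of the example, and the sketch assumes that removing a single surrounding pair is enough to illustrate the point.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies pszArg into pszOut, dropping one pair of
   surrounding quotation marks when both are present. cbOut is the size
   of pszOut in bytes; longer input is truncated to fit. Returns pszOut. */
char *strip_outer_quotes ( const char *pszArg , char *pszOut , size_t cbOut )
{
    size_t cchArg = strlen ( pszArg );

    if ( cbOut == 0 ) {
        return pszOut;          /* No room, not even for a terminator. */
    }

    if ( cchArg >= 2 && pszArg [ 0 ] == '"' && pszArg [ cchArg - 1 ] == '"' ) {
        pszArg++;               /* Skip the opening quotation mark. */
        cchArg -= 2;            /* Exclude both quotation marks. */
    }

    if ( cchArg >= cbOut ) {
        cchArg = cbOut - 1;     /* Leave room for the terminator. */
    }

    memcpy ( pszOut , pszArg , cchArg );
    pszOut [ cchArg ] = '\0';
    return pszOut;
}
```

With the quotation marks stripped from the argument first, the test string needs only its own pair, which is the situation shown in the last row of Table 1.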
Conclusion
While graphical interfaces and tools are nice, and they have a valuable place in our tool sets, so does the lowly command line interface. Batch files are fairly easy to write, test, and debug, run fast, are ready to run without a compiler or specialized interpreter, run without fuss on any machine, and can go where a program with a graphical interface is a waste, such as in logon and logoff scripts and scheduled tasks, none of which has a visible user interface.
While I do my share of graphical programming, batch files remain an essential part of my production environment, and occasionally become part of packages that I deliver to clients.
Use the following resources to learn more about command extensions, and start building flexible, powerful, modern batch files.
References
- http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/if.mspx?mfr=true is the official documentation of the modern IF command.
- http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/cmd.mspx?mfr=true is nominally about the new command processor, CMD.EXE, but it includes basic information about command extensions, including several ways to enable and disable them.
- http://www.robvanderwoude.com/local.php is a discussion of the SETLOCAL and ENDLOCAL commands, which includes one of several nifty tests that you can use to verify that command extensions are enabled.
- http://technet.microsoft.com/en-us/library/bb490920.aspx lists and documents the full set of relational operators that become available with command extensions enabled.