Talk:Formal grammar/Reductive grammar
This discussion was split into this page from the Formal grammar talk page, because it was growing too large and only had 3 participants. Rp (talk) 21:27, 4 September 2015 (UTC)
(On the need to include information on analytic or reductive grammars)
Analytic grammars are not covered enough per archive 1. Some literature refers to them as "recognitional" models, but that might be covered in "recognize". -- QTJ 06:06, 6 November 2006 (UTC)
- In formal language theory, there is no such thing as analytic grammar. (Formalisms to analyze language are known as automata.) I have renamed this page to make it 100% clear that it is about the notion of grammar known in formal language theory. Rp (talk) 17:51, 28 December 2007 (UTC)
- Nevertheless, the concept analytic grammar is interesting in its own right, and in my opinion deserves an article of its own (with a link from this article as well as from automata article(s)). In particular I think it needs to be clarified how generative and analytic grammars compare in formal strength. For example, I already know that it takes a context-sensitive analytic grammar to describe some "context-free languages". (This reflects for example the context-sensitive operation of an LR(1) parser.) 83.226.178.77 (talk) 11:56, 8 January 2012 (UTC)
- It would be nice if you could demonstrate this by showing that it is a concept in the first place, for instance, by providing a clear definition. Rp (talk) 23:51, 8 January 2012 (UTC)
- Quite true -- "In formal language theory, there is no such thing as analytic grammar" -- they are known as reductive grammars. see Reductive Grammar below. Steamerandy (talk) 05:44, 24 January 2015 (UTC)
- What you call a "reductive grammar" is exactly the same as "context-free". And even if the proper term in the '60s was "reductive grammar", that does not make it a proper term today. — Arthur Rubin (talk) 06:08, 24 January 2015 (UTC)
- Reductive grammar: (computer science) A set of syntactic rules for the analysis of strings to determine whether the strings exist in a language. "Sci-Tech Dictionary: McGraw-Hill Dictionary of Scientific and Technical Terms, 6th edition, published by The McGraw-Hill Companies, Inc". Retrieved 15 January 2015. Steamerandy (talk) 20:27, 24 January 2015 (UTC)
- Reductive grammars have alternants, in effect allowing multiple rules for the same left nonterminal. Although that definition is a linguist view. I have found in computer science that variable types make a language context sensitive, thus making the terminology here confusing.
- In a computer science pdf on Formal Grammars and Languages - Computer:
- Grammars can be divided into four classes by gradually increasing the restrictions on the form of the productions. Such a classification is due to Chomsky[Chomsky,1956,Chomsky,1963] and is called the Chomsky hierarchy.
- Definition 3.1
- Let G=(Σ,V,S,P) be a grammar.
- 1. G is also called a Type-0 grammar or an unrestricted grammar.
- 2. G is a Type-1 or context-sensitive grammar if each production α→β in P satisfies |α|≤|β|. By "special dispensation," we also allow a Type-1 grammar to have the production S→ε, provided S does not appear on the right-hand side of any production.
- 3. G is a Type-2 or context-free grammar if each production α→β in P satisfies |α|=1; i.e., α is a single non-terminal.
- 4. G is a Type-3 or right-linear or regular grammar if each production has one of the following three forms: A→cB, A→c, A→ε, where A, B are nonterminals (with B=A allowed) and c is a terminal.
- The linguist definition: a grammar is Type-2 or context-free if each production
α→β in P satisfies |α|=1;
- i.e., α is a single non-terminal.
- Linguist rules do not have alternation within a rule. They use multiple rules of the same name, i.e.
A = b C
C = x
C = y
- In a reductive grammar this could be written as:
A = b (x | y)
- That is not what I thought of as being context sensitive?? The linguist grammar rule form is different from computer grammar rules. This confusion comes from differences in the way we express grammar rules. I started out as a math/physics major and got hooked on computers. So my view is from the computer science math logic view. Maybe I missed something in that document.
Steamerandy (talk) 00:44, 26 January 2015 (UTC)
- Whether "alternation" is expressed inline or with a new non-terminal seems insignificant. A legitimate definition of "reductive" would relate to the type of automaton(?) required to recognize the language. And declared variables takes the language out of the "context-sensitive" grouping into "general". — Arthur Rubin (talk) 18:30, 26 January 2015 (UTC)
- I was talking about the definition of context-free grammars. And the alternative operator is significant. You seemed to class reductive grammars as only defining context-free grammars. In the Chomsky grammar, the linguist definition:
- A Type-2 or context-free grammar: if each production
α→β in P satisfies |α|=1;
- i.e., α is a single non-terminal. There is only one occurrence of α on the left side.
- That is, a non-terminal can appear only once on the left side: a type-2 (context-free) grammar can have only one rule of a given name. But note those rules do not have an alternation operator; they use multiple rules instead. That means that having an alternation operator provides the defining of context-sensitive languages while non-terminals appear only once on the left.
α→b | c
is equivalent to:
α→b
α→c
- making |α|=2;
- Rubin, I thought you to be talking from the linguist side. Production rules generate valid strings of a language. Reductive rules break a language into its parts. What I see is that production rules do not generate higher structures of writing: paragraphs, chapters and volumes. For example, in a programming language, defining a block of code. That is basically a reductive rule. A lot of programming languages are defined with reductive rules. We seem to have adopted calling them productive when in computer science we do have reductive rules.
- I see a lot of whining about grammars not being able to use left recursion. So the ???? what. You can program recursive loops in C, Pascal, C++ and many other languages. A programmer should be able to understand recursion and avoid a non-terminating recursive loop. Steamerandy (talk) 02:29, 27 January 2015 (UTC)
- This "whining" is in all graduate-level texts on formal language theory: LL(1) grammars (an example of a class that disallows left-recursion) can only recognize a strict subset of the deterministic context-free languages, which in turn are a strict subset of the context-free languages. The reason this stuff is called "theory" is that it works with an abstract, general, idealized mathematical description of the subject, that allows general and provably correct statements to be made, for instance, regarding the question: "Does allowing left-recursion in grammars add expressive power?" That is the whole point. Saying that good programmers know how to use left recursion is beside the point; this article is about grammars, not programmers. Rp (talk) 17:07, 28 January 2015 (UTC)
- Disallowing left recursion takes away nothing in these languages, if it is replaced with a loop operator.
expr = term | expr '+' term | expr '-' term;
is equivalent to:
expr = term (('+'|'-') term)*;
- The Kleene star (*, zero or more) operator is used in place of recursion. The two rules express the identical language construct. So how does disallowing left recursion reduce expressiveness? Can you explain or show where a loop isn't equivalent? Thanks. Steamerandy (talk) 08:57, 18 December 2015 (UTC)
- Indeed: disallowing left recursion does not reduce expressiveness. Formal definitions of context-free grammars do not include alternatives or iterations as constructs within rules because rules that use them can be replaced with equivalent rules that do not use them. For instance, replace x = y | z with the two rules x = y and x = z and replace iterations x = y* with two rules x = (empty) and x = yx, and this doesn't reintroduce any left-recursion. What does reduce expressiveness is limiting parsing in certain ways, e.g. limiting it to top-down, left-to-right scanning without any backtracking. Left recursion is a problem for such parsers, but it is not the requirement of absence of left recursion by itself that limits their power. Rp (talk) 18:53, 29 December 2015 (UTC)
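To make the equivalence concrete, here is a minimal Python sketch of a recursive-descent rule for expr = term (('+'|'-') term)*; it accepts exactly the strings the left-recursive form describes. The function names and the one-digit definition of term are invented for illustration and are not taken from META II, CWIC or any other system mentioned here.

# Sketch: a recursive-descent rule for
#   expr = term (('+'|'-') term)*
# which accepts the same strings as the left-recursive
#   expr = term | expr '+' term | expr '-' term

def parse_expr(s, i=0):
    """Return the index just past a well-formed expr starting at s[i], or -1."""
    i = parse_term(s, i)
    if i < 0:
        return -1
    while i < len(s) and s[i] in '+-':   # the Kleene-star loop replaces left recursion
        j = parse_term(s, i + 1)
        if j < 0:
            return -1
        i = j
    return i

def parse_term(s, i):
    """For the sketch, a term is a single decimal digit."""
    return i + 1 if i < len(s) and s[i].isdigit() else -1

for text in ('1+2-3', '1+', '7'):
    end = parse_expr(text)
    print(text, 'accepted' if end == len(text) else 'rejected')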
Terminology clarification requests continued
Continued from above. Indentation too deep to follow using a phone. If this should have been done differently feel free to edit.
First: this topic is just as important in computer science. I am not sure if you fellows are talking from a linguistic point of view or a computer science one. You keep talking about parsers.
You even refer to grammar rules as analyzing a string. Production rules produce strings, not analyze them. That is by their definition.
50 years ago reductive grammars were just as valid in defining a formal language as productive ones. At least that seems to be the case from references in old documents. The fact that it is in the McGraw-Hill technical terms dictionary (2006) leads me to believe it is still valid. Maybe not in linguistics.
Historically, reductive grammars dominated the landscape of compiler writing until Unix became popular. Unix was being used in most university computer science departments and it came with yacc. The Schorre metacompilers were being used in government think tanks. Systems Development Corporation was the major developer of metacompilers using reductive grammars. So now we have reductive grammar compiler compilers being proprietary or government classified technology and hidden away, while yuk prospers in the limelight. I meant yacc. Not really.
I would say that productive grammars can be transformed into reductive grammars. But from my point of view a productive grammar is hard pressed to generate higher order language structures from the bottom up point of view. Any time you have nested bounding you would be using a high level rule.
block = "{" statement $statement "}"; statement = block | asign_statement | if_statement ...
$ is the zero or more loop operator.
How do you get to that from rules that start from characters? And of course that rule is recursive: block may be referenced in statement. I can't state the statement definition without inferring analysis. It looks analytical. It sounds analytical.
The left recursion whining is just general whenever analytical grammars are discussed. There really is no reason it couldn't be allowed in a reductive grammar. The only reason it isn't allowed in metacompilers is that we want absolute control of the parser.
Early in compiler development compilers for generative grammars were written. Their output was random strings in the grammar. They compiled production rules to do productions. I wonder how they handled left recursion.
A compiler can recognize left recursion if needed. I do not believe a reductive grammar prohibits left recursion. In their use as programming languages left recursion results in a recursive loop that does not end well.
But as a written grammar description that does not apply. My interest in this is describing metalanguage used by metacompilers.
Another way to look at this is a modern compiler whose source language is described in BNF. In many cases that BNF productive specification is rewritten in a reductive form before a parser generator is run on it. So technically we now have the compiler language being defined by a reductive grammar. So technically it's not a formal language, unless the two forms can be proven to define identically the same language. And if that is the case then either is just as valid a formal language specification.
Let's open the hood of a compiler. It reads a stream of characters and decides if they belong to the language it is compiling: the definition of a reductive language. So we can say that a compiler is in part a reductive grammar engine. If it were a productive grammar engine it would output strings of the language. At some point we have implemented a reductive grammar. The actual compiler input language is specified by a reductive grammar.
What I am getting at is that the lead-in should be less specific so it covers all cases. The rules do not have to be specified as productive or of any specific nature. And then present the linguistic productive point of view.
The fact that reductive is from 50 years ago doesn't make it unnecessary to explain. We have META II and TREE-META that in their documents talk about reductive grammars specifying a formal language.
The point of this section "Please feel free to list any potentially bothersome/burdensome/opaque terminology clarification requests here as the article stands, in order that editors not over-clarify the terms."
And the first thing is "I have renamed this page to make it 100% clear that it is about the notion of grammar known in formal language theory. Rp (talk)" Steamerandy (talk) 23:51, 30 January 2015 (UTC)
- You need to turn your phone sideways in order to read detailed indentation.....
- As for content, you have not stated a valid definition of a reductive language. It seems that what you call a reductive language is related to restrictions on the parsing algorithm. In terms of productions, your language-fragment could be described by the following rules:
block = "{" statement $statement "}" $statement → statement $statement $statement → Λ statement → block statement → asign_statement statement → if_statement statement → ...
- (Here, $statement and, later, +statement, are different nonterminals.)
(I don't know where the semi-colon is supposed to go; whether it's a terminal symbol, an "end-of-BNF-statement" symbol, or just an excess symbol. I chose the latter. ) However, I suspect the following would be easier to parse:
block = "{" +statement "};" +statement → statement +statement +statement → statement ...
- Still a context-free grammar, but I don't know what additional restrictions you want to place on it. — Arthur Rubin (talk) 09:17, 31 January 2015 (UTC)
- What is so hard to understand about the McGraw-Hill reductive grammar definition? I think you are trying to put it into a sublevel under production grammar. It is a top-level, equally valid grammar. Equal to production grammar.
- It is simplest to understand the most basic difference:
- A production grammar is a set of rules that produce valid strings in the language.
- A reductive grammar is a set of rules that determine if a string is valid in the language.
- There is no chance of invalid strings in a production grammar. Invalid strings do not exist. They are not a concern in productive grammars. That makes them of great use in linguistic language studies. They are pure language descriptions. Production rules are rewrite rules combining strings produced by rules. They put characters together to form words. Words are put together forming phrases, etc.
- Reduction grammars recognize strings of a language. They take as input a string and reduce it into its parts. A string can be recognized as being invalid.
- A production rule by its definition can not recognize any string. It can not reject an invalid string. Its input is not strings. It outputs strings.
- An example from CWIC (Compiler for Writing and Implementing Compilers) developed in the late 60's by Systems Development Corporation. This is an example of backtracking error reporting and recovery.
statement = (type1/type2/type3/type4)\ERRORX["Syntax error"] $(-";" .ANY) ";";
- In the above statement rule we have it looking for 4 possible types of statements. The / alternative operator separates the statement types. The \ is a backtracking alternative that allows for partial or erroneous recognition. The state is reset to the state on entry to the statement rule and the backtracking alternative is followed, where ERRORX outputs an error message and then recovery is attempted by repeatedly matching any character other than a ; with $(-";" .ANY): zero or more times, match not ;, match any character. A loop matching any character except ;. Once a ; is matched the loop ends. Steamerandy (talk) 23:34, 1 February 2015 (UTC)
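As a rough illustration (not CWIC code; the function names and the toy one-letter assignment statement are assumptions made for the sketch), the effect of the backtracking alternative and the $(-";" .ANY) ";" recovery could look like this in Python:

import re

def parse_statement(text, pos, statement_parsers):
    """Try each / alternative; on failure reset to the saved state, report,
    and recover by skipping to the next ';' (the $(-";" .ANY) ";" part)."""
    saved = pos                       # state captured for the \ backtracking alternative
    for parse in statement_parsers:   # the type1/type2/... alternatives
        result = parse(text, pos)
        if result is not None:
            return result
    print('Syntax error at position', saved)      # stands in for ERRORX["Syntax error"]
    pos = saved
    while pos < len(text) and text[pos] != ';':   # $(-";" .ANY)
        pos += 1
    return pos + 1 if pos < len(text) else pos    # then match the ";"

def parse_assign(text, pos):
    """Toy statement form: a single letter, '=', a single digit, ';'."""
    m = re.match(r'[a-z]=\d;', text[pos:])
    return pos + m.end() if m else None

src = 'a=1;b==2;c=3;'
pos = 0
while pos < len(src):
    pos = parse_statement(src, pos, [parse_assign])

Run on 'a=1;b==2;c=3;' this reports one syntax error for the malformed middle statement and then resumes at the following statement.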
- The McGraw-Hill definition would apply to any decidable language, which is probably not what was intended. What you are using is the same as a (productive) context-free language, as far as I can tell, with the addition of a preferred order of operation. (Almost) any context-free language can be parsed in that manner. (I say "almost" because I don't have a proof, and it's not obvious if there are productions producing an empty string. It's obvious for any CFG which is cycle-free and has no ε-productions.) — Arthur Rubin (talk) 04:29, 2 February 2015 (UTC)
- The $(-";" .ANY) clause is interesting. Production rules lack this. Is it a lookahead definition? ("At this point, the input may start with anything except a semicolon.") That would make it close to an LL(1) grammar, but not quite the same. Or is it an exclusion for the whole clause? ("The contents of what is parsed here may contain anything except a semicolon.") Conversion to equivalent production rules is still straightforward, in that case, as long as only specifications of characters or tokens can appear there. Rp (talk) 15:18, 3 February 2015 (UTC)
- The - is a negative look-ahead operator. -";" succeeds on any character except a ;. In this case the ; is a statement terminator. The look-ahead operators do not advance the input. The ".ANY" matches any character; it does fail at the end of file. So that expression sequence is skipping to the end of a statement, hopefully also over the error. The ; at the end must then be matched. A ?';' would be successful if the next character is a ;. The input would not advance in any case and the ; would remain the first character of the input. The look-ahead operators can be used with an expression, so a complex expression can be tested for without consuming it. I do not see how an equivalent production could replace the whole of what that is a part of. The $(-";" .ANY) is skipping over text that is not defined in the language. It is just trying to get to a point where the compiler can continue. It is ignoring gibberish it doesn't understand. A production grammar doesn't, by definition, produce such strings. It may only generate valid strings of the language.
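A tiny Python sketch of the non-consuming tests (helper names invented; this only illustrates that the ? and - style operators report success or failure without advancing the input):

def peek_is(text, pos, literal):
    """Like ?'x': succeed if literal comes next; never consume input."""
    return text.startswith(literal, pos)

def peek_is_not(text, pos, literal):
    """Like -'x': succeed only if literal does NOT come next; never consume input."""
    return pos < len(text) and not text.startswith(literal, pos)

pos = 0
print(peek_is('; rest', pos, ';'))      # True
print(peek_is_not('; rest', pos, ';'))  # False
print(pos)                              # still 0: neither test advanced the input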
- The thing is, these languages look like grammar rules but they are a programming language. But being human the authors had the compulsion to classify it. They classed it as a reductive grammar language.
- The problem is not figuring out its type. That was done by the original authors 50 years ago. But it seems you linguist types no longer recognize reductive grammar as a valid grammar type. These compilers were used in government projects at least up to the mid 80s.
- CWIC had character class and token rules. These languages analyze a stream of characters. Words are recognized by these metacompilers. The early versions used built-in rules .ID, .NUM, and .STR or equivalents. CWIC had token rules, similar to syntax rules. In token rules white space is skipped over on entering a token rule. Tokens matched by a token rule are automatically entered into the dictionary unless a conversion function is called. There are numeric and string conversion functions that prevent symbol creation, instead converting the parsed token.
- I implemented a metacompiler that was working in the early 70's. It was a full version of the CWIC system excluding MOL360. It was done on a DEC-System-10 at Cerritos College, Norwalk, Calif. The code generation was changed. CWIC used named sections that were basically blocks of memory and a pointer. Outputting code is called planting. CWIC, being an IBM 360 program, planted 16-bit words. It directly generated 360 code from the generator languages. I drastically changed that. I wanted a machine-independent strategy, so I decided to generate PSEUDO code. My thinking was that pseudo instructions specifically tailored for the language being compiled made the most sense. But then the pseudo code had to get translated to binary code of some real machine.
There were several addressable memory unit sizes. Burroughs and Honeywell made 48-bit word machines. The DEC-10 was a 36-bit word machine, as were several IBM computers. The 8-bit machines were becoming common. The IBM 1400 and Honeywell H200 lines were 6-bit machines; they had word mark and item mark bits on each character. Not to mention the PDP-8, a 12-bit minicomputer. So I decided to generate code into a bit-addressable memory space and reformat it when linking. I made up an instruction definition language. It could define instruction families, single instructions, or data pseudo assembly opcodes. MORG aligned the plant to a modulo boundary; MORG 16 aligned to a 16-bit word. Binary output started with a radix code: H for hex, O octal, B binary, etc. These specified the output format for a pseudo assembly listing. A sequence of bit fields followed, (size):value; the value, a general expression, was output to a bit field of the size specified. These could be relocatable values, or a polish fixup block could be generated if the expression contained undefined values.
The PSEUDO instructions were written in the same lisp 2 dialect used in the generator language. PSEUDO code planted was simply appended to the section code list. Flushing (CWIC terminology) wrote the section memory block in CWIC; in SLIC it executed the pseudo op list, deleting them in the process. I planned to write an in-sequence optimizing language that would process a pseudo list when flushing, before its execution. It would be able to recognize pseudo sequences or patterns and rearrange and/or replace parts of the recognized pattern. The generator contained an implementation of lisp 2, a procedural list processing language.
- CWIC was a very powerful, advanced metacompiler. Many optimizations could be accomplished in the procedural list processing generator language.
- I was lucky in attending the ACM SIGPLAN meeting where CWIC was presented by Erwin Book. I guess he recognized my interest. I was interested in compilers and CWIC was easy for me. He spent time answering my questions. Anyway, after the meeting he gave me manuals for the syntax and generator languages. And his phone number at SDC.
- In the CWIC papers I have, it is not stated that the syntax rules are any type of grammar. The problem has turned up here because TREE-META and META II say they produce compilers for formal languages described by their reductive grammar. CWIC is written in itself. It is just as well defined as any language in a production grammar. For us programmers the reductive grammar is easy to read and understand. The CWIC syntax language is about the same as TREE-META's, and TREE-META is mostly the same language as META II. These syntax recognition languages all have the basic features of META II. Tree generation was added in TREE-META; it had a simple generator language. CWIC added backtracking and changed the generator language to be a lisp dialect.
- At any rate you can not write a production that does not produce a valid string, else it would not be following the definition of a production grammar rule. By definition only valid strings are produced. So an equivalent production to recognize input that is incorrect can not be done. A reductive grammar can recognize anything and determine its validity. There were two earlier metacompilers developed that were written in lisp. I think they may have had backtracking, but I am not sure.
- Looking forward to your production rule equivalents to the error recovery. Steamerandy (talk) 03:58, 4 February 2015 (UTC)
- Now that you've explained it more, I do agree that, with features such as lookahead and lookbehind, this is different from sets of production rules, and it is also used quite often in practice, so it deserves separate discussion in Wikipedia. But where? PEGs are essentially an attempt to formalize it, but most applications existed long before PEGs and cannot necessarily be described by means of it. Rp (talk) 23:30, 10 February 2015 (UTC)
- Not sure what look behind means. Are you referring to backtracking?
- I am thinking these shouldn't be classed as a grammar. They really are programming languages. They are Metaprogramming Parsing Grammars. There we have it: MPG's. At any rate they can define any programming language I have ever known.
- You can not define COBOL with a production grammar. You couldn't get through the identification division. It can contain anything except the next division name. It takes a rule that can recognize anything that is not the next division name, like the skip to ';' above. Now I see why the dragon slayers hate COBOL. It's a COBOL dragon they can not slay. They can not get past the first two words of a COBOL program with formal "productive" grammars.
- The Schorre metalanguages are parser programming languages. Looking at the rules as functions in a programming language explains why left recursion can not be used, as in any language allowing recursion: if there is no conditional to break out of a recursive loop it eventually runs out of stack and shit happens. I am quite aware of recursive algorithms, so I have never had a problem with these top-down parsing languages. That is a distinction between them and a grammar description on paper. As in any programming language, if the first thing a function does is call itself you have a problem. It's a programming error, not the language's fault.
- So that being the case how does one explain it? It seems obvious they are programming languages but wouldn't it be personal research to say so?
- When I implemented SLIC I was programming in those languages. I was very aware of how they worked. That is the greatest thing about them: a rule name appearing in a rule is a call to that rule.
- The fact they are programming languages allows avoidance of ambiguities. An assignment statement and a procedure call both start with an identifier
statement = asign_statement | procedure_call;
asign_statement = id '=' expression ';' ;
procedure_call = id '(' arg_list ')' ';';
- The above has a problem. One could use a backtracking alternative in statement, or more efficiently factor id out:
statement = id ( asign_statement | procedure_call );
asign_statement = '=' expression ';' ;
procedure_call = '(' arg_list ')' ';';
- Programming gives me the control of using backtracking or factoring. And it documents the parsing algorithm. An optimizing compiler could do the factoring. The above could be done in a single rule:
statement = id ( '=' expression | '(' arg_list ')' )';';
Steamerandy (talk) 04:58, 16 February 2015 (UTC)
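As an illustration of the factored single rule above, a hedged Python sketch; the helper names and the toy definitions of expression and arg_list are invented, but it shows why one character of lookahead suffices once the identifier has been consumed, so no backtracking is needed:

import re

def parse_statement(text, pos):
    """statement = id ( '=' expression | '(' arg_list ')' ) ';' ;
    After the id, one character of lookahead selects the branch, so nothing
    has to be saved and re-tried."""
    m = re.match(r'[A-Za-z_]\w*', text[pos:])          # id
    if not m:
        return None
    pos += m.end()
    if text.startswith('=', pos):                      # assignment branch
        pos = parse_expression(text, pos + 1)
    elif text.startswith('(', pos):                    # procedure-call branch
        pos = parse_arg_list(text, pos + 1)
        if pos is None or not text.startswith(')', pos):
            return None
        pos += 1
    else:
        return None
    if pos is None or not text.startswith(';', pos):
        return None
    return pos + 1

def parse_expression(text, pos):
    m = re.match(r'\d+', text[pos:])                   # toy expression: an integer
    return pos + m.end() if m else None

def parse_arg_list(text, pos):
    m = re.match(r'[A-Za-z_]\w*(,[A-Za-z_]\w*)*', text[pos:])   # toy argument list
    return pos + m.end() if m else None

print(parse_statement('x=42;', 0))     # 5
print(parse_statement('f(a,b);', 0))   # 7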
- Note formal grammar is referenced from other topics. See parsing for example. So either the parser topic is wrong or formal grammar is. Or COBOL is a fictional language, for it can not be parsed according to the parser description here. Since a parser can only parse a language specified in a formal grammar, and since COBOL isn't fiction, there is a problem. This is the problem I have been ranting about. This topic does not stand alone. It conflicts with others. Something needs to change. From my point of view a reductive or analytical grammar is just as valid in defining a language as is a production grammar. In a real language one needs to recognize gibberish to separate it out. To ignore gibberish is necessary. That brings up a question. How is a quoted text string recognized? How is a production grammar able to describe a quoted string?
string .. """" $(-"""" .ANY | """""" ,"""") """";
- The string rule above is CWIC's string rule. Within a string, two " (quote) characters are used to make a single quote character. The $ loop accepts any character that is not a " character (-"""" .ANY). If a " character is matched, the alternative ("""""") would match a pair of (") quotes and the ,"""" puts a single " into the token. If """""" isn't matched the loop terminates and the single " is then matched. Steamerandy (talk) 15:54, 16 February 2015 (UTC)
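For illustration, the logic of that string rule can be sketched in Python; this is one possible reading of the rule, not CWIC code, and the function name is invented:

def scan_string(text, pos):
    """Return (token, new_pos) for a double-quoted string at pos, else None.
    Inside the string, "" stands for a single " character."""
    if not text.startswith('"', pos):
        return None
    pos += 1
    token = []
    while pos < len(text):
        if text.startswith('""', pos):     # the """""" alternative
            token.append('"')              # the ,"""" insert operator keeps one "
            pos += 2
        elif text[pos] == '"':             # closing quote: -"""" fails, the loop ends
            return ''.join(token), pos + 1
        else:                              # -"""" .ANY: any other character
            token.append(text[pos])
            pos += 1
    return None                            # unterminated string

print(scan_string('"say ""hi"" now" rest', 0))   # ('say "hi" now', 16)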
Do we need separate topics?
This topic starts by declaring "In formal language theory...". What about in computer science?
The way we look at things can make their understanding hard or easy. Since the 70s we have been slaying the dragon. Well, luckily I learned to write compilers before they became dragons. I learned writing them in assembly code.
In general we've learned, or been taught, to solve problems by breaking them down into simpler parts. That is exactly what a reductive grammar does. I first learned about them at an ACM SIGPLAN meeting where Erwin Book gave a talk on CWIC. CWIC was a very advanced metacompiler, a distant descendant of META II. I did not know they used a reductive grammar until just recently.
If you look closely at the original ALGOL 60 BNF language specification you will find it to be reductive in nature, not productive. To be completely fair, there are aspects of both reduction and production. We find production rules at the lowest level defining symbols and constants.
I am not sure who uses or links to formal grammar most: linguistics topics or computer science topics.
Reductive rules do not disallow left recursive definitions. They simply are a top-down approach. Starting off declaring a formal grammar to be productive prohibits linking to it from topics that may employ a reductive or a combination of both reductive and productive grammars.
META II and TREE-META are basically reductive grammar compilers. Metacompilers in general may employ either type.
So we need a neutral formal grammar lead-in, or separate them. They may be old but metacompilers are still of interest. Steamerandy (talk) 03:39, 27 January 2015 (UTC)
- You're wrong.
- The "reductive" rule
- a -> b | c
- Is identical to the "productions"
- a -> b
- a -> c
- — Arthur Rubin (talk) 22:38, 27 January 2015 (UTC)
- You are wrong in reading what I said. That is exactly the point. And exactly what I wrote?? I said they were the same. Having two production rules with the same name, a, on the left is a type 2, context sensitive language in the Chomsky grammar progression. You said reductive grammars are context free grammars. Steamerandy (talk) 04:14, 28 January 2015 (UTC)
- Arthur Rubin and I believe that your "reductive grammars" are probably a specific subclass of the deterministic context-free grammars, probably the class known in the literature as LL(1), maybe with a different notation. That is to say, not type-2 grammars, but a subclass of type-3 grammars. It would help if you give (or point out) a precise definition of "reductive grammar" so we can determine whether we are mistaken. Rp (talk) 17:15, 28 January 2015 (UTC)
- Reductive grammar: (computer science) A set of syntactic rules for the analysis of strings to determine whether the strings exist in a language. "Sci-Tech Dictionary: McGraw-Hill Dictionary of Scientific and Technical Terms, 6th edition, published by The McGraw-Hill Companies, Inc". Retrieved 15 January 2015.
- Reductive grammar is defined in the "McGraw-Hill Dictionary of Scientific and Technical Terms". So apparently it is a current term. Or do you not accept McGraw-Hill as a valid reference?
- They are not confined to any specific Chomsky type grammar by their definition. They seem to be what is being called analytic grammars. I do not know the early history. In the 60's metacompiler documents referred to themselves as using a reductive grammar. I suspect the term may be a bit older.
- I think reductive is simply the opposite of productive: a top-down, string-reducing set of rules defining a language. The difference is how you look at the problem.
- They are the way we are first taught grammar in grade school, like diagramming a sentence. We learned to break a sentence into its parts.
- The rules can be viewed as a functional programming language. Each rule is a function returning true (success) or false (failure). A rule is then a boolean expression. The input to a rule function is the input character stream. The functions have an input argument and produce the identical output for identical input.
- Steamerandy (talk) 22:34, 29 January 2015 (UTC)
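The rules-as-boolean-functions view can be sketched in a few lines of Python with a toy two-rule grammar; the rule names and the 'x;' toy statement are invented for illustration and are not from any of the metacompilers discussed:

def rule_block(s, pos):
    """block = '{' $statement '}'  --  returns (True, new_pos) or (False, pos)."""
    if not s.startswith('{', pos):
        return False, pos
    pos += 1
    while True:                          # the $ zero-or-more loop
        ok, new_pos = rule_statement(s, pos)
        if not ok:
            break
        pos = new_pos
    if s.startswith('}', pos):
        return True, pos + 1
    return False, pos

def rule_statement(s, pos):
    """statement = block | 'x;'  --  a rule name on the right is simply a call."""
    ok, new_pos = rule_block(s, pos)     # first alternative: a nested block
    if ok:
        return True, new_pos
    if s.startswith('x;', pos):          # second alternative: a toy statement
        return True, pos + 2
    return False, pos

print(rule_block('{x;{x;x;}}', 0))       # (True, 10)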
- Well, you, Steamerandy, have not provided a definition. The definition in that dictionary could apply to any grammar for which there is a computable function which takes a string, and returns "no" if it is not in the "language", and a specific decomposition (if reductive) or derivation (if productive) of the string, if it is in the language. I'm sure that's not what anyone would want. I (and Rp) suspect that what you want may be something like a deterministic context-free grammar. If it isn't, you need to make clear what it is you do want, and we (Wikipedians) can see whether it already has an article, or possibly should have an article. — Arthur Rubin (talk) 03:36, 30 January 2015 (UTC)
- If the language is finite, and the grammar is recursively enumerable, that's what we call a decidable language. — Arthur Rubin (talk) 03:40, 30 January 2015 (UTC)
- What do the rules look like? Saying they are rules is not specific enough. Can I have a rule that looks at the time and accepts a string only on Mondays? I don't think so; we suspect that your rules are in fact identical to rules of a productive grammar of a particular form, except that you've never realized this. Rp (talk) 17:18, 30 January 2015 (UTC)
- Can you tell me what productive rules look like? How many forms of productive rules are there? Is BNF productive or reductive? Reductive rules might look the same as productive rules. In any case it's not how the rules look; it is how they are used. But above we discovered one major difference: reductive rules can be written to ignore strings that are not in the language. For example, in a programming language "=" may be used as an assignment operator and "==" an equality operator in a conditional expression. So one might type "if a = b then" meaning to have typed "if a == b then". So we have a syntax error. That can be handled completely in a reductive grammar, including recovery and finding the start of the next statement.
- Reductive rules are language descriptions that are the opposite of production rules. They may even look the same. The difference is that reduction rules process strings as input where production rules produce strings. I don't know if the term was used outside the computer field. In the 60's computers and programming were mainly a business education or engineering field. FORTRAN programming was taught as part of the math department curriculum. Assembly and COBOL were part of the business department's curriculum. Computer science was in its infancy.
- But as I said above the programming metalanguages described as being a reductive grammar are programming languages not grammar rules. The same mathematical manipulations may be applied to reductive rules as to logic rules. Except reordering. Steamerandy (talk) 23:55, 4 February 2015 (UTC)
Reductive grammar
Formal grammar rules specifying a language can be productive or reductive. It is wrong to describe a "Formal Grammar" as being production rules. Reductive rules are just as valid in defining a formal grammar.
Reductive grammars are a top-down set of rules. They are basically what is being called analytical grammars, though the philosophy is different. Reductive, being the opposite of productive, makes more sense as the name. Reductive grammars describe the reduction of a language into its elements. Reductive grammars directly translate to analytical parsers. Several early ACM SIGPLAN papers talk about reductive grammars. See the TREE-META documents, the first META II paper and PROGRAMMING SYSTEMS & LANGUAGES - Saul Rosen, McGraw-Hill Book Company.
A formal grammar is simply rules defining all valid strings of a language.
This article should not lead off saying that Formal Grammars are production rules describing the generation of valid strings in the language.
Reductive rules define the valid structure of a language by successively reducing it to its basic elements.
An example of using a reductive grammar is in defining the COBOL language. COBOL is made up of specific ordered parts called divisions.
COBOL = Identification_division Environment_division Data_division Procedure_division;
A reductive grammar successively reduces a language into its parts. Defining COBOL, as above, first as the divisions that make up the language is the first reduction. The divisions are ordered. Successive reduction rules would be applied to each division. In a reductive grammar each rule is unique. A rule name may not be overloaded. Each rule reduces the language into sub-parts.
People are calling these analytical grammars. The proper term used in the 60's was "Reductive Grammar".
A reductive grammar can describe context-sensitive languages.
- Thanks! In the description I found online, the difference between productive and reductive (a.k.a. analytic) rules is described on page D-2 (129 of 184 in the PDF). (The term analytic grammar does not occur.) It says productive rules are used to describe how the sentences of a language can be generated, while reductive rules describe how they can be recognized. That distinction is misleading: general methods exist to derive recognizers from arbitrary context-free grammars (sets of production rules), so you don't actually need to write a special set of rules for recognition. The text neglects to say how reductive rules are anything other than a special kind of productive grammar, and continues by discussing top-down parsing, so my guess would be that reductive rules are nothing other than productive grammars whose structure is restricted to allow deterministic top-down parsing, i.e. LL(1) grammars or something close to it. While it is unnecessarily limiting, this restriction is indeed imposed in practice, e.g. in the SGML and XML world, where DTDs and XML schemas must be LL(1). I believe the more common name today for what the text calls a metacompiler is a parser generator or a compiler-compiler, and indeed, the more popular ones restrict grammars to be deterministic, although many allow more than LL(1); e.g. yacc requires LALR grammars, packrat parsing requires PEGs. Another possibility is that reductive/analytic grammars are some form of deterministic pushdown automaton.
- In short: my guess is that the concepts discussed in that text are covered in Wikipedia, but under different names.
- According to this analysis, an analytic grammar or reductive grammar is (roughly or exactly) some class of deterministic context-free grammar or pushdown automaton. But is this generally how people use these terms or is it just how it was used in that particular project? I have no idea.
- As to your remark that reductive grammars can describe context-sensitive languages: I believe you mean to say the following. Suppose we have a language with clauses the start of which is marked by some opening keyword. It may be the case that the possible contents of such a clause (and hence, the rules to describe them) are not completely determined by that opening keyword, but instead, also depend on context: where, within the whole, the clause occurs. If this is what you call a context-sensitive language, I believe it is the same thing as a language that is not simple deterministic. Indeed, LL(1) grammars and other classes of deterministic grammars can describe such languages. So unless I am mistaken, this is not the same thing as a context-sensitive language, which is a language that cannot be described at all by the kinds of reductive rules or production rules considered here. Rp (talk) 19:31, 24 January 2015 (UTC)
- COBOL was considered context sensitive by others in the 60s. I am not sure what context sensitive meant back then. As I remember it was the picture data descriptors; i.e. 999.99 in the data division is a picture describing a 5 digit number having 2 decimal places. In the procedure division it would be a numeric constant. Lexical analysis is context sensitive in later metacompilers.
- TREE-META and other metacompilers using reductive grammars are significant because they are programmed parsers having code that transforms the immediately recognized elements. Later metacompilers based on META II first transformed the input to an abstract syntax tree and at some point, calling code generation functions, passed the constructed tree to procedural functions written in specialized languages. A reductive grammar allows factoring, avoiding ambiguous language constructs. The translation of productive grammars to parsers needs look-ahead, or must try many partially ambiguous paths, determining the correct language construct or deciding an error has caused the problem. Parsers are overly complicated because they are using a productive grammar.
- If one first describes a language in a reductive grammar there is no need to transform a productive grammar.
- I was thinking LL(1) to be a parser type. But if it is a grammar then it is not a productive grammar but a reductive grammar, as the LL(1) parser is analytical. All parsers are analytical. The majority of metacompilers which are defined as metacompilers (those actually claiming in their own documentation to be metacompilers), META II, TREE-META, CWIC and others, all compiled to executable code and could define themselves in their own language. NOTE they did not claim to compile themselves. An interesting distinction. Erwin Book explained it this way: why would one need to compile an existing compiler? If it is to make an addition or correction then technically it is not compiling itself. Porting to a different machine is producing a different compiler. CWIC could compile context-sensitive languages. The CWIC metacompiler produced IBM 360 machine code. It had program-controlled backtracking, using a / for an alternative (or) operator and a \ for a backtracking alternative. Backtracking was to the most recent backtrack alternative. Backtrack points could be stacked. Backtracking is akin to a long jump in C, only stacked and with a lot of other automated saving and restoration of state. Backtracking occurred when a recognition failed following one that succeeded, specifically one advancing the input. There was no limit on the depth or range of backtracking other than the memory available.
X = "string" (R2/R3) \ error["error message"] garbol; R2 = a b / c;
- The above example illustrates the backtracking in CWIC. X first tests for "string" and, if successful, calls the rule R2. R2 calls a. If a fails, c is tried; if c fails, a normal fail is returned and R3 is tried. If a is successful, b is called. If a is successful and then b fails, a backtrack failure occurs to the most recent backtrack alternative in X, the error function. Note the R3 alternative is not tried on a backtracking failure. The backtracking alternative in X resets the compiler state to its state on entry, in this case, and tries the alternative. Grouping can be used to contain backtracking. CWIC had look-ahead operators ? and - that tried the following; backtracking was used to restore the state. ? tests for the following, succeeding if matched. - is the not operator. The error[...] function is a generator function call. These compilers did not produce specific parsers. They compiled directly to executable code that is deterministic. You are hand coding the parser. These are programming languages. They are both a programming language and a grammar. For example, a sequence (a b)\(a c) could be factored to a (b/c), avoiding backtracking. Since CWIC had a symbol table and could produce code, backtracking was a complicated operation.
CWIC, like TREE-META, built a tree in its syntax language. Generators in CWIC were the unparse tree-crawling rules of TREE-META combined with a LISP 2 procedural language dialect. Tree branches could be placed in variables by placement of a variable name in an unparse rule. Generators could be called in unparse rules to process positional values in the rule, returning results to local variables.
Having written SLIC, which was a full implementation of CWIC, only changing code generation in the generator language to produce a pseudo (assembly macro like instruction) code that was later expanded into machine code: a MACHOP defined machine instructions as a series of bit fields, an instruction's operands and their output to binary code. PSEUDO code generated by the generator language was appended to named sections (lists) and executed by a flush section statement in the generator language. Sections were named; data and code could be separated using different named sections. An in-sequence optimizer was planned to process pseudo code before executing it. SLIC was used to write a COBOL cross compiler that ran on a DEC-System-10 producing code for a TI990 minicomputer. DEC-10 code generators produced DEC-10 code for the COBOL compiler; that SLIC compiled COBOL to DEC 36-bit word instructions. The COBOL compiler produced TI990 16-bit multi-word instructions. The COBOL could also be linked with DEC-10 PSEUDOs and MACHOPs to generate DEC-10 code, but as far as I know a DEC-10 COBOL runtime library was never written.
CWIC also had token rules that by default entered tokens into a symbol table. The COBOL compiler detected undefined or mismatched types and issued errors. CWIC and SLIC are context-sensitive metacompilers. The above metacompilers are written in reductive grammars; they were not translated from a productive grammar. Note I found reductive grammar defined in the McGraw-Hill dictionary of technical terms, 2003 edition. I got a bit carried away there, but the gist is that CWIC could handle context-sensitive languages. Having a symbol table it could make decisions based on a symbol's definition, or on its not being defined.
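The distinction described above, between an ordinary failure (nothing consumed, so / tries the next alternative) and a backtrack failure (input already consumed, so control unwinds past the remaining / choices to the nearest \ alternative), can be sketched with small Python combinators. Everything below is an illustrative model, not CWIC's implementation:

class BacktrackFailure(Exception):
    """Raised when recognition fails after input has already been consumed."""

def lit(s):
    """A rule matching a literal; returns the new position, or None on ordinary failure."""
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def seq(*rules):
    """A sequence: once any element has consumed input, a later failure backtracks."""
    def rule(text, pos):
        start = pos
        for r in rules:
            new_pos = r(text, pos)
            if new_pos is None:
                if pos != start:              # something was already consumed
                    raise BacktrackFailure(start)
                return None                   # ordinary failure, nothing consumed
            pos = new_pos
        return pos
    return rule

def alt(*rules):
    """The / operator: only ordinary failures fall through to the next choice;
    a BacktrackFailure propagates past the remaining alternatives."""
    def rule(text, pos):
        for r in rules:
            new_pos = r(text, pos)
            if new_pos is not None:
                return new_pos
        return None
    return rule

def backtrack_alt(main_rule, recovery):
    """The \\ operator: catch a backtrack failure, restore the saved position, recover."""
    def rule(text, pos):
        try:
            new_pos = main_rule(text, pos)
            return new_pos if new_pos is not None else recovery(text, pos)
        except BacktrackFailure:
            return recovery(text, pos)
    return rule

# R2 = 'a' 'b' / 'c';    X = 'q' (R2 / 'r') \ report;
R2 = alt(seq(lit('a'), lit('b')), lit('c'))

def report(text, pos):
    print('syntax error, state restored to position', pos)
    return pos

X = backtrack_alt(seq(lit('q'), alt(R2, lit('r'))), report)

print(X('qab', 0))   # 3: R2 succeeds
print(X('qr', 0))    # 2: R2 fails without consuming, so / tries 'r'
print(X('qax', 0))   # 'a' consumed, then 'b' failed: backtrack past 'r' to the \ alternative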
- Working for Kontron FutureData as head of the language development group, and having written many commercial compilers, many of which compiled themselves, I do not hold with the definition that a metacompiler is defined by it compiling itself. That is just plain demeaning. They are so much more. Using current terminology, they compile a metasyntax program. In their documentation they were said to compile a metalanguage program. That metalanguage is a metasyntax describing a metalanguage. So that is where the prefix meta comes from. A metasyntax is a higher level of abstraction defining a metalanguage. Their metasyntax language could define their own metasyntax. META I was the first metacompiler, hand coded to compile itself. However it did not produce exactly the same code, so the produced compiler, which compiles the identical source, was called META II. META I was completely defined in its own language, but compiled META II.
- Anyway, when I was in college a formal grammar was simply a set of rules defining a language. It was not just productive grammars; reductive grammars were not excluded. OK, that was in the late 60's. But still there were valid programming languages defined by reductive grammars. I am thinking that reductive should be used instead of analytical since we have a valid reference for reductive.
- In the 60's and 70's reductive languages were valid formal grammars. I do not have proof that all productive grammars can be converted to reductive forms, but for programming languages I am fairly sure they can. However the CWIC metacompiler can handle most any language. But they are programmed parsers. If you do not avoid unnecessary backtracking by factoring they are going to be less efficient. It's an important feature that separates them from other top-down generated parsers. They are more like an LL or LR parser time-wise. They can be very efficient. The COBOL compiler written in SLIC on average compiled 5 percent more lines per minute than the DEC COBOL compiler written in assembly.
- I should mention that the COBOL compiler was written in less than 5 man-months. At that time COBOL compilers were taking man-years to develop. And with the only problem being one the DEC compiler also had, and that took 10 minutes to fix.
- The first PASCAL compiler we developed at FutureData took a bit over 4 man-years. It produced optimized 8086 code. It was written in PASCAL. There is a whole lot more to COBOL than PASCAL.
- I tend to say a lot. You may think it's not relevant. But the thing is there are old metacompiler topics here that use formal grammar as it was used back in the 60s. I am not so sure why it needs to be narrowed to productive grammars. There are several current systems using reductive grammars. The thing is these metacompilers were somehow lost in the Unix generation. yacc became dominant and our current compiler technology is now focused on productive grammars. Maybe the old metacompilers were just too simple. People were writing simple compilers in hours using SLIC, not weeks. It didn't take a great amount of compiler knowledge to write compilers using SLIC.
- Doing this on a Note 4. I do not like touch screen for typing. Sorry if I haven't fixed all my typos. My hard drive bit the dust. I'm stuck with my phone.
- I'm typing this on a Note 2....
- The difference between what you are calling a "productive grammar" and a context-free grammar is a parsing criterion like LL (1). The BNF (Bacchus? Normal Form) is the same whether the grammar is "productive" (and context-free) or "reductive". — Arthur Rubin (talk) 06:21, 25 January 2015 (UTC)
- I have seen a lot of literature, especially from the 1960s, that confuses grammars and parsers (including Wikipedia, which redirects [[LL(1) grammar]] to LL parser and never bothers to properly explain the difference). Our question is not whether all productive grammars can be rewritten to reductive grammars, but whether the reverse is true. Rp (talk) 17:28, 28 January 2015 (UTC)
COBOL
I challenge the claim that COBOL is not normally represented as a "productive language". — Arthur Rubin (talk) 01:02, 17 February 2015 (UTC)
- Well then produce the grammar rules for COBOL. Prove it.
- I challenge you, Arthur Rubin, to define a quoted string constant in a production grammar. It has occurred to me that there is all this clamoring about production rules when really most programming languages are not defined using productive grammars. Why? Well, because we do not define the language from its character set in the grammar. It is defined from the token level. A lexer is used to analyze the characters, usually with regular expressions that are analytical.
- Show me a production rule that can recognize
'isn''t'
as a string constant for isn't?
string .. '''' $(-'''' .any | '''''' ,'''') '''';
The above is a reductive rule. This confuses many, as the rule above defines the strings it uses to define itself. It first looks for the string
''''
a single quote character by its own definition.
''''
matches a single quote character. Then we want to accept any character until we find a single quote at the end. We might just write:
limited_string .. '''' $.any '''';
But that wouldn't be able to match itself. In fact, as written above, it wouldn't even work: .any would consume characters until it reached the end of input.
limited_string .. '''' $(-'''' .any) '''';
The
''''
matches a single ' character. To match two characters we use
''''''
Two quote characters in a string are used to represent a single ' character. The .. rules are token rules, by the way. This is a metasyntax rule for a string, defining itself. There are operators that modify the action. Normally a string constant is just matched and not kept. There are three levels of rules. Token rules, as here, match tokens of the language. They can reference character classes that are defined using character class rules. Syntax rules define the language generally in terms of its tokens. A syntax rule can use a character class, but it functions as a character constant in syntax rules. The following are the string operators in token rules. Syntax rules have similar operators.
- - is a negation operator.
- -';' matches any character except a semicolon character.
- + recognizes the following and keeps it
- +'*' matches the character and it becomes part of the token. Normally quoted strings are not kept.
- ? tests but does not advance the input.
- A look ahead operator.
- , is the insert operator.
- ,'xyz' inserts xyz into the token.
- So now the
(-'''' .any | '''''' ,'''')
can be seen to recognize:
-''''
not a ', and accept .any character that is not a '. But when a ' is matched its negation fails and the alternative
| '''''' ,''''
is attempted. In this case we have two consecutive ' characters that are matched. But matching a string does not make them part of the token. The comma, the insert operator, is then used to insert a single character,
''''
into the token string. Not shown in the rule is the conversion function. Conversion functions intercede in the normal token rule processing of the token as a symbol making it a dictionary object. A makestr() function would be called that intercedes the dictionary processing and instead creates a string object. These metasyntax metalanguages are defined in their own language. I do contend that it is wrong to say they compile themselves. It is the defining of themselves in their own language that makes them metacompilers. I have used a metacompiler to compile a different, usually more advanced, metacompiler. Never, ever, even one time, have I just run it on itself; only parts or code fragments, for debugging and validation.
- We are defining a quoted string bounded by single quote marks, like the string
'hay hay hay that''s all Arthur'
So show me a production rule that recognizes two quote marks within a quoted string and translates them to one. Steamerandy (talk) 16:01, 18 February 2015 (UTC)
- I'm choosing to avoid using quoted strings in the definition for the rules; the definition of a "productive language" does not depend on the representation of the symbols of the language in ASCII. Production rules don't indicate the meaning of the string, and deliberately do not indicate the parsing method of the string, but (comments in brackets; bracket is not considered a symbol of the language in these definitions; spaces in the definitions do not indicate the space character, but are used for parsing only; and, to the extent the written rules use quoted strings, I use double-quotes.):
- <quoted-string> → ' <valid-quoted-character-string> '
- <valid-quoted-character-string> → <valid-quoted-character>
- <valid-quoted-character-string> → <valid-quoted-character> <valid-quoted-character-string>
- <valid-quoted-character> → <space>
- <valid-quoted-character> → 0
- .
- . [list of all characters other than "'"]
- .
- <valid-quoted-character> → ''
- This doesn't allow definition of a null string, but I don't think COBOL does. Looks like a "productive" rule to me, although a "dumb" automatic parser might have to backtrack. — Arthur Rubin (talk) 01:45, 19 February 2015 (UTC)
- That sure took a lot of production rules. And it was not able to handle the escape translation of doubled characters to one, and did not allow carriage returns, tabs or other non-displaying characters. The rule I gave above, without the double-quote translation, could be written as (using " this time):
- string .. """" $(""""""|-"""" .any) """";
- The rule is the string rule of CWIC, and a null string is valid.
- But on the other hand, your rules using production syntax can be interpreted analytically, and I would venture to say your thinking was analytical when you wrote them. You wrote them top down. You had to define the start symbol. So in reality your production rules are top-down analytical rules.
- One point: ASCII was not the character set used by CWIC on an IBM 360.
- So what have we learned here? Productive rules are reductive when defining actual language constructs. The start symbol is a top-down rule. So really, production versus reduction is just a point of view or interpretation, is it not, padawan?
- Steamerandy (talk) 19:01, 19 February 2015 (UTC)
- What have we learned? There has been no definition presented of a reductive grammar, and the same rules can be productive or reductive depending on context. So what is a "reductive grammar"?
- There are a lot of rules because I have expanded the metarules allowing regular expressions to be converted into (context-free) production rules, and expanded the rules which would otherwise produce a null right hand side, making it a proper context-free grammar. Allowing the null text string could be performed by adding
- <quoted-string> → ' '
- The regular expression in question would be:
- <quoted-string> := ' ([^'] | '')* '
- As the quotation marks are not special characters in regular expressions, no escaping is necessary. — Arthur Rubin (talk) 21:31, 19 February 2015 (UTC)
- Reductive rules follow the definition given: they analyze a sequence of characters and determine whether it is valid in the language. They operate in the opposite direction. According to one paper I found, a reductive grammar can describe an unrestricted grammar. It was a translated Russian document.
- You have shown productive to be reductive, at least in the string rule you gave. Wait: ignoring the → direction, you wrote a reductive set of rules for a quoted string. They were definitely reduction rules. So a language can be defined by a reductive grammar and it would be just as valid. It seems the definition of a formal grammar as a productive grammar doesn't hold up any better than one claiming it is reductive.
- My examples were from a programming language, written in the metasyntax programming language of CWIC. Steamerandy (talk) 15:46, 20 February 2015 (UTC)
- I sure have been learning more linguistic theory than I ever wanted. How about I give the double-quote production (with translation) a try:
<quoted-string> → ' <valid-quoted-character-string> '
<valid-quoted-character-string> → <valid-quoted-character>
<valid-quoted-character-string> → <valid-quoted-character> <valid-quoted-character-string>
<valid-quoted-character> → <space>
...
<valid-quoted-character> → 0
<valid-quoted-character> → ' ''
<valid-quoted-character> → '<valid-quoted-character>
<valid-quoted-character> '' → <valid-quoted-character>'
- Was it that you don't know, or didn't want to use, a context-sensitive grammar, that you did not use rewrite rules? I think now I know that CWIC's reductive-rule metalanguage can describe context-sensitive language constructs. The , output operator gives rewriting-rule ability. Steamerandy (talk) 17:45, 20 February 2015 (UTC)
- Sorry, my bad. I was thinking reductive. Production rules would be different.
<quoted-string> → ' <valid-quoted-character-string> '
<valid-quoted-character-string> → <valid-quoted-character>
<valid-quoted-character-string> → <valid-quoted-character> <valid-quoted-character-string>
<valid-quoted-character> → <space>
...
<valid-quoted-character> → 0
<valid-quoted-character> → ' '
<valid-quoted-character> → ''<valid-quoted-character>
<valid-quoted-character> ' → <valid-quoted-character>''
- Maybe not exactly right. But the idea is that a rewrite of a production is specified on the left containing multiple symbols, if I got the context-sensitive example correct. Steamerandy (talk) 19:33, 20 February 2015 (UTC)
- Your "productive" rules are unnecessarily complicated; as a productive grammar, this is context free; it is unnecessary to introduce pretty-much-incorrect context sensitive reduction rules in order to hide the fact that the production rules are context-free. Unless you provide a meaningful definition of "reductive grammar", and indication that such are actually used, it should have at most a couple of sentences in this article. All computer languages that I've ever heard of, the the exception of type declarations, have a context free grammar. (Type declarations, not being near the variables typed, make the language close to an unrestricted lanaguage.)
- It may be the case that CWIC can describe a language which cannot be described by a context free grammar; I wouldn't know. But the rules you've written are (1) not correct (it includes the string containing 5 quotes) and (2) not "reduction rules". — Arthur Rubin (talk) 21:48, 20 February 2015 (UTC)
- Sorry to upset your idea that reductive grammars only describe context-free languages. I based the production rules on this example: context-sensitive grammar#Examples. I think the rewrite rules work as in that example. But I agree it's not correct: the rewrite rules do not terminate. The rewrite rule would keep generating double quotes and repeat the match on the 's it generates. I had misunderstood what the notation |a| meant in that grammar document. I took it to mean you could have one rule for a.
- I think that instead of looking at the problem of generating a quoted string at the character level, we should look at producing a valid sentence inside the quotes. If in a language we quote strings, making their contents an object, the string may contain a valid language construct. The quoted-string rule must take the character sequence and rewrite it. Now the words generated may of course contain single quote marks, but in the context of a quoted string they must be doubled. The contraction for 'is not' is 'isn''t'. So to display the sentence one might write: print 'The contraction for ''is not'' is ''isn''''t''.'
- If nothing else, the compactness of the grammar language is a significant point. Almost three dozen production rules that only come close to the single reduction rule is significant. There is more to the token rules, not expressed but a feature of a rule being a token rule: in a token rule white space is significant. White space is defined by the skip-class characters. Token rules automatically skip skip_class characters ahead of the token. Skip-class characters are then not ignored or skipped within the rule unless explicitly matched by the rule. Being a programming language, the rules provide more than recognition of a token. The characters are placed in the token buffer, and on successful completion of the rule the token is converted to an object. A conversion function may be called to make an uncataloged object; by default tokens are cataloged in the symbol dictionary.
- I think the difference between production and reduction rules is becoming very, very clear. In the case of production, a single quote is translated to its equivalent double-quote form. A reductive rule is the reverse translation: two quote marks to the single quote mark as it would be stored in memory. If I write a C function that reads a quoted string where two quotes are used within the string to represent one, it is stored in a char array; in memory the bounding quotes are not stored and the doubled quote is translated to one. Steamerandy (talk) 05:59, 21 February 2015 (UTC)
- The "compactness" in your "reductive" rules is due to shorthand. As I pointed out in specific instances above, regular expression matching can always be handled by adding production rules which preserve the language being context sensitive. The same can be said for whitespace rules. In other words, I see no benefit to using "reduction rules", either in real life, or in your sources. Matching variable uses tod eclaration can't even be done in a context-sensitive language; it requires a more general language class, although possibly could be defined by a fully general reductive grammar. I don't see a way of specifying reductive grammar rules which could handle it, though. — Arthur Rubin (talk) 14:04, 23 February 2015 (UTC)
- I had a misconception of production rules when I began here. Perhaps others did also. I put my experience down in the following. But in the process I found something of importance about BNF: the BNF description in the original ACM ALGOL reports gives more importance to reductive grammars.
- Steamerandy (talk) 23:38, 25 February 2015 (UTC)
Reductive Grammar
[edit]Historically, compiler-compilers have compiled metalanguages that are reductive and/or analytical, as stated in their documentation.
These metalanguages are actually programming languages. Their function is to analyze the input, reducing it to tokens and operations, and to translate them into a different, equivalent representation. It is irrelevant that a production rule is equivalent. These metalanguages are programming languages that read a character stream; they do not produce the character stream. The analysis is programmed in the metalanguage of the metacompiler. The metalanguage is compiled into code that tests for the language components. The following rule analyzes a character sequence looking for a mathematical expression:
- expr = term $(('+':ADD|'-':SUB) term!2);
In this metalanguage the expr grammar rule above is a function that returns boolean success or failure. The first thing it does is call term. If term fails, expr returns failure. If term returns success, the rule continues to the zero-or-more loop:
- $(('+':ADD|'-':SUB) term!2);
The $ operator loops, repeating the following operation or parenthesized grouping, in this case. A + or - character must be matched. If not, the $ zero-or-more operator intercepts the failure and simply terminates the loop, which puts the program at the end of the expr rule, which is then successful. If a + character is matched an ADD node is created, or if a - character, a SUB node, and term is tried. If the call to term fails, a long failure occurs. Some previous calling rule must catch a long failure using a backtracking alternative; otherwise the compiler bails, returning a sys-error. If term is successful, the !2 operation takes the top two entries off the parse stack, combining them with the popped node into a three-element list: the operation node and its two branch terms.
The metalanguage grammar analyzes the input, reduces it to language elements, and recombines them into an abstract syntax tree.
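As a rough illustration of that description, here is a hypothetical hand translation into Python (not CWIC itself), with term reduced to matching a single letter:

    # Sketch of  expr = term $(('+':ADD|'-':SUB) term!2);  read as an
    # imperative recognizer that builds a tree on a parse stack.
    parse_stack, node_stack = [], []

    def term(src):                        # stand-in: a term is a single letter
        if src and src[0].isalpha():
            parse_stack.append(src[0])    # recognized terms go on the parse stack
            return True, src[1:]
        return False, src

    def expr(src):
        ok, src = term(src)
        if not ok:
            return False, src             # first test failed: expr simply fails
        while src and src[0] in "+-":     # the $ zero-or-more loop
            node_stack.append("ADD" if src[0] == "+" else "SUB")   # :ADD / :SUB
            ok, src = term(src[1:])
            if not ok:
                raise SyntaxError("long failure")   # partial match with no alternative here
            # !2 : pop the node and the top two parse-stack entries into a 3-element list
            right, left = parse_stack.pop(), parse_stack.pop()
            parse_stack.append([node_stack.pop(), left, right])
        return True, src

    # expr("a+b-c") leaves ['SUB', ['ADD', 'a', 'b'], 'c'] on the parse stack.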
My problem with the lead-off here is that historically these metalanguages are valid language specifications, and here only production grammars are claimed to be valid language specifications, at least in the way the topic leads off.
These metacompilers are of use today, and that's not just my opinion. They are not using production grammars. But this topic is related, as the metalanguages are in part grammars. Linking a metacompiler topic to this grammar article results in a contradiction.
I realize that current computer science is enamored with production grammars. But really, most programming languages have context-free grammars.
It does seem that context sensitive has different meanings in computer science. I do not see variable typing as being context sensitive by the linguist's definition.
The way the wiki works, topics cannot be categorized. That is one of the reasons for NPOV.
I would say that production grammars dominate, but analytical or reductive grammars are of interest, and thus the first paragraph should be written from a NPOV. Steamerandy (talk) 10:17, 27 August 2015 (UTC)
- In regard to variable declarations, I do not see any way that a language with different variable types, and with different valid operations, could be even context-sensitive. For the rest of this section, you are not discussing a "formal grammar"; you are talking about a compiler. The set of strings recognized by a compiler should be a "formal grammar", but a "formal grammar" does not deal with interpretation. Perhaps a completely different article on compilation would be appropriate. — Arthur Rubin (talk) 14:28, 28 August 2015 (UTC)
- This has been a learning process. But I believe the reductive grammars used in early metacompilers are in a unique class. I found in reading about Parsing Expression Grammars where they differ. The grammars I was concerned about are programming languages with semantic specifications. There is only one rule for a nonterminal symbol. Alternatives are separated by a |, which was previously stated to be no different than multiple rules. NOT SO. Alternatives are tried in the order specified. That one specification describes an analytical process: they accept the first match, not a random or longest match. Of course left recursion is disallowed. They are a programming language; one cannot expect a recursive loop with no (conditional) way of terminating the loop. Steamerandy (talk) 03:05, 10 December 2015 (UTC)
- What you describe is grammars as used by meta compilers (described here in Wikipedia as compiler-compilers). Such grammars, being developed for a specific meta compiler, assume a specific parsing technique, namely the one supported by the meta compiler in question. However, different meta compilers use different parsing techniques with different expressive power. What you describe seems a lot like the packrat parsing done on PEGs, but other meta compilers use parsing with backtracking or with techniques that explore multiple alternatives at a time. But in any case, in this context, it doesn't make much sense to distinguish between grammars and the way they are parsed. By contrast, formal language theory tries to make a strict separation between grammars and parsing techniques. Some classes of grammars strongly suggest a particular way of parsing, so in that sense you could classify them as being analytic or reductive, but formally speaking they are still defined as productive: as producing strings from the initial nonterminal. For the PEG-like grammars/parsing you describe, with ordered alternatives, that isn't really the case, but still, when describing them, a distinction is made between the grammars themselves, as a notation that specifies languages, and the way these grammars are parsed. Rp (talk) 21:06, 29 December 2015 (UTC)
Question: Can we have two formal grammar topics? Separate linguistics and computer science formal grammar topics?
I have run across computer science compiler topics saying that a programming language is specified by a formal grammar.
Metacompiler and compiler-compiler have become a tangled-up mess as to what they are. When yacc came along it corrupted what a compiler-compiler is. There were parser generators before yacc; most were part of complete compiler-compiler systems. The Saul Rosen book PROGRAMMING SYSTEMS AND LANGUAGES, published in 1967, documents several compiler-compiler systems. The book is a collection of early computer science papers.
It is in the old metacompiler manuals and papers that you find their parser programming languages described as reductive or analytical. I think a reductive grammar is one that specifies a language top down. They are specific to computer science. Like mathematical reductive problem-solving methods that break down a complex problem into smaller and smaller parts to find a solution, a reductive grammar breaks down a language into smaller and smaller parts. Reductive problem solving involves analytical thought in breaking down a problem or a language. The parts are not randomly divided; each sub-part has its own solution.
In the case of a programming language we might have declarations, functions, procedures, statements, specific types of statements, expressions, variables, numbers, strings, and characters.
A Chomsky grammar is like a search and replace:
<search pattern> → <replacement string>
A context-free grammar has a single symbol as its <search pattern>. A context-sensitive grammar has multiple symbols in the search pattern. A reductive grammar is a completely different form:
<language sub-part descriptive name> <is defined as operator> <pattern>
if_expr = "if" condition "then" expr 'else' expr;
One can say that is a production rule. But if we include transform operators:
if_expr = "if" condition "then" expr :IF!2 'else' expr:ELSE!2;
Now it is a parser programming rule producing a tree:
        ELSE
       /    \
     IF      expr
    /  \
condition    expr
The advanced metalanguage CWIC has look-ahead operators in its patterns.
$(-';' .ANY)
The $ sequence operator loops, matching .ANY character until a ; is matched.
I would not call these parser generator languages or grammars. The parser is programmed in these languages.
program = $((declaration|".end" .BREAK)\ERRORX("syntax error") $(-';' .ANY) ';' ) ;
A program is a sequence of declarations followed by .end. The \ is a backtracking alternative. If a partial recognition of declaration occurs and then failure occurs, the sequence resets to the point where it started and the backtrack alternative is taken. If there is no recognition, the normal | ".end" .BREAK alternative is taken. If the ".end" isn't matched, failure again results in the error path. If ".end" is matched, .BREAK terminates the loop and program returns success.
The parser is programmed in the grammar language. There is no need to write a production grammar when it can be directly programmed.
Look-ahead operators:
'-' pattern    not <pattern>
'~' char       match any character except char
'?' pattern    look ahead; pattern not consumed
pattern = char | string | '(' expr ')';
char .. .any MAKESTR();
string .. '"' $(~'"' .any|-"""""" ,"""") '"' MAKESTR();
A pattern may be a char or string or a parenthesized parser expr. In a '-' pattern test, the pattern is not consumed in any case; the result, success or failure, is reversed, so a successful parse fails.
A char is a single character bounded by single quote marks.
A string is any sequence of characters bounded by double quote marks, except that two double quote marks together within a string are replaced by one double quote mark: """""" ,""""
The .. rule is a token rule. Token rules default to cataloging dictionary (symbol table) entries. The MAKESTR function intercedes, making a string object instead.
META II is extremely simple compared to CWIC, but the basic parsing concept is the same. META II compiled to an interpretive stack machine. Compilers written in META II compile to stack-machine interpreters specific to the language. A subset of ALGOL was written in META II; it ran on a stack-machine interpreter written specifically for it. META II produced a parser exactly as coded in the META II metalanguage. It was limited to producing stack-machine code, as it output object assembly code directly out of the syntax rules.
EX3 = .ID .OUT('LD ' *) / '(' EX1 ')';
EX2 = EX3 $('*' EX3 .OUT('MLT'));
EX1 = EX2 $('+' EX2 .OUT('ADD'));
The META II example above is from the original META II UCLA paper by D. Val Schorre. .ID, .NUMBER and .STRING were built-in recognizers; they followed ALGOL token rules. When an identifier is matched by .ID in rule EX3, the string 'LD ' and the last parsed entity recognized by .ID are written to the output assembly file by .OUT('LD ' *). .OUT, a built-in function, writes to the output file; * writes the last parsed token (.ID, .NUMBER, or .STRING).
The input sequence:
A + B * (C + D)
Would output:
Assembly output | Stack | Rule | Operation
LD A | A | EX3 | push A
LD B | B,A | EX3 | push B
LD C | C,B,A | EX3 | push C
LD D | D,C,B,A | EX3 | push D
ADD | C+D,B,A | EX1 | top stack entry D popped and added to next entry C
MLT | B*(C+D),A | EX2 | multiply top two stack entries
ADD | A+B*(C+D) | EX1 | add top two stack entries
The example is greatly simplified for explanation. The EX1 and EX2 rules can be expanded to handle subtraction and division:
EX3 = .ID .OUT('LD ' *) / '(' EX1 ')';
EX2 = EX3 $('*' EX3 .OUT('MLT') / '/' EX3 .OUT('DIV'));
EX1 = EX2 $('+' EX2 .OUT('ADD') / '-' EX2 .OUT('SUB'));
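To make the walk-through concrete, here is a hypothetical Python transcription of those three formulas (not Schorre's code; identifiers are limited to single letters) that emits the same stack-machine output:

    # Sketch of  EX3 = .ID .OUT('LD ' *) / '(' EX1 ')';  and the two rules above it.
    def ex3(src, out):
        if src and src[0].isalpha():
            out.append("LD " + src[0])            # .OUT('LD ' *)
            return True, src[1:]
        if src and src[0] == "(":
            ok, src = ex1(src[1:], out)
            if ok and src and src[0] == ")":
                return True, src[1:]
        return False, src

    def ex2(src, out):
        ok, src = ex3(src, out)
        while ok and src and src[0] in "*/":
            op = "MLT" if src[0] == "*" else "DIV"
            ok, src = ex3(src[1:], out)
            if not ok:
                return False, src                 # mid-sequence failure: an error in META II
            out.append(op)                        # .OUT('MLT') / .OUT('DIV')
        return ok, src

    def ex1(src, out):
        ok, src = ex2(src, out)
        while ok and src and src[0] in "+-":
            op = "ADD" if src[0] == "+" else "SUB"
            ok, src = ex2(src[1:], out)
            if not ok:
                return False, src
            out.append(op)                        # .OUT('ADD') / .OUT('SUB')
        return ok, src

    out = []
    ex1("A+B*(C+D)", out)   # out == ['LD A', 'LD B', 'LD C', 'LD D', 'ADD', 'MLT', 'ADD']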
The syntax sweetness of grouping and alternative operators allows us to avoid backtracking.
These are programming languages that are grammars, not grammars translated into some specific type of parser algorithm.
They are basically a recursive descent parser. However, you cannot implement the backtracking in C or C++ unless you go outside the language, i.e. using assembly. A backtracking failure fails any number of stacked rules. Rules are functions that call other rules and return success or failure. Most everything is a test that evaluates to success or failure. A rule is a test. A literal string is a test. A grouped sequence of tests is itself a test.
X = A (B C D) E;
X is a rule: a test function returning success or failure. A, B, C, D and E are test rule calls. (B C D) is a group test.
If, in any sequence of tests, the first test is successful and a subsequent test fails, a backtrack failure results.
In the X rule, if A is successful and B or E fails, a long backtrack failure results from X. If C or D fails, a long fail results from within the group.
X = A (B C | D) \ E;
Now if A, C, or D fails, backtracking occurs and E is tried. If B fails, D is tried. A, B, C and D are assumed to fail non-backtracking; but if any sub-goals within A, B, C or D backtrack-fail, or any rules they call fail, control is returned to E. A backtracking alternative causes a backtrack stack frame to be created before its first alternative is attempted. Generally backtracking is used to recover and report syntax errors. If there is no backtracking alternative, a backtrack failure can still occur and would result in a compile failure being reported. A special backtrack stack frame is on the call stack when the first driving rule is called; it, however, saves no state and restores none. It catches an unaccounted-for backtrack. A rough sketch of this behaviour follows below.
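In a language with exceptions the long-failure behaviour can at least be approximated. Here is a rough Python sketch with hypothetical combinators; it is not how CWIC actually implements its backtrack stack frames:

    class Backtrack(Exception):
        """Long failure: unwinds any number of nested rule calls at once."""

    def lit(ch):                          # a terminal test
        def test(state):
            if state["pos"] < len(state["src"]) and state["src"][state["pos"]] == ch:
                state["pos"] += 1
                return True
            return False
        return test

    def seq(*tests):                      # a sequence such as (B C)
        def rule(state):
            if not tests[0](state):
                return False              # nothing recognized: ordinary failure
            for t in tests[1:]:
                if not t(state):
                    raise Backtrack()     # partial recognition: long failure
            return True
        return rule

    def alt(first, second):               # ordinary alternative  first | second
        def rule(state):
            return first(state) or second(state)
        return rule

    def backtrack_alt(first, second):     # backtracking alternative  first \ second
        def rule(state):
            saved = dict(state)           # the backtrack "stack frame"
            try:
                if first(state):
                    return True
            except Backtrack:
                state.clear(); state.update(saved)   # reset the parse state
            return second(state)
        return rule

    # X = A (B C | D) \ E  might then be written, with single letters as the rules:
    x = backtrack_alt(seq(lit("a"), alt(seq(lit("b"), lit("c")), lit("d"))), lit("e"))
    # On "abc" x succeeds; on "abd" the long failure from C is caught, the parse
    # state is reset, and E is tried; if A itself fails, E is tried directly.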
With the compiler-compiler there is no costly backtracking or look-ahead unless programmed; backtracking is programmer controlled. The CWIC example below illustrates backtracking error handling.
.SYNTAX
PROGRAM = $((ID ('=' DECLARATION | ':=' ST) | '.END' .FINI) \ ERRORX['???'] GARBOL);
DECLARATION = $((NUM ';' :EQU!2 DECL[*1] .FINI | ERRORX['Numeric expected'] GARBOL) \ ERRORX['; assumed'] NUM:EQU!2 DECL[*1] .FINI);
ST = $((EXP ';' :STORE!2 COMPILE[*1] | ERRORX['Syntax error'] GARBOL) \ ERRORX['missing ; assumed'] EXP:STORE!2 COMPILE[*1] .FINI);
GARBOL = $(-';' .ANY) ';';
EXP = TERM $(('+':ADD / '-':SUB) TERM !2);
TERM = FACTOR $(('*':MPY / '/':DIV) FACTOR !2);
FACTOR = ID / '(' EXP ')' / NUM;
LET: 'A'/ 'B'/ 'C'/ 'D'/ 'E'/ 'F'/ 'G'/ 'H'/ 'I'/ 'J'/ 'K'/ 'L'/ 'M'/ 'N'/ 'O'/ 'P'/ 'Q'/ 'R'/ 'S'/ 'T'/ 'U'/ 'V'/ 'W'/ 'X'/ 'Y'/ 'Z';
DGT: '0'/ '1'/ '2'/ '3'/ '4'/ '5'/ '6'/ '7'/ '8'/ '9';
ALPHNUM: LET / DGT;
ID .. LET $ALPHNUM;
NUM .. DGT $DGT MAKENUM[];
.FINISH
.STOP SETUP PROGRAM
What kind of grammar is being used?
None of this was translated from a production grammar. It was coded directly, as any program would be.
The problem is we have topics saying a language is defined by a formal grammar. So how can a language be defined by a grammar that is not a formal grammar? That is the basic problem. Steamerandy (talk) 13:48, 15 January 2016 (UTC)
Grammar rules explained
[edit]I, maybe like others, confused grammar rules with mathematical engineering rules, such as Ohm's law:
E = I•R
I already had this preconceived notion of a rule:
<name> <is defined as> <expression>
Above is a BNF-like <expression>. Angle-bracketed terms like <name> and <expression> above symbolically express a language element or a class. Class is the descriptive name given in the original ALGOL working reports. A Chomsky grammar production rule is a different animal. The left side is a pattern to be replaced by the right side:
<pattern> <replaced-by op> <replacement string>
The general form has a pattern on the left side to be matched against an existing string. The matched part of that string is then replaced by the right side.
The <pattern> may contain terminals and non-terminals. The fact that the left side is a pattern is not well explained up front in the Chomsky linguistics papers I read, so if one does not look further that point is not apparent. Anyway, it wasn't to me; not until I looked up an example of Chomsky context-sensitive grammars and found an example pattern-matching rewrite rule. The explanation of the start symbol just added to the confusion. The start symbol is not a rule, as is usually shown when explaining it, but the initial state of the string to be generated. We are not starting with an empty string but with one that has some symbol that primes the pump, so to speak.
In a production grammar we have a set of unordered rewrite rules, each rule consisting of a pattern matched against the existing string and a replacement string:
<pattern>→<replacement string>
Replacement rules are then applied repeatedly to the string. A production rule is much like a search-and-replace command in a text editor, especially one that allows regular expressions in the search string.
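That search-and-replace reading can be made concrete with a small sketch; the toy grammar here (E → E+E | a) is an assumed example, not taken from any of the reports discussed:

    import random

    # A production grammar as repeated search-and-replace: start from the
    # start symbol "E" and rewrite until no rule applies (or we give up).
    rules = [("E", "E+E"), ("E", "a")]

    def derive(string="E", max_steps=20):
        for _ in range(max_steps):
            applicable = [r for r in rules if r[0] in string]
            if not applicable:
                break                               # only terminals remain
            lhs, rhs = random.choice(applicable)    # the rule set is unordered
            string = string.replace(lhs, rhs, 1)    # replace one occurrence
        return string

    # derive() may return "a", "a+a", "a+a+a", ... (or an unfinished sentential
    # form such as "E+a" if max_steps runs out first).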
In the process of writing this I pulled out one of my oldest books that describes the original BNF metalanguage from the ALGOL 58 and 60 working reports. I found out something interesting about BNF. Maybe I wasn't so confused: the BNF history shows that BNF was not intended to be a production grammar. Who changed that, and when was it changed? When did the Chomskyite view sneak in? Anyway, see the early BNF history.
Reductive grammars are not rewrite rules. They instead define the language constructs, or the formations forming a language. English can be confusing, having different meanings for the same word. In the BNF context a formation is a pattern, like an aircraft flight formation or the formation of a building complex. BNF language formation rules, like marching formations or aircraft formations, have names. BNF called the formation a class and delimited a class name in <>. Formation in the BNF context is a language pattern description.
BNF rules were designed to recognize context-free grammars. But I think because they are formula pattern-recognizing rules, their language-recognizing power has been confused. Reductive rules are not limited by the single class name: the left side is not a pattern but a class name defined by the right-side expression. Their grammar-recognizing power is determined by the pattern's expression-matching operators.
It was said that the McGraw-Hill dictionary of technical terms did not explain a reductive grammar.
Well, how about the ACM ALGOL 58 and 60 reports? They are milestones in computer programming language development. Steamerandy (talk) 23:37, 25 February 2015 (UTC)
BNF History
[edit]BNF was first used in the ALGOL 58 and ALGOL 60 reports. The ALGOL 58 report contains the definition of Backus Normal Form. The following is from a book in the McGraw-Hill computer science series:
- PROGRAMMING SYSTEMS & LANGUAGES
- McGraw-Hill Computer Science Series
- SAUL ROSEN, Professor of Mathematics and Computer Science, Purdue University. Copyright 1967. LCCCN 66-29754
In this book the ALGOL 58 report is one chapter. Another chapter goes over the ALGOL 60 report and contains the full ALGOL 60 report. BNF is explained in this book as originally defined by John Backus.
The ALGOL 60 report states: the syntax of a language is a set of rules by means of which it is possible to determine whether any given string is a valid string in the language.
All right then. From the book:
- The metasymbols used in BNF are ::= , | , < , >. There are only four symbols defined. The , and . are part of the metalanguage, English, in which we are describing the Backus Normal Form.
- In BNF we write:
- <digit> ::= 0|1|2|3|4|5|6|7|8|9
- The metasymbols < > are used as delimiters to enclose the name of a class. The metasymbols ::= may be read as "is defined as" or "consists of". The | is read as "or". The above phrase defines a class <digit>.
They are defined as formation rules defining the formal structure or arrangement of language elements. The definition of formation as used is narrowed down by the ::= operator's meaning, "is defined as" or "consists of", above. Following is the meaning of formation as used above.
- Formation, a structure or arrangement of something.
- "a cloud formation"
- synonyms: configuration, arrangement, pattern, array, alignment, positioning, disposition, order
- "the aircraft were flying in tight formation"
This old BNF description is the opposite of the description given here in the wiki. It describes a reductive, or analytical, rule: defining a pattern that can be tested. They had no problem with a left-recursive definition in BNF. So sorry that I went to college in the 60s and learned compiler terminology from pre-dragon-age books and ACM SIGPLAN meetings.
The ACM ALGOL report explains that complex rules are formed of less complex classes: formations of formations. They are top-down specifications of the language.
This is the opposite of Chomsky production rules.
Now, after checking my old college books, I find that BNF was originally specified by Backus and Naur as a reductive grammar in the ALGOL 58 report.
Steamerandy (talk) 23:37, 25 February 2015 (UTC)
- You may be right as to history, but, with that definition, any BNF description-set can be mapped to a "context-free" set of production rules, and any "context-free" set of production rules can be mapped to a BNF description-set. That "complex rules are formed of lesser complex classes" is not well-defined; if it were, it could be coded in terms of requiring non-terminals on the RHS of production rules to be of lower rank than the non-terminal on the LHS; however, that would forbid the definition of an expression in C or in FORTRAN, so it seems unlikely. I don't think that
- <base> ::= <variable> | (<expression>)
- <term> ::= <base> | <term> * <base> | <term> / <base>
- <expression> ::= <term> | <expression> + <term> | <expression> - <term>
- encoding not only valid expressions, but order of precedence of operators, would fit that definition of reductive. — Arthur Rubin (talk) 06:37, 26 February 2015 (UTC)
- I described BNF as it is described in the ALGOL reports. But in usage the rules were recursive, so I took "lesser" as describing top-down rules.
<arithmetic expression> from the ALGOL 60 report:
<arithmetic expression> ::= <simple arithmetic expression> | <if clause><simple arithmetic expression> else <arithmetic expression>
<if clause> ::= if <Boolean expression> then
<simple arithmetic expression> ::= <term> | <adding operator><term> | <simple arithmetic expression><adding operator><term>
<term> ::= <factor> | <term><multiplying operator><factor>
<factor> ::= <primary> | <factor>^<primary>
<primary> ::= <unsigned number> | <variable> | <function designator> | (<arithmetic expression>)
<multiplying operator> ::= × | / | ÷
<adding operator> ::= + | -
- The problem is with the English metalanguage descriptions. The word formation may be an action or a pattern description, or maybe both, as with rock formations. Programming languages need precise definition as to their meaning. As used in the ALGOL report, left recursion was only self-referencing in the ALGOL BNF specifications; nested non-terminating recursion was not present. It seems the ALGOL BNF was written with parser implementations in mind. Left self-referencing recursion is easy to translate into a loop.
- The definition of production rules is no more specific as to how rules are written. I was confused for a long time, thinking of the left side of a rule as an identifier, or a <class>, as in the origin of my understanding of rules. That book and the ALGOL reports are where I first learned BNF grammar rules for programming languages. BNF was meant to describe context-free languages. It was stated that BNF could not be used to define COBOL due to its context-sensitive elements; BNF was not meant to describe context-sensitive languages.
- It doesn't matter that you can translate a context-free reductive grammar to a context-free productive grammar. The fact remains that the milestone language ALGOL was specified by a reductive grammar. ALGOL is the grandparent of all block-structured languages; that includes PASCAL, C, C++, JAVA and others.
- The ALGOL report's BNF description came from its originators, John Backus and Peter Naur. It is made quite clear that a BNF rule is a named class describing a language formation (a pattern of a language construct). And, as you say, they can be translated to equivalent production rules, making reductive rules an equally valid way to define a formal language. Just because we do not have documented context-sensitive or unrestricted reductive grammars does not mean they cannot exist. History is clear that the formal ALGOL programming language was defined by reductive (analytical) BNF grammar rules.
- The thing that makes reductive, analytical, grammars of interest is the metacompilers that implement them as programming languages; they also include transformation operators. You can translate CFG production rules to a parser. But CFG reduction rules translate directly to a programmed recognizer, giving the programmer control over the parsing. Transformations can be included in the grammar rules. CWIC included parse-tree and list construction operators and procedure calls to its other programming sub-languages. A generator function could return success or failure, giving CWIC the ability to do procedural analysis of the tree and its elemental objects: token attribute examination, type checking, etc.
expression = term $(("+":ADD / "-":SUB) term!2);
term = factor $(("*":MPY / "/":DIV) factor!2);
factor = id / number / "(" expression ")";
assignment = id "=" expression ";" :ASSIGN!2 gen[*1] \ error["syntax error"];
- The above recognizes the terms of an arithmetic expression and builds what CWIC called a parse tree; today it is closer to an abstract syntax tree. Lists could also be created. The : operator specifies a tree node and the ! operator puts the tree together. You can check out the TREE-META tree-building operators: it did not have the same list constructors; its node operator is also :, and its tree-building operator was [ ] instead of !, so [2] is equivalent to CWIC's !2 operator. CWIC pops the top node-stack entry and the top 2 parse-stack entries, combining them to form a tree that is pushed on the parse stack. That cannot be translated into a production grammar, nor can a production grammar be embellished with transforming tree or code production. The full assignment statement, expressing mathematical operator precedence, building an abstract tree, and passing it to a generator gen[*1], is shown. *1 represents the top parse-stack entry, which is popped and passed to the code generator gen. The \ is the CWIC backtracking alternative operator. Backtracking occurs when some part of a sequence is recognized and then any part of the sequence other than the first fails. Whether it is right to classify such a metalanguage as a grammar is a concern; they are programming languages used to write language-parsing translators, but they can be looked at as a language specification. *<number> was used to reference a parse-stack entry in TREE-META. I think it was used to reference a relative parsed entity in META II.
- The metalanguages could easily be stripped of their transform constructs, leaving a pure reduction grammar specification of the language.
- Steamerandy (talk) 02:38, 27 February 2015 (UTC)
- I don't understand you. What you are now calling a "reductive grammar" is quite different from what you were calling it before, and the fact that any reductive grammar can be mapped to a productive grammar and any productive grammar can be mapped to a reductive grammar, recognizing the same strings and having essentially[note 1] the same parse tree(s), means that the difference is philosophical, rather than falling into the realm of computer science or of language theory. The historical differences, which I am not really qualified to comment on, may make interesting reading in a subarticle, or isolated section of this article, but probably should not be in the mainline of discussion.
- As an aside, I don't see how typed languages with type declarations can be recognized by a BNF or reasonably coherent set of production rules. The best I can think of is something like the following (using pattern recognition symbolism; I'm not sure this even ends up being context sensitive) (where A, B, C, D, etc. indicate strings of terminals with certain characteristics, and X represents an arbitrary symbol (or sometimes metasymbol), including non-terminals); parentheses are metasymbols, and braces indicate optional
- <type declaration> → (typename:A) <sp> (varname:B); {<typebegin> A:B <typeend>}
- <typebegin> C <typeend> X → X <typebegin> C <typeend>
- <typebegin> A:B <typeend> <typeplaceholder> A <typeplaceholderend> → B {<typebegin> A:B <typeend>}
- (in addition to the rule implied by the preceding)
- <typebegin> A:B <typeend> <typeplaceholder> C <typeplaceholderend> → <typeplaceholder> C <typeplaceholderend> <typebegin> A:B <typeend>
- , whether or not C is A
- (in addition to the rule implied by the preceding)
- I don't see how you can do it more simply using reduction rules, and I'm not sure this can be simulated by even a context-sensitive grammar.
- I'm starting to reach the conclusion that the article you want to write, about grammars describing computer languages, shouldn't be in this article, although it seems to be a legitimate article. — Arthur Rubin (talk) 19:28, 27 February 2015 (UTC)
- It seems you agree that reduction and production are basically equivalent, at least as a specification of a CFG. That is the point I have been trying to make.
- The lead paragraph of Formal grammar should have a neutral POV. Reductive grammars are equivalent, and therefore just as valid a language definition. I put up in my personal space a description of the more advanced Schorre metacompilers. Schorre metacompilers are specified in their own reductive grammar. The reason I am concerned is that the grammar link doesn't apply to compilers, and formal grammar doesn't work for metacompilers using reductive grammar metalanguages. The metacompiler topic linked to formal grammar, saying they use a grammar; they are reductive grammars. We do not have a neutral grammar topic; this article is it. We have META II, TREE-META, and Metacompiler topics that use reductive grammars. And now I find that BNF is actually such a grammar. It has a rule
- <class identifier> ::= <pattern>
- syntax. I used <class identifier> there to make the point clear. It's clear now how these grammars are different. Since the left side is an identifier (not a pattern), it is the pattern that needs to change to recognize context-sensitive languages. There is no reason that a <pattern> syntax cannot be defined that can recognize any language class. A reductive grammar rule recognizing the classic context-sensitive example:
- CFGE = +[$a]+ +[$b]+ +[$c]+
- {length(*1)==length(*2) && length(*1)==length(*3)};
- The +[...]+ is the make-list operation. So the above rule recognizes any number of a's and gathers them into a list, and likewise for b and c. The top three parse-stack entries are the lists. The {} braces bound actions that test that the list lengths are equal; failure of the action tests would cause the rule to fail. The actions are actually a part of the META II language that was kept, generally for attribute setting and testing; they were also used for debugging sometimes. I found them in my old SLIC manual. I cannot say if they are in CWIC; I added a few features to SLIC. In the above I used C++ boolean and compare operators for simplicity; CWIC and SLIC used .AND. and .EQUAL.. I just prefer to use C++ now. CWIC was implemented on an IBM 360 and was restricted to a keypunch character set. SLIC, implemented on a DEC-System-10, used the ASCII character set.
- Steamerandy (talk) 18:09, 19 March 2015 (UTC)
- Sorry, I was wrong: grammar doesn't link to formal grammar. But still there's the problem that there is no neutral computer science grammar topic. It is the topic-referencing links that are my concern. There are, I believe, both types of grammars used by metacompilers, compiler construction kits, etc. That is a great power of computer-based referencing systems: linking to an explanation of a term used. But not starting with a neutral POV is a disservice to the reader who doesn't know the subjects.
- Steamerandy (talk) 18:27, 19 March 2015 (UTC)
My summary and conclusion of the above
[edit]I think the sheer amount of text above makes it unproductive to add further comments to individual paragraphs. It is becoming a big mess with lots of redundant and repeating material. I don't want to add to that any further.
Hence, this attempt to continue by presenting my views and suggesting what to do next.
First, a personal note: I find it heartwarming to see so much passion displayed for this technology. I share some of it, having done my master's thesis in computer science on essentially the same technology: not META II or CWIC, but later software based on the same principles.
As far as I'm concerned, the discussion is not on whether to cover this technology in Wikipedia; the discussion is on how to cover it and where, and how to avoid describing the same things in different places by different names.
Steamerandy, I fully agree with you that the reductive grammars you describe are the technology that parser generators and similar pieces of software actually use when describing and recognizing languages; the generative grammars currently described in this article do not correspond with this technology.
However, your idea on the relationship between generative and reductive grammars is different from how I, Arthur Rubin and standard compiler technology textbooks see it. I would prefer for Wikipedia articles to follow the standard presentation where possible.
So what is this difference? According to you, Chomsky's generative grammars and the reductive grammars used by the parser generators you have seen are two different kinds of things; you grant that there is overlap in what they do, but they are fundamentally different in nature. This is because Chomsky grammars take the grammar's start symbol and generate the strings of a language from it, while reductive grammars reduce the strings of a language until the start symbol remains. This is, in fact, exactly how the textbooks on formal language theory (the mathematical foundation behind parsing technology) describe it: except that instead of reductive grammars, they introduce automata as the means to reduce strings of languages, and they don't call it reduction but recognition of the strings of a language.
To be precise: the textbooks prove:
- every regular (type-3) language can be generated by a regular grammar and recognized by a finite automaton;
- every context-free (type-2) language can be generated by a context-free grammar and recognized by a pushdown automaton;
- every context-sensitive (type-1) language can be generated by a context-sensitive grammar and recognized by a linear bounded automaton;
- every arbitrary (type-0) language can be generated by an unrestricted grammar and recognized by a Turing machine.
Initially, Chomsky considered unrestricted grammars to be a good basis for the description and machine translation of natural languages. He changed his mind when it was discovered that they have unrestricted descriptive power (any language that can be formally described at all can be described by an unrestricted grammar) and an undecidable recognition problem (it is impossible to create an algorithm that given an arbitrary unrestricted grammar creates a Turing machine, or any other kind of program, that given a string decides whether it can be generated by that grammar).
For context-sensitive languages, recognition is decidable, but hard (PSPACE-complete in general).
So these two variants were never seriously considered as the basis for parsing languages.
Meanwhile, more and more computer languages were being developed, and it became clear that nested syntactic structures with compositional semantics, as exhibited by natural languages, would also be very useful for computer languages. After attending and reading some of Chomsky's presentations, John Backus, a major force in IBM's programming language development, adopted Chomsky's context-free grammar rules as the basis for describing the nested structures in the language he was working on at the time, Algol. BNF is the very same thing as Chomsky's context-free grammars. There is no difference, other than (extended) BNF allowing alternation and iteration in right hand sides:
- instead of two rules A :== B and A :== C, you can write A := B | C
- instead of two rules A :== B and A :== BA, you can write A := B+
- instead of two rules A :== <empty> and A :== BA, you can write A := B*
But the reverses of these three rules also hold; this is why this new syntax is syntactic sugar: it does not add any expressive power, only more conciseness.
But, as you rightly say, as soon as you start to use grammars to describe languages, the need arises to create recognizers for that language, and generative grammars in general are not written with recognition (parsing) in mind; they do not tell you how to parse strings into syntax trees according to their rules.
What to do? Your answer: we need something other than generative grammars, we need reductive grammars, which are specially designed for parsing. But you call BNF an example of reductive grammars, when I just called them the exact same thing as context-free generative grammars. Why, then, do you call them reductive?
My claim is that what is reductive is not the BNF rules themselves, but the parsers that are automatically generated from them. This was the innovation brought about in 1962, apparently by META II: the automatic generation of parsers from grammars.
How to do this? This wasn't familiar territory in 1962. It was known that if you take a naive approach to parsing, allowing arbitrary context-free grammars will get you into trouble.
If you do top-down, left-to-right parsing, and you allow left recursion:
A := A B | C
you get stuck into an infinite loop. All left recursion must be eliminated. (It was later discovered that this is impossible to do in general.)
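A small Python sketch of why, reading that rule naively as a recursive-descent procedure (the helper is hypothetical):

    def parse_A(tokens, i):
        # Naive top-down reading of  A := A B | C.  The first alternative calls
        # parse_A again at the same position i, so no input is ever consumed and
        # the recursion never bottoms out (Python raises RecursionError).
        j = parse_A(tokens, i)                      # try  A B  first ...
        if j is not None and j < len(tokens) and tokens[j] == "b":
            return j + 1
        if i < len(tokens) and tokens[i] == "c":    # ... then  C
            return i + 1
        return None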
Another problem is ambiguity along the way: when doing top-down, left-to-right parsing, these rules
A :== B C | D
D := B E
present a problem: you don't know whether to interpret A as B C or as B E as they both start with B. This can be solved by backtracking, but such a parser can take an exponential time to run.
A more radical approach is to simply disallow all combinations of rules that may produce such problems. For instance: require that all right hand sides start with terminals (i.e. tokens, not variables) and that all right hand sides of the same nonterminal (variable) start with different terminals. So
A := a A
A := b B a C B b
is fine, but
A := a A
A := a C
is not, and neither is
A := B
This type of grammar is known as simple deterministic or LL(1). For these, parsing can be done top-down, and it takes no more than linear time: you can proceed at each token and you never need to backtrack.
The drawback of this approach is that it reduces expressive power: these grammars cannot describe all context-free grammars. They can only describe a subset (which, naturally, is know as the set of LL(1) languages).
This makes such grammars unsuitable for use in natural language processing: natural language contains non-LL(1) constructs. When designing a new artificial language, you are free to choose its syntax such that an LL(1) grammar (or similar) can describe it.
People have done this a lot. The resulting languages are fast and easy to parse, but often curbed in expressive power. For instance, when I encounter an XML-based language, I want to express its allowable syntax with DTDs or XML Schema; I have met cases where it couldn't be done because DTDs and XML Schema are required to be LL(1), while the syntax I needed to describe wasn't.
So do we have to choose between crippled languages and exponential parsing times? Not at all.
Instead of doing top-down parsing we can use bottom-up parsing (see LR parser and LALR parser), a smarter technique that supports more grammars than top-down parsing (but still not everything), and takes linear time rather than exponential time. Parser generators such as Yacc are based on this.
Furthermore, we can parse arbitrary context-free grammars in cubic time, using some kind of extension of bottom-up parsing such as an Earley parser or GLR parser. Instead of backtracking, we can process all alternative parses at once, if our bookkeeping is smart enough. This essentially turns parsing into a variant of matrix multiplication.
Where does this leave reductive grammars? I suspect that META II generates LL(1) parsers and therefore requires LL(1) grammars. So you might define reductive grammars as LL(1) grammars. Later parser generators use bottom-up parsers - are the grammars that these allow reductive, too? Even later parser generators support general context-free parsing; now, all context-free grammars have become "reductive". Hence my statement: the generated parser is reductive, not the grammar itself.
However, you mentioned another thing that I've ignored thus far: the reductive grammars support additional constructs in their rules that take their expressive power out of the context-free realm. You mentioned how this is necessary for COBOL, but C has a similar problem. If additional constructs in rules can indeed express such things, it is indeed a feature of grammars and worth being mentioned in this article. Furthermore, such features are probably going to be "reductive" in the sense that their interpretation assumes a particular parsing algorithm.
Furthermore, rules written for parser generators also have actions associated with them, code written in a programming language. Such actions are also "reductive", they also (probably) assume a particular parsing algorithm. If you include the action code in your notion of what a reductive grammar is, clearly a reductive grammar is far more than a generative grammar. The action code helps produce sensible syntax trees for the sentences being parsed, but it can just as well perform additional syntax checking, such as verifying whether every variable has been declared prior to use. So in practice, parsers do recognize non-context-free features in languages. This must certainly be explained in an article about parsing and translation, but to what extent does it belong in the present article? I'm not sure. I was more or less assuming it would be confined to the notion of grammar defined by Chomsky. But should it be?
There is much to improve in the article as it is now, but I'd like to see its scope defined more clearly before I attempt to make any changes. Rp (talk) 00:35, 5 September 2015 (UTC)
What About
[edit]I would first like to point out that what I am talking about are programming languages. META II is a metacompiler, not a parser generator. It takes a meta (programming) language as input.
META II is not presented as a standard language, but as a point of departure from which a user may develop his own META "language". - Dewey Val Schorre[1]
To be specific, they are parser programming languages.
In these parser programming languages a top-down reductive method of syntax analysis is programmed. The main goal is to match a complete program module with the starting program formula.
program = $((declaration                 // A program is a sequence of declarations
             | .EOF .STOP)                // terminated by End Of File
            \ ERRORX["Error"]             // Otherwise backtrack, report error
                                          // flagging the furthest parse point.
              $(-(';'|.EOF .STOP)) ';');  // Error - find a ; to continue.
The recogniser achieves this main "program" goal by looking for sub-goal entities, from left to right and down, as individual subgoals. The above, using the $ (zero or more loop operator), specifies a program to be a sequence of declarations that terminates successfully on end of file ".EOF".
- $((declaration | .EOF .STOP) \ ERRORX["Error"] $(-(';'|.EOF .STOP) .ANY));
The parser is programmed in a grammar-defining language, a language that has specific rules. The early languages do not allow backtracking. Later, CWIC added a backtracking alternative operator. The syntax formula:
- x = a b | a c;
is not valid, as alternatives may not begin with the same test as a previous alternative, and thus factoring is used:
- x = a (b|c);
Since syntax formulae:
- <name> = <parsing expression>;
are compiled into functions having the given <name>, that name must be unique; there can be only one rule having that <name> identifier.
- x = a b;
- x = a c;
- ^ *ERROR* the symbol x may not be redefined.
The restriction on backtracking is in the META II document. Schorre was on the CWIC development team.
As a note: having studied the Algol report, been at many LA ACM SIGPLAN meetings in the 1960s, been on field trips to IBM facilities, and having interviewed for a job at IBM, I do not believe Backus was influenced by Chomsky in developing BNF. Instead it was based on Algebraic Normal Form. He worked at IBM, where ANF was being used in circuit design. He was evasive when asked about its inspiration. It was probably an internal IBM development he couldn't talk about; he probably had a non-disclosure agreement with IBM.
The fact is that META II was defined in the META II language. The CWIC SYNTAX and GENERATOR languages were defined in the CWIC SYNTAX language. There is no grammar specification other than the parser program written in the SYNTAX language.
META II is not a parser generator. It is no less a compiler than any other programming language compiler. When we speak of the C or C++ compilers we are referring to the C language compiler or the C++ language compiler. META II is a programming language; like other programming languages there is a META II (language) compiler, and like other programming languages we refer to the language and the compiler by the language name.
META II is not a declarative language, although it appears declarative. It is semantically an imperative, procedural language. Read the META II document, available from the ACM or UCLA archives. Ignoring the fact that it is a parser program, it can be viewed as a declarative language specification.
CWIC token rules are more efficient than the regular expressions used by lexers.
bin: '0'|'1';
oct: bin|'2'|'3'|'4'|'5'|'6'|'7';
dgt: oct|'8'|'9';
hex: dgt|'A'|'B'|'C'|'D'|'E'|'F'
        |'a'|'b'|'c'|'d'|'e'|'f';

number .. '.B' bin $bin MAKEBIN[]
         |'.O' oct $oct MAKEOCT[]
         |('.H'|'.X') hex $hex MAKEHEX[]
         | dgt $dgt MAKENUM[];
Calling number, if successful, returns an integer object on the parse stack. Generally the test for a character class (bin, oct, hex, dgt) in the above is a test instruction and a jump on condition: using the current character as an index into a table, a class mask is ANDed with the indexed entry to determine membership. A non-zero result indicates membership.
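Here is a minimal Python sketch (my own illustration of the table technique described above, not the code CWIC actually emitted) of class-mask membership testing: each character indexes a table of class bits and one AND decides membership.

# one bit per character class; a character may belong to several classes
BIN, OCT, DGT, HEX = 1, 2, 4, 8

CLASS = [0] * 128
for c in '01':            CLASS[ord(c)] |= BIN | OCT | DGT | HEX
for c in '234567':        CLASS[ord(c)] |= OCT | DGT | HEX
for c in '89':            CLASS[ord(c)] |= DGT | HEX
for c in 'abcdefABCDEF':  CLASS[ord(c)] |= HEX

def is_class(ch, mask):
    # non-zero result indicates membership in the class
    return (CLASS[ord(ch)] & mask) != 0

print(is_class('7', BIN))   # False
print(is_class('7', OCT))   # True
print(is_class('f', HEX))   # True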
The power of these languages is demonstrated by the fact that a good percentage of the ALGOL 60 language was implemented in META II on a 16K IBM 1401 computer. That is 16K of six-bit character memory, plus a seventh word-mark bit.
A 1970 ACM paper on CWIC gives a little information. It was running in 1969, when I got manuals for the SYNTAX and GENERATOR languages at an ACM meeting. Its negative look-ahead features do not translate into a Chomsky production grammar, nor does its first-match, left-to-right and top-down order of evaluation (execution order). These grammar-defining/programming languages are technically different from Chomsky grammars.
The trees built by these languages are not parse trees; they are more like abstract syntax trees. The trees are lists whose first element is a node object. Node objects are created and pushed on the node stack using:
:<node string>          for example :ADD or :SUB

EXP = TERM $(('+':ADD|'-':SUB) TERM);
SLIC, developed on a DEC-10, had an interactive debugger, since underlying the system was a LISP 2 implementation. Breakpoints could be set, and the user could display the node and parse stack entries. Watchpoints could be set that display the line(s) around the parse point on entry and/or exit of a parse rule. Assembly debugging was also available. The generator and pseudo language debugging allowed statement breakpoints and displaying variables.
A machine instruction is defined as a sequence of bit fields. Conditional statements could of course select different field sequences, and conditional expressions could be used in deciding a field's content.
These are syntax-directed compilers. The number formula is a prime example: it parses, or lexes, the numeric string and calls a conversion generator function that converts the recognized string into a numeric object, returning the object on the parse stack. The parser language could be considered a string-processing language.
The trees built by these languages are abstract syntax trees. In SLIC, nodes could have attributes. In both CWIC and SLIC, dictionary entries have attributes. Other objects generally do not have attached attributes, though all objects have a type attribute. Objects are created and managed by token formulas or in the generator language.
Steamerandy (talk) 06:36, 20 March 2017 (UTC)Steamerandy (talk) 01:30, 20 March 2017 (UTC) Steamerandy (talk) 20:54, 7 August 2019 (UTC)
Further info found
[edit]I have found several papers referring to reductive grammar. These were philosophy topics dealing with language interpretation, but I haven't found the original references online. There are no dates, so I cannot say whether the "reductive" mentioned in old metacompiler documents was based on the same concept described. They do conform to the idea of having semantics in the grammar.
In the articles I found, reductive grammar has to do with meaning. It is described as breaking language constructs into parts having meaning. They seem to be saying that a reductive grammar includes semantic processing.
The metacompiler metalanguages do just that when correctly programmed. META II was limited to producing code for a stack machine; it generated code directly while parsing. TREE-META changed that by adding stacks and tree-building operators. We are all used to seeing BNF for an arithmetic expression. I think confusion is caused because a reductive grammar may look much the same as a productive grammar. One thing separating them is that a reductive grammar includes semantics within some rules: reductive grammars parse input, and the semantics are determined in the grammar language.
META II and yacc are similar in having code inclusions in their rules, except that META II's code productions implement the language being compiled: the meaning is determined and code produced to implement its intent. Personally I have not used yacc, but I understand the code in a yacc rule is meant to be placed in the parser code produced. When I first looked at yacc I had already implemented several compilers using a metacompiler that I wrote, based on SDC's CWIC (Compiler for Writing and Implementing Compilers). Sorry, but yacc didn't impress me much. CWIC is a fifth-generation metalanguage in the Schorre line of metacompilers: META II (1964), TREE-META (1968), two metacompilers written in LISP developed at SDC, and CWIC-360 (1970, developed at SDC).
My additions to CWIC isolated target machine code generation.
CWIC had a syntax language and a generator language. The syntax language had been extended from META II and TREE-META to include token rules. Token rules are akin to lexing rules: they parse characters into a string that becomes an object. A string becomes a string object, a number becomes a numeric object, and symbols are cataloged into dictionaries. Every piece of data became an object. Number objects could be manipulated numerically in expressions. The generator language was based on LISP 2, having an ALGOL-like block structure. CWIC maintained a symbol table; attributes could be tested and assigned to a symbol in the generator language.
Anyway, what I found is that some modern philosophy-of-language books talk about reductive grammars reducing a language into meaningful substructures, but I could not find the references they gave online. Some of the articles seemed to be old.
The metacompilers have parsing languages that look like a grammar. It's hard to describe their grammar type today; they are really a programming language, and that must be kept in mind when using them. If I code an expression as:
- exp = term $(('+':ADD|'-':SUB) term !2);
I have a mental image of a parsing machine that has three stacks: a call stack, a parse stack, and a node stack (the CWIC model). The parse stack is where I am building an abstract syntax tree. The node stack holds nodes created with the :<node name> operator, :ADD and :SUB in the rule above. The tree-building operator !<number> creates a list combining the top node stack entry with the number of parse stack entries given by the <number> argument; !2 builds a binary tree.
The rule defines the parsing of an expression: an expression (exp) is a term or a sequence of terms separated by addition (+) or subtraction (-) operators.
This is more than a simple grammar; it assigns meaning, producing an abstract tree representation of the input.
As a programmer I control the translation. I can program a right- or left-handed tree or produce a sum list. The above produces a left-handed tree. Using a recursive rule generates a right-handed tree:
- exp = term (('+':ADD|'-':SUB) exp !2);
The following generates a summation list:
- exp = +[term $('+' term|'-' term :NEG!1)]+ :SUM!1;
If a+b-c+x is processed by the above rules, the results are:
Left-handed:
- ADD[SUB[ADD[a,b],c],x]
Right-handed:
- ADD[a,SUB[b,ADD[c,x]]]
List:
- SUM[a,b,NEG[c],x]
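To make the three shapes concrete, here is a minimal Python sketch (my own illustration; the ADD/SUB/NEG/SUM names follow the rules above, and terms are assumed to be single characters) showing how a loop, recursion, and list collection yield the three different results for a+b-c+x.

def left_tree(toks):                        # the loop form of exp
    tree = toks.pop(0)                      # first term
    while toks:
        op = 'ADD' if toks.pop(0) == '+' else 'SUB'
        tree = [op, tree, toks.pop(0)]      # like !2: combine with what is already built
    return tree

def right_tree(toks):                       # the recursive form of exp
    term = toks.pop(0)
    if not toks:
        return term
    op = 'ADD' if toks.pop(0) == '+' else 'SUB'
    return [op, term, right_tree(toks)]     # the recursion nests to the right

def sum_list(toks):                         # the +[ ]+ summation-list form
    items = [toks.pop(0)]
    while toks:
        op, term = toks.pop(0), toks.pop(0)
        items.append(term if op == '+' else ['NEG', term])
    return ['SUM'] + items

print(left_tree(list('a+b-c+x')))    # ['ADD', ['SUB', ['ADD', 'a', 'b'], 'c'], 'x']
print(right_tree(list('a+b-c+x')))   # ['ADD', 'a', ['SUB', 'b', ['ADD', 'c', 'x']]]
print(sum_list(list('a+b-c+x')))     # ['SUM', 'a', 'b', ['NEG', 'c'], 'x']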
Maybe this is what sets a reductive grammar apart.
Steamerandy (talk) 11:43, 17 September 2015 (UTC)
Reductive grammar concept
[edit]A reductive grammar is like sentence diagramming. Being analytical in nature, its rules define language constructs that express contextual meaning.
A reductive grammar of an arithmetic expression expresses its hierarchical nature. Operator precedence is expressed in the way the rules are written:
- expr = term | expr addop term
- term = factor | term mpyop factor
Where I first found the terminology "reductive grammar" used, it also said that the grammar was not analytical or productive but a combination of both. I took that to mean that different parts had those properties, but I now think it means they may be both at once.
The expr and term rules can be taken either way; they work both ways, and a parser is easily written for them.
Metacompiler
[edit]META II did not compile into a specific type of parser. Most metacompilers predate modern parser terminology.
They would most closely be classified as recursive descent. Each rule is compiled into a function of its name. These are boolean functions that return success or failure (true or false). There is also a top driving rule.
They replace left recursion with a loop, a Kleene star equivalent.
Advanced versions had backtracking. All had alternative operators. There is no separate lexer pass; tokens are parsed on the fly. Backtracking was an alternative operator: the / character was used as the normal alternative (or) operator, and CWIC used the \ character for a backtracking alternative.
META II and TREE-META would most likely be considered LR(1) parsers. CWIC with its backtracking was far more powerful.
Backtracking changed the game. Normally in parsing, alternatives could be specified, and an alternative could of course be a sequence. When parsing a sequence, failure of the first token match resulted in trying an alternative. But once a match advancing the input is made, a failure later in the sequence resulted in a long-fail backtrack. If no backtrack was set, it was caught by a default, resulting in compilation failure.
- ru1 = rua \ rub;
The above backtracking alternative \ would reset the compile to the state before the call to rua and call rub. If rub long-failed (backtracking), it wouldn't be handled in this rule; a common programming error is not accounting for the last alternative long-failing. A double == rule added in SLIC caught the long-fail backtrack and returned failure. In CWIC one could write the equivalent as:
- ru1 = rua \ rub \ .FAIL;
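A minimal Python sketch (my own model, not CWIC machine code) of the backtracking alternative: the parse state, here just the input position, is saved before each alternative; a long fail after input has been consumed is caught, the state restored, and the next alternative tried; if none match, the result is the .FAIL case.

class Backtrack(Exception):
    """Raised on a long fail: input was consumed and then a match failed."""

def alt_backtracking(parser_fns, tokens, pos):     # ru1 = rua \ rub \ .FAIL;
    for fn in parser_fns:
        saved = pos                                # snapshot the parse state
        try:
            ok, newpos = fn(tokens, pos)
            if ok:
                return True, newpos
            pos = saved                            # plain failure: nothing consumed
        except Backtrack:
            pos = saved                            # long fail: reset, try next alternative
    return False, pos                              # .FAIL: no alternative matched

def rua(tokens, pos):                              # matches 'a' 'b'
    if pos < len(tokens) and tokens[pos] == 'a':
        pos += 1                                   # input has advanced ...
        if pos < len(tokens) and tokens[pos] == 'b':
            return True, pos + 1
        raise Backtrack()                          # ... so a later mismatch is a long fail
    return False, pos

def rub(tokens, pos):                              # matches 'a' 'c'
    return (True, pos + 2) if tokens[pos:pos + 2] == ['a', 'c'] else (False, pos)

print(alt_backtracking([rua, rub], ['a', 'c'], 0))   # (True, 2)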
Unlike yacc, these languages compiled to machine code.
With backtracking came the look-ahead operators - and ?. The - (negate, or not, operator) negated the outcome of its operand. The ? evaluated its operand for success or failure without advancing the parse. These were used in complex cases, backtracking to look ahead and then resetting the parse.
These are syntax-directed compilers. The syntax language of TREE-META and later metacompilers called generator functions (unparse rules), passing them arguments, usually the top parse stack entry.
I started out here trying to get this topic field-independent. One may say that BNF is a Chomsky context-free grammar, but as used with yacc, and originally in the ALGOL 60 report, it is a reductive grammar. A metalinguistic variable, or class name, names a language construct and its rule defines its form: <term>, <factor>, <integer>, <block>, <ifst> and so forth name language elements or constructs.
Linguistics is not so much concerned with naming elements that make up a language.
A metacompiler takes as input a metalanguage program. The fact that the input is a programming metalanguage puts some constraints on the language. A program has a start point; with a syntax-directed program that means there is a driving top rule, even if the language is an unstructured command language, i.e. each command is complete in and of itself. There has to be a loop that inputs these commands:
- input = $command;
These are programming languages, so recursion without some test to break out of a recursive loop is a programming error. An immediate left recursion can easily be detected by the compiler, and a nested left recursion is in most cases easily detected. A sequence whose first element may be optional (has a .EMPTY alternative) requires a lot more overhead.
Anyway, from what I can find, reductive is more than just analytical. Reductive grammars are a subset of analytical ones that break language down into specific parts: deciding the object in a sentence, for example, or distinguishing a command from an acknowledgement of understanding the command, or of completion of command execution.
I first ran across reductive grammar in the TREE-META manual. It wasn't defined there; it implied reductive and analytical to be the same, stating that TREE-META used a reductive, or analytical, grammar. I remembered reductive being used in reference to CWIC's syntax language, though I didn't find it mentioned in the short ACM paper on CWIC. TREE-META also stated its language was neither analytical nor productive but a combination of both.
I found a book, "Reductive grammars with context control" by M. G. Gontsa, that may be a reference. Its category, "Computer Science", validates the category. The translation was published in 1978.
We are stuck with using current terminology. At one time a compiler-compiler actually produced a compiler, not just a parser front end as yacc does. Early compiler-compilers took BNF input along with other specialized transform languages and produced running compilers. META II was the first to use a specialized syntax-transforming language. It translated to assembly source code; the assembly language was that of a stack-architecture pseudo machine. Different pseudo machines were implemented for different languages.
TREE-META changed the grammar language transformation to produce a tree that was then passed to unparse rules. This allowed translation to common hardware machine code instructions. CWIC changed the unparse rules to named unparse groups with LISP 2 block-structured actions; a named unparse group was called a generator. With this also came variables in unparse rules, replacing the *<number> relative stack reference notation of META II and TREE-META. There are at least 8 metacompilers I know of that call themselves metacompilers, using these same ideas and very similar metalanguage grammars: all complete compiler-compilers that produce a complete running compiler, with the exception of runtime libraries written in assembly or other low-level MOLs. Steamerandy (talk) 19:20, 18 September 2015 (UTC)
Compiler grammar specification
[edit]The problem seems to be the different uses of "grammar" in the various disciplines concerned. Formal linguistics has been adopted into computer science, but with different intent; a linguistic grammar theory is completely different from writing a compiler to a grammar specification. I began writing compilers in 1967. There wasn't all this linguistic baggage then. A few META compilers had been implemented by 1968. CWIC was announced in 1970; early CWIC documents were available in late 1969. TREE-META documentation is dated 1968.
The classification of the Schorre META languages is the problem. Computer science says a programming language is defined by a grammar. The META compiler grammar is actually a programming language used in programming the parser. I cannot stress this enough: you program the parser in the language that appears to be a grammar.
I think one has to learn the programming of a META language grammar to grasp this, so I will try to explain them from that point of view. I cannot explain them in a linguistic context; I have tried, and it just gets jumbled up. Rp thinks I keep changing what I say. If someone doesn't get what I am trying to express, I try it in a different way. Sorry, I am not changing my point of view, just trying to get the concept across. I usually find myself trying to explain something to someone with no notion of the subject, so do not take my different tactics as changing my point. I am trying to explain the same thing from different angles, so to speak.
These META compilers are prime examples of reductive or analytical grammars. I say that because it is said in their documentation. They are very similar to what are now called parsing expression grammars. I wish I could see the original PEG paper, but the term "parsing expression grammar" would be a very good fit for the META compiler grammar languages (the three-word description, not necessarily its definition). Calling them a grammar or syntax does not really describe them. They are programming, or more to the point metaprogramming, languages. They are grammars having specific semantic interpretations. In those META language compilers, a grammar rule is a parsing transform function, i.e. rules of the form (excluding transforms):
expr   = term (('+'|'-') term)*;
term   = factor (('*'|'/') factor)*;
factor = '(' expr ')' | id | number;
can be argued to be production rules; no problem. But as a programming language we say that expr, term, and factor are each to leave an object on the parse stack. There are three stacks. The parse stack holds parsed objects and list structures built from combined parsed objects and lists. The node stack holds node objects created with the ':' node operator. The call stack, as in most modern programming languages, holds the return address on a function call. So, to start, expr is a function that first calls term, which calls factor, which, having three alternatives, may call expr, id, or number. Evaluated left to right, '(' calls a string test function. If it returns success, then expr is called; if not, id is called, and then number if id returns failure. If id is successful, an id symbol object is pushed on the parse stack. id and number are token rules. There are three rule types: character class, token, and syntax. Token rules recognize character sequences and create token objects: numbers, strings, and symbols. Alternatives separated by '|' or '\' are attempted in the order written until one is successful or there are no more alternatives and failure is returned to the caller. Grouping basically allows an LR parse. A recursive descent parser is talked about as if it's a simplistic, limited, inferior thing, but many programming languages are nested structures, like the rules above describe.
In writing a rule like:
expr = term (('+':ADD|'-':SUB) term!2)*;
including tree-building operators, there is the expectation that term will leave a single object on the parse stack, just as expr will if term does its job. The rule first calls term, which recognizes term1. Then if a '+' follows, an ADD node is pushed on the node stack; likewise a '-' pushes a SUB node. If neither is found, expr is successful, leaving just a term on the parse stack. If an add or sub operator was recognized, we expect to find another term. If so, a two-branch tree (specified by the !2 tree-building operator) is built from the top node stack entry and the top two parse stack entries, leaving the created tree on the parse stack after removing the two terms. We write a parser rule with the expectation that the parser functions operate in a specific way, i.e. that the top two parse stack entries are objects representing the two recognized terms. On an iteration the expectation is that term will leave a single object to be combined with the tree created on the previous iteration. Though one can write rules that leave multiple items on the parse stack, it is not common. An indefinite list is created using the +[ and ]+ operators:
arglist = +[ arg $(',' arg) ]+
The expectation is that arg leaves one item on the parse stack. There can be one or more arguments, which are assembled into the list, leaving the list object on the parse stack. arg is a rule leaving a single tree, a tree that represents the calculation or retrieval of the argument.
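A minimal Python sketch (my own illustration; CWIC's +[ ]+ list is modelled here as a plain Python list, and each argument is assumed to already be a single object) of what the arglist rule expresses: collect one or more comma-separated arguments into one list object.

def parse_arglist(tokens):
    args = [tokens.pop(0)]              # first arg, already reduced to one object
    while tokens and tokens[0] == ',':
        tokens.pop(0)                   # consume the ','
        args.append(tokens.pop(0))      # the next arg
    return args                         # one list object replaces the individual args

print(parse_arglist(['x', ',', 'y', ',', 'z']))   # ['x', 'y', 'z']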
Taking another look at expr with tree building operators added:
expr   = term (('+':ADD|'-':SUB) term!2)*;
term   = factor (('*':MPY|'/':DIV) factor!2)*;
factor = '(' expr ')' | id | number;
Given the expression a/(b+2)-5: when expr is called it calls term, term calls factor, and factor calls cmpstr('('). cmpstr returns failure to factor, which calls id. a is matched as an id; an "a" symbol object is created and pushed on the parse stack. id returns success to factor, which returns success to term. term calls cmpstr('*') and gets failure back; the alternative '/' is passed to cmpstr with success. The :DIV pushes a DIV node on the node stack. Continuing, factor is called with "(b+2)-5" remaining. The '(' is matched and expr called with "b+2)-5" remaining. We descend down through the rule functions, matching b as an id in factor again. Returning to term, neither '*' nor '/' is matched; the zero-or-more operator is successful and we return to expr with "+2)-5" remaining, a and b on the parse stack, and DIV on the node stack. The '+' is matched and ADD pushed on the node stack, with "2)-5" remaining. Going down through the rule functions, the 2 is recognized by number and pushed on the parse stack; ")-5" remains. Returning to expr, !2 creates a three-element list [ADD,b,2], popping the ADD node and the top two parse stack entries; the list is pushed onto the parse stack. Returning through the call stack, the ')' is matched in factor, and we again come to the !2 in term, which pops node and parse stack entries, creating [DIV,a,[ADD,b,2]]. The list is a tree representation of the parsed expression. The '-' and the 5 are handled the same way, and eventually we have the whole parsed expression in tree form.
a/(b+2)-5

        SUB
       /   \
     DIV    5
    /   \
   a    ADD
       /   \
      b     2
Recursion in processing a nested expression allows building a tree that matches the semantic computational intent.
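The whole walk-through can be condensed into a minimal Python sketch (my own illustration of the three-stack model, not CWIC's IBM 360 code; ids and numbers are single characters here) that follows the three rules above and reproduces the tree for a/(b+2)-5.

class P:
    def __init__(self, text):
        self.text, self.pos = text, 0
        self.parse, self.node = [], []        # parse stack and node stack

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ''

    def lit(self, ch):                        # string test, like cmpstr('(')
        if self.peek() == ch:
            self.pos += 1
            return True
        return False

    def token(self):                          # id / number token rules, one char each
        ch = self.peek()
        if ch.isalnum():
            self.pos += 1
            self.parse.append(ch)             # push the recognized token object
            return True
        return False

    def tree2(self):                          # the !2 tree-building operator
        right, left = self.parse.pop(), self.parse.pop()
        self.parse.append([self.node.pop(), left, right])

    def factor(self):                         # factor = '(' expr ')' | id | number;
        if self.lit('('):
            return self.expr() and self.lit(')')
        return self.token()

    def term(self):                           # term = factor (('*':MPY|'/':DIV) factor!2)*;
        if not self.factor():
            return False
        while True:
            if self.lit('*'):   self.node.append('MPY')
            elif self.lit('/'): self.node.append('DIV')
            else:               return True
            if not self.factor():
                return False
            self.tree2()

    def expr(self):                           # expr = term (('+':ADD|'-':SUB) term!2)*;
        if not self.term():
            return False
        while True:
            if self.lit('+'):   self.node.append('ADD')
            elif self.lit('-'): self.node.append('SUB')
            else:               return True
            if not self.term():
                return False
            self.tree2()

p = P('a/(b+2)-5')
print(p.expr(), p.parse)   # True [['SUB', ['DIV', 'a', ['ADD', 'b', '2']], '5']]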
In the context of a programming language the grammar is a reductive, analytical process. CWIC compiled the grammar language into IBM 360 binary code. Above I used the Kleene star '*' zero-or-more operator and '|' for the normal non-backtracking alternative operator; CWIC used '$' and '/', the $ preceding its argument. The language has several look-ahead operators. ? looks ahead using any language constructs and backtracks to the initial state, returning match status success or failure. The - not operator looks ahead, returning the complement condition: -';' is successful on any character except a ';'. Strings can be placed in a token or pushed on the parse stack using the , operator: ,"str"
I think the META compiler metaprogramming grammar language has to be viewed as analytical in order to program a parser in it. A rule is like a mathematical function evaluated according to semantic specifications. Alternatives are evaluated left to right. Nested language constructs have nested rules defining them. Rules may be recursive if the constructs are recursive.
A language like early FORTRAN, for example, would need little recursion; only the computational expressions would have recursion.
FORTRAN = $(line cont statment $-.break);
That's from memory. I think you had a line number continuation flag and the statements.
Sorry, I got carried away with the following, but it does help make the point that these systems were real compiler-compilers.
I wrote SLIC on a DEC-System-10 while at Cerritos College (Norwalk, Calif.). I changed jobs to one where SLIC was used to write a COBOL cross compiler for a TI990 mini-computer. It took only 3 months for me and one other programmer, and most of my time was spent on other duties. The SLIC linker had to be written: the SLIC compiler was completed but generating DEC-10 object files; SLIC object files were working, but a big part of the linker was unfinished. Figuring man-months, it took between 3 and 4 man-months to write a COBOL cross compiler. Another programmer used SLIC to write a cross assembler. Both ran on a DEC-10, producing working executable TI990 programs. The 990 COBOL used exactly the identical DEC-10 COBOL syntax; COBOL programs could be written, debugged, and tested on the DEC-System-10 before compiling them for the TI990. The SLIC-produced COBOL compiler ran faster than the DEC-supplied compiler, which took several man-years to write, as stated by a DEC source.
SLIC was based on CWIC. It had three additional languages. One, the MACHOP language, defines machine instructions, their assembly mnemonic opcode, and their operands. Abstract-machine pseudo instructions called MACHOPs to output binary code. PSEUDO procedures resembled assembly macro definitions, only they were executable procedures. The generator language was a LISP 2-like language, only creating pseudo-instruction lists instead of byte-by-byte binary code as with CWIC. Sections in both systems allowed generation of code into named sections, for example separating data and code. SLIC includes a linker that has polish fix-up blocks: unresolved references generated fix-up objects that were output to the linker when all values were defined, giving one-pass compilation. An In-Sequence Optimizer language was planned; it would operate on pseudo code before its execution. Pseudo code is appended to a section list and, under program control, flushed (executed and deleted). A procedure, for example, might be compiled to pseudo code and then flushed; the flush could cover the whole program or a single statement. MACHOPs were defined as a sequence of bit fields, with no specific word, character, or byte size. The memory model of the MACHOPs is bit-addressable. This allowed code to be generated for the DEC-10 36-bit word architecture, for a byte-addressable microprocessor, or for the 16-bit word-addressable instruction architecture of the TI990 mini-computer. A .MORG instruction in a MACHOP aligned to a modulo bit boundary: .MORG 36 or .MORG 8. MACHOPs also directed the pseudo assembly listing. A MACHOP is specified as a sequence of bit fields:
H(8): x; (4): I; (2): f0; (2): f1;
The above defines four fields having separately specified values. The number in parentheses is the field size in bits. The H on the first field specifies hex output to the assembly listing; following fields not having a radix format are combined with it when output, so above a 16-bit value is output as 4 hex digits. The field values could be complex expressions, including if expressions, and fields could be conditional. Assembly instructions could be defined for any machine existing at that time. The translation of a high-level language to machine code was a process of five transformations: source code to abstract parse tree structures; tree structures to pseudo code; pseudo code transformed by the in-sequence optimizer; pseudo expansion to machine code; and MACHOPs to relocatable binary. SLIC provided a linker framework to which parameters and specific code for the target load file format had to be written. Since SLIC usually generated code, constant, and data sections, the linker needed them organized to load on the object machine.
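A minimal Python sketch (my own illustration, not the MACHOP language itself; the field values are hypothetical) of packing the four fields above into one 16-bit value and formatting it as four hex digits for the listing.

def pack_fields(fields):
    """fields: list of (width_in_bits, value), most significant field first."""
    word = 0
    for width, value in fields:
        word = (word << width) | (value & ((1 << width) - 1))
    return word

x, i, f0, f1 = 0x2A, 3, 1, 2                      # hypothetical field values
word = pack_fields([(8, x), (4, i), (2, f0), (2, f1)])
print(f'{word:04X}')                              # 2A36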
The MACHOP language could be used to create an assembler: a common assembly language for different machine instruction sets, with assembly syntax, macro definitions, and directives common across machines. Steamerandy (talk) 01:09, 25 November 2015 (UTC)
- ^ Dewey Val Schorre, "META II: A Syntax-Oriented Compiler Writing Language", UCLA Computing Facility, 1964.