The PpLexer module represents the user side view of pre-processing. This tutorial shows you how to get going.
First let’s get some demonstration code to pre-process. You can find this at cpip/demo/ and the directory structure looks like this:
\---demo/
| cpip.py
|
\---proj/
+---src/
| main.cpp
|
+---sys/
| system.h
|
\---usr/
user.h
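If you want to recreate this layout yourself, a minimal sketch using only the Python standard library (the directory names are taken from the tree above; the scratch location is arbitrary):

```python
import os
import tempfile

# Create a scratch copy of the demo directory layout shown above.
root = tempfile.mkdtemp()
for sub_dir in ('proj/src', 'proj/sys', 'proj/usr'):
    os.makedirs(os.path.join(root, sub_dir))
print(sorted(os.listdir(os.path.join(root, 'proj'))))
```

You would then copy main.cpp, user.h and system.h into src/, usr/ and sys/ respectively.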
In proj/ is some source code that includes files from usr/ and sys/. This tutorial will take you through writing cpip.py to use PpLexer to pre-process them.
First let's have a look at the source code that we are pre-processing. It is a pretty trivial variation on a common theme but, beware, pre-processing directives abound!
The file demo/proj/src/main.cpp looks like this:
#include "user.h"
int main(char **argv, int argc)
{
#if defined(LANG_SUPPORT) && defined(FRENCH)
printf("Bonjour tout le monde\n");
#elif defined(LANG_SUPPORT) && defined(AUSTRALIAN)
printf("Wotcha\n");
#else
printf("Hello world\n");
#endif
return 1;
}
That includes a file user.h that can be found at demo/proj/usr/user.h:
#ifndef __USER_H__
#define __USER_H__
#include <system.h>
#define FRENCH
#endif // __USER_H__
In turn that includes a file system.h that can be found at demo/proj/sys/system.h:
#ifndef __SYSTEM_H__
#define __SYSTEM_H__
#define LANG_SUPPORT
#endif // __SYSTEM_H__
Clearly, since the system mandates language support and the user specifies French as their language of choice, you would not expect this to write out "Hello world", or would you?
Well, you are in the hands of the pre-processor, and that is what CPIP knows all about. First we need to create a PpLexer.
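To reason about what should happen without running a pre-processor, the macro state built up by the two headers can be simulated in plain Python (a hypothetical sketch, not CPIP code; the set below merely stands in for the macro environment):

```python
# Macros defined by the time main.cpp's #if is reached:
# user.h defines __USER_H__ and FRENCH; system.h defines
# __SYSTEM_H__ and LANG_SUPPORT.
macros = {'__USER_H__', '__SYSTEM_H__', 'LANG_SUPPORT', 'FRENCH'}

def defined(name):
    return name in macros

# Mirror the #if / #elif / #else cascade in main.cpp.
if defined('LANG_SUPPORT') and defined('FRENCH'):
    result = 'Bonjour tout le monde'
elif defined('LANG_SUPPORT') and defined('AUSTRALIAN'):
    result = 'Wotcha'
else:
    result = 'Hello world'
print(result)
```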
This is the template that we will use for the tutorial; it just takes a single argument from the command line, sys.argv[1]:

import sys

def main():
    print('Processing:', sys.argv[1])
    # Your code here

if __name__ == "__main__":
    main()
Of course this doesn’t do much yet; invoking it just gives:
$ python cpip.py proj/src/main.cpp
Processing: proj/src/main.cpp
We now need to import and create a PpLexer.PpLexer object. This takes at least two arguments: first the file to pre-process, and second an include handler. The latter is needed because the C/C++ standards do not specify how an #include directive is to be processed; that is an implementation issue. So we need to provide a defined implementation of something that can find #include'd files.
CPIP provides several such implementations in the module IncludeHandler, and the one that does what, I guess, most developers expect from a pre-processor is IncludeHandler.CppIncludeStdOs. This class takes at least two arguments: a list of search paths to the user include directories and a list of search paths to the system include directories. With this we can construct a PpLexer object, so our code now looks like this:
import sys
from cpip.core import PpLexer, IncludeHandler

def main():
    print('Processing:', sys.argv[1])
    myH = IncludeHandler.CppIncludeStdOs(
        theUsrDirs=['proj/usr',],
        theSysDirs=['proj/sys',],
    )
    myLex = PpLexer.PpLexer(sys.argv[1], myH)

if __name__ == "__main__":
    main()
This still doesn’t do much yet; invoking it just gives:
$ python cpip.py proj/src/main.cpp
Processing: proj/src/main.cpp
But, in the absence of errors, it shows that we can construct a PpLexer.
To get the PpLexer to do something, we need to call PpLexer.ppTokens(). This method is a generator of preprocessing tokens.
Let's just print them out with this code:
import sys
from cpip.core import PpLexer, IncludeHandler

def main():
    print('Processing:', sys.argv[1])
    myH = IncludeHandler.CppIncludeStdOs(
        theUsrDirs=['proj/usr',],
        theSysDirs=['proj/sys',],
    )
    myLex = PpLexer.PpLexer(sys.argv[1], myH)
    for tok in myLex.ppTokens():
        print(tok)

if __name__ == "__main__":
    main()
Invoking it now gives:
$ python cpip.py proj/src/main.cpp
Processing: proj/src/main.cpp
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
...
PpToken(t="int", tt=identifier, line=True, prev=False, ?=False)
PpToken(t=" ", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="main", tt=identifier, line=True, prev=False, ?=False)
PpToken(t="(", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="char", tt=identifier, line=True, prev=False, ?=False)
PpToken(t=" ", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="*", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="*", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="argv", tt=identifier, line=True, prev=False, ?=False)
PpToken(t=",", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t=" ", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="int", tt=identifier, line=True, prev=False, ?=False)
PpToken(t=" ", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="argc", tt=identifier, line=True, prev=False, ?=False)
PpToken(t=")", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="{", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="printf", tt=identifier, line=True, prev=False, ?=False)
PpToken(t="(", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t=""Bonjour tout le monde\n"", tt=string-literal, line=False, prev=False, ?=False)
PpToken(t=")", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t=";", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="return", tt=identifier, line=True, prev=False, ?=False)
PpToken(t=" ", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="1", tt=pp-number, line=False, prev=False, ?=False)
PpToken(t=";", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="}", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
The PpLexer is yielding PpToken objects that are interesting in themselves because they have not only content but also the type of that content (whitespace, punctuation, literals etc.). A simplification is to print just the token value by changing a line in the code from:
print(tok)
To:
print(tok.t, end=' ')
To give:
Processing: proj/src/main.cpp
int main ( char * * argv , int argc )
{
printf ( "Bonjour tout le monde\n" ) ;
return 1 ;
}
It is definitely pre-processed and, although the output is correct, it is rather verbose because of all the whitespace generated by the pre-processing (newlines are always the consequence of pre-processing directives).
We can clean this whitespace up very simply by invoking PpLexer.ppTokens() with a suitable argument to reduce spurious whitespace thus: myLex.ppTokens(minWs=True). This minimises whitespace runs to a single space or newline. Our code now looks like this:
import sys
from cpip.core import PpLexer, IncludeHandler

def main():
    print('Processing:', sys.argv[1])
    myH = IncludeHandler.CppIncludeStdOs(
        theUsrDirs=['proj/usr',],
        theSysDirs=['proj/sys',],
    )
    myLex = PpLexer.PpLexer(sys.argv[1], myH)
    for tok in myLex.ppTokens(minWs=True):
        print(tok.t, end=' ')

if __name__ == "__main__":
    main()
Invoking it now gives:
Processing: proj/src/main.cpp
int main ( char * * argv , int argc )
{
printf ( "Bonjour tout le monde\n" ) ;
return 1 ;
}
This is exactly the result that one would expect from pre-processing the original source code.
So far, so boring: any pre-processor can do the same. However, PpLexer can do far more than this. PpLexer keeps track of a large amount of significant pre-processing information, and that is available to you through the PpLexer APIs.
For a moment let's remove the minWs=True from myLex.ppTokens() so that we can inspect the state of the PpLexer at every token (rather than skipping whitespace tokens that might represent pre-processing directives).
Changing the code to this shows the include file hierarchy every step of the way:
for tok in myLex.ppTokens():
    print(myLex.fileStack)
Gives the following output:
$ python cpip.py proj/src/main.cpp
Processing: proj/src/main.cpp
['proj/src/main.cpp', 'proj/usr/user.h']
['proj/src/main.cpp', 'proj/usr/user.h']
['proj/src/main.cpp', 'proj/usr/user.h', 'proj/sys/system.h']
['proj/src/main.cpp', 'proj/usr/user.h', 'proj/sys/system.h']
['proj/src/main.cpp', 'proj/usr/user.h', 'proj/sys/system.h']
['proj/src/main.cpp', 'proj/usr/user.h', 'proj/sys/system.h']
['proj/src/main.cpp', 'proj/usr/user.h']
['proj/src/main.cpp', 'proj/usr/user.h']
['proj/src/main.cpp', 'proj/usr/user.h']
['proj/src/main.cpp']
...
Changing the code to this:
for tok in myLex.ppTokens(condLevel=1):
    print(myLex.condState)
Produces this output:
Processing: proj/src/main.cpp
(True, '')
...
(True, '')
(True, 'defined(LANG_SUPPORT) && defined(FRENCH)')
(True, 'defined(LANG_SUPPORT) && defined(FRENCH)')
(True, 'defined(LANG_SUPPORT) && defined(FRENCH)')
(True, 'defined(LANG_SUPPORT) && defined(FRENCH)')
(True, 'defined(LANG_SUPPORT) && defined(FRENCH)')
(True, 'defined(LANG_SUPPORT) && defined(FRENCH)')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && defined(LANG_SUPPORT) && defined(AUSTRALIAN))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && defined(LANG_SUPPORT) && defined(AUSTRALIAN))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && defined(LANG_SUPPORT) && defined(AUSTRALIAN))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && defined(LANG_SUPPORT) && defined(AUSTRALIAN))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && defined(LANG_SUPPORT) && defined(AUSTRALIAN))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && defined(LANG_SUPPORT) && defined(AUSTRALIAN))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && !(defined(LANG_SUPPORT) && defined(AUSTRALIAN)))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && !(defined(LANG_SUPPORT) && defined(AUSTRALIAN)))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && !(defined(LANG_SUPPORT) && defined(AUSTRALIAN)))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && !(defined(LANG_SUPPORT) && defined(AUSTRALIAN)))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && !(defined(LANG_SUPPORT) && defined(AUSTRALIAN)))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && !(defined(LANG_SUPPORT) && defined(AUSTRALIAN)))')
(True, '')
...
(True, '')
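The way each successive condition string folds in the negation of the earlier branches can be checked with ordinary booleans (a sketch of the logic only, using the macro values established in this demo):

```python
# Macro state as established by user.h and system.h in this demo.
LANG_SUPPORT = True
FRENCH = True
AUSTRALIAN = False

# The three branch conditions, composed the same way as the
# condState strings above: each later branch requires all earlier
# branches to have been False.
branch_if = LANG_SUPPORT and FRENCH
branch_elif = (not branch_if) and LANG_SUPPORT and AUSTRALIAN
branch_else = (not branch_if) and (not branch_elif)
print(branch_if, branch_elif, branch_else)
```

This reproduces the True / False / False states reported for the #if, #elif and #else regions above.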
A more common use case is to query the PpLexer after processing the file. The following code example pre-processes the file and then prints out the translation unit, the file include graph, the conditional compilation graph, the macro environment and the macro history.
Here is the code, named cpip_07.py:
import sys
from cpip.core import PpLexer, IncludeHandler

def main():
    print('Processing:', sys.argv[1])
    myH = IncludeHandler.CppIncludeStdOs(
        theUsrDirs=['proj/usr',],
        theSysDirs=['proj/sys',],
    )
    myLex = PpLexer.PpLexer(sys.argv[1], myH)
    tu = ''.join(tok.t for tok in myLex.ppTokens(minWs=True))
    print()
    print(' Translation Unit '.center(75, '='))
    print(tu)
    print(' Translation Unit END '.center(75, '='))
    print()
    print(' File Include Graph '.center(75, '='))
    print(myLex.fileIncludeGraphRoot)
    print(' File Include Graph END '.center(75, '='))
    print()
    print(' Conditional Compilation Graph '.center(75, '='))
    print(myLex.condCompGraph)
    print(' Conditional Compilation Graph END '.center(75, '='))
    print()
    print(' Macro Environment '.center(75, '='))
    print(myLex.macroEnvironment)
    print(' Macro Environment END '.center(75, '='))
    print()
    print(' Macro History '.center(75, '='))
    print(myLex.macroEnvironment.macroHistory(incEnv=False, onlyRef=False))
    print(' Macro History END '.center(75, '='))

if __name__ == "__main__":
    main()
Invoking this code thus:
$ python3 cpip_07.py ../src/main.cpp
Gives this output:
Processing: ../src/main.cpp
============================= Translation Unit ============================
int main(char **argv, int argc)
{
printf("Bonjour tout le monde\n");
return 1;
}
=========================== Translation Unit END ==========================
============================ File Include Graph ===========================
../src/main.cpp [43, 21]: True "" ""
000002: #include ../usr/user.h
../usr/user.h [10, 6]: True "" "['"user.h"', 'CP=None', 'usr=../usr']"
000004: #include ../sys/system.h
../sys/system.h [10, 6]: True "!def __USER_H__" "['<system.h>', 'sys=../sys']"
========================== File Include Graph END =========================
====================== Conditional Compilation Graph ======================
#ifndef __USER_H__ /* True "../usr/user.h" 1 0 */
#ifndef __SYSTEM_H__ /* True "../sys/system.h" 1 4 */
#endif /* True "../sys/system.h" 6 13 */
#endif /* True "../usr/user.h" 7 20 */
#if defined(LANG_SUPPORT) && defined(FRENCH) /* True "../src/main.cpp" 5 69 */
#elif defined(LANG_SUPPORT) && defined(AUSTRALIAN) /* False "../src/main.cpp" 7 110 */
#else /* False "../src/main.cpp" 9 117 */
#endif /* False "../src/main.cpp" 11 124 */
==================== Conditional Compilation Graph END ====================
============================ Macro Environment ============================
#define FRENCH /* ../usr/user.h#5 Ref: 1 True */
#define LANG_SUPPORT /* ../sys/system.h#4 Ref: 2 True */
#define __SYSTEM_H__ /* ../sys/system.h#2 Ref: 0 True */
#define __USER_H__ /* ../usr/user.h#2 Ref: 0 True */
========================== Macro Environment END ==========================
============================== Macro History ==============================
Macro History (all macros):
In scope:
#define FRENCH /* ../usr/user.h#5 Ref: 1 True */
../src/main.cpp 5 38
#define LANG_SUPPORT /* ../sys/system.h#4 Ref: 2 True */
../src/main.cpp 5 13
../src/main.cpp 7 15
#define __SYSTEM_H__ /* ../sys/system.h#2 Ref: 0 True */
#define __USER_H__ /* ../usr/user.h#2 Ref: 0 True */
============================ Macro History END ============================
This is simple to the point of crude as the PpLexer supplies a far richer data seam than just text.
The File Include Graph interface is described here: FileIncludeGraph Tutorial
There are several ways that you can inspect pre-processing with PpLexer:
The PpLexer constructor allows you to change the behaviour of pre-processing in a number of ways; effectively these are hooks into pre-processing that can:
When an #include directive is encountered a compliant implementation is required to search for and insert into the Translation Unit the content referenced by the payload of the #include directive.
The standard does not specify how this should be accomplished. In CPIP the how is achieved by an implementation of a cpip.core.IncludeHandler.
It is entirely acceptable within the standard to have an #include system that does not rely on a file system at all. Perhaps it might rely on a database like this:
#include "SQL:spam.eggs#1284"
An include handler could take that payload and recover the content from some database rather than the local file system.
Or, more prosaically, an include mechanism such as this:
#include "http://some.url.org/spam/eggs#1284"
That leads to a fairly obvious way of managing that #include payload.
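How such a handler might decompose a non-file payload is sketched below (a hypothetical helper, not part of CPIP; the payload format is invented purely for illustration):

```python
def split_payload(payload):
    """Split an invented payload such as 'SQL:spam.eggs#1284' into
    (scheme, name, fragment). Purely illustrative."""
    scheme, _, rest = payload.partition(':')
    name, _, fragment = rest.partition('#')
    return scheme, name, fragment

print(split_payload('SQL:spam.eggs#1284'))
```

A handler built along these lines would use the scheme to choose a backend (database, URL fetch, ...) and the remainder to locate the content.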
If you want to create a new include mechanism then you should sub-class the base class cpip.core.IncludeHandler.CppIncludeStd [reference documentation: IncludeHandler].
Sub-classing this requires implementing the following methods:
Given a Translation Unit Identifier this should return a FilePathOrigin object, or None, for the initial translation unit. As a precaution this should include code to check that the stack of current places is empty. For example:
if len(self._cpStack) != 0:
    raise ExceptionCppInclude('setTu() with CP stack: %s' % self._cpStack)
Given an HcharSeq/Qcharseq and a search path this should return a FilePathOrigin object or None.
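The essence of this search, for a file-system based handler, can be sketched as follows (a simplified stand-in assuming a plain list of directories; a real CPIP handler would wrap the result in a FilePathOrigin rather than returning a bare path):

```python
import os
import tempfile

def search_file(file_name, search_dirs):
    """Return the first matching path on the search path, or None."""
    for d in search_dirs:
        candidate = os.path.join(d, file_name)
        if os.path.isfile(candidate):
            return candidate
    return None

# Tiny demonstration with a scratch directory standing in for usr/.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, 'user.h'), 'w') as f:
    f.write('/* empty */\n')
found = search_file('user.h', ['/nonexistent', tmp_dir])
print(found, search_file('missing.h', [tmp_dir]))
```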
As examples there are a couple of reference implementations in cpip.core.IncludeHandler:
The PpLexer can be supplied with an ordered list of file-like objects that are pre-include files. These are processed, in order, before the ITU (Initial Translation Unit) is processed. Macro redefinition rules apply.
For example, CPIPMain.py can take a list of user-defined macros on the command line. It then creates a list with a single pre-include file thus:
import io
from cpip.core import PpLexer

# defines is a list thus:
# ['spam(x)=x+4', 'eggs',]
myStr = '\n'.join(['#define '+' '.join(d.split('=')) for d in defines])+'\n'
myPreIncFiles = [io.StringIO(myStr), ]
# Create other constructor information here...
myLexer = PpLexer.PpLexer(
    anItu,  # File to pre-process
    myIncH,  # Include handler
    preIncFiles=myPreIncFiles,
)
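The string manipulation in the example above can be exercised on its own (pure Python, no CPIP required):

```python
# Build a pre-include body from command-line style macro definitions:
# 'name=value' becomes '#define name value', a bare 'name' becomes
# '#define name'.
defines = ['spam(x)=x+4', 'eggs']
myStr = '\n'.join(['#define ' + ' '.join(d.split('=')) for d in defines]) + '\n'
print(myStr)
```

This prints a two-line pre-include body: "#define spam(x) x+4" followed by "#define eggs".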
You can pass a diagnostic object in to the PpLexer; this controls how the lexer responds to various conditions such as warnings, errors etc. The default is for the lexer to create a CppDiagnostic.PreprocessDiagnosticStd.
If you want to create your own then sub-class the PreprocessDiagnosticStd class in the module CppDiagnostic.
Sub-classing PreprocessDiagnosticStd allows you to override any of the following that might be called by the PpLexer:
There are a couple of implementations in the CppDiagnostic module that may be of interest:
You can pass in a specialised handler for #pragma statements [default: None]. This shall sub-class PragmaHandlerABC and can implement:
Have a look at the core module PragmaHandler for some example implementations.