Manuals
At the core of DHParser lies a parser generator for parsing expression grammars. As a parser generator, it offers functionality similar to pyparsing or lark. But it goes far beyond a mere parser generator by offering rich support for testing and debugging grammars, tree-processing (always needed in the XML-prone Digital Humanities ;-) ), fail-tolerant grammars and some (as of now, experimental) support for editing via the language server protocol.
DHParser is suitable both for small projects or “on the fly” use of parsing expression grammars as a more powerful substitute for regular expressions, and for big projects. (The workflow for the latter is described in the A Step by Step Guide.) The usage and API of DHParser are (or will be) described with many examples in the docstrings of its various modules. The following reading order is recommended for understanding DHParser:
- ebnf - Although DHParser also offers a Python-interface for specifying
grammars (similar to pyparsing), the recommended way of using DHParser is by specifying the grammar in EBNF. This module describes how grammars are specified in EBNF, how parsers can be auto-generated from these grammars, and how they are used to parse text.
- nodetree - Syntax-trees are the central data-structure of any
parsing system. The description of this module explains how syntax-trees are represented within DHParser and how they can be manipulated, queried, and serialized or deserialized as XML, S-expressions or JSON.
- transform - It is not unusual for digital humanities applications
that document trees are transformed again and again to produce different representations of research data or various output forms. DHParser supplies the scaffolding for two different types of tree transformations, both of which are variations of the visitor pattern. The scaffolding supplied by the transform-module allows tree-transformations to be specified in a declarative style by filling in a dictionary of tag-names with lists of transformation functions that are called in sequence on a node. A number of pre-defined transformations cover the most common cases, in particular those that occur when transforming concrete syntax trees to more abstract syntax trees. (An example of this kind of declaratively specified transformation is the EBNF_AST_transformation_table within DHParser’s ebnf-module.)
- compile - The compile-module offers an object-oriented scaffolding
for the visitor pattern that is more suitable for complex transformations that make heavy use of algorithms as well as transformations from trees to non-tree objects like program code. (An example of the latter kind of transformation is the EBNFCompiler class of DHParser’s ebnf-module.)
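The declarative, table-driven style described for the transform-module can be illustrated with a minimal, self-contained sketch. It does not use DHParser's actual API: the `Node` class, the `traverse` function, the two transformation functions and the tag names (`expression`, `term`, `WS`, `NUMBER`) are all hypothetical and chosen only to show the idea of a dictionary that maps tag-names to lists of functions called in sequence on each node.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Node:
    """Toy tree node (not DHParser's node class): a tag name plus text or children."""
    name: str
    content: Union[str, List["Node"]] = ""

    @property
    def children(self) -> List["Node"]:
        return self.content if isinstance(self.content, list) else []

    def as_sxpr(self) -> str:
        """Serialize the tree as a simple S-expression."""
        if isinstance(self.content, list):
            inner = " ".join(child.as_sxpr() for child in self.content)
            return "(%s %s)" % (self.name, inner)
        return '(%s "%s")' % (self.name, self.content)


Transformation = Callable[[Node], None]


def remove_whitespace(node: Node) -> None:
    """Drop all children tagged 'WS' (a hypothetical tag name)."""
    if node.children:
        node.content = [c for c in node.children if c.name != "WS"]


def replace_by_single_child(node: Node) -> None:
    """Collapse a node that has exactly one child into that child."""
    if len(node.children) == 1:
        child = node.children[0]
        node.name, node.content = child.name, child.content


def traverse(node: Node, table: Dict[str, List[Transformation]]) -> None:
    """Depth-first traversal: children are transformed before their parent."""
    for child in node.children:
        traverse(child, table)
    for transform in table.get(node.name, []):
        transform(node)


# The declarative part: tag-names mapped to transformations applied in order.
table = {
    "expression": [remove_whitespace, replace_by_single_child],
    "term": [replace_by_single_child],
}

cst = Node("expression", [
    Node("WS", " "),
    Node("term", [Node("NUMBER", "42")]),
    Node("WS", " "),
])
traverse(cst, table)
print(cst.as_sxpr())  # prints: (NUMBER "42")
```

The point of the design is that the tree-walking logic lives in one place (`traverse`), while the table alone states what happens to which kind of node.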
With the documentation of these four modules you should have enough knowledge to realize projects that follow the workflow described in the A Step by Step Guide. In most cases there will be no need to interact with the other modules directly.
- parse - contains the parsing algorithms and the Python-Interface for defining parsers. DHParser features a packrat-parser for parsing-expression-grammars with full left-recursion support as well as configurable error catching and continuation after errors. The Python-Interface allows grammars to be defined directly as Python-code without the need to compile an EBNF-grammar first. This is an alternative approach to defining grammars, similar to that of pyparsing.
- dsl - contains high-level functions for compiling EBNF-grammars and domain specific languages “on the fly”.
- preprocess - provides support for DSL-preprocessors as well as source-mapping of (error-)locations from the preprocessed document to the original document(s). Preprocessors are a practical means of adding features to a DSL that are difficult or impossible to define with context-free grammars in EBNF-notation, for example scoping based on indentation (as used by Python) or chaining of source-texts via an “include”-directive.
- error - defines the Error-class, the objects of which describe errors in the source document. Errors are defined by - at least - an error code (indicating at the same time the level of severity), a human readable error message and a position in the source text.
- testing - provides functions for unit-testing of grammars. Usually, developers will not need to interact with this module directly, but can rely on the unit-testing script generated by the “dhparser.py” command-line tool.
- trace - Apart from unit-testing, DHParser offers “post-mortem” debugging of the parsing process itself - as described in the A Step by Step Guide. This is helpful for figuring out why a parser went wrong. Again, there is little need to interact with this module directly, as its functionality is turned on by setting the configuration variables history_tracking and, for tracing continuation after errors, resume_notices, which in turn can be triggered by calling the auto-generated -Parser.py-scripts with the parameter --debug.
- log - logging facilities for DHParser as well as tracking of the parsing-history in connection with module trace.
- configuration - the central place for all configuration settings of DHParser. Be sure to use the access, set and get functions to change presets and configuration values in order to make sure that changes to the configuration work when used in combination with multithreading or multiprocessing.
- server - In order to avoid startup times or to provide a language server for a domain-specific-language (DSL), DSL-parsers generated by DHParser can be run as a server. Module server provides the scaffolding for an asynchronous language server. The -Server.py-script generated by DHParser provides a minimal language server (sufficient) for compiling a DSL. Especially if used with the just-in-time compiler pypy, using the -Server.py script allows for a significant speed-up.
- lsp - (as of now, this is just a stub!) provides data classes that resemble the typescript-interfaces of the language server protocol specification.
- stringview - defines a low level class that provides views on slices of strings. It is used by the parse-module to avoid excessive copying of data when slicing strings. (Python always creates a copy of the data when slicing strings as a design decision.) This module, if any, can be sped up significantly by compiling it with cython. (Use the cythonize_stringview-script in DHParser’s main directory or, even better, compile (almost) all modules with the build_cython-modules-script. This yields a 2-3x speed increase. The fastest way to run DHParser, however, is pypy, which yields a 4-5x speed increase, albeit only in the long run.)
- toolkit - various little helper functions for DHParser. Usually, there is no need to call any of these directly.
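The idea behind the stringview-module, slicing without copying, can be sketched in a few lines of plain Python. The class below is a simplified illustration written for this overview and is not DHParser's implementation; its name, its methods and the restriction to plain slices are assumptions made for the sake of a short example.

```python
from typing import Optional


class StringView:
    """Read-only view on text[begin:end] that shares the underlying string.

    Simplified sketch for illustration; not DHParser's StringView class.
    """
    __slots__ = ("_text", "_begin", "_end")

    def __init__(self, text: str, begin: int = 0, end: Optional[int] = None) -> None:
        self._text = text
        self._begin = begin
        self._end = len(text) if end is None else end

    def __len__(self) -> int:
        return self._end - self._begin

    def __getitem__(self, index: slice) -> "StringView":
        # Only plain slices (step 1) are supported in this sketch.
        if not isinstance(index, slice) or index.step not in (None, 1):
            raise TypeError("StringView supports only plain slices")
        start, stop, _ = index.indices(len(self))
        stop = max(start, stop)
        # A new view only stores two adjusted offsets; no characters are copied.
        return StringView(self._text, self._begin + start, self._begin + stop)

    def startswith(self, prefix: str) -> bool:
        # str.startswith accepts start and end offsets, so no slice is needed.
        return self._text.startswith(prefix, self._begin, self._end)

    def __str__(self) -> str:
        # Copying happens only here, when a real string is requested.
        return self._text[self._begin:self._end]


source = "alpha = beta + gamma"
view = StringView(source)
rest = view[8:]                  # a view on the text after "alpha = "
print(str(rest))                 # prints: beta + gamma
print(rest.startswith("beta"))   # prints: True
print(len(rest[:4]))             # prints: 4
```

A parser that repeatedly advances through a long document slices the remaining text at every step, so replacing each copy by a pair of offsets avoids copying the rest of the document again and again.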