Reference
=========

At the core of DHParser lies a parser generator for parsing expression
grammars. As a parser generator, it offers functionality similar to that of
pyparsing_ or lark_. But it goes far beyond a mere parser generator by
offering rich support for testing and debugging grammars, tree-processing
(always needed in the XML-prone Digital Humanities ;-) ), fail-tolerant
grammars and some (as of now, experimental) support for editing via the
`language server protocol`_.

:py:mod:`ebnf`
    Although DHParser also offers a Python-interface for specifying grammars
    (similar to pyparsing_), the recommended way of using DHParser is to
    specify the grammar in EBNF_. This module describes how grammars are
    specified in EBNF_, how parsers can be auto-generated from these
    grammars, and how these parsers are used to parse text.

:py:mod:`nodetree`
    Syntax-trees are the central data-structure of any parsing system. The
    documentation of this module explains how syntax-trees are represented
    within DHParser, how they can be manipulated and queried, and how they
    are serialized to or deserialized from XML, S-expressions or JSON.

:py:mod:`transform`
    It is not untypical for digital humanities applications that document
    trees are transformed again and again to produce different
    representations of research data or various output forms. DHParser
    supplies the scaffolding for two different types of tree
    transformations, both of which are variations of the `visitor pattern`_.
    The scaffolding supplied by the transform-module allows specifying
    tree-transformations in a declarative style by filling in a dictionary
    that maps tag-names to lists of transformation functions, which are
    called in sequence on a node with that tag-name. A number of
    pre-defined transformations cover the most common cases, in particular
    those that occur when transforming concrete syntax trees into more
    abstract syntax trees. (An example of this kind of declaratively
    specified transformation is the ``EBNF_AST_transformation_table`` within
    DHParser's ebnf-module.)
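
    The idea of a declarative transformation table can be illustrated with a
    minimal sketch in plain Python. It does not use DHParser: the ``Node``
    class, the helpers ``drop_whitespace`` and ``collapse_single_child`` and
    the ``traverse`` function are hypothetical stand-ins invented for this
    illustration, and transforming children before their parents is a choice
    of this sketch, not necessarily the order DHParser uses.

    .. code-block:: python

        from dataclasses import dataclass, field
        from typing import Callable, Dict, List


        @dataclass
        class Node:
            """Minimal stand-in for a syntax-tree node (hypothetical, not DHParser's)."""
            name: str
            children: List["Node"] = field(default_factory=list)
            content: str = ""

            def as_sexpr(self) -> str:
                """Serialize the (sub-)tree as an S-expression."""
                if not self.children:
                    return f'({self.name} "{self.content}")'
                inner = " ".join(child.as_sexpr() for child in self.children)
                return f"({self.name} {inner})"


        Transformation = Callable[[Node], None]


        def drop_whitespace(node: Node) -> None:
            """Remove all children named 'WS'."""
            node.children = [c for c in node.children if c.name != "WS"]


        def collapse_single_child(node: Node) -> None:
            """Let a node with exactly one child take over that child's
            children and content, while keeping its own name."""
            if len(node.children) == 1:
                child = node.children[0]
                node.children, node.content = child.children, child.content


        def traverse(node: Node, table: Dict[str, List[Transformation]]) -> None:
            """Apply the table depth-first; children are transformed before their parent."""
            for child in node.children:
                traverse(child, table)
            for transformation in table.get(node.name, []):
                transformation(node)


        # The declarative part: tag-name -> transformations, called in sequence.
        TRANSFORMATION_TABLE: Dict[str, List[Transformation]] = {
            "sum": [drop_whitespace],
            "term": [drop_whitespace, collapse_single_child],
        }

        # A small concrete syntax tree with explicit whitespace nodes.
        tree = Node("sum", [
            Node("term", [Node("number", content="1")]),
            Node("WS", content=" "),
            Node("op", content="+"),
            Node("WS", content=" "),
            Node("term", [Node("WS", content=" "), Node("number", content="2")]),
        ])
        traverse(tree, TRANSFORMATION_TABLE)
        print(tree.as_sexpr())  # (sum (term "1") (op "+") (term "2"))

    Because the table is plain data, supporting a further tag-name only
    requires adding a dictionary entry; the traversal code stays unchanged.
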

:py:mod:`compile`
    Offers an object-oriented scaffolding for the `visitor pattern`_ that is
    more suitable for complex transformations that make heavy use of
    algorithms, as well as for transformations from trees to non-tree
    objects like program code. (An example of the latter kind of
    transformation is the :py:class:`~ebnf.EBNFCompiler` class of DHParser's
    ebnf-module.)

:py:mod:`pipeline`
    Offers support for "processing-pipelines" composed of "junctions". A
    processing-pipeline consists of a series of tree-transformations that
    are applied in sequence. "Junctions" declare which source-tree-stage is
    transformed by which transformation-routine into which destination
    tree-stage. Processing-pipelines can contain bifurcations, which are
    needed if different kinds of output-data shall be derived from one
    source-document.

:py:mod:`testing`
    Provides a rich framework for unit-testing of grammars, parsers and any
    kind of tree-transformation. Usually, developers will not need to
    interact with this module directly, but rely on the unit-testing script
    generated by the "dhparser.py" command-line tool. The tests themselves
    are specified declaratively in test-input-files (in the very simple
    ".ini"-format) that reside by default in the "test_grammar"-directory of
    a DHParser-project.

:py:mod:`preprocess`
    Provides support for DSL-pre-processors as well as source mapping of
    (error-)locations from the preprocessed document to the original
    document(s). Pre-processors are a practical means for adding features to
    a DSL which are difficult or impossible to define with context-free
    grammars in EBNF-notation, like for example scoping based on indentation
    (as used by Python) or chaining of source-texts via an
    "include"-directive.

:py:mod:`parse`
    Contains the parsing algorithms and the Python-interface for defining
    parsers. DHParser features a packrat-parser for
    parsing-expression-grammars with full left-recursion support as well as
    configurable error catching and continuation after errors.
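
    To give an idea of how a packrat parser can cope with left recursion at
    all, the following toy parser applies the well-known "seed-growing"
    technique to the left-recursive rule ``expr <- expr "-" number / number``:
    the memoized result for a position starts out as a failure and is
    re-computed until the match stops growing. This is a hand-written sketch
    under that assumption, not DHParser's implementation; the class
    ``LeftRecursivePackrat`` and the function ``evaluate`` are hypothetical.

    .. code-block:: python

        import re
        from typing import Dict, Optional, Tuple, Union

        Tree = Union[int, tuple]
        Result = Optional[Tuple[Tree, int]]  # (tree, end position) or None on failure


        class LeftRecursivePackrat:
            """Toy packrat parser (hypothetical) for the left-recursive grammar

                expr   <- expr "-" number / number
                number <- [0-9]+
            """

            NUMBER = re.compile(r"[0-9]+")

            def __init__(self, text: str) -> None:
                self.text = text
                # memo table: (rule name, position) -> result
                self.memo: Dict[Tuple[str, int], Result] = {}

            def number(self, pos: int) -> Result:
                match = self.NUMBER.match(self.text, pos)
                return (int(match.group()), match.end()) if match else None

            def expr_body(self, pos: int) -> Result:
                # first alternative: expr "-" number  (left-recursive)
                left = self.expr(pos)
                if left is not None:
                    tree, end = left
                    if self.text.startswith("-", end):
                        right = self.number(end + 1)
                        if right is not None:
                            return ("-", tree, right[0]), right[1]
                # second alternative: number
                return self.number(pos)

            def expr(self, pos: int) -> Result:
                key = ("expr", pos)
                if key in self.memo:
                    return self.memo[key]
                # plant a failing "seed", so that the left-recursive call terminates
                self.memo[key] = None
                result: Result = None
                while True:
                    attempt = self.expr_body(pos)
                    # stop growing the seed as soon as the match gets no longer
                    if attempt is None or (result is not None and attempt[1] <= result[1]):
                        break
                    result = attempt
                    self.memo[key] = result
                return result

            def parse(self) -> Tree:
                result = self.expr(0)
                if result is None or result[1] != len(self.text):
                    raise SyntaxError(f"cannot parse {self.text!r}")
                return result[0]


        def evaluate(tree: Tree) -> int:
            if isinstance(tree, int):
                return tree
            _, left, right = tree
            return evaluate(left) - evaluate(right)


        tree = LeftRecursivePackrat("7-2-1").parse()
        print(tree)            # ('-', ('-', 7, 2), 1)
        print(evaluate(tree))  # 4

    The resulting tree is left-associative, i.e. ``7-2-1`` is read as
    ``(7-2)-1``, which is exactly what the left-recursive formulation of the
    rule expresses.
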

    The Python-interface allows defining grammars directly as Python-code
    without the need to compile an EBNF-grammar first. This is an
    alternative approach to defining grammars, similar to that of
    pyparsing_.

:py:mod:`dsl`
    Contains high-level functions for compiling EBNF-grammars and domain
    specific languages "on the fly".

:py:mod:`error`
    Defines the ``Error``-class, the objects of which describe errors in the
    source document. Errors are defined by, at least, an error code
    (indicating at the same time the level of severity), a human-readable
    error message and a position in the source text.

:py:mod:`trace`
    Apart from unit-testing, DHParser offers "post-mortem" debugging of the
    parsing process itself, as described in the :doc:`StepByStepGuide`.
    This is helpful to figure out why a parser went wrong. Again, there is
    little need to interact with this module directly, as its functionality
    is turned on by setting the configuration variables ``history_tracking``
    and, for tracing continuation after errors, ``resume_notices``, which in
    turn can be triggered by calling the auto-generated "-Parser.py"-scripts
    with the parameter ``--debug``.

:py:mod:`log`
    Logging facilities for DHParser as well as tracking of the
    parsing-history in connection with module :py:mod:`trace`.

:py:mod:`configuration`
    The central place for all configuration settings of DHParser. Be sure to
    use the ``access``, ``set`` and ``get`` functions to change presets and
    configuration values in order to make sure that changes to the
    configuration work when used in combination with multithreading or
    multiprocessing.

:py:mod:`server`
    In order to avoid startup times or to provide a `language server`_ for
    a domain-specific-language (DSL), DSL-parsers generated by DHParser can
    be run as a server. Module :py:mod:`server` provides the scaffolding for
    an asynchronous language server. The "-Server.py"-script generated by
    DHParser provides a minimal language server (sufficient for compiling a
    DSL).
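
    Why a long-running server avoids startup times can be sketched with
    nothing but ``asyncio`` from Python's standard library: expensive set-up
    work (here, compiling a regular expression, standing in for loading a
    generated parser) is done once when the process starts, and every
    request reuses it. This is neither DHParser's :py:mod:`server` module nor
    the language server protocol; ``compile_source``, ``handle_request``,
    ``send_request`` and ``demo`` are hypothetical names, and the
    one-line-per-connection protocol is an assumption of this sketch.

    .. code-block:: python

        import asyncio
        import re

        # Expensive set-up (here: compiling a regular expression, standing in for
        # loading a generated parser) happens once, when the server process starts.
        TOKENIZER = re.compile(r"\s*([0-9]+|[-+*/()])")


        def compile_source(source: str) -> str:
            """Hypothetical stand-in for a DSL compiler: returns the tokens of `source`."""
            return " ".join(TOKENIZER.findall(source))


        async def handle_request(reader: asyncio.StreamReader,
                                 writer: asyncio.StreamWriter) -> None:
            """Answer one request per connection: a single line of source text."""
            source = (await reader.readline()).decode()
            writer.write((compile_source(source) + "\n").encode())
            await writer.drain()
            writer.close()
            await writer.wait_closed()


        async def send_request(port: int, source: str) -> str:
            reader, writer = await asyncio.open_connection("127.0.0.1", port)
            writer.write((source + "\n").encode())
            await writer.drain()
            reply = (await reader.readline()).decode().rstrip("\n")
            writer.close()
            await writer.wait_closed()
            return reply


        async def demo(*sources: str) -> list:
            # port 0 lets the operating system pick a free port
            server = await asyncio.start_server(handle_request, "127.0.0.1", 0)
            port = server.sockets[0].getsockname()[1]
            async with server:
                # all requests are served by the same, already initialized process
                return [await send_request(port, source) for source in sources]


        if __name__ == "__main__":
            print(asyncio.run(demo("7 - (2*3)", "1+2")))
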

    Especially when used with the just-in-time compiler `pypy`_, running the
    "-Server.py"-script allows for a significant speed-up.

:py:mod:`lsp`
    (As of now, this is just a stub!) Provides data classes that resemble
    the TypeScript-interfaces of the
    `language server protocol specification`_.

:py:mod:`stringview`
    Defines a low-level class that provides views on slices of strings. It
    is used by the :py:mod:`parse`-module to avoid excessive copying of data
    when slicing strings. (By design, Python always creates a copy of the
    data when slicing strings.) This module in particular can be sped up
    significantly by compiling it with cython_. (Use the
    ``cythonize_stringview``-script in DHParser's main directory or, even
    better, compile (almost) all modules with the
    ``build_cython-modules``-script. This yields a 2-3x speed increase.)

:py:mod:`toolkit`
    Various small helper functions for DHParser. Usually, there is no need
    to call any of these directly.


Module ``ebnf``
---------------

.. automodule:: ebnf
    :members:


Module ``nodetree``
-------------------

.. automodule:: nodetree
    :members:


Module ``transform``
--------------------

.. automodule:: transform
    :members:


Module ``compile``
------------------

.. automodule:: compile
    :members:


Module ``parse``
----------------

.. automodule:: parse
    :members:


Module ``dsl``
--------------

.. automodule:: dsl
    :members:


Module ``preprocess``
---------------------

.. automodule:: preprocess
    :members:


Module ``error``
----------------

.. automodule:: error
    :members:


Module ``testing``
------------------

.. automodule:: testing
    :members:


Module ``trace``
----------------

.. automodule:: trace
    :members:


Module ``log``
--------------

.. automodule:: log
    :members:


Module ``configuration``
------------------------

.. automodule:: configuration
    :members:


Module ``server``
-----------------

.. automodule:: server
    :members:


Module ``lsp``
--------------

.. automodule:: lsp
    :members:


Module ``stringview``
---------------------

.. automodule:: stringview
    :members:


Module ``toolkit``
------------------

.. automodule:: toolkit
    :members:


Module ``versionnumber``
------------------------

.. automodule:: versionnumber
    :members:


.. _pyparsing: https://github.com/pyparsing/pyparsing/
.. _lark: https://github.com/lark-parser/lark
.. _cython: https://cython.org/
.. _`language server`: https://langserver.org/
.. _`language server protocol`: https://microsoft.github.io/language-server-protocol/
.. _`language server protocol specification`: https://microsoft.github.io/language-server-protocol/specifications/specification-current/
.. _EBNF: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
.. _`visitor pattern`: https://en.wikipedia.org/wiki/Visitor_pattern
.. _pypy: https://www.pypy.org/