# How the documentation is generated

## Overview
To reduce maintenance burden, most of the documentation is generated automatically by extracting values from the source. This is done dynamically, i.e. it requires pwndbg to be loaded, and for pwndbg to initialize properly it must be loaded in the context of some debugger. However, pwndbg's state can depend heavily on which debugger is being used. Importantly, some debuggers may not see all commands, configuration options, or convenience functions, and in principle no single debugger is guaranteed to see everything (although at the time of writing GDB does in fact see everything). Some discussion on this topic can be found in issue #2955.
To get around this, we run pwndbg in every debugger we support and extract all the information that debugger sees. We then run scripts that combine the extracted information and build the documentation markdown files from it. These scripts don't need pwndbg to be loaded, and they aren't run in the context of any debugger, but rather as standalone Python scripts.
## Architecture
./scripts/generate-docs.sh works by invoking a script that extracts all doc-relevant information and a script that builds the markdown from that information. The ./scripts/verify-docs.sh script differs only in that it sets an environment variable telling the build pipeline to verify instead of update (the extraction pipeline isn't affected).
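As a minimal sketch of that switch, a build script might branch on the environment variable like this (the variable name `VERIFY_DOCS` and the helper name are assumptions for illustration, not the actual names used):

```python
# Hypothetical sketch of the update-vs-verify switch in a build script.
# The environment variable name VERIFY_DOCS is an assumption.
import os
import sys

def write_or_verify(path: str, generated: str) -> None:
    if os.environ.get("VERIFY_DOCS") == "1":
        # Verify mode: exit with a non-zero status if the file on disk is stale.
        with open(path) as f:
            if f.read() != generated:
                sys.exit(f"{path} is out of date; run ./scripts/generate-docs.sh")
    else:
        # Update mode: overwrite the markdown file on disk.
        with open(path, "w") as f:
            f.write(generated)
```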
Extraction is performed by running the extract_*_docs.py scripts from each supported debugger. Each extraction script operates in three phases. First, it collects all relevant pwndbg-defined objects using its extract_[commands/params/functions]() function. Next, it cleans up that data and packages it into a specialized dataclass (e.g. ExtractedParam) using its distill_sources() function. Finally, the dataclasses are converted to dictionaries and saved to a JSON file (one file per debugger per documentation type; with 2 debuggers and 3 types, that is 6 JSON files in total). Using a dataclass as an intermediate step ensures all our JSON files are well-formed.
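For illustration, the extract → distill → serialize flow looks roughly like this; the fields on ExtractedParam and the exact signatures are assumptions, not the real definitions:

```python
# Minimal sketch of the extraction pipeline's three phases. Field names
# and helper signatures are illustrative assumptions.
import dataclasses
import json

@dataclasses.dataclass
class ExtractedParam:
    name: str
    default: str
    docstring: str

def distill_sources(raw_params) -> list[ExtractedParam]:
    # Phase 2: clean up raw debugger objects into plain, serializable records.
    return [
        ExtractedParam(name=p.name, default=str(p.value), docstring=p.__doc__ or "")
        for p in raw_params
    ]

def save(params: list[ExtractedParam], path: str) -> None:
    # Phase 3: dataclasses.asdict guarantees a fixed, well-formed JSON shape.
    with open(path, "w") as f:
        json.dump([dataclasses.asdict(p) for p in params], f, indent=2)
```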
Building is performed by running the build_*_docs.py scripts. Each of them runs only once, as a normal Python script, not once per debugger. At the start, the JSON files are read and loaded into the specialized dataclass type. All the data is combined and checked for inconsistencies between the debuggers. Then, markdown files are generated from that data. If the build scripts are operating in update mode, they overwrite the markdown files on disk; if they are operating in verify mode, they exit with a non-zero status when the contents of the files on disk don't match the markdown the script generated. One exception to this rule: the command documentation files have a special section that allows hand-written documentation, which appears only on the website and not in any debugging session.
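The cross-debugger merge could look something like the following sketch; the JSON layout and names here are assumptions chosen to show the consistency check, not the actual schema:

```python
# Hypothetical sketch of combining per-debugger JSON files and flagging
# inconsistencies. The "name"/"description" schema is an assumption.
import json

def load(path: str) -> dict[str, str]:
    # Map object name -> description, as extracted by one debugger.
    with open(path) as f:
        return {entry["name"]: entry["description"] for entry in json.load(f)}

def combine(paths: list[str]) -> dict[str, str]:
    combined: dict[str, str] = {}
    for path in paths:
        for name, description in load(path).items():
            if name in combined and combined[name] != description:
                raise ValueError(f"debuggers disagree about {name!r}")
            combined[name] = description
    return combined
```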
The code isn't well-optimized, but with the current architecture the function of each part of the pipeline should be relatively easy to understand.