# How the documentation is generated

## Overview
To reduce maintenance burden, most of the documentation is generated automatically by extracting values from the source. This is done dynamically, i.e. it requires pwndbg to be loaded, and for pwndbg to initialize properly it must be loaded in the context of some debugger. However, pwndbg's state can depend heavily on which debugger is being used. Importantly, some debuggers may not see all commands, configuration options, or convenience functions, and in principle no single debugger is guaranteed to see everything (although at the time of writing GDB does in fact see everything). Some discussion on this topic can be found in issue #2955.
To get around this, we run pwndbg in every debugger we support and extract all the information that debugger sees. We then run scripts that combine the extracted information and build the documentation markdown files from it. These scripts don't need pwndbg to be loaded, and they aren't run in the context of any debugger, but rather as standalone Python scripts.
## Architecture
./scripts/generate-docs.sh works by invoking a script that extracts all doc-relevant information and a script that builds the markdown from that information. The ./scripts/verify-docs.sh script differs only in that it sets an environment variable telling the build pipeline to verify instead of update (the extraction pipeline isn't affected).
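As a minimal sketch of that switch, a build script might branch on the environment variable like this (the variable name `VERIFY_DOCS` and the helper name are assumptions for illustration, not the actual names used):

```python
# Hypothetical sketch of the update-vs-verify switch in a build script.
# The environment variable name VERIFY_DOCS is an assumption.
import os
import sys

def write_or_verify(path: str, generated: str) -> None:
    if os.environ.get("VERIFY_DOCS") == "1":
        # Verify mode: exit with a non-zero status if the file on disk is stale.
        with open(path) as f:
            if f.read() != generated:
                sys.exit(f"{path} is out of date; run ./scripts/generate-docs.sh")
    else:
        # Update mode: overwrite the markdown file on disk.
        with open(path, "w") as f:
            f.write(generated)
```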
Extraction is performed by running the extract_*_docs.py scripts from each supported debugger. Each extraction script operates in three phases. First, it collects all relevant pwndbg-defined objects using its extract_[commands/params/functions]() function. Next, it cleans up that data and packages it into a specialized dataclass (e.g. ExtractedParam) using its distill_sources() function. Finally, the dataclasses are converted to dictionaries and saved to a JSON file (one file per debugger per documentation type; with 2 debuggers and 3 types, that is 6 JSON files in total). Using a dataclass as an intermediate step ensures all our JSON files are well-formed.
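For illustration, the extract → distill → serialize flow looks roughly like this; the fields on ExtractedParam and the exact signatures are assumptions, not the real definitions:

```python
# Minimal sketch of the extraction pipeline's three phases. Field names
# and helper signatures are illustrative assumptions.
import dataclasses
import json

@dataclasses.dataclass
class ExtractedParam:
    name: str
    default: str
    docstring: str

def distill_sources(raw_params) -> list[ExtractedParam]:
    # Phase 2: clean up raw debugger objects into plain, serializable records.
    return [
        ExtractedParam(name=p.name, default=str(p.value), docstring=p.__doc__ or "")
        for p in raw_params
    ]

def save(params: list[ExtractedParam], path: str) -> None:
    # Phase 3: dataclasses.asdict guarantees a fixed, well-formed JSON shape.
    with open(path, "w") as f:
        json.dump([dataclasses.asdict(p) for p in params], f, indent=2)
```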
Building is performed by running the build_*_docs.py scripts. Each of them runs only once, as a normal Python script, not once per debugger. At the start, the JSON files are read and loaded into the specialized dataclass type. All the data is combined and checked for inconsistencies between the debuggers. Then, markdown files are generated from that data. If the build scripts are operating in update mode, they overwrite the markdown files on disk; if they are operating in verify mode, they exit with a non-zero status when the contents of the files on disk don't match the markdown the script generated. One exception to this rule: the command documentation files have a special section that allows hand-written documentation, which appears only on the website and not in any debugging session.
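The cross-debugger merge could look something like the following sketch; the JSON layout and names here are assumptions chosen to show the consistency check, not the actual schema:

```python
# Hypothetical sketch of combining per-debugger JSON files and flagging
# inconsistencies. The "name"/"description" schema is an assumption.
import json

def load(path: str) -> dict[str, str]:
    # Map object name -> description, as extracted by one debugger.
    with open(path) as f:
        return {entry["name"]: entry["description"] for entry in json.load(f)}

def combine(paths: list[str]) -> dict[str, str]:
    combined: dict[str, str] = {}
    for path in paths:
        for name, description in load(path).items():
            if name in combined and combined[name] != description:
                raise ValueError(f"debuggers disagree about {name!r}")
            combined[name] = description
    return combined
```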
The code isn't well-optimized, but with the current architecture the function of each part of the pipeline should be relatively easy to understand.