Tidy up some development documentation (#3004)

* fixup some markdown issues in improving-annotations.md

* tidy up dev notes

* Update docs/contributing/improving-annotations.md

---------

Co-authored-by: Disconnect3d <dominik.b.czarnota@gmail.com>

# Developer Notes
## Triggers
TODO: If we want to do something when the user changes a config/theme value - we can do it by defining a function and decorating it
with `pwndbg.config.Trigger`.
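The note above only names the decorator; the sketch below illustrates the general config-trigger pattern. All names here are hypothetical and do not reflect pwndbg's actual `pwndbg.config.Trigger` implementation.

```python
# Hypothetical sketch of a config-trigger mechanism: functions registered
# for a parameter name run whenever that parameter's value changes.
from typing import Callable, Dict, List


class Config:
    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self._triggers: Dict[str, List[Callable[[], None]]] = {}

    def trigger(self, name: str):
        """Decorator: run the decorated function whenever `name` changes."""
        def register(fn: Callable[[], None]) -> Callable[[], None]:
            self._triggers.setdefault(name, []).append(fn)
            return fn
        return register

    def set(self, name: str, value: object) -> None:
        self._values[name] = value
        for fn in self._triggers.get(name, []):
            fn()


config = Config()
fired: List[object] = []


@config.trigger("theme-color")
def on_theme_change() -> None:
    # Record the new value; a real trigger might redraw the context panes.
    fired.append(config._values["theme-color"])


config.set("theme-color", "red")
```

Setting `theme-color` fires every function registered for it, which is the behavior the TODO describes.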
## Random developer notes
Feel free to update the list below!
* Use `aglib` instead of `gdblib`, as the latter is [in the process of being removed](https://github.com/pwndbg/pwndbg/issues/2489). Both modules should have nearly identical interfaces, so doing this should be a matter of typing `pwndbg.aglib.X` instead of `pwndbg.gdblib.X`. Ideally, an issue should be opened if there is any functionality present in `gdblib` that's missing from `aglib`.
* We have our own `pwndbg.config.Parameter` - read about it in [Adding a Configuration Option](adding-a-parameter.md).
* The dashboard/display/context we are displaying is done by `pwndbg/commands/context.py` which is invoked through GDB's and LLDB's prompt hook, which are defined, respectively, in `pwndbg/gdblib/prompt.py` as `prompt_hook_on_stop`, and in `pwndbg/dbg/lldb/hooks.py` as `prompt_hook`.
* We change a few GDB settings - this can be seen in `pwndbg/dbg/gdb.py` under `GDB.setup` - there are also imports for all pwndbg submodules.
* pwndbg has its own event system, and thanks to it we can set up code to be invoked in response to debugger events. The event types and the conditions in which they occur are defined and documented in the `EventType` enum, and functions are registered to be called on events with the `@pwndbg.dbg.event_handler` decorator. Both the enum and the decorator are documented in `pwndbg/dbg/__init__.py`.
* We have a caching mechanism (["memoization"](https://en.wikipedia.org/wiki/Memoization)) which we use through Python's decorators - those are defined in `pwndbg/lib/cache.py` - just check its usages
## Support for Multiple Debuggers
Pwndbg is a tool that supports multiple debuggers, and so using debugger-specific functionality
outside of `pwndbg.dbg.X` is generally discouraged, with one important caveat that we will get into
later. When adding code to pwndbg, one must be careful with the functionality being used.
### The Debugger API
Our support for multiple debuggers is primarily achieved through use of the Debugger API, found
under `pwndbg/dbg/`, which defines a terse set of debugging primitives that can then be built upon
by the rest of pwndbg. It comprises two parts: the interface, and the implementations. The interface
contains the abstract classes and the types that lay out the "shape" of the functionality that may
be used by the rest of pwndbg, and the implementations, well, _implement_ the interface on top of each
supported debugger.
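The interface/implementation split described above can be pictured with a toy example. The class and method names below are invented for illustration; the real abstract classes live in `pwndbg/dbg/__init__.py` and the real implementations under `pwndbg/dbg/`.

```python
# Toy version of the Debugger API split: one abstract interface, one
# implementation per debugger. Names here are illustrative only.
from abc import ABC, abstractmethod


class Debugger(ABC):
    @abstractmethod
    def read_memory(self, address: int, size: int) -> bytes:
        """Debugger-agnostic primitive used by the rest of pwndbg."""


class FakeGDB(Debugger):
    """One per-debugger implementation of the same interface."""

    def read_memory(self, address: int, size: int) -> bytes:
        # A real implementation would call into the GDB Python API here.
        return b"\x00" * size


# Code elsewhere in pwndbg only sees the interface type.
dbg: Debugger = FakeGDB()
data = dbg.read_memory(0x1000, 4)
```

Because callers only depend on the abstract `Debugger` type, swapping in an LLDB-backed implementation requires no changes to them.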
As a matter of clarity, it makes sense to think of the Debugger API as a debugger-agnostic version
functionality that is both too broad for a single command, and that can be shared between
debuggers. Things like QEMU handling, ELF and dynamic section parsing, operating system functionality,
disassembly with capstone, heap analysis, and more, all belong in `aglib`.
In order to facilitate the process of porting pwndbg to the debugger-agnostic interfaces, and also
because of its historical roots, `aglib` is intended to export the exact same functionality provided
by `gdblib`, but on top of a debugger-agnostic foundation.
If it helps, one may think of `aglib` like a `pwndbglib`. It takes the debugging primitives provided
by the Debugger API and builds the more complex and interesting bits of functionality found in
pwndbg on top of them.
### Mappings from GDB and LLDB to the Debugger API
Here are some things one may want to do, along with how they can be achieved in the GDB, LLDB, and
pwndbg Debugger APIs.
=== "GDB"

    Setting a breakpoint at an address:

    ```python
    gdb.Breakpoint("*<address>")
    ```

    Querying for the address of a symbol:

    ```python
    int(gdb.lookup_symbol(<name>).value().address)
    ```

    Setting a watchpoint at an address:

    ```python
    gdb.Breakpoint(f"(char[{<size>}])*{<address>}", gdb.BP_WATCHPOINT)
    ```

=== "LLDB"

    Setting a breakpoint at an address:

    ```python
    lldb.target.BreakpointCreateByAddress(<address>)
    ```

    Querying for the address of a symbol:

    ```python
    lldb.target.FindSymbols(<name>).GetContextAtIndex(0).symbol.GetStartAddress().GetLoadAddress(lldb.target)
    ```

    Setting a watchpoint at an address:

    ```python
    lldb.target.WatchAddress(<address>, <size>, ...)
    ```

=== "Debugger API"

    ```python
    # Fetch a Process object on which we will operate.
    inf = pwndbg.dbg.selected_inferior()
    ```

    Setting a breakpoint at an address:

    ```python
    inf.break_at(BreakpointLocation(<address>))
    ```

    Querying for the address of a symbol:

    ```python
    inf.lookup_symbol(<name>)
    ```

    Setting a watchpoint at an address:

    ```python
    inf.break_at(WatchpointLocation(<address>, <size>))
    ```
### Exception to use of Debugger-agnostic interfaces
it is generally okay to talk to the debugger directly. However, they must be properly marked as
debugger-specific and their loading must be properly gated off behind the correct debugger. They
should ideally be placed in a separate location from the rest of the commands in `pwndbg/commands/`.
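One simple way to gate loading on the active debugger, sketched here as an assumption rather than pwndbg's actual mechanism, is to key off whether the debugger's scripting module can be imported at all: the `gdb` module only exists inside a GDB process.

```python
# Sketch of gating debugger-specific code. The `gdb` Python module is only
# importable when running inside GDB, so the import doubles as a capability
# check. pwndbg's real gating mechanism may differ from this.
try:
    import gdb  # noqa: F401  (only present inside a GDB process)
    RUNNING_UNDER_GDB = True
except ImportError:
    RUNNING_UNDER_GDB = False

if RUNNING_UNDER_GDB:
    # Safe to register GDB-only commands here.
    pass
```

Outside of GDB the import fails and the GDB-only registration block is simply skipped.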
## Porting public tools
If porting a public tool to pwndbg, please make a point of crediting the original author. This can be added to [CREDITS.md](https://github.com/pwndbg/pwndbg/blob/dev/CREDITS.md) noting the original author/inspiration, and linking to the original tool/article. Also please be sure that the license of the original tool is suitable to porting into pwndbg, such as MIT.
## Minimum Supported Versions
Our goal is to fully support all Ubuntu LTS releases that have not reached end-of-life, with support for other

## QEMU Tests
Our `gdb-tests` run in x86. To debug other architectures, we use QEMU for emulation and attach to its debug
port. These tests are located in [`tests/qemu-tests/tests/user`](https://github.com/pwndbg/pwndbg/tree/dev/tests/qemu-tests/tests/user). Test creation is
identical to our x86 tests - create a Python function with a Pytest fixture name as the parameter (it matches
based on the name), and call the argument to start debugging a binary. The `qemu_assembly_run` fixture takes in

```python
    # ...
    enum_sequence=[DISABLED, DISABLED_DEADLOCK, ENABLED],
)
```
To understand it, let's also look at the signature of the `Config.add_param` function defined in `pwndbg/lib/config.py`:
```python
def add_param(
    self,
    # ...
```
It is therefore recommended to use a noun phrase rather than describe an action.
The `set_show_doc` argument should be short because it is displayed with the `config` family of commands.
```text
pwndbg> config
Name Documentation Value (Default)
----------------------------------------------------------------------------------------------------------------------------
ai-anthropic-api-key Anthropic API key ''
ai-history-size maximum number of questions and answers to keep in the prompt 3
ai-max-tokens the maximum number of tokens to return in the response 100
ai-model the name of the large language model to query 'gpt-3.5-turbo'
ai-ollama-endpoint Ollama API endpoint ''
ai-openai-api-key OpenAI API key ''
ai-show-usage whether to show how many tokens are used with each OpenAI API call off
ai-stack-depth rows of stack context to include in the prompt for the ai command 16
ai-temperature the temperature specification for the LLM query 0
attachp-resolution-method how to determine the process to attach when multiple candidates exists 'ask'
auto-explore-auxv stack exploration for AUXV information; it may be really slow 'warn'
auto-explore-pages whether to try to infer page permissions when memory maps are missing 'warn'
auto-explore-stack stack exploration; it may be really slow 'warn'
auto-save-search automatically pass --save to "search" command off
bn-autosync whether to automatically run bn-sync every step off
[...]
```
Because of the various contexts in which a parameter can be shown, the first letter of the `set_show_doc` string should be lowercase (unless the first word is a name or an abbreviation) and there should be no punctuation at the end. This way, pwndbg and GDB can more easily modify the string to fit it into these contexts.
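The two conventions above are mechanical enough to check automatically. The helper below is a hypothetical illustration, not part of pwndbg.

```python
# Illustrative checker for the set_show_doc conventions described above:
# lowercase first letter and no trailing punctuation. Hypothetical helper,
# not part of pwndbg.
def check_set_show_doc(doc: str) -> list:
    problems = []
    if doc and doc[0].isalpha() and doc[0].isupper():
        # Uppercase is allowed when the first word is a name or an
        # abbreviation (e.g. "OpenAI API key"), so this check is advisory.
        problems.append("first letter is uppercase")
    if doc and doc[-1] in ".!?":
        problems.append("ends with punctuation")
    return problems
```

For example, `check_set_show_doc("rows of stack context to include in the prompt for the ai command")` reports no problems, while a docstring ending in a period is flagged.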
## help_docstring
While `help_docstring` is not mandatory, it is highly recommended to use it. Put a detailed explanation of what the parameter does here, and explain any caveats. This string does not have a size limit and is shown with the following command in GDB:
```text
pwndbg> help set gdb-workaround-stop-event
Set asynchronous stop events to improve 'commands' functionality.
Note that this may cause unexpected behavior with pwndbg or gdb.execute.
Values explained:
+ `disabled-deadlock` - Disable only deadlock detection; deadlocks may still occur.
+ `enabled` - Enable asynchronous stop events; gdb.execute may behave unexpectedly (asynchronously).
Default: 'disabled'
Valid values: 'disabled', 'disabled-deadlock', 'enabled'
```
Note that the last two lines are automatically generated by pwndbg.
@ -96,7 +97,7 @@ Note that the last two lines are automatically generated by pwndbg.
When writing this explanation, it is important to take into account how it will be displayed [in the documentation](https://pwndbg.re/pwndbg/dev/configuration/) after being parsed as markdown. See what `gdb-workaround-stop-event` looks like here: https://pwndbg.re/pwndbg/dev/configuration/config/#gdb-workaround-stop-event. If there were no empty line between `Values explained:` and ``+ `disabled`..``, the list would not have rendered properly.
## param_class
This argument describes the type of the parameter. It will be used by GDB to perform input validation when the parameter is being set, so it is important to set this to the correct value. The possible values are defined in `pwndbg/lib/config.py`; use the most restrictive one that fits:
```python
# Boolean value. True or False, same as in Python.
PARAM_BOOLEAN = 0
# ...
```

```python
def fmt_debug(self, val: str, default: str = "") -> str:
    # ...
    else:
        return default
```
Though you will also see `generateColorFunction(debug_color)(val)` being used in the code to the same effect.
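To make the idea concrete, here is a generic sketch of what a color-wrapping function does, using plain ANSI escape codes. This is an illustration of the pattern, not pwndbg's `generateColorFunction` implementation, which resolves the color from a theme parameter.

```python
# Generic sketch: turn a color name into a function that wraps text in the
# corresponding ANSI escape sequence, falling back to plain text when the
# color is empty/unknown (like an unset theme parameter).
ANSI_CODES = {"red": "31", "green": "32", "none": ""}


def generate_color_function(color: str):
    code = ANSI_CODES.get(color, "")
    if not code:
        return lambda text: text  # unstyled fallback

    def apply(text: str) -> str:
        return f"\x1b[{code}m{text}\x1b[0m"

    return apply


colored = generate_color_function("red")("debug")
```

Calling `generate_color_function("red")` once and reusing the returned function mirrors how a formatting helper like `fmt_debug` applies the configured color to each value.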

When possible, our code aims to use emulation as little as possible.
```
1. lea ...
2. mov rsi, rax
3. mov rax, rsi
```
Instruction 1, the `lea` instruction, is already in the past - we pull our enhanced PwndbgInstruction for it from a cache.
Instruction 2, the first `mov` instruction, is where the host process program counter is at. If we did `stepi` in GDB, this instruction would be executed. In this case, there are two ways we can determine the value that gets written to `rsi`.
1. After stepping the emulator, read from the emulator's `rsi` register.
2. Given the context of the instruction, we know the value in `rsi` will come from `rax`. We can just read the `rax` register from the host. This avoids emulation.
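The choice between the two options above can be sketched as follows. The function and variable names are hypothetical; the real decision lives in the annotation handlers under `pwndbg/aglib/disasm/`.

```python
# Sketch of the "read from host vs. read from emulator" decision for the
# source register of an instruction being annotated. Names are hypothetical.
def resolve_before_value(instruction_address: int, pc: int,
                         host_regs: dict, emu_regs: dict, reg: str) -> int:
    if instruction_address == pc:
        # The instruction is about to execute: no instruction in between
        # could have mutated the register, so the host value is the true
        # value it will have. This avoids emulation.
        return host_regs[reg]
    # A future instruction: only the emulator can tell us what the register
    # will hold by the time this instruction executes.
    return emu_regs[reg]


host = {"rax": 0x1234}  # register file of the stopped host process
emu = {"rax": 0x9999}   # register file after stepping the emulator

current = resolve_before_value(0x1000, 0x1000, host, emu, "rax")  # host read
future = resolve_before_value(0x1004, 0x1000, host, emu, "rax")   # emulated
```

The first call models annotating the instruction at the program counter, the second models annotating an instruction further ahead.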
The decision on which option to take is implemented in the relevant annotation handler.
The reason we could do the second option, in this case, is because we could reason about the process state at the time this instruction would execute. This instruction is about to be executed (`Program PC == instruction.address`). We can safely read from `rax` from the host, knowing that the value we get is the true value it takes on when the instruction will execute. It must - there are no instructions in-between that could have mutated `rax`.
However, this will not be the case while enhancing instruction 3 while we are paused at instruction 2. This instruction is in the future, and without emulation, we cannot safely reason about the operands in question. It is reading from `rsi`, which might be mutated from the current value that `rsi` has in the stopped process (and in this case, we happen to know that it will be mutated). We must use emulation to determine the `before_value` of `rsi` in this case, and can't just read from the host process's register set. This principle applies in general - future instructions must be emulated to be fully annotated. When emulation is disabled, the annotations are not as detailed since we can't fully reason about process state for future instructions.
## What if the emulator fails?
When encountering an instruction that is behaving strangely (incorrect annotation, a jump target when one shouldn't exist, or an incorrect target), there are a couple of routine things to check.
1\. Use the `dev-dump-instruction` command to print all the enhancement information. With no arguments, it will dump the info from the instruction at the current address. If given an address, it will pull from the instruction cache at the corresponding location.
If the issue is not related to branches, check the operands and the resolved values for registers and memory accesses. Verify that the values are correct - are the resolved memory locations correct? Step past the instruction and use commands like `telescope` and `regs` to read memory and verify that the claim the annotation is making is correct. For things like memory operands, you can try to look around the resolved memory location in memory to see the actual value that the instruction dereferenced, and see if the resolved memory location is simply off by a couple of bytes.
```text
mov qword ptr [rsp], rsi at 0x55555555706c (size=4) (arch: x86)
[...]
Call-like: False
```
2\. Use the Capstone disassembler to verify the number of operands and the instruction groups.
Take the raw instruction bytes and pass them to `cstool` to see the information that we are working with:
```
cstool -d mips 0x0400000c
```
The number of operands may not match the visual appearance. You might also check the instruction groups, and verify that an instruction that we might consider a `call` has the Capstone `call` group. Capstone is not 100% correct in every single case in all architectures, so it's good to verify. Report a bug to Capstone if there appears to be an error, and in the meantime we can create a fix in pwndbg to work around the current behavior.
3\. Check the state of the emulator.
Go to [pwndbg/emu/emulator.py](https://github.com/pwndbg/pwndbg/tree/dev/pwndbg/emu/emulator.py) and uncomment the `DEBUG = -1` line. This will enable verbose debug printing. The emulator will print its current `pc` at every step, and indicate important events, like memory mappings. Likewise, in [pwndbg/aglib/disasm/arch.py](https://github.com/pwndbg/pwndbg/tree/dev/pwndbg/aglib/disasm/arch.py) you can set `DEBUG_ENHANCEMENT = True` to print register accesses to verify they are sane values.
Potential bugs:
- A register is 0 (may also be the source of a Unicorn segfault if used as a memory operand) - this often means we are not copying the host process's register into the emulator. By default, we map registers by name - if pwndbg calls it `rax`, then we find the UC constant named `U.x86_const.UC_X86_REG_RAX`. Sometimes this default mapping doesn't work, for example due to differences in underscores (`FSBASE` vs `FS_BASE`). In these cases, we have to manually add the mapping.
- Unexpected crash - the instruction at hand might require a 'coprocessor', or some information that is unavailable to Unicorn (it's QEMU under the hood).
- Instructions are just not executing - we've seen this in the case of Arm Thumb instructions. There might be some specific API/way to invoke the emulator that is required for a certain processor state.
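The name-based register mapping with manual overrides, as described in the first bullet, might look like the sketch below. The constant values are placeholders; real code looks the constants up in Unicorn's `unicorn.x86_const` (and friends) rather than in a hand-written dictionary.

```python
# Sketch of mapping pwndbg register names to Unicorn constants by name,
# with manual overrides for spelling mismatches (e.g. FSBASE vs FS_BASE).
# The constant table is stubbed with placeholder values for illustration.
UC_CONSTANTS = {
    "UC_X86_REG_RAX": 35,       # placeholder value
    "UC_X86_REG_FS_BASE": 268,  # placeholder value
}

# Registers whose pwndbg name does not line up with the UC constant name.
MANUAL_OVERRIDES = {
    "fsbase": "UC_X86_REG_FS_BASE",
}


def uc_constant_for(reg: str) -> int:
    # Default mapping: derive the constant name from the register name.
    name = MANUAL_OVERRIDES.get(reg, f"UC_X86_REG_{reg.upper()}")
    return UC_CONSTANTS[name]
```

`uc_constant_for("rax")` resolves via the default naming rule, while `uc_constant_for("fsbase")` only works because of the explicit override, which is exactly the failure mode the bullet warns about.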
