Tidy up some development documentation (#3004)

* fixup some markdown issues in improving-annotations.md

* tidy up dev notes

* Update docs/contributing/improving-annotations.md

---------

Co-authored-by: Disconnect3d <dominik.b.czarnota@gmail.com>

# Developer Notes
## Triggers
TODO: If we want to do something when the user changes a config/theme value - we can do it by defining a function and decorating it
with `pwndbg.config.Trigger`.
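The note above only names the decorator; the sketch below illustrates the general config-trigger pattern. All names here are hypothetical and do not reflect pwndbg's actual `pwndbg.config.Trigger` implementation.

```python
# Hypothetical sketch of a config-trigger mechanism: functions registered
# for a parameter name run whenever that parameter's value changes.
from typing import Callable, Dict, List


class Config:
    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self._triggers: Dict[str, List[Callable[[], None]]] = {}

    def trigger(self, name: str):
        """Decorator: run the decorated function whenever `name` changes."""
        def register(fn: Callable[[], None]) -> Callable[[], None]:
            self._triggers.setdefault(name, []).append(fn)
            return fn
        return register

    def set(self, name: str, value: object) -> None:
        self._values[name] = value
        for fn in self._triggers.get(name, []):
            fn()


config = Config()
fired: List[object] = []


@config.trigger("theme-color")
def on_theme_change() -> None:
    # Record the new value; a real trigger might redraw the context panes.
    fired.append(config._values["theme-color"])


config.set("theme-color", "red")
```

Setting `theme-color` fires every function registered for it, which is the behavior the TODO describes.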
## Random developer notes
Feel free to update the list below!
* Use `aglib` instead of `gdblib`, as the latter is [in the process of being removed](https://github.com/pwndbg/pwndbg/issues/2489). Both modules should have nearly identical interfaces, so doing this should be a matter of typing `pwndbg.aglib.X` instead of `pwndbg.gdblib.X`. Ideally, an issue should be opened if there is any functionality present in `gdblib` that's missing from `aglib`.
* We have our own `pwndbg.config.Parameter` - read about it in [Adding a Configuration Option](adding-a-parameter.md).
* The dashboard/display/context we are displaying is done by `pwndbg/commands/context.py` which is invoked through GDB's and LLDB's prompt hook, which are defined, respectively, in `pwndbg/gdblib/prompt.py` as `prompt_hook_on_stop`, and in `pwndbg/dbg/lldb/hooks.py` as `prompt_hook`.
* We change a few GDB settings - this can be seen in `pwndbg/dbg/gdb.py` under `GDB.setup` - there are also imports for all pwndbg submodules.
* pwndbg has its own event system, and thanks to it we can set up code to be invoked in response to debugger events. The event types and the conditions in which they occur are defined and documented in the `EventType` enum, and functions are registered to be called on events with the `@pwndbg.dbg.event_handler` decorator. Both the enum and the decorator are documented in `pwndbg/dbg/__init__.py`.
* We have a caching mechanism (["memoization"](https://en.wikipedia.org/wiki/Memoization)) which we use through Python's decorators - those are defined in `pwndbg/lib/cache.py` - just check its usages
## Support for Multiple Debuggers
Pwndbg is a tool that supports multiple debuggers, and so using debugger-specific functionality
outside of `pwndbg.dbg.X` is generally discouraged, with one important caveat that we will get into
later. When adding code to pwndbg, one must be careful with the functionality being used.
### The Debugger API
Our support for multiple debuggers is primarily achieved through use of the Debugger API, found
under `pwndbg/dbg/`, which defines a terse set of debugging primitives that can then be built upon
by the rest of pwndbg. It comprises two parts: the interface, and the implementations. The interface
contains the abstract classes and the types that lay out the "shape" of the functionality that may
be used by the rest of pwndbg, and the implementations, well, _implement_ the interface on top of each
supported debugger.
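The interface/implementation split described above can be pictured with a toy example. The class and method names below are invented for illustration; the real abstract classes live in `pwndbg/dbg/__init__.py` and the real implementations under `pwndbg/dbg/`.

```python
# Toy version of the Debugger API split: one abstract interface, one
# implementation per debugger. Names here are illustrative only.
from abc import ABC, abstractmethod


class Debugger(ABC):
    @abstractmethod
    def read_memory(self, address: int, size: int) -> bytes:
        """Debugger-agnostic primitive used by the rest of pwndbg."""


class FakeGDB(Debugger):
    """One per-debugger implementation of the same interface."""

    def read_memory(self, address: int, size: int) -> bytes:
        # A real implementation would call into the GDB Python API here.
        return b"\x00" * size


# Code elsewhere in pwndbg only sees the interface type.
dbg: Debugger = FakeGDB()
data = dbg.read_memory(0x1000, 4)
```

Because callers only depend on the abstract `Debugger` type, swapping in an LLDB-backed implementation requires no changes to them.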
As a matter of clarity, it makes sense to think of the Debugger API as a debugger-agnostic version
functionality that is both too broad for a single command, and that can be shared between
debuggers. Things like QEMU handling, ELF and dynamic section parsing, operating system functionality,
disassembly with capstone, heap analysis, and more, all belong in `aglib`.
In order to facilitate the process of porting pwndbg to the debugger-agnostic interfaces, and also
because of its historical roots, `aglib` is intended to export the exact same functionality provided
by `gdblib`, but on top of a debugger-agnostic foundation.
If it helps, one may think of `aglib` like a `pwndbglib`. It takes the debugging primitives provided
by the Debugger API and builds the more complex and interesting bits of functionality found in
pwndbg on top of them.
### Mappings from GDB and LLDB to the Debugger API
Here are some things one may want to do, along with how they can be achieved in the GDB, LLDB, and
pwndbg Debugger APIs.
=== "GDB"

    Setting a breakpoint at an address:

    ```python
    gdb.Breakpoint("*<address>")
    ```

    Querying for the address of a symbol:

    ```python
    int(gdb.lookup_symbol(<name>).value().address)
    ```

    Setting a watchpoint at an address:

    ```python
    gdb.Breakpoint(f"(char[{<size>}])*{<address>}", gdb.BP_WATCHPOINT)
    ```

=== "LLDB"

    Setting a breakpoint at an address:

    ```python
    lldb.target.BreakpointCreateByAddress(<address>)
    ```

    Querying for the address of a symbol:

    ```python
    lldb.target.FindSymbols(<name>).GetContextAtIndex(0).symbol.GetStartAddress().GetLoadAddress(lldb.target)
    ```

    Setting a watchpoint at an address:

    ```python
    lldb.target.WatchAddress(<address>, <size>, ...)
    ```

=== "Debugger API"

    ```python
    # Fetch a Process object on which we will operate.
    inf = pwndbg.dbg.selected_inferior()
    ```

    Setting a breakpoint at an address:

    ```python
    inf.break_at(BreakpointLocation(<address>))
    ```

    Querying for the address of a symbol:

    ```python
    inf.lookup_symbol(<name>)
    ```

    Setting a watchpoint at an address:

    ```python
    inf.break_at(WatchpointLocation(<address>, <size>))
    ```
### Exception to use of Debugger-agnostic interfaces
it is generally okay to talk to the debugger directly. However, they must be properly marked as
debugger-specific and their loading must be properly gated off behind the correct debugger. They
should ideally be placed in a separate location from the rest of the commands in `pwndbg/commands/`.
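One simple way to gate loading on the active debugger, sketched here as an assumption rather than pwndbg's actual mechanism, is to key off whether the debugger's scripting module can be imported at all: the `gdb` module only exists inside a GDB process.

```python
# Sketch of gating debugger-specific code. The `gdb` Python module is only
# importable when running inside GDB, so the import doubles as a capability
# check. pwndbg's real gating mechanism may differ from this.
try:
    import gdb  # noqa: F401  (only present inside a GDB process)
    RUNNING_UNDER_GDB = True
except ImportError:
    RUNNING_UNDER_GDB = False

if RUNNING_UNDER_GDB:
    # Safe to register GDB-only commands here.
    pass
```

Outside of GDB the import fails and the GDB-only registration block is simply skipped.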
## Porting public tools
If porting a public tool to pwndbg, please make a point of crediting the original author. This can be added to [CREDITS.md](https://github.com/pwndbg/pwndbg/blob/dev/CREDITS.md) noting the original author/inspiration, and linking to the original tool/article. Also please be sure that the license of the original tool is suitable to porting into pwndbg, such as MIT.
## Minimum Supported Versions
Our goal is to fully support all Ubuntu LTS releases that have not reached end-of-life, with support for other

## QEMU Tests
Our `gdb-tests` run in x86. To debug other architectures, we use QEMU for emulation and attach to its debug
port. These tests are located in [`tests/qemu-tests/tests/user`](https://github.com/pwndbg/pwndbg/tree/dev/tests/qemu-tests/tests/user). Test creation is
identical to our x86 tests - create a Python function with a Pytest fixture name as the parameter (it matches
based on the name), and call the argument to start debugging a binary. The `qemu_assembly_run` fixture takes in

```python
    # ...
    enum_sequence=[DISABLED, DISABLED_DEADLOCK, ENABLED],
)
```
To understand it, let's also look at the signature of the `Config.add_param` function defined in `pwndbg/lib/config.py`:
```python
def add_param(
    self,
    # ...
```
It is therefore recommended to use a noun phrase rather than describe an action.
The `set_show_doc` argument should be short because it is displayed with the `config` family of commands.
```text
pwndbg> config
Name Documentation Value (Default)
----------------------------------------------------------------------------------------------------------------------------
ai-anthropic-api-key Anthropic API key ''
ai-history-size maximum number of questions and answers to keep in the prompt 3
ai-max-tokens the maximum number of tokens to return in the response 100
ai-model the name of the large language model to query 'gpt-3.5-turbo'
ai-ollama-endpoint Ollama API endpoint ''
ai-openai-api-key OpenAI API key ''
ai-show-usage whether to show how many tokens are used with each OpenAI API call off
ai-stack-depth rows of stack context to include in the prompt for the ai command 16
ai-temperature the temperature specification for the LLM query 0
attachp-resolution-method how to determine the process to attach when multiple candidates exists 'ask'
auto-explore-auxv stack exploration for AUXV information; it may be really slow 'warn'
auto-explore-pages whether to try to infer page permissions when memory maps are missing 'warn'
auto-explore-stack stack exploration; it may be really slow 'warn'
auto-save-search automatically pass --save to "search" command off
bn-autosync whether to automatically run bn-sync every step off
[...]
```
Because of the various contexts in which a parameter can be shown, the first letter of the `set_show_doc` string should be lowercase (unless the first word is a name or an abbreviation) and there should be no punctuation at the end. This way, pwndbg and GDB can more easily modify the string to fit it into these contexts.
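The two conventions above are mechanical enough to check automatically. The helper below is a hypothetical illustration, not part of pwndbg.

```python
# Illustrative checker for the set_show_doc conventions described above:
# lowercase first letter and no trailing punctuation. Hypothetical helper,
# not part of pwndbg.
def check_set_show_doc(doc: str) -> list:
    problems = []
    if doc and doc[0].isalpha() and doc[0].isupper():
        # Uppercase is allowed when the first word is a name or an
        # abbreviation (e.g. "OpenAI API key"), so this check is advisory.
        problems.append("first letter is uppercase")
    if doc and doc[-1] in ".!?":
        problems.append("ends with punctuation")
    return problems
```

For example, `check_set_show_doc("rows of stack context to include in the prompt for the ai command")` reports no problems, while a docstring ending in a period is flagged.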
## help_docstring
While `help_docstring` is not mandatory, it is highly recommended to use it. Put a detailed explanation of what the parameter does here, and explain any caveats. This string does not have a size limit and is shown with the following command in GDB:
```text
pwndbg> help set gdb-workaround-stop-event
Set asynchronous stop events to improve 'commands' functionality.
Note that this may cause unexpected behavior with pwndbg or gdb.execute.
Values explained:
+ `disabled-deadlock` - Disable only deadlock detection; deadlocks may still occur.
+ `enabled` - Enable asynchronous stop events; gdb.execute may behave unexpectedly (asynchronously).
Default: 'disabled'
Valid values: 'disabled', 'disabled-deadlock', 'enabled'
```
Note that the last two lines are automatically generated by pwndbg.
@ -96,7 +97,7 @@ Note that the last two lines are automatically generated by pwndbg.
When writing this explanation, it is important to take into account how it will be displayed [in the documentation](https://pwndbg.re/pwndbg/dev/configuration/) after being parsed as markdown. See what `gdb-workaround-stop-event` looks like here: https://pwndbg.re/pwndbg/dev/configuration/config/#gdb-workaround-stop-event. If there were no empty line between `Values explained:` and ``+ `disabled`..``, the list would not have rendered properly.
## param_class
This argument describes the type of the parameter. It will be used by GDB to perform input validation when the parameter is being set, so it is important to set this to the correct value. The possible values are defined in `pwndbg/lib/config.py`; use the most restrictive one that fits:
```python
# Boolean value. True or False, same as in Python.
PARAM_BOOLEAN = 0
# ...
```

```python
def fmt_debug(self, val: str, default: str = "") -> str:
    # ...
    else:
        return default
```
Though you will also see `generateColorFunction(debug_color)(val)` being used in the code to the same effect.
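To make the idea concrete, here is a generic sketch of what a color-wrapping function does, using plain ANSI escape codes. This is an illustration of the pattern, not pwndbg's `generateColorFunction` implementation, which resolves the color from a theme parameter.

```python
# Generic sketch: turn a color name into a function that wraps text in the
# corresponding ANSI escape sequence, falling back to plain text when the
# color is empty/unknown (like an unset theme parameter).
ANSI_CODES = {"red": "31", "green": "32", "none": ""}


def generate_color_function(color: str):
    code = ANSI_CODES.get(color, "")
    if not code:
        return lambda text: text  # unstyled fallback

    def apply(text: str) -> str:
        return f"\x1b[{code}m{text}\x1b[0m"

    return apply


colored = generate_color_function("red")("debug")
```

Calling `generate_color_function("red")` once and reusing the returned function mirrors how a formatting helper like `fmt_debug` applies the configured color to each value.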

When possible, our code aims to use emulation as little as possible.
```
1. lea ...
2. mov rsi, rax
3. mov rax, rsi
```
Instruction 1, the `lea` instruction, is already in the past - we pull our enhanced PwndbgInstruction for it from a cache.
Instruction 2, the first `mov` instruction, is where the host process program counter is at. If we did `stepi` in GDB, this instruction would be executed. In this case, there are two ways we can determine the value that gets written to `rsi`.
1. After stepping the emulator, read from the emulator's `rsi` register.
2. Given the context of the instruction, we know the value in `rsi` will come from `rax`. We can just read the `rax` register from the host. This avoids emulation.
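The choice between the two options above can be sketched as follows. The function and variable names are hypothetical; the real decision lives in the annotation handlers under `pwndbg/aglib/disasm/`.

```python
# Sketch of the "read from host vs. read from emulator" decision for the
# source register of an instruction being annotated. Names are hypothetical.
def resolve_before_value(instruction_address: int, pc: int,
                         host_regs: dict, emu_regs: dict, reg: str) -> int:
    if instruction_address == pc:
        # The instruction is about to execute: no instruction in between
        # could have mutated the register, so the host value is the true
        # value it will have. This avoids emulation.
        return host_regs[reg]
    # A future instruction: only the emulator can tell us what the register
    # will hold by the time this instruction executes.
    return emu_regs[reg]


host = {"rax": 0x1234}  # register file of the stopped host process
emu = {"rax": 0x9999}   # register file after stepping the emulator

current = resolve_before_value(0x1000, 0x1000, host, emu, "rax")  # host read
future = resolve_before_value(0x1004, 0x1000, host, emu, "rax")   # emulated
```

The first call models annotating the instruction at the program counter, the second models annotating an instruction further ahead.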
The decision on which option to take is implemented in the relevant annotation handler.
The reason we could do the second option, in this case, is because we could reason about the process state at the time this instruction would execute. This instruction is about to be executed (`Program PC == instruction.address`). We can safely read from `rax` from the host, knowing that the value we get is the true value it takes on when the instruction will execute. It must - there are no instructions in-between that could have mutated `rax`.
However, this will not be the case while enhancing instruction 3 while we are paused at instruction 2. This instruction is in the future, and without emulation, we cannot safely reason about the operands in question. It is reading from `rsi`, which might be mutated from the current value that `rsi` has in the stopped process (and in this case, we happen to know that it will be mutated). We must use emulation to determine the `before_value` of `rsi` in this case, and can't just read from the host process's register set. This principle applies in general - future instructions must be emulated to be fully annotated. When emulation is disabled, the annotations are not as detailed since we can't fully reason about process state for future instructions.
## What if the emulator fails?
When encountering an instruction that is behaving strangely (incorrect annotation, a jump target when one shouldn't exist, or an incorrect target), there are a couple of routine things to check.
1\. Use the `dev-dump-instruction` command to print all the enhancement information. With no arguments, it will dump the info from the instruction at the current address. If given an address, it will pull from the instruction cache at the corresponding location.
If the issue is not related to branches, check the operands and the resolved values for registers and memory accesses. Verify that the values are correct - are the resolved memory locations correct? Step past the instruction and use commands like `telescope` and `regs` to read memory and verify that the claim the annotation is making is correct. For things like memory operands, you can try to look around the resolved memory location in memory to see the actual value that the instruction dereferenced, and see if the resolved memory location is simply off by a couple of bytes.
```text
mov qword ptr [rsp], rsi at 0x55555555706c (size=4) (arch: x86)
[...]
Call-like: False
```
2\. Use the Capstone disassembler to verify the number of operands and the instruction groups.
Take the raw instruction bytes and pass them to `cstool` to see the information that we are working with:
```
cstool -d mips 0x0400000c
```
The number of operands may not match the visual appearance. You might also check the instruction groups, and verify that an instruction that we might consider a `call` has the Capstone `call` group. Capstone is not 100% correct in every single case in all architectures, so it's good to verify. Report a bug to Capstone if there appears to be an error, and in the meantime we can create a fix in pwndbg to work around the current behavior.
3\. Check the state of the emulator.
Go to [pwndbg/emu/emulator.py](https://github.com/pwndbg/pwndbg/tree/dev/pwndbg/emu/emulator.py) and uncomment the `DEBUG = -1` line. This will enable verbose debug printing. The emulator will print its current `pc` at every step, and indicate important events, like memory mappings. Likewise, in [pwndbg/aglib/disasm/arch.py](https://github.com/pwndbg/pwndbg/tree/dev/pwndbg/aglib/disasm/arch.py) you can set `DEBUG_ENHANCEMENT = True` to print register accesses to verify they are sane values.
Potential bugs:
- A register is 0 (may also be the source of a Unicorn segfault if used as a memory operand) - this often means we are not copying the host process's register into the emulator. By default, we map registers by name - if pwndbg calls it `rax`, then we find the UC constant named `U.x86_const.UC_X86_REG_RAX`. Sometimes this default mapping doesn't work, for example due to differences in underscores (`FSBASE` vs `FS_BASE`). In these cases, we have to manually add the mapping.
- Unexpected crash - the instruction at hand might require a 'coprocessor', or some information that is unavailable to Unicorn (it's QEMU under the hood).
- Instructions are just not executing - we've seen this in the case of Arm Thumb instructions. There might be some specific API/way to invoke the emulator that is required for a certain processor state.
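The name-based register mapping with manual overrides, as described in the first bullet, might look like the sketch below. The constant values are placeholders; real code looks the constants up in Unicorn's `unicorn.x86_const` (and friends) rather than in a hand-written dictionary.

```python
# Sketch of mapping pwndbg register names to Unicorn constants by name,
# with manual overrides for spelling mismatches (e.g. FSBASE vs FS_BASE).
# The constant table is stubbed with placeholder values for illustration.
UC_CONSTANTS = {
    "UC_X86_REG_RAX": 35,       # placeholder value
    "UC_X86_REG_FS_BASE": 268,  # placeholder value
}

# Registers whose pwndbg name does not line up with the UC constant name.
MANUAL_OVERRIDES = {
    "fsbase": "UC_X86_REG_FS_BASE",
}


def uc_constant_for(reg: str) -> int:
    # Default mapping: derive the constant name from the register name.
    name = MANUAL_OVERRIDES.get(reg, f"UC_X86_REG_{reg.upper()}")
    return UC_CONSTANTS[name]
```

`uc_constant_for("rax")` resolves via the default naming rule, while `uc_constant_for("fsbase")` only works because of the explicit override, which is exactly the failure mode the bullet warns about.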
