07 - Asynchronous Execution for Request-Heavy Exploit Stages

A practical examination of when and how asynchronous execution improves exploit performance. This article focuses on comparing linear and asynchronous approaches and highlighting common async pitfalls

Any situation where many requests are required in order to brute force a token, extract information from a database blindly, or perform any other attack that is I/O bound makes asyncio or threading worth considering. While a purely linear approach that makes one request at a time will work and, assuming the exploit code is correct, will eventually produce the desired result, the bottleneck is almost always waiting. By using async in a request-heavy stage, those requests can be sent without waiting for the previous one to return. Instead of idling while the code waits on network responses, useful work continues.

Async in Python allows many tasks to run “at the same time” within a single thread by pausing and resuming execution instead of blocking it. Under the hood, async tasks register work with an event loop that keeps track of all pending operations. Tasks are resumed when they are ready rather than being blocked by other unfinished tasks. When a function is defined with async def, it becomes a coroutine, meaning it can be paused and resumed. When an awaitable is awaited, execution of that coroutine is paused, allowing the event loop to schedule other tasks instead of sitting idle.

If async is still unfamiliar, the official Python asyncio documentation and the first few sections of the Real Python async guide provide a solid overview of the syntax and mental model. Nothing in this article requires deep knowledge of event loop internals. There are also plenty of high-quality talks and walkthroughs available. While taking the OSWE, I found the ArjanCodes YouTube channel particularly helpful; it covers asyncio along with broader discussions on writing clearer, more maintainable Python

To reiterate, async is not parallel execution. Only one piece of Python code is running at any given moment, and this does not bypass the GIL. In environments where an offensive workstation is running inside a virtual machine, heavily threaded programs may perform poorly compared to a single-threaded program that makes efficient use of the event loop. A useful way to think about async is traffic control. When a task yields to allow another task to run, it does not lose its progress. When it resumes, it continues from where it left off. The event loop tracks which task runs next and preserves the state of each paused task.

Tasks also do not need to execute in the order they were created. Out-of-order execution is expected. In practical terms, this means one request may succeed while many others performing the same check are still queued or in progress. Continuing to wait for those remaining tasks wastes time and resources during that exploit stage. Async provides mechanisms for cancelling pending work and shutting down cleanly once the objective is met.

Programmers without prior async exposure often approach brute forcing as a linear sequence of requests, looping until the search space is exhausted or a successful result is found. For example, a PoC may generate a list of URLs with candidate tokens embedded in a GET request. To find the correct token, one might write a function like the following:

def sync_validate_token(urls: list[str]) -> str | None:
    client = httpx.Client(timeout=2.0)
    for url in urls:
        response = client.get(url)
        if response.status_code == 200:
            return url
    return None

Using functions like this are perfectly reasonable. If you run this as part of a PoC, you just need to wait for the one URL that is validated. Notice that each request imposes a blocking in the code execution. During the loop, the request will take some finite amount of time to receive the response. Let's say that there are 100 URLs that need testing and each response takes 1 second. 100 seconds max to test all the tokens. With async, after a request is sent and waits for a response, other requests are queued and sent, each awaiting for a response. This will cut down the waiting time significantly.

At this point, it’s tempting to reach for asyncio.gather() and consider the problem solved. After all, if the bottleneck is waiting on network I/O, then firing off all requests asynchronously should be faster. A naïve async implementation often looks like creating a list of requests, awaiting them all at once, and then checking the results afterward.

async def naive_async_validate(urls: list[str]) -> str | None:
    async with httpx.AsyncClient(timeout=2.0) as client:
        tasks = [client.get(url) for url in urls]

        responses = await asyncio.gather(*tasks)

        for response in responses:
            if response.status_code == 200:
                return str(response.request.url)

    return None

The problem with this approach is subtle but important. While the requests are sent concurrently, the code still waits for every request to complete before continuing. Even if the correct token is discovered early, the event loop continues to wait on hundreds or thousands of unnecessary requests. In practice, this often performs no better—and sometimes worse—than a linear implementation, especially when the target server processes requests serially or enforces its own rate limits.

This is a common early mistake when learning async: assuming that concurrency alone is enough. In exploit development, especially during brute-force or blind extraction stages, early exit matters just as much as concurrency. If the code cannot stop outstanding work once the goal is achieved, most of the potential performance gains are lost.

I'm going to make a demonstration here that you can run on your own if so inclined. I made a design decision that deviates from most production ready web servers, but I did so to illustrate how async can be faster than without. The web server introduced below can only accept one request at a time. It would be tempting to disable threading entirely by running Flask with threaded=False. While that would avoid the race condition in this specific setup, it hides the underlying problem rather than solving it. Any real deployment, or even a small configuration change, could reintroduce concurrency. By protecting shared state explicitly with a lock, the behavior remains correct regardless of how the server is run.

import random
import threading

from flask import Flask, jsonify, request

app = Flask(__name__)
state_lock = threading.Lock()

def generate_secret():
    number = random.randint(0, 5000)
    new_secret = str(number).zfill(4)
    print(f"[-] current_secret {new_secret}")
    return new_secret

current_secret = generate_secret()
hit_count = 0

@app.route("/probe", methods=["GET"])
def probe():
    candidate = request.args.get("candidate")

    if not candidate or len(candidate) != 4:
        return "Parameter error", 404

    ip_address = request.remote_addr

    with state_lock:
        global current_secret, hit_count
        if candidate != current_secret:
            return "Parameter error", 404

        hit_count += 1
        print(f"[+] candidate found {candidate} from {ip_address} ({hit_count}/2)")

        if hit_count == 2:
            print("[+] hit limit reached, rotating secret")
            current_secret = generate_secret()
            hit_count = 0

        return jsonify({"message": "secret found"}), 200


if __name__ == "__main__":
    app.run(port=8000)

What this server does is generate a 4-digit token between 0000 and 5000. The client code then attempts to recover that token in two ways:

A purely linear brute-force approach
A bounded asynchronous approach In each run, the linear method searches first. Once the token is found, the asynchronous method performs the same search against the endpoint. After both methods succeed, the server rotates the token and the process repeats for a total of ten runs. Each attempt records the elapsed time from the start of the search until the correct token is discovered.

To avoid unbounded request flooding, the asynchronous implementation limits the number of concurrent tasks using a controlled worker pool. Concurrency starts at five workers and increases incrementally up to fifty, allowing us to observe how performance scales and where diminishing returns begin.

import argparse
import asyncio
import sys
import time

import httpx


def create_list(min: int, max: int) -> list[str]:
    numbers = list(range(min, max + 1))
    return [str(number).zfill(4) for number in numbers]


def generate_urls(target: str, port: int, tokens: list) -> list[str]:
    url_partial = f"http://{target}:{port}/probe?candidate="
    return [f"{url_partial}{token}" for token in tokens]


def sync_validate_token(urls: list[str]) -> str | None:
    client = httpx.Client(timeout=2.0)
    for url in urls:
        response = client.get(url)
        if response.status_code == 200:
            return url

    return None


async def spray_token(
    urls: list[str],
    concurrency: int = 10,
) -> str | None:
    queue: asyncio.Queue[str] = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)

    found_event = asyncio.Event()
    result: dict[str, str | None] = {"url": None}

    async with httpx.AsyncClient(timeout=2.0) as client:

        async def worker(worker_id: int):
            while not found_event.is_set():
                try:
                    url = queue.get_nowait()
                except asyncio.QueueEmpty:
                    return

                try:
                    response = await client.get(url)
                    if response.status_code == 200:
                        result["url"] = url
                        found_event.set()
                        return
                except httpx.RequestError:
                    pass
                finally:
                    queue.task_done()

        tasks = [asyncio.create_task(worker(i)) for i in range(concurrency)]

        await asyncio.wait(
            tasks,
            return_when=asyncio.FIRST_COMPLETED,
        )

        # Cancel remaining workers
        for task in tasks:
            task.cancel()

    return result["url"]


def summarize(name: str, timings: list[float]):
    avg = sum(timings) / len(timings)
    print(f"\n{name} results over {len(timings)} runs:")
    print(f"  min: {min(timings):.4f}s")
    print(f"  max: {max(timings):.4f}s")
    print(f"  avg: {avg:.4f}s")


def parse_args():
    parser = argparse.ArgumentParser(description="OSWE Application Exploit.")

    # --- Target options ---
    target_group = parser.add_argument_group("Target options")
    target_group.add_argument(
        "--target-ip", type=str, required=True, help="Target server IP address"
    )
    target_group.add_argument(
        "--target-port",
        type=int,
        default=80,
        help="Target web frontend port (default: 80)",
    )

    parser.add_argument(
        "--concurrency", type=int, default=5, help="Number of concurrent tasks to run"
    )
    parser.add_argument("--runs", type=int, default=10, help="Number of benchmark runs")

    return parser.parse_args()


async def main():
    args = parse_args()
    print(f"Target IP: {args.target_ip}")
    print(f"Target Port: {args.target_port}")
    print(f"Concurrency: {args.concurrency}")
    print(f"Runs: {args.runs}")

    token_list = create_list(0, 5000)
    urls = generate_urls(args.target_ip, args.target_port, token_list)

    # linear_times: list[float] = []
    # async_times: list[float] = []

    for j in range(args.concurrency, args.concurrency + 50, 5):
        print(f"\n--- Concurrency {j} ---")
        linear_times: list[float] = []
        async_times: list[float] = []

        time.sleep(5.0)

        for i in range(1, args.runs + 1):
            print(f"\n--- Run {i}/{args.runs} ---")

            # --- Linear ---
            start = time.perf_counter()
            linear_result = sync_validate_token(urls)
            end = time.perf_counter()

            if linear_result is None:
                print("[!] Linear: token not found")
                sys.exit(1)

            linear_time = end - start
            linear_times.append(linear_time)
            print(f"[+] Linear: {linear_time:.4f}s → {linear_result}")

            time.sleep(5.0)

            # --- Async ---
            start = time.perf_counter()
            async_result = await spray_token(urls, j)
            end = time.perf_counter()

            if async_result is None:
                print("[!] Async: token not found")
                sys.exit(1)

            async_time = end - start
            async_times.append(async_time)
            print(f"[+] Async:  {async_time:.4f}s → {async_result}")

        # --- Summary ---
        summarize("Linear", linear_times)
        summarize("Async", async_times)

        speedup = (
            (sum(linear_times) / sum(async_times))
            if sum(async_times) > 0
            else float("inf")
        )
        print(f"\nOverall speedup: {speedup:.2f}×")


if __name__ == "__main__":
    asyncio.run(main())

I ran the web server on a Raspberry Pi on my home network at IP address 192.168.1.30. The client machine was also on the same network using WiFi. This setup introduced realistic network latency. Running the server on the same machine resulted in average latencies around 0.2 ms, compared with roughly 4.3 ms over the network. Even this is still far more responsive than a typical Internet-hosted service, but it provides a more honest baseline than localhost testing.

uv run poc-brute-force-secret.py --target-ip 192.168.1.30 --target-port 9000 --concurrency 5 --runs 10 > runs.txt

The goal of the client code is not to be clever, but to be honest about the work being performed. Both approaches generate the same candidate space, hit the same endpoint, and stop as soon as the correct value is observed. The only difference is how requests are issued and managed. The linear version is intentionally straightforward: one request at a time, blocking until a response is received. This establishes a baseline that is easy to reason about and verify. The asynchronous version does not attempt to do everything at once. Instead, it uses a bounded worker model backed by a queue and an early-exit signal. This ensures outstanding work is cancelled as soon as the objective is achieved, rather than continuing to consume time and network resources. In practice, this mirrors how exploit code should behave: aggressive enough to make progress quickly, but controlled enough to stop immediately when the condition you care about is met.

Sample from run.txt:

Target IP: 192.168.1.30
Target Port: 9000
Concurrency: 5
Runs: 10

--- Concurrency 5 ---

--- Run 1/10 ---
[+] Linear: 43.6225s → http://192.168.1.30:9000/probe?candidate=2140
[+] Async:  12.5030s → http://192.168.1.30:9000/probe?candidate=2140

--- Run 2/10 ---
[+] Linear: 56.3574s → http://192.168.1.30:9000/probe?candidate=2755
[+] Async:  17.1905s → http://192.168.1.30:9000/probe?candidate=2755

<SNIP>

--- Run 10/10 ---
[+] Linear: 80.0850s → http://192.168.1.30:9000/probe?candidate=3117
[+] Async:  20.6680s → http://192.168.1.30:9000/probe?candidate=3117

Linear results over 10 runs:
  min: 21.7939s
  max: 101.9704s
  avg: 64.2009s

Async results over 10 runs:
  min: 7.3436s
  max: 43.9945s
  avg: 20.6406s

Overall speedup: 3.11×

--- Concurrency 10 ---

<SNIP>

Concurrency

Avg Linear (s)

Avg Async (s)

Speedup

64.20

20.64

3.11×

48.38

11.16

4.33×

59.42

14.07

4.22×

66.82

15.87

4.21×

51.75

11.98

4.32×

42.65

10.43

4.09×

50.71

11.81

4.29×

65.76

16.40

4.01×

38.33

9.20

4.17×

46.65

11.72

3.98×

Across all tested concurrency levels, the asynchronous approach consistently outperformed the linear brute-force method by roughly 3 to 4 times, with peak gains appearing between 10 and 35 concurrent tasks. Beyond that range, additional concurrency produced diminishing returns and increased variance, likely due to server-side contention and scheduling overhead. Importantly, even at higher concurrency levels, async never regressed to linear performance, reinforcing that the gains come from eliminating idle wait time rather than raw parallelism.

In the final article of this series, I apply these same ideas to a more realistic and punishing problem: blind SQL injection. Rather than brute-forcing a small search space, we will extract data one bit or character at a time and examine how different strategies affect execution time. I will start with a purely linear approach, move to binary search techniques, and then introduce asynchronous variants of both. The focus is not on clever payloads, but on how algorithm choice and request orchestration dominate performance in latency-bound exploit scenarios.

Previous06 - One-Shot Web Servers for Payload Delivery and Callbacks Next08 Time-Based Blind SQLi - Linear, Binary, and Async Extraction

Last updated 26 days ago