Getting Started

Much of this lab requires observing, capturing, and manipulating network requests. This page walks through how to do that.

Browser Network Inspector

All standard browsers come with network inspectors designed to analyze what network requests a site is making. Let’s start by getting familiar with how these work.

Go to https://piazza.com, sign in, and open up your browser’s network inspector. Here are steps on how to do so for Safari, Chrome, and Firefox.

Click on a post, then identify the API request that fetches that post’s data. This may take a little probing: open the “Response” tab of each network request until you find the one whose response contains the post content.

Sometimes a global search is faster: Command-Shift-F (or your browser’s equivalent) searches across all network request responses for a particular string, e.g. “Homework 6 Thread.”

Alternatively, look for one of the /api?method=content.get network requests.

Viewing Requests & Responses

Many (though not all) API endpoints use some combination of query string parameters and JSON request bodies, and return their data as JSON. The /api?method=content.get endpoint follows this convention.
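As a rough illustration of that split, the sketch below uses only Python’s standard library to separate one such request into its query string parameters and its JSON body. The helper name split_request is hypothetical, and the URL and body are simply sample values for this endpoint.

```python
import json
from urllib.parse import parse_qs, urlsplit


def split_request(url: str, body: str) -> tuple[str, dict, dict]:
    """Return (path, query string parameters, decoded JSON body)."""
    parts = urlsplit(url)
    # parse_qs maps each key to a list of values; keep the first of each.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.path, query, json.loads(body)


path, query, payload = split_request(
    "https://piazza.com/logic/api?method=content.get&aid=l23s5aolhd44",
    '{"method":"content.get","params":{"cid":"1054","nid":"ky9e8cq86872u"}}',
)
print(path)               # /logic/api
print(query)              # {'method': 'content.get', 'aid': 'l23s5aolhd44'}
print(payload["params"])  # {'cid': '1054', 'nid': 'ky9e8cq86872u'}
```

Here the query string carries the method name and an aid value, while the JSON body carries the identifiers of the post being requested.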

If you’d like to view the full, raw request, one way to do this is to copy it as a cURL command: right-click the request (Control-click on macOS) and select “Copy as cURL.” Here’s what the request might look like:

Request

curl 'https://piazza.com/logic/api?method=content.get&aid=l23s5aolhd44' \
-X 'POST' \
-H 'Content-Type: application/json; charset=utf-8' \
-H 'Accept: application/json, text/javascript, */*; q=0.01' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Accept-Encoding: gzip, deflate, br' \
-H 'Host: piazza.com' \
-H 'Origin: https://piazza.com' \
-H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 Safari/605.1.15' \
-H 'Referer: https://piazza.com/class/ky9e8cq86872u?cid=1054' \
-H 'Content-Length: 70' \
-H 'Connection: keep-alive' \
-H 'Cookie: piazza_session=<redacted>; session_id=<redacted>; AWSELB=<redacted>; AWSELBCORS=<redacted>; _ga=<redacted>; last_piaz_user=<redacted>' \
-H 'CSRF-Token: CRvqpF1mpRsg3aHtCnpE62HJ' \
-H 'X-Requested-With: XMLHttpRequest' \
--data-binary '{"method":"content.get","params":{"cid":"1054","nid":"ky9e8cq86872u"}}'

The response might look something like this:

{
    "result": {
        "history_size": 2,
        "folders": [
            "logistics"
        ],
        "nr": 1054,
        ...
        "tags": [
            "instructor-note",
            "logistics",
            "pin"
        ],
        "tag_good": [],
        "unique_views": 291,
        "children": [
            {
            ...
        "tag_good_arr": [],
        "anon_icons": true,
        "id": "l1x4o65lxt02j5",
        "config": {
            "editor": "rte",
            "bypass_email": 1,
            "has_emails_sent": 1
        },
        "status": "active",
        "drafts": null,
        "request_instructor": 0,
        "request_instructor_me": false,
        "bookmarked": 7,
        "num_favorites": 0,
        "my_favorite": false,
        "is_bookmarked": true,
        "is_tag_good": false,
        "q_edits": [],
        "i_edits": [],
        "s_edits": [],
        "t": 1650229685902,
        "default_anonymity": "no"
    },
    "error": null,
    "aid": "l23s5aolhd44"
}
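Once decoded, a response like this is an ordinary nested dictionary. The sketch below pulls a few fields out of a trimmed-down copy of the sample above. The helper summarize_post is hypothetical, and the meaning attached to each field (for instance, that "nr" is the post number) is only inferred from its name and value.

```python
import json

# A trimmed-down copy of the sample response, keeping only a few fields.
sample = json.loads("""
{
    "result": {
        "nr": 1054,
        "tags": ["instructor-note", "logistics", "pin"],
        "unique_views": 291,
        "status": "active"
    },
    "error": null,
    "aid": "l23s5aolhd44"
}
""")


def summarize_post(body: dict) -> dict:
    """Collect a few fields from the "result" object of a decoded response."""
    result = body.get("result") or {}
    return {
        "number": result.get("nr"),
        "tags": result.get("tags", []),
        "views": result.get("unique_views"),
        "status": result.get("status"),
    }


summary = summarize_post(sample)
print(summary["number"], summary["tags"])
```

Using .get(...) rather than indexing keeps the helper from raising if a field is absent from a particular response.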

Transforming Requests into Python

Often, it’s helpful to “replay” requests to either scrape data from a site or probe sites for vulnerabilities. To do so, we can use a tool like curlconverter to transform the cURL syntax above into Python syntax that we can then use to make subsequent requests.

Here’s what the request above looks like when transformed into Python Requests syntax:

import requests

cookies = {
    'piazza_session': '<redacted>',
    'session_id': '<redacted>',
    'AWSELB': '<redacted>',
    'AWSELBCORS': '<redacted>',
    '_ga': '<redacted>',
    'last_piaz_user': '<redacted>',
}

headers = {
    'Content-Type': 'application/json; charset=utf-8',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'en-US,en;q=0.9',
    'Host': 'piazza.com',
    'Origin': 'https://piazza.com',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 Safari/605.1.15',
    'Referer': 'https://piazza.com/class/ky9e8cq86872u?cid=1054',
    'Connection': 'keep-alive',
    'CSRF-Token': 'CRvqpF1mpRsg3aHtCnpE62HJ',
    'X-Requested-With': 'XMLHttpRequest',
}

params = {
    'method': 'content.get',
    'aid': 'l23s5aolhd44',
}

json_data = {
    'method': 'content.get',
    'params': {
        'cid': '1054',
        'nid': 'ky9e8cq86872u',
    },
}

response = requests.post('https://piazza.com/logic/api', headers=headers, params=params, cookies=cookies, json=json_data)

We can then retrieve the JSON body from the response by calling response.json(). (Note that the request may fail if one of the parameters is incorrectly formatted or invalid; to guard against that, check that response.status_code == 200.)
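One way to make that check systematic is sketched below. It assumes, based on the sample response shown earlier, that a non-null "error" field signals a failed call. extract_result is a hypothetical helper, not part of Requests or of the site’s API.

```python
import json


def extract_result(status_code: int, body_text: str) -> dict:
    """Validate a response and return its "result" object, or raise."""
    if status_code != 200:
        raise RuntimeError(f"unexpected HTTP status {status_code}")
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError as exc:
        raise RuntimeError("response body is not valid JSON") from exc
    if not isinstance(body, dict):
        raise RuntimeError("response body is not a JSON object")
    # Assumption: a non-null "error" field means the call failed.
    if body.get("error") is not None:
        raise RuntimeError(f"API reported an error: {body['error']}")
    return body.get("result") or {}


ok_body = '{"result": {"nr": 1054}, "error": null, "aid": "l23s5aolhd44"}'
result = extract_result(200, ok_body)
print(result["nr"])  # 1054
```

With a real response object, this would be called as extract_result(response.status_code, response.text).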

And finally, we can parameterize the Python Requests code as follows:

import requests

# Fixed across requests.
cookies = {
    ...
}

# Fixed across requests.
headers = {
    ...
}

# Builds and sends one `content.get` request; only cid, nid, and aid vary.
def get_content(cid: str, nid: str, aid: str) -> dict:
    params = {
        'method': 'content.get',
        'aid': aid,
    }

    json_data = {
        'method': 'content.get',
        'params': {
            'cid': cid,
            'nid': nid,
        },
    }

    response = requests.post(
        'https://piazza.com/logic/api',
        headers=headers,
        params=params,
        cookies=cookies,
        json=json_data,
    )
    # Raise requests.HTTPError on a 4xx/5xx status instead of parsing an error page.
    response.raise_for_status()
    return response.json()

Now we can call the get_content(...) function as many times as we’d like (as long as the server doesn’t rate-limit us). It feels like we’re tapping directly into the backend!
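To stay clear of rate limits, one option is to pause between calls. The sketch below is a minimal version of that idea. fetch_many is a hypothetical helper, the one-second default delay is an arbitrary assumption rather than a documented limit, and fake_fetch stands in for get_content so the example runs without network access.

```python
import time
from typing import Callable, Iterable


def fetch_many(
    fetch: Callable[[str], dict],
    cids: Iterable[str],
    delay_seconds: float = 1.0,
) -> dict[str, dict]:
    """Call fetch(cid) for each cid, sleeping between consecutive calls."""
    results = {}
    for index, cid in enumerate(cids):
        if index > 0:
            # Pause before every call except the first.
            time.sleep(delay_seconds)
        results[cid] = fetch(cid)
    return results


def fake_fetch(cid: str) -> dict:
    # Stand-in for a real request; simply echoes the id it was given.
    return {"cid": cid}


# "1055" is a made-up post id used only for illustration.
posts = fetch_many(fake_fetch, ["1054", "1055"], delay_seconds=0)
print(posts)
```

In practice, fake_fetch would be replaced by something like lambda cid: get_content(cid, nid, aid), with nid and aid taken from a captured request.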