Wednesday, January 17, 2024

FastAI Course Part 1: Making an Anime Recognizer with Pytorch and FastAI

Beginning the FastAI Practical Deep Learning course

I have recently set out to complete the FastAI course "Practical Deep Learning", which covers topics ranging from image recognition to Stable Diffusion. The course is available as a free e-book published as a series of Jupyter Notebooks.

In fact, both the course and the FastAI library itself are built on Jupyter Notebooks, a fitting environment for casual deep learning experimentation. However, this leads to some interesting quirks in the FastAI library, such as its many monkey-patched classes, which make debugging difficult, and its reliance on "import *", which obscures source modules. These annoyances are well worth the benefit of FastAI's many premade, boilerplate systems for loading and tagging data, training models, and previewing results in Jupyter Notebooks.

The course's first chapter covers introductions to machine learning, model training, and convolutional neural networks. At the end of the chapter, the course asks you to find your own application for CNN image recognizers to test your new knowledge. I opted to create an Anime Recognizer, which uses a pre-trained ResNet-34 CNN to distinguish between screenshots from 3 different anime series. While much more advanced anime recognition systems already exist online, I thought this would still be a good learning application on a subject that is fun.

Some other students in the class have created impressive applications of CNNs through techniques such as:

  • Converting audio waveforms to images and recognizing sounds
  • Converting mouse movement to images and recognizing humans vs bot users of websites
  • Recognizing cities from aerial photographs

Project plan

My assumption was that I could use a CNN image classifier to identify a handful of anime shows based on their stylistic differences. For instance, a show like Samurai Champloo has different edges, gradients, and contours from the animation style of Blue Eye Samurai. My hope was that the CNN would be able to recognize these features and make an accurate guess of which show a screenshot is taken from. The faces of recurring characters might also emerge as features learned in the CNN's deeper layers.

My implementation plan was as follows:

  1. Torrent video files of 3 shows
  2. Create a system to capture standardized screenshots from the video files using ffmpeg-python and Pillow
  3. Load the screenshots as a labeled training and validation data set
  4. Train both a blank and a pre-trained ResNet-34 model on the training data
  5. Evaluate its performance

Creating a training data set

As with any AI project, the most difficult step of this process would be acquiring and creating the training data for the CNN. I was able to find complete video files for all 3 anime series online, and after some troubleshooting with ffmpeg, I was able to reliably capture square screenshots from the center of each video file. I opted to take a screenshot for every 10 seconds of video to ensure a variety of scenes. I also set a maximum number of screenshots per video file, ensuring that the training data drew from at least 6 video files for each show. Overall, my training data set was 1800 items, or 600 images per show.
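For reference, a minimal sketch of this capture step using ffmpeg-python and Pillow might look like the following. The function name, paths, and 512px output size are illustrative assumptions rather than my exact script.

```python
# Sketch: capture a centered square screenshot every N seconds of a video file.
# Assumes ffmpeg/ffprobe are installed, plus the ffmpeg-python and Pillow packages.
from pathlib import Path

import ffmpeg  # ffmpeg-python
from PIL import Image

def capture_frames(video_path, out_dir, interval_s=10, size=512, max_shots=200):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    duration = float(ffmpeg.probe(video_path)["format"]["duration"])
    timestamps = list(range(0, int(duration), interval_s))[:max_shots]

    for i, t in enumerate(timestamps):
        frame_path = out_dir / f"frame_{i:04d}.png"
        # Seek to the timestamp and write a single frame to disk.
        (
            ffmpeg.input(video_path, ss=t)
            .output(str(frame_path), vframes=1, loglevel="error")
            .run(overwrite_output=True)
        )
        # Center-crop to a square and downscale with Pillow.
        with Image.open(frame_path) as im:
            side = min(im.size)
            left = (im.width - side) // 2
            top = (im.height - side) // 2
            square = im.crop((left, top, left + side, top + side)).resize((size, size))
        square.save(frame_path)
```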

Training the model

While training the model, I was interested in understanding the difference between a "blank" ResNet-34 CNN and a pre-trained one. (In this case, PyTorch provides a resnet34 model pre-trained on the ImageNet-1K dataset.) I trained 10 epochs on the blank model and fine-tuned 5 epochs on top of the pre-trained model.
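A rough sketch of the two training runs with the FastAI API is below. It assumes the screenshots are organized into one folder per show (used as the label) and uses illustrative transforms and hyperparameters, not necessarily my exact settings.

```python
from fastai.vision.all import *

# One subfolder per show; folder names become the labels.
dls = ImageDataLoaders.from_folder(
    "screenshots",
    valid_pct=0.2,          # hold out 20% of screenshots for validation
    item_tfms=Resize(224),
)

# Pre-trained ResNet-34 (ImageNet weights), fine-tuned for 5 epochs.
pretrained = vision_learner(dls, resnet34, metrics=accuracy)
pretrained.fine_tune(5)

# The same architecture starting from random weights, trained for 10 epochs.
blank = vision_learner(dls, resnet34, pretrained=False, metrics=accuracy)
blank.fit_one_cycle(10)
```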



Results

As you can see from the results in the screenshot below, the pretrained model (pretrained on IMAGENET1K_V2) outperformed the blank model, even though the blank model was trained for additional epochs. This is likely due to over-fitting in the blank model, the benefit of transfer learning in the pretrained model, or a combination of both.

The fine-tuned model achieved 95% accuracy after 5 epochs and the blank model achieved 81% accuracy after 10 epochs.



Monday, October 30, 2023

SDLGL: Audio system upgrade and Animalese PoC

SDLGL

As you may have seen from my projects overview post, SDLGL is my pet-project game engine written in C++ for 2D game development.

As a recap, it has the following features, and it remains a fun way for me to explore modern C++ programming and game design patterns:

  • A simple and extensible Entity + Scene organization
  • A simple 'Update and Render' game loop
  • Audio mixing for sound effects and music
  • Easily configurable (JSON-defined) resources supporting:
    • Animated sprites
    • Static textures
    • Sound effects
    • Music tracks
  • Collision detection system for rotatable rectangles
  • Font renderer
  • Debugging UI elements
  • Access to the SDL2 rendering context
  • Multiple example programs demonstrating each engine feature
You can read more about it and see example programs I've made on the GitHub page.

Idea: Animalese-style dialogue animation

A focus of my game development ambitions is the genre of "cozy games". Some examples of cozy games include Animal Crossing, Stardew Valley, and Stray.

A common feature of these games is a dialogue system that substitutes real speech with babbling sounds, giving a "cute" effect when characters are speaking.

For example, see how speech is handled for robot characters in Stray:



Animal Crossing also takes this approach. Their language is even given an official name: Animalese:



I wanted to try my hand at implementing a similar dialogue system that I could use for one of my games, modeled off of Animal Crossing's Animalese.

The challenge: SDL_Mixer

Creating proof-of-concepts like this one is a great way to push the limits of your game engine. In my case, I found that SDL_Mixer, the simple mixing library that I used for mixing audio samples in SDLGL, was missing some features that I would need to build my Animalese prototype.

Specifically, I needed to dynamically set the pitch of audio samples. It would also be nice to have access to an unlimited number of software-defined channels as opposed to the 16 channels that SDL_Mixer allows for.

I searched for the best replacement for SDL_Mixer and came across OpenAL. OpenAL is a popular choice for indie developers given its lack of license fees and decent feature set. However, OpenAL seemed almost too heavy-weight for the scope of my game engine. Moreover, the most popular open-source implementation of OpenAL, openal-soft, is licensed under LGPLv2, which does not allow static linking without releasing your own program under the LGPL.

Instead, I opted for the lighter-weight and MIT-No-Attribution-licensed miniaudio project by David Reid. It offers 3D spatialization of audio, as well as built-in pitching effects and an API for custom effects. Additionally, it allows me to use the SDL audio device for playback, which reduces compatibility issues for my games. (Specifically, I know that SDL's audio device is supported by WebAssembly.)

The result: Animalese demo using SDLGL with new miniaudio backend


Using Animalese as my testbed, I re-wrote the audio system of SDLGL to use miniaudio in place of SDL_Mixer. It was a relatively easy process, with the key difference that miniaudio has its own resource manager, allowing me to remove sound file management from SDLGL's resource manager.

While I kept the typewriter effect, I opted for a simple face-card in place of Animal Crossing's 3D cutscenes. My approach is inspired by more dialogue-driven games like Ace Attorney:


To create the effect, I used a simple approach that I've adapted from GitHub user Acedio's AnimaleseJS project. Each letter of the alphabet is given a short speaking sound (such as "a", "be", "ki"). Then the sentence being spoken is shortened to just the first letter of each word, and punctuation is removed.

So for example, "Hello, World!" becomes "h w".

Then, the sound for each character in the string is played with a random variation in pitch to give a natural, speech-like pattern.
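SDLGL itself is written in C++, but the string transformation is simple enough to sketch in a few lines of Python. The play_sound callback and clip filenames here are hypothetical stand-ins for SDLGL's audio calls.

```python
import random
import string

def to_animalese_keys(sentence: str) -> list[str]:
    # Strip punctuation, then keep only the first letter of each word.
    words = sentence.translate(str.maketrans("", "", string.punctuation)).split()
    return [w[0].lower() for w in words]

def speak(sentence, play_sound):
    # play_sound is a hypothetical stand-in for the engine's audio call.
    for letter in to_animalese_keys(sentence):
        play_sound(f"voice_{letter}.wav", pitch=random.uniform(0.9, 1.1))

print(to_animalese_keys("Hello, World!"))  # -> ['h', 'w']
```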

You can see a screen recording of the dialogue here. (to be uploaded)

A screenshot of the Animalese demo

You can also try interacting with the demo yourself by building it from source. :) 

Wednesday, July 26, 2023

Photography Series: New York

 In this first post of my new photography series, I'm showcasing photos I've taken in NYC from 2022 through June of 2023. See my previous post about my camera here.

For the best viewing experience, click on the first image to open a carousel.



A couple in Sheep Meadow
Credit to Yash for noticing the sunbeam while we were out in Central Park together
 Olympus Zuiko 100-200mm f/4 on Fujifilm X-E1

A different couple in Sheep Meadow
Olympus Zuiko 100-200mm f/4 on Fujifilm X-E1

Times Square after rain
Olympus Zuiko 100-200mm f/4 on Fujifilm X-E1

Devika in Hanoi House
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Fujicolor Superia 400


A korean corn dog from Two Hands at St. Marks Place
Sadly Two Hands closed down :(
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Fujicolor Superia 400

Cherry blossoms in the Brooklyn Botanic Garden
You'd think I would have brought color film to the cherry blossoms, but Ilford was the only film in stock.
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Ilford HP5 Plus 200

A building on Wall St.
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Fujicolor Superia 400

Devika in Soho
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Fujicolor Superia 400

Devika in Soho again
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Fujicolor Superia 400

Devika at pride in June 2022
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Fujicolor Superia 400

Devika at pride in June 2023
Olympus Zuiko 50mm f/3.5 on Fujifilm X-E1

Cherry blossoms in Sakura Park with Cathedral of St. John the Divine
The accidental exposure to this roll of film ended up looking cool.
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Fujicolor Superia 400

Waving mural in St. Marks Place
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Fujicolor Superia 400

Dive bar bathroom in East Village
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Fujicolor Superia 400


Trinity Church seen from Wall St.
In this photo, I capture traditional faith overtaken by modern capitalism :)
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Fujicolor Superia 400

Upper West Side buildings
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Fujicolor Superia 400

One Liberty Plaza, formerly the U.S. Steel building in FiDi
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Fujicolor Superia 400

Frank Gehry's tower in FiDi
Olympus Zuiko 50mm f/3.4 on Olympus OM-2 with Fujicolor Superia 400

Tuesday, May 30, 2023

New photography series

 A new series of posts for my photography

Since 2021, I've been leaning into photography as a hobby. Starting with my father's Olympus OM-2 35mm film camera and a reusable plastic film camera, I have set out to capture scenes I find beautiful in New York and abroad. This post is an introduction to a new series I'm planning to showcase my best photos, explaining my cameras, film stocks, and developers.

Camera 1: Olympus OM-2 with assorted lenses


My primary camera for taking photos is my father's old Olympus OM-2, a 35mm automatic-exposure film camera originally released in 1975. I love the weight and solid feeling of the metal body and all-metal components. I have collected 3 lenses from the OM series: 
  • Olympus Zuiko 50mm f/3.4 (for most scenes, macro, and portraits)
  • Olympus Zuiko 100-200mm f/5
  • Olympus Zuiko 35mm f/?
So far, this camera has proven to be a great all-around system. It is just small enough that I can carry it on a shoulder strap while exploring new cities but offers all the features and interchangeable lenses that I want. (Plus it's cool to be using my dad's high-school camera.) 

Sadly, this camera's light meter and timing calibration are in need of service, so it'll be a while before I can take new photos.

Camera 2: Reusable plastic 35mm camera with flash (e.g. the party camera)


I've opted to leave this as more of a category than a specific model of camera, because I have gone through about 3 of them since I started in 2021. I have yet to find one with a flash that lasts longer than 2-3 months of use. (The ones on the market now, like the Kodak Ultra F9, are especially bad.)

The main purpose of this camera is to have a small, cheap flash film camera to take to parties and night events, where my OM-2 would be unwieldy (and flash-less).

In hopes of a better flash system and with the exciting prospect of getting 2x photos per roll, I am planning to try the Kodak Ektar H35 half-frame camera next.

Film Stocks

With the explosion in popularity of film photography since 2019/2020, finding (reasonably priced) film stock is harder than ever. In the past, nearly all of my photography has been on Fujifilm Fujicolor Superia 400. On a few occasions, cost and availability have pushed me to try Ilford SFX 200 and HP5 Plus 400 speed film, with only limited success.

Recently though, I have wanted to try alternatives to Fujicolor's green tint, and plan to try Kodak stocks such as UltraMax 400 and Portra 800.

Film developers

I have yet to take the very deep and expensive plunge into developing and/or scanning my own film, so for now I rely on film labs to do it for me. So far I have tried the following labs:
  • Photoreal, downtown Brooklyn (low-quality scans)
  • Eliz Digital, Manhattan Chinatown (low-quality scans)
  • Bleeker Digital, Nolita (slightly better but still low-quality scans)
  • State Film Lab, Louisville KY (best quality scans, slow turn-around, requires shipping)
Prioritizing scan quality, I have decided to stick with State Film Lab, as they're the only lab capable of producing scans which show individual film grains.

Future posts

In the future, I plan to release some of my past photography in a series of posts to this blog.

Thursday, May 25, 2023

Hello, Inky: Hello World with the Inky 7.3

Creating a personal dashboard with the Inky Frame 7.3


I am working to create a personal e-ink dashboard with the Pimoroni Inky Frame 7.3, a 7.3" color e-ink programmable display with an integrated Raspberry Pi Pico W. As a first step in this process, I set out to learn about the board and create a basic Hello World program.

Limitations of the Inky Frame and Raspberry Pi Pico


It's a microcontroller board.
When I ordered the Inky Frame, I didn't fully appreciate that the Pico is a microcontroller board as opposed to a fully fledged computer like a Raspberry Pi Model B. This means that instead of running a full operating system, it runs code directly on its two ARM Cortex-M0+ cores. With limited RAM and CPU power and without the operating system to manage networking, peripherals, and task scheduling, "simple" tasks like fetching data from APIs and loading images become more complicated. I will be writing programs in either MicroPython (for readability and ease of use) or in C++ (for performance). These will have limited build-in functionality compared to a full Python 3.x environment.

Pimoroni's graphics library is limited.
The Picographics module maintained by Pimoroni supports only basic rendering functionality. It can only render JPG images or 8x8px sprites from a specially loaded sprite sheet, and text can only be rendered as a bitmap or a jagged Hershey vector font. If I want to support more advanced rendering, I will need to get creative and build abstractions to overcome this limited feature set.

The refresh rate is slow.
The Inky Frame 7.3 has a slow refresh time of 40 seconds in full color. This rules out any applications that would refresh every minute, such as a wall clock program. Instead I will optimize for use cases that require less refreshing, such as displaying unread emails and news headlines.

Creating the Hello World

Fortunately, the 2MB of flash storage on the Pico comes pre-flashed by Pimoroni with the MicroPython interpreter and a set of demo programs. After running a few of them, I got to reading the example code and the picographics docs. While the examples feel a bit scrappy, I was able to begin extracting their functionality into a new module I call Display, which I plan to use as the base of my abstractions and utility functions while creating programs for the Inky Frame.

You can view the module here (linked to the commit, to show it at this point in time).

I created a basic demo mode which uses only the bitmap text rendering feature to create an artistic repeated word display.

An example of the text-based demo mode.
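For a sense of what the demo looks like in code, here is a minimal sketch of a repeated-word mode. The display constant (DISPLAY_INKY_FRAME_7) and pen indices are from memory, so check them against the picographics docs before running.

```python
from picographics import PicoGraphics, DISPLAY_INKY_FRAME_7

display = PicoGraphics(display=DISPLAY_INKY_FRAME_7)
WIDTH, HEIGHT = display.get_bounds()

def repeated_word_demo(word="hello", scale=3):
    display.set_font("bitmap8")        # built-in bitmap font
    display.set_pen(1)                 # assumed white pen for the background
    display.clear()
    row_height = 8 * scale             # bitmap8 glyphs are 8px tall before scaling
    line = (word + " ") * 20
    for i, y in enumerate(range(0, HEIGHT, row_height)):
        display.set_pen(2 + (i % 6))   # cycle through the remaining palette pens
        display.text(line, 0, y, WIDTH * 4, scale)
    display.update()                   # one slow full-color refresh

repeated_word_demo()
```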

Next steps: Network requests

For the next step in this project, I will test out the Pico's ability to make network requests, parse JSON response bodies, and display the results.
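As a rough sketch, that test will probably look something like this, assuming MicroPython's network and urequests modules on the Pico W; the SSID, password, and URL are placeholders.

```python
import network
import urequests

wlan = network.WLAN(network.STA_IF)
wlan.active(True)
wlan.connect("MY_SSID", "MY_PASSWORD")
while not wlan.isconnected():
    pass                               # wait for the WiFi connection

response = urequests.get("https://example.com/api/headlines")
data = response.json()                 # parse the JSON response body
response.close()
print(data)                            # later: render this via the Display module
```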



Tuesday, May 16, 2023

Reviving this blog

Now that I'm working full time, I need a source of inspiration to see my personal projects through to fruition. Inspired by the updates I gave for my PyGame video game Explore back when I was a teenager, I would like to renew this blog as a place to give updates on my projects, even if my day-to-day is very different from the simple days of Explore :)

With this said, I have a number of unfinished projects in the works that I would like to see completed:

Color E-Ink Display

I recently purchased a 7.3" e-ink display from Pimoroni (Inky Frame 7.3" (Pico W Aboard)). It has an onboard Raspberry Pi Pico with WiFi and Bluetooth radios. I've decided to use it to create a helpful dashboard-type display to mount on the wall next to my standing desk. I hope to use it to display useful things like unread emails, Slack notifications, weather forecast, and more.
The Inky display sits on my shelf, with the factory image still displayed.

Facial Recognition Door Lock v2.0

Back in freshman year of college, I created a facial recognition automatic door lock for my dorm room. It used a Raspberry Pi Model 2B with a Pi Camera Module to perform face recognition using OpenCV. The program was written in C++, as no Python bindings were available for ARM at the time. Its camera poked through the peephole in the door, and a heavy-duty servo manipulated a 3D-printed assembly that could rotate the door's deadbolt to either position.

Now, 6 years later, I hope to recreate the success of my first door lock with new, upgraded tools. My hope is to learn about the advancements in facial recognition technology and write about my experience then vs. now.
Somehow I have no photos or videos of the door lock in my dorm, but I did find this photo of myself trying to compile OpenCV from source for ARM. (There were no distributables back then.) You had to adjust the compilation settings so that the gcc on the Pi would not run out of memory and crash.

GPT and ElevenLabs powered voice chat agent with personality

Thanks to OpenAI and ElevenLabs, the technology exists to create RoBro, your very own robotic bro. Extremely realistic inflections and emotional speech by ElevenLabs should allow us to load a Raspberry Pi with some speakers, a microphone, and a dude-bro personality. Other personality types, like an angry Roomba or seductive smart-door-lock may also be entertaining.

SDLGL: My pet game engine

SDLGL is a 2D game library that I have worked on on-and-off since 2017. Inspired by my humble beginnings using Allegro and PyGame, but frustrated with their limited capabilities, I have set out to create my own engine that uses SDL to interact with PC graphics, input, and sound. All other abstractions for game creation such as scene and entity management, collision, physics, animation, etc. are written from scratch in C++.

I see the project as a great way to learn in-depth about C++, game programming design patterns, and how to manage programming complexity as a project grows over time.

So far, I have created a simple top-down tank shooting game using the engine, but there is still lots of work and refactoring to be done before a usable game can be created in SDLGL.

A screenshot of my tank shooter POC game.

Looking forward

Clearly, there are a lot of projects I need to wrap up! I hope to post regular updates on my progress here, so that I feel inspired (or at least pressured) to finish them all.

-Grayson
 

Sunday, November 10, 2013

Explore v1.4

Removed the horrid block collision and added an Item Manager module along with a test item, a Coin. You can collide with coins, at which point the coins will disappear. Check out the code and compile it yourself via Dropbox: https://www.dropbox.com/sh/z9bmw4q3jkm4b2m/T25PLJps1k
5 coins added for demonstration. Colliding with them causes them to disappear.