Oli's old stuff

Tinkering with retro and electronics

Mar 26, 2023 - 9 minute read - retro electronics hardware pio minicube64 rp2040 raspberry pi pico

Making the Minicube64 on the Raspberry Pi Pico - Part 2

Last time I covered my first attempts at getting the Minicube64 to run on the Raspberry Pi Pico. We got an RGB332 screen working and the boot ROM running. Can we make it run a real game?

SD Card

Until now the only ROM I have used is a version of the boot.bin ROM embedded in the Pico’s flash. This won’t do for loading a real ROM, so we need a storage solution. I’m choosing to load from an SD card - they’re pretty ubiquitous as a storage device for microcontrollers.

I had a Teensy SD Adapter board at hand from a previous project, so decided to use that. This is nice as it runs directly from 3.3 V and takes up barely any space. The SD card interface can use SPI, so I wired it up to the Pico’s SPI interface pins.

We only need SCLK, MISO, MOSI and the CS pins - I’m not bothered about card detect (plus I’m not even sure if the Teensy board supports it).

                                PI PICO

                          -------\___/-------
                BLU0  GP0 |  1           40 | +VBUS
                          |                 |
                BLU1  GP1 |  2           39 | +VSYS
                          |                 |
                      GND |  3           38 | GND
                          |                 |
                GRN0  GP2 |  4           37 | 3V3_EN
                          |                 |
                GRN1  GP3 |  5           36 | +3V3
                          |                 |
                GRN2  GP4 |  6           35 | ADC_VREF
                          |                 |
                RED0  GP5 |  7           34 | GP28  N/C
                          |                 |
                      GND |  8           33 | GND
                          |                 |
                RED1  GP6 |  9           32 | GP27  N/C
                          |                 |
                RED2  GP7 | 10           31 | GP26  JOY_7 (MD_SELECT out)
                          |                 |
                HSYNC GP8 | 11           30 | RUN   /RESET
                          |                 |
                VSYNC GP9 | 12           29 | GP22  SD_CD
                          |                 |
                      GND | 13           28 | GND
                          |                 |
               JOY_1 GP10 | 14           27 | GP21  N/C
                          |                 |
               JOY_2 GP11 | 15           26 | GP20  N/C
                          |                 |
               JOY_3 GP12 | 16           25 | GP19  SD_MOSI
                          |                 |
               JOY_4 GP13 | 17           24 | GP18  SD_SCK
                          |                 |
                      GND | 18           23 | GND
                          |                 |
               JOY_6 GP14 | 19           22 | GP17  SD_CS0
                          |                 |
               JOY_9 GP15 | 20           21 | GP16  SD_MISO
                          -------------------

Now, I wanted to be able to read the data off the card as a FAT32-formatted drive - this would let anyone plug it into their PC and drop files on it - so I needed a library. I picked the no-OS-FatFS library by carlk3. It claimed to do everything I needed and seemed to be simple to set up.

The first issue I had was that the library was really unstable; the Pico would randomly crash or hang when reading files. It turns out the culprit was that the DMA IRQ number it uses conflicted with the one used by the PicoQVGA library. I updated my version of PicoQVGA to use DMA IRQ 1, and after that the SD card initialized properly every time.

The next thing to note is that the FatFS implementation uses ff_fopen, ff_fread, etc. instead of the normal fopen-style functions from stdio.h. It also declares its own FILE type, called FF_FILE. I decided to cheat at this point and set up a bunch of #defines so I could reuse the assembler without having to rewrite it.

    #include <ff.h>
    #include <ff_stdio.h>

    #define fopen ff_fopen
    #define fwrite ff_fwrite
    #define fread ff_fread
    #define fclose ff_fclose
    #define ftell ff_ftell
    #define fseek ff_fseek
    #define fflush ff_fflush
    #define fputs ff_fputs
    #define fgets ff_fgets
    #define fprintf ff_fprintf
    #define FILE FF_FILE

    int ff_fflush(FF_FILE* f);
    int ff_fputs(const char* iStr, FF_FILE *pxStream);
    int ff_fprintf(FF_FILE *pxStream, char *c, ...);

It’s super nasty and creates a load of warnings, but it works.

Loading files

I originally set out to allow loading from source files, which would invoke the assembler. I actually got it working, but found that the whole thing was really slow and really unstable. The asm6f library that Minicube64 uses relies heavily on reading and writing lots of individual file lines, and I found that to be really unreliable with the SD card library I am using; I had to drop the baud rate to around 9600 to get it to work 75% of the time.

I’m sort of on the fence about whether to include the assembler; if this was a final product most people would want to consume pre-built binary images and not have to assemble it on the device. In my mind this type of workflow sits squarely on the original Minicube64’s shoulders; you are likely editing code and assets - you won’t be doing this on the standalone machine.

With that in mind, I dropped a copy of the DEAD END holiday demo on the SD card (renamed as game.bin) and set my Pico up to run it.

    reset_machine("game.bin");

It actually loads and runs, although it does sit there for a few seconds with a black screen due to the slow speed of the SD card. I’m running the card at ~115200 baud, which seems to be stable enough for reading.
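For illustration, loading a pre-built binary boils down to something like the sketch below. This is not the actual reset_machine code: load_rom and its parameters are hypothetical names, and it uses plain stdio calls (on the Pico, the #define shim from earlier would redirect these to the ff_ versions). It reads the file in large chunks rather than line by line, which keeps the number of SD card transactions down.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read a whole ROM image into emulator memory.
 * Returns the number of bytes loaded, or -1 if the file cannot be
 * opened or is larger than the destination buffer. */
long load_rom(const char *path, uint8_t *mem, size_t capacity)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    size_t total = 0;
    size_t got;
    /* Ask for everything that still fits; fread returns 0 at EOF or on error. */
    while (total < capacity &&
           (got = fread(mem + total, 1, capacity - total, f)) > 0) {
        total += got;
    }

    /* If the buffer is full, check whether the file has data left over. */
    int too_big = 0;
    if (total == capacity) {
        uint8_t extra;
        too_big = (fread(&extra, 1, 1, f) == 1);
    }

    fclose(f);
    return too_big ? -1 : (long)total;
}
```

A call might look like load_rom("game.bin", memory, sizeof memory), where memory stands in for whatever array backs the emulated machine.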

Moving to RGB444

Discussions with aeriform and MonstersGoBoom made me reconsider the use of the 8bpp RGB332 mode. When running the DEAD END demo, it was clear that the colours looked rough compared to the original due to the lack of shades.

Running a full 16bpp RGB out over VGA would need 18 GPIO (16 colour bits plus HSYNC and VSYNC), far more than I had left. Looking around the schematic, I realised I could move the Mega Drive controller reading into hardware and free up an extra 3 GPIO. The results of that work are discussed in a dedicated post, so go check that out.

The extra 3 GPIO would let me move things around a little and widen the colour channels. Going from RGB332 to RGB444 takes four more pins (one more each for red and green, two more for blue), and I had an unused GPIO nearby, which was enough to give me 12-bit RGB444.

This gives me a Pico pin mapping of:

                                PI PICO

                          -------\___/-------
                BLU0  GP0 |  1           40 | +VBUS +5V
                          |                 |
                BLU1  GP1 |  2           39 | +VSYS
                          |                 |
                      GND |  3           38 | GND
                          |                 |
                BLU2  GP2 |  4           37 | 3V3_EN
                          |                 |
                BLU3  GP3 |  5           36 | +3V3
                          |                 |
                GRN0  GP4 |  6           35 | ADC_VREF
                          |                 |
                GRN1  GP5 |  7           34 | GP28  N/C
                          |                 |
                      GND |  8           33 | GND
                          |                 |
                GRN2  GP6 |  9           32 | GP27  N/C
                          |                 |
                GRN3  GP7 | 10           31 | GP26  N/C
                          |                 |
                RED0  GP8 | 11           30 | RUN   /RESET
                          |                 |
                RED1  GP9 | 12           29 | GP22  PAD_SEL (MD_SELECT out)
                          |                 |
                      GND | 13           28 | GND
                          |                 |
                RED2 GP10 | 14           27 | GP21  /SH_PL (Shift reg load)
                          |                 |
                RED3 GP11 | 15           26 | GP20  SH_CLK (Shift reg clk)
                          |                 |
               HSYNC GP12 | 16           25 | GP19  SD_MOSI
                          |                 |
               VSYNC GP13 | 17           24 | GP18  SD_SCK
                          |                 |
                      GND | 18           23 | GND
                          |                 |
                SH_D GP14 | 19           22 | GP17  SD_CS0
                          |                 |
                N/C  GP15 | 20           21 | GP16  SD_MISO
                          -------------------

I wired the new signal lines up to the DAC - essentially shifting each bit up and hooking the new lowest bit to a 4K-ish resistor.

    RED0    --|  4K  |--\
    RED1    --|  2K  |---\
    RED2    --|  1K  |------ VGA_R
    RED3    --| 470R |---/

    GRN0    --|  4K  |--\
    GRN1    --|  2K  |---\
    GRN2    --|  1K  |------ VGA_G
    GRN3    --| 470R |---/

    BLU0    --|  4K  |--\
    BLU1    --|  2K  |---\
    BLU2    --|  1K  |------ VGA_B
    BLU3    --| 470R |---/
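As a sanity check on those resistor values, here is a small sketch that estimates the voltage on a colour line for each 4-bit code. It assumes ideal 3.3 V / 0 V GPIO outputs, a 75 ohm termination in the monitor, and the nominal values from the diagram (the real "4K-ish" parts will differ slightly). dac_voltage is a made-up name for this example.

```c
#include <stddef.h>

/* Nominal values read off the diagram, LSB first (assumed, not measured). */
static const double DAC_R[4] = { 4000.0, 2000.0, 1000.0, 470.0 };

#define GPIO_HIGH_V 3.3   /* assumed ideal GPIO high level    */
#define VGA_LOAD_R  75.0  /* assumed monitor input termination */

/* Voltage on the VGA colour line for a 4-bit code (0..15).
 * Millman's theorem: each pin is a source (3.3 V or 0 V) behind its
 * resistor, and the 75 ohm load is one more branch to ground. */
double dac_voltage(unsigned code)
{
    double num = 0.0;              /* sum of V_i / R_i      */
    double den = 1.0 / VGA_LOAD_R; /* sum of 1 / R branches */
    for (size_t bit = 0; bit < 4; ++bit) {
        double v = ((code >> bit) & 1u) ? GPIO_HIGH_V : 0.0;
        num += v / DAC_R[bit];
        den += 1.0 / DAC_R[bit];
    }
    return num / den;
}
```

With these assumptions, the all-ones code comes out at roughly 0.74 V, close to the nominal 0.7 V full-scale level of a VGA colour signal.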

I changed the pixel macro to pack the colour into a ----RRRR GGGG BBBB format, now taking the 4 MSBs of each colour channel.

    rgb444 = ((r & 0xF0) << 4) | (g & 0xF0) | ((b & 0xF0) >> 4)
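The same packing written as a small function, in case it helps to see it in isolation. pack_rgb444 is a name made up for this example; the project uses a macro instead.

```c
#include <stdint.h>

/* Pack 8-bit-per-channel colour into ----RRRR GGGG BBBB,
 * keeping the 4 most significant bits of each channel. */
static inline uint16_t pack_rgb444(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF0u) << 4) | (g & 0xF0u) | ((b & 0xF0u) >> 4));
}
```

For example, pack_rgb444(0x12, 0x34, 0x56) gives 0x0135: the low nibble of each channel is simply dropped.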

Modifying the PicoQVGA code was simple enough: extend the number of GPIO pins to 12, change the BPP to 16 and leave the upper 4 bits unused. I changed the 320x240 frame buffer from 8-bit to 16-bit and pushed the code to the Pico.

… aaaaand… nothing. The Pico had crashed with an OOM error.

Optimizing the graphics driver

Up to this point, I was driving the QVGA code from a full 320x240 pixel framebuffer - extending this to 16 bits meant it would use 153,600 bytes. The Pico only has 264 KB of RAM, which everything else has to share, so this pushed it over the edge.

I decided to take a different approach and render the screen on-demand from a 64x64 pixel “original” framebuffer.

To do this, we need to understand how the QVGA driver code works. Essentially, every scanline we get a callback to the QVgaLine function. In here it decides whether it’s sending sync signals or pixel data. We need to change how we send the pixels.

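    uint16_t* buf_to_use = fbuf;
    if (line >= 56 && line < 56+128)
    {
        // Visible area: 128 output lines, each source line shown twice.
        // Only refill the scanline buffer on even lines; odd lines reuse it.
        if (line % 2 == 0)
        {
            scr_line = (line-56)/2;
            buf_to_use = fbuf;
            uint16_t *scr = g_buffer + (scr_line*64);
            uint16_t *f = fbuf + 92;
            // Double each of the 64 source pixels horizontally.
            for(int i = 0; i < 64; ++i) {
                *(f++) = *scr;
                *(f++) = *scr;
                ++scr;
            }
        }
    }
    else
    {
        // Outside the picture: emit the all-black line.
        scr_line = 0;
        y_dup = 0;
        buf_to_use = ebuf;
    }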

I am maintaining 2 buffers of 320 uint16_t called fbuf and ebuf. The latter is all zeros so that I can quickly emit a black line; fbuf is the buffer used for pixel data.

Between lines 56 and 184 we emit pixel data; everywhere else we emit black. For the pixel lines, we copy each source pixel twice into the target scanline buffer fbuf, starting at an offset of 92. Finally, we only fill this buffer on even lines, meaning the odd lines reuse the last buffer. These offsets were chosen to give a centred 128x128 pixel screen, which is the original Minicube64 screen at a 2x scale.

Doing this, we take the memory needed for the output buffers down to just over 1 KB instead of roughly 150 KB - a nice saving.
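The arithmetic behind those numbers, as a quick sketch. The function and constant names are invented for this example, and the last figure assumes the 64x64 source framebuffer is also stored as uint16_t, as the g_buffer pointer type in the snippet above suggests.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    VGA_W  = 320, VGA_H  = 240,  /* QVGA output size       */
    CUBE_W = 64,  CUBE_H = 64    /* Minicube64 screen size */
};

/* Full-size 16-bit framebuffer: the approach that ran out of memory. */
size_t full_framebuffer_bytes(void)
{
    return (size_t)VGA_W * VGA_H * sizeof(uint16_t);
}

/* The two scanline buffers, fbuf and ebuf. */
size_t scanline_buffers_bytes(void)
{
    return (size_t)2 * VGA_W * sizeof(uint16_t);
}

/* The 64x64 source framebuffer (assumed to be 16 bits per pixel). */
size_t source_framebuffer_bytes(void)
{
    return (size_t)CUBE_W * CUBE_H * sizeof(uint16_t);
}
```

That works out to 153,600 bytes for the full framebuffer against 1,280 bytes for the two scanline buffers, plus 8,192 bytes for the source framebuffer under that assumption.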

Booting the Pico, I get a much better colour range compared to the RGB332 scheme (let’s ignore that on my first attempt the colours were wrong because of a rogue resistor value in the DAC). Here’s how it looks after the colour fix:

It’s worth noting at this point that the code section in QVgaLine is extremely time sensitive; if you run even slightly over, it becomes immediately visible on the display. I have some slack as I’m drawing the screen 96 pixels in, which seems to give me just enough processing time to deal with the copying to the scanline buffer. If I had a genuine full screen display to deal with, I’d be looking at using a double scanline buffer and filling the off-screen version ahead of time - this is similar to what I did in the Interak-1-like VDU.
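A rough sketch of what that double scanline buffer could look like. This is illustrative only and not code from the project: prepare_line, swap_line and the constants are hypothetical names, and the 92 and 56 offsets are simply carried over from the snippet above. The idea is that the next line is built in the back buffer while the front buffer is being sent out.

```c
#include <stdint.h>
#include <string.h>

#define LINE_W 320  /* pixels per output scanline          */
#define SRC_W   64  /* source framebuffer width            */
#define SRC_H   64  /* source framebuffer height           */
#define SCALE    2  /* each source pixel becomes 2x2       */
#define X_OFF   92  /* horizontal offset used in the post  */
#define Y_OFF   56  /* first output line with picture data */

static uint16_t line_buf[2][LINE_W]; /* ping-pong scanline buffers     */
static int front = 0;                /* buffer currently being sent out */

/* Fill the back buffer with the pixels for output line `line`. */
void prepare_line(const uint16_t *src, int line)
{
    uint16_t *dst = line_buf[front ^ 1];
    memset(dst, 0, sizeof line_buf[0]);  /* black border by default */
    if (line < Y_OFF || line >= Y_OFF + SRC_H * SCALE)
        return;                          /* outside the picture     */

    const uint16_t *row = src + ((line - Y_OFF) / SCALE) * SRC_W;
    for (int x = 0; x < SRC_W; ++x) {
        dst[X_OFF + x * SCALE]     = row[x];
        dst[X_OFF + x * SCALE + 1] = row[x];
    }
}

/* Swap buffers and return the one to hand to the video output. */
const uint16_t *swap_line(void)
{
    front ^= 1;
    return line_buf[front];
}
```

In a real driver, prepare_line for line N+1 would run while line N is being scanned out, so the time-critical callback only has to call swap_line.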

Wrapping up

The next thing to deal with is the framerate; during testing around 10-15fps was common - far below the target of 60fps. The video above is my optimized version running at 30fps; it took quite a bit of work to get it there.

We’ll talk about that next time.