cpct_drawCharM1_f

Prints a ROM character on a given byte-aligned position on the screen in Mode 1 (320x200 px, 4 colours).  It does it ~50% faster than cpct_drawCharM1.

C Definition

void cpct_drawCharM1_f (void* video_memory, u8 fg_pen, u8 bg_pen, u8 ascii)

Input Parameters (5 Bytes)

(2B DE) video_memory    Video memory location where the character will be drawn
(1B C ) fg_pen          Foreground palette colour index (similar to BASIC’s PEN, 0-3)
(1B B ) bg_pen          Background palette colour index (PEN, 0-3)
(1B A ) ascii           Character to be drawn (ASCII code)

Assembly call (Input parameters on registers)

call cpct_drawCharM1_f_asm

Parameter Restrictions

  • video_memory could theoretically be any 16-bit memory location.  It will work outside current screen memory boundaries, which is useful if you use any kind of double buffer.  However, be careful where you use it, as it does no kind of check or clipping, and it could overwrite data if you select a wrong place to draw.
  • fg_pen must be in the range [0-3].  It is used to access a colour mask table and, so, a value greater than 3 will return a random colour mask giving unpredictable results (typically bad character rendering, with odd colour bars).
  • bg_pen must be in the range [0-3], for the same reasons as fg_pen.
  • ascii could be any 8-bit value, as 256 characters are available in ROM.
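
Since the function itself performs no checks, a caller may want to validate the pens before the call.  The following is a minimal sketch, not part of CPCtelera: the helper names m1_pens_valid and m1_wrap_pen are hypothetical, and the u8 typedef only mirrors the library's unsigned 8-bit type so the snippet compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t u8;   /* stand-in for CPCtelera's u8 (unsigned char, 1 byte) */

/* Hypothetical helper (not a CPCtelera function): true when both pens are
   valid Mode 1 palette indices, that is, inside the range [0-3]. */
static bool m1_pens_valid(u8 fg_pen, u8 bg_pen) {
    return fg_pen <= 3 && bg_pen <= 3;
}

/* Hypothetical helper: forces any value into [0-3] by keeping its 2 lowest
   bits.  The colour changes silently, so use it only if that is acceptable. */
static u8 m1_wrap_pen(u8 pen) {
    return pen & 0x03;
}
```

Checking once in the calling code keeps the drawing routine itself free of validation overhead, which matches its speed-first design.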

Requirements and limitations

  • Do not put this function’s code below 0x4000 in memory.  To read characters from ROM, this function enables the Lower ROM (located at 0x0000-0x3FFF), so the CPU would fetch code from ROM instead of from RAM in the first bank, effectively shadowing this function’s code.  This leads to undefined results (typically the program hangs or crashes).
  • Screen must be configured in Mode 1 (320x200 px, 4 colours)
  • This function requires the CPC firmware to be DISABLED.  Otherwise, random crashes might happen due to side effects.
  • This function disables interrupts during main loop (character printing), and re-enables them at the end.
  • This function will not work from ROM, as it uses self-modifying code.

Details

This function reads a character from ROM and draws it at a given byte-aligned video memory location, which corresponds to the upper-left corner of the character.  As this function assumes the screen is configured for Mode 1 (320x200, 4 colours), the character can only be drawn at pixel columns that are multiples of 4 (0, 4, 8, 12...), because each byte contains 4 pixels in Mode 1.  It prints the character in 2 colours (PENs): one for the foreground (fg_pen) and the other for the background (bg_pen).
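
To illustrate what a byte-aligned location means, here is a minimal sketch that computes the address of the byte holding a given pixel.  It assumes the default screen configuration (video memory starting at 0xC000, 80 bytes per pixel line, and the 8 pixel lines of each character row spaced 0x800 bytes apart); the helper name m1_byte_address and the constants are hypothetical and are not part of the library.

```c
#include <stdint.h>

typedef uint8_t  u8;    /* stand-ins for CPCtelera's 8-bit and 16-bit types */
typedef uint16_t u16;

#define SCREEN_START   0xC000u  /* ASSUMED default start of video memory       */
#define BYTES_PER_LINE 0x50u    /* 80 bytes per pixel line (320 px / 4 px each) */
#define LINE_STRIDE    0x800u   /* gap between pixel lines of one character row */

/* Hypothetical helper: address of the byte that holds pixel column x_px on
   pixel line y.  x_px should be a multiple of 4, because one byte holds
   4 Mode 1 pixels; other values are rounded down to the containing byte. */
static u16 m1_byte_address(u16 x_px, u8 y) {
    return (u16)(SCREEN_START
               + (y / 8u) * BYTES_PER_LINE   /* character row                 */
               + (y % 8u) * LINE_STRIDE      /* pixel line inside that row    */
               + x_px / 4u);                 /* byte column inside the line   */
}
```

For example, pixel (16, 8) sits on the second character row, 4 bytes in, so the sketch yields 0xC054; a value like this is what would be passed as video_memory.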

This function does the same as cpct_drawCharM1, but as fast as possible, without taking any space constraints into account.  It is unrolled, which makes it large in bytes, but it is ~50% faster than cpct_drawCharM1.  The technique that makes it so fast is difficult to understand: it uses dynamic code placement.  I will try to sum it up here, and you can always read the detailed comments in the source to get a better understanding.

Basically, this function does the following:

1. It gets the 8-byte definition of a character.
2. It transforms each byte (a character line) into 2 bytes for video memory (8 pixels, 2 bits per pixel).
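
The second step can be sketched in portable C as follows.  This is only an illustration, not the routine's actual code.  It relies on pixel p of a Mode 1 byte (0 = leftmost) using bits 7-p and 3-p.  The M1_FULL_PEN table is an assumption about which nibble holds which pen bit, so check it against your hardware reference.  All names are hypothetical.

```c
#include <stdint.h>

typedef uint8_t u8;   /* stand-in for CPCtelera's u8 */

/* ASSUMPTION: byte value that paints all 4 pixels of a byte with one pen
   (high nibble = bit 0 of each pixel's pen, low nibble = bit 1). */
static const u8 M1_FULL_PEN[4] = { 0x00, 0xF0, 0x0F, 0xFF };

/* Turns 4 character bits (bit 3 = leftmost pixel) into one video byte.
   Copying the nibble into both halves of the byte gives a mask that selects
   the foreground pixels; its complement selects the background pixels. */
static u8 m1_nibble_to_byte(u8 nibble, u8 fg_pen, u8 bg_pen) {
    u8 sel = (u8)((nibble & 0x0F) | ((nibble & 0x0F) << 4));
    return (u8)((sel & M1_FULL_PEN[fg_pen]) | (~sel & M1_FULL_PEN[bg_pen]));
}

/* One 8-pixel character line becomes 2 video bytes: left half, right half. */
static void m1_line_to_bytes(u8 line, u8 fg_pen, u8 bg_pen, u8 out[2]) {
    out[0] = m1_nibble_to_byte((u8)(line >> 4), fg_pen, bg_pen);
    out[1] = m1_nibble_to_byte((u8)(line & 0x0F), fg_pen, bg_pen);
}
```

With fg_pen 3 and bg_pen 0, the character line 0x18 (two lit pixels in the middle) becomes the video bytes 0x11 and 0x88 in this sketch.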

The trick is in transforming a 1-byte character-line definition into 2 bytes of video memory colours.  As each pixel can take only 4 colours, there are 4 possible transform operations for the foreground colour and 4 for the background.  So, we have to do 4 operations for each character-line byte:

1. Foreground colour for video byte 1
2. Background colour for video byte 1
3. Foreground colour for video byte 2
4. Background colour for video byte 2

Instead of adding branching logic to the inner loop to select the operation for each byte and type, we create 4 8-byte holes in the code that we call “dynamic code sections” (DCS).  Then, we use logic at the start of the routine to select the 4 operations that are required, depending on the selected foreground / background colours.  Once we know which operations are to be performed, we fill in the holes (DCS) with the machine code that performs them.  Then, when the inner loop is executed, it does not have to do any branching, which makes it much faster.
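
Portable C cannot patch its own machine code, so the sketch below swaps in a different but related technique: the operations are chosen once as function pointers before the loop, and the loop itself never tests the pens.  It is only an analogue of the idea, not the routine's implementation.  Two pointers are enough here because the same operation serves both video bytes, whereas the routine fills four sections.  All names are hypothetical, and the pen-to-bit layout is an assumption.

```c
#include <stdint.h>

typedef uint8_t u8;                 /* stand-in for CPCtelera's u8            */
typedef u8 (*m1_op)(u8 sel);        /* operation: pixel mask -> colour bits   */

/* The 4 possible operations, one per pen.  ASSUMED layout: high nibble holds
   bit 0 of each pixel's pen, low nibble holds bit 1. */
static u8 op_pen0(u8 sel) { (void)sel; return 0x00; }
static u8 op_pen1(u8 sel) { return (u8)(sel & 0xF0); }
static u8 op_pen2(u8 sel) { return (u8)(sel & 0x0F); }
static u8 op_pen3(u8 sel) { return sel; }

static const m1_op M1_OPS[4] = { op_pen0, op_pen1, op_pen2, op_pen3 };

/* Hypothetical analogue: converts an 8-line glyph into 16 video bytes
   (2 per line).  Pens must be in [0-3]; no check is made. */
static void m1_render_char(const u8 glyph[8], u8 fg_pen, u8 bg_pen, u8 out[16]) {
    m1_op fg_op = M1_OPS[fg_pen];   /* selection happens once, up front */
    m1_op bg_op = M1_OPS[bg_pen];
    for (int i = 0; i < 8; ++i) {   /* no pen-dependent branches in here */
        u8 hi = (u8)(glyph[i] >> 4), lo = (u8)(glyph[i] & 0x0F);
        u8 sel1 = (u8)(hi | (hi << 4));   /* foreground pixels, video byte 1 */
        u8 sel2 = (u8)(lo | (lo << 4));   /* foreground pixels, video byte 2 */
        out[2 * i]     = (u8)(fg_op(sel1) | bg_op((u8)~sel1));
        out[2 * i + 1] = (u8)(fg_op(sel2) | bg_op((u8)~sel2));
    }
}
```

The indirect calls still cost something in C; the self-modifying version avoids even that by placing the chosen instructions directly inside the unrolled loop body.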

The resulting code is very difficult to follow, and very big in size, but when speed is the goal, this is the best approach.

Destroyed Register values

AF, BC, DE, HL

Required memory

349 bytes

Time Measures

  Case     | Cycles | microSecs (us)
------------------------------------
  Best     |  1952  |   488.00
  Worst    |  2670  |   668.50
------------------------------------
Asm saving |   -80  |   -20
------------------------------------