Arabic Decoding for Games & Game Engines – Part 4: Lam-Aleph

We ended the previous part with correctly rendered Arabic: drawn from right to left, with the correct form for each character based on its position in the word and on the characters surrounding it. But some characters were still missing. Why is that?

If you look carefully, you may notice that there is a consistent pattern here!

The character missing from the glyph atlas is always one that follows a ligature, or to be exact, one that follows a Lam-Aleph (the character Lam combined with the character Aleph). This comes from the fact that, as I mentioned, I’m using a cached list of codepoints per string (in the previous part of the series these were CodepointsCache and CodepointsCachePreFormations). This cache is initialized with a fixed size per string, so if we manipulate the string, the cache stays the same size. Take a 6-character word from our example, الكلام, which has a Lam [idx 3] followed by an Aleph [idx 4]. When we merge the two in the formation step, we end up with a word of 5 characters, not 6, but in my case the cache still treats the string as 6 characters long.

الكلام = ا ل ك ل ا م

Pre-Formation: 6 characters

Post-Formation: 5 characters

When we cache the string’s codepoints at the creation of a new UIText object, those codepoints include the Lam and the Aleph at consecutive indices in the list. But when we do the formation, we replace one of the two with a Lam-Aleph ligature (a mix of the two), leaving the other one’s slot (the Aleph’s) in the cache but filled with an invalid codepoint (-100), because it is not something we want to draw anymore. This was intentional: I don’t want to remove anything from the cache, both to avoid paying the cost of resizing and to keep its size matching the string the UIText object is holding, which makes drawing easier. So the rule in this cache design is that as long as the string itself did not change, I don’t change the size of the cache. Even if the string changes, as long as the new one is the same size as the previous one or shorter, we still don’t resize the cache; we just update the values in there.
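To make the fixed-size rule concrete, here is a minimal sketch of the merge step. It is not the engine’s actual formation code: the function name MergeLamAleph, the constant names, and the use of a signed 32-bit vector for the cache are all assumptions made for illustration. It also always writes the final-form ligature (65276) instead of choosing between the isolated and final forms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for this sketch; the engine's own constants may differ.
constexpr std::int32_t kLam = 0x0644;             // 1604
constexpr std::int32_t kAleph = 0x0627;           // 1575
constexpr std::int32_t kLamAlephFinal = 0xFEFC;   // 65276
constexpr std::int32_t kInvalidCodepoint = -100;  // marks the slot the Aleph used to occupy

// Merges every Lam + Aleph pair in place and returns how many pairs were merged.
// The cache is never resized: the Lam slot receives the ligature and the
// Aleph slot receives the invalid codepoint.
std::size_t MergeLamAleph(std::vector<std::int32_t>& cache)
{
    const std::size_t sizeBefore = cache.size();
    std::size_t merged = 0;

    for (std::size_t i = 0; i + 1 < cache.size(); ++i)
    {
        if (cache[i] == kLam && cache[i + 1] == kAleph)
        {
            cache[i] = kLamAlephFinal;
            cache[i + 1] = kInvalidCodepoint;
            ++merged;
            ++i; // step over the slot we just invalidated
        }
    }

    // The design rule from the text: formation must not change the cache size.
    assert(cache.size() == sizeBefore);
    return merged;
}
```

Fed the six codepoints of الكلام, this leaves a cache that still has six slots, with the ligature in slot 3 and -100 in slot 4.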

When we bake the codepoints’ glyphs to the atlas, we don’t bake a glyph for that Aleph, because in the cache it is not an Aleph anymore, it is -100. The same thing happens when we draw: we don’t find it in the font glyph data or the atlas, so we draw a missing character.
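The failure mode can be sketched as a lookup that falls back to a “missing” glyph. The Glyph struct, the map-based atlas, and the FindGlyphOrMissing helper below are hypothetical stand-ins, not the engine’s real data structures; they only illustrate why a -100 slot ends up drawn as a missing character.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical glyph record; a real atlas entry would carry UVs, metrics, etc.
struct Glyph
{
    int atlasX;
    int atlasY;
    bool isMissingBox; // true only for the fallback glyph
};

// Returns the baked glyph for the codepoint, or the fallback when the
// codepoint was never baked (which is always the case for -100).
const Glyph& FindGlyphOrMissing(const std::unordered_map<std::int32_t, Glyph>& atlas,
                                std::int32_t codepoint,
                                const Glyph& missing)
{
    auto it = atlas.find(codepoint);
    return it != atlas.end() ? it->second : missing;
}
```

Since -100 is never baked, any draw loop that passes it to a lookup like this one gets the fallback glyph back.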

Thankfully the fix for this is very simple: when we draw, we just make an exception for any -100 that follows a Lam-Aleph. By now we know this will always be the case because of the fixed cache size: every Lam-Aleph has an invalid codepoint next to it, standing in for the original Aleph that existed in the string before we combined it with a Lam into a Lam-Aleph.

C++
...
//some code before that

//these are used to hold the advance through the loop to the next vertices
f32 _x, _y;

//loop through all characters of the given string
for (u32 c = 0; c < _strLength; ++c)
{
//some code
...

  //if this is the aleph 1575 (replaced with -100 while forming the lam-aleph sequence) and before it there was a lam-aleph 65276, this means it is an extra, no need to process it.
  //we do this here because the cache has a fixed size, so the aleph slot is always left dangling in there (set to -100) instead of the entire index being removed
  //note: c is unsigned, so for c < 2 the subtraction wraps around to a huge value, which the size check below rejects
  u32 _codepointBeforeThis = c - 2;
  if (_codepointBeforeThis > 0 && _codepointBeforeThis < Text->CodepointsCache.size())
  {
    if (_codepoint == -100 && (Text->CodepointsCache[_codepointBeforeThis] == ARABIC_UNICODE_LETTER_LAM_ALEPH || Text->CodepointsCache[_codepointBeforeThis] == ARABIC_UNICODE_LETTER_LAM_ALEPH_ISOLATED))
    {
      _codepoint = 0; //this does not matter, we won't use the value anyway, so this line can be removed

      //skip & advance, as if we had processed & drawn it, but we did not and we don't want to draw or count it
      c += _advance;
      continue;
    }
  }
}

...
//get the glyphs and generate the index & vertex buffers and then draw (Part 2 of the article series)

And with this little change (which is only required if you are using a similar cache technique that does not change size), when we try to render the exact same text, we get it just perfect!
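If you want to try the idea outside of an engine, the skip rule can be isolated into a small helper. This is only a sketch under a few assumptions: the names IsLeftoverAleph and DrawableCodepoints are made up for illustration, the cache is modeled as a plain vector of signed 32-bit values, and the leftover slot is assumed to sit directly after the ligature, so the check looks at index i - 1 rather than reproducing the exact offset used in the engine loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for this sketch; the engine's own constants may differ.
constexpr std::int32_t kLamAlephIsolated = 0xFEFB; // 65275
constexpr std::int32_t kLamAlephFinal = 0xFEFC;    // 65276
constexpr std::int32_t kInvalidCodepoint = -100;

bool IsLamAleph(std::int32_t codepoint)
{
    return codepoint == kLamAlephIsolated || codepoint == kLamAlephFinal;
}

// True when slot i holds the invalid codepoint left behind by a Lam-Aleph
// merge, meaning it must be skipped instead of drawn.
bool IsLeftoverAleph(const std::vector<std::int32_t>& cache, std::size_t i)
{
    return i > 0 && i < cache.size()
        && cache[i] == kInvalidCodepoint
        && IsLamAleph(cache[i - 1]);
}

// Returns only the codepoints that should reach the glyph lookup.
std::vector<std::int32_t> DrawableCodepoints(const std::vector<std::int32_t>& cache)
{
    std::vector<std::int32_t> drawable;
    drawable.reserve(cache.size());
    for (std::size_t i = 0; i < cache.size(); ++i)
    {
        if (IsLeftoverAleph(cache, i))
            continue; // skip the slot, do not draw or count it
        drawable.push_back(cache[i]);
    }
    return drawable;
}
```

Checking i > 0 before subtracting avoids the unsigned wrap-around entirely, which is one small advantage of testing the index before computing the previous slot.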

With that looking pretty cool, I think it is time to change the test text to something more meaningful that also includes more language features, such as diacritics!

What about this common one from Google (I don’t have diacritics on my keyboard):

“اللَّهُمَّ إِنِّي أَسْأَلُكَ رِضَاكَ وَالجَنَّةَ ، وَأَعُوذُ بِكَ مِنْ سَخَطِكَ وَالنَّارِ”

Let’s replace the text in the test UIText object and kick-start the engine to see how pretty 🤩 it will look…

Ouch…🫤

This is not looking right. Yes, we’ve got some diacritics, but everything broke again! Let’s see why this happens and how to fix it in the next part.

-m