An attempt at creating "ASCII" Art but with Unicode characters


Following a quick read of this article on Medium [4], I felt tempted to try my hand at making ASCII art. Well, ASCII art that also includes Urdu/Arabic characters. So in other words, not “ASCII” art at all. This means we’re going beyond the restrictive 128-character ASCII set.

Making ASCII art has been tried and done hundreds of times. For this project, I used C++ (my go-to programming language). After researching the working principle behind ASCII art [10] and how others approached it in C# [2], Python [4] and JavaScript [3], I found that an intensity-dependent program is the simplest solution. It can be summed up in 4 steps.


1) Set a char array with several characters sorted by “intensity”

2) Get a grayscale, contrast-adjusted image.

3) Loop through each pixel and select a corresponding character for each pixel, based on the grayscale value (0-255).

4) Print out the corresponding characters.
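
As a rough sketch of steps 1 to 3, the mapping from a pixel to a character could look like the snippet below. The function names to_gray and pick_char are hypothetical and do not appear in the full listing; the weights are the same 0.21 / 0.72 / 0.07 used there, written as integer percentages so the arithmetic is exact.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Weighted grayscale value (0-255) from 8-bit RGB channels.
// Weights 21% / 72% / 7% mirror the 0.21 / 0.72 / 0.07 used in the post.
int to_gray(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return (21 * r + 72 * g + 7 * b) / 100;
}

// Picks a character from an intensity-sorted ramp.
// Gray value 0 maps to the first character, 255 to the last one.
wchar_t pick_char(const std::wstring& ramp, int gray)
{
    if (ramp.empty()) return L' ';
    if (gray < 0) gray = 0;
    if (gray > 255) gray = 255;
    std::size_t index = static_cast<std::size_t>((ramp.size() - 1) * (gray / 255.0f));
    return ramp[index];
}
```

With a short ramp such as L"@#-. ", a black pixel gives '@' and a white pixel gives the trailing space, so the ramp has to be ordered from the densest character to the lightest.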

I used the SFML library to load the image, because I’ve used it frequently in the past. (Also, because if I attempt any future improvements, they’ll be easier to make using SFML). Initially, I was tempted to just print out the characters directly in a textured sprite and then save it as an image. But then I thought that would be plain boring. I wanted the ability to copy and paste the text anywhere I want. Think of those YouTube comments consisting of the surprised Pikachu meme (in ASCII form).

The image was loaded using the code below:

//Instantiate an image variable using the SFML sf::Image class
    sf::Image image;

    //path to your image
    
    //IMPORTANT:- Your image should preferably be GRAYSCALE
    //         :- This program is CONTRAST DEPENDENT (adjust the contrast in an image editor software.
    //                                                 although, you can try out without adjusting the contrast first.)
     string path="kermiti1.jpg";

    //checks to see if image has correctly loaded
    if (!image.loadFromFile(path))
    {
        return -1;
    }

Now, Unicode supports a wide array of non-English characters and emojis. I found the code points of widely used Urdu and Arabic characters on this website [5]. These characters take more than 1 byte: their code points lie in the Basic Multilingual Plane, so each one fits in a single 16-bit unit (as in UCS-2) and takes 2 bytes in UTF-8. The char data type can only store 1 byte, thus one needs to use wchar_t in its stead. wchar_t is wide enough to store these characters; its size depends on the platform (typically 4 bytes on Linux and 2 bytes on Windows) [6]. In this program, the sequence of characters is stored in a wchar_t array. The sequence I used is from [3], but I added in some characters, namely: ء (hamza), ه (heh), ݯ (hah with two dots), ب (bay), and ن (noon).

//Ordered Character String obtained from source (URL:https://marmelab.com/blog/2018/02/20/convert-image-to-ascii-art-masterpiece.html)
    //Original Character string with ASCII only characters
    //L"$@B%8&WM#*oahkbdpqwmZO0QLCJUYXzcvunxrjft/\|()1{}[]?-_+~<>i!lI;:,^`'.";

    //The one I used includes several Urdu and Arabic characters
    // \u0621 is ء (hamza)
    // \u0684 is ڄ (dyeh), not ه (heh), which is \u0647
    // \u076f is ݯ (hah with two dots) (this character isnt used in the Urdu language)
    // \u0628 is ب (bay)
    // \u0646 is ن (noon)

    //note the escaped backslash (\\) in the string, a lone \| is not a valid escape sequence
    wchar_t charset1[]=L"$@B%8&WM\u076f\u0628#*oahkbdpqwmeZO0QLCJUYXzcvunxrjft/\\|\u0646()1{}[]?-_+~<>i!lI;:,^\u0621`'\u0684 .";
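
To make the byte counts concrete, here is a small sketch that reports how many bytes a code point needs in UTF-8 and whether it fits in a single 16-bit unit. Both helper names are hypothetical, and surrogate code points are not treated specially.

```cpp
#include <cstdint>

// Number of bytes UTF-8 needs for a code point (0 if it is out of range).
int utf8_length(std::uint32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

// True if the code point lies in the Basic Multilingual Plane,
// i.e. it fits in one 16-bit unit.
bool fits_in_16_bits(std::uint32_t cp)
{
    return cp < 0x10000;
}
```

All five added characters (\u0621, \u0684, \u076f, \u0628, \u0646) fall in the 2-byte UTF-8 range and fit in one 16-bit unit, whereas an emoji such as U+1F600 needs 4 bytes in UTF-8 and does not fit in 16 bits.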

Since I am using a wchar_t variable and I want to print to a text file, I have to use std::wofstream, a file stream that supports wide characters. Printing the wchar_t character array directly wouldn’t give me a legible output. Thus, to make sure that the text file displays the correct characters, we need to set the locale of the stream object. The locale is an object which we can use to set how characters are encoded, how dates are displayed, etc. [8]

If I set the locale to encode the Unicode code points in UTF-8, then this will make sure that the correct characters are displayed [7]. UTF-8 was chosen because it can encode every Unicode code point, using up to 4 bytes per character [9]. Therefore, it can encode the Urdu/Arabic characters.

//Because we want the Unicode codepoints to be printed properly in our text file, we need to make sure they are encoded
    //in UTF-8, which can handle Unicode characters' encoding.
    //A UTF-8 locale is defined (std::codecvt_utf8 is a template and needs the character type; it is deprecated since C++17)
    const std::locale utf8_locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());
    wofstream f("sample.txt"); // we need to use wofstream because wide characters are being printed; opening the file also discards its previous contents
    f.imbue(utf8_locale);//the locale is imbued to the stream
    f.clear();  //resets the error flags of the stream (it does not touch the file contents)
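
Since std::codecvt_utf8 is deprecated from C++17 onwards, one alternative is to do the UTF-8 encoding by hand and write the bytes with a plain std::ofstream. The sketch below is not what the program in this post does; encode_utf8 and to_utf8 are hypothetical helpers, and to_utf8 assumes every wchar_t holds one whole code point (true for the characters used here, not for UTF-16 surrogate pairs on Windows).

```cpp
#include <cstdint>
#include <string>

// Encodes one Unicode code point as UTF-8 bytes (empty string if out of range).
std::string encode_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Converts a wide string to UTF-8, assuming one code point per wchar_t.
std::string to_utf8(const std::wstring& text)
{
    std::string out;
    for (wchar_t wc : text) {
        out += encode_utf8(static_cast<std::uint32_t>(wc));
    }
    return out;
}
```

The result of to_utf8 can be written to a std::ofstream opened in binary mode, which sidesteps locales entirely.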

One important thing to mention is that Urdu and Arabic have a cursive script. If two characters are next to each other, they will join together. For instance, if you have two ghayn characters (غ) next to each other, they will print out as غغ. This isn’t something you want, because it messes up the end result of the “art”. The output “pixel” location may end up looking darker than it actually was in the image. To avoid this issue, spaces are added after each character.
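
The spacing trick can be sketched as a small helper that appends a separator after every character of a row. The name spaced_row and the default single-space separator are assumptions made for illustration.

```cpp
#include <string>

// Appends a separator after every character so that neighbouring
// Urdu/Arabic letters are rendered in their isolated forms.
std::wstring spaced_row(const std::wstring& row, const std::wstring& separator = L" ")
{
    std::wstring out;
    out.reserve(row.size() * (1 + separator.size()));
    for (wchar_t c : row) {
        out += c;
        out += separator;
    }
    return out;
}
```

For example, spaced_row(L"\u063A\u063A") yields the two ghayn characters each followed by a space, so they no longer join.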

What I have noticed is that because I am using so few Urdu/Arabic characters, it may not look like there's a huge difference between using the simple ASCII character sequence and my character sequence. However, in certain images the difference is slightly more noticeable. For instance, in the panda image below, the darker portions are a bit more jagged; these portions consist mainly of Urdu/Arabic characters. Perhaps the jaggedness can be attributed to the smaller size of the characters as opposed to their English counterparts. In the future, I may try an all-Urdu/Arabic character sequence to see if the program performs better or worse.

Image Source: bbc.com

There are certain factors that affect how good your output looks. Firstly, the type of font you use is a contributing factor. If you don’t use a monospaced font, your end result may be distorted. Shown below are three examples of when I used Microsoft Sans Serif, Arial and Noto Mono (a monospaced font). Clearly, Noto Mono looks better than the other two.

Using the Arial Font
Using the Microsoft Sans Serif Font
Using the Noto Mono Font

Contrast and brightness are two other factors that have to be considered as well. Currently, the code doesn’t allow the user to alter the contrast and brightness of the image directly. You’re going to have to alter the brightness and contrast yourself in image editing software. In my case, I used LibreOffice Draw (not exactly image editing software). I’m not sure if there is a “formula” or heuristic that would help in determining the best percentage values to get the best looking outcomes. However, the lower the contrast was, the worse my images ended up looking.
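
If the adjustment were moved into the program, one common and simple option is a linear stretch around mid-gray. The sketch below is an assumption about how that could be done, not what LibreOffice Draw or any other editor does internally; adjust_gray and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>

// Linear contrast/brightness adjustment of one grayscale value (0-255).
// contrast:   1.0 leaves the value unchanged, >1.0 increases contrast.
// brightness: offset in gray levels added after the contrast stretch.
int adjust_gray(int gray, float contrast, int brightness)
{
    float value = (gray - 128) * contrast + 128 + brightness;
    int rounded = static_cast<int>(std::lround(value));
    return std::max(0, std::min(255, rounded));
}
```

Calling adjust_gray on each pixel before the character lookup would push mid-tones towards black or white, which spreads the pixels over more of the character sequence.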

The resolution you choose to “pixelate” your image also has a huge role in how well your ASCII art turns out. I’ve seen people use a ratio of 1:2 when it comes to the X:Y resolution. But in my experience, that does not always work. I’ve found that sometimes the two values could be chosen independently of each other, so for me it wasn’t a fixed rule. However, note that if your X resolution value is too small, your text will end up overlapping, making the end result look messy.
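
To get a feel for how the two resolution values translate into output size, the helper below counts how many samples a loop of the form "start at step, advance by step, stop at size minus step" takes along one axis, which is the pattern used in the full listing. The name sample_count is hypothetical.

```cpp
// Number of samples taken along one image axis when starting at `step`,
// advancing by `step`, and stopping at `size - step` (inclusive).
unsigned sample_count(unsigned size, unsigned step)
{
    if (step == 0 || size < 2 * step) return 0;
    return size / step - 1;
}
```

Under this scheme a 400x300 image sampled with an X resolution of 4 and a Y resolution of 2 gives sample_count(400, 4) = 99 characters per line and sample_count(300, 2) = 149 lines.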

What I have ended up creating is a simple program that converts your image to “ASCII/UNICODE" Art. Hopefully, in the future, I’ll include a GUI that allows you to adjust the brightness and contrast of your image, set the X and Y resolutions, use a character set with no English characters at all and perhaps print out colored characters corresponding to the pixel's color. In [10], a second approach was discussed whereby the characters are assigned based on their shapes and the area they cover. That seems to be a better method, so maybe I will consider using it in the future.

Here are some examples of what I ended up making. This does count as my contribution for inktober, right?


Image Source: HypeBae

Image Source: Pinterest.com

Image Source: PinClipArt.com

Image Source: Independent.co.uk
Image Source: Pinterest.com

Cyan looking sus.
Image Source: Planetminecraft.com

The full code is shown below. Here is the link to the GitHub repository.

#include <locale>
#include <codecvt>
#include <fstream>
#include <string>
#include <SFML/Graphics.hpp>
#include <wchar.h>
using namespace std;

int main()
{   //Instantiate an image variable using the SFML sf::Image class
    sf::Image image;

    //path to your image
    
    //IMPORTANT:- Your image should preferably be GRAYSCALE
    //         :- This program is CONTRAST DEPENDENT (adjust the contrast in an image editor software.
    //                                                 although, you can try out without adjusting the contrast first.)
     string path="kermiti1.jpg";

    //checks to see if image has correctly loaded
    if (!image.loadFromFile(path))
    {
        return -1;
    }
    //samples the pixels at a certain resolution
    //TIP: Every image is different, certain images may print out better with bigger or smaller resolutions
    //Xresolution and Yresolution can be considered independent
    //It varies on a case by case basis, if you dont choose a correct value your image will be WONKY
    int Xresolution=4;
    int Yresolution=2;



    //Ordered Character String obtained from source (URL:https://marmelab.com/blog/2018/02/20/convert-image-to-ascii-art-masterpiece.html)
    //Original Character string with ASCII only characters
    //L"$@B%8&WM#*oahkbdpqwmZO0QLCJUYXzcvunxrjft/\|()1{}[]?-_+~<>i!lI;:,^`'.";

    //The one I used includes several Urdu and Arabic characters
    // \u0621 is ء (hamza)
    // \u0684 is ڄ (dyeh), not ه (heh), which is \u0647
    // \u076f is ݯ (hah with two dots) (this character isnt used in the Urdu language)
    // \u0628 is ب (bay)
    // \u0646 is ن (noon)

    //note the escaped backslash (\\) in the string, a lone \| is not a valid escape sequence
    wchar_t charset1[]=L"$@B%8&WM\u076f\u0628#*oahkbdpqwmeZO0QLCJUYXzcvunxrjft/\\|\u0646()1{}[]?-_+~<>i!lI;:,^\u0621`'\u0684 .";
    //char_place will store the character that we want to print out
    wchar_t char_place;
    //Depending on the intensity of the pixel, we find a corresponding character from charset1. index will store the index
    //value of the character
    int index;
    //imagedata is used to store the grayscale value of each pixel from the image (0-255)
    int imagedata;

    //Because we want the Unicode codepoints to be printed properly in our text file, we need to make sure they are encoded
    //in UTF-8, which can handle Unicode characters' encoding.
    //A UTF-8 locale is defined (std::codecvt_utf8 is a template and needs the character type; it is deprecated since C++17)
    const std::locale utf8_locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());
    wofstream f("sample.txt"); // we need to use wofstream because wide characters are being printed; opening the file also discards its previous contents
    f.imbue(utf8_locale);//the locale is imbued to the stream
    f.clear();  //resets the error flags of the stream (it does not touch the file contents)


    //The following code samples one pixel every Yresolution rows and Xresolution columns
    //the sizes are cast to int so that images smaller than the resolution do not wrap around to huge unsigned values
    for(int i=Yresolution; i <=((int)image.getSize().y-Yresolution) ;  i+=Yresolution)
    {
        for(int j=Xresolution; j <=((int)image.getSize().x-Xresolution);  j+=Xresolution)
        {
            imagedata=image.getPixel(j,i).r*0.21+0.72*image.getPixel(j,i).g+0.07*image.getPixel(j,i).b; //converts each pixel to grayscale with weights
            index=(int)((wcslen(charset1)-1)*(imagedata/255.f)); //finds corresponding character depending on grayscale value from the intensity-sorted character set
            char_place=charset1[index]; //sets character to the wchar_t variable
            //prints the character, spaces are added to make sure the Urdu characters dont join together
            f<<char_place<<"  ";
        }
        // skips line after each row
        f<<"\r\n";
    }

    f.close();
    return 0;
}


References used:

[1] https://en.cppreference.com/w/cpp/locale/codecvt
[2] https://bitesofcode.wordpress.com/2017/01/19/converting-images-to-ascii-art-part-1/
[3] https://marmelab.com/blog/2018/02/20/convert-image-to-ascii-art-masterpiece.html
[4] https://medium.com/towards-artificial-intelligence/convert-images-to-ascii-art-images-using-python-90261de03c53
[5] https://r12a.github.io/scripts/arabic/
[6] http://candcplusplus.com/c-wchar_t-type-and-wstring-type
[7] https://stackoverflow.com/questions/3950718/wrote-to-a-file-using-stdwofstream-the-file-remained-empty
[8] http://www.cantrip.org/locale.html
[9] https://www.cprogramming.com/tutorial/unicode.html
[10] https://stackoverflow.com/questions/32987103/image-to-ascii-art-conversion
