Unicode Research and Programming

Started by RayRay, September 05, 2011, 04:53:55 PM


RayRay

I was going to explain Unicode further on the Unicode Snowman topic, but SMF suggested that I should make another topic.

Remember that I said you could produce a Unicode or ASCII character with Alt and the Keypad?

After learning some hexadecimal (the base-16 format with 16 digits, 0-9 and A-F, where decimal has only 10), I learned the TRUE meaning of Unicode. (You could look this stuff up on Wikipedia, but I just like to sum it up here.)
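To make the hex/decimal relationship concrete, here is a minimal Python sketch. The helper names `to_hex` and `from_hex` are made up for illustration; they just wrap Python's built-in conversions.

```python
def to_hex(n: int) -> str:
    """Return the uppercase hexadecimal digits of a non-negative integer."""
    return format(n, "X")


def from_hex(digits: str) -> int:
    """Parse a string of hexadecimal digits back into an integer."""
    return int(digits, 16)


print(to_hex(255))     # FF
print(from_hex("1F"))  # 31
```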

I was learning about how NES programming works, so I went to read about bytes and how each byte is written as two hexadecimal digits, each one standing for a 4-bit half called a 'nibble'. When I read about nibbles, I saw this chart:
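Roughly, splitting a byte into its two nibbles looks like this in Python. It is only a sketch, and `split_nibbles` is a hypothetical helper name, not something from the NES docs.

```python
from typing import Tuple


def split_nibbles(byte: int) -> Tuple[int, int]:
    """Split one byte (0-255) into its high and low 4-bit nibbles."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a byte must be in the range 0-255")
    high = byte >> 4    # top four bits
    low = byte & 0x0F   # bottom four bits
    return high, low


high, low = split_nibbles(0xA7)
print(format(high, "X"), format(low, "X"))  # A 7
```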

I looked at the bottom two rows, and it looked really familiar.

Later I went to read about Unicode. Turns out that there aren't just 10,000 characters. The index range (the 'code points') runs from 0 to 10FFFF in hex, which is 1,114,112 indexes! However, many of those indexes don't have a character assigned, so the number of actual characters is well below a million.
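The index arithmetic can be checked with a few lines of Python. This is just a sketch: `count_assigned` is a hypothetical helper, and its result depends on which Unicode version your Python build ships with, so no exact figure is given for it. It also counts private-use and surrogate indexes as "assigned", which is a simplification.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF          # highest Unicode index
TOTAL_CODE_POINTS = MAX_CODE_POINT + 1

print(TOTAL_CODE_POINTS)  # 1114112


def count_assigned() -> int:
    """Count indexes whose general category is not 'Cn' (unassigned)."""
    return sum(
        1
        for cp in range(TOTAL_CODE_POINTS)
        if unicodedata.category(chr(cp)) != "Cn"
    )


print(count_assigned())  # varies with the Unicode version of your Python
```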

Unicode isn't really a product; it's a standard, and Windows supports it out of the box. It covers emoticons, symbols, letters and numbers, Japanese characters, and the symbols used for old Windows window graphics. Files also have a bit format. They could use the normal 8-bit characters, which means exactly the 256 characters you see in the picture (some of them are actually a bit different). But some files use 16-bit or even 32-bit characters. If you open a 32-bit file in an 8-bit editor, you'll notice that every 4 characters shown is really just 1 character.
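Here is a small Python sketch of the bit-format idea. It assumes the 8-, 16- and 32-bit formats correspond to the UTF-8, UTF-16 and UTF-32 encodings (the post doesn't name them), and it uses Latin-1 as a stand-in for "an 8-bit editor". `encoded_sizes` is a hypothetical helper.

```python
def encoded_sizes(ch: str) -> dict:
    """Return how many bytes one character takes in each encoding."""
    return {
        enc: len(ch.encode(enc))
        for enc in ("utf-8", "utf-16-le", "utf-32-le")
    }


print(encoded_sizes("A"))  # {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
print(encoded_sizes("☃"))  # {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}

# Reading 32-bit text one byte at a time: 2 characters show up as 8.
as_seen_in_8bit_editor = "Hi".encode("utf-32-le").decode("latin-1")
print(len(as_seen_in_8bit_editor))  # 8
```

Note that UTF-8 is only 8-bit per *unit*: the snowman needs three of those units, which is why it isn't in the 256-character picture.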

A nifty little fact I found is that each time you press the Enter key, you are actually sending two 8-bit characters, 16 bits in total. In ASCII, it is either (in the picture (HL)) 0A + 0D, or 0D + 0A; 0D is the carriage return and 0A is the line feed. In the picture's symbols that shows up as ◙♪ or ♪◙. Windows uses 0D 0A (hex) for a newline. I looked in a hex editor and found that I was correct. If you try Alt+3338 (0D0A in decimal), you just get an inverted circle, since Alt codes aren't meant to work that way. (SMF's text editor only inputs the first 256 characters; the number wraps around if you exceed 255, and 3338 wraps to 10, which is 0A.) Google 'newline' if you don't get it.
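The newline numbers can be double-checked in Python. This is a sketch only: `alt_code_wrapped` is a hypothetical helper that models the wrap-around as "modulo 256", and `PICTURE_GLYPHS` is a hand-written two-entry table for the symbols mentioned in the post, not a full character chart.

```python
CRLF = b"\r\n"  # the two bytes Windows writes for a newline

print(CRLF.hex().upper())  # 0D0A
print(int("0D0A", 16))     # 3338


def alt_code_wrapped(code: int) -> int:
    """Model an input that only keeps the low 8 bits of the number typed."""
    return code % 256


print(alt_code_wrapped(3338))  # 10, i.e. 0A

# Hand-written lookup for the two symbols from the post (an assumption).
PICTURE_GLYPHS = {0x0A: "◙", 0x0D: "♪"}
glyphs = "".join(PICTURE_GLYPHS[b] for b in CRLF)
print(glyphs)  # ♪◙
```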

All of this stuff is straight from Wikipedia.

ARTgames

Yeah dude, you have the idea. Before Unicode it was a messy world. ASCII was OK, but 1 byte was way too small to hold every symbol of every language in the world that needed to be on a computer. So other parts of the world came up with their own encoding methods, which were not compatible with each other. It was a mess.

Then Unicode was invented and things got a lot easier. You do still come across problems here and there, but it's still a great standard.