Pragmatic Programmer Issues

Java Specification – chapter three

I’ve just started reading the Java Language Specification, and I think every Java programmer should read it. In this post and future ones I’ll write down some interesting facts from it. I didn’t find anything interesting in the first two chapters. I almost skipped chapter three, but fortunately I found some interesting information on its first page.

  • Programs are written using Unicode. The Unicode version depends on the Java version:
    • Before Java 1.1 – Unicode 1.1.5
    • Java 1.1 – Unicode 2.0
    • Java 1.1.7 – Unicode 2.1
    • J2SE 1.4 – Unicode 3.0
    • J2SE 1.5 – Unicode 4.0
  • Programs are written using Unicode, but in practice we mostly write our code in ASCII, so there is a translation step to Unicode. In this step every Unicode escape \uXXXX, where XXXX is a hexadecimal value, is converted to the corresponding Unicode character.
    • Note that \\uxxxx is not a Unicode escape, because the backslash before u is itself preceded by a backslash. It stays as the characters “ \ \ u x x x x”.
    • Note that \ can be followed by more than one u character, so \uuuuu005a produces the character Z.
    • Note that a \ produced by \u005c is not interpreted as the start of a further Unicode escape. So \u005cu005a produces the six characters \ u 0 0 5 a. Inside a string or character literal this is a compile-time error, because \u is not a valid character escape.
  • Unicode escapes are processed very early, so it is not correct to write \u000a for a linefeed in a character or string literal: in the next step it is treated as a line terminator. The same goes for other special characters such as \u000d. Use the character escapes (\n, \r) for them instead.
  • It is good to know that white space means “space”, “horizontal tab”, “form feed” and the line terminators LF and CR.
  • Unicode allows national characters in variable names. The first character of an identifier can be any character for which Character.isJavaIdentifierStart returns true, and each following character can be any character for which Character.isJavaIdentifierPart returns true.
  • Two identifiers are the same only if they are identical, meaning every character has the same Unicode code point. Be careful: letters that look the same can have different code points, so aVar can be different from aVar, because the first a can be LATIN SMALL LETTER A (\u0061) and the second CYRILLIC SMALL LETTER A (\u0430), and both look like ‘a’.
  • const and goto are reserved keywords even though they are not used at all. Conversely, true, false and null are not keywords but literals.
  • L is the suffix for a long literal and is preferred over l, because l looks similar to 1.
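  • It’s strange, but the largest decimal literal of type int is 2147483647, and 2147483648 may appear only as the operand of the unary negation operator. Otherwise the compiler raises the error “integer number too large”. The same applies to long with much larger values: the largest decimal literal is 9223372036854775807L (2^63 - 1), and 9223372036854775808L (2^63) may appear only with negation.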
  • If you want to know how Java converts the text of float and double literals to their values, read the documentation of Double.valueOf and Float.valueOf.
  • A compile-time error occurs when a nonzero floating-point literal is so large that its rounded internal representation becomes infinity. A program can still represent infinity with expressions such as 1f/0f or -1d/0d, or with the predefined constants POSITIVE_INFINITY and NEGATIVE_INFINITY in the classes Float and Double.
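
A few of the points above can be tried out directly. This is only a small sketch: the class name LexicalDemo and its method names are made up for illustration, and the comments deliberately avoid writing a backslash followed by u, because Unicode escapes are translated inside comments too.

```java
class LexicalDemo {
    // Several 'u' characters may follow the backslash; the escape below is the letter Z.
    static char escapedZ() {
        return '\uuuuu005a';
    }

    // The second variable name starts with CYRILLIC SMALL LETTER A (written as an
    // escape so the difference is visible), so it is a different identifier from aVar.
    static int distinctIdentifiers() {
        int aVar = 1;
        int \u0430Var = 2;
        return aVar + \u0430Var; // 1 + 2
    }

    // 2147483648 and 9223372036854775808L compile only as operands of unary minus.
    static boolean minimumLiterals() {
        return -2147483648 == Integer.MIN_VALUE
            && -9223372036854775808L == Long.MIN_VALUE;
    }

    // Floating-point division by zero yields infinity instead of an error.
    static boolean infinities() {
        return 1f / 0f == Float.POSITIVE_INFINITY
            && -1d / 0d == Double.NEGATIVE_INFINITY;
    }

    public static void main(String[] args) {
        System.out.println(escapedZ());                                 // Z
        System.out.println(distinctIdentifiers());                      // 3
        System.out.println(Character.isJavaIdentifierStart('\u0430'));  // true
        System.out.println(minimumLiterals());                          // true
        System.out.println(infinities());                               // true
    }
}
```

Removing the minus sign in front of 2147483648 should reproduce the “integer number too large” error mentioned above.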

The last part was about String literals. String literals always refer to the same instance, and you can get that unique instance for any string with String.intern. The rules are:

  • A string literal always represents the same String object.
  • Strings computed by constant expressions are computed at compile time and then treated as literals, so “ala” + “ola” represents the same object as “alaola”.
  • Strings computed by concatenation at run time are newly created and therefore distinct.

Comments

kretes

Great! Looking forward to more such dense posts!