The basis of character sets used in almost all present-day computers. US-ASCII uses only the lower seven bits (code points 0 to 127) to convey some control codes, the space, digits, most basic punctuation, and the unaccented letters a-z and A-Z. More modern coded character sets (e.g., Latin-1, Unicode) define extensions to ASCII for values above 127, conveying special Latin characters (such as accented letters, or the German eszett), characters from non-Latin writing systems (e.g., Cyrillic, or Han characters), and such desirable glyphs as distinct open- and close-quotation marks. ASCII replaced earlier systems such as EBCDIC and the five-bit Baudot code, each of which was broken in its own way.
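The seven-bit layout can be made concrete with a minimal Python sketch (the classify function here is invented for illustration, not any standard API; the ranges follow the US-ASCII layout described above):

    def classify(code_point):
        """Classify a code point according to the US-ASCII layout."""
        if 0 <= code_point <= 31 or code_point == 127:
            return "control code"      # e.g. 10 is line feed, 27 is escape
        if code_point == 32:
            return "space"
        if 48 <= code_point <= 57:
            return "digit"
        if 65 <= code_point <= 90 or 97 <= code_point <= 122:
            return "unaccented letter"
        if code_point <= 127:
            return "punctuation or symbol"
        return "outside US-ASCII"      # meaning depends on the extension in use

    print(classify(ord("A")))   # unaccented letter
    print(classify(0x0A))       # control code (line feed)
    print(classify(0xDF))       # outside US-ASCII (eszett in Latin-1)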
Computers are much pickier about spelling than humans; thus,
hackers need to be very precise when talking about characters,
and have developed a considerable amount of verbal shorthand
for them. Every character has one or more names - some
formal, some concise, some silly.
Individual characters are listed in this dictionary with
alternative names from revision 2.3 of the
Usenet ASCII
pronunciation guide in rough order of popularity, including
their official
ITU-T names and the particularly silly names
introduced by
INTERCAL.
See ampersand, asterisk, back quote, backslash, caret, colon, comma, commercial at, control-C, dollar, dot, double quote, equals, exclamation mark, greater than, hash, left bracket, left parenthesis, less than, minus, parentheses, oblique stroke, percent, plus, question mark, right brace, right bracket, right parenthesis, semicolon, single quote, space, tilde, underscore, vertical bar, zero.
Some other common usages cause odd overlaps. The "#", "$",
" > ", and "&" characters,
for example, are all pronounced "hex"
in different communities because various assemblers use them
as a prefix tag
for hexadecimal constants (in particular,
"#" in many assembler-programming cultures, "$" in the
6502
world, " > " at
Texas Instruments, and "&" on the
BBC Micro,
Acorn Archimedes,
Sinclair, and some
Zilog Z80
machines). See also
splat.
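The overlap is easy to demonstrate. A sketch in Python (the parse_hex helper and PREFIXES table are invented for illustration; the prefix-to-community mapping follows the list above):

    PREFIXES = {"#": "many assemblers", "$": "6502",
                ">": "Texas Instruments", "&": "BBC Micro / Z80"}

    def parse_hex(token):
        """Strip a hex-prefix tag and return the constant's value."""
        if token and token[0] in PREFIXES:
            return int(token[1:], 16)
        raise ValueError("unknown hex prefix: %r" % token)

    # The same constant, 255, written four ways:
    for token in ("#FF", "$FF", ">FF", "&FF"):
        print(token, "->", parse_hex(token))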
The inability of
US-ASCII to correctly represent nearly any
language other than English became an obvious and intolerable
misfeature as computer use outside the US and UK became the
rule rather than the exception (see
software rot). And so
national extensions to US-ASCII were developed, such as
Latin-1.
Hardware and software from the US still tend to embody the
assumption that US-ASCII is the universal character set and
that words of text consist entirely of byte values 65-90 and
97-122 (A-Z and a-z); this is a major irritant to people who
want to use a character set suited to their own languages.
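The assumption reduces to a one-line test. A sketch (illustrative only; is_word_byte is not taken from any particular program):

    def is_word_byte(b):
        return 65 <= b <= 90 or 97 <= b <= 122   # A-Z and a-z only

    print(all(is_word_byte(b) for b in b"hello"))                    # True
    print(all(is_word_byte(b) for b in "straße".encode("latin-1")))  # False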
Perversely, though, efforts to solve this problem by
proliferating sets of national characters produced an
evolutionary pressure (especially in protocol design, e.g.,
the
URL standard) to stick to
US-ASCII as a subset common
to all those in use, and therefore to stick to English as the
language encodable with the common subset of all the ASCII
dialects. This basic problem with having a multiplicity of
national character sets ended up being a prime justification
for Unicode, which was designed, ostensibly, to be the *one*
ASCII extension anyone will need.
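One reason this design works: in Unicode's UTF-8 encoding, code points 0 to 127 are encoded as exactly the single bytes US-ASCII assigns them, so pure-ASCII text round-trips unchanged. A short Python check:

    text = "plain ASCII"
    assert text.encode("utf-8") == text.encode("ascii")  # byte-for-byte equal

    print("ß".encode("utf-8"))   # b'\xc3\x9f': two bytes, both above 127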
A system is described as "
eight-bit clean" if it doesn't
mangle text with byte values above 127, as some older systems
did.
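A sketch of the failure mode (the seven_bit_channel function is invented for illustration; stripping the top bit is what some old seven-bit transports did):

    def seven_bit_channel(data):
        return bytes(b & 0x7F for b in data)   # not eight-bit clean

    data = bytes([0x41, 0xDF])        # "A" followed by a Latin-1 eszett
    print(seven_bit_channel(data))    # b'A_': 0xDF was mangled to 0x5F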
See also
ASCII character table,
Yu-Shiang Whole Fish.
(1995-03-06)