C Language Character Sets

Author: 博客园精华区  Updated: 2020-01-05 14:53:00


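The environment in which the compiler translates source code is called the translation environment; the environment in which the compiled program runs is called the execution environment. In C, the two environments need not be the same (for example, when cross-compiling). For this reason, C defines two character sets: the source character set and the execution character set. The source character set is the set of characters in which C source code is written, and the execution character set is the set of characters that the running program can interpret. In many C implementations the two character sets are identical. If they differ, the compiler converts the character constants and string literals in the source code to the corresponding members of the execution character set.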
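
The conversion described above can be observed with a minimal sketch. The helper names `digit_value` and `print_execution_codes` are hypothetical and chosen only for illustration. The numeric codes that get printed depend on the implementation (for instance, 'A' is 65 under ASCII but 193 under EBCDIC), so the sketch relies only on one property the C standard guarantees for both character sets: the digits '0' through '9' have consecutive values.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: convert a decimal digit character to its value.
   Relies on the guarantee that '0'..'9' are consecutive in both the
   source and the execution character set. */
int digit_value(char c)
{
    assert(c >= '0' && c <= '9');
    return c - '0';
}

/* Print the execution-character-set code of a few characters.
   The printed numbers are implementation-defined. */
void print_execution_codes(void)
{
    const char sample[] = "A0~";
    for (size_t i = 0; sample[i] != '\0'; i++)
        printf("'%c' -> %d\n", sample[i], (int)(unsigned char)sample[i]);
}
```

Calling `print_execution_codes()` from `main` shows which codes the compiler assigned on a given platform, while `digit_value('7')` yields 7 regardless of the encoding.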

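Both character sets consist of a basic character set plus any extended characters. The C standard does not specify the extended characters; they typically depend on the local language and are determined by the implementation. The basic character set together with the extended characters forms the extended character set.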

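Both the basic source character set and the basic execution character set contain the following characters:

The 26 uppercase and 26 lowercase Latin letters, and the 10 decimal digits

The following 29 graphic characters:

!  "  #  %  &  '  (  )  *  +  ,  -  .  /  :  ;  <  =  >  ?  [  \  ]  ^  _  {  |  }  ~

5 whitespace characters:

space, horizontal tab, vertical tab, newline, form feed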
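
The list above can be turned into a small membership check. This is a sketch under stated assumptions: the array `BASIC_SOURCE_CHARS` and the function `is_basic_source_char` are hypothetical names, and the table follows the 29 graphic characters listed here (newer revisions of the C standard may extend that list).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The basic source character set as listed above:
   52 letters, 10 digits, 29 graphic characters, 5 whitespace characters. */
static const char BASIC_SOURCE_CHARS[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
    " \t\v\n\f";

/* 52 + 10 + 29 + 5 = 96 characters, not counting the terminating '\0'. */
static_assert(sizeof BASIC_SOURCE_CHARS - 1 == 96,
              "table should hold exactly 96 characters");

/* Hypothetical helper: true if c appears in the table above. */
bool is_basic_source_char(char c)
{
    /* strchr also matches the terminating '\0', so exclude it explicitly. */
    return c != '\0' && strchr(BASIC_SOURCE_CHARS, c) != NULL;
}
```

With this table, `is_basic_source_char('~')` is true, while `is_basic_source_char('@')` and `is_basic_source_char('$')` are false, because those characters are not among the 29 listed.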

The basic execution character set additionally defines four non-printable characters:

the null character \0 (used to terminate strings), alert \a, backspace \b, and carriage return \r
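
The following sketch shows these four characters written as escape sequences. The names `CONTROL_CHARS`, `CONTROL_NAMES`, `print_control_codes`, and `length_up_to_null` are hypothetical. Only the null character is guaranteed to have the value 0; the codes of the other three are implementation-defined (7, 8, and 13 under ASCII), so the code prints them rather than assuming them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The four control characters named above, written as escape sequences. */
static const char CONTROL_CHARS[4] = { '\0', '\a', '\b', '\r' };
static const char *const CONTROL_NAMES[4] = {
    "null", "alert", "backspace", "carriage return"
};

/* Each escape sequence denotes exactly one character. */
static_assert(sizeof CONTROL_CHARS == 4, "four control characters");

/* Print the execution-character-set code of each control character. */
void print_control_codes(void)
{
    for (size_t i = 0; i < sizeof CONTROL_CHARS; i++)
        printf("%-15s -> %d\n", CONTROL_NAMES[i], (int)CONTROL_CHARS[i]);
}

/* Hypothetical helper: the null character ends a string, so characters
   after an embedded \0 are not counted by strlen. */
size_t length_up_to_null(const char *s)
{
    return strlen(s);
}
```

For example, `length_up_to_null("ab\0cd")` returns 2, even though the literal itself occupies 6 bytes including its final terminator.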
