In a computer, all data must be represented as binary values when it is stored and processed. Which binary number stands for which symbol is a matter of convention: anyone can define their own scheme (this is called an encoding). But if everyone wants to exchange data without confusion, everyone must use the same rules, which is why unified encodings exist. Put simply, an encoding is a correspondence between characters and numeric values.
In this article, we describe the encoding rules and applications of different encodings in detail.

Unicode and UTF

  • Unicode
    A unified character numbering. It only defines the mapping between characters and numbers (code points) and says nothing about how those numbers are stored as bytes. For historical and colloquial reasons, “Unicode” is often used loosely to mean UTF-16.
  • UTF
    Unicode Transformation Format, that is, a concrete way of encoding Unicode code points as bytes.
    • UTF-16
      • In the old standard: “16” means that a fixed 16 bits (i.e. 2 bytes) are used to store each character. This fixed-width form is known as UCS-2. Values 0-127 are the ASCII characters, and 128-65535 are the extended characters, which is easy to understand.
      • In the new standard: a character is stored in either 16 or 32 bits, no longer a fixed 16 bits. Note that under UTF-16 a parser may read 2 bytes to decode one character and then 4 bytes (a surrogate pair) to decode the next. In the new standard, UTF-16 has therefore become a variable-length encoding.
    • UTF-8
      “8” means that the basic unit is 8 bits (which, on its own, tells you very little). The “8” is misleading: UTF-8 is in fact a variable-length encoding that uses 8, 16, 24, or 32 bits (1 to 4 bytes) to store a character.
    • UTF-32
      Always uses 32 bits to store a character, so it is a fixed-length encoding.
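
The size differences described above can be checked with a short Python sketch. The helper name encoded_length and the three sample characters are illustrative choices, not part of any standard. The big-endian codec names are used so that no byte order mark is added to the count.

```python
# Compare how many bytes each UTF encoding needs for the same character.

def encoded_length(ch: str, encoding: str) -> int:
    """Return the number of bytes `ch` occupies in `encoding`."""
    return len(ch.encode(encoding))

# An ASCII letter, a CJK character, and an emoji outside the 16-bit range.
samples = ["A", "\u4e2d", "\U0001F600"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        "utf-8:", encoded_length(ch, "utf-8"),
        "utf-16:", encoded_length(ch, "utf-16-be"),
        "utf-32:", encoded_length(ch, "utf-32-be"),
    )
```

UTF-8 takes 1, 3 and 4 bytes for the three samples, UTF-16 takes 2, 2 and 4 bytes, and UTF-32 always takes 4.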

URL Encoding

Question: in http://localhost?a=1&b=2, what are the query parameters of the URL?
Obviously a=1 and b=2. But why is it not read as a single parameter a whose value is the string “1&b=2”?
The key is &. Like \ in a string literal, & is a character with a special meaning: the backend treats it as the token that separates parameters.
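
To make the separator role of & concrete, here is a minimal sketch using Python's standard urllib.parse module. The helper name query_params is hypothetical. The second URL writes & and = in their percent-encoded forms (%26 and %3D, covered in the URLEncode item), so they are treated as data rather than as separators.

```python
from urllib.parse import parse_qs, urlsplit

def query_params(url: str) -> dict:
    """Split the query string of `url` into a name -> list-of-values mapping."""
    return parse_qs(urlsplit(url).query)

# A literal & separates parameters, so two parameters are found.
print(query_params("http://localhost?a=1&b=2"))
# {'a': ['1'], 'b': ['2']}

# A percent-encoded & (%26) is decoded only after splitting, so it stays in the value.
print(query_params("http://localhost?a=1%26b%3D2"))
# {'a': ['1&b=2']}
```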

  • URLEncode
    URL encoding (percent-encoding) is a scheme layered on top of a character encoding such as UTF-8 or GBK. It solves the problem above, as well as others such as transmitting Chinese characters.
    The simplest example: the UTF-8 encoding of the string “中” is E4 B8 AD, so after URL encoding it becomes the string %E4%B8%AD. Based on UTF-16BE, the bytes are 4E 2D, so it becomes %4E%2D (an encoder that leaves unreserved ASCII bytes untouched would emit the equivalent N-).
    For the same character, the URL-encoded result can differ depending on the underlying character encoding. For special symbols such as &, = and ?, however, the result is fixed in any ASCII-compatible encoding: %26, %3D and %3F.

  • application/x-www-form-urlencoded
    It is one of the possible values of the HTTP Content-Type header and means that the request parameters/entity body have been URL-encoded. The form application/x-www-form-urlencoded; charset=UTF-8 additionally states that the URL encoding is based on UTF-8, to avoid errors during decoding.

  • POST method

    • application/x-www-form-urlencoded; charset=UTF-8
      As described above. Be aware of the difference between a browser form’s automatic encoding on submission and a request you build by hand, so that you avoid URL-encoding twice or not at all.

    • application/json; charset=UTF-8
      No URL encoding is performed, because the JSON string is passed directly and the backend does not need & as a separator.

    • multipart/form-data
      No URL encoding is performed, because the backend does not use & as a separator here. The parts are separated by a special boundary string instead.
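
The URL-encoding rules and the difference between the body types can be sketched with Python's standard library. This is only an illustration: the field names in params are made up, and a real client would also have to send the matching Content-Type header.

```python
import json
from urllib.parse import quote, urlencode

# The same character gives different results under different character encodings.
print(quote("\u4e2d"))                  # %E4%B8%AD (UTF-8 is the default)
print(quote("\u4e2d", encoding="gbk"))  # %D6%D0
# ASCII special symbols always map to the same escapes.
print(quote("&=?"))                     # %26%3D%3F

# Hypothetical form fields, one of which contains & and = as data.
params = {"name": "\u4e2d", "expr": "a&b=c"}

# Body for application/x-www-form-urlencoded; charset=UTF-8
form_body = urlencode(params)
print(form_body)  # name=%E4%B8%AD&expr=a%26b%3Dc

# Body for application/json; charset=UTF-8 (no URL encoding at all)
json_body = json.dumps(params, ensure_ascii=False)
print(json_body)

# Pitfall: encoding an already encoded value escapes the % signs again.
double = quote(quote("\u4e2d"))
print(double)  # %25E4%25B8%25AD
```

In the form body, the only literal & is the one separating the two fields, while the & inside the value of expr is escaped. The double-encoded value shows what the backend would receive if both the client code and the HTTP library applied URL encoding.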