How do I get a consistent byte representation of strings in C# without manually specifying an encoding?Ask Questions

 Posted on 29 days ago

How do I convert a string to a byte[] in .NET (C#) without manually specifying a specific encoding?

I'm going to encrypt the string. I can encrypt it without converting, but I'd still like to know why encoding comes to play here.

Also, why should encoding be taken into consideration? Can't I simply get what bytes the string has been stored in? Why is there a dependency on character encodings?

Share On: facebook gplus twitter
profile
Asked by Joseph Sible on 29 days ago Score: 25 points
Add Comment:

Comments

3 Answers

0 Corrected Answers
Aproved Answers
0
Profile
Answered by hardik chaudhary on 9/28/2019 3:40:26 PM Score: 542 points

It depends on the encoding of your string (ASCIIUTF-8, ...).

For example:

byte[] b1 = System.Text.Encoding.UTF8.GetBytes (myString);

byte[] b2 = System.Text.Encoding.ASCII.GetBytes (myString);

A small sample why encoding matters:

string pi = "\u03a0"; byte[] ascii = System.Text.Encoding.ASCII.GetBytes (pi);

byte[] utf8 = System.Text.Encoding.UTF8.GetBytes (pi);

Console.WriteLine (ascii.Length); //Will print 1

Console.WriteLine (utf8.Length); //Will print 2

Console.WriteLine (System.Text.Encoding.ASCII.GetString (ascii)); //Will print '?'

ASCII simply isn't equipped to deal with special characters.

Internally, the .NET framework uses UTF-16 to represent strings, so if you simply want to get the exact bytes that .NET uses, use System.Text.Encoding.Unicode.GetBytes (...).

Comments

Add Comment:
Aproved Answers
0
Profile
Answered by Praful Chauhan on 9/23/2019 8:09:40 AM Score: 116 points

It depends on the encoding of your string (ASCIIUTF-8, ...).

For example:

byte[] b1 = System.Text.Encoding.UTF8.GetBytes (myString);

byte[] b2 = System.Text.Encoding.ASCII.GetBytes (myString);

A small sample why encoding matters:

string pi = "\u03a0";

byte[] ascii = System.Text.Encoding.ASCII.GetBytes (pi);

byte[] utf8 = System.Text.Encoding.UTF8.GetBytes (pi);

Console.WriteLine (ascii.Length);

//Will print 1 Console.WriteLine (utf8.Length);

//Will print 2 Console.WriteLine (System.Text.Encoding.ASCII.GetString (ascii));

//Will print '?'

ASCII simply isn't equipped to deal with special characters.

Internally, the .NET framework uses UTF-16 to represent strings, so if you simply want to get the exact bytes that .NET uses, use System.Text.Encoding.Unicode.GetBytes (...).

See Character Encoding in the .NET Framework (MSDN) for more information.

Comments

Add Comment:
Aproved Answers
1
Profile
Answered by Grayson Omans on 9/23/2019 8:08:13 AM Score: 59 points

Contrary to the answers here, you DON'T need to worry about encoding if the bytes don't need to be interpreted!

Like you mentioned, your goal is, simply, to "get what bytes the string has been stored in".
(And, of course, to be able to re-construct the string from the bytes.)

For those goals, I honestly do not understand why people keep telling you that you need the encodings. You certainly do NOT need to worry about encodings for this.

Just do this instead:

static byte[] GetBytes(string str)

{

byte[] bytes = new byte[str.Length * sizeof(char)];

System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);

return bytes;

}

// Do NOT use on arbitrary bytes; only use on GetBytes's output on the SAME system

static string GetString(byte[] bytes) {

char[] chars = new char[bytes.Length / sizeof(char)];

System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);

return new string(chars);

}

As long as your program (or other programs) don't try to interpret the bytes somehow, which you obviously didn't mention you intend to do, then there is nothing wrong with this approach! Worrying about encodings just makes your life more complicated for no real reason.

Additional benefit to this approach:

It doesn't matter if the string contains invalid characters, because you can still get the data and reconstruct the original string anyway!

It will be encoded and decoded just the same, because you are just looking at the bytes.

If you used a specific encoding, though, it would've given you trouble with encoding/decoding invalid characters.

Comments

Add Comment:

Post Your Answers

Existing Members

Sign in to your account
Email Address
Password
New Member?
Sign up and complete profile
Full Name
Email Address
I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the StoodQ newsletters
Guideline to answer a question:

Useful tips to submit your answer
Please read below guidelines before you submit your answer for question.

  • Read and understand question for which you are submitting your answer.
  • Try to avoid grammatical and spell mistake while answering.
  • Do not post any irrelevant information in your answer.
  • Explain your answer with example or any reference link to help who posted question.
  • If you find irrelevant question, please report it to support. Click here to contact support.
  • You agree to the privacy policy and terms of use to submit any contents.

Note: StoodQ is online developers community which helps developer for their difficulty, lets help them with your value contribution.