Damien Sullivan (mindstalk) wrote,

Python is annoying

Our code works a lot with binary data (hashes/digests) and hexstring representations of that data. It was written in Python 2, when everything was a string, but some strings were "beef" and some were '\xbe\xef'.

Then we converted to Python 3, which has a separate 'bytes' type for binary data and Unicode strings everywhere. That led to some type problems I thought I had figured out, but a recent debugging session revealed I had to think about it some more. Basically we can now have a hexstring "beef", the bytes object b'\xbe\xef' described by that hexstring... and the bytes b"beef", which is the UTF-8 encoding of the string.
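
To make the three-way distinction concrete, here is a minimal sketch using only built-in str and bytes methods. The variable names are mine, purely for illustration.

```python
# Three different objects that all "look like" beef (names are illustrative).
hexstring = "beef"                    # str: four hex digits, as text
raw = bytes.fromhex(hexstring)        # bytes: the two bytes 0xbe 0xef
encoded = hexstring.encode("utf-8")   # bytes: the four ASCII characters b, e, e, f

print(type(hexstring).__name__, len(hexstring))  # str 4
print(type(raw).__name__, len(raw))              # bytes 2
print(type(encoded).__name__, len(encoded))      # bytes 4

# None of these compare equal to each other in Python 3.
print(raw == encoded)        # False
print(hexstring == encoded)  # False (a str never equals a bytes object)
```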

In particular, the function binascii.hexlify (aka binascii.b2a_hex), which we used a lot, changed what it returned: a str in Python 2, but bytes in Python 3.

Python 2:
>>> import binascii
>>> binascii.a2b_hex("beef")
'\xbe\xef'
>>> binascii.hexlify(_)
'beef'

Python 3:
>>> import binascii
>>> binascii.a2b_hex("beef")
b'\xbe\xef'
>>> binascii.hexlify(_)
b'beef'

vs.
>>> binascii.a2b_hex("beef")
b'\xbe\xef'
>>> _.hex()
'beef'
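
If you want the old Python 2 behaviour back (hex digits as a str), one option is a thin wrapper. This is only a sketch; hexlify_str is a hypothetical name, not something from our codebase or from binascii.

```python
import binascii

def hexlify_str(data: bytes) -> str:
    """Hypothetical helper: hex-encode bytes and return a str, like Python 2's hexlify."""
    # binascii.hexlify returns bytes in Python 3; the result only contains
    # ASCII hex digits, so decoding it as ASCII is safe.
    return binascii.hexlify(data).decode("ascii")

raw = binascii.a2b_hex("beef")
print(hexlify_str(raw))               # beef
print(hexlify_str(raw) == raw.hex())  # True
```

In new code, calling .hex() on the bytes object directly does the same job without the wrapper.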


I found it easy to assume that if one of our functions was returning b"beef" and the other "beef", they were on the same page, when really they were not: one is bytes, the other is str, and the two never compare equal.
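
One way to guard against that mix-up is to normalize at the boundary. The following is a rough sketch, not what our code actually does; as_hexstring is a hypothetical helper, and it assumes that any bytes it receives are ASCII hex digits rather than raw digest bytes.

```python
import string

HEX_DIGITS = set(string.hexdigits)

def as_hexstring(value) -> str:
    """Hypothetical helper: return a lowercase hex str for a hexstring
    given either as str ("beef") or as ASCII bytes (b"beef")."""
    if isinstance(value, bytes):
        # Assumption: bytes input holds ASCII hex digits. Raw bytes such as
        # b'\xbe\xef' fail here with UnicodeDecodeError (a ValueError subclass).
        value = value.decode("ascii")
    # Reject odd lengths and any character that is not a hex digit.
    if len(value) % 2 or not set(value) <= HEX_DIGITS:
        raise ValueError(f"not a hexstring: {value!r}")
    return value.lower()

print(as_hexstring("beef"))   # beef
print(as_hexstring(b"beef"))  # beef
```

It cannot tell raw bytes that happen to be ASCII hex digits apart from an encoded hexstring, so the caller still has to know which of the two a function is supposed to return.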

Bunch of examples in the cut.



>>> import binascii, hashlib
>>> binascii.a2b_hex("beef")
b'\xbe\xef'
>>> binascii.hexlify(_)
b'beef'
>>> _.decode()
'beef'
>>> binascii.a2b_hex("beef")
b'\xbe\xef'
>>> _.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 0: invalid start byte

>>> b"beef".hex()
'62656566'

>>> sha=hashlib.sha256(b'fee')
>>> sha.digest()
b'\xb8\x0c\xda\xae\x9b2+\xba*8\xfd9b\x99x*L\xc6\xb4\xa0\xcc\xf6\x7f\xcc\xbb\xcca|\x94\xa4`&'
>>> sha.digest().hex()
'b80cdaae9b322bba2a38fd396299782a4cc6b4a0ccf67fccbbcc617c94a46026'
>>> sha.hexdigest()
'b80cdaae9b322bba2a38fd396299782a4cc6b4a0ccf67fccbbcc617c94a46026'

>>> bytes.fromhex("cow")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: non-hexadecimal number found in fromhex() arg at position 1
>>> "cow".encode()
b'cow'
>>> "beef".encode()
b'beef'

>>> binascii.b2a_hex(b"beef")
b'62656566'
>>> binascii.b2a_hex(bytes.fromhex("beef"))
b'beef'
>>> bytes.fromhex("beef").hex()
'beef'




See the DW comments at https://mindstalk.dreamwidth.org/513680.html#comments
Tags: programming, python