Python: How to check if a file is encoded as UTF-16
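To determine whether a file is encoded as UTF-16, you can check for a Byte Order Mark (BOM) at the beginning of the file. UTF-16 files usually start with a BOM, the code point U+FEFF, which is stored as the bytes 0xFF 0xFE in little-endian files and as 0xFE 0xFF in big-endian files. The BOM is optional, so a missing BOM does not prove that the file is not UTF-16.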
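As a quick sanity check on the byte order, the standard library's codecs module exposes the two BOMs as constants. The snippet below is only an illustration of which bytes belong to which endianness:

```python
import codecs

# BOM constants shipped with the standard library
print(codecs.BOM_UTF16_LE)  # b'\xff\xfe'
print(codecs.BOM_UTF16_BE)  # b'\xfe\xff'

# Encoding U+FEFF with an explicit byte order yields the same bytes
assert "\ufeff".encode("utf-16-le") == codecs.BOM_UTF16_LE
assert "\ufeff".encode("utf-16-be") == codecs.BOM_UTF16_BE
```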
def is_utf16(filename):
    with open(filename, 'rb') as file:
        start = file.read(2)
    return start in [b'\xff\xfe', b'\xfe\xff']

# Example usage
filename = 'test.txt'
if is_utf16(filename):
    print(f"The file '{filename}' is encoded as UTF-16.")
else:
    print(f"The file '{filename}' is NOT encoded as UTF-16.")
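One caveat with the two-byte check: the UTF-32 little-endian BOM is FF FE 00 00, so it begins with the same two bytes as the UTF-16 little-endian BOM. A slightly more careful variant could read four bytes and rule out UTF-32 first. The sketch below assumes that approach; the function name detect_utf16_bom is hypothetical, and the check is still only a heuristic, since a UTF-16 LE file whose first character is U+0000 would look identical to UTF-32 LE and a file without a BOM is not detected at all.

```python
import codecs

def detect_utf16_bom(filename):
    """Return 'utf-16-le', 'utf-16-be', or None, based only on the BOM."""
    with open(filename, 'rb') as file:
        start = file.read(4)
    # The UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM,
    # so it has to be ruled out before the UTF-16 checks.
    if start.startswith(codecs.BOM_UTF32_LE):
        return None
    if start.startswith(codecs.BOM_UTF16_LE):
        return 'utf-16-le'
    if start.startswith(codecs.BOM_UTF16_BE):
        return 'utf-16-be'
    return None
```

The return value can be passed directly as the encoding argument to open() or bytes.decode(); note that with the explicit 'utf-16-le' and 'utf-16-be' codecs the BOM is kept as a leading U+FEFF character in the decoded text.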