zipfile header language encoding bit set differently between Python2 and Python3










3














I would like this code to work the same when run with Python 2 or Python 3



from zipfile import ZipFile, ZipInfo

with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.flag_bits = 0x800
    info.file_size = len(content)
    zf.writestr(info, content)


However, under Python 2 out.zip starts:



50 4b 03 04 14 00 00 08


Under Python 3, it starts:



50 4b 03 04 14 00 00 00


The differing part is flag_bits: 0x800 under Python 2, 0x00 under Python 3. That is bit 11, the language encoding flag, which marks the filename as UTF-8. Bit 11 seems to get set if filename.encode("ascii") throws.
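To make the comparison concrete, the two 8-byte prefixes above can be decoded with struct. This is only a sketch: the field layout (4-byte signature, 2-byte version needed, 2-byte general purpose flags, all little-endian) is the ZIP local file header layout, and the names decode_prefix and UTF8_FLAG are made up for illustration.

```python
from __future__ import print_function

import struct

UTF8_FLAG = 0x0800  # bit 11 of the general purpose bit flag


def decode_prefix(hex_bytes):
    """Split the first 8 bytes of a local file header into its fields."""
    raw = bytes(bytearray.fromhex(hex_bytes))
    # "<4sHH": little-endian signature, version needed, flag bits
    signature, version, flags = struct.unpack("<4sHH", raw)
    return signature, version, flags


for label, prefix in [("Python 2", "50 4b 03 04 14 00 00 08"),
                      ("Python 3", "50 4b 03 04 14 00 00 00")]:
    signature, version, flags = decode_prefix(prefix)
    print(label, signature, version, hex(flags), bool(flags & UTF8_FLAG))
```

The flag word occupies bytes 6 and 7 of the header, so the trailing 08 in the Python 2 output is the high byte of 0x0800.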



I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write().
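A small check of that reset, as a sketch: it writes one member into an in-memory archive, then compares the flag left on the ZipInfo object with the flag bytes actually written. io.BytesIO stands in for out.zip, written_flags is just an illustrative helper name, and the values printed depend on the zipfile version of the interpreter used.

```python
from __future__ import print_function

import io
import struct
from zipfile import ZipFile, ZipInfo


def written_flags(raw):
    """Read the 16-bit flag word at offset 6 of the first local header."""
    return struct.unpack("<H", raw[6:8])[0]


buf = io.BytesIO()
with ZipFile(buf, "w") as zf:
    info = ZipInfo("file.txt")
    info.flag_bits = 0x800          # request bit 11 before writing
    zf.writestr(info, b"content")

in_memory = info.flag_bits          # what zipfile left on the object
on_disk = written_flags(buf.getvalue())
print(hex(in_memory), hex(on_disk))
```

If both values come back without bit 11, the flag was dropped while writing rather than by anything that happened before the call to writestr().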



I wonder if anyone here has a good solution. Ideally I'd like both outputs to have the flag set, because that mirrors what the jar utility does.



EDIT: Updated to add the info.flag_bits = 0x800 line just to spell out what I'm trying to achieve. I've reproduced this on Windows:
ActivePython 3.6.0.3600, vs ActivePython 2.7.14.2717, Windows 10.
And on Linux:
Python 3.6.6 vs Python 2.7.11
In case it matters, I am running this exactly as my example, no hashbang, invoking the interpreter directly:



pythonX test.py
































  • Perhaps I am mistaken but I seem to get the output 50 4b 03 04 14 00 00 00 for both Python 2 and Python 3 on my Debian machine under Python 3.5.3 and Python 2.7.13
    – Algorithmic Canary
    Nov 12 '18 at 1:06










  • Likewise, it's the same output on Windows with Python 2 and 3 for me (as what you show for Python 3 in your question). Sounds like something OS-dependent. What are you running?
    – martineau
    Nov 12 '18 at 2:50











  • @martineau that's still not what I want, I want the bit set for both, I've changed my question as it wasn't so clear before. Thanks for testing this, it's useful feedback, perhaps you can post your versions.
    – Keeely
    Nov 12 '18 at 9:36










  • Keeely: Got it. Your code creates a file. You want to make sure a bit is set at a certain offset in that file. Seems like if nothing else you could modify the file manually after it's created using binary file I/O.
    – martineau
    Nov 12 '18 at 9:42










  • @martineau, indeed that is my last resort, but it's a pretty horrible solution.
    – Keeely
    Nov 12 '18 at 9:47
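For reference, the binary-I/O fallback raised in these comments could look roughly like the sketch below. It is not from the thread: set_utf8_flag and the constant names are hypothetical, and it assumes a plain archive (no ZIP64, no data prepended to the file, no signature-like bytes in the archive comment) whose member names are already valid UTF-8, which is true for pure ASCII names. The flag is patched in both the local file header and the central directory entry of every member.

```python
from __future__ import print_function

import io
import struct
from zipfile import ZipFile

UTF8_FLAG = 0x0800          # general purpose bit 11
EOCD_SIG = b"PK\x05\x06"    # end of central directory record
CDIR_SIG = b"PK\x01\x02"    # central directory file header
LOCAL_SIG = b"PK\x03\x04"   # local file header


def _or_flag(buf, offset):
    """OR bit 11 into the 16-bit little-endian flag word at offset."""
    (flags,) = struct.unpack_from("<H", buf, offset)
    struct.pack_into("<H", buf, offset, flags | UTF8_FLAG)


def set_utf8_flag(data):
    """Return a copy of the archive bytes with bit 11 set for every member."""
    buf = bytearray(data)
    eocd = buf.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    # Entry count, directory size and directory offset start 10 bytes in.
    total, _size, pos = struct.unpack_from("<HLL", buf, eocd + 10)
    for _ in range(total):
        if buf[pos:pos + 4] != CDIR_SIG:
            raise ValueError("unexpected central directory layout")
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", buf, pos + 28)
        (local_offset,) = struct.unpack_from("<L", buf, pos + 42)
        if buf[local_offset:local_offset + 4] != LOCAL_SIG:
            raise ValueError("local header not found at recorded offset")
        _or_flag(buf, pos + 8)            # flags in the central directory entry
        _or_flag(buf, local_offset + 6)   # flags in the local file header
        pos += 46 + name_len + extra_len + comment_len
    return bytes(buf)


src = io.BytesIO()
with ZipFile(src, "w") as zf:
    zf.writestr("file.txt", b"content")

patched = set_utf8_flag(src.getvalue())
with ZipFile(io.BytesIO(patched)) as zf:
    print([(i.filename, hex(i.flag_bits)) for i in zf.infolist()])
```

Patching both headers keeps the two copies of the flag consistent, which matters because readers typically take member metadata from the central directory.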















python python-2.7 zipfile python-3.7














edited Nov 12 '18 at 9:52

























asked Nov 12 '18 at 0:29









Keeely
























2 Answers


















1














Edit: Here's code that works for me with Python 2.7 but not with 3.6 (a bit of a mystery, it seemed to work earlier this evening):



$ cat zipf.py
from __future__ import print_function

from zipfile import ZipFile, ZipInfo

with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.flag_bits = 0x800
    # don't set info.file_size here: zf.writestr() does that
    zf.writestr(info, content)

with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
        if isinstance(i, str): i = ord(i)
        print('{:02x}'.format(i), end=' ')
    print()


Run as:



$ python2.7 zipf.py
50 4b 03 04 14 00 00 08


but:



$ python3.6 zipf.py
50 4b 03 04 14 00 00 00


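It's certainly possible to make it work, by setting info.flag_bits only after the entry has been opened for writing, so that the reset has already happened. However, then you must avoid writestr, and this only works with Python 3.6, since ZipFile.open() does not accept mode 'w' in 2.7 (and it seems rather abusive):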



from __future__ import print_function

from zipfile import ZipFile, ZipInfo

with ZipFile("out.zip", 'w') as zf:
    info = ZipInfo()
    info.filename = "file.txt"
    content = "content"
    if not isinstance(content, bytes):
        content = content.encode('utf8')
    info.file_size = len(content)
    with zf.open(info, 'w') as stream:
        info.flag_bits = 0x800
        stream.write(content)

with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
        if isinstance(i, str): i = ord(i)
        print('{:02x}'.format(i), end=' ')
    print()


It's probably the case that 3.6 resetting all the info.flag_bits (through the internal open that it does) is just incorrect, although it's not really clear to me.



Original answer below



I cannot reproduce this, but you're right that bit 11 in the flag bits is set if the file name is Unicode and encoding as ASCII fails:



def _encodeFilenameFlags(self):
    if isinstance(self.filename, unicode):
        try:
            return self.filename.encode('ascii'), self.flag_bits
        except UnicodeEncodeError:
            return self.filename.encode('utf-8'), self.flag_bits | 0x800
    else:
        return self.filename, self.flag_bits


(Python 2.7 zipfile.py source) or:



def _encodeFilenameFlags(self):
    try:
        return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
        return self.filename.encode('utf-8'), self.flag_bits | 0x800


(Python 3.6 zipfile.py source).



To get the bit set you need a filename that cannot be encoded directly in ASCII, e.g.:



info.filename = u"sch\N{latin small letter o with diaeresis}n" # "file.txt"


(this notation works with both Python 2.7 and 3.6).
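As a quick illustration of that rule, the following sketch writes two members into an in-memory archive and reads their flag bits back. io.BytesIO, the member names, and UTF8_FLAG are placeholders chosen here, and the code assumes a Python 3 interpreter.

```python
import io
from zipfile import ZipFile

UTF8_FLAG = 0x0800  # bit 11 of the general purpose bit flag

buf = io.BytesIO()
with ZipFile(buf, "w") as zf:
    zf.writestr("file.txt", b"content")  # pure ASCII name
    zf.writestr(u"sch\N{latin small letter o with diaeresis}n.txt", b"content")

# Read the archive back and record whether bit 11 is set for each member.
with ZipFile(io.BytesIO(buf.getvalue())) as zf:
    flags = {info.filename: bool(info.flag_bits & UTF8_FLAG)
             for info in zf.infolist()}

print(flags)
```

Only the name that cannot be encoded as ASCII should come back with the flag set.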




I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write().




If I add:



info.filename = "file.txt"
info.flag_bits |= 0x0800


(just after setting the filename to u"schön") and run this under Python 2.7 or 3.6, I get the bit set in the header (of course the file name in the zip directory changes back to file.txt).




























  • Can you post your full code if you got the bit set for filename==file.txt with Python3?
    – Keeely
    Nov 12 '18 at 9:34











  • @Keeely: I deleted it after posting, but I started by copying your sample from before the last edit. It essentially matched your current sample. I ran it on FreeBSD but the behavior should be the same as long as the zipfile library code is the same...
    – torek
    Nov 12 '18 at 9:55










  • thanks, but can I have your precise major+ minor versions for all Pythons used. I have up-voted the post, but at the moment it doesn't exactly give a solution (bit set for both Python versions) so cannot accept.
    – Keeely
    Nov 12 '18 at 10:33










  • One is sys.version_info(major=2, minor=7, micro=15, releaselevel='final', serial=0), the other is sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0). Let me try re-creating the test, too.
    – torek
    Nov 12 '18 at 11:26



















0














I am using something like this for the time being:



from zipfile import ZipFile, ZipInfo
import struct

orig_function = ZipInfo.FileHeader

def new_function(self, zip64=None):
    # Build the local file header as usual, then patch the flag bytes.
    header = orig_function(self, zip64)
    fmt = "B"*len(header)
    blist = list(struct.unpack(fmt, header))
    # Byte 7 is the high byte of the flag word; 0x8 there is bit 11 (0x800).
    blist[7] |= 0x8
    return struct.pack(fmt, *blist)

setattr(ZipInfo, "FileHeader", new_function)

with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.file_size = len(content)
    zf.writestr(info, content)


Hopefully it won't break too soon; FileHeader() seems like something that won't be changing in the future.
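A variation on the same idea, offered only as a sketch: rather than patching the bytes returned by FileHeader(), a ZipInfo subclass can override the private _encodeFilenameFlags() method quoted in the other answer. As far as I can tell the central directory entry is built from that method as well, so the flag should end up in both places, whereas FileHeader() only produces the local header. Utf8FlagZipInfo and UTF8_FLAG are hypothetical names, the method is an implementation detail that may change between versions, and the filename must be valid UTF-8 (ASCII is).

```python
from __future__ import print_function

import io
import struct
from zipfile import ZipFile, ZipInfo

UTF8_FLAG = 0x0800  # bit 11 of the general purpose bit flag


class Utf8FlagZipInfo(ZipInfo):
    """ZipInfo that always reports bit 11 (relies on a private method)."""

    def _encodeFilenameFlags(self):
        name, flags = ZipInfo._encodeFilenameFlags(self)
        return name, flags | UTF8_FLAG


buf = io.BytesIO()
with ZipFile(buf, "w") as zf:
    info = Utf8FlagZipInfo("file.txt")
    zf.writestr(info, b"content")

raw = buf.getvalue()
# Flag word of the first local file header sits at offset 6.
local_flags = struct.unpack("<H", raw[6:8])[0]
# flag_bits on read-back comes from the central directory.
with ZipFile(io.BytesIO(raw)) as zf:
    central_flags = zf.infolist()[0].flag_bits

print(hex(local_flags), hex(central_flags))
```

Because nothing is replaced on ZipInfo itself, other code in the same process that creates archives keeps the default behaviour.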































    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    Edit: Here's code that works for me with Python 2.7 but not with 3.6 (a bit of a mystery, it seemed to work earlier this evening):



    $ cat zipf.py
    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.flag_bits = 0x800
    # don't set info.file_size here: zf.writestr() does that
    zf.writestr(info, content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    Run as:



    $ python2.7 zipf.py
    50 4b 03 04 14 00 00 08


    but:



    $ python3.6 zipf.py
    50 4b 03 04 14 00 00 00


    It's certainly possible to make it work, by making sure the file is opened before creating the info entry. However, then you must avoid writestr, and this only works with Python 3.6 (and seems rather abusive):



    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    info = ZipInfo()
    info.filename = "file.txt"
    content = "content"
    if not isinstance(content, bytes):
    content = content.encode('utf8')
    info.file_size = len(content)
    with zf.open(info, 'w') as stream:
    info.flag_bits = 0x800
    stream.write(content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    It's probably the case that 3.6 resetting all the info.flag_bits (through the internal open that it does) is just incorrect, although it's not really clear to me.



    Original answer below



    I cannot reproduce this, but you're right that bit 11 in the flag bits is set if the file name is Unicode and encoding as ASCII fails:



    def _encodeFilenameFlags(self):
    if isinstance(self.filename, unicode):
    try:
    return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
    return self.filename.encode('utf-8'), self.flag_bits | 0x800
    else:
    return self.filename, self.flag_bits


    (Python 2.7 zipfile.py source) or:



    def _encodeFilenameFlags(self):
    try:
    return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
    return self.filename.encode('utf-8'), self.flag_bits | 0x800


    (Python 3.6 zipfile.py source).



    To get the bit set you need a filename that cannot be encoded directly in ASCII, e.g.:



    info.filename = u"schNlatin small letter o with diaeresisn" # "file.txt"


    (this notation works with both Python 2.7 and 3.6).




    I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write().




    If I add:



    info.filename = "file.txt"
    info.flag_bits |= 0x0800


    (just after setting the filename to u"schön") and run this under Python 2.7 or 3.6, I get the bit set in the header (of course the file name in the zip directory changes back to file.txt).






    share|improve this answer






















    • Can you post your full code if you got the bit set for filename==file.txt with Python3?
      – Keeely
      Nov 12 '18 at 9:34











    • @Keeely: I deleted it after posting, but I started by copying your sample from before the last edit. It essentially matched your current sample. I ran it on FreeBSD but the behavior should be the same as long as the zipfile library code is the same...
      – torek
      Nov 12 '18 at 9:55










    • thanks, but can I have your precise major+ minor versions for all Pythons used. I have up-voted the post, but at the moment it doesn't exactly give a solution (bit set for both Python versions) so cannot accept.
      – Keeely
      Nov 12 '18 at 10:33










    • One is sys.version_info(major=2, minor=7, micro=15, releaselevel='final', serial=0), the other is sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0). Let me try re-creating the test, too.
      – torek
      Nov 12 '18 at 11:26
















    1














    Edit: Here's code that works for me with Python 2.7 but not with 3.6 (a bit of a mystery, it seemed to work earlier this evening):



    $ cat zipf.py
    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.flag_bits = 0x800
    # don't set info.file_size here: zf.writestr() does that
    zf.writestr(info, content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    Run as:



    $ python2.7 zipf.py
    50 4b 03 04 14 00 00 08


    but:



    $ python3.6 zipf.py
    50 4b 03 04 14 00 00 00


    It's certainly possible to make it work, by making sure the file is opened before creating the info entry. However, then you must avoid writestr, and this only works with Python 3.6 (and seems rather abusive):



    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    info = ZipInfo()
    info.filename = "file.txt"
    content = "content"
    if not isinstance(content, bytes):
    content = content.encode('utf8')
    info.file_size = len(content)
    with zf.open(info, 'w') as stream:
    info.flag_bits = 0x800
    stream.write(content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    It's probably the case that 3.6 resetting all the info.flag_bits (through the internal open that it does) is just incorrect, although it's not really clear to me.



    Original answer below



    I cannot reproduce this, but you're right that bit 11 in the flag bits is set if the file name is Unicode and encoding as ASCII fails:



    def _encodeFilenameFlags(self):
    if isinstance(self.filename, unicode):
    try:
    return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
    return self.filename.encode('utf-8'), self.flag_bits | 0x800
    else:
    return self.filename, self.flag_bits


    (Python 2.7 zipfile.py source) or:



    def _encodeFilenameFlags(self):
    try:
    return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
    return self.filename.encode('utf-8'), self.flag_bits | 0x800


    (Python 3.6 zipfile.py source).



    To get the bit set you need a filename that cannot be encoded directly in ASCII, e.g.:



    info.filename = u"schNlatin small letter o with diaeresisn" # "file.txt"


    (this notation works with both Python 2.7 and 3.6).




    I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write().




    If I add:



    info.filename = "file.txt"
    info.flag_bits |= 0x0800


    (just after setting the filename to u"schön") and run this under Python 2.7 or 3.6, I get the bit set in the header (of course the file name in the zip directory changes back to file.txt).






    share|improve this answer






















    • Can you post your full code if you got the bit set for filename==file.txt with Python3?
      – Keeely
      Nov 12 '18 at 9:34











    • @Keeely: I deleted it after posting, but I started by copying your sample from before the last edit. It essentially matched your current sample. I ran it on FreeBSD but the behavior should be the same as long as the zipfile library code is the same...
      – torek
      Nov 12 '18 at 9:55










    • thanks, but can I have your precise major+ minor versions for all Pythons used. I have up-voted the post, but at the moment it doesn't exactly give a solution (bit set for both Python versions) so cannot accept.
      – Keeely
      Nov 12 '18 at 10:33










    • One is sys.version_info(major=2, minor=7, micro=15, releaselevel='final', serial=0), the other is sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0). Let me try re-creating the test, too.
      – torek
      Nov 12 '18 at 11:26














    1












    1








    1






    Edit: Here's code that works for me with Python 2.7 but not with 3.6 (a bit of a mystery, it seemed to work earlier this evening):



    $ cat zipf.py
    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.flag_bits = 0x800
    # don't set info.file_size here: zf.writestr() does that
    zf.writestr(info, content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    Run as:



    $ python2.7 zipf.py
    50 4b 03 04 14 00 00 08


    but:



    $ python3.6 zipf.py
    50 4b 03 04 14 00 00 00


    It's certainly possible to make it work, by making sure the file is opened before creating the info entry. However, then you must avoid writestr, and this only works with Python 3.6 (and seems rather abusive):



    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    info = ZipInfo()
    info.filename = "file.txt"
    content = "content"
    if not isinstance(content, bytes):
    content = content.encode('utf8')
    info.file_size = len(content)
    with zf.open(info, 'w') as stream:
    info.flag_bits = 0x800
    stream.write(content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    It's probably the case that 3.6 resetting all the info.flag_bits (through the internal open that it does) is just incorrect, although it's not really clear to me.



    Original answer below



    I cannot reproduce this, but you're right that bit 11 in the flag bits is set if the file name is Unicode and encoding as ASCII fails:



    def _encodeFilenameFlags(self):
    if isinstance(self.filename, unicode):
    try:
    return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
    return self.filename.encode('utf-8'), self.flag_bits | 0x800
    else:
    return self.filename, self.flag_bits


    (Python 2.7 zipfile.py source) or:



    def _encodeFilenameFlags(self):
    try:
    return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
    return self.filename.encode('utf-8'), self.flag_bits | 0x800


    (Python 3.6 zipfile.py source).



    To get the bit set you need a filename that cannot be encoded directly in ASCII, e.g.:



    info.filename = u"schNlatin small letter o with diaeresisn" # "file.txt"


    (this notation works with both Python 2.7 and 3.6).




    I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write().




    If I add:



    info.filename = "file.txt"
    info.flag_bits |= 0x0800


    (just after setting the filename to u"schön") and run this under Python 2.7 or 3.6, I get the bit set in the header (of course the file name in the zip directory changes back to file.txt).






    share|improve this answer














    Edit: Here's code that works for me with Python 2.7 but not with 3.6 (a bit of a mystery, it seemed to work earlier this evening):



    $ cat zipf.py
    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.flag_bits = 0x800
    # don't set info.file_size here: zf.writestr() does that
    zf.writestr(info, content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    Run as:



    $ python2.7 zipf.py
    50 4b 03 04 14 00 00 08


    but:



    $ python3.6 zipf.py
    50 4b 03 04 14 00 00 00


    It's certainly possible to make it work, by making sure the file is opened before creating the info entry. However, then you must avoid writestr, and this only works with Python 3.6 (and seems rather abusive):



    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
        info = ZipInfo()
        info.filename = "file.txt"
        content = "content"
        if not isinstance(content, bytes):
            content = content.encode('utf8')
        info.file_size = len(content)
        with zf.open(info, 'w') as stream:
            # set the flag only after open() has reset flag_bits
            info.flag_bits = 0x800
            stream.write(content)

    # dump the first 8 bytes of the local file header as hex
    with open('out.zip', 'rb') as stream:
        byteseq = stream.read(8)
        for i in byteseq:
            if isinstance(i, str): i = ord(i)
            print('{:02x}'.format(i), end=' ')
        print()


    It's probably the case that 3.6 is simply wrong to reset all of info.flag_bits (through the internal open that writestr does), although that's not really clear to me.
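The eight bytes printed above can also be decoded instead of compared by eye. Here is a minimal sketch, assuming the buffer starts with a local file header (signature `PK\x03\x04`, with the general-purpose flags stored as a little-endian 16-bit value at offset 6). The names `local_header_flags` and `has_utf8_flag` are hypothetical helpers of mine, not part of zipfile:

```python
import struct

UTF8_FLAG = 0x800  # general-purpose bit 11: language encoding (UTF-8 names)


def local_header_flags(data):
    """Return the general-purpose flag word of a local file header.

    `data` is assumed to hold at least the first 8 bytes of the header.
    """
    if data[:4] != b"PK\x03\x04":
        raise ValueError("not a local file header")
    # Layout: signature (4 bytes), version needed (2 bytes), flags (2 bytes).
    (flags,) = struct.unpack("<H", data[6:8])
    return flags


def has_utf8_flag(data):
    """True if bit 11 is set in the local file header held in `data`."""
    return bool(local_header_flags(data) & UTF8_FLAG)
```

Feeding it the first 8 bytes of out.zip (read in 'rb' mode) should report 0x800 for the Python 2.7 output shown above and 0 for the 3.6 output.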



    Original answer below



    I cannot reproduce this, but you're right that bit 11 in the flag bits is set if the file name is Unicode and encoding as ASCII fails:



    def _encodeFilenameFlags(self):
        if isinstance(self.filename, unicode):
            try:
                return self.filename.encode('ascii'), self.flag_bits
            except UnicodeEncodeError:
                return self.filename.encode('utf-8'), self.flag_bits | 0x800
        else:
            return self.filename, self.flag_bits


    (Python 2.7 zipfile.py source) or:



    def _encodeFilenameFlags(self):
        try:
            return self.filename.encode('ascii'), self.flag_bits
        except UnicodeEncodeError:
            return self.filename.encode('utf-8'), self.flag_bits | 0x800


    (Python 3.6 zipfile.py source).



    To get the bit set you need a filename that cannot be encoded directly in ASCII, e.g.:



    info.filename = u"sch\N{latin small letter o with diaeresis}n" # "file.txt"


    (this notation works with both Python 2.7 and 3.6).
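To see that behavior without touching the disk, here is a small sketch that writes a single entry to an in-memory archive and reads the flag back. `utf8_flag_for` is a hypothetical helper name, and the value it reads comes from the central directory record, not from the local header:

```python
import io
from zipfile import ZipFile, ZipInfo


def utf8_flag_for(filename):
    """Write one entry named `filename` to an in-memory zip and report bit 11."""
    buf = io.BytesIO()
    with ZipFile(buf, "w") as zf:
        zf.writestr(ZipInfo(filename), b"content")
    # Re-open for reading: flag_bits is now taken from the central directory.
    with ZipFile(buf) as zf:
        return bool(zf.infolist()[0].flag_bits & 0x800)
```

Under Python 3, a name such as u"sch\xf6n" should report the bit as set and "file.txt" as clear, which is the filename-driven behavior described above.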




    "I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write()."




    If I add:



    info.filename = "file.txt"
    info.flag_bits |= 0x0800


    (just after setting the filename to u"schön") and run this under Python 2.7 or 3.6, I get the bit set in the header (of course the file name in the zip directory changes back to file.txt).















    edited Nov 12 '18 at 11:59

























    answered Nov 12 '18 at 1:09









    torek












    • Can you post your full code if you got the bit set for filename==file.txt with Python3?
      – Keeely
      Nov 12 '18 at 9:34











    • @Keeely: I deleted it after posting, but I started by copying your sample from before the last edit. It essentially matched your current sample. I ran it on FreeBSD but the behavior should be the same as long as the zipfile library code is the same...
      – torek
      Nov 12 '18 at 9:55










    • Thanks, but can I have your precise major + minor versions for all Pythons used? I have up-voted the post, but at the moment it doesn't exactly give a solution (bit set for both Python versions), so I cannot accept it.
      – Keeely
      Nov 12 '18 at 10:33










    • One is sys.version_info(major=2, minor=7, micro=15, releaselevel='final', serial=0), the other is sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0). Let me try re-creating the test, too.
      – torek
      Nov 12 '18 at 11:26













































    I am using something like this for the time being:



    from zipfile import ZipFile, ZipInfo
    import struct

    orig_function = ZipInfo.FileHeader

    def new_function(self, zip64=None):
        # build the normal local file header, then patch it byte-wise
        header = orig_function(self, zip64)
        fmt = "B"*len(header)
        blist = list(struct.unpack(fmt, header))
        # byte 7 is the high byte of the flag word: 0x08 here means 0x0800 (bit 11)
        blist[7] |= 0x8
        return struct.pack(fmt, *blist)

    setattr(ZipInfo, "FileHeader", new_function)

    with ZipFile("out.zip", 'w') as zf:
        content = "content"
        info = ZipInfo()
        info.filename = "file.txt"
        info.file_size = len(content)
        zf.writestr(info, content)


    Hopefully it won't break too soon; FileHeader() seems unlikely to change in the future.
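One thing that may be worth checking with this approach: a zip entry stores its flags twice, once in the local file header and once in the central directory record. FileHeader() only produces the former, so the two copies can end up disagreeing (my reading of the 3.6 source is that the central directory is built separately, but treat that as an assumption to verify). A rough sketch for comparing them, where `flag_report` and `make_zip` are hypothetical helper names:

```python
import io
import struct
from zipfile import ZipFile, ZipInfo


def flag_report(data):
    """Return (local_flags, central_flags) for the first entry of a zip."""
    with ZipFile(io.BytesIO(data)) as zf:
        info = zf.infolist()[0]
    # info.flag_bits is read from the central directory record.
    central = info.flag_bits
    # The local header starts at header_offset; its flag word is 6 bytes in.
    start = info.header_offset + 6
    (local,) = struct.unpack("<H", data[start:start + 2])
    return local, central


def make_zip(filename):
    """Build a one-entry archive in memory and return its raw bytes."""
    buf = io.BytesIO()
    with ZipFile(buf, "w") as zf:
        zf.writestr(ZipInfo(filename), b"content")
    return buf.getvalue()
```

Running `flag_report` on the bytes of out.zip shows whether both copies carry 0x800; if only the local one does, some readers may still ignore the flag.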
















        answered Nov 12 '18 at 13:48









        Keeely



























