How to use Python3 to download a Pling package in a reliable manner?

tesla33 · September 4, 2019, 4:55pm

I would like to use a Python 3 script to download a package from Pling reliably. My script is below.
The URL in it was obtained from a previous download. It used to work, but no longer does. I then tried downloading the package manually from its page (https://www.gnome-look.org/p/1102582, File tab). That worked once; after that, the manual download stopped working as well.

How can I download a package from www.gnome-look.org in a reliable manner? The intention here is to automate the installation of the package in Ubuntu 18.04.

#!/usr/bin/python3.6
# -*- coding: utf-8 -*-

from pathlib import Path
from io import BytesIO
from urllib.request import Request, urlopen
from urllib.error import URLError
import tarfile


def get_url_response( url ):
    req = Request( url )
    try:
        response = urlopen( req )
    except URLError as e:
        # Report the failure; the function then falls through and returns None.
        if hasattr( e, 'reason' ):
            print( 'We failed to reach a server.' )
            print( 'Reason: ', e.reason )
    else:
        # everything is fine
        return response


url = 'https://dl.opendesktop.org/api/files/download/id/1563810337/s/a095c4c96dfa61e69c56b78027672158576c35bf6c89887a36ea8e4fbddefe0c8137840a06d21a234218ee2ad8c633334356669b3b2d49440722a114f45fbada/t/1566985818/lt/download/Cupertino.tar.xz'
dst = Path.cwd() / 'tmp'

response = get_url_response( url )
print( response.info() )
with tarfile.open( fileobj=BytesIO( response.read() ), mode='r:xz' ) as tfile:
    tfile.extractall( path=dst )

Output:

Date: Wed, 04 Sep 2019 15:47:01 GMT
Server: Apache
Vary: Host,Accept-Encoding
Set-Cookie: PlingItId=tc3gg9is7kq9ta5ljlcer6f0r3; expires=Thu, 03-Sep-2020 15:47:01 GMT; Max-Age=31536000; path=/; domain=www.pling.com; HttpOnly
Expires: Wed, 04 Sep 2019 16:17:01 GMT
Cache-Control: private, no-cache, must-revalidate
Pragma: no-cache
X-Frame-Options: ALLOWALL
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html


Traceback (most recent call last):
  File "/usr/lib/python3.6/tarfile.py", line 1700, in xzopen
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/usr/lib/python3.6/tarfile.py", line 1619, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python3.6/tarfile.py", line 1482, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python3.6/tarfile.py", line 2297, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/lib/python3.6/tarfile.py", line 1092, in fromtarfile
    buf = tarfile.fileobj.read(BLOCKSIZE)
  File "/usr/lib/python3.6/lzma.py", line 200, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.6/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
_lzma.LZMAError: Input format not supported by decoder

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/master/.0MySetup/setupUbuntu18.04/test_tar_2.py", line 30, in <module>
    with tarfile.open( fileobj=BytesIO( response.read() ), mode='r:xz' ) as tfile:
  File "/usr/lib/python3.6/tarfile.py", line 1589, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/usr/lib/python3.6/tarfile.py", line 1704, in xzopen
    raise ReadError("not an lzma file")
tarfile.ReadError: not an lzma file

gusreis1989 · September 4, 2019, 9:50pm

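Two errors:

1. The traceback indicates that the downloaded file is not an LZMA (.xz) file, so the tarfile library cannot open it. The response headers show Content-Type: text/html, so the server returned an HTML page instead of the archive.
2. The other error is the line print( 'Error code: ', e.code ).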
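
To make the first error easier to diagnose, the payload can be checked before it is handed to tarfile. The following is only a minimal sketch, not the site's or the library's prescribed method: the helper names looks_like_xz and extract_tar_xz are hypothetical, and the check relies on the six-byte magic number that begins every xz stream.

```python
import io
import tarfile
from pathlib import Path

# Every .xz stream starts with these six bytes (the xz magic number).
XZ_MAGIC = b"\xfd7zXZ\x00"


def looks_like_xz(data: bytes) -> bool:
    """Return True if the payload starts with the xz magic number."""
    return data.startswith(XZ_MAGIC)


def extract_tar_xz(data: bytes, dst: Path) -> list:
    """Extract an in-memory .tar.xz payload into dst and return member names.

    Raises ValueError with a readable message when the payload is not xz
    data, for example when the server answered with an HTML page.
    Only use this on archives you trust, since extractall writes files to disk.
    """
    if not looks_like_xz(data):
        preview = data[:60].decode("utf-8", errors="replace")
        raise ValueError(f"payload is not an xz archive; it starts with: {preview!r}")
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:xz") as tfile:
        names = tfile.getnames()
        tfile.extractall(path=dst)
    return names
```

With a check like this, an HTML response produces a short message showing the first bytes received instead of the long LZMAError traceback.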

tesla33 · September 5, 2019, 2:36am

Thank you for identifying the errors.

On Error 2: I concur. This script was taken from the documentation. After reading more about the urllib.error module, I realised that the exception urllib.error.URLError has only one attribute, i.e. “.reason”. I have removed:

    elif hasattr( e, 'code'):
        print( 'The server couldn\'t fulfill the request.' )
        print( 'Error code: ', e.code )
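
For reference, urllib.error.HTTPError is a subclass of URLError in the standard library and carries both .code and .reason, so an except URLError clause also catches HTTP status errors. A minimal sketch of one way to report both cases and to avoid returning None on failure; the names describe_url_error and fetch are hypothetical, not part of the original script:

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def describe_url_error(err: URLError) -> str:
    """Build a readable message for a URLError or its HTTPError subclass."""
    if isinstance(err, HTTPError):
        # The server answered, but with an error status such as 403 or 404.
        return f"The server couldn't fulfill the request. Error code: {err.code}"
    # No usable answer at all: DNS failure, refused connection, and so on.
    return f"We failed to reach a server. Reason: {err.reason}"


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download url and return the body; raise RuntimeError on failure."""
    try:
        with urlopen(Request(url), timeout=timeout) as response:
            return response.read()
    except URLError as err:
        # Re-raise instead of returning None, so callers never use a missing response.
        raise RuntimeError(describe_url_error(err)) from err
```

Checking with isinstance rather than hasattr keeps the two cases distinct, because an HTTPError has a .reason attribute as well.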

On Error 1: I find myself in a dilemma. Originally, the website provided the hyperlink to the .tar.xz file as:

url = 'https://dl.opendesktop.org/api/files/download/id/1563810337/s/a095c4c96dfa61e69c56b78027672158576c35bf6c89887a36ea8e4fbddefe0c8137840a06d21a234218ee2ad8c633334356669b3b2d49440722a114f45fbada/t/1566985818/lt/download/Cupertino.tar.xz'

It worked: I was able to download the tar file. However, after some time the link stopped working. I just tried downloading the same file, “Cupertino.tar.xz”, manually via the link in the download column. This time the site gave me a different hyperlink, which downloaded the file successfully:

url = 'https://dl.opendesktop.org/api/files/download/id/1563810337/s/f64a2cf2d0fe040db76c01a38a90882a060be2f8fafa5fd2892f9eef6b8dcd5e1b1c1bb772929586f67a071577fbd706d963886ec1d759f87dfe9492075a2a07/t/1567653482/c/8d9cc68f0dd7ffabeea461394efb9e8b72cc0162c73a65c41bceb8709a21d18b6571bd032384ec624eab379abfd69dfa038b8233f1b3c8c56e565e6f13402dbd/lt/download/Cupertino.tar.xz'

Why does the hyperlink change with time? How can I create a stable Python script to download the .tar file?
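
To see exactly which parts of the two links differ, the path can be split into fields. This is only a sketch under stated assumptions: the URL format is not documented in this thread, so the idea that the segments after ".../download/" alternate key and value, and that t is a Unix timestamp, is a guess based on the two links above. The function names are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit


def parse_download_url(url: str) -> dict:
    """Split a download URL path into key/value fields (assumed layout)."""
    segments = urlsplit(url).path.strip("/").split("/")
    # Assumption: everything after the first "download" segment alternates
    # key, value, and the final segment is the file name.
    start = segments.index("download") + 1
    *pairs, filename = segments[start:]
    fields = dict(zip(pairs[0::2], pairs[1::2]))
    fields["filename"] = filename
    return fields


def changed_fields(old: dict, new: dict) -> list:
    """Return the sorted keys whose values differ or exist on one side only."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


def as_utc(timestamp: str) -> datetime:
    """Interpret a field as a Unix timestamp in UTC (an assumption about t)."""
    return datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
```

Applied to the two links above, changed_fields reports c, s and t, while id and the file name stay the same. Read as Unix timestamps, the two t values fall on 28 August 2019 and 5 September 2019 (UTC), which would be consistent with links that are issued per download and expire, though that interpretation is not confirmed here.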

opendesktop · September 5, 2019, 6:47pm

Hi tesla33,

We are currently changing the handling of direct download URLs to prevent scripts from misusing them to inflate download numbers.
Within the next 1-2 weeks, we are going to adjust the download system to allow such actions again, since registered-user downloads and unknown external downloads will be handled separately, unless we also receive user information together with the OCS API action.

tesla33 · September 12, 2019, 1:19am

Thank you for the good news. On the practical side, how will the changes you are implementing affect the code that I posted above? Is there additional information that I need to provide?

milouse · July 6, 2020, 3:18pm

Any news on that? The lack of direct link access makes packaging for Linux distributions very difficult. Some developers don’t know how to use git tags and simply upload their files here. It would be good to find a way to let anybody download content without web scraping. Thank you very much.

zayronXIO · August 17, 2020, 12:33pm

Any news about this?

pierre2324 · December 14, 2021, 1:09am

Any news on how we can use direct download links?