Jared Spool – Parsing web pages with jQuery and Python

April 28, 2017 - jQuery, python

Today we will parse the library of videos on the page https://aycl.uie.com/.

Workflow

  1. Log in to the page
  2. Reveal all seminar links (load all sessions onto the page)
  3. Use jQuery to collect the links to all subpages, load each page, and read the src of its Vimeo iframe
  4. Copy the string with all Vimeo links from the console (store it as a global variable, then run copy(temp0))
  5. Use Python to split the string of Vimeo links into a list and download every video with youtube_dl

Display all links to subpages on the web page

$('article .actions a.button').each(function () { console.log($(this).attr('href')); }); // read href by name instead of relying on attribute order

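Create a CSV-like file

// temp2 is presumably the array of presentation objects (databaseplain, built further below), stored as a global variable in the console
var a = temp2.map(el => `${el.title}|${el.author}|${el.url}|${el.slides}|${el.vimeoSource}`);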
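To work with these pipe-delimited lines later on, they can be read back in Python. The snippet below is only a sketch: the field order is assumed to match the template string above, the sample rows are made up, and `parse_rows` is a hypothetical helper name. It will also break if a title itself contains a `|`.

```python
import csv
import io

# Field order assumed to match the template string in the JavaScript snippet above.
FIELDS = ["title", "author", "url", "slides", "vimeoSource"]


def parse_rows(text):
    """Turn pipe-delimited lines into a list of dicts keyed by FIELDS."""
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    # Blank lines come back as empty rows, so they are skipped.
    return [dict(zip(FIELDS, row)) for row in reader if row]


# Made-up sample data, for illustration only.
sample = (
    "Example Talk|Jane Doe|https://example.com/talk|No slides|https://player.vimeo.com/video/1\n"
    "Another Talk|John Roe|https://example.com/other|https://example.com/s.pdf|https://player.vimeo.com/video/2\n"
)

rows = parse_rows(sample)
print(rows[0]["title"])
```

Each row becomes a dictionary, so a field can be picked by name, for example `rows[0]["vimeoSource"]`.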

GET ALL VIMEO LINKS FROM SEMINAR PAGES

Run in the browser console on https://aycl.uie.com/ (load all sessions onto the page first)

var vimeoStorage = '';
localStorage.setItem("vimeos", vimeoStorage); /* Start with an empty string in the browser's storage */

$('article .actions a.button')
    .each(function () { /* For each action button */

        var url = $(this).attr("href"); /* Get the href attribute with the URL of the subpage */

        $.get(url) /* Load each subpage */
            .done(function (data) {
                var source = $("iframe", $(data)).attr("src"); /* Get the link to the Vimeo video in the iframe */
                if (!source) { return; } /* Skip subpages without an iframe */
                var vimeos = localStorage.getItem("vimeos"); /* Get the string of Vimeo URLs from the browser */
                vimeos = vimeos.concat(source, ','); /* Append the URL to the string */
                localStorage.setItem("vimeos", vimeos); /* Save the string in the browser */
            });
    });

/* The requests are asynchronous: run the next line only after they have all finished */
localStorage.getItem("vimeos"); /* Display all links */
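The string collected this way usually needs a little cleanup before it can be used for downloading. Here is a minimal sketch in Python, under a few assumptions: the string ends with a trailing comma, subpages without an iframe may show up as the text "undefined", and iframe sources may be protocol-relative (starting with `//`). The function name `clean_vimeo_links` and the sample string are made up for illustration.

```python
def clean_vimeo_links(raw):
    """Split the comma-separated string and return unique, usable URLs in order."""
    seen = set()
    links = []
    for part in raw.split(","):
        url = part.strip()
        # Skip empty entries (trailing comma) and subpages without an iframe,
        # which may end up in the string as the text "undefined".
        if not url or url == "undefined":
            continue
        # Assumption: protocol-relative sources ("//player.vimeo.com/...") should use https.
        if url.startswith("//"):
            url = "https:" + url
        # Keep only the first occurrence of each URL.
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


# Made-up sample string, for illustration only.
raw = "//player.vimeo.com/video/1,undefined,https://player.vimeo.com/video/2,//player.vimeo.com/video/1,"
print(clean_vimeo_links(raw))
```

The result is a plain list of URLs that can be passed to the download loop.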

GET ALL PRESENTATIONS DATA FROM SEMINAR PAGES

Run in the browser console on https://aycl.uie.com/ (load all sessions onto the page first)

var databaseplain = [];

$('article.row').each(function () {
    var item = {};
    item.author = this.childNodes[1].innerText; //console.log(author);
    item.title = this.childNodes[3].children[0].innerText; //console.log(title);
    item.url = this.childNodes[5].children[0].href; //console.log(url);

    // go to the subpage and extract data; these two fields are filled in
    // asynchronously, so they appear only after the request has finished
    $.get(item.url).done(function (data) {
        item.vimeoSource = $("iframe", $(data)).attr("src"); //console.log(vimeoSource);
        item.slides = $("li.icon-download a", $(data)).attr("href") || 'No slides'; //console.log(slides);
    });

    databaseplain.push(item);
});

PYTHON Script to download Vimeo Videos

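import youtube_dl

# Paste the comma-separated string of Vimeo links between the triple quotes
LINKS = """
"""

# Change the string into a list, dropping whitespace and empty entries
# (the string built in the browser ends with a trailing comma)
links = [link.strip() for link in LINKS.replace('\n', ',').split(',') if link.strip()]

for link in links:
    try:
        print(link)
        with youtube_dl.YoutubeDL() as ydl:
            ydl.download([link])
    except Exception:
        print('Could not download the video: {}'.format(link))
print('DONE.')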
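With a long list of videos it helps to know afterwards which downloads failed, so that only those can be retried. The sketch below shows one way to do it: the loop above is wrapped in a function that returns the failed links. The names `download_all` and `fake_download` are hypothetical, and the downloader is passed in as a parameter so the example runs without network access.

```python
def download_all(links, download_one):
    """Call download_one(link) for every link and return the links that failed."""
    failed = []
    for link in links:
        try:
            download_one(link)
        except Exception as error:
            print("Could not download {}: {}".format(link, error))
            failed.append(link)
    return failed


def fake_download(link):
    # Stand-in downloader for illustration only; it fails on one made-up URL.
    if link.endswith("/2"):
        raise RuntimeError("simulated failure")


failed = download_all(
    ["https://player.vimeo.com/video/1", "https://player.vimeo.com/video/2"],
    fake_download,
)
print(failed)
```

For real downloads, `download_one` could be `lambda link: ydl.download([link])` inside a `with youtube_dl.YoutubeDL() as ydl:` block.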

Resource

Google Sheet