Python: Unable to download from a web page using Selenium

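I think your main problem was probably an incorrect MIME type. However, your script also has a laundry list of systemic issues that would make it unreliable at best. This rewrite uses explicit waits, which completely removes the need for time.sleep() calls, so it runs as fast as possible while also eliminating errors caused by network congestion.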
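To illustrate why an explicit wait beats a fixed time.sleep(), here is a minimal, library-free sketch of the polling idea behind it. The helper name wait_until and its parameters are hypothetical (they are not part of Selenium or the explicit module), and the sketch assumes Python 3.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper for illustration only. It returns as soon as the
    condition holds, instead of always sleeping for a fixed worst-case time.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "condition not met within {0} seconds".format(timeout))
        time.sleep(poll_interval)


if __name__ == "__main__":
    start = time.monotonic()
    # Stand-in for "the page finished loading": becomes true after ~0.3 s.
    wait_until(lambda: time.monotonic() - start >= 0.3, timeout=5)
    print("condition met after {0:.2f}s".format(time.monotonic() - start))
```

A fixed sleep has to be sized for the slowest case and still fails when the network is slower than expected. Polling returns as soon as the condition holds and only fails after the full timeout.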

Run the following to make sure all of the required modules are installed:

pip install requests explicit selenium retry pyvirtualdisplay

The script:

#!/usr/bin/python
from __future__ import print_function  # Makes your code portable

import os
import glob
import zipfile
from contextlib import contextmanager

import requests
from retry import retry
from explicit import waiter, XPATH, ID
from selenium import webdriver
from pyvirtualdisplay import Display
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.wait import WebDriverWait

DOWNLOAD_DIR = "/tmp/shKLSE/"


def build_profile():
    profile = webdriver.FirefoxProfile()
    profile.set_preference('browser.download.folderList', 2)
    profile.set_preference('browser.download.manager.showWhenStarting', False)
    profile.set_preference('browser.download.dir', DOWNLOAD_DIR)
    # I think your `/zip` mime type was incorrect. This works for me
    profile.set_preference('browser.helperApps.neverAsk.saveToDisk',
                           'application/vnd.ms-excel,application/zip')
    return profile


# Retry is an elegant way to retry the browser creation
# Though you should narrow the scope to whatever the actual exception is you are
# retrying on
@retry(Exception, tries=5, delay=3)
@contextmanager  # This turns get_browser into a context manager
def get_browser():
    # Use a context manager with Display, so it will be closed even if an
    # exception is thrown
    profile = build_profile()
    with Display(visible=0, size=(800, 600)):
        browser = webdriver.Firefox(profile)
        print("firefox")
        try:
            yield browser
        finally:
            # Let a try/finally block manage closing the browser, even if an
            # exception is raised
            browser.quit()


def main():
    print("hello from python 2")
    with get_browser() as browser:
        browser.get("https://www.shareinvestor.com/my")

        # Click the login button
        # waiter is a helper function that makes it easy to use explicit waits
        # with it you don't need to use time.sleep() calls at all
        # NOTE: the attribute test inside [@] is missing from this copy of the
        # script; supply the attribute that identifies the login container
        login_xpath = '//*/div[@]/a'
        waiter.find_element(browser, login_xpath, XPATH).click()
        print(browser.current_url)

        # Log in
        username = "bkcollection"
        username_id = "sic_login_header_username"
        password = "123456"
        password_id = "sic_login_header_password"
        waiter.find_write(browser, username_id, username, by=ID)
        waiter.find_write(browser, password_id, password, by=ID, send_enter=True)

        # Wait for login process to finish by locating an element only found
        # after logging in, like the Logged In Nav
        nav_id = 'sic_loggedInNav'
        waiter.find_element(browser, nav_id, ID)
        print("log in done")

        # Load the target page
        target_url = ("https://www.shareinvestor.com/prices/price_download.html#/?"
                      "type=price_download_all_stocks_bursa")
        browser.get(target_url)
        print(browser.current_url)

        # Click download button
        all_data_xpath = ("//*[@href='/prices/price_download_zip_file.zip?"
                          "type=history_all&market=bursa']")
        waiter.find_element(browser, all_data_xpath, XPATH).click()

        # This is a bit challenging: You need to wait until the download is complete
        # This file is 220 MB, it takes a while to complete. This method waits until
        # there is at least one file in the dir, then waits until there are no
        # filenames that end in `.part`
        # Note that this is problematic if there is already a file in the target dir.
        # I suggest looking into using the tempfile module to create a unique,
        # temporary directory for downloading every time you run your script
        print("Waiting for download to complete")
        at_least_1 = lambda x: len(x("{0}/*.zip*".format(DOWNLOAD_DIR))) > 0
        WebDriverWait(glob.glob, 300).until(at_least_1)

        no_parts = lambda x: len(x("{0}/*.part".format(DOWNLOAD_DIR))) == 0
        WebDriverWait(glob.glob, 300).until(no_parts)
        print("Download Done")

        # Now do whatever it is you need to do with the zip file
        # zip_ref = zipfile.ZipFile(DOWNLOAD_DIR, 'r')
        # zip_ref.extractall(DOWNLOAD_DIR)
        # zip_ref.close()
        # os.remove(zip_ref)
        print("Done!")


if __name__ == "__main__":
    main()
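The script's comments point out that a stale file in the download directory would confuse the completion check, and suggest a unique temporary directory per run. A rough sketch of that idea using only the standard library might look like this. The function names make_download_dir and wait_for_download are hypothetical, Python 3 is assumed, and `.part` is taken to be the suffix the browser gives to unfinished downloads, as in the script above.

```python
import glob
import os
import tempfile
import time


def make_download_dir(prefix="shKLSE-"):
    """Create a fresh, uniquely named directory for this run's downloads."""
    return tempfile.mkdtemp(prefix=prefix)


def wait_for_download(directory, pattern="*.zip", timeout=300.0,
                      poll_interval=0.5):
    """Wait until a file matching `pattern` exists and no `.part` files remain.

    Hypothetical helper: returns the sorted list of finished files, or raises
    TimeoutError if the download has not completed within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        finished = glob.glob(os.path.join(directory, pattern))
        partial = glob.glob(os.path.join(directory, "*.part"))
        if finished and not partial:
            return sorted(finished)
        time.sleep(poll_interval)
    raise TimeoutError(
        "download in {0} did not finish within {1} seconds".format(
            directory, timeout))
```

To use it, the value returned by make_download_dir() would replace the fixed DOWNLOAD_DIR passed to the browser.download.dir preference, and wait_for_download() would be called on the same directory after clicking the download link. Because the directory is new on every run, any matching file in it must come from the current download.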

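Full disclosure: I maintain the explicit module. It is designed to make explicit waits easier to use in exactly this kind of situation, where a website slowly loads dynamic content in response to user interaction. You could replace all of the waiter.XXX calls above with direct explicit waits.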
