Python Library Notes: Selenium (Part 2) - Common Selenium Problems and Their Solutions

This post collects some problems I ran into while using Selenium, along with their solutions. For the basics, see this post: http://smilecoc.vip/2021/07/25/Python_Selenium_part1/

1. Selenium shows the warning 'Your connection is not private'

In Chrome, add an argument that tells the browser to ignore certificate errors:

from selenium import webdriver

options = webdriver.ChromeOptions()
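# Ignore TLS certificate errors (self-signed, expired, or untrusted certificates)
options.add_argument('--ignore-certificate-errors')

# Pass the options with options=; the older chrome_options= keyword
# was deprecated and has been removed in newer Selenium releases
driver = webdriver.Chrome(options=options)
driver.get('https://cacert.org/')

# Close the browser and end the WebDriver session
driver.quit()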

In Firefox, set accept_untrusted_certs to True on the profile:

from selenium import webdriver

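profile = webdriver.FirefoxProfile()
profile.accept_untrusted_certs = True

# Attach the profile through FirefoxOptions; the firefox_profile= keyword
# was deprecated and has been removed in newer Selenium releases
options = webdriver.FirefoxOptions()
options.profile = profile

driver = webdriver.Firefox(options=options)
driver.get('https://cacert.org/')

driver.quit()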

Original answer: https://stackoverflow.com/questions/24507078/how-to-deal-with-certificates-using-selenium

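2. Mouse hover and selecting from a dropdown

Dropdown lists come in two kinds. The first kind is built with a select tag and can be handled with the Select class, imported via from selenium.webdriver.support.ui import Select.

The Selenium code is: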

'''
Case 1: the dropdown is a select tag, so Select can be used
'''

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

driver = webdriver.Chrome()
driver.get('https://www.17sucai.com/pins/demo-show?id=5926')
# Switch into the iframe that contains the dropdown
driver.switch_to.frame(driver.find_element(By.ID, 'iframe'))
# Locate the dropdown and wrap it in Select
selectTag = Select(driver.find_element(By.NAME, 'country-wrap'))  # the select tag
# Choose an option
# 1. Select by value
selectTag.select_by_value('CA')
# 2. Select by index
# selectTag.select_by_index(3)

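Some dropdown lists are not built with a select tag, though, and then we have to locate and click the elements ourselves. In the case used here, the options are a tags rather than a select tag, so from selenium.webdriver.support.ui import Select cannot be used.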

'''
Case 2: click the elements manually
'''

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.17sucai.com/pins/demo-show?id=5926')
# Switch into the iframe that contains the dropdown
driver.switch_to.frame(driver.find_element(By.ID, 'iframe'))
# Click the dropdown to open it
driver.find_element(By.XPATH, '//*[@id="dk_container_country-nofake"]').click()
# Click the first option in the opened list
driver.find_element(By.XPATH, '//*[@id="dk_container_country-nofake"]/div/ul/li[1]/a').click()

Source: https://blog.csdn.net/Claire_chen_jia/article/details/106523131

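3. Garbled Chinese file names in downloads / setting the browser language to Chinese / changing the encoding

If a downloaded file with a Chinese name ends up with a garbled file name, configure the corresponding browser setting on the options object: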

options.add_argument('lang=zh_CN.UTF-8')

Reference, a detailed guide to configuring Chrome with Selenium and Python: https://blog.csdn.net/zwq912318834/article/details/78933910

4. Running the browser without showing a UI

Chrome can be driven without opening its UI window (headless mode). Usage:

from selenium import webdriver

option = webdriver.ChromeOptions()
option.add_argument('--headless')
driver = webdriver.Chrome(options=option)

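5. Logging in directly with cookies

First obtain the site's cookies manually, then serialize them and store them locally.
The Chrome extension EditThisCookie (http://www.editthiscookie.com/) helps here.
It has an export feature: after logging in, click Export to get a string in list format, which with minor edits can be used as a Python list of cookies to import.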

# Import the cookies (the browser must already be on a page of the cookies' domain)
for item in cookies:
    driver.add_cookie(item)

Reference: https://www.jianshu.com/p/773c58406bdb
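
The "minor edits" to the exported list can also be done in code. The following is a minimal sketch, not the method from the referenced post. It assumes the export is a JSON array whose entries use keys such as expirationDate and sameSite; the helper name load_exported_cookies and the sameSite mapping are illustrative assumptions.

```python
import json

# Keys copied through unchanged; any other exported key is dropped.
SELENIUM_COOKIE_KEYS = {"name", "value", "domain", "path", "secure", "httpOnly"}

# Assumed mapping from the export's sameSite labels to the values WebDriver accepts.
SAME_SITE_MAP = {"strict": "Strict", "lax": "Lax", "no_restriction": "None"}


def load_exported_cookies(exported_json):
    """Convert an exported JSON cookie list into dicts suitable for add_cookie()."""
    cookies = []
    for raw in json.loads(exported_json):
        cookie = {k: v for k, v in raw.items() if k in SELENIUM_COOKIE_KEYS}
        # WebDriver expects an integer 'expiry' (seconds since the epoch).
        if "expirationDate" in raw:
            cookie["expiry"] = int(raw["expirationDate"])
        # Keep sameSite only when it maps to a value WebDriver accepts.
        same_site = SAME_SITE_MAP.get(str(raw.get("sameSite", "")).lower())
        if same_site:
            cookie["sameSite"] = same_site
        cookies.append(cookie)
    return cookies
```

The returned list can then be passed to the driver.add_cookie loop above, after driver.get() has loaded a page on the same domain as the cookies.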

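6. Downloading files to a specified folder with Selenium

Crawlers sometimes need to download files. When a download is triggered by a click in Chrome, the file is saved to the default folder, usually This PC > Downloads. To save to a specific folder instead, set a default download path when starting the driver: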

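from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
out_path = r'D:\Projects\Spiders'  # the folder you want downloads saved to
prefs = {'profile.default_content_settings.popups': 0, 'download.default_directory': out_path}
options.add_experimental_option('prefs', prefs)
# Selenium 4 style: the chromedriver path goes through Service, the options through options=
service = Service(r'D:\Repo 3\chromedriver.exe')
browser = webdriver.Chrome(service=service, options=options)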
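
The prefs dictionary can also be built by a small helper that makes sure the target folder exists first. This is only a sketch: the helper name build_download_prefs is hypothetical, and the extra download.prompt_for_download entry is an addition not present in the code above, included on the assumption that the save dialog should be suppressed.

```python
import os


def build_download_prefs(out_path):
    """Build a 'prefs' dict for ChromeOptions.add_experimental_option()."""
    # Use an absolute path and create the folder if it is missing.
    out_path = os.path.abspath(out_path)
    os.makedirs(out_path, exist_ok=True)
    return {
        "profile.default_content_settings.popups": 0,
        "download.default_directory": out_path,
        # Assumption: do not ask where to save each file.
        "download.prompt_for_download": False,
    }
```

The result is used the same way as before: options.add_experimental_option('prefs', build_download_prefs(out_path)).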

7. Checking whether a file has finished downloading

See: https://stackoverflow.com/questions/34338897/python-selenium-find-out-when-a-download-has-completed
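
One possible approach is to poll the download folder until no partial files remain. The following is a rough sketch, not necessarily the method from the linked answers. It assumes Chrome, which gives in-progress downloads a .crdownload suffix; the function name wait_for_downloads and its parameters are hypothetical.

```python
import os
import time


def wait_for_downloads(directory, existing=(), timeout=60.0, poll_interval=0.5):
    """Return the sorted names of new files once no partial download remains.

    `existing` lists file names that were in the folder before the download
    started, so they are not mistaken for the new file. Raises TimeoutError
    if downloads are still pending (or nothing has appeared) after `timeout` seconds.
    """
    known = set(existing)
    deadline = time.monotonic() + timeout
    while True:
        new_names = [n for n in os.listdir(directory) if n not in known]
        # Assumption: Chrome marks unfinished downloads with '.crdownload'.
        partial = [n for n in new_names if n.endswith(".crdownload")]
        if new_names and not partial:
            return sorted(new_names)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"downloads still pending in {directory!r}: {partial}")
        time.sleep(poll_interval)
```

A typical call records os.listdir(out_path) before clicking the download link, then passes that list as existing. Other browsers use different temporary suffixes, so the check would need adjusting for them.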