pyppeteer简明教程
Puppeteer它是一个Node库,提供了一个高级的API来控制DevTools协议上的无头版Chrome,可以自动化控制浏览器运行。pyppeteer是python版的实现。
centos7安装pyppeteer
yum -y install chromium
python3 -m pip install pyppeteer
测试
import asyncio
from pyppeteer import launch
async def main():
browser = await launch()
page = await browser.newPage()
await page.goto('https://sxy91.com')
await page.screenshot({'path': 'sxy.png'})
await asyncio.sleep(30)
title = await page.title()
print(title)
asyncio.get_event_loop().run_until_complete(main())
会话关闭的解决办法
20秒不操作后会话关闭,会出现错误:Session closed. Most likely the page has been closed
pyppeteer的开发团队似乎比较忙,还没修复。可参考pyppeteer#159,修改源码pyppeteer/connection.py
,替换第44行源码。
self._ws = websockets.client.connect(
- self._url, max_size=None, loop=self._loop)
+ self._url, max_size=None, loop=self._loop,ping_interval=None, ping_timeout=None)
使用pyppeteer自动发送微博的例子
import asyncio
from pyppeteer import launch
home_url = 'https://weibo.com/'
ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
CHROME_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
img = '/Users/songyangcong/Pictures/6781573623694_.pic.jpg'
async def sendimg(path,headless=False,devtools=False):
browser = await launch(executablePath=path,headless=headless,devtools=devtools,userDataDir='./tmp/userdata',args=['--no-sandbox','--disable-infobars'])
pages = await browser.pages()
if len(pages) < 1:
page = await self.browser.newPage()
else:
page = pages[0]
await page.setViewport({'width':1366,'height':768})
await page.setUserAgent(ua)
await page.goto(home_url)
await asyncio.sleep(15) # 等待15秒,先手动登录一下
await page.type('textarea',"测试微博自动发送内容")
fileInput = await page.J('input[type=file]')
await fileInput.uploadFile(img) # pyppeteer upload image 发布图片
await asyncio.sleep(2)
await page.click('a[node-type="submit"]') # 点击发布
#await page.click('a[node-type="submit"]')
await asyncio.sleep(100)
def test():
asyncio.get_event_loop().run_until_complete(sendimg(CHROME_PATH))
if __name__ == '__main__':
test()
常用参数说明
- executablePath:运行Chromium或Chrome可执行文件的路径,而不是默认捆绑的Chromium
- headless:是否使用无头模式(无界面)运行
- devtools:是否打开开发者调试工具,打开后忽略headless参数自动改成False
- userDataDir:自动保存浏览器数据到磁盘,登录一次后可免登录(记录cookies)
参考