May 27, 2020

Summary of common Python commands for fetching web pages

The snippets below collect the common ways to fetch web pages with urllib in Python 3.

Simple crawling of web pages:

import urllib.request
url="http://google.cn/"
response=urllib.request.urlopen(url)  # Returns a file-like response object
page=response.read()  # Body of the page as bytes

Save a URL directly to a local file:

import urllib.request
url="http://google.cn/"
# urlretrieve() downloads the URL straight into the named file.
# "google.html" is only an illustrative file name.
urllib.request.urlretrieve(url,"google.html")

POST request:

import urllib.parse
import urllib.request

url="http://liuxin-blog.appspot.com/messageboard/add"

values={"content":"Test of a web request issued from the command line"}
# In Python 3 the POST body must be bytes, so encode the urlencoded string
data=urllib.parse.urlencode(values).encode("utf-8")

# Create the request object
req=urllib.request.Request(url,data)
# Get the data returned by the server
response=urllib.request.urlopen(req)
# Process the data
page=response.read()
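
A POST request can be checked before it is sent, because nothing touches the network until urlopen() is called. The following is a minimal sketch, assuming Python 3; it reuses the URL and the content field from the snippet above, while the message text is only a placeholder.

```python
import urllib.parse
import urllib.request

url = "http://liuxin-blog.appspot.com/messageboard/add"
values = {"content": "test message"}  # placeholder text for illustration

# urlencode() returns a str; the request body has to be bytes in Python 3
data = urllib.parse.urlencode(values).encode("utf-8")
req = urllib.request.Request(url, data)

# Building the Request sends nothing; supplying data makes the method POST
print(req.get_method())  # POST
print(req.data)          # b'content=test+message'
```

Passing a plain str as data fails with a TypeError once the request is actually opened, which is why the encode() step matters.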

GET request:

import urllib.parse
import urllib.request

url="http://www.google.cn/webhp"

values={"rls":"ig"}
data=urllib.parse.urlencode(values)

theurl=url+"?"+data
# Create the request object
req=urllib.request.Request(theurl)
# Get the data returned by the server
response=urllib.request.urlopen(req)
# Process the data
page=response.read()
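
urlencode() also takes care of escaping, which matters as soon as a value contains spaces or other special characters. In the sketch below only rls=ig comes from the snippet above; the q parameter and its value are made up for illustration.

```python
import urllib.parse
import urllib.request

url = "http://www.google.cn/webhp"
values = {"rls": "ig", "q": "python urllib"}  # "q" is a hypothetical extra parameter

# Spaces become "+", and pairs are joined with "&"
query = urllib.parse.urlencode(values)
theurl = url + "?" + query
req = urllib.request.Request(theurl)  # no data argument, so the method is GET

print(theurl)            # http://www.google.cn/webhp?rls=ig&q=python+urllib
print(req.get_method())  # GET

# parse_qs() reverses the encoding, which is handy for checking a query string
decoded = urllib.parse.parse_qs(query)
print(decoded)           # {'rls': ['ig'], 'q': ['python urllib']}
```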

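Two methods of the response object are commonly used: geturl() and info().

geturl() returns the URL that was actually retrieved, so comparing it with the requested URL tells you whether the server redirected the request. info() returns the metadata of the response, such as the HTTP headers.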
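
To try both methods without depending on an external site, the sketch below starts a small local server. Everything about that server is an assumption made for the demo: the RedirectDemoHandler class, the /old and /new paths, and the opener that bypasses any configured proxy so the request reaches 127.0.0.1 directly. With a real site, urllib.request.urlopen(url) returns the same kind of response object.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical demo handler: /old redirects to /new, /new returns a body."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"hello"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 lets the operating system pick a free port
server = HTTPServer(("127.0.0.1", 0), RedirectDemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

# An opener with an empty ProxyHandler ignores proxy settings for this local test
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

requested = base + "/old"
with opener.open(requested) as response:
    final_url = response.geturl()  # URL actually retrieved, after redirects
    headers = response.info()      # response headers (a message object)
    body = response.read()

redirected = final_url != requested
content_type = headers.get("Content-Type")
print(redirected, final_url, content_type)

server.shutdown()
server.server_close()
```

The redirect is followed automatically, so the only visible trace of it is that geturl() no longer matches the URL that was requested.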

When a page contains Chinese text, use encode() to convert a str to bytes and decode() to convert the bytes returned by read() back to a str.
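
A small sketch of that round trip, assuming Python 3. The sample string stands in for page content, and GBK is used as the second codec only because it is common on older Chinese sites; the right codec for a real page has to be taken from its headers or markup.

```python
text = "中文网页"  # sample string standing in for page content

# encode(): str -> bytes
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")
print(len(utf8_bytes), len(gbk_bytes))  # 12 8

# decode(): bytes -> str, using the same codec the bytes were written in
restored = utf8_bytes.decode("utf-8")
print(restored == text)  # True

# Decoding with the wrong codec raises an error instead of returning text
try:
    gbk_bytes.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
print(wrong_codec_failed)  # True

# errors="replace" keeps going and marks undecodable bytes with U+FFFD
lenient = gbk_bytes.decode("utf-8", errors="replace")
```

For a fetched page the same call applies to the result of read(), for example page.decode("utf-8") when the site serves UTF-8.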
