Python_Language

Python TutorialGetting Started with PythonPython Basic SyntaxPython DatatypesPython IndentationPython Collection TypesPython Basic Input and OutputPython Built in Modules and FunctionsPython FunctionsChemPy - python packageCreating Python packagesFunctional Programming in PythonIncompatibilities moving from Python 2 to Python 3IoT Programming with Python and Raspberry PIKivy - Cross-platform Python Framework for NUI DevelopmentMutable vs Immutable (and Hashable) in PythonPyInstaller - Distributing Python CodePython *args and **kwargsPython 2to3 toolPython Abstract Base Classes (abc)Python Abstract syntax treePython Alternatives to switch statement from other languagesPython and ExcelPython Anti-PatternsPython ArcPyPython ArraysPython Asyncio ModulePython Attribute AccessPython AudioPython Binary DataPython Bitwise OperatorsPython Boolean OperatorsPython Checking Path Existence and PermissionsPython ClassesPython CLI subcommands with precise help outputPython Code blocks, execution frames, and namespacesPython Collections modulePython Comments and DocumentationPython Common PitfallsPython Commonwealth ExceptionsPython ComparisonsPython Complex mathPython concurrencyPython ConditionalsPython configparserPython Context Managers (with Statement)Python Copying dataPython CountingPython ctypesPython Data SerializationPython Data TypesPython Database AccessPython Date and TimePython Date FormattingPython DebuggingPython DecoratorsPython Defining functions with list argumentsPython DeploymentPython Deque ModulePython DescriptorPython Design PatternsPython DictionaryPython Difference between Module and PackagePython DistributionPython DjangoPython Dynamic code execution with `exec` and `eval`Python EnumPython ExceptionsPython ExponentiationPython Files & Folders I/OPython FilterPython FlaskPython Functools ModulePython Garbage CollectionPython GeneratorsPython getting start with GZipPython graph-toolPython groupby()Python hashlibPython HeapqPython Hidden FeaturesPython HTML ParsingPython HTTP ServerPython IdiomsPython ijsonPython Immutable datatypes(int, float, str, tuple and frozensets)Python Importing modulesPython Indexing and SlicingPython Input, Subset and Output External Data Files using PandasPython Introduction to RabbitMQ using AMQPStorm



Python HTML Parsing

From WikiOD

Using CSS selectors in BeautifulSoup[edit | edit source]

BeautifulSoup has a limited support for CSS selectors, but covers most commonly used ones. Use select() method to find multiple elements and select_one() to find a single element.

Basic example:

from bs4 import BeautifulSoup

data = """
<ul>
    <li class="item">item1</li>
    <li class="item">item2</li>
    <li class="item">item3</li>
</ul>
"""

soup = BeautifulSoup(data, "html.parser")

for item in soup.select("li.item"):
    print(item.get_text())

Prints:

item1
item2
item3

Locate a text after an element in BeautifulSoup[edit | edit source]

Imagine you have the following HTML:

<div>
    <label>Name:</label>
    John Smith
</div>

And you need to locate the text "John Smith" after the label element.

In this case, you can locate the label element by text and then use .next_sibling property:

from bs4 import BeautifulSoup

data = """
<div>
    <label>Name:</label>
    John Smith
</div>
"""

soup = BeautifulSoup(data, "html.parser")

label = soup.find("label", text="Name:")
print(label.next_sibling.strip())

Prints John Smith.

PyQuery[edit | edit source]

pyquery is a jquery-like library for python. It has very well support for css selectors.

from pyquery import PyQuery

html = """
<h1>Sales</h1>
<table id="table">
<tr>
    <td>Lorem</td>
    <td>46</td>
</tr>
<tr>
    <td>Ipsum</td>
    <td>12</td>
</tr>
<tr>
    <td>Dolor</td>
    <td>27</td>
</tr>
<tr>
    <td>Sit</td>
    <td>90</td>
</tr>
</table>
"""

doc = PyQuery(html)

title = doc('h1').text()

print title

table_data = []

rows = doc('#table > tr')
for row in rows:
    name = PyQuery(row).find('td').eq(0).text()
    value = PyQuery(row).find('td').eq(1).text()

    print "%s\t  %s" % (name, value)

Credit:Stack_Overflow_Documentation