Aug 28, 2015

Python-基础-数据结构小结

只是一篇笔记, 梳理了下

====================

序列

string

基本数据结构, 不解释

可以看下我之前的笔记

list

基本数据结构, 不解释

可以看下我之前的笔记

tuple

基本数据结构, 不解释

可以看下我之前的笔记

namedtuple

在collections中, 从名字可以看出是命名的tuple

collections.namedtuple(typename, field_names[, verbose=False][, rename=False])

好处, 文档中提到

Named tuple instances do not have per-instance dictionaries, so they are lightweight and require no more memory than regular tuples.

和一般class+自定义__slots__的功能类似, 不会给每个实例定义__dict__, 可以节省内存

所以, 优点

1. 可读性更好, 可以当做轻量的类来使用(only attributes)
2. 节省内存

文档的例子

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'], verbose=True)
class Point(tuple):
    'Point(x, y)'

    __slots__ = ()

    _fields = ('x', 'y')

    def __new__(_cls, x, y):
        'Create new instance of Point(x, y)'
        return _tuple.__new__(_cls, (x, y))

    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new Point object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != 2:
            raise TypeError('Expected 2 arguments, got %d' % len(result))
        return result

    def __repr__(self):
        'Return a nicely formatted representation string'
        return 'Point(x=%r, y=%r)' % self

    def _asdict(self):
        'Return a new OrderedDict which maps field names to their values'
        return OrderedDict(zip(self._fields, self))

    def _replace(_self, **kwds):
        'Return a new Point object replacing specified fields with new values'
        result = _self._make(map(kwds.pop, ('x', 'y'), _self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % kwds.keys())
        return result

    def __getnewargs__(self):
        'Return self as a plain tuple.  Used by copy and pickle.'
        return tuple(self)

    __dict__ = _property(_asdict)

    def __getstate__(self):
        'Exclude the OrderedDict from pickling'
        pass

    x = _property(_itemgetter(0), doc='Alias for field number 0')

    y = _property(_itemgetter(1), doc='Alias for field number 1')


>>> p = Point(x=11, y=22)
>>> p.x, p.y
(11, 22)
>>> p[0], p[1]
(11, 22)
>>> p
Point(x=11, y=22)

array

python 2 library: array

数组, 和列表的区别是, 一个数组只能存储一种类型的数据(即数组中所有元素类型一致), 类型是有限的集合

相对的, 优点是: 节省内存

class array.array(typecode[, initializer])

# 其中, typecode
Type code   C Type                            Python Type Minimum size in bytes
'c'         char                              character   1
'b'         signed char                       int         1
'B'         unsigned char                     int         1
'u'         Py_UNICODE          Unicode character         2(see note)
'h'         signed short                      int         2
'H'         unsigned short                    int         2
'i'         signed int                        int         2
'I'         unsigned int                      long        2
'l'         signed long                       int         4
'L'         unsigned long                     long        4
'f'         float                             float       4
'd'         double                            float       8

使用实例

>>> import array
>>> l = array.array('i', [1,2,3,4,5])
>>> l
array('i', [1, 2, 3, 4, 5])
>>> len(l)
5
>>> l[0]
1
>>> l.index(3)
2
>>> l.insert(0, 0)
>>> l
array('i', [0, 1, 2, 3, 4, 5])

linked list

似乎要在Python中用这个的场景非常之少…..

也似乎有两种选择

自己写一个
用其他数据结构替代

具体可以看看这个 Python Linked List

set

base set

基本数据结构, 不解释

可以看下我之前的笔记

Python-基础-集合小结

frozenset

标准库带, 简而言之: frozenset是set的不可变版本, 类似tuple和list的关系

文档

frozenset可以作为字典键

>>> s = set([1,2,2,3])
>>> s.add(4)
>>> s.add(4)
>>> s
set([1, 2, 3, 4])
>>> hash(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
>>>
>>> s2 = frozenset([1, 2, 2, 3])
>>> s2
frozenset([1, 2, 3])
>>> s2.add(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
>>>
>>> hash(s2)
-7699079583225461316

dict

base dict

基本数据结构, 不解释

可以看下我之前的笔记

ordered dict

dict的子类, 会记住放入字典键值对的顺序, 文档

>>> from collections import OrderedDict
>>> d = OrderedDict()
>>>
>>> d[3] = 'c'
>>> d[1] = 'b'
>>> d[2] = 'a'
>>> d
OrderedDict([(3, 'c'), (1, 'b'), (2, 'a')])
>>> d.items()
[(3, 'c'), (1, 'b'), (2, 'a')]
>>> d.items()
[(3, 'c'), (1, 'b'), (2, 'a')]
>>>
>>>
>>> d2 = dict()
>>> d2[3] = 'c'
>>> d2[1] = 'b'
>>> d2[2] = 'a'
>>> d2
{1: 'b', 2: 'a', 3: 'c'}
>>> d2.items()
[(1, 'b'), (2, 'a'), (3, 'c')]

default dict

defaultdict, 同样是dict的子类, 会自动设置value的默认值, 文档

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d
defaultdict(<type 'list'>, {})
>>> d['a'].append(1)
>>> d
defaultdict(<type 'list'>, {'a': [1]})
>>> d['a']
[1]
>>> d['a'].append(2)
>>> d['a']
[1, 2]
>>> d['notexists']
[]

others - MultiDict

一键多值的dict

bottle里面的MultiDict

werkzeug里面的版本

>>> d = MultiDict([('a', 'b'), ('a', 'c')])
>>> d
MultiDict([('a', 'b'), ('a', 'c')])
>>> d['a']
'b'
>>> d.getlist('a')
['b', 'c']
>>> 'a' in d
True

others - CaseInsensitiveDict

key大小写不明感的dict

CaseInsensitiveDict

cid = CaseInsensitiveDict()
cid['Accept'] = 'application/json'
cid['aCCEPT'] == 'application/json'  # True

others - CallbackDict

更新时会调用回调函数

CallbackDict

stack

Python标准库没有stack实现, 如果要处理, 可以自己写一个, 或者使用现有数据结构替代

use list as stack

class Stack:
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return len(self.items) == 0

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()

    def peek(self):
        return self.items[len(self.items) - 1]

    def size(self):
        return len(self.items)

queue

base queue

use list as queue

class Queue:
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return len(self.items) == 0

    def enqueue(self, item):
        self.items.insert(0, item)

    def dequeue(self):
        return self.items.pop()

    def size(self):
        return len(self.items)

其他, python标准库中的Queue模块, 文档

包含

Queue    FIFO
LifoQueue  LIFO
PriorityQueue  带优先级的

多用于多线程资源共享中(一般情况下很少用), 因为是线程安全的

deque

双端队列, 线程安全, 且左右两端出入队复杂度O(1), 文档

>>> from collections import deque
>>> d = deque([1, 2, 3])
>>> d
deque([1, 2, 3])
>>> d.append(4)
>>> d.appendleft(0)
>>> d
deque([0, 1, 2, 3, 4])
>>>
>>> d.pop()
4
>>> d
deque([0, 1, 2, 3])
>>> d.popleft()
0
>>> d
deque([1, 2, 3])

堆

最小堆实现

文档

>>> import heapq
>>>
>>> h = []
>>> heapq.heappush(h, (5, 'e'))
>>> heapq.heappush(h, (1, 'a'))
>>> heapq.heappush(h, (2, 'b'))
>>> heapq.heappush(h, (4, 'd'))
>>> heapq.heappush(h, (3, 'c'))
>>>
>>> h
[(1, 'a'), (3, 'c'), (2, 'b'), (5, 'e'), (4, 'd')]
>>> heapq.heappop(h)
(1, 'a')
>>> h
[(2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
>>> heapq.heappop(h)
(2, 'b')
>>> h
[(3, 'c'), (5, 'e'), (4, 'd')]

树

标准库没有tree的实现

可以看看这本书的讲解 Introductory Programming in Python Advanced Data Structures: Trees

base tree

自己写一个>_<

import collections

def Tree():
    return collections.defaultdict(Tree)

binary tree

二叉树, 关注下这个包 bintree

包括二叉树/红黑树/AVL树

图

这个暂时没有好的推荐, 一般处理成二维数组, 或者使用类机制实现节点/边

其他

计数counter

Counter文档

dict子类, 会记录某个key出现的次数, 在做计数/统计的时候非常有用

>>> from collections import Counter
>>> c = Counter('abracadabra')
>>> c
Counter({'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1})
>>> c.most_common(3)
[('a', 5), ('r', 2), ('b', 2)]

bisect

bisect, 维持一个有序列表, 可以用于快速检索

>>> import bisect
>>> bisect.insort_left(l, 1)
>>> bisect.insort_left(l, 5)
>>> bisect.insort_left(l, 2)
>>> bisect.insort_left(l, 7)
>>> l
[1, 2, 5, 7]

>>> bisect.bisect_left(l, 2)  # 返回位置或插入后的位置
1
>>> bisect.bisect_left(l, 20)
4

struct

处理和存储二进制数据的时候用到, 文档

>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)