Python-基础-数据结构小结
只是一篇笔记, 梳理了下
====================
序列
string
基本数据结构, 不解释
可以看下我之前的笔记
list
基本数据结构, 不解释
可以看下我之前的笔记
tuple
基本数据结构, 不解释
可以看下我之前的笔记
namedtuple
在collections中, 从名字可以看出是命名的tuple
collections.namedtuple(typename, field_names[, verbose=False][, rename=False])
好处, 文档中提到
Named tuple instances do not have per-instance dictionaries, so they are lightweight and require no more memory than regular tuples.
和一般class+自定义__slots__
的功能类似, 不会给每个实例定义__dict__
, 可以节省内存
所以, 优点
1. 可读性更好, 可以当做轻量的类来使用(only attributes)
2. 节省内存
文档的例子
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'], verbose=True)
class Point(tuple):
'Point(x, y)'
__slots__ = ()
_fields = ('x', 'y')
def __new__(_cls, x, y):
'Create new instance of Point(x, y)'
return _tuple.__new__(_cls, (x, y))
@classmethod
def _make(cls, iterable, new=tuple.__new__, len=len):
'Make a new Point object from a sequence or iterable'
result = new(cls, iterable)
if len(result) != 2:
raise TypeError('Expected 2 arguments, got %d' % len(result))
return result
def __repr__(self):
'Return a nicely formatted representation string'
return 'Point(x=%r, y=%r)' % self
def _asdict(self):
'Return a new OrderedDict which maps field names to their values'
return OrderedDict(zip(self._fields, self))
def _replace(_self, **kwds):
'Return a new Point object replacing specified fields with new values'
result = _self._make(map(kwds.pop, ('x', 'y'), _self))
if kwds:
raise ValueError('Got unexpected field names: %r' % kwds.keys())
return result
def __getnewargs__(self):
'Return self as a plain tuple. Used by copy and pickle.'
return tuple(self)
__dict__ = _property(_asdict)
def __getstate__(self):
'Exclude the OrderedDict from pickling'
pass
x = _property(_itemgetter(0), doc='Alias for field number 0')
y = _property(_itemgetter(1), doc='Alias for field number 1')
>>> p = Point(x=11, y=22)
>>> p.x, p.y
(11, 22)
>>> p[0], p[1]
(11, 22)
>>> p
Point(x=11, y=22)
array
数组, 和列表的区别是, 一个数组只能存储一种类型的数据(即数组中所有元素类型一致), 类型是有限的集合
相对的, 优点是: 节省内存
class array.array(typecode[, initializer])
# 其中, typecode
Type code C Type Python Type Minimum size in bytes
'c' char character 1
'b' signed char int 1
'B' unsigned char int 1
'u' Py_UNICODE Unicode character 2(see note)
'h' signed short int 2
'H' unsigned short int 2
'i' signed int int 2
'I' unsigned int long 2
'l' signed long int 4
'L' unsigned long long 4
'f' float float 4
'd' double float 8
使用实例
>>> import array
>>> l = array.array('i', [1,2,3,4,5])
>>> l
array('i', [1, 2, 3, 4, 5])
>>> len(l)
5
>>> l[0]
1
>>> l.index(3)
2
>>> l.insert(0, 0)
>>> l
array('i', [0, 1, 2, 3, 4, 5])
linked list
似乎要在Python中用这个的场景非常之少…..
也似乎有两种选择
- 自己写一个
- 用其他数据结构替代
具体可以看看这个 Python Linked List
set
base set
基本数据结构, 不解释
可以看下我之前的笔记
frozenset
标准库带, 简而言之: frozenset是set的不可变版本, 类似tuple和list的关系
frozenset可以作为字典键
>>> s = set([1,2,2,3])
>>> s.add(4)
>>> s.add(4)
>>> s
set([1, 2, 3, 4])
>>> hash(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
>>>
>>> s2 = frozenset([1, 2, 2, 3])
>>> s2
frozenset([1, 2, 3])
>>> s2.add(4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
>>>
>>> hash(s2)
-7699079583225461316
dict
base dict
基本数据结构, 不解释
可以看下我之前的笔记
ordered dict
dict的子类, 会记住放入字典键值对的顺序, 文档
>>> from collections import OrderedDict
>>> d = OrderedDict()
>>>
>>> d[3] = 'c'
>>> d[1] = 'b'
>>> d[2] = 'a'
>>> d
OrderedDict([(3, 'c'), (1, 'b'), (2, 'a')])
>>> d.items()
[(3, 'c'), (1, 'b'), (2, 'a')]
>>> d.items()
[(3, 'c'), (1, 'b'), (2, 'a')]
>>>
>>>
>>> d2 = dict()
>>> d2[3] = 'c'
>>> d2[1] = 'b'
>>> d2[2] = 'a'
>>> d2
{1: 'b', 2: 'a', 3: 'c'}
>>> d2.items()
[(1, 'b'), (2, 'a'), (3, 'c')]
default dict
defaultdict, 同样是dict的子类, 会自动设置value的默认值, 文档
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d
defaultdict(<type 'list'>, {})
>>> d['a'].append(1)
>>> d
defaultdict(<type 'list'>, {'a': [1]})
>>> d['a']
[1]
>>> d['a'].append(2)
>>> d['a']
[1, 2]
>>> d['notexists']
[]
others - MultiDict
一键多值的dict
>>> d = MultiDict([('a', 'b'), ('a', 'c')])
>>> d
MultiDict([('a', 'b'), ('a', 'c')])
>>> d['a']
'b'
>>> d.getlist('a')
['b', 'c']
>>> 'a' in d
True
others - CaseInsensitiveDict
key大小写不明感的dict
cid = CaseInsensitiveDict()
cid['Accept'] = 'application/json'
cid['aCCEPT'] == 'application/json' # True
others - CallbackDict
更新时会调用回调函数
stack
Python标准库没有stack实现, 如果要处理, 可以自己写一个, 或者使用现有数据结构替代
use list as stack
class Stack:
def __init__(self):
self.items = []
def isEmpty(self):
return len(self.items) == 0
def push(self, item):
self.items.append(item)
def pop(self):
return self.items.pop()
def peek(self):
return self.items[len(self.items) - 1]
def size(self):
return len(self.items)
queue
base queue
use list as queue
class Queue:
def __init__(self):
self.items = []
def isEmpty(self):
return len(self.items) == 0
def enqueue(self, item):
self.items.insert(0, item)
def dequeue(self):
return self.items.pop()
def size(self):
return len(self.items)
其他, python标准库中的Queue
模块, 文档
包含
Queue FIFO
LifoQueue LIFO
PriorityQueue 带优先级的
多用于多线程资源共享中(一般情况下很少用), 因为是线程安全的
deque
双端队列, 线程安全, 且左右两端出入队复杂度O(1), 文档
>>> from collections import deque
>>> d = deque([1, 2, 3])
>>> d
deque([1, 2, 3])
>>> d.append(4)
>>> d.appendleft(0)
>>> d
deque([0, 1, 2, 3, 4])
>>>
>>> d.pop()
4
>>> d
deque([0, 1, 2, 3])
>>> d.popleft()
0
>>> d
deque([1, 2, 3])
堆
最小堆实现
>>> import heapq
>>>
>>> h = []
>>> heapq.heappush(h, (5, 'e'))
>>> heapq.heappush(h, (1, 'a'))
>>> heapq.heappush(h, (2, 'b'))
>>> heapq.heappush(h, (4, 'd'))
>>> heapq.heappush(h, (3, 'c'))
>>>
>>> h
[(1, 'a'), (3, 'c'), (2, 'b'), (5, 'e'), (4, 'd')]
>>> heapq.heappop(h)
(1, 'a')
>>> h
[(2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
>>> heapq.heappop(h)
(2, 'b')
>>> h
[(3, 'c'), (5, 'e'), (4, 'd')]
树
标准库没有tree的实现
可以看看这本书的讲解 Introductory Programming in Python Advanced Data Structures: Trees
base tree
自己写一个>_<
import collections
def Tree():
return collections.defaultdict(Tree)
binary tree
二叉树, 关注下这个包 bintree
包括二叉树/红黑树/AVL树
图
这个暂时没有好的推荐, 一般处理成二维数组, 或者使用类机制实现节点/边
其他
计数counter
dict子类, 会记录某个key出现的次数, 在做计数/统计的时候非常有用
>>> from collections import Counter
>>> c = Counter('abracadabra')
>>> c
Counter({'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1})
>>> c.most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
bisect
bisect, 维持一个有序列表, 可以用于快速检索
>>> import bisect
>>> bisect.insort_left(l, 1)
>>> bisect.insort_left(l, 5)
>>> bisect.insort_left(l, 2)
>>> bisect.insort_left(l, 7)
>>> l
[1, 2, 5, 7]
>>> bisect.bisect_left(l, 2) # 返回位置或插入后的位置
1
>>> bisect.bisect_left(l, 20)
4
struct
处理和存储二进制数据的时候用到, 文档
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)