Yu Guiyang's Blog

BI and Data Analysis



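NumPy Handbook (2): Miscellaneous Common Operations

Posted on 2017-08-06 | Category: Python-Numpy

Python
NumPy knowledge summary


This post records some commonly used NumPy operations, a collection of scattered notes.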

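Operations between arrays and scalars

An operation between an array and a scalar is applied to every element of the array in one go.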

import numpy as np
a = np.arange(15).reshape(3,5)
a
Out[3]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
a+2
Out[4]:
array([[ 2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16]])
a+a
Out[5]:
array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])
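
As a minimal sketch of the same idea (the variable names are illustrative and not from the original post), other arithmetic and comparison operators behave the same way: the scalar is applied to every element and a new array of the same shape comes back.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

# A scalar operand is applied to every element; `a` itself is left unchanged.
shifted = a + 2
scaled = a * 10
squared = a ** 2

# Comparisons are element-wise too and yield a boolean array.
mask = a > 7

# Two arrays of the same shape are combined element by element.
doubled = a + a

print(shifted[0])   # first row of a + 2
print(mask.sum())   # number of elements greater than 7
```

Since none of these operations modify `a` in place, each result can be kept under its own name and compared with the original.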

Read more »

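Python Basics (5): csv

Posted on 2017-08-05 | Category: Python-Basics

Here is a brief introduction to Python's csv module, which is used quite often.
Working with CSV inevitably involves opening files, so let's first look at the open function.
Official documentation: https://docs.python.org/3/library/functions.html#open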

1. open

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
Open file and return a corresponding file object. If the file cannot be opened, an OSError is raised.

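file is the path of the file we want to open;
mode is the mode in which the file is opened; the default is read-only text ('rt').

newline controls how line endings are handled. I found a reference online: https://www.zhihu.com/question/19751023

Line endings look a bit messy; I will look into them further if I run into a problem later.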
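
A small sketch of what newline changes when reading, assuming a temporary file written with Windows-style \r\n line endings (the file name and contents are made up for illustration):

```python
import os
import tempfile

# Write raw bytes so the line endings on disk are exactly \r\n.
path = os.path.join(tempfile.mkdtemp(), 'newline_demo.txt')
with open(path, 'wb') as f:
    f.write(b'lufei,20\r\nnamei,19\r\n')

# newline=None (the default): \r\n is translated to \n while reading.
with open(path, newline=None) as f:
    translated = f.read()

# newline='': line endings are returned untranslated.
with open(path, newline='') as f:
    untranslated = f.read()

print(repr(translated))
print(repr(untranslated))
```

The csv documentation recommends opening files with newline='' so that the csv module can interpret line endings itself, including newlines embedded in quoted fields.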
Next, let's look at csv.

2. csv.reader

csv.reader(csvfile, dialect='excel', **fmtparams)
Return a reader object which will iterate over lines in the given csvfile.

Our CSV file contains two comma-separated employee records, which we read like this:

import csv

# newline='' lets the csv module handle line endings itself
with open(r'D:\document\python_demo\employee_data.csv', newline='') as csvfile:
    emp_reader = csv.reader(csvfile)
    for row in emp_reader:
        print(row)

## Output
runfile('D:/document/python_demo/demo_open.py', wdir='D:/document/python_demo')
['lufei', '20', 'leader', 'onepiece', '100']
['namei', '19', 'teacher', 'onepiece', '999']

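A CSV file has a few defining characteristics, such as the field delimiter and the line terminator; these are specified through the dialect parameter shown above.
Suppose we now change the delimiter in the file to ^.

If we run the script again, the rows are no longer split correctly: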

runfile('D:/document/python_demo/demo_open.py', wdir='D:/document/python_demo')
['lufei^20^leader^onepiece^100']
['namei^19^teacher^onepiece^999']

The fix is to pass delimiter to the reader. For details, see the official documentation: https://docs.python.org/3/library/csv.html#csv-fmt-params

import csv

# newline='' lets the csv module handle line endings itself
with open(r'D:\document\python_demo\employee_data.csv', newline='') as csvfile:
    emp_reader = csv.reader(csvfile, delimiter='^')
    for row in emp_reader:
        print(row)

## Output
runfile('D:/document/python_demo/demo_open.py', wdir='D:/document/python_demo')
['lufei', '20', 'leader', 'onepiece', '100']
['namei', '19', 'teacher', 'onepiece', '999']
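
To try this without a file on disk, here is a minimal sketch that uses io.StringIO as an in-memory stand-in for employee_data.csv (the in-memory data and the variable names are assumptions made for this example):

```python
import csv
import io

# In-memory stand-in for the employee file, using ^ as the delimiter.
data = 'lufei^20^leader^onepiece^100\nnamei^19^teacher^onepiece^999\n'

# The default dialect expects commas, so each line comes back as one field.
default_rows = list(csv.reader(io.StringIO(data)))

# Passing delimiter='^' splits the fields as intended.
caret_rows = list(csv.reader(io.StringIO(data), delimiter='^'))

print(default_rows[0])
print(caret_rows[0])
```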

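If the data itself contains the delimiter, we also need a quote character to enclose such fields. Double quotes are the usual choice and the default; a different character can be set with the quotechar parameter.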

CSV data:
lufei^20^leader^$one^_^piece$^100
namei^19^teacher^onepiece^999

import csv

# newline='' lets the csv module handle line endings itself
with open(r'D:\document\python_demo\employee_data.csv', newline='') as csvfile:
    emp_reader = csv.reader(csvfile, delimiter='^', quotechar='$')
    for row in emp_reader:
        print(row)

Result:
runfile('D:/document/python_demo/demo_open.py', wdir='D:/document/python_demo')
['lufei', '20', 'leader', 'one^_^piece', '100']
['namei', '19', 'teacher', 'onepiece', '999']
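
When the same delimiter and quotechar are needed in several places, they can be bundled under a dialect name with csv.register_dialect. The sketch below assumes a made-up dialect name, 'caret', and in-memory data in place of the real file:

```python
import csv
import io

# 'caret' is a hypothetical dialect name chosen for this example.
csv.register_dialect('caret', delimiter='^', quotechar='$')

data = 'lufei^20^leader^$one^_^piece$^100\n'

# dialect='caret' applies both settings at once.
rows = list(csv.reader(io.StringIO(data), dialect='caret'))
print(rows[0])
```

A registered name can be passed to both csv.reader and csv.writer, so reading and writing stay consistent.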

3. csv.writer

csv.writer(csvfile, dialect='excel', **fmtparams)
Return a writer object responsible for converting the user's data into delimited strings on the given file-like object. csvfile can be any object with a write() method.

The usage is much the same as for the reader. Here is a small example adapted from the official documentation:

import csv

with open('eggs.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely |Spam', 'Wonderful Spam'])

Result (QUOTE_MINIMAL quotes only the fields that contain the delimiter or the quote character):
Spam Spam Spam Spam Spam |Baked Beans|
Spam |Lovely ||Spam| |Wonderful Spam|

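This example uses another parameter, quoting, which works together with quotechar.
In ETL tools quotechar is called the enclosure character. It guards against the delimiter appearing inside field content: the parser needs to tell whether a character is a delimiter or part of a field, and it decides based on quotechar.
quoting controls when the quote character is applied. It has several options: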

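csv.QUOTE_ALL         # quote every field
csv.QUOTE_NONNUMERIC  # quote every non-numeric field
csv.QUOTE_NONE        # never quote fields
csv.QUOTE_MINIMAL     # quote only fields containing special characters such as the delimiter or quotechar; the default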
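
To compare the four modes side by side, here is a minimal sketch that writes the same row to an in-memory buffer under each setting. The helper name render and the sample row are assumptions made for this example:

```python
import csv
import io

def render(row, **fmtparams):
    """Write one row to an in-memory buffer and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf, **fmtparams).writerow(row)
    return buf.getvalue()

# One string, one number, and one string that contains the delimiter.
row = ['lufei', 20, 'one,piece']

quote_all = render(row, quoting=csv.QUOTE_ALL)
quote_nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)
quote_minimal = render(row, quoting=csv.QUOTE_MINIMAL)
# QUOTE_NONE needs an escapechar here because a field contains the delimiter.
quote_none = render(row, quoting=csv.QUOTE_NONE, escapechar='\\')

for text in (quote_all, quote_nonnumeric, quote_minimal, quote_none):
    print(text, end='')
```

Without an escapechar, QUOTE_NONE raises csv.Error for this row, because the writer has no way to protect the comma inside the last field.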

That's all for this brief look at csv.

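Python Basics (4): collections

Posted on 2017-08-05 | Category: Python-Basics

I used the collections module yesterday and found it very handy, so I am recording it here.
Official documentation: https://docs.python.org/3/library/collections.html
Blog: Liao Xuefeng's blog (廖雪峰的博客)
Here are a few fun examples.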

namedtuple

collections.namedtuple(typename, field_names, *, verbose=False, rename=False, module=None)
Returns a new tuple subclass named typename. The new subclass is used to create tuple-like objects that have fields accessible by attribute lookup as well as being indexable and iterable. Instances of the subclass also have a helpful docstring (with typename and field_names) and a helpful repr() method which lists the tuple contents in a name=value format.

namedtuple is a factory function that returns a custom tuple subclass, which makes code more readable.
Normally we use a plain tuple like this:

point_a = 1,3
point_b = 2,6
point_a
Out[37]: (1, 3)
point_b
Out[38]: (2, 6)
point_a[0]
Out[39]: 1
point_a[1]
Out[40]: 3

With namedtuple we can write this instead:

from collections import namedtuple
Point = namedtuple('Point',['x','y'])
point_a = Point(2,2)
point_b = Point(3,3)
point_a
Out[45]: Point(x=2, y=2)
point_b
Out[46]: Point(x=3, y=3)
point_a.x
Out[47]: 2
point_b.y
Out[48]: 3

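Representing a coordinate this way is more readable, and it is convenient to use.
We can look at how this Point class is defined (the _source attribute used here was available up to Python 3.6 and was removed in Python 3.7):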

print(point_a._source)
from builtins import property as _property, tuple as _tuple
from operator import itemgetter as _itemgetter
from collections import OrderedDict

class Point(tuple):
    'Point(x, y)'

    __slots__ = ()

    _fields = ('x', 'y')

    def __new__(_cls, x, y):
        'Create new instance of Point(x, y)'
        return _tuple.__new__(_cls, (x, y))

    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new Point object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != 2:
            raise TypeError('Expected 2 arguments, got %d' % len(result))
        return result

    def _replace(_self, **kwds):
        'Return a new Point object replacing specified fields with new values'
        result = _self._make(map(kwds.pop, ('x', 'y'), _self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % list(kwds))
        return result

    def __repr__(self):
        'Return a nicely formatted representation string'
        return self.__class__.__name__ + '(x=%r, y=%r)' % self

    def _asdict(self):
        'Return a new OrderedDict which maps field names to their values.'
        return OrderedDict(zip(self._fields, self))

    def __getnewargs__(self):
        'Return self as a plain tuple. Used by copy and pickle.'
        return tuple(self)

    x = _property(_itemgetter(0), doc='Alias for field number 0')

    y = _property(_itemgetter(1), doc='Alias for field number 1')
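
The helpers visible in that generated source (_fields, _asdict, _replace) can be called directly. A minimal sketch, reusing the Point definition from above; the other variable names are made up for illustration:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(2, 2)

fields = Point._fields          # field names as a tuple of strings
as_mapping = dict(p._asdict())  # field name -> value
moved = p._replace(x=5)         # returns a new Point; p itself is immutable

# A namedtuple is still a tuple, so unpacking works as usual.
x, y = p
print(fields, as_mapping, moved, x + y)
```

The result of _asdict() is wrapped in dict() here because its concrete type differs between Python versions (an OrderedDict in the source shown above).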

namedtuple is even more useful when reading a CSV file or querying a database, where the result comes back as a set of rows. For example:

import csv
from collections import namedtuple

EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

# Open the file in a with block so that it is closed when the loop finishes
with open(r'D:\document\python_demo\employee_data.csv', newline='') as csvfile:
    for emp in map(EmployeeRecord._make, csv.reader(csvfile)):
        print(emp.name, emp.title)
        print('emp:', emp)

runfile('D:/document/python_demo/demo_hi.py', wdir='D:/document/python_demo')
lufei leader
emp: EmployeeRecord(name='lufei', age='20', title='leader', department='onepiece', paygrade='100')
namei teacher
emp: EmployeeRecord(name='namei', age='19', title='teacher', department='onepiece', paygrade='999')

_make

somenamedtuple._make(iterable)
Class method that makes a new instance from an existing sequence or iterable.
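
A minimal sketch of _make on CSV rows, assuming in-memory data in place of employee_data.csv so that it runs anywhere:

```python
import csv
import io
from collections import namedtuple

EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

# In-memory stand-in for the employee file.
data = 'lufei,20,leader,onepiece,100\nnamei,19,teacher,onepiece,999\n'

# Each row from csv.reader is a list of strings; _make turns it into a record.
employees = [EmployeeRecord._make(row) for row in csv.reader(io.StringIO(data))]

# _make takes one iterable, whereas the constructor takes separate arguments.
same = EmployeeRecord(*employees[0]) == employees[0]
print(employees[0].name, employees[1].paygrade, same)
```

Every field is still a string at this point; csv.reader does not convert '20' or '100' to numbers.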

deque

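With a list, lookup by index is fast, but inserting or deleting elements is slow when the list is large, especially near the front, where every later element has to be shifted. deque is a double-ended queue built for efficient insertion and deletion at both ends.

deque: double-ended queue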

class collections.deque([iterable[, maxlen]])
Returns a new deque object initialized left-to-right (using append()) with data from iterable. If iterable is not specified, the new deque is empty.

from collections import deque
a = deque(list('abcdef'))
a
Out[80]: deque(['a', 'b', 'c', 'd', 'e', 'f'])
a.append('x')
a.append('y')
a
Out[83]: deque(['a', 'b', 'c', 'd', 'e', 'f', 'x', 'y'])
a.appendleft('w')
a
Out[85]: deque(['w', 'a', 'b', 'c', 'd', 'e', 'f', 'x', 'y'])
a.pop()
Out[86]: 'y'
a.popleft()
Out[87]: 'w'

deque adds a number of convenient methods, such as appendleft() and popleft().
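
Beyond those, the maxlen argument from the signature above and a few other methods are worth a quick look. A minimal sketch with made-up variable names:

```python
from collections import deque

# A bounded deque keeps only the most recent maxlen items.
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)  # once full, the oldest item on the left is discarded

# rotate(n) shifts items n steps to the right (a negative n shifts left).
ring = deque('abcdef')
ring.rotate(2)

# extendleft adds items one at a time, so they end up in reverse order.
q = deque([3])
q.extendleft([2, 1])

print(recent, ring, q)
```

A bounded deque is a simple way to keep a sliding window of the latest values, for example the last few lines of a log.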

defaultdict

defaultdict is a dict that can supply default values. With a regular dict, accessing a key that does not exist raises a KeyError.

class collections.defaultdict([default_factory[, …]])
Returns a new dictionary-like object. defaultdict is a subclass of the built-in dict class. It overrides one method and adds one writable instance variable. The remaining functionality is the same as for the dict class and is not documented here.

a = {'name':'lufe','age':20}
a
Out[105]: {'age': 20, 'name': 'lufe'}
a['name']
Out[106]: 'lufe'
a['age']
Out[107]: 20
a['score']
Traceback (most recent call last):
File "<ipython-input-108-99f54e089332>", line 1, in <module>
a['score']
KeyError: 'score'

Using defaultdict avoids this error:

from collections import defaultdict
b = defaultdict(int)
b['name']='lufei'
b
Out[123]: defaultdict(int, {'name': 'lufei'})
b['age']
Out[124]: 0

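Here the default factory is int, so a missing key gets the value int(), which is 0. The argument must be a callable (or None): passing a plain value such as 0 fails, while a lambda can return any default we like.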

x = defaultdict(0)
Traceback (most recent call last):
File "<ipython-input-125-dd2052e23af0>", line 1, in <module>
x = defaultdict(0)
TypeError: first argument must be callable or None
x = defaultdict(lambda : 100)
x
Out[127]: defaultdict(<function __main__.<lambda>>, {})
x['name']
Out[128]: 100
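
A common use of the default factory is grouping values under a key. The sketch below uses list as the factory; the (department, name) pairs are made up for illustration:

```python
from collections import defaultdict

# Hypothetical (department, name) pairs.
pairs = [('onepiece', 'lufei'), ('onepiece', 'namei'), ('naruto', 'kakashi')]

by_department = defaultdict(list)
for department, name in pairs:
    # A missing key is created with list() before append runs.
    by_department[department].append(name)

# Merely reading a missing key also inserts it with the default value.
probe = defaultdict(int)
value = probe['missing']

print(dict(by_department), value, 'missing' in probe)
```

That side effect is worth remembering: to check for a key without creating it, use the in operator or get() rather than indexing.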

Counter

Counter is a simple counter.

class collections.Counter([iterable-or-mapping])
A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages.

from collections import Counter
cnt = Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
cnt
Out[131]: Counter({'blue': 3, 'green': 1, 'red': 2})
cnt.most_common(1)
Out[132]: [('blue', 3)]
cnt.most_common(-1)
Out[133]: []
cnt.elements
Out[134]: <bound method Counter.elements of Counter({'blue': 3, 'red': 2, 'green': 1})>
cnt.most_common(3)[:-2:-1]
Out[137]: [('green', 1)]

most_common feels like the most useful method: it ranks the elements by how often they occur.
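
A few more Counter operations, as a minimal sketch on the same colour list (the variable names are illustrative):

```python
from collections import Counter

cnt = Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])

# The two highest counts, largest first.
top_two = cnt.most_common(2)

# elements is a method and has to be called; it yields each key `count` times.
expanded = sorted(cnt.elements())

# update adds more observations to the existing counts.
cnt.update(['green', 'green'])

# Keys that were never seen count as 0 instead of raising KeyError.
missing = cnt['purple']

print(top_two, expanded, cnt['green'], missing)
```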

Of course, collections contains many other useful classes; see the official documentation.

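NumPy Handbook (1): ndarray

Posted on 2017-08-02 | Category: Python-Numpy

We have already had a basic introduction to Pandas. NumPy is also widely used in data analysis, so let's take a brief look at it too.

1. Basic introduction to NumPy


NumPy is an open-source numerical computing extension for Python that can be used to store and process large matrices.
A scientific computing package implemented in Python.
from Baidu Baike

NumPy has two basic objects: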

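ndarray (N-dimensional array object) and ufunc (universal function object)

An ndarray is a multidimensional array that stores elements of a single data type; a ufunc is a function that operates on arrays element by element.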
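
A minimal sketch of the two objects, assuming NumPy is installed; the array values are made up for illustration:

```python
import numpy as np

# An ndarray holds elements of one dtype and knows its own shape.
arr = np.array([1, 4, 9, 16], dtype=np.float64)
print(arr.dtype, arr.ndim, arr.shape)

# np.sqrt and np.add are ufunc objects.
print(isinstance(np.sqrt, np.ufunc), isinstance(np.add, np.ufunc))

# A ufunc is applied element by element and returns a new array.
roots = np.sqrt(arr)
sums = np.add(arr, 1)
print(roots, sums)
```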

Read more »

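Cognos Resource Roundup

Posted on 2017-08-01 | Category: Data Visualization-Cognos

I used to work with Cognos and wrote quite a few introductory tutorials, around 2014, all of them on CSDN. Here is a roundup for anyone who wants to take a look; I hope it helps.

ReportStudio beginner tutorial: http://blog.csdn.net/column/details/ygy-reportstudio.html

Framework Manager beginner tutorial: http://blog.csdn.net/column/details/ygy-frameworkmanager.html

Cognos function reference: http://blog.csdn.net/column/details/ygy-cognos-function.html

Other Cognos material (browse the categories on the home page): http://blog.csdn.net/yuguiyang1990

That's it. If you are interested, feel free to take a look yourself. I have not worked with Cognos for a long time, so I probably cannot help with questions anymore…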

© 2017 于贵洋