
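Plotting with Python: Methods for Reading Text Data

Plotting with Python: Reading Text Files

Before anything else: pandas

There are many ways to load data for plotting in Python, but the recommended approach is to use pandas to read Excel or CSV data, process it, and plot it.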

Getting started tutorials — pandas 1.2.4 documentation (pydata.org)

read_* / to_*

  • csv
  • excel
  • sql
  • json
  • parquet
  • ...

df.dtypes

df.info()

df.head()

df.tail()
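
As a minimal sketch of the read_* / to_* pair and the first-check helpers listed above (these are called on a DataFrame object, named df here, not on the pd module itself), the example below round-trips a small table through CSV text held in memory. The column names and values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample table; names and values are illustrative only.
df = pd.DataFrame({"wavelength": [400, 500, 600], "intensity": [0.12, 0.85, 0.33]})

# to_* exports: with no path given, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False)

# read_* imports: read_csv accepts a path or any file-like object.
df2 = pd.read_csv(io.StringIO(csv_text))

# First checks on the loaded data.
print(df2.dtypes)    # data type of each column
df2.info()           # index range, non-null counts, memory usage
print(df2.head(2))   # first two rows
print(df2.tail(1))   # last row
```

Reading from io.StringIO keeps the sketch independent of any file on disk; with a real file, the path would simply be passed to read_csv instead.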

  • Import the package, aka import pandas as pd
  • A table of data is stored as a pandas DataFrame
  • Each column in a DataFrame is a Series
  • You can do things by applying a method to a DataFrame or Series
  • Getting data into pandas from many different file formats or data sources is supported by read_* functions.
  • Exporting data out of pandas is provided by different to_* methods.
  • The head/tail/info methods and the dtypes attribute are convenient for a first check.
  • When selecting subsets of data, square brackets [] are used.
  • Inside these brackets, you can use a single column/row label, a list of column/row labels, a slice of labels, a conditional expression or a colon.
  • Select specific rows and/or columns using loc when using the row and column names
  • Select specific rows and/or columns using iloc when using the positions in the table
  • You can assign new values to a selection based on loc/iloc.
  • The .plot.* methods are applicable on both Series and DataFrames
  • By default, each of the columns is plotted as a different element (line, boxplot,…)
  • Any plot created by pandas is a Matplotlib object.
  • Create a new column by assigning the output to the DataFrame with a new column name in between the [].
  • Operations are element-wise, no need to loop over rows.
  • Use rename with a dictionary or function to rename row labels or column names.
  • Aggregation statistics can be calculated on entire columns or rows
  • groupby provides the power of the split-apply-combine pattern
  • value_counts is a convenient shortcut to count the number of entries in each category of a variable
  • Sorting by one or more columns is supported by sort_values
  • The pivot function is purely restructuring of the data, pivot_table supports aggregations
  • The reverse of pivot (long to wide format) is melt (wide to long format)
  • Multiple tables can be concatenated both column-wise and row-wise using the concat function.
  • For database-like merging/joining of tables, use the merge function.
  • Valid date strings can be converted to datetime objects using to_datetime function or as part of read functions.
  • Datetime objects in pandas support calculations, logical operations and convenient date-related properties using the dt accessor.
  • A DatetimeIndex contains these date-related properties and supports convenient slicing.
  • Resample is a powerful method to change the frequency of a time series.
  • String methods are available using the str accessor.
  • String methods work element-wise and can be used for conditional indexing.
  • The replace method is a convenient method to convert values according to a given dictionary.
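
Several of the points above (selection with [], loc and iloc, element-wise column creation, groupby, sort_values) can be sketched on a tiny table. This is only an illustration: the DataFrame contents and the names sample, x, y, and ratio are assumptions, not data from this post.

```python
import pandas as pd

# Hypothetical measurements; all names and numbers are illustrative.
df = pd.DataFrame(
    {
        "sample": ["a", "a", "b", "b"],
        "x": [1.0, 2.0, 1.0, 2.0],
        "y": [10.0, 20.0, 30.0, 50.0],
    }
)

# Square brackets select one column (a Series) or a list of columns (a DataFrame).
y_series = df["y"]
xy_frame = df[["x", "y"]]

# loc selects by label or condition, iloc by integer position.
large_y = df.loc[df["y"] > 15, ["sample", "y"]]
first_cell = df.iloc[0, 2]

# New column from an element-wise operation; no explicit loop over rows.
df["ratio"] = df["y"] / df["x"]

# Split-apply-combine: mean of y for each sample.
mean_y = df.groupby("sample")["y"].mean()

# Sort rows by a column, largest value first.
ranked = df.sort_values("y", ascending=False)

print(mean_y)
print(ranked)
```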

Below are some data-loading methods found online.

Reading Excel files

import numpy as np
import matplotlib.pyplot as plt
import xlrd  # requires version 1.2.0; newer versions cannot read .xlsx files

## Add the parent directory to the module search path
import sys
sys.path.append('..')


plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False  # display the minus sign correctly


## Alternatively, build the path to a subfolder with os
import os
filename = 'spectrum.xlsx'
workbook = xlrd.open_workbook(os.path.join('.', 'spectrum', filename))
x_data = []
y_data1 = []
y_data2 = []
y_data3 = []
y_data4 = []

for s in workbook.sheets():
    # print('Sheet:', s.name)
    for row in range(s.nrows):
        # print('the row is:', row)
        values = []
        for col in range(s.ncols):
            values.append(s.cell(row, col).value)
        print(values)
        # Distribute the five values of this row into the per-column lists
        x_data.append(values[0])
        y_data1.append(values[1])
        y_data2.append(values[2])
        y_data3.append(values[3])
        y_data4.append(values[4])
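
The nested loop above gathers each row into a list and then distributes its entries into per-column lists. As a rough illustration of that reshaping step alone, without xlrd or a workbook file, the sketch below transposes a few rows with zip. The values in rows are placeholders, not data from spectrum.xlsx.

```python
# Made-up rows standing in for the cell values of one sheet
# (one x value followed by four y values per row).
rows = [
    [400.0, 0.1, 0.2, 0.3, 0.4],
    [500.0, 0.5, 0.6, 0.7, 0.8],
    [600.0, 0.9, 1.0, 1.1, 1.2],
]

# zip(*rows) regroups the values column by column.
columns = [list(col) for col in zip(*rows)]
x_data, y_data1, y_data2, y_data3, y_data4 = columns

print(x_data)   # values of the first column
print(y_data4)  # values of the fifth column
```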

Reading Excel data with pandas

#https://blog.csdn.net/weixin_38546295/article/details/83537558
import pandas as pd
io = r'C:\Users\Administrator\Desktop\data.xlsx'
data = pd.read_excel(io, sheet_name=1)  # an integer sheet_name is a 0-based position, so 1 is the second sheet
data.head()

Reading txt files

Reading single-column data

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from lmfit import Model

plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False  # display the minus sign correctly

# Define function
def func(x, a):
    return 1. - np.exp(a * x)


# Read data from text files (one value per line)
x = np.linspace(-30, 30, 61)
y1 = np.loadtxt('data_s.txt')
y2 = np.loadtxt('data_f.txt')
y3 = np.loadtxt('data_ff.txt')
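
For a file with one value per line, np.loadtxt returns a one-dimensional array, and the x axis has to contain the same number of points (61 in the listing above) before the two can be plotted together. A minimal sketch of that pairing, using made-up values held in memory instead of the data_*.txt files:

```python
import io
import numpy as np

# In-memory stand-in for a single-column text file (values are made up).
text = "0.0\n0.5\n1.0\n1.5\n2.0\n"

# loadtxt accepts a file name or a file-like object; one column gives a 1-D array.
y = np.loadtxt(io.StringIO(text))

# Build an x axis with exactly one point per loaded value.
x = np.linspace(-2, 2, y.size)

print(y.shape)
print(x)
```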

Reading multi-column data

Comma-separated data

6.1101,17.592

5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
8.5781,12
6.4862,6.5987
5.0546,3.8166
5.7107,3.2522
14.164,15.505
5.734,3.1551
8.4084,7.2258
5.6407,0.71618
5.3794,3.5129
6.3654,5.3048
5.1301,0.56077
6.4296,3.6518
7.0708,5.3893
6.1891,3.1386
20.27,21.767
5.4901,4.263
6.3261,5.1875
5.5649,3.0825
18.945,22.638
12.828,13.501
10.957,7.0467
13.176,14.692
22.203,24.147
5.2524,-1.22
6.5894,5.9966
9.2482,12.134
5.8918,1.8495
8.2111,6.5426
7.9334,4.5623
8.0959,4.1164
5.6063,3.3928
12.836,10.117
6.3534,5.4974
5.4069,0.55657
6.8825,3.9115
11.708,5.3854
5.7737,2.4406
7.8247,6.7318
7.0931,1.0463
5.0702,5.1337
5.8014,1.844
11.7,8.0043
5.5416,1.0179
7.5402,6.7504
5.3077,1.8396
7.4239,4.2885
7.6031,4.9981
6.3328,1.4233
6.3589,-1.4211
6.2742,2.4756
5.6397,4.6042
9.3102,3.9624
9.4536,5.4141
8.8254,5.1694
5.1793,-0.74279
21.279,17.929
14.908,12.054
18.959,17.054
7.2182,4.8852
8.2951,5.7442
10.236,7.7754
5.4994,1.0173
20.341,20.992
10.136,6.6799
7.3345,4.0259
6.0062,1.2784
7.2259,3.3411
5.0269,-2.6807
6.5479,0.29678
7.5386,3.8845
5.0365,5.7014
10.274,6.7526
5.1077,2.0576
5.7292,0.47953
5.1884,0.20421
6.3557,0.67861
9.7687,7.5435
6.5159,5.3436
8.5172,4.2415
9.1802,6.7981
6.002,0.92695
5.5204,0.152
5.0594,2.8214
5.7077,1.8451
7.6366,4.2959
5.8707,7.2029
5.3054,1.9869
8.2934,0.14454
13.394,9.0551
5.4369,0.61705
————————————————
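Copyright notice: This is an original article by CSDN blogger "dazuo01", released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/dazuo01/article/details/20841909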

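## Read data from a .txt file
def loadData(fileName):
    # Two empty lists that hold the data from the file
    X = []
    y = []
    # Open fileName read-only; the with block closes it automatically
    with open(fileName, 'r') as inFile:
        for line in inFile:
            if not line.strip():
                continue  # skip blank lines
            trainingSet = line.strip().split(',')  # split each line at ',' into two parts
            X.append(float(trainingSet[0]))  # first column, appended to list X
            y.append(float(trainingSet[1]))  # second column, appended to list y

    return (X, y)  # X and y form a tuple, so both are returned at once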
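
As an alternative to splitting each line by hand, np.loadtxt can parse comma-separated columns directly. The sketch below is a swapped-in technique, not the original author's function; it reads the first three rows of the data listed above from an in-memory string.

```python
import io
import numpy as np

# First rows of the comma-separated data shown above, held in memory.
text = "6.1101,17.592\n5.5277,9.1302\n8.5186,13.662\n"

# delimiter=',' splits each line at the comma;
# unpack=True returns one array per column instead of one 2-D array.
X, y = np.loadtxt(io.StringIO(text), delimiter=",", unpack=True)

print(X)  # first column as floats
print(y)  # second column as floats
```

Unlike the hand-written loop, this returns NumPy arrays of floats, which can be passed straight to plotting or fitting routines.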

data.txt

0	93
30 96
60 84
90 84
120 48
150 38
180 51
210 57
240 40
270 45
300 50
330 75
360 80
390 60
420 72
450 67
480 71
510 7
540 74
570 63
600 69
  • Slicing:
x = a[:, 0]  # take the first column
y = a[:, 1]  # take the second column
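
In the two lines above, a is assumed to be a two-dimensional NumPy array such as the one np.loadtxt returns for a multi-column file. A small sketch, using the first three rows of data.txt held in memory:

```python
import io
import numpy as np

# First rows of data.txt as listed above (tab- or space-separated).
text = "0\t93\n30 96\n60 84\n"

# With no delimiter given, loadtxt splits on any whitespace.
a = np.loadtxt(io.StringIO(text))

x = a[:, 0]  # all rows, first column
y = a[:, 1]  # all rows, second column

print(a.shape)
print(x, y)
```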

Another approach is to slice with pandas:

Method 1: read the data with np.loadtxt()

# coding: utf-8  (Ubuntu)
import matplotlib.pyplot as plt
from scipy import interpolate
import numpy as np

import matplotlib.font_manager as mpt
zhfont = mpt.FontProperties(fname='/usr/share/fonts/custom/msyh.ttf')  # font used to display Chinese text
# Load the data
file = 'data.txt'
a = np.loadtxt(file)
# Slice the array
x = a[:, 0]  # take the first column
y = a[:, 1]  # take the second column
# Spline interpolation
tck = interpolate.splrep(x, y)
xx = np.linspace(min(x), max(x), 100)
yy = interpolate.splev(xx, tck, der=0)
print(xx)
# Plot
plt.plot(x, y, 'o', xx, yy)
plt.legend(['true', 'Cubic-Spline'])
plt.xlabel('距离(cm)', fontproperties=zhfont)  # "Distance (cm)"; note the font property
plt.ylabel('%')
plt.title('管线仪实测剖面图', fontproperties=zhfont)  # "Measured profile from the pipeline locator"
# Save the figure
plt.savefig('out.jpg')
plt.show()

Method 2: read the data with pandas

# coding: utf-8  (Windows 7 Ultimate)
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from scipy import interpolate

# Make Chinese text display correctly
import matplotlib as mpl
mpl.rcParams["font.sans-serif"] = ["SimHei"]
mpl.rcParams["axes.unicode_minus"] = False

# Load the data; read_csv already returns a DataFrame
data = pd.read_csv('data.txt', sep=r'\s+',
                   header=None, skiprows=[17],
                   # skiprows=[17] skips the 18th line (0-based index 17)
                   names=['x', 'value'])

# Select the columns
x = data['x']  # take the first column
y = data['value']  # take the second column
# Spline interpolation
tck = interpolate.splrep(x, y)
xx = np.linspace(min(x), max(x), 100)
yy = interpolate.splev(xx, tck, der=0)
print(yy)
# Plot
plt.plot(x, y, 'o', xx, yy)
plt.legend(['true', 'Cubic-Spline'])
plt.xlabel('距离(cm)')  # "Distance (cm)"
plt.ylabel('%')
plt.title('管线仪实测剖面图')  # "Measured profile from the pipeline locator"
# Save the figure; dpi sets the resolution of the saved image
plt.savefig('out2.png', dpi=600)
plt.show()
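
The read_csv options used above can be tried on a shortened stand-in for data.txt. Assuming the file matches the listing shown earlier, skiprows=[17] drops its 18th line, the row 510 7. In the sketch below the suspect row is moved to the third line and skipped with skiprows=[2]; the in-memory text is an assumption made for illustration.

```python
import io
import pandas as pd

# Shortened stand-in for data.txt; the third line holds a suspect value to skip.
text = "0\t93\n30 96\n60 7\n90 84\n"

df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s+",            # one or more whitespace characters (tab or spaces)
    header=None,           # the file has no header row
    skiprows=[2],          # 0-based line numbers to skip: here the third line
    names=["x", "value"],  # column names supplied by hand
)

print(df)
```

Passing a list to skiprows skips exactly those line numbers, whereas an integer skips that many lines from the top of the file.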

Reading CSV data

import pandas as pd
CSV_FILE_PATH = './test.csv'
df = pd.read_csv(CSV_FILE_PATH, skiprows=1)
print(df.head(5))
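
With an integer argument, skiprows=1 drops the first line of the file, and the line that follows is then read as the header. A minimal sketch, assuming a CSV whose first line is a title rather than column names (the text below is made up, not the contents of test.csv):

```python
import io
import pandas as pd

# Hypothetical CSV whose first line is a title, not the header row.
text = "exported measurement table\nx,y\n1,2\n3,4\n"

# skiprows=1 drops the first line; the next line supplies the column names.
df = pd.read_csv(io.StringIO(text), skiprows=1)

print(df.head(5))
```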

References:

https://blog.csdn.net/qq_41365597/article/details/90676249

https://blog.csdn.net/dazuo01/article/details/20841909

https://blog.csdn.net/weixin_38546295/article/details/83537558

https://www.jianshu.com/p/7ac36fafebea
