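h5py is commonly used to store large datasets. Compared with CSV it reads and writes faster, and its interface is more Pythonic: files behave like dictionaries and datasets behave like NumPy arrays.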
“HDF” stands for “Hierarchical Data Format”. Every object in an HDF5 file has a name, and objects are arranged in a POSIX-style hierarchy with / separators, e.g. /group/subgroup/dataset.
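A minimal sketch of the /-separated hierarchy, assuming h5py and NumPy are installed. The file name and the names experiment/run1/signal are hypothetical. It shows that one path string can both create nested objects and look them up again, either from the root or relative to a group:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file in a temporary directory, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "hierarchy_demo.hdf5")

with h5py.File(path, "w") as f:
    # Missing intermediate groups in a "/"-separated path are created automatically.
    f.create_dataset("experiment/run1/signal", data=np.arange(5))

with h5py.File(path, "r") as f:
    dset = f["/experiment/run1/signal"]        # look up an object by absolute path
    dset_name = dset.name                       # full path stored with the object
    children = list(f["experiment"].keys())     # names directly under a group
    same = f["experiment"]["run1/signal"][()]   # relative path from a subgroup
```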
The most basic operations: read, write, and update.
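In h5py these three operations map onto the mode argument of h5py.File: "r" is read-only, "w" creates (or truncates) a file, and "r+" opens an existing file for reading and writing ("a" does the same but also creates the file if it is missing). The sketch below is illustrative only; the file name and the dataset name values are hypothetical:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "modes_demo.hdf5")  # hypothetical file

# Write: "w" creates the file, truncating it if it already exists.
with h5py.File(path, "w") as f:
    f.create_dataset("values", data=np.zeros(5))

# Update: "r+" opens an existing file for reading and writing.
with h5py.File(path, "r+") as f:
    f["values"][2] = 42.0  # overwrite a single element in place

# Read: "r" is read-only.
with h5py.File(path, "r") as f:
    values = f["values"][()]
```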
Read HDF5
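import h5py
filename = 'file.hdf5'
with h5py.File(filename, 'r') as f:
    # List all top-level keys (each names a group or a dataset)
    print("Keys: %s" % list(f.keys()))
    a_group_key = list(f.keys())[0]
    # Get the data: [()] reads the whole dataset into a NumPy array
    # (this assumes the first key names a dataset, as in the write example below)
    data = f[a_group_key][()]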
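Because a top-level key can name either a group or a dataset, it can help to check the object type before reading. The sketch below builds its own small file first (the names matrix and empty_group are hypothetical) and contrasts reading a whole dataset with [()] against slicing, which reads only the requested part from disk:

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small hypothetical file so the read below has something to open.
filename = os.path.join(tempfile.mkdtemp(), "file.hdf5")
with h5py.File(filename, "w") as f:
    f.create_dataset("matrix", data=np.arange(12).reshape(4, 3))
    f.create_group("empty_group")

with h5py.File(filename, "r") as f:
    kinds = {}
    for name, obj in f.items():
        # A top-level key may name either a Group or a Dataset.
        kinds[name] = "group" if isinstance(obj, h5py.Group) else "dataset"
    # [()] reads an entire dataset into memory as a NumPy array ...
    full = f["matrix"][()]
    # ... while slicing reads only the requested rows.
    first_two_rows = f["matrix"][:2]
```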
Write HDF5
#!/usr/bin/env python
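import h5py
import numpy as np
# Create random data: a 10 x 3 matrix of values drawn from [-1, 1)
data_matrix = np.random.uniform(-1, 1, size=(10, 3))
# Write data to HDF5 ('w' creates the file, truncating any existing one)
with h5py.File('file.hdf5', 'w') as data_file:
    # Note: this creates a dataset (not a group) named 'group_name'
    data_file.create_dataset('group_name', data=data_matrix)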
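One way to confirm the write worked is to reopen the file and compare the stored values with the original array. This sketch does that round trip; the temporary path is hypothetical, and compression="gzip" is an optional create_dataset argument that trades CPU time for a smaller file without changing the data:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "file.hdf5")  # hypothetical location
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

with h5py.File(path, "w") as data_file:
    # gzip is lossless, so the values read back must match exactly.
    data_file.create_dataset("group_name", data=data_matrix, compression="gzip")

with h5py.File(path, "r") as data_file:
    restored = data_file["group_name"][()]
    stored_dtype = data_file["group_name"].dtype

roundtrip_ok = np.array_equal(restored, data_matrix)
```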
Another reading example
Reading the file
import h5py
file_name = 'file.hdf5'  # path of the file to open
f = h5py.File(file_name, 'r')  # 'r' = read-only
Study the structure of the file by printing which HDF5 groups are present:
for key in f.keys():
    print(key)  # Names of the top-level groups/datasets in the HDF5 file
Extracting the data
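# Get the HDF5 group (key is the last name printed by the loop above)
group = f[key]
# Check which keys are inside that group
for subkey in group.keys():
    print(subkey)
# Dataset.value was removed in h5py 3.0; use [()] to read the whole dataset
data = group[some_key_inside_the_group][()]
# Do whatever you want with data
# After you are done, close the file
f.close()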
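The loops above only look one level deep. To list every group and dataset at any depth, h5py provides visititems, which calls a function with each object's relative path and the object itself. The file layout and the collect function below are hypothetical, for illustration only:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "nested.hdf5")  # hypothetical file
with h5py.File(path, "w") as f:
    f.create_dataset("sensors/temp", data=np.zeros(3))
    f.create_dataset("sensors/humidity", data=np.ones(3))
    f.create_dataset("meta/version", data=1)

found = []

def collect(name, obj):
    # Called once for every group and dataset below the root.
    kind = "dataset" if isinstance(obj, h5py.Dataset) else "group"
    found.append((name, kind))
    # Returning None (implicitly) keeps the traversal going.

with h5py.File(path, "r") as f:
    f.visititems(collect)
```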
Another writing example: appending frames to a growable dataset
(1) Prepare the data
import h5py
import numpy as np
# Settings: one 60 x 80 frame, with a leading axis of length 1
frame = np.zeros((1, 60, 80))
(2) Create a dataset and declare it growable: maxshape=(None, 60, 80) lets the first axis be extended later
# Initial write
with h5py.File("mytestfile.hdf5", "w") as f:
    dset = f.create_dataset('video', data=frame, maxshape=(None, 60, 80), chunks=True)
(3) List all datasets in the file; for now there is only 'video'
# Get keys
with h5py.File("mytestfile.hdf5", "r") as f:
    print(list(f.keys()))
(4) First enlarge the dataset by one frame, then write the new frame into the last position
# Extend the dataset ('a' = read/write, creating the file if it does not exist)
with h5py.File("mytestfile.hdf5", "a") as hf:
    hf['video'].resize((hf['video'].shape[0] + 1), axis=0)
    hf['video'][-1:] = frame
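When many frames arrive one at a time, the resize-then-write step can be wrapped in a small function and called in a loop. A minimal sketch, assuming the same 60 x 80 frame size; append_frame is a hypothetical helper name, not part of h5py, and the file path is illustrative:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "mytestfile.hdf5")  # hypothetical location

def append_frame(dset, frame):
    """Hypothetical helper: grow dset by one along axis 0 and store frame there."""
    new_len = dset.shape[0] + 1
    dset.resize(new_len, axis=0)
    dset[new_len - 1] = frame

first = np.zeros((60, 80))
with h5py.File(path, "w") as f:
    # maxshape=(None, 60, 80) makes axis 0 unlimited; resizable datasets are chunked.
    f.create_dataset("video", data=first[np.newaxis], maxshape=(None, 60, 80), chunks=True)

with h5py.File(path, "a") as f:
    for i in range(1, 4):
        append_frame(f["video"], np.full((60, 80), i))

with h5py.File(path, "r") as f:
    final_shape = f["video"].shape
    frame_means = f["video"][:].mean(axis=(1, 2))
```

Growing by one frame per call keeps the code simple; with very many frames, resizing in larger steps and trimming at the end would reduce the number of resize calls.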
Example and explanation from the official documentation:
>>> import h5py
>>> f = h5py.File('mytestfile.hdf5', 'r')
The File object is your starting point. What is stored in this file? Remember that h5py.File acts like a Python dictionary, so we can check the keys:
>>> list(f.keys())
['mydataset']
Based on our observation, there is one dataset, mydataset, in the file. Let us examine the dataset as a Dataset object:
>>> dset = f['mydataset']
The object we obtained isn’t an array, but an HDF5 dataset. Like NumPy arrays, datasets have both a shape and a data type:
>>> dset.shape
(100,)
>>> dset.dtype
dtype('int32')
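The example above assumes mytestfile.hdf5 already contains mydataset. The sketch below creates a comparable file (100 int32 elements, the name and size being illustrative) and then, besides shape and dtype, uses NumPy-style slicing, which reads only the selected elements into a NumPy array:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "mytestfile.hdf5")  # hypothetical location

with h5py.File(path, "w") as f:
    # 100 int32 elements, initially zero-filled.
    dset = f.create_dataset("mydataset", (100,), dtype="i4")
    dset[...] = np.arange(100)  # write all elements at once

with h5py.File(path, "r") as f:
    dset = f["mydataset"]
    shape, dtype = dset.shape, dset.dtype
    # Slicing reads only the selected elements from disk.
    every_tenth = dset[0:100:10]
    single = dset[10]
```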
References:
http://docs.h5py.org/en/latest/index.html