Python HDF5 Operations - 史坦利Stanley程式Maker's Blog

h5py is often used to store large datasets. Compared with CSV, HDF5 files are faster to read and write, and the h5py API is more Pythonic.

“HDF” stands for “Hierarchical Data Format”. Every object in an HDF5 file has a name, and objects are arranged in a POSIX-style hierarchy with / separators.
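A minimal sketch of that hierarchy, assuming a throwaway file in a temporary directory and the hypothetical names `sensors`, `temperature`, and `raw`: groups behave like directories, and a dataset can be addressed by its full /-separated path. h5py creates the intermediate groups automatically when you create a dataset by path.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical throwaway file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "hierarchy_demo.hdf5")

with h5py.File(path, "w") as f:
    # Intermediate groups 'sensors' and 'sensors/temperature' are created automatically
    f.create_dataset("/sensors/temperature/raw", data=np.arange(5))

with h5py.File(path, "r") as f:
    dset = f["/sensors/temperature/raw"]   # access by absolute path
    name = dset.name                        # '/sensors/temperature/raw'
    children = list(f["sensors"].keys())    # ['temperature']
    is_group = isinstance(f["sensors"], h5py.Group)
    data = dset[()]                         # read into a NumPy array
    print(name, children)
```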

The most basic operations: read, write, and update.

Read HDF5

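import h5py

filename = 'file.hdf5'
with h5py.File(filename, 'r') as f:
    # List all top-level keys (groups or datasets)
    print("Keys: %s" % list(f.keys()))
    a_group_key = list(f.keys())[0]

    # Get the data: read the whole object into a NumPy array (assumes it is a dataset)
    data = f[a_group_key][()]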
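Indexing a dataset returns a NumPy array, and h5py requests only the selected region from the file rather than the whole dataset. A minimal sketch, assuming a small throwaway file with a hypothetical dataset named `matrix`:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "read_demo.hdf5")

# Create a small file to read from (hypothetical dataset name 'matrix')
with h5py.File(path, "w") as f:
    f.create_dataset("matrix", data=np.arange(12).reshape(4, 3))

with h5py.File(path, "r") as f:
    dset = f["matrix"]
    shape = dset.shape          # metadata only
    first_row = dset[0]         # reads one row
    column = dset[:, 1]         # reads one column
    everything = dset[()]       # reads the whole dataset
```

Slicing like this is useful when a dataset is too large to load into memory at once.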

Write HDF5

#!/usr/bin/env python
import h5py
import numpy as np

# Create random data: a 10 x 3 matrix of values in [-1, 1)
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write the data to HDF5 as a dataset named 'dataset_name'
# (mode 'w' truncates the file if it already exists)
with h5py.File('file.hdf5', 'w') as data_file:
    data_file.create_dataset('dataset_name', data=data_matrix)
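For the "update" operation, one option is to open an existing file with mode 'r+' (read/write, no truncation) and assign into a dataset slice, which overwrites those values in place. A minimal sketch, assuming a temporary file and the hypothetical dataset name `dataset_name`:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "update_demo.hdf5")

# Write an initial 10 x 3 matrix of zeros (dataset name is hypothetical)
with h5py.File(path, "w") as f:
    f.create_dataset("dataset_name", data=np.zeros((10, 3)))

# 'r+' opens an existing file for reading and writing without truncating it
with h5py.File(path, "r+") as f:
    dset = f["dataset_name"]
    dset[0, :] = [1.0, 2.0, 3.0]    # overwrite one row in place
    dset[5:, 2] = -1.0              # overwrite part of a column

with h5py.File(path, "r") as f:
    updated = f["dataset_name"][()]
```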

Another reading example

Reading the file

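import h5py

file_name = 'file.hdf5'
# mode: 'r' read-only, 'r+' read/write (file must exist),
# 'w' create (truncate if it exists), 'a' read/write (create if missing)
f = h5py.File(file_name, 'r')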

Study the structure of the file by printing which top-level HDF5 groups are present:

for key in f.keys():
    print(key)  # Names of the top-level groups (or datasets) in the HDF5 file

Extracting the data

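# Get the HDF5 group (here, the last key from the loop above; assumes it is a group)
group = f[key]

# Check what keys are inside that group
for key in group.keys():
    print(key)

# Read a dataset inside the group into a NumPy array.
# `.value` was removed in h5py 3.0; use `[()]` instead.
some_key_inside_the_group = list(group.keys())[0]
data = group[some_key_inside_the_group][()]
# Do whatever you want with data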

#After you are done
f.close()
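The loops above only look one level deep. To list every group and dataset at any depth, `Group.visititems` calls a function with each object's path (relative to the starting group) and the object itself. A minimal sketch, assuming a temporary file with the hypothetical names `top`, `grp`, and `inner`:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "walk_demo.hdf5")

with h5py.File(path, "w") as f:
    f.create_dataset("top", data=np.arange(3))
    f.create_dataset("grp/inner", data=np.ones((2, 2)))

found = {}

def record(name, obj):
    # name is the path relative to the starting group; obj is a Group or Dataset.
    # Returning None lets the traversal continue.
    kind = "dataset" if isinstance(obj, h5py.Dataset) else "group"
    found[name] = kind
    print(name, kind)

with h5py.File(path, "r") as f:
    f.visititems(record)
```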

Another writing example

(1) Prepare the data

# setting: one 60 x 80 frame; the leading axis is the frame index
import numpy as np
frame = np.zeros((1, 60, 80))

Create a dataset and allow it to grow along its first axis.

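# initial
import h5py
with h5py.File("mytestfile.hdf5", "w") as f:
    # maxshape=(None, 60, 80): the first axis can grow without limit;
    # resizable datasets require chunked storage, hence chunks=True
    dset = f.create_dataset('video', data=frame, maxshape=(None, 60, 80), chunks=True)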

List all the datasets in this file; for now there is only "video".

# get key
with h5py.File("mytestfile.hdf5", "r") as f:
    print(list(f.keys()))

First enlarge the dataset by one frame, then write the new frame into the last slot.

# extend dataset
with h5py.File("mytestfile.hdf5", "a") as hf:
    hf['video'].resize(hf['video'].shape[0] + 1, axis=0)
    hf['video'][-1:] = frame
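The same resize-then-write step can be repeated to append frames one at a time, for example from a video stream. A minimal sketch, assuming constant-valued stand-in frames and a temporary file instead of mytestfile.hdf5:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "video_demo.hdf5")
frame = np.zeros((1, 60, 80))

# Start with one frame; maxshape=(None, 60, 80) leaves the first axis unlimited
with h5py.File(path, "w") as f:
    f.create_dataset("video", data=frame, maxshape=(None, 60, 80), chunks=True)

# Append 4 more stand-in frames, growing the dataset by one frame each time
with h5py.File(path, "a") as hf:
    video = hf["video"]
    for i in range(4):
        new_frame = np.full((1, 60, 80), i + 1.0)
        video.resize(video.shape[0] + 1, axis=0)
        video[-1:] = new_frame

with h5py.File(path, "r") as f:
    final_shape = f["video"].shape
    last_value = f["video"][-1, 0, 0]
```

Resizing by one frame per write keeps the code simple; when appending many frames, growing the dataset in larger blocks reduces the number of resize calls.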

Example and explanation from the official documentation:

>>> import h5py
>>> f = h5py.File('mytestfile.hdf5', 'r')

The File object is your starting point. What is stored in this file? Remember that h5py.File acts like a Python dictionary, so we can check the keys:

>>> list(f.keys())
['mydataset']

Based on our observation, there is one dataset, mydataset, in the file. Let us examine the dataset as a Dataset object:

>>> dset = f['mydataset']

The object we obtained isn’t an array, but an HDF5 dataset. Like NumPy arrays, datasets have both a shape and a data type:

>>> dset.shape
(100,)
>>> dset.dtype
dtype('int32')
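The example above assumes mytestfile.hdf5 already exists and holds a 100-element integer dataset. A minimal sketch that creates an equivalent file, modeled on the h5py quick-start guide but writing to a temporary directory and spelling the dtype explicitly as "int32" (change the path to "mytestfile.hdf5" to run the snippet above as-is):

```python
import os
import tempfile

import h5py
import numpy as np

# Temporary location; use "mytestfile.hdf5" instead to match the snippet above
path = os.path.join(tempfile.mkdtemp(), "mytestfile.hdf5")

with h5py.File(path, "w") as f:
    # 100 32-bit integers; no data given, so reads return the fill value (0)
    f.create_dataset("mydataset", (100,), dtype="int32")

with h5py.File(path, "r") as f:
    keys = list(f.keys())
    dset = f["mydataset"]
    shape, dtype = dset.shape, dset.dtype
    values = dset[()]
```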

References:

http://docs.h5py.org/en/latest/index.html

stanley
