[Python] Time series handling with the datetime module and pandas

곽가누 2023. 7. 5. 16:50

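ㅁ Two classes from the datetime module are used most often here (the module also provides date, time, and timezone).

1. datetime : provides the basic date-and-time functionality

ex. datetime.now(), datetime.now().year

2. timedelta : used when arithmetic between datetime values is needed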

from datetime import datetime
from datetime import timedelta
t1 = datetime(2023, 7, 5)
t1 + timedelta(12)  # the first positional argument is days

>>> datetime.datetime(2023, 7, 17, 0, 0)
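A minimal sketch of timedelta arithmetic in both directions, using the same date as above. The variable names later, earlier, and gap are made up for this illustration.

```python
from datetime import datetime, timedelta

t1 = datetime(2023, 7, 5)

# timedelta's first argument is a number of days
later = t1 + timedelta(days=12)
earlier = t1 - timedelta(days=12)

# Subtracting one datetime from another yields a timedelta
gap = later - earlier

print(later)     # 2023-07-17 00:00:00
print(earlier)   # 2023-06-23 00:00:00
print(gap.days)  # 24
```

Adding moves the date forward and subtracting moves it back, so it is worth checking the sign when a result looks off by a few weeks.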

ㅁ Converting a string to a datetime

data1 = '2019-08-01'
datetime.strptime(data1, '%Y-%m-%d')

data2 = '08/01/19'
datetime.strptime(data2, '%m/%d/%y')

>>> datetime.datetime(2019, 8, 1, 0, 0)
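As a small sketch, the two layouts above parse to the same datetime, and a format string that does not match the input raises ValueError. The samples list and the mismatch_raised flag are hypothetical names used only for this example.

```python
from datetime import datetime

# Hypothetical sample inputs: the same day written in two layouts
samples = [("2019-08-01", "%Y-%m-%d"), ("08/01/19", "%m/%d/%y")]
parsed = [datetime.strptime(text, fmt) for text, fmt in samples]
print(parsed[0] == parsed[1])  # True

# A format that does not match the text raises ValueError
try:
    datetime.strptime("2019-08-01", "%m/%d/%y")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```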

ㅁ Converting a datetime to a str

datetime(2019,8,1).strftime('%Y-%m-%d')
datetime.strftime(datetime.now(), '%y.%m.%d')

>>> '2019-08-01'

>>> '23.07.01'  # depends on the date when the code is run
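strftime and strptime are inverses when the same format string is used. A short round-trip sketch, with a made-up timestamp (moment) chosen only for illustration:

```python
from datetime import datetime

moment = datetime(2019, 8, 1, 14, 30)

# datetime -> str
text = moment.strftime("%Y-%m-%d %H:%M")
print(text)  # 2019-08-01 14:30

# str -> datetime with the same format string restores the value
restored = datetime.strptime(text, "%Y-%m-%d %H:%M")
print(restored == moment)  # True

# Two-digit year with dots, as in the example above
short = moment.strftime("%y.%m.%d")
print(short)  # 19.08.01
```

The round trip is only lossless when the format keeps every field that matters; a format without seconds drops them.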

ㅁ Using pandas : creating a time series object

import numpy as np
import pandas as pd

dates = [ datetime(2019,8,1), datetime(2019,8,3), datetime(2019,8,8)]

ts1 = pd.Series(np.arange(3), index = dates)
# np.arange(3) creates [0, 1, 2], one value per date

ts1

>>>

2019-08-01    0
2019-08-03    1
2019-08-08    2
dtype: int32
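A self-contained sketch of the same idea: when a list of datetime objects is passed as the index, pandas builds a DatetimeIndex, and a value can then be looked up by a date string. The name series is used here only to avoid clashing with ts1.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2019, 8, 1), datetime(2019, 8, 3), datetime(2019, 8, 8)]

# The number of values must match the number of dates
series = pd.Series(np.arange(len(dates)), index=dates)

print(type(series.index).__name__)  # DatetimeIndex
print(series.loc["2019-08-03"])     # 1
```

Using len(dates) for the values avoids the length-mismatch error that pandas raises when the data and the index differ in size.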

ㅁ Using pandas : creating regularly spaced dates

pd.date_range('1/1/2020', periods = 10)

>>>

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
'2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
'2020-01-09', '2020-01-10'],
dtype='datetime64[ns]', freq='D')
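date_range can also be given a start and an end, or a step other than one day. A brief sketch; the names daily and every_other are illustrative only.

```python
import pandas as pd

# start and end, with the default daily frequency
daily = pd.date_range(start="2020-01-01", end="2020-01-10")
print(len(daily))  # 10

# a fixed number of periods, stepping two days at a time
every_other = pd.date_range("2020-01-01", periods=5, freq="2D")
print(every_other[-1].date())  # 2020-01-09
```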

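# ts1 is redefined here as a monthly series (definition inferred from the output below)
ts1 = pd.Series(np.arange(10), index = pd.date_range('1/31/2000', periods = 10, freq = 'M'))
ts1.shift(2, freq = 'M')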

>>>

2000-03-31    0
2000-04-30    1
2000-05-31    2
2000-06-30    3
2000-07-31    4
2000-08-31    5
2000-09-30    6
2000-10-31    7
2000-11-30    8
2000-12-31    9
Freq: M, dtype: int32
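shift behaves differently depending on whether freq is given. The sketch below uses a small daily series rather than a monthly one, because the month-end alias is spelled differently across pandas versions ('M' in older releases, 'ME' in newer ones); the names s, moved_values, and moved_index are made up for this example.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=4, freq="D")
s = pd.Series(np.arange(4), index=idx)

# Without freq: the index stays put, the values slide down,
# and the first two positions become NaN
moved_values = s.shift(2)

# With freq: the values stay put and the index moves forward by two days
moved_index = s.shift(2, freq="D")

print(moved_values.isna().sum())  # 2
print(moved_index.index[0])       # 2000-01-03 00:00:00
```

Because NaN cannot be stored in an integer column, the version without freq comes back as floats, while the version with freq keeps the original values untouched.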

ㅁ Indexing & Slicing

ts1.index[1]

>>> Timestamp('2000-02-29 00:00:00', freq='M')

ts1['2000-01-31']

>>> 0

# Date strings written in these formats are recognized as well
ts1['20000131']
ts1['01/31/2000']

>>> 0
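The examples above pick out single values; slicing works on the same date strings. A minimal sketch on a hypothetical 60-day daily series (s, jan, and window are illustrative names):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=60, freq="D")
s = pd.Series(np.arange(60), index=idx)

# Partial string: a year-month selects every row in that month
jan = s.loc["2000-01"]
print(len(jan))  # 31

# Label slice: both endpoints are included
window = s.loc["2000-01-30":"2000-02-02"]
print(list(window))  # [29, 30, 31, 32]
```

Unlike positional slicing, a label slice on dates includes the end label, which is why the window holds four rows.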

ㅁ Assigning a value by date label (2000-02-29 is already in the index, so its value is replaced)

ts1['2000-02-29'] = 100
ts1

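ㅁ Resampling : changing the frequency of a time series, reducing or increasing the number of data points according to a chosen rule

1. Downsampling : reducing the frequency of the data

When the time unit of the original data is too fine to be practical
When focusing on a specific period
When aligning with other data recorded at a lower frequency, etc.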

dates = pd.date_range('1/1/2019', periods = 14, freq = 'T')
ts = pd.Series(np.arange(14), index = dates)
ts

>>>

2019-01-01 00:00:00     0
2019-01-01 00:01:00     1
2019-01-01 00:02:00     2
2019-01-01 00:03:00     3
2019-01-01 00:04:00     4
2019-01-01 00:05:00     5
2019-01-01 00:06:00     6
2019-01-01 00:07:00     7
2019-01-01 00:08:00     8
2019-01-01 00:09:00     9
2019-01-01 00:10:00    10
2019-01-01 00:11:00    11
2019-01-01 00:12:00    12
2019-01-01 00:13:00    13
Freq: T, dtype: int32

ts.resample('5min').apply(sum)

>>>

2019-01-01 00:00:00 10 # sum of the values in this 5-minute bin (0+1+2+3+4)
2019-01-01 00:05:00 35 # (5+6+7+8+9)
2019-01-01 00:10:00 46 # the data ends at 13, so this bin holds only four values (10+11+12+13)
Freq: 5T, dtype: int32
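The same 5-minute bins can be summarized in other ways than a sum. A sketch that rebuilds the 14-minute series and applies three aggregations; the 'min' alias is used here in place of 'T', and the names bins, totals, means, and counts are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2019-01-01", periods=14, freq="min")
ts = pd.Series(np.arange(14), index=idx)

# Group the rows into 5-minute bins once, then aggregate in several ways
bins = ts.resample("5min")
totals = bins.sum()
means = bins.mean()
counts = bins.count()

print(list(totals))  # [10, 35, 46]
print(list(means))   # [2.0, 7.0, 11.5]
print(list(counts))  # [5, 5, 4]
```

The count makes it easy to spot that the last bin is incomplete, which matters when comparing sums across bins.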

ts.resample('5min').apply(sum).shift(10, freq = '5min')

>>>

2019-01-01 00:50:00    10
2019-01-01 00:55:00    35
2019-01-01 01:00:00    46
Freq: 5T, dtype: int32

2. Upsampling : increasing the frequency of the data
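The post stops at this heading, so the following is only a rough sketch of what upsampling can look like, not the author's example. It assumes a 3-row minute series and fills the new 30-second rows by carrying the previous value forward; the names ts_min and up are hypothetical.

```python
import pandas as pd

idx = pd.date_range("2019-01-01", periods=3, freq="min")
ts_min = pd.Series([0, 1, 2], index=idx)

# Going from 1-minute to 30-second spacing creates rows with no data;
# ffill() copies the last known value into each new row
up = ts_min.resample("30s").ffill()

print(len(up))   # 5
print(list(up))  # [0, 0, 1, 1, 2]
```

Forward fill is one choice among several; leaving the new rows empty or interpolating between neighbors may fit better depending on the data.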


가누의 코딩로그
