ISO 8601 format
For duration:
P
(duration, always at the beginning of the duration),Y
(year),M
(month),W
(week),D
(day),T
(time designator, always precedes the time components),H
(hour),M
(minute),S
(second).
- Example:
P3Y6M4DT12H30M5S
-- a duration of three years, six months, four days, twelve hours, thirty minutes, and five seconds.
Converter,
1from datetime import datetime, timedelta
2from isodate import duration_isoformat # used only for datetime.timedelta
3
4def get_isoformat(time):
5 """
6 Convert pd.Timedelta, pd.Timestamp,
7 datetimme.datetime, datetime.time, datetime.date, datetime.timedelta
8 to isoformat
9 """
10 if not isinstance(time, timedelta):
11 return time.isoformat()
12 else:
13 return duration_isoformat(time)
To
Timedelta
1# numpy.timedelta64(208206000000000,'ns') → Timedelta('2 days 09:50:06')
2pd.Timedelta(time, unit='ns')
1# DateOffsets ('14T') → Timedelta('0 days 00:14:00')
2pd.to_timedelta('14T')
1# Can't use 'T' as '1T'?
2from pandas.tseries.frequencies import to_offset
3pd.to_timedelta(to_offset('T'))
From
Timedelta
1# Timedelta('0 days 00:01:20') -> 80 (s)
2# (SINGLE VALUE)
3td.total_seconds() # float
1# Timedelta('0 days 00:01:20') -> 80 (s) (FLOAT)
2# (ONLY WORK with a series, not a single value)
3series.astype('timedelta64[s]') # or 'ms'
1# '1 minutes' -> '1T'
2def timedelta_to_string(timedelta):
3 units = ['D', 'H', 'T', 'S', 'L', 'U', 'N']
4 time_format = ''
5 for i, c in enumerate(timedelta.components):
6 if c != 0: time_format += str(c) + units[i]
7 return time_format
1## EXAMPLE
2import pandas as pd
3test = pd.Timedelta('1 minutes')
4timedelta_to_string(test)
1# output
2Timedelta('0 days 00:01:00')
3'1T'
TimedeltaIndex
differencesThere is no
.diff
method with TimedeltaIndex
, you can use,1np.subtract(df[1:], df[:-1])
2
3# convert to hour
4np.subtract(df[1:], df[:-1]) / pd.Timedelta('1 hour')
We wanna compare
150S
(150 seconds) with 1T
(1 minutes).1import pandas as pd
2pd.to_timedelta('150S') > pd.to_timedelta('1T')
3pd.to_timedelta('120S') == pd.to_timedelta('1T')
4pd.to_timedelta('120S') == pd.to_timedelta('2T')
1# output
2True
3False
4True
Timestamps
1from datetime import datetime
1# to same timezone (UTC, +0)
2df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True, infer_datetime_format=True, cache=True)
1# UTC+0 to UNIX timestamp (POSIX timestamp)
2df['timestamp'] = df['timestamp'].apply(lambda x: int(datetime.timestamp(x)*1000)) # miliseconds
3
4# unix timestamp to Timestamps
5datetime.fromtimestamp(unix_ts//1000)
1# UNIX float (ms) -> datetime64
2df['timestamp'] = df['timestamp'].astype('datetime64[ms]')
3# change `ms` with others, e.g. `ns` for nanosecond
1# remove timezone
2df['time'].dt.tz_localize(None)
Get info timestamps
1def set_index(data, col_time):
2 """
3 Make a copy of a time-series dataframe `df` and set the column-time be the
4 index of the dataframe.
5 In the case index has no name, we set it as `'index'`.
6 """
7 df0 = data.copy()
8 if col_time != 'index': # col_time is not the index
9 df0 = df0.set_index(col_time)
10 else:
11 if df0.index.name is None:
12 df0.index.name = 'index'
13 return df0
1def get_info_timestamps(df, col_date='index'):
2 # make sure timestamps are on index
3 df = set_index(df, col_date)
4 index_name = df.index.name
5 df = df.reset_index()
6 print('Time range: ', df[index_name].max() - df[index_name].min())
7 print('Number of different time steps: ', df[index_name].diff().value_counts().count())
8 print('Max time step: ', df[index_name].diff().max())
9 print('Min time step: ', df[index_name].diff().min())
10 print('The most popular time step: ', df[index_name].diff().value_counts().index[0])
11 print('timestamps are monotonic increasing? ', df[index_name].is_monotonic)
12 print('Are there duplicate timestamps? ', df[index_name].duplicated().any())
13 print('How many unique duplicates? ', df[index_name].duplicated().sum(), ' (in total ',df.shape[0], ')')
14 print('How many repeated duplicates? ', df[index_name].duplicated(keep=False).sum(), ' (in total ',df.shape[0], ')')
Check timestamps are well sorted?
1# CHECK
2df.date.is_monotonic # monotonic increasing?
3df.date.is_monotonic_decreasing # decreasing?
4
5# if using groupby
6def check_monotonic(group):
7 return group.is_monotonic
8df.groupby('label').agg({'timestamp': [check_monotonic] })
1# ARRANGE THEM
2df.sort_values(by='date', inplace=True)
Detect time series frequency
Find the different time steps in a datetime columns,
1# count the number of elements for each time steps
2df.date.diff().value_counts()
3
4# count number of different time steps
5df.date.diff().value_counts().count()
6
7# take the index of the largest
8df.date.diff().value_counts().index[0]
9
10# take the index of the smallest
11df.date.diff().value_counts().index[-1]
1# output
200:01:00 11
300:03:00 2
400:02:00 1
500:04:00 1
6Name: date, dtype: int64
7
84
9
10Timedelta('0 days 00:01:00')
11
12Timedelta('0 days 00:04:00')
One can couple with function
timedelta_to_string
in the previous section to find out the most-appeared time steps to feed into df.resample()
's rule
.List of resampling rules
Official ref here — search "DateOffsets" to jump to the table.
1B business day frequency
2C custom business day frequency (experimental)`
3D calendar day frequency
4W weekly frequency
5M month end frequency
6SM semi-month end frequency (15th and end of month)
7BM business month end frequency
8CBM custom business month end frequency
9MS month start frequency
10SMS semi-month start frequency (1st and 15th)
11BMS business month start frequency
12CBMS custom business month start frequency
13Q quarter end frequency
14BQ business quarter endfrequency
15QS quarter start frequency
16BQS business quarter start frequency
17A year end frequency
18BA, BY business year end frequency
19AS, YS year start frequency
20BAS, BYS business year start frequency
21BH business hour frequency
22H hourly frequency
23T, min minutely frequency
24S secondly frequency
25L, ms milliseconds
26U, us microseconds
27N nanoseconds