Date / Time discrete note (DS)

Anh-Thi Dinh
ISO 8601 format
For duration:
  • P (duration, always at the beginning of the duration), Y (year), M (month), W (week), D (day), T (time designator, always precedes the time components), H (hour), M (minute), S (second).
  • Example: P3Y6M4DT12H30M5S -- a duration of three years, six months, four days, twelve hours, thirty minutes, and five seconds.
Converter,
1from datetime import datetime, timedelta
2from isodate import duration_isoformat # used only for datetime.timedelta
3
4def get_isoformat(time):
5    """
6    Convert pd.Timedelta, pd.Timestamp,
7    datetimme.datetime, datetime.time, datetime.date, datetime.timedelta
8    to isoformat
9    """
10    if not isinstance(time, timedelta):
11        return time.isoformat()
12    else:
13        return duration_isoformat(time)
To Timedelta
1# numpy.timedelta64(208206000000000,'ns') → Timedelta('2 days 09:50:06')
2pd.Timedelta(time, unit='ns')
1# DateOffsets ('14T') → Timedelta('0 days 00:14:00')
2pd.to_timedelta('14T')
1# Can't use 'T' as '1T'?
2from pandas.tseries.frequencies import to_offset
3pd.to_timedelta(to_offset('T'))
From Timedelta
1# Timedelta('0 days 00:01:20') -> 80 (s)
2# (SINGLE VALUE)
3td.total_seconds() # float
1# Timedelta('0 days 00:01:20') -> 80 (s) (FLOAT)
2# (ONLY WORK with a series, not a single value)
3series.astype('timedelta64[s]') # or 'ms'
1# '1 minutes' -> '1T'
2def timedelta_to_string(timedelta):
3    units = ['D', 'H', 'T', 'S', 'L', 'U', 'N']
4    time_format = ''
5    for i, c in enumerate(timedelta.components):
6        if c != 0: time_format += str(c) + units[i]
7    return time_format
1## EXAMPLE
2import pandas as pd
3test = pd.Timedelta('1 minutes')
4timedelta_to_string(test)
1# output
2Timedelta('0 days 00:01:00')
3'1T'
 
TimedeltaIndex differences
There is no .diff method with TimedeltaIndex, you can use,
1np.subtract(df[1:], df[:-1])
2
3# convert to hour
4np.subtract(df[1:], df[:-1]) / pd.Timedelta('1 hour')

Compare/Make arithmetic different frequency strings

We wanna compare 150S (150 seconds) with 1T (1 minutes).
1import pandas as pd
2pd.to_timedelta('150S') > pd.to_timedelta('1T')
3pd.to_timedelta('120S') == pd.to_timedelta('1T')
4pd.to_timedelta('120S') == pd.to_timedelta('2T')
1# output
2True
3False
4True
Timestamps
1from datetime import datetime
1# to same timezone (UTC, +0)
2df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True, infer_datetime_format=True, cache=True)
1# UTC+0 to UNIX timestamp (POSIX timestamp)
2df['timestamp'] = df['timestamp'].apply(lambda x: int(datetime.timestamp(x)*1000)) # miliseconds
3
4# unix timestamp to Timestamps
5datetime.fromtimestamp(unix_ts//1000)
1# UNIX float (ms) -> datetime64
2df['timestamp'] = df['timestamp'].astype('datetime64[ms]')
3# change `ms` with others, e.g. `ns` for nanosecond
1# remove timezone
2df['time'].dt.tz_localize(None)
Get info timestamps
1def set_index(data, col_time):
2    """
3    Make a copy of a time-series dataframe `df` and set the column-time be the
4    index of the dataframe.
5    In the case index has no name, we set it as `'index'`.
6    """
7    df0 = data.copy()
8    if col_time != 'index': # col_time is not the index
9        df0 = df0.set_index(col_time)
10    else:
11        if df0.index.name is None:
12            df0.index.name = 'index'
13    return df0
1def get_info_timestamps(df, col_date='index'):
2    # make sure timestamps are on index
3    df = set_index(df, col_date)
4    index_name = df.index.name
5    df = df.reset_index()
6    print('Time range: ', df[index_name].max() - df[index_name].min())
7    print('Number of different time steps: ', df[index_name].diff().value_counts().count())
8    print('Max time step: ', df[index_name].diff().max())
9    print('Min time step: ', df[index_name].diff().min())
10    print('The most popular time step: ', df[index_name].diff().value_counts().index[0])
11    print('timestamps are monotonic increasing? ', df[index_name].is_monotonic)
12    print('Are there duplicate timestamps? ', df[index_name].duplicated().any())
13    print('How many unique duplicates? ', df[index_name].duplicated().sum(), ' (in total ',df.shape[0], ')')
14    print('How many repeated duplicates? ', df[index_name].duplicated(keep=False).sum(), ' (in total ',df.shape[0], ')')
Check timestamps are well sorted?
1# CHECK
2df.date.is_monotonic # monotonic increasing?
3df.date.is_monotonic_decreasing # decreasing?
4
5# if using groupby
6def check_monotonic(group):
7    return group.is_monotonic
8df.groupby('label').agg({'timestamp': [check_monotonic] })
1# ARRANGE THEM
2df.sort_values(by='date', inplace=True)
Detect time series frequency
Find the different time steps in a datetime columns,
1# count the number of elements for each time steps
2df.date.diff().value_counts()
3
4# count number of different time steps
5df.date.diff().value_counts().count()
6
7# take the index of the largest
8df.date.diff().value_counts().index[0]
9
10# take the index of the smallest
11df.date.diff().value_counts().index[-1]
1# output
200:01:00    11
300:03:00     2
400:02:00     1
500:04:00     1
6Name: date, dtype: int64
7
84
9
10Timedelta('0 days 00:01:00')
11
12Timedelta('0 days 00:04:00')
One can couple with function timedelta_to_string in the previous section to find out the most-appeared time steps to feed into df.resample()'s rule.
List of resampling rules
Official ref here — search "DateOffsets" to jump to the table.
1B         business day frequency
2C         custom business day frequency (experimental)`
3D         calendar day frequency
4W         weekly frequency
5M         month end frequency
6SM        semi-month end frequency (15th and end of month)
7BM        business month end frequency
8CBM       custom business month end frequency
9MS        month start frequency
10SMS       semi-month start frequency (1st and 15th)
11BMS       business month start frequency
12CBMS      custom business month start frequency
13Q         quarter end frequency
14BQ        business quarter endfrequency
15QS        quarter start frequency
16BQS       business quarter start frequency
17A         year end frequency
18BA, BY    business year end frequency
19AS, YS    year start frequency
20BAS, BYS  business year start frequency
21BH        business hour frequency
22H         hourly frequency
23T, min    minutely frequency
24S         secondly frequency
25L, ms     milliseconds
26U, us     microseconds
27N         nanoseconds