First, let's install the necessary libraries: Transformers, Datasets, Evaluate, Accelerate and GluonTS.
As we will show, GluonTS will be used to transform the data to create features, as well as to create appropriate training, validation and test batches.
!pip install -q transformers
!pip install -q datasets
!pip install -q evaluate
!pip install -q accelerate
!pip install -q gluonts ujson
In this blog post, we'll use the tourism_monthly dataset available on the Hugging Face Hub. This dataset contains monthly tourism volumes for 366 regions in Australia.
This dataset is part of the Monash Time Series Forecasting repository, a collection of time series datasets from a number of domains. It can be viewed as the GLUE benchmark of time series forecasting.
from datasets import load_dataset
dataset = load_dataset("monash_tsf", "tourism_monthly")
As can be seen, the dataset contains 3 splits: train, validation and test.
dataset
DatasetDict({
train: Dataset({
features: ['start', 'target', 'feat_static_cat', 'feat_dynamic_real', 'item_id'],
num_rows: 366
})
test: Dataset({
features: ['start', 'target', 'feat_static_cat', 'feat_dynamic_real', 'item_id'],
num_rows: 366
})
validation: Dataset({
features: ['start', 'target', 'feat_static_cat', 'feat_dynamic_real', 'item_id'],
num_rows: 366
})
})
Each example contains a few keys, of which start and target are the most important ones. Let's take a look at the first time series in the dataset:
train_example = dataset['train'][0]
train_example.keys()
dict_keys(['start', 'target', 'feat_static_cat', 'feat_dynamic_real', 'item_id'])
The start simply indicates the start of the time series (as a datetime), while the target contains the actual values of the time series.
The start will be useful to add time-related features to the time series values, as extra input to the model (such as "month of year"). Since we know the frequency of the data is monthly, we also know, for instance, that the second value has the timestamp 1979-02-01, and so on.
print(train_example['start'])
print(train_example['target'])
1979-01-01 00:00:00
[1149.8699951171875, 1053.8001708984375, ..., 5772.876953125]
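We can verify this timestamp arithmetic directly with pandas (a quick illustrative check, not part of the original pipeline):
import pandas as pd

# with a monthly frequency, the i-th value lives at start + i periods:
start = pd.Period("1979-01-01 00:00:00", freq="M")
print(start + 1)
>>> 1979-02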
The validation set contains the same data as the training set, just for a prediction_length longer amount of time. This allows us to validate the model's predictions against the ground truth.
The test set is again one prediction_length longer than the validation set (or some multiple of prediction_length longer than the training set, for testing on multiple rolling windows).
validation_example = dataset['validation'][0]
validation_example.keys()
dict_keys(['start', 'target', 'feat_static_cat', 'feat_dynamic_real', 'item_id'])
The initial values of the validation example are exactly the same as those of the corresponding training example:
print(validation_example['start'])
print(validation_example['target'])
1979-01-01 00:00:00
[1149.8699951171875, 1053.8001708984375, ..., 5985.830078125]
However, this example has prediction_length=24 additional values compared to the training example. Let's verify it.
freq = "1M"
prediction_length = 24
assert len(train_example["target"]) + prediction_length == len(
validation_example["target"]
)
Let's visualize this:
import matplotlib.pyplot as plt
figure, axes = plt.subplots()
axes.plot(train_example["target"], color="blue")
axes.plot(validation_example["target"], color="red", alpha=0.5)
plt.show()
The first thing we'll do is convert the start feature of each time series to a pandas Period index, using the data's freq:
from functools import lru_cache
import pandas as pd
import numpy as np
@lru_cache(10_000)
def convert_to_pandas_period(date, freq):
return pd.Period(date, freq)
def transform_start_field(batch, freq):
batch["start"] = [convert_to_pandas_period(date, freq) for date in batch["start"]]
return batch
We now use datasets' set_transform functionality to apply this on the fly (note that we first grab the train and test splits we'll be working with, which the snippet below needs):
from functools import partial

train_dataset = dataset["train"]
test_dataset = dataset["test"]

train_dataset.set_transform(partial(transform_start_field, freq=freq))
test_dataset.set_transform(partial(transform_start_field, freq=freq))
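A quick sanity check (illustrative) that the transform is indeed applied on access:
print(train_dataset[0]["start"])
>>> 1979-01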
Next, let's instantiate a model. The model will be trained from scratch, hence we won't use the from_pretrained method here, but rather randomly initialize the model from a config.
We specify a couple of additional parameters to the model:
Let's use the default lags provided by GluonTS for the given frequency ("monthly"):
from gluonts.time_feature import get_lags_for_frequency
lags_sequence = get_lags_for_frequency(freq)
print(lags_sequence)
>>> [1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 23, 24, 25, 35, 36, 37]
This means that we'll look back up to 37 months for each time step, as additional features. Let's also check the default time features that GluonTS provides us with:
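As a side note, the largest lag also determines how far back the encoder needs to see. A quick illustrative calculation of the past window length we'll encounter in the batches later on (using the context_length of 2 * prediction_length that we'll configure below):
# context length (2 * 24 = 48) plus the largest lag (37) gives the length
# of the past_values that will be fed to the model:
print(prediction_length * 2 + max(lags_sequence))
>>> 85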
from gluonts.time_feature import time_features_from_frequency_str
time_features = time_features_from_frequency_str(freq)
print(time_features)
>>> [<function month_of_year at 0x7fa496d0ca70>]
In this case, there's only a single feature, namely "month of year". This means that for each time step, we'll add the month as a scalar value (e.g. 1 in case the timestamp is "january", 2 in case the timestamp is "february", etc.).
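We can inspect what this feature looks like on a few example timestamps (illustrative; note that GluonTS typically normalizes such features to a small numeric range rather than returning the raw month number 1 to 12):
index = pd.period_range(start="1979-01-01", periods=3, freq=freq)
for feature in time_features:
    print(feature(index))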
We now have everything we need to define the model:
from transformers import TimeSeriesTransformerConfig, TimeSeriesTransformerForPrediction
config = TimeSeriesTransformerConfig(
prediction_length=prediction_length,
# context length:
context_length=prediction_length * 2,
# lags coming from helper given the freq:
lags_sequence=lags_sequence,
# we'll add 2 time features ("month of year" and "age", see further):
num_time_features=len(time_features) + 1,
# we have a single static categorical feature, namely time series ID:
num_static_categorical_features=1,
# it has 366 possible values:
cardinality=[len(train_dataset)],
# the model will learn an embedding of size 2 for each of the 366 possible values:
embedding_dimension=[2],
# transformer params:
encoder_layers=4,
decoder_layers=4,
d_model=32,
)
model = TimeSeriesTransformerForPrediction(config)
Note that, similar to other models in the Transformers library, TimeSeriesTransformerModel corresponds to the encoder-decoder Transformer without any head on top, and TimeSeriesTransformerForPrediction corresponds to TimeSeriesTransformerModel with a distribution head on top. By default, the model uses a Student-t distribution (but this is configurable):
model.config.distribution_output
>>> student_t
This is an important difference, at the implementation level, with respect to Transformers for NLP, where the head typically consists of a fixed categorical distribution implemented as an nn.Linear layer.
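If a different head is desired, the config exposes a distribution_output parameter. A minimal sketch (assuming the values commonly accepted by this config in transformers, such as "normal" and "negative_binomial"):
config_normal = TimeSeriesTransformerConfig(
    prediction_length=prediction_length,
    context_length=prediction_length * 2,
    distribution_output="normal",  # instead of the default "student_t"
)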
Next, we define the transformations for the data, in particular the creation of the time features (based on the dataset or generic ones).
Again, we'll use the GluonTS library for this. We define a Chain of transformations (which is a bit comparable to torchvision.transforms.Compose for images). It allows us to combine several transformations into a single pipeline.
from gluonts.time_feature import (
time_features_from_frequency_str,
TimeFeature,
get_lags_for_frequency,
)
from gluonts.dataset.field_names import FieldName
from gluonts.transform import (
AddAgeFeature,
AddObservedValuesIndicator,
AddTimeFeatures,
AsNumpyArray,
Chain,
ExpectedNumInstanceSampler,
InstanceSplitter,
RemoveFields,
SelectFields,
SetField,
TestSplitSampler,
Transformation,
ValidationSplitSampler,
VstackFeatures,
RenameFields,
)
The transformations below are annotated with comments to explain what they do. At a high level, we will iterate over the individual time series of our dataset and add or remove certain fields or features:
from transformers import PretrainedConfig
def create_transformation(freq: str, config: PretrainedConfig) -> Transformation:
remove_field_names = []
if config.num_static_real_features == 0:
remove_field_names.append(FieldName.FEAT_STATIC_REAL)
if config.num_dynamic_real_features == 0:
remove_field_names.append(FieldName.FEAT_DYNAMIC_REAL)
if config.num_static_categorical_features == 0:
remove_field_names.append(FieldName.FEAT_STATIC_CAT)
# a bit like torchvision.transforms.Compose
return Chain(
# step 1: remove static/dynamic fields if not specified
[RemoveFields(field_names=remove_field_names)]
# step 2: convert the data to NumPy (potentially not needed)
+ (
[
AsNumpyArray(
field=FieldName.FEAT_STATIC_CAT,
expected_ndim=1,
dtype=int,
)
]
if config.num_static_categorical_features > 0
else []
)
+ (
[
AsNumpyArray(
field=FieldName.FEAT_STATIC_REAL,
expected_ndim=1,
)
]
if config.num_static_real_features > 0
else []
)
+ [
AsNumpyArray(
field=FieldName.TARGET,
# we expect an extra dim for the multivariate case:
expected_ndim=1 if config.input_size == 1 else 2,
),
# step 3: handle the NaN's by filling in the target with zero
# and return the mask (which is in the observed values)
# true for observed values, false for nan's
# the decoder uses this mask (no loss is incurred for unobserved values)
# see loss_weights inside the xxxForPrediction model
AddObservedValuesIndicator(
target_field=FieldName.TARGET,
output_field=FieldName.OBSERVED_VALUES,
),
# step 4: add temporal features based on freq of the dataset
# month of year in the case when freq="M"
# these serve as positional encodings
AddTimeFeatures(
start_field=FieldName.START,
target_field=FieldName.TARGET,
output_field=FieldName.FEAT_TIME,
time_features=time_features_from_frequency_str(freq),
pred_length=config.prediction_length,
),
# step 5: add another temporal feature (just a single number)
# tells the model where in its life the value of the time series is,
# sort of a running counter
AddAgeFeature(
target_field=FieldName.TARGET,
output_field=FieldName.FEAT_AGE,
pred_length=config.prediction_length,
log_scale=True,
),
# step 6: vertically stack all the temporal features into the key FEAT_TIME
VstackFeatures(
output_field=FieldName.FEAT_TIME,
input_fields=[FieldName.FEAT_TIME, FieldName.FEAT_AGE]
+ (
[FieldName.FEAT_DYNAMIC_REAL]
if config.num_dynamic_real_features > 0
else []
),
),
# step 7: rename to match HuggingFace names
RenameFields(
mapping={
FieldName.FEAT_STATIC_CAT: "static_categorical_features",
FieldName.FEAT_STATIC_REAL: "static_real_features",
FieldName.FEAT_TIME: "time_features",
FieldName.TARGET: "values",
FieldName.OBSERVED_VALUES: "observed_mask",
}
),
]
)
For training/validation/testing, we next create an InstanceSplitter which is used to sample windows from the dataset (as, remember, we can't pass the entire history of values to the Transformer due to time and memory constraints).
The instance splitter samples random windows of size context_length and a subsequent window of size prediction_length from the data, and appends a past_ or future_ prefix to any temporal keys for the respective windows. This makes sure that values will be split into a past_values key and a subsequent future_values key, which will serve as the encoder and decoder inputs respectively. The same happens for all keys in the time_series_fields argument:
from gluonts.transform.sampler import InstanceSampler
from typing import Optional
def create_instance_splitter(
config: PretrainedConfig,
mode: str,
train_sampler: Optional[InstanceSampler] = None,
validation_sampler: Optional[InstanceSampler] = None,
) -> Transformation:
assert mode in ["train", "validation", "test"]
instance_sampler = {
"train": train_sampler
or ExpectedNumInstanceSampler(
num_instances=1.0, min_future=config.prediction_length
),
"validation": validation_sampler
or ValidationSplitSampler(min_future=config.prediction_length),
"test": TestSplitSampler(),
}[mode]
return InstanceSplitter(
target_field="values",
is_pad_field=FieldName.IS_PAD,
start_field=FieldName.START,
forecast_start_field=FieldName.FORECAST_START,
instance_sampler=instance_sampler,
past_length=config.context_length + max(config.lags_sequence),
future_length=config.prediction_length,
time_series_fields=["time_features", "observed_mask"],
)
With the data in hand, the next step is to create the PyTorch DataLoaders, which allow us to have batches of (input, output) pairs, i.e. (past_values, future_values).
from typing import Iterable
import torch
from gluonts.itertools import Cached, Cyclic
from gluonts.dataset.loader import as_stacked_batches
def create_train_dataloader(
config: PretrainedConfig,
freq,
data,
batch_size: int,
num_batches_per_epoch: int,
shuffle_buffer_length: Optional[int] = None,
cache_data: bool = True,
**kwargs,
) -> Iterable:
PREDICTION_INPUT_NAMES = [
"past_time_features",
"past_values",
"past_observed_mask",
"future_time_features",
]
if config.num_static_categorical_features > 0:
PREDICTION_INPUT_NAMES.append("static_categorical_features")
if config.num_static_real_features > 0:
PREDICTION_INPUT_NAMES.append("static_real_features")
TRAINING_INPUT_NAMES = PREDICTION_INPUT_NAMES + [
"future_values",
"future_observed_mask",
]
transformation = create_transformation(freq, config)
transformed_data = transformation.apply(data, is_train=True)
if cache_data:
transformed_data = Cached(transformed_data)
# we initialize a Training instance
instance_splitter = create_instance_splitter(config, "train")
# the instance splitter will sample a window of
# context length + lags + prediction length (from the 366 possible transformed time series)
# randomly from within the target time series and return an iterator.
stream = Cyclic(transformed_data).stream()
training_instances = instance_splitter.apply(
stream, is_train=True
)
return as_stacked_batches(
training_instances,
batch_size=batch_size,
shuffle_buffer_length=shuffle_buffer_length,
field_names=TRAINING_INPUT_NAMES,
output_type=torch.tensor,
num_batches_per_epoch=num_batches_per_epoch,
)
def create_test_dataloader(
config: PretrainedConfig,
freq,
data,
batch_size: int,
**kwargs,
):
PREDICTION_INPUT_NAMES = [
"past_time_features",
"past_values",
"past_observed_mask",
"future_time_features",
]
if config.num_static_categorical_features > 0:
PREDICTION_INPUT_NAMES.append("static_categorical_features")
if config.num_static_real_features > 0:
PREDICTION_INPUT_NAMES.append("static_real_features")
transformation = create_transformation(freq, config)
transformed_data = transformation.apply(data, is_train=False)
# we create a Test Instance splitter which will sample the very last
# context window seen during training only for the encoder.
instance_sampler = create_instance_splitter(config, "test")
# we apply the transformations in test mode
testing_instances = instance_sampler.apply(transformed_data, is_train=False)
return as_stacked_batches(
testing_instances,
batch_size=batch_size,
output_type=torch.tensor,
field_names=PREDICTION_INPUT_NAMES,
)
train_dataloader = create_train_dataloader(
config=config,
freq=freq,
data=train_dataset,
batch_size=256,
num_batches_per_epoch=100,
)
test_dataloader = create_test_dataloader(
config=config,
freq=freq,
data=test_dataset,
batch_size=64,
)
Let's check the first batch:
batch = next(iter(train_dataloader))
for k, v in batch.items():
print(k, v.shape, v.type())
>>> past_time_features torch.Size([256, 85, 2]) torch.FloatTensor
past_values torch.Size([256, 85]) torch.FloatTensor
past_observed_mask torch.Size([256, 85]) torch.FloatTensor
future_time_features torch.Size([256, 24, 2]) torch.FloatTensor
static_categorical_features torch.Size([256, 1]) torch.LongTensor
future_values torch.Size([256, 24]) torch.FloatTensor
future_observed_mask torch.Size([256, 24]) torch.FloatTensor
As can be seen, we don't feed input_ids and attention_mask to the encoder (as would be the case for NLP models), but rather past_values, along with past_observed_mask, past_time_features, static_categorical_features and static_real_features.
The decoder inputs consist of future_values, future_observed_mask and future_time_features. The future_values can be seen as the equivalent of decoder_input_ids in NLP.
Let's perform a forward pass with the batch we just created:
# perform forward pass
outputs = model(
past_values=batch["past_values"],
past_time_features=batch["past_time_features"],
past_observed_mask=batch["past_observed_mask"],
static_categorical_features=batch["static_categorical_features"]
if config.num_static_categorical_features > 0
else None,
static_real_features=batch["static_real_features"]
if config.num_static_real_features > 0
else None,
future_values=batch["future_values"],
future_time_features=batch["future_time_features"],
future_observed_mask=batch["future_observed_mask"],
output_hidden_states=True,
)
print("Loss:", outputs.loss.item())
>>> Loss: 9.069628715515137
Note that the model is returning a loss. This is possible because the decoder automatically shifts the future_values one position to the right in order to obtain the labels. This allows computing a loss between the predicted values and the labels.
Also note that the decoder uses a causal mask to avoid looking into the future, as the values it needs to predict are in the future_values tensor.
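To make the shift-right idea concrete, here is a minimal, purely illustrative sketch; it is not the model's internal code (which additionally constructs lagged features), just the teacher-forcing principle:
import torch

future_values_toy = torch.tensor([[10.0, 11.0, 12.0]])  # (batch, prediction_length)
# the decoder input at step t is the ground-truth value at step t - 1;
# position 0 would come from the end of the past window (placeholder 0 here):
decoder_input_toy = torch.roll(future_values_toy, shifts=1, dims=1)
decoder_input_toy[:, 0] = 0.0
print(decoder_input_toy)
>>> tensor([[ 0., 10., 11.]])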
Time to train the model! We'll use a standard PyTorch training loop.
We will use the Accelerate library here, which automatically places the model, optimizer and dataloader on the appropriate device.
from accelerate import Accelerator
from torch.optim import AdamW
accelerator = Accelerator()
device = accelerator.device
model.to(device)
optimizer = AdamW(model.parameters(), lr=6e-4, betas=(0.9, 0.95), weight_decay=1e-1)
model, optimizer, train_dataloader = accelerator.prepare(
model,
optimizer,
train_dataloader,
)
model.train()
for epoch in range(40):
for idx, batch in enumerate(train_dataloader):
optimizer.zero_grad()
outputs = model(
static_categorical_features=batch["static_categorical_features"].to(device)
if config.num_static_categorical_features > 0
else None,
static_real_features=batch["static_real_features"].to(device)
if config.num_static_real_features > 0
else None,
past_time_features=batch["past_time_features"].to(device),
past_values=batch["past_values"].to(device),
future_time_features=batch["future_time_features"].to(device),
future_values=batch["future_values"].to(device),
past_observed_mask=batch["past_observed_mask"].to(device),
future_observed_mask=batch["future_observed_mask"].to(device),
)
loss = outputs.loss
# Backpropagation
accelerator.backward(loss)
optimizer.step()
if idx % 100 == 0:
print(loss.item())
At inference time, it's recommended to use the generate() method for autoregressive generation, similar to NLP models.
Forecasting involves getting data from the test instance sampler, which will sample the very last context_length sized window of values from each time series in the dataset, and pass it to the model. Note that we pass the future_time_features, which are known ahead of time, to the decoder.
The model will autoregressively sample a certain number of values from the predicted distribution and pass them back to the decoder to return the prediction outputs:
model.eval()
forecasts = []
for batch in test_dataloader:
outputs = model.generate(
static_categorical_features=batch["static_categorical_features"].to(device)
if config.num_static_categorical_features > 0
else None,
static_real_features=batch["static_real_features"].to(device)
if config.num_static_real_features > 0
else None,
past_time_features=batch["past_time_features"].to(device),
past_values=batch["past_values"].to(device),
future_time_features=batch["future_time_features"].to(device),
past_observed_mask=batch["past_observed_mask"].to(device),
)
forecasts.append(outputs.sequences.cpu().numpy())
The model outputs a tensor of shape (batch_size, number of samples, prediction length).
In this case, we get 100 possible values for the next 24 months, for each example in the batch (which is of size 64):
forecasts[0].shape
>>> (64, 100, 24)
We'll stack them vertically, to get forecasts for all time series in the test dataset:
forecasts = np.vstack(forecasts)
print(forecasts.shape)
>>> (366, 100, 24)
We can evaluate the resulting forecast with respect to the ground truth out-of-sample values present in the test set. Here we use the MASE and sMAPE metrics, which we compute for each time series in the dataset:
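For reference, these are the standard definitions of the two metrics (with $H$ the prediction length, $T$ the length of the training series, and $m$ the seasonal periodicity):
$$\mathrm{MASE} = \frac{\frac{1}{H}\sum_{t=T+1}^{T+H} |Y_t - \hat{Y}_t|}{\frac{1}{T-m}\sum_{t=m+1}^{T} |Y_t - Y_{t-m}|} \qquad \mathrm{sMAPE} = \frac{1}{H}\sum_{t=T+1}^{T+H} \frac{2\,|Y_t - \hat{Y}_t|}{|Y_t| + |\hat{Y}_t|}$$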
from evaluate import load
from gluonts.time_feature import get_seasonality
mase_metric = load("evaluate-metric/mase")
smape_metric = load("evaluate-metric/smape")
forecast_median = np.median(forecasts, 1)
mase_metrics = []
smape_metrics = []
for item_id, ts in enumerate(test_dataset):
training_data = ts["target"][:-prediction_length]
ground_truth = ts["target"][-prediction_length:]
mase = mase_metric.compute(
predictions=forecast_median[item_id],
references=np.array(ground_truth),
training=np.array(training_data),
periodicity=get_seasonality(freq))
mase_metrics.append(mase["mase"])
smape = smape_metric.compute(
predictions=forecast_median[item_id],
references=np.array(ground_truth),
)
smape_metrics.append(smape["smape"])
print(f"MASE: {np.mean(mase_metrics)}")
>>> MASE: 1.2564196892177717
print(f"sMAPE: {np.mean(smape_metrics)}")
>>> sMAPE: 0.1609541520852549
We can also plot the individual metrics of each time series in the dataset against each other, and observe that a handful of time series contribute a lot to the final test metrics:
plt.scatter(mase_metrics, smape_metrics, alpha=0.3)
plt.xlabel("MASE")
plt.ylabel("sMAPE")
plt.show()
To plot the prediction for any time series with respect to the ground truth test data, we define the following helper plot function:
import matplotlib.dates as mdates

def plot(ts_index):
    fig, ax = plt.subplots()

    index = pd.period_range(
        start=test_dataset[ts_index][FieldName.START],
        periods=len(test_dataset[ts_index][FieldName.TARGET]),
        freq=freq,
    ).to_timestamp()

    # Major ticks every half year, minor ticks every month
    ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))
    ax.xaxis.set_minor_locator(mdates.MonthLocator())

    ax.plot(
        index[-2 * prediction_length :],
        test_dataset[ts_index]["target"][-2 * prediction_length :],
        label="actual",
    )

    plt.plot(
        index[-prediction_length:],
        np.median(forecasts[ts_index], axis=0),
        label="median",
    )

    plt.fill_between(
        index[-prediction_length:],
        forecasts[ts_index].mean(0) - forecasts[ts_index].std(axis=0),
        forecasts[ts_index].mean(0) + forecasts[ts_index].std(axis=0),
        alpha=0.3,
        interpolate=True,
        label="+/- 1-std",
    )
    plt.legend()
    plt.show()
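We can then plot the forecast for any series by its index (any value between 0 and 365 works), for example:
plot(334)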
As time series researchers know, there has been a lot of interest in applying Transformer-based models to the time series problem. The vanilla Transformer is just one of many attention-based models, so more models need to be added to the library.
At the moment, nothing is stopping us from modeling multivariate time series either, but for that one would need to instantiate the model with a multivariate distribution head. Diagonal independent distributions are currently supported, with other multivariate distributions to be added. Stay tuned for future blog posts with tutorials.
Finally, the NLP/CV domain has benefitted tremendously from large pre-trained models, while, as far as we are aware, this is not the case for the time series domain. Transformer-based models seem like the obvious choice for pursuing this avenue of research, and we cannot wait to see what researchers and practitioners will come up with!