
[Python] Minor-Altcoin "Straw Millionaire" Automated Trading Project [XLM] [Cryptocurrency] (updated 2022/06/29)

https://cdn.apollon.ai/media/notebox/d590ea54-7ba4-4fa8-a8a8-b07c57270123.jpeg
[Important] My Google account was disabled, so I have recreated my account, migrating from "ゲーム自動化研究所@Polaris" to "ゲーム自動化研究所Ver2@Polaris". If you were following the old account, I would appreciate it if you followed the new one again.

1. Update History

  1. 2022/6/22 First version posted
  2. 2022/6/25 Post data migrated
  3. 2022/6/26 Added "Chapter 4: Data Visualization"
  4. 2022/6/29 Added "Chapter 5: Technical Indicators"



2. Introduction


2.1. About This Project

This is a project that tries to build up assets, straw-millionaire style, starting from a rock-bottom minor altcoin: Stellar Lumens.
Stellar Lumens is a bottom-tier coin traded under the ticker XLM.

2.2. Aren't There Even More Bottom-Tier Cryptocurrencies?

Symbol (XYM) and NEM (XEM) are even lower-priced, but I am excluding them because the bitFlyer API does not support them.
*So "bottom-tier" here means the lowest-tier coin among those listed on bitFlyer.
With a minor coin like this, risk stays minimal while I learn trading programming, data science for price prediction, and how to run an auto-trading server, and if it goes well I build assets too: a greedy four-birds-with-one-stone project.


3. Fetching Price Data


3.1. Data Sources

We fetch data from bitFlyer and from Inago Flyer.
Other sites may be added later as the analysis progresses.

3.2. Setting Up the Virtual Environment

We build it with conda (I happen to use Miniconda).
conda create -n PolaMbot
Then enter the environment:
conda activate PolaMbot

3.3. Installing Packages

Install the required packages:
pip install ccxt
pip install websocket-client==0.47.0
pip install pandas
pip install selenium
Since selenium drives Chrome through chromedriver, download it in advance from here.
For chromedriver and selenium setup, see ▼▼▼this article▼▼▼.

3.4. 取得プログラム

import json
import websocket
import os 
import time
from time import sleep

from logging import getLogger,INFO,StreamHandler
logger = getLogger(__name__)
handler = StreamHandler()
handler.setLevel(INFO)
logger.setLevel(INFO)
logger.addHandler(handler)

import pprint
from datetime import datetime as dt

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


"""

Dataset collection program.

"""
# TARGET_PRODUCT = "BTC"
TARGET_PRODUCT = "XLM"
# TARGET_PRODUCT = "ETH"

class RealtimeBFlyInago(object):
    def __init__(self, url, channel, webdriver):
        #########################################################
        # Initialization
        #
        self.url = url
        self.channel = channel
        #
        # Set the save directory
        self.save_dir = '../log/{}/output'.format(TARGET_PRODUCT)
        #
        # Create the save directory if it does not exist
        os.makedirs(self.save_dir, exist_ok=True)

        self.webdriver = webdriver
        
        self.ws = websocket.WebSocketApp(self.url,header=None,on_open=self.on_open, on_message=self.on_message, on_error=self.on_error, on_close=self.on_close)
        websocket.enableTrace(True)


    # -------------------------------------------------------------------------------------------
    # BF api
    #
    def run(self):
        # run_forever() loops; press Ctrl+C to raise KeyboardInterrupt and exit.
        self.ws.run_forever()   
        logger.info('Web Socket process ended.')
    """
    Below are callback functions of websocket.
    """
    #########################################################
    # This is the function you will mostly be editing.
    #
    # when we get message
    def on_message(self, ws, message):
        output = json.loads(message)['params']
        # ---------------------------------------------------------
        # csv write 
        #
        # Convert the current datetime into strings.
        #
        tdatetime = dt.now()
        tstr   = tdatetime.strftime('%Y%m%d')
        tstr2  = tdatetime.strftime('%Y-%m-%d %H:%M:%S.%f')
        tstr_hour  = tdatetime.strftime('%H')
        tstr_min   = tdatetime.strftime('%M')
        tstr3  = tdatetime.strftime('%Y%m%d-%H%M%S%f')
        t_unix = tdatetime.timestamp()
        #
        # create dir 
        # Create a folder for the date.
        #
        self.save_dir_day = self.save_dir + "/" + tstr
        os.makedirs(self.save_dir_day, exist_ok=True)
        #
        #
        # Create a single folder inside the date folder.
        #
        self.save_dir_day_single = self.save_dir + "/" + tstr + "/single/" + tstr_hour + "/" + tstr_min
        os.makedirs(self.save_dir_day_single, exist_ok=True)

        # --------------------------
        # inago
        #
        total_ask_vol, total_bid_vol = self._get_inago_vol()
        output['message']["total_ask_vol"] = total_ask_vol
        output['message']["total_bid_vol"] = total_bid_vol


        # --------------------------
        # create json files
        #
        # Each order-book snapshot is saved as one json file.
        #
        with open('{}/{}-{}.json'.format(self.save_dir_day_single, tstr3, t_unix), 'w') as f:
            json.dump(output['message'], f, indent=2, ensure_ascii=False)
        print("time : {}, unix : {}".format(tstr2, t_unix))

    # when error occurs
    def on_error(self, ws, error):
        logger.error(error)
    # when websocket closed.
    def on_close(self, ws):
        logger.info('disconnected streaming server')
    # when websocket opened.
    def on_open(self, ws):
        logger.info('connected streaming server')
        output_json = json.dumps(
            {'method' : 'subscribe',
            'params' : {'channel' : self.channel}
            }
        )
        ws.send(output_json)

    # -------------------------------------------------------------------------------------------
    # Inago
    #
    def _get_inago_vol(self):
        # Default to "0" so we never return unbound values
        # before the Inago Flyer page has finished loading.
        total_ask_vol, total_bid_vol = "0", "0"
        for buyvol in self.webdriver.find_elements_by_id("buyVolumePerMeasurementTime"):
            total_ask_vol = buyvol.text
        for sellvol in self.webdriver.find_elements_by_id("sellVolumePerMeasurementTime"):
            total_bid_vol = sellvol.text
        return total_ask_vol, total_bid_vol
        
if __name__ == '__main__':

    # -----------------------------------------------
    # Inago
    #
    options = Options()
    # Enable headless mode; without it, a Chrome window will be shown
    options.add_argument('--headless')

    # Windows
    # webdriver  = webdriver.Chrome(options=options, executable_path=r'modules\chromedriver_win32\chromedriver.exe')

    # Linux (note: use forward slashes for paths on Linux)
    webdriver  = webdriver.Chrome(options=options, executable_path=r'modules/chromedriver_linux64_v101/chromedriver')

    # Inago flyer URL
    webdriver.get("https://inagoflyer.appspot.com/btcmac")

    # -----------------------------------------------
    # BF API endpoint
    #
    url = 'wss://ws.lightstream.bitflyer.com/json-rpc'
    # channel = 'lightning_board_snapshot_BTC_JPY'
    channel = 'lightning_board_snapshot_{}_JPY'.format(TARGET_PRODUCT)
    json_rpc = RealtimeBFlyInago(url=url, channel=channel, webdriver=webdriver)

    #########################################################
    # Recover 10 seconds after any error
    #
    while 1:
        try:        
            json_rpc.run()
            # exit with Ctrl+C
        except Exception as e:
            print(e)
            time.sleep(10)

3.5. Running the Program

Run the program above:
python RealtimeBFlyInago.py
You should then see log output like this:
(PolaMbot) H:\マイドライブ\PROJECT\503_PolaMbot_v5>python RealtimeBFlyInago.py
H:\マイドライブ\PROJECT\503_PolaMbot_v5\RealtimeBFlyInago.py:144: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  webdriver  = webdriver.Chrome(options=options, executable_path=r'modules\chromedriver_win32\chromedriver.exe')

DevTools listening on ws://127.0.0.1:53496/devtools/browser/7fc549e0-0b84-493e-a36f-59137e82739f
[0622/003944.027:INFO:CONSOLE(62)] "Mixed Content: The page at 'https://inagoflyer.appspot.com/btcmac' was loaded over HTTPS, but requested an insecure frame 'http://developers.google.com/#_methods=onPlusOne%2C_ready%2C_close%2C_open%2C_resizeMe%2C_renderstart%2Concircled%2Cdrefresh%2Cerefresh%2Conload&id=I0_1655825983700&_gfid=I0_1655825983700&parent=https%3A%2F%2Finagoflyer.appspot.com&pfname=&rpctoken=68181274'. This request has been blocked; the content must be served over HTTPS.", source: https://apis.google.com/js/platform.js (62)
--- request header ---
GET /json-rpc HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Host: ws.lightstream.bitflyer.com
Origin: http://ws.lightstream.bitflyer.com
Sec-WebSocket-Key: dUufRdTTFLl/nE1fE7PJBw==
Sec-WebSocket-Version: 13


-----------------------
--- response header ---
HTTP/1.1 101 Switching Protocols
Date: Tue, 21 Jun 2022 15:39:52 GMT
Connection: upgrade
Upgrade: websocket
Sec-WebSocket-Accept: im5K7JSbgAB+5L0le2ODye31XAE=
-----------------------
connected streaming server
send: b'\x81\xd2\x9d\xbb\xeaG\xe6\x99\x87"\xe9\xd3\x85#\xbf\x81\xcae\xee\xce\x884\xfe\xc9\x83%\xf8\x99\xc6g\xbf\xcb\x8b5\xfc\xd6\x99e\xa7\x9b\x91e\xfe\xd3\x8b)\xf3\xde\x86e\xa7\x9b\xc8+\xf4\xdc\x823\xf3\xd2\x84 \xc2\xd9\x85&\xef\xdf\xb54\xf3\xda\x9a4\xf5\xd4\x9e\x18\xc5\xf7\xa7\x18\xd7\xeb\xb3e\xe0\xc6'
H:\マイドライブ\PROJECT\503_PolaMbot_v5\RealtimeBFlyInago.py:128: DeprecationWarning: find_elements_by_id is deprecated. Please use find_elements(by=By.ID, value=id_) instead
  for buyvol in self.webdriver.find_elements_by_id("buyVolumePerMeasurementTime"):
H:\マイドライブ\PROJECT\503_PolaMbot_v5\RealtimeBFlyInago.py:130: DeprecationWarning: find_elements_by_id is deprecated. Please use find_elements(by=By.ID, value=id_) instead
  for sellvol in self.webdriver.find_elements_by_id("sellVolumePerMeasurementTime"):
time : 2022-06-22 00:39:48.908462, unix : 1655825988.908462, total_ask_vol : 0.02, total_bid_vol : 0.41
time : 2022-06-22 00:39:53.996633, unix : 1655825993.996633, total_ask_vol : 6.31, total_bid_vol : 14.77

3.6. Reading the Logs

The logs are saved under the specified path. By default, a log folder is created one level above the directory the program runs in.
Each file contains the traded price and the order-book entries.

At the bottom is the Inago Flyer information.
An example log file is shown below.
{
  "mid_price": 15.928,
  "bids": [
    {
      "price": 7.0,
      "size": 12975.0
    },
    {
      "price": 7.4,
      "size": 10.0
    },
    {
      "price": 7.6,
      "size": 10.0
    },
    ...
    {
      "price": 33.0,
      "size": 5000.0
    },
    {
      "price": 33.3,
      "size": 13968.0
    },
    {
      "price": 34.0,
      "size": 5000.0
    },
    {
      "price": 35.0,
      "size": 5000.0
    },
    {
      "price": 36.0,
      "size": 5000.0
    }
  ],
  "total_ask_vol": "186.14",
  "total_bid_vol": "45.29"
}
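Each snapshot file can be read back with the standard json module. One thing to watch: the Inago Flyer volumes are stored as strings, so cast them before doing arithmetic. A minimal sketch (the sample values are taken from the log above):

```python
import json

# A sample snapshot shaped like the log files above; in practice you
# would obtain this with json.load() on a file handle instead.
snapshot = {
    "mid_price": 15.928,
    "bids": [{"price": 7.0, "size": 12975.0}, {"price": 7.4, "size": 10.0}],
    "total_ask_vol": "186.14",
    "total_bid_vol": "45.29",
}

loaded = json.loads(json.dumps(snapshot))

# The volumes are strings in the log, so convert to float first.
ask = float(loaded["total_ask_vol"])
bid = float(loaded["total_bid_vol"])
best_bid = max(b["price"] for b in loaded["bids"])
print(round(ask + bid, 2), best_bid)  # → 231.43 7.4
```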

3.7. Setting Up a Server

I could run this on the desktop I use every day, but having to restart the program whenever the machine bogs down and needs a reboot is a hassle, so instead I installed Kali Linux on a laptop (a Let's note), turned it into a server, and let the collector run there.
*When repurposing a laptop as a server, clean out the inside properly and run it with the battery removed.

There are plenty of helpful articles on Kali Linux, so I will skip the setup here. It is very convenient.

Since it is Debian-based, I set up the Miniconda environment by following this site.
Building the virtual environment is the same as above.
Also, make sure to download the Linux build of chromedriver.
Then copy the code over to the Kali box and run:
nohup python RealtimeBFlyInago.py > outlog.txt &
This way, even if you start it over SSH and then close the SSH connection, the task keeps running and data collection continues.
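The nohup pattern can be tried with a tiny self-contained demo (the file name `outlog_demo.txt` here is just a placeholder):

```shell
# Start a background job that ignores the hangup signal;
# its stdout and stderr go to the log file.
nohup sh -c 'echo collecting' > outlog_demo.txt 2>&1 &
pid=$!
# The job runs detached; here we just wait for it to finish and inspect the log.
wait "$pid"
cat outlog_demo.txt
```

To stop the real collector later, find its PID with `ps` and send it a signal with `kill`.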

class: center, middle

4. Data Visualization


4.1. Shaping the Log Data

4.1.1. Log Data

The log data has the following format:
{
  "mid_price": 15.928,
  "bids": [
    {
      "price": 7.0,
      "size": 12975.0
    },
    ...
    {
      "price": 36.0,
      "size": 5000.0
    }
  ],
  "total_ask_vol": "186.14",
  "total_bid_vol": "45.29"
}

4.1.2. Merging the Data

The program used is shown below (*excerpt).
We read the log files from the list one at a time and turn each into a DataFrame:
for log_path in tqdm(log_file_list):
            with open(log_path) as f:
                log_data_dict = json.load(f)

            # --------------
            # time stamp
            #
            # os.path.basename keeps this portable across Windows and Linux paths
            t_str = float(os.path.basename(log_path).split("-")[-1].split(".json")[0])

            # --------------
            # dataframe
            #
            _one_time = pd.DataFrame(  data = [[t_str, log_data_dict["mid_price"], 
                                        float(log_data_dict["total_ask_vol"]), 
                                        float(log_data_dict["total_bid_vol"]), 
                                        float(log_data_dict["total_ask_vol"])+float(log_data_dict["total_bid_vol"])]], 
                                        columns = self.load_data_columns,
                                    )
The converted DataFrames are then merged into a single DataFrame:
# --------------
            # merge
            #
            realtime_total_df = pd.concat([realtime_total_df, _one_time])
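As a side note, calling pd.concat inside the loop copies the whole frame on every iteration, which is quadratic in the number of files; collecting rows in a list and building the frame once is usually much faster. A pandas-only sketch of that alternative (the sample dicts stand in for json.load() results, column names match load_data_columns):

```python
import pandas as pd

load_data_columns = ["timestamp", "mid_price", "total_ask_vol", "total_bid_vol", "total_vol"]

# Pretend these dicts came from json.load() on each log file.
logs = [
    {"mid_price": 15.9, "total_ask_vol": "1.0", "total_bid_vol": "2.0"},
    {"mid_price": 16.0, "total_ask_vol": "3.0", "total_bid_vol": "4.0"},
]

rows = []
for ts, d in enumerate(logs):
    ask, bid = float(d["total_ask_vol"]), float(d["total_bid_vol"])
    rows.append([float(ts), d["mid_price"], ask, bid, ask + bid])

# Build the DataFrame once at the end instead of concatenating per file.
realtime_total_df = pd.DataFrame(rows, columns=load_data_columns)
print(len(realtime_total_df))  # → 2
```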

4.1.3. Preparing to Build the Candle Data

We shape the merged DataFrame into the candle format below:
Low    High    Open   Close  total_vol  total_bid_vol  \
time2
2022-06-21 23:56:00  16.134  16.151  16.151  16.134     849.46         565.39
2022-06-21 23:57:00  16.130  16.136  16.134  16.135    2746.67        1832.74
2022-06-21 23:58:00  16.124  16.139  16.133  16.139    1914.65         986.43
First, convert the unix timestamps into datetimes, then aggregate into candle data:
# --------------
        # unix time -> data time
        #
        realtime_total_df = self.df_unix2date(realtime_total_df=realtime_total_df)
        logger.info("{:=^60}".format(" realtime_total_df "))
        print(realtime_total_df[:10])
        realtime_total_df.to_csv(self.datasets_path + "/" + self.realtime_total_name)

        # --------------
        # Candle Data
        #
        candle_summary = self.totalization(realtime_total_df=realtime_total_df)
        logger.info("{:=^60}".format(" candle_summary_df "))
        print(candle_summary[:10])
        candle_summary.to_csv(self.datasets_path + "/" + self.option.candle_summary_name1)
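The unix-to-datetime step boils down to datetime.fromtimestamp plus flooring to the minute; a stdlib-only sketch of what df_unix2date does per row (the timestamp is taken from the log output above):

```python
from datetime import datetime

t_unix = 1655825988.908462  # unix timestamp from a log file name
t = datetime.fromtimestamp(t_unix)

# Floor to the minute so every snapshot within the same minute
# shares one grouping key (this becomes the 'time2' column).
t_minute = t.replace(second=0, microsecond=0)
print(t_minute.second, t_minute.microsecond)  # → 0 0
```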

4.1.4. The Aggregation Program

Since I also want to call this aggregation from outside the class, it is marked @staticmethod.
@staticmethod
    def totalization(realtime_total_df):
        # Aggregate per-minute low/high (min, max), open/close (first, last), and volume (sum)
        candle_summary = realtime_total_df[['time2', 'mid_price']].groupby(['time2']).min().rename(columns={'mid_price': 'Low'})

        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'mid_price']].groupby(['time2']).max().rename(columns={'mid_price': 'High'}),
            left_index=True, right_index=True)

        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'mid_price']].groupby(['time2']).first().rename(columns={'mid_price': 'Open'}),
            left_index=True, right_index=True)
        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'mid_price']].groupby(['time2']).last().rename(columns={'mid_price': 'Close'}),
            left_index=True, right_index=True)
        
        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'total_vol']].groupby(['time2']).sum(),
            left_index=True, right_index=True)

        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'total_bid_vol']].groupby(['time2']).sum(),
            left_index=True, right_index=True)

        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'total_ask_vol']].groupby(['time2']).sum(),
            left_index=True, right_index=True)
        return candle_summary
This completes the conversion to candle data.
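For reference, the chain of groupby/merge calls above can be collapsed with pandas' built-in resampling. A sketch assuming a DatetimeIndex over mid_price (note that ohlc() emits lowercase open/high/low/close column names, unlike the Low/High/Open/Close used above):

```python
import pandas as pd

idx = pd.to_datetime(
    ["2022-06-21 23:56:10", "2022-06-21 23:56:40", "2022-06-21 23:57:05"]
)
df = pd.DataFrame(
    {"mid_price": [16.151, 16.134, 16.134], "total_vol": [1.0, 2.0, 3.0]},
    index=idx,
)

# ohlc() gives open/high/low/close per 1-minute bin; volumes are summed.
candle = df["mid_price"].resample("1min").ohlc()
candle["total_vol"] = df["total_vol"].resample("1min").sum()
print(candle["open"].tolist())  # → [16.151, 16.134]
```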

4.2. Visualization

The program below produces a chart like the following:
def visual_summary(self, candle_summary_name):

        candle_summary = pd.read_csv(self.datasets_path + '/' + candle_summary_name, index_col=0, parse_dates=True)
        print(candle_summary)

        # -------------------
        # Chart settings
        #
        plt.rcParams['figure.figsize']=100,20
        fig, ax1 = plt.subplots(1)

        # >>>>>>>>>>>>>>>>>>>>>>>>>>>
        # candle plot
        #        
        ax1 = self._candle_chart(stock_prices=candle_summary[:100], ax1=ax1)

        # ---------------
        # label plot
        #
        ax1.set_title( "log data", fontsize=14)
        plt.xticks(rotation=90)
        plt.grid(True)
        plt.savefig(self.option.technical_graph_dir + "/candle_summary.png",bbox_inches = 'tight', pad_inches = 0.05)

4.3. Full Program

import glob
import re
import os
import argparse
import pprint
import json

import pandas as pd
import datetime
from tqdm import tqdm


# --------
# technical
#
import talib as ta
import talib

# --------
# logger
#
from logging import getLogger, config
with open('./log_config.json', 'r') as f:
    log_conf = json.load(f)
config.dictConfig(log_conf)
# From here on, the usual logger setup
logger = getLogger(__name__)

# --------
# plot
#
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import matplotlib.dates as mdates

import seaborn as sns
# sns.set()
# plt.rcParams["image.cmap"] = "jet"
import matplotlib.dates as mpl_dates

# --------
# pycaret
#
import warnings
# Hide unneeded warnings
warnings.filterwarnings("ignore")
from pycaret.regression import *

class CryptoAnalysis():
    def __init__(self, option):
        #########################################################
        # Initialization
        #
        self.option             = option
        self.log_dir            = option.log_dir
        self.load_data_columns  = ["timestamp", "mid_price","total_ask_vol", "total_bid_vol", "total_vol"]
        self.datasets_path      = option.datasets_path
        self.realtime_total_name= option.realtime_total_name
        # self.analysis_range     = 
        logger.info('Initialization is complete.')

    def get_datasets(self, datasets_name):
        logger.info("{:=^60}".format(" datasets_df "))
        datasets_df = pd.read_csv(self.datasets_path + '/' + datasets_name, index_col=0)
        print(datasets_df)
        return datasets_df

    def load_datasets(self):
        log_file_list           = self._load_log_list()
        logger.info("log_file_list : {}".format(len(log_file_list)))
        self.log_data           = self._load_log_data(log_file_list)




    def _load_log_list(self):
        log_file_list = sorted(glob.glob(self.log_dir + r"\**\*.json", recursive=True))
        return log_file_list

    @staticmethod
    def df_unix2date(realtime_total_df):
        realtime_total_df['time'] = realtime_total_df['timestamp'].apply(datetime.datetime.fromtimestamp)
        realtime_total_df['time2'] = realtime_total_df['time'].map(lambda x: x.replace(second=0, microsecond=0))
        realtime_total_df = realtime_total_df.set_index("time")
        return realtime_total_df
    
    @staticmethod
    def totalization(realtime_total_df):
        # Aggregate per-minute low/high (min, max), open/close (first, last), and volume (sum)
        candle_summary = realtime_total_df[['time2', 'mid_price']].groupby(['time2']).min().rename(columns={'mid_price': 'Low'})

        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'mid_price']].groupby(['time2']).max().rename(columns={'mid_price': 'High'}),
            left_index=True, right_index=True)

        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'mid_price']].groupby(['time2']).first().rename(columns={'mid_price': 'Open'}),
            left_index=True, right_index=True)
        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'mid_price']].groupby(['time2']).last().rename(columns={'mid_price': 'Close'}),
            left_index=True, right_index=True)
        
        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'total_vol']].groupby(['time2']).sum(),
            left_index=True, right_index=True)

        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'total_bid_vol']].groupby(['time2']).sum(),
            left_index=True, right_index=True)

        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'total_ask_vol']].groupby(['time2']).sum(),
            left_index=True, right_index=True)
        return candle_summary

    def _load_log_data(self, log_file_list, visual=True):

        logger.info("load_data_columns : {}".format(self.load_data_columns))

        realtime_total_df = pd.DataFrame(data={}, columns=self.load_data_columns)


        for log_path in tqdm(log_file_list):
            with open(log_path) as f:
                log_data_dict = json.load(f)

            # --------------
            # time stamp
            #
            # os.path.basename keeps this portable across Windows and Linux paths
            t_str = float(os.path.basename(log_path).split("-")[-1].split(".json")[0])

            # --------------
            # dataframe
            #
            _one_time = pd.DataFrame(  data = [[t_str, log_data_dict["mid_price"], 
                                                float(log_data_dict["total_ask_vol"]), 
                                                float(log_data_dict["total_bid_vol"]), 
                                                float(log_data_dict["total_ask_vol"])+float(log_data_dict["total_bid_vol"])]], 
                                        columns = self.load_data_columns,
                                    )

            # --------------
            # merge
            #
            realtime_total_df = pd.concat([realtime_total_df, _one_time])


        # --------------
        # unix time -> data time
        #
        realtime_total_df = self.df_unix2date(realtime_total_df=realtime_total_df)
        logger.info("{:=^60}".format(" realtime_total_df "))
        print(realtime_total_df[:10])
        realtime_total_df.to_csv(self.datasets_path + "/" + self.realtime_total_name)

        # --------------
        # Candle Data
        #
        candle_summary = self.totalization(realtime_total_df=realtime_total_df)
        logger.info("{:=^60}".format(" candle_summary_df "))
        print(candle_summary[:10])
        candle_summary.to_csv(self.datasets_path + "/" + self.option.candle_summary_name1)

        return candle_summary

    def visual_summary(self, candle_summary_name):

        candle_summary = pd.read_csv(self.datasets_path + '/' + candle_summary_name, index_col=0, parse_dates=True)
        # candle_summary['tmp'] = candle_summary['Close']
        # candle_summary['Close'] = candle_summary['Open']
        # candle_summary['Open'] = candle_summary['tmp']
        print(candle_summary)


        # -------------------
        # Chart settings
        #
        plt.rcParams['figure.figsize']=100,20
        fig, ax1 = plt.subplots(1)
        # plt.subplots_adjust(left=0.1, right=0.95, bottom=0.1, top=0.95)

        # >>>>>>>>>>>>>>>>>>>>>>>>>>>
        # candle plot
        #        
        ax1 = self._candle_chart(stock_prices=candle_summary[:100], ax1=ax1)

        # ---------------
        # label plot
        #
        # ax1.xaxis.set_major_locator(ticker.MultipleLocator(10))
        # ax1.xaxis.set_minor_locator(ticker.MultipleLocator(5))
        ax1.set_title( "log data", fontsize=14)
        # ax1.set_xlim([candle_summary.index[0], candle_summary.index[-1]])
        plt.xticks(rotation=90)
        plt.grid(True)
        # plt.legend()
        plt.savefig(self.option.technical_graph_dir + "/candle_summary.png",bbox_inches = 'tight', pad_inches = 0.05)

    def create_technical_indicators(self, technical_df):
        # ----------------
        # bollinger_band
        #
        # technical_df = self._bollinger_band_visual(technical_df=datasets_df)

        # ----------------
        # SMA
        #
        technical_df = self._SMA_visual(technical_df=technical_df, visual=True)


        # ----------------
        # gt
        #
        technical_df = self._create_gt(technical_df=technical_df)
        technical_df.to_csv(self.datasets_path + "/" + self.option.technical_df_name)

        # ----------------
        # split train & test
        #
        train_df, test_df = self._create_train_test(technical_df)
        train_df.to_csv(self.datasets_path + "/" + self.option.train_df_name)
        test_df.to_csv(self.datasets_path + "/" + self.option.test_df_name)


    def _create_train_test(self, technical_df):
        
        data_num    = len(technical_df)
        logger.info("technical_df len : {}".format(data_num))
        
        train_df    = technical_df[:int(option.split_train_rate*data_num)]
        test_df     = technical_df[int(option.split_train_rate*data_num):]
        

        logger.info("train_df len : {}".format(len(train_df)))
        logger.info("test_df  len : {}".format(len(test_df)))

        return train_df, test_df
        

    @staticmethod
    def _create_gt(technical_df):
        # --------------
        # mid price
        #
        technical_df['mid'] = (technical_df['High'] + technical_df['Low'])/2
        technical_df['gt1'] = technical_df['mid'].shift(-1)
        technical_df['gt3'] = technical_df['Close'].shift(-1)

        # --------------
        # diff
        #
        technical_df['gt2'] = technical_df['gt1'] - technical_df["mid"]

        # logger.info("{:=^60}".format(" GT_df "))
        # print(technical_df[:10])
        return technical_df


    # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    # SMA
    #
    @staticmethod
    def calc_SMA(technical_df, n):
        technical_df["SMA{}".format(n)] = talib.SMA(technical_df["Close"].values, n)  # n-period simple moving average
        return technical_df

    def _SMA_visual(self, technical_df, visual=True):

        # ==============================
        # calc SMA
        #
        technical_df = self.calc_SMA(technical_df=technical_df, n=25)
        technical_df = self.calc_SMA(technical_df=technical_df, n=75)
        technical_df = self.calc_SMA(technical_df=technical_df, n=200)
        logger.info("{:=^60}".format(" technical_df "))


        # ==============================
        # visual SMA
        #
        if(visual):
            # -------------------
            # Chart settings
            #
            plt.rcParams['figure.figsize']=100,20
            fig, ax1 = plt.subplots(1)
            plt.subplots_adjust(left=0.1, right=0.95, bottom=0.1, top=0.95)

            # >>>>>>>>>>>>>>>>>>>>>>>>>>>
            # main plot
            #
            ax1.plot(technical_df.index, technical_df['Close'], label='XLM', color='black')

            # ---------------
            # SMA plot 
            #
            ax1.plot(technical_df.index, technical_df['SMA25'], label='SMA25', color=plt.cm.plasma(220))
            ax1.plot(technical_df.index, technical_df['SMA75'], label='SMA75', color=plt.cm.plasma(100))
            ax1.plot(technical_df.index, technical_df['SMA200'], label='SMA200', color=plt.cm.plasma(50))

            # ---------------
            # SMA area 
            #
            ax1.fill_between(technical_df.index, 
                            y1=technical_df['SMA25'], 
                            y2=technical_df['SMA200'], 
                            color='grey', 
                            alpha=0.3)

            # ---------------
            # label plot
            #
            ax1.set_title( "SMA 25/75/200", fontsize=14)
            ax1.set_xlim(technical_df.index[0], technical_df.index[-1])
            plt.xticks(rotation=90)
            plt.grid(True)

            # ---------------
            # save
            #
            plt.savefig(self.option.technical_graph_dir + "/SMA.png",bbox_inches = 'tight', pad_inches = 0.05)
            print(self.option.technical_graph_dir)

        return technical_df



    # -------------------------------------------------------------------------------------------
    # pred result
    #
    def pred_result(self):

        # ==============================
        # load datasets
        #
        pred_test_final_model_no_turned_df      = pd.read_csv(r"datasets\pred_test_final_model_no_turned.csv", index_col=0, parse_dates=True)
        pred_train_final_model_no_turned_df     = pd.read_csv(r"datasets\pred_train_final_model_no_turned.csv", index_col=0, parse_dates=True)
        pred_test_final_model_R2_df             = pd.read_csv(r"datasets\pred_test_final_model_R2.csv", index_col=0, parse_dates=True)
        pred_train_final_model_R2_df            = pd.read_csv(r"datasets\pred_train_final_model_R2.csv", index_col=0, parse_dates=True)
        pred_test_final_model_RMSE_df           = pd.read_csv(r"datasets\pred_test_final_model_RMSE.csv", index_col=0, parse_dates=True)
        pred_train_final_model_RMSE_df          = pd.read_csv(r"datasets\pred_train_final_model_RMSE.csv", index_col=0, parse_dates=True)


        # ==============================
        # Chart settings
        #
        plt.style.use('ggplot')
        plt.rcParams['figure.figsize']=100,40
        fig, ax1 = plt.subplots()
        # plt.subplots_adjust(left=0.1, right=0.95, bottom=0.1, top=0.95)


        ax1.axvspan(pred_train_final_model_RMSE_df.index[0], pred_train_final_model_RMSE_df.index[-1], facecolor=plt.cm.plasma(20), alpha=0.2)
        ax1.axvspan(pred_test_final_model_RMSE_df.index[0], pred_test_final_model_RMSE_df.index[-1], facecolor=plt.cm.plasma(100), alpha=0.2)

        # >>>>>>>>>>>>>>>>>>>>>>>>>>>
        # candle plot
        #        
        ax1 = self._candle_chart(stock_prices=pred_train_final_model_no_turned_df, ax1=ax1)
        ax1 = self._candle_chart(stock_prices=pred_test_final_model_no_turned_df, ax1=ax1)


        # >>>>>>>>>>>>>>>>>>>>>>>>>>>
        # main plot
        #
        
        ax1.plot(pred_train_final_model_no_turned_df.index, pred_train_final_model_no_turned_df['gt3'], label='GT', color='black')
        # ax1.plot(pred_test_final_model_no_turned_df.index, pred_test_final_model_no_turned_df['gt1'], label='GT', color='green')

        # ax1.plot(pred_test_final_model_no_turned_df.index, pred_test_final_model_no_turned_df['Label'], linewidth = 2.0, label='test_final_model_no_turned', color=plt.cm.plasma(20))
        # ax1.plot(pred_train_final_model_no_turned_df.index, pred_train_final_model_no_turned_df['Label'], linewidth = 2.0, label='train_final_model_no_turned', color=plt.cm.plasma(40))

        ax1.plot(pred_test_final_model_R2_df.index, pred_test_final_model_R2_df['Label'], linewidth = 1.0, label='test_final_model_R2', color=plt.cm.plasma(150))
        # ax1.plot(pred_train_final_model_R2_df.index, pred_train_final_model_R2_df['Label'], linewidth = 1.0, label='train_final_model_R2', color=plt.cm.plasma(100))

        # ax1.plot(pred_train_final_model_RMSE_df.index, pred_train_final_model_RMSE_df['Label'], linewidth = 0.5, label='train_final_model_R2', color=plt.cm.plasma(220))
        # ax1.plot(pred_test_final_model_RMSE_df.index, pred_test_final_model_RMSE_df['Label'], linewidth = 0.5, label='test_final_model_R2', color=plt.cm.plasma(240))



        # ---------------
        # label plot
        #
        # ax1.xaxis.set_major_locator(ticker.MultipleLocator(10))
        # ax1.xaxis.set_minor_locator(ticker.MultipleLocator(5))
        ax1.set_title( "pred_result", fontsize=14)
        print([min(pred_train_final_model_R2_df.index[0], pred_test_final_model_R2_df.index[0]), max(pred_train_final_model_R2_df.index[-1], pred_test_final_model_R2_df.index[-1])])
        ax1.set_xlim([min(pred_train_final_model_R2_df.index[0], pred_test_final_model_R2_df.index[0]), max(pred_train_final_model_R2_df.index[-1], pred_test_final_model_R2_df.index[-1])])
        plt.xticks(rotation=90)
        plt.grid(True)
        plt.legend()
        plt.savefig(self.option.technical_graph_dir + "/pred_result3.png",bbox_inches = 'tight', pad_inches = 0.05)

    # -------------------------------------------------------------------------------------------
    # plot utils
    #
    def _area_plot(self, ax2, technical_df):
        for j, i in enumerate(range(0, len(technical_df.index), 60)):
            if(len(technical_df.index) < i+60):
                ax2.axvspan(technical_df.index[i], technical_df.index[-1], facecolor=plt.cm.jet(j*10), alpha=0.2)
            else:
                ax2.axvspan(technical_df.index[i], technical_df.index[i+60], facecolor=plt.cm.jet(j*10), alpha=0.2)
        return ax2

    def _candle_chart(self, stock_prices, ax1):
        up = stock_prices[stock_prices.Close >= stock_prices.Open]
        down = stock_prices[stock_prices.Close < stock_prices.Open]

        # Candles that closed at or above their open (up) are drawn in blue
        col1 = 'blue'
        
        # Candles that closed below their open (down) are drawn in red
        col2 = 'red'
        
        # Setting width of candlestick elements
        width = .0006
        width2 = .00006
        
        # Plotting up prices of the stock
        ax1.bar(up.index, up.Close-up.Open, width, bottom=up.Open, color=col1)
        ax1.bar(up.index, up.High-up.Close, width2, bottom=up.Close, color=col1)
        ax1.bar(up.index, up.Low-up.Open, width2, bottom=up.Open, color=col1)
        
        # Plotting down prices of the stock
        ax1.bar(down.index, down.Close-down.Open, width, bottom=down.Open, color=col2)
        ax1.bar(down.index, down.High-down.Open, width2, bottom=down.Open, color=col2)
        ax1.bar(down.index, down.Low-down.Close, width2, bottom=down.Close, color=col2)

        return ax1


    # -------------------------------------------------------------------------------------------
    # BF api
    #
    def run(self):
        pass



if __name__ == '__main__':

    #########################################################
    # initialization
    #
    parser = argparse.ArgumentParser(description='CryptoAnalysis option')    # create the argument parser
    option = parser.parse_args()

    option.datasets_path        = r"datasets"
    # option.log_dir              = r"H:\マイドライブ\PROJECT\log\XLM"
    option.log_dir              = r"X:\a002_PolaMbot_v4.4\log\XLM"
    option.candle_summary_name1 = "candle_summary_v4.csv"
    option.candle_summary_name2 = "candle_summary_v4.csv"
    option.realtime_total_name  = "realtime_total_v4.csv"
    option.technical_graph_dir  = "technical_graph"
    option.technical_df_name    = "technical_df_v4.csv"
    option.train_df_name        = "train_df_v4.csv"
    option.test_df_name         = "test_df_v4.csv"
    option.split_train_rate     = 0.8

    #########################################################
    # object create 
    #
    CANLS = CryptoAnalysis(option=option)

    # -------
    # load log data
    #
    # CANLS.load_datasets()

    # -------
    # visual summary data
    #
    CANLS.visual_summary(candle_summary_name=option.candle_summary_name2)

    # -------
    # get datasets
    #
    # datasets_df = CANLS.get_datasets(datasets_name=option.candle_summary_name2)

    # -------
    # create technical indicators
    #
    # CANLS.create_technical_indicators(technical_df=datasets_df)


    # -------
    # plot
    #
    # CANLS.pred_result()
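For context, the gt columns plotted in pred_result above are next-candle targets built with simple shifts (mirroring the _create_gt helper in the full listing); a minimal sketch with made-up prices:

```python
import pandas as pd

# three hypothetical candles
df = pd.DataFrame({"High":  [2.0, 4.0, 6.0],
                   "Low":   [0.0, 2.0, 4.0],
                   "Close": [1.0, 3.0, 5.0]})

df["mid"] = (df["High"] + df["Low"]) / 2   # candle mid price
df["gt1"] = df["mid"].shift(-1)            # next candle's mid
df["gt2"] = df["gt1"] - df["mid"]          # next-step change
df["gt3"] = df["Close"].shift(-1)          # next candle's close
print(df[["mid", "gt1", "gt2", "gt3"]])
```

The last row gets NaN targets (there is no next candle), so it should be dropped before training.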

class: center, middle

5. Computing Technical Indicators


5.1. Installing TA-Lib

First, check the Python version:
(PolaMbot) H:\マイドライブ\PROJECT\503_PolaMbot_v5>python
Python 3.9.12 (main, Apr  4 2022, 05:22:27) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32

Warning:
This Python interpreter is in a conda environment, but the environment has
not been activated.  Libraries may fail to load.  To activate this environment
please see https://conda.io/activation

Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z
Next, download the TA-Lib wheel that matches your Python version from the site below.
Then install it with pip install:
(PolaMbot) H:\マイドライブ\PROJECT\503_PolaMbot_v5>pip install download\TA_Lib-0.4.24-cp39-cp39-win_amd64.whl
Processing h:\マイドライブ\project\503_polambot_v5\download\ta_lib-0.4.24-cp39-cp39-win_amd64.whl
Requirement already satisfied: numpy in c:\users\polaris2\miniconda3\lib\site-packages (from TA-Lib==0.4.24) (1.22.4)
Installing collected packages: TA-Lib
Successfully installed TA-Lib-0.4.24

5.2. Computing a Technical Indicator

Let's start with a simple moving average (SMA).
Just pass the Close values and the moving-average window size:
    # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    # SMA
    #
    @staticmethod
    def calc_SMA(technical_df, n):
        technical_df["SMA{}".format(n)] = talib.SMA(technical_df["Close"].values, n)  # n-period simple moving average
        return technical_df
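As a sanity check, talib.SMA is just an arithmetic rolling mean, so a plain pandas rolling window should reproduce it; a minimal sketch that needs no TA-Lib install (the prices are made up):

```python
import pandas as pd

close = pd.Series([16.10, 16.20, 16.30, 16.20, 16.40, 16.50])

# SMA(3): the same values talib.SMA(close.values, 3) would return,
# with NaN for the first n-1 positions
sma3 = close.rolling(window=3).mean()
print(sma3.round(4).tolist())
```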
"datasets_df" is the DataFrame of the input dataset.
2022-06-29 14:32:48,381  __main__:       63         get_datasets [ INFO]: ======================= datasets_df ========================
                        Low    High    Open   Close  total_vol  total_bid_vol  \
time2
2022-06-21 23:56:00  16.134  16.151  16.134  16.151     849.46         565.39
2022-06-21 23:57:00  16.130  16.136  16.135  16.134    2746.67        1832.74
2022-06-21 23:58:00  16.124  16.139  16.139  16.133    1914.65         986.43
2022-06-21 23:59:00  16.138  16.154  16.140  16.147    2500.28        1484.01
2022-06-22 00:00:00  16.112  16.150  16.122  16.140    1781.19        1278.39
...                     ...     ...     ...     ...        ...            ...
2022-06-25 09:03:00  17.235  17.251  17.235  17.251    2102.32        1059.27
2022-06-25 09:04:00  17.218  17.248  17.218  17.239    1633.32         881.08
2022-06-25 09:05:00  17.167  17.220  17.184  17.220    2228.24        1412.95
2022-06-25 09:06:00  17.169  17.193  17.169  17.176    2590.31        1579.67
2022-06-25 09:07:00  17.162  17.191  17.188  17.162    1521.38         824.26
                     total_ask_vol
time2
2022-06-21 23:56:00         284.07
2022-06-21 23:57:00         913.93
2022-06-21 23:58:00         928.22
2022-06-21 23:59:00        1016.27
2022-06-22 00:00:00         502.80
...                            ...
2022-06-25 09:03:00        1043.05
2022-06-25 09:04:00         752.24
2022-06-25 09:05:00         815.29
2022-06-25 09:06:00        1010.64
2022-06-25 09:07:00         697.12

"technical_df" is the DataFrame after the technical indicators have been added.
2022-06-29 14:32:48,417  __main__:      275          _SMA_visual [ INFO]: ======================= technical_df =======================
                        Low    High    Open   Close  total_vol  total_bid_vol  \
time2
2022-06-25 08:58:00  17.262  17.282  17.264  17.267    1830.70         921.64
2022-06-25 08:59:00  17.269  17.284  17.284  17.270    2315.10        1006.03
2022-06-25 09:00:00  17.261  17.284  17.261  17.284    2142.77         853.87
2022-06-25 09:01:00  17.247  17.261  17.248  17.261    2490.42        1015.52
2022-06-25 09:02:00  17.239  17.259  17.252  17.251    1797.63         690.92
2022-06-25 09:03:00  17.235  17.251  17.235  17.251    2102.32        1059.27
2022-06-25 09:04:00  17.218  17.248  17.218  17.239    1633.32         881.08
2022-06-25 09:05:00  17.167  17.220  17.184  17.220    2228.24        1412.95
2022-06-25 09:06:00  17.169  17.193  17.169  17.176    2590.31        1579.67
2022-06-25 09:07:00  17.162  17.191  17.188  17.162    1521.38         824.26

                     total_ask_vol     SMA25      SMA75     SMA200
time2
2022-06-25 08:58:00         909.06  17.39504  17.330160  17.272270
2022-06-25 08:59:00        1309.07  17.38976  17.329907  17.272770
2022-06-25 09:00:00        1288.90  17.38440  17.329733  17.273395
2022-06-25 09:01:00        1474.90  17.37816  17.329240  17.273835
2022-06-25 09:02:00        1106.71  17.37152  17.328400  17.274385
2022-06-25 09:03:00        1043.05  17.36492  17.327533  17.274990
2022-06-25 09:04:00         752.24  17.35756  17.326507  17.275395
2022-06-25 09:05:00         815.29  17.34908  17.325267  17.275715
2022-06-25 09:06:00        1010.64  17.33904  17.323427  17.275820
2022-06-25 09:07:00         697.12  17.32904  17.321333  17.275830
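One caveat worth noting about the table above: each SMA column starts with window-1 NaN rows, and the longest window (200 here) dominates, so it usually makes sense to drop those rows before feeding the frame to a model. A small sketch with synthetic data (the dropna step is my addition, not part of the original code):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Close": np.linspace(16.0, 17.0, 250)})
for n in (25, 75, 200):
    df["SMA{}".format(n)] = df["Close"].rolling(n).mean()

# keep only rows where every SMA is defined
clean = df.dropna()
print(len(df), len(clean))  # 250 rows in, 51 complete rows out
```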

Here is the full console output from the run:
(PolaMbot38) H:\マイドライブ\PROJECT\503_PolaMbot_v5>python CryptoAnalysisBeta.py
2022-06-29 14:32:48,378  __main__:       60             __init__ [ INFO]: Initialization is complete.
2022-06-29 14:32:48,381  __main__:       63         get_datasets [ INFO]: ======================= datasets_df ========================
                        Low    High    Open   Close  total_vol  total_bid_vol  \
time2
2022-06-21 23:56:00  16.134  16.151  16.134  16.151     849.46         565.39
2022-06-21 23:57:00  16.130  16.136  16.135  16.134    2746.67        1832.74
2022-06-21 23:58:00  16.124  16.139  16.139  16.133    1914.65         986.43
2022-06-21 23:59:00  16.138  16.154  16.140  16.147    2500.28        1484.01
2022-06-22 00:00:00  16.112  16.150  16.122  16.140    1781.19        1278.39
...                     ...     ...     ...     ...        ...            ...
2022-06-25 09:03:00  17.235  17.251  17.235  17.251    2102.32        1059.27
2022-06-25 09:04:00  17.218  17.248  17.218  17.239    1633.32         881.08
2022-06-25 09:05:00  17.167  17.220  17.184  17.220    2228.24        1412.95
2022-06-25 09:06:00  17.169  17.193  17.169  17.176    2590.31        1579.67
2022-06-25 09:07:00  17.162  17.191  17.188  17.162    1521.38         824.26

                     total_ask_vol
time2
2022-06-21 23:56:00         284.07
2022-06-21 23:57:00         913.93
2022-06-21 23:58:00         928.22
2022-06-21 23:59:00        1016.27
2022-06-22 00:00:00         502.80
...                            ...
2022-06-25 09:03:00        1043.05
2022-06-25 09:04:00         752.24
2022-06-25 09:05:00         815.29
2022-06-25 09:06:00        1010.64
2022-06-25 09:07:00         697.12

[4853 rows x 7 columns]
2022-06-29 14:32:48,417  __main__:      275          _SMA_visual [ INFO]: ======================= technical_df =======================
                        Low    High    Open   Close  total_vol  total_bid_vol  \
time2
2022-06-25 08:58:00  17.262  17.282  17.264  17.267    1830.70         921.64
2022-06-25 08:59:00  17.269  17.284  17.284  17.270    2315.10        1006.03
2022-06-25 09:00:00  17.261  17.284  17.261  17.284    2142.77         853.87
2022-06-25 09:01:00  17.247  17.261  17.248  17.261    2490.42        1015.52
2022-06-25 09:02:00  17.239  17.259  17.252  17.251    1797.63         690.92
2022-06-25 09:03:00  17.235  17.251  17.235  17.251    2102.32        1059.27
2022-06-25 09:04:00  17.218  17.248  17.218  17.239    1633.32         881.08
2022-06-25 09:05:00  17.167  17.220  17.184  17.220    2228.24        1412.95
2022-06-25 09:06:00  17.169  17.193  17.169  17.176    2590.31        1579.67
2022-06-25 09:07:00  17.162  17.191  17.188  17.162    1521.38         824.26

                     total_ask_vol     SMA25      SMA75     SMA200
time2
2022-06-25 08:58:00         909.06  17.39504  17.330160  17.272270
2022-06-25 08:59:00        1309.07  17.38976  17.329907  17.272770
2022-06-25 09:00:00        1288.90  17.38440  17.329733  17.273395
2022-06-25 09:01:00        1474.90  17.37816  17.329240  17.273835
2022-06-25 09:02:00        1106.71  17.37152  17.328400  17.274385
2022-06-25 09:03:00        1043.05  17.36492  17.327533  17.274990
2022-06-25 09:04:00         752.24  17.35756  17.326507  17.275395
2022-06-25 09:05:00         815.29  17.34908  17.325267  17.275715
2022-06-25 09:06:00        1010.64  17.33904  17.323427  17.275820
2022-06-25 09:07:00         697.12  17.32904  17.321333  17.275830
The indicators were computed successfully.

5.3. Visualizing the Technical Indicators

Finally, let's visualize the technical indicators.
The area enclosed by two of the moving averages is shaded as well.

The plotting code looks like this:
        # ==============================
        # visual SMA
        #
        if(visual):
            # -------------------
            # chart settings
            #
            plt.rcParams['figure.figsize']=100,20
            fig, ax1 = plt.subplots(1)
            plt.subplots_adjust(left=0.1, right=0.95, bottom=0.1, top=0.95)

            # >>>>>>>>>>>>>>>>>>>>>>>>>>>
            # main plot
            #
            ax1.plot(technical_df.index, technical_df['Close'], label='XLM', color='black')

            # ---------------
            # SMA plot 
            #
            ax1.plot(technical_df.index, technical_df['SMA25'], label='SMA25', color=plt.cm.plasma(220))
            ax1.plot(technical_df.index, technical_df['SMA75'], label='SMA75', color=plt.cm.plasma(100))
            ax1.plot(technical_df.index, technical_df['SMA200'], label='SMA200', color=plt.cm.plasma(50))

            # ---------------
            # SMA area 
            #
            ax1.fill_between(technical_df.index, 
                            y1=technical_df['SMA25'], 
                            y2=technical_df['SMA200'], 
                            color='grey', 
                            alpha=0.3)

            # ---------------
            # label plot
            #
            ax1.set_title( "SMA 25/75/200", fontsize=14)
            ax1.set_xlim(technical_df.index[0], technical_df.index[-1])
            plt.xticks(rotation=90)
            plt.grid(True)

            # ---------------
            # save
            #
            plt.savefig(self.option.technical_graph_dir + "/SMA.png",bbox_inches = 'tight', pad_inches = 0.05)
            print(self.option.technical_graph_dir)
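Stripped to its essentials, the shaded band is a single fill_between call between two SMA series; a self-contained sketch with synthetic prices (the output file name is arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

close = pd.Series(17.0 + 0.2 * np.sin(np.linspace(0, 6 * np.pi, 600)))
sma25 = close.rolling(25).mean()
sma200 = close.rolling(200).mean()

fig, ax = plt.subplots()
ax.plot(close.index, close, color="black", label="Close")
ax.plot(sma25.index, sma25, label="SMA25")
ax.plot(sma200.index, sma200, label="SMA200")
# shade the band enclosed by the two moving averages
ax.fill_between(close.index, sma25, sma200, color="grey", alpha=0.3)
ax.legend()
fig.savefig("sma_band_demo.png", bbox_inches="tight")
```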

class: center, middle

6. Interim Summary

========= Work in progress =========

6.1. Source Code

Here is a consolidated listing of the source code so far:
import glob
import re
import os
import argparse
import pprint
import json

import pandas as pd
import datetime
from tqdm import tqdm


# --------
# technical
#
import talib as ta
import talib

# --------
# logger
#
from logging import getLogger, config
with open('./log_config.json', 'r') as f:
    log_conf = json.load(f)
config.dictConfig(log_conf)
# standard logger setup from here on
logger = getLogger(__name__)

# --------
# plot
#
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import matplotlib.dates as mdates

import seaborn as sns
# sns.set()
# plt.rcParams["image.cmap"] = "jet"
import matplotlib.dates as mpl_dates

# --------
# pycaret
#
import warnings
# suppress unneeded warnings
warnings.filterwarnings("ignore")
from pycaret.regression import *

class CryptoAnalysis():
    def __init__(self, option):
        #########################################################
        # initialization
        #
        self.option             = option
        self.log_dir            = option.log_dir
        self.load_data_columns  = ["timestamp", "mid_price","total_ask_vol", "total_bid_vol", "total_vol"]
        self.datasets_path      = option.datasets_path
        self.realtime_total_name= option.realtime_total_name
        # self.analysis_range     = 
        logger.info('Initialization is complete.')

    def get_datasets(self, datasets_name):
        logger.info("{:=^60}".format(" datasets_df "))
        datasets_df = pd.read_csv(self.datasets_path + '/' + datasets_name, index_col=0)
        print(datasets_df)
        return datasets_df

    def load_datasets(self):
        log_file_list           = self._load_log_list()
        logger.info("log_file_list : {}".format(len(log_file_list)))
        self.log_data           = self._load_log_data(log_file_list)




    def _load_log_list(self):
        log_file_list = sorted(glob.glob(self.log_dir + r"\**\*.json", recursive=True))
        return log_file_list

    @staticmethod
    def df_unix2date(realtime_total_df):
        realtime_total_df['time'] = realtime_total_df['timestamp'].apply(datetime.datetime.fromtimestamp)
        realtime_total_df['time2'] = realtime_total_df['time'].map(lambda x: x.replace(second=0, microsecond=0))
        realtime_total_df = realtime_total_df.set_index("time")
        return realtime_total_df
    
    @staticmethod
    def totalization(realtime_total_df):
        # aggregate per-minute low/high (min/max), open/close (first/last), and volume (sum)
        candle_summary = realtime_total_df[['time2', 'mid_price']].groupby(['time2']).min().rename(columns={'mid_price': 'Low'})

        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'mid_price']].groupby(['time2']).max().rename(columns={'mid_price': 'High'}),
            left_index=True, right_index=True)

        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'mid_price']].groupby(['time2']).first().rename(columns={'mid_price': 'Open'}),
            left_index=True, right_index=True)
        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'mid_price']].groupby(['time2']).last().rename(columns={'mid_price': 'Close'}),
            left_index=True, right_index=True)
        
        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'total_vol']].groupby(['time2']).sum(),
            left_index=True, right_index=True)

        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'total_bid_vol']].groupby(['time2']).sum(),
            left_index=True, right_index=True)

        candle_summary = candle_summary.merge(
            realtime_total_df[['time2', 'total_ask_vol']].groupby(['time2']).sum(),
            left_index=True, right_index=True)
        return candle_summary

    def _load_log_data(self, log_file_list, visual=True):

        logger.info("load_data_columns : {}".format(self.load_data_columns))

        realtime_total_df = pd.DataFrame(data={}, columns=self.load_data_columns)


        for log_path in tqdm(log_file_list):
            with open(log_path) as f:
                log_data_dict = json.load(f)

            # --------------
            # time stamp
            #
            t_str = float(log_path.split("\\")[-1].split("-")[-1].split(".json")[0])

            # --------------
            # dataframe
            #
            _one_time = pd.DataFrame(  data = [[t_str, log_data_dict["mid_price"], 
                                                float(log_data_dict["total_ask_vol"]), 
                                                float(log_data_dict["total_bid_vol"]), 
                                                float(log_data_dict["total_ask_vol"])+float(log_data_dict["total_bid_vol"])]], 
                                        columns = self.load_data_columns,
                                    )

            # --------------
            # merge
            #
            realtime_total_df = pd.concat([realtime_total_df, _one_time])


        # --------------
        # unix time -> datetime
        #
        realtime_total_df = self.df_unix2date(realtime_total_df=realtime_total_df)
        logger.info("{:=^60}".format(" realtime_total_df "))
        print(realtime_total_df[:10])
        realtime_total_df.to_csv(self.datasets_path + "/" + self.realtime_total_name)

        # --------------
        # Candle Data
        #
        candle_summary = self.totalization(realtime_total_df=realtime_total_df)
        logger.info("{:=^60}".format(" candle_summary_df "))
        print(candle_summary[:10])
        candle_summary.to_csv(self.datasets_path + "/" + self.option.candle_summary_name1)

        return candle_summary

    def visual_summary(self, candle_summary_name):

        candle_summary = pd.read_csv(self.datasets_path + '/' + candle_summary_name, index_col=0, parse_dates=True)
        # candle_summary['tmp'] = candle_summary['Close']
        # candle_summary['Close'] = candle_summary['Open']
        # candle_summary['Open'] = candle_summary['tmp']
        print(candle_summary)


        # -------------------
        # chart settings
        #
        plt.rcParams['figure.figsize']=100,20
        fig, ax1 = plt.subplots(1)
        # plt.subplots_adjust(left=0.1, right=0.95, bottom=0.1, top=0.95)

        # >>>>>>>>>>>>>>>>>>>>>>>>>>>
        # candle plot
        #        
        ax1 = self._candle_chart(stock_prices=candle_summary[:100], ax1=ax1)

        # ---------------
        # label plot
        #
        # ax1.xaxis.set_major_locator(ticker.MultipleLocator(10))
        # ax1.xaxis.set_minor_locator(ticker.MultipleLocator(5))
        ax1.set_title( "log data", fontsize=14)
        # ax1.set_xlim([candle_summary.index[0], candle_summary.index[-1]])
        plt.xticks(rotation=90)
        plt.grid(True)
        # plt.legend()
        plt.savefig(self.option.technical_graph_dir + "/candle_summary.png",bbox_inches = 'tight', pad_inches = 0.05)

    def create_technical_indicators(self, technical_df):
        # ----------------
        # bollinger_band
        #
        # technical_df = self._bollinger_band_visual(technical_df=datasets_df)

        # ----------------
        # SMA
        #
        technical_df = self._SMA_visual(technical_df=technical_df, visual=True)


        # ----------------
        # gt
        #
        technical_df = self._create_gt(technical_df=technical_df)
        technical_df.to_csv(self.datasets_path + "/" + self.option.technical_df_name)

        # ----------------
        # split train & test
        #
        train_df, test_df = self._create_train_test(technical_df)
        train_df.to_csv(self.datasets_path + "/" + self.option.train_df_name)
        test_df.to_csv(self.datasets_path + "/" + self.option.test_df_name)


    def _create_train_test(self, technical_df):

        data_num    = len(technical_df)
        logger.info("technical_df len : {}".format(data_num))

        train_df    = technical_df[:int(self.option.split_train_rate*data_num)]
        test_df     = technical_df[int(self.option.split_train_rate*data_num):]
        

        logger.info("train_df len : {}".format(len(train_df)))
        logger.info("test_df  len : {}".format(len(test_df)))

        return train_df, test_df
        

    @staticmethod
    def _create_gt(technical_df):
        # --------------
        # mid price
        #
        technical_df['mid'] = (technical_df['High'] + technical_df['Low'])/2
        technical_df['gt1'] = technical_df['mid'].shift(-1)
        technical_df['gt3'] = technical_df['Close'].shift(-1)

        # --------------
        # diff
        #
        technical_df['gt2'] = technical_df['gt1'] - technical_df["mid"]

        # logger.info("{:=^60}".format(" GT_df "))
        # print(technical_df[:10])
        return technical_df


    # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    # SMA
    #
    @staticmethod
    def calc_SMA(technical_df, n):
        technical_df["SMA{}".format(n)] = talib.SMA(technical_df["Close"].values, n)  # n-period simple moving average
        return technical_df

    def _SMA_visual(self, technical_df, visual=True):

        # ==============================
        # calc SMA
        #
        technical_df = self.calc_SMA(technical_df=technical_df, n=25)
        technical_df = self.calc_SMA(technical_df=technical_df, n=75)
        technical_df = self.calc_SMA(technical_df=technical_df, n=200)
        logger.info("{:=^60}".format(" technical_df "))
        print(technical_df.tail(10))



        # ==============================
        # visual SMA
        #
        if(visual):
            # -------------------
            # chart settings
            #
            plt.rcParams['figure.figsize']=100,20
            fig, ax1 = plt.subplots(1)
            plt.subplots_adjust(left=0.1, right=0.95, bottom=0.1, top=0.95)

            # >>>>>>>>>>>>>>>>>>>>>>>>>>>
            # main plot
            #
            ax1.plot(technical_df.index, technical_df['Close'], label='XLM', color='black')

            # ---------------
            # SMA plot 
            #
            ax1.plot(technical_df.index, technical_df['SMA25'], label='SMA25', color=plt.cm.plasma(220))
            ax1.plot(technical_df.index, technical_df['SMA75'], label='SMA75', color=plt.cm.plasma(100))
            ax1.plot(technical_df.index, technical_df['SMA200'], label='SMA200', color=plt.cm.plasma(50))

            # ---------------
            # SMA area 
            #
            ax1.fill_between(technical_df.index, 
                            y1=technical_df['SMA25'], 
                            y2=technical_df['SMA200'], 
                            color='grey', 
                            alpha=0.3)

            # ---------------
            # label plot
            #
            ax1.set_title( "SMA 25/75/200", fontsize=14)
            ax1.set_xlim(technical_df.index[0], technical_df.index[-1])
            plt.xticks(rotation=90)
            plt.grid(True)

            # ---------------
            # save
            #
            plt.savefig(self.option.technical_graph_dir + "/SMA.png",bbox_inches = 'tight', pad_inches = 0.05)
            print(self.option.technical_graph_dir)

        return technical_df



    # -------------------------------------------------------------------------------------------
    # pred result
    #
    def pred_result(self):

        # ==============================
        # load datasets
        #
        pred_test_final_model_no_turned_df      = pd.read_csv(r"datasets\pred_test_final_model_no_turned.csv", index_col=0, parse_dates=True)
        pred_train_final_model_no_turned_df     = pd.read_csv(r"datasets\pred_train_final_model_no_turned.csv", index_col=0, parse_dates=True)
        pred_test_final_model_R2_df             = pd.read_csv(r"datasets\pred_test_final_model_R2.csv", index_col=0, parse_dates=True)
        pred_train_final_model_R2_df            = pd.read_csv(r"datasets\pred_train_final_model_R2.csv", index_col=0, parse_dates=True)
        pred_test_final_model_RMSE_df           = pd.read_csv(r"datasets\pred_test_final_model_RMSE.csv", index_col=0, parse_dates=True)
        pred_train_final_model_RMSE_df          = pd.read_csv(r"datasets\pred_train_final_model_RMSE.csv", index_col=0, parse_dates=True)


        # ==============================
        # chart settings
        #
        plt.style.use('ggplot')
        plt.rcParams['figure.figsize']=100,40
        fig, ax1 = plt.subplots()
        # plt.subplots_adjust(left=0.1, right=0.95, bottom=0.1, top=0.95)


        ax1.axvspan(pred_train_final_model_RMSE_df.index[0], pred_train_final_model_RMSE_df.index[-1], facecolor=plt.cm.plasma(20), alpha=0.2)
        ax1.axvspan(pred_test_final_model_RMSE_df.index[0], pred_test_final_model_RMSE_df.index[-1], facecolor=plt.cm.plasma(100), alpha=0.2)

        # >>>>>>>>>>>>>>>>>>>>>>>>>>>
        # candle plot
        #        
        ax1 = self._candle_chart(stock_prices=pred_train_final_model_no_turned_df, ax1=ax1)
        ax1 = self._candle_chart(stock_prices=pred_test_final_model_no_turned_df, ax1=ax1)


        # >>>>>>>>>>>>>>>>>>>>>>>>>>>
        # main plot
        #
        
        ax1.plot(pred_train_final_model_no_turned_df.index, pred_train_final_model_no_turned_df['gt3'], label='GT', color='black')
        # ax1.plot(pred_test_final_model_no_turned_df.index, pred_test_final_model_no_turned_df['gt1'], label='GT', color='green')

        # ax1.plot(pred_test_final_model_no_turned_df.index, pred_test_final_model_no_turned_df['Label'], linewidth = 2.0, label='test_final_model_no_turned', color=plt.cm.plasma(20))
        # ax1.plot(pred_train_final_model_no_turned_df.index, pred_train_final_model_no_turned_df['Label'], linewidth = 2.0, label='train_final_model_no_turned', color=plt.cm.plasma(40))

        ax1.plot(pred_test_final_model_R2_df.index, pred_test_final_model_R2_df['Label'], linewidth = 1.0, label='test_final_model_R2', color=plt.cm.plasma(150))
        # ax1.plot(pred_train_final_model_R2_df.index, pred_train_final_model_R2_df['Label'], linewidth = 1.0, label='train_final_model_R2', color=plt.cm.plasma(100))

        # ax1.plot(pred_train_final_model_RMSE_df.index, pred_train_final_model_RMSE_df['Label'], linewidth = 0.5, label='train_final_model_R2', color=plt.cm.plasma(220))
        # ax1.plot(pred_test_final_model_RMSE_df.index, pred_test_final_model_RMSE_df['Label'], linewidth = 0.5, label='test_final_model_R2', color=plt.cm.plasma(240))



        # ---------------
        # label plot
        #
        # ax1.xaxis.set_major_locator(ticker.MultipleLocator(10))
        # ax1.xaxis.set_minor_locator(ticker.MultipleLocator(5))
        ax1.set_title( "pred_result", fontsize=14)
        print([min(pred_train_final_model_R2_df.index[0], pred_test_final_model_R2_df.index[0]), max(pred_train_final_model_R2_df.index[-1], pred_test_final_model_R2_df.index[-1])])
        ax1.set_xlim([min(pred_train_final_model_R2_df.index[0], pred_test_final_model_R2_df.index[0]), max(pred_train_final_model_R2_df.index[-1], pred_test_final_model_R2_df.index[-1])])
        plt.xticks(rotation=90)
        plt.grid(True)
        plt.legend()
        plt.savefig(self.option.technical_graph_dir + "/pred_result3.png",bbox_inches = 'tight', pad_inches = 0.05)

    # -------------------------------------------------------------------------------------------
    # plot utils
    #
    def _area_plot(self, ax2, technical_df):
        for j, i in enumerate(range(0, len(technical_df.index), 60)):
            if(len(technical_df.index) < i+60):
                ax2.axvspan(technical_df.index[i], technical_df.index[-1], facecolor=plt.cm.jet(j*10), alpha=0.2)
            else:
                ax2.axvspan(technical_df.index[i], technical_df.index[i+60], facecolor=plt.cm.jet(j*10), alpha=0.2)
        return ax2

    def _candle_chart(self, stock_prices, ax1):
        up = stock_prices[stock_prices.Close >= stock_prices.Open]
        down = stock_prices[stock_prices.Close < stock_prices.Open]

        # candles that closed higher (or flat) are
        # drawn in blue
        col1 = 'blue'

        # candles that closed lower are
        # drawn in red
        col2 = 'red'
        
        # Setting width of candlestick elements
        width = .0006
        width2 = .00006
        
        # Plotting up prices of the stock
        ax1.bar(up.index, up.Close-up.Open, width, bottom=up.Open, color=col1)
        ax1.bar(up.index, up.High-up.Close, width2, bottom=up.Close, color=col1)
        ax1.bar(up.index, up.Low-up.Open, width2, bottom=up.Open, color=col1)
        
        # Plotting down prices of the stock
        ax1.bar(down.index, down.Close-down.Open, width, bottom=down.Open, color=col2)
        ax1.bar(down.index, down.High-down.Open, width2, bottom=down.Open, color=col2)
        ax1.bar(down.index, down.Low-down.Close, width2, bottom=down.Close, color=col2)

        return ax1


    # -------------------------------------------------------------------------------------------
    # BF api
    #
    def run(self):
        pass



if __name__ == '__main__':

    #########################################################
    # initialization
    #
    parser = argparse.ArgumentParser(description='CryptoAnalysis option')    # create the argument parser
    option = parser.parse_args()

    option.datasets_path        = r"datasets"
    # option.log_dir              = r"H:\マイドライブ\PROJECT\log\XLM"
    option.log_dir              = r"X:\a002_PolaMbot_v4.4\log\XLM"
    option.candle_summary_name1 = "candle_summary_v4.csv"
    option.candle_summary_name2 = "candle_summary_v4.csv"
    option.realtime_total_name  = "realtime_total_v4.csv"
    option.technical_graph_dir  = "technical_graph"
    option.technical_df_name    = "technical_df_v4.csv"
    option.train_df_name        = "train_df_v4.csv"
    option.test_df_name         = "test_df_v4.csv"
    option.split_train_rate     = 0.8

    #########################################################
    # object create 
    #
    CANLS = CryptoAnalysis(option=option)

    # -------
    # load log data
    #
    # CANLS.load_datasets()

    # -------
    # visual summary data
    #
    # CANLS.visual_summary(candle_summary_name=option.candle_summary_name2)

    # -------
    # get datasets
    #
    datasets_df = CANLS.get_datasets(datasets_name=option.candle_summary_name2)

    # -------
    # create technical indicators
    #
    CANLS.create_technical_indicators(technical_df=datasets_df)


    # -------
    # plot
    #
    # CANLS.pred_result()
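As a side note, the per-minute groupby chain in totalization can be cross-checked against pandas' built-in ohlc resampler, which yields the same Open/High/Low/Close columns in one call; a hedged sketch with invented tick data:

```python
import pandas as pd

# hypothetical ticks: one mid_price every 20 seconds
idx = pd.date_range("2022-06-22 00:00:00", periods=9, freq="20s")
ticks = pd.DataFrame({"mid_price": [16.10, 16.12, 16.11,
                                    16.13, 16.15, 16.14,
                                    16.16, 16.18, 16.17]}, index=idx)

# one row per minute with open/high/low/close columns
candles = ticks["mid_price"].resample("1min").ohlc()
print(candles)
```

Volume sums would still need a separate aggregation, as in the original code.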

6.2. Folder Structure

The folder structure looks like this:
H:.
│  README.md
│  RealtimeBFlyInago.py ← the data-collection script (Section 3)
│  .gitignore
│  log_config.json
│  CryptoAnalysisBeta.py  ← the data-analysis script (Sections 4-5)
├─modules
│  │  GetInago.py
│  │  
│  ├─chromedriver_win32
│  │      chromedriver.exe
│  │      
│  ├─chromedriver_linux64
│  │      chromedriver
│  │      
│  ├─chromedriver_linux64_v101
│  │      chromedriver
│  │      
├─memo
├─download
│      TA_Lib-0.4.24-cp39-cp39-win_amd64.whl
│      TA_Lib-0.4.24-cp38-cp38-win_amd64.whl
├─datasets
│  │  candle_summary.csv
│  │  realtime_total.csv
│  │  realtime_total_v1.csv
│  │  candle_summary_v1.csv
│  │  technical_df.csv
│  │  train_df.csv
│  │  test_df.csv
│  │  
├─model
│      my_best_pipeline.pkl
│      my_best_pipeline_dumy_RMSE_v5_Low.pkl
│      my_best_pipeline_dumy_R2_v5_Low.pkl
│      my_best_pipeline_dumy_RMSE_v5_High.pkl
│      my_best_pipeline_dumy_R2_v5_High.pkl
├─config
│      config.ini
├─__pycache__
│      CryptoAnalysis.cpython-38.pyc
└─graph
    ├─technical_graph
    │      bollinger_band.png
    │      pred_result.png
    │      pred_result2.png
    │      pred_result.pdf
    │      SMA.png
    │      pred_result3.png
    │      candle_summary.png
    │      candle_summary - コピー.png

class: center, middle

7. Price-Prediction Module


class: center, middle

8. Backtesting Environment


class: center, middle

9. Production Environment

