By: Rick Dobson | Updated: 2021-08-30 | Comments | Related: > Python
Problem
Demonstrate how to plot in Python financial time series from SQL Server. Use the plots to reveal the value of exponential moving averages with different period lengths to make decisions about time series values.
Solution
A popular aphorism about graphics is that a picture is worth a thousand words. It is also widely believed that facts are more impactful and memorable when accompanied by charts that reinforce key points. However, SQL Server does not enable the generation of charts. This tip shows how to make facts about time series data in a SQL Server database easier to grasp and recall when the time series data and exponential moving averages for the data are plotted in an accompanying Python application.
Data scientists often use charts as a key tool for exploratory data analysis. For example, it is possible to use charts to validate or reject key assumptions about data and to convey the findings about those assumptions to other project members as well as to decisionmakers. Also, charts for data in a SQL Server instance can provide guidance about the best next steps in a data modeling project. Database professionals can use charts to highlight the insights provided by data sources they manage.
To run the code samples presented in this tip, you will need Python and a couple of external libraries. The Python programming application and the external libraries are available without charge.
- You can download Python from this link. You can use whatever developer tool you wish for opening and modifying the Python script files presented in this tip. However, this tip achieves these goals with IDLE, an integrated developer environment, for reading, editing, and running Python code. IDLE downloads automatically with the installation of Python from the Python.org website.
- One of the external libraries featured in this tip is the Pandas library, which is available from this link. This tip illustrates how to transfer data from a csv file, such as one written from SQL Server, into a Pandas dataframe for processing with Python code.
- Matplotlib is the other external Python library applied in this tip. This library is a powerful and popular charting package for Python programmers. You can find installation instructions for Matplotlib from this link.
The SQL Server source data table for this tip
This tip commences by reviewing its source data. The source data table (see the criterion in the following script) comes from a database named DataScience. Here is a script to display the table's column names and their data types.
use DataScience GO -- display column names and data types in source data table select column_name, data_type from INFORMATION_SCHEMA.COLUMNS where table_name = N'ema_values_for_period_lengths_by_symbol_and_date'
Here is the output from the preceding script.
- The date column holds trading dates for the financial data. Time series data always require a column of datetime values.
- The symbol column holds symbols representing the securities tracked by the financial data.
- The close column contains close prices on trading dates for stock symbols.
- The remaining columns hold exponential moving averages for the closing prices of a symbol across trading dates. Each of these columns is for an exponential moving average with a different period length. The period lengths are: 3, 8, 10, 20, 30, 50, and 200.
The next script displays the distinct symbols from the source data table.
-- display distinct symbols select distinct symbol from [DataScience].[dbo].[ema_values_for_period_lengths_by_symbol_and_date]
There are just six symbols in this source table: SDOW, UDOW, TQQQ, SPXL, SQQQ, and SPXS. Each of these symbols is for an ETF security that tracks a stock market index. This tip demonstrates processing techniques for the TQQQ symbol, which tracks the leading stocks traded on the NASDAQ stock exchange. The time series data for all six symbols is available in this tip's download should you care to apply the techniques demonstrated in this tip to other symbols.
Here are a couple of SQL statements for reporting on the dates in the time series data.
-- display selected date info select min(date) start_date, max(date) end_date, count(distinct date) distinct_dates from [DataScience].[dbo].[ema_values_for_period_lengths_by_symbol_and_date] -- display years select distinct year(date) years from [DataScience].[dbo].[ema_values_for_period_lengths_by_symbol_and_date] order by year(date)
The data for each symbol starts on 2018-01-02 and ends on 2021-05-28. There are a total of 858 trading dates over this trading date range. The trading date range provides data for the years from 2018 through 2020 and for a portion of the trading dates in 2021.
This tip focuses on the data from just one year (2020), but you can practice the processing techniques for the other trading dates with the help of a csv file for all the data in the tip's download.
A Python script to read a csv file and plot a segment of the data
In this tip, you will find demonstrations of how to create, save, and display four time series plots. The processing is performed by a Python script that reads a csv file based on the SQL Server source described in the preceding section. After the data are read and processed to populate a Pandas dataframe in its first segment, the data are subset and then plotted in its second segment. A line of single-line comment markers (###...) separates the first and second segments.
Here is an image of the Python script for reading, subsetting, and plotting data in the first time series plot. The image is from the IDLE application.
- Before either script segment, the code invokes two import statements – one for the Matplotlib library and the other for the Pandas library.
- The first segment has three major purposes.
- Read the csv file into a Pandas dataframe.
- Add two new columns to the dataframe to facilitate subsetting the dataframe for the data to be plotted.
- Print the dataset in the dataframe.
- The read_csv function reads a csv file. One column of string values
representing dates is parsed to Python datetime values.
- The read_csv function must specify a location for the csv file. This function belongs to the Pandas library.
- An optional parameter instructs Pandas to convert one column directly to Python datetime values; the designated column has the name date in the script below.
- The output from the read_csv function in the script below is a Pandas dataframe named df.
- Through the Pandas (pd) to_datetime method two new columns are added to the df dataframe. The column names are year and month. These two new columns facilitate subsetting the df dataframe to a second smaller dataframe for plotting in the second segment.
- Before invoking the print function for displaying all the df dataframe rows
and columns, two additional Python methods are invoked.
- By default, the Python print function displays just a subset of the rows and columns from a dataframe.
- The additional Pandas methods cause the print function to display all rows and columns from a dataframe.
- The second segment has two major purposes.
- It subsets the df dataframe to a second smaller dataframe that is plotted.
- The query method for the df dataframe facilitates the subsetting of the df dataframe into the df2 dataframe.
- You can specify the arguments for the Pandas query method with criteria
that have a similar format to where clauses in SQL Server select statements.
For example, the criteria for the first chart pulls data
- for year 2020
- for the TQQQ symbol
- and through the first four months of the year
- It also invokes a series of Matplotlib statements to specify a chart to save and display.
- It subsets the df dataframe to a second smaller dataframe that is plotted.
- The specification, saving, and displaying of the chart appear in this order.
- The plt.style.use(‘seaborn') statement designates a popular and powerful style for displaying Matplotlib charts.
- The next two statements use Pandas syntax for extracting values from df2 – namely x values (price_date) and y values (price_close) for a line chart.
- The plt.plot_date statement assigns the price_date and price_close values, respectively, to the x and y values of a line chart. The linestyle parameter specifies a connector, namely a solid line, between the y values in the chart.
- The plt.gcf().autofmt_xdate statement allows Matplotlib to make the date values appear diagonally on the x axis. This can improve the readability of date values along the x axis.
- The plt.suptitle statement specifies the main chart title.
- The plt.title statement specifies the chart's subtitle.
- The plt.savefig statement designates a file name and path in which to save the chart's image.
- The plt.show statement displays the chart image in a dialog box . Closing the dialog box ends the Python session.
Here is the image presented by the plt.show statement in the preceding script.
- As you can see, the peak TQQQ close price occurs slightly below $60 per share around the middle of February (2020-02-15).
- Then, the close price drops precipitously through several days past the middle of March (2020-03-15).
- Close prices begin to rise again after this point.
- This chart shows three trends in the close price data
- From the beginning of 2020, up through the peak close price
- From the peak close price through the trough close price
- A reversal from the trough price back up towards higher close prices
- While there are three general trends to the close prices, prices do not
move consistently from one trading day to the next.
- For example, from the peak close price through the trough price, close prices generally decline
- However, there are also several reversals between successive trading days on the way from the peak close price through the trough close price
The next screen shot is from the listing of df values at the end of first segment. The values in the listing are precise based on the source data from the csv file. The preceding chart shows (x,y) coordinate values relative to grid lines within the chart. Unless a coordinate value falls precisely on a grid line, the chart shows the coordinate value approximately.
- Recall that this listing is for all dates for all six symbols.
- The first column is a sequential number starting at zero in the df dataframe's first row.
- The second column is for the trading date.
- The third column is for the stock symbol.
- The fourth column is for the close price.
- The selected rows appearing with a shade are from the peak close price (59.0299 on February 19, 2020) through the trough price (17.7549 on March 20, 2020).
- After March 20, 2020, close prices begin to start recovering from the trough value.
A Python script to plot all close prices in a year for a symbol
The preceding time series chart for TQQQ close prices offered an in-depth view of changes during February and March 2020 as well as the preceding and trailing months. For no preceding year going back to 2011 were the TQQQ close price declines during February and March greater than in 2020. The time series chart presented in this section gives a broader perspective of TQQQ throughout all the trading days in 2020. Here is what the chart looks like.
- You can, again, see all precipitous decline in TQQQ close prices from the second half of February 2020 through the beginning of the second half of March 2020.
- However, the time series chart in this section shows that by the end of the year in 2020, the TQQQ close price soars by more than 50% above the peak TQQQ close price in February 2020 before the start of the historic drop in February and Mach 2020.
- There are other less pronounced TQQQ close price swings from the beginning of September 2020 through the end of 2020.
- Clearly, there are multiple date ranges where prices are in an uptrend and other date ranges where prices are in a downtrend. Time series charts are a useful data visualization tool for identifying upward and downward trending close price periods.
The code for the time series chart in this section is nearly identical to the script in the preceding section. There are just three changes to the IDLE window between the Python time series chart scripts for each section. These changes are highlighted and numbered by red bullets in the following screen shot.
- The filename containing the script for the screen shot in this section is
ts_Figure_2.py while the filename in the script in preceding section is ts_Figure_1.py.
- Python script files have a file extension of .py.
- The difference in filenames for each section allows the operating system to track which file is for which time series chart.
- The criteria for the query function is different between the scripts for
this section and the preceding section.
- The criteria for the script in this section pulls all dates and close prices for the TQQQ symbol in 2020 ((year == 2020) and (symbol == "TQQQ")).
- The criteria for the script in the preceding section pulls only dates and close prices from the beginning of January 2020 through the end of April 2020 for the TQQQ symbol ((year == 2020) and (symbol == "TQQQ") and (month <= 4)).
- The third difference between the scripts for the two sections is the filename
for the saved chart image.
- The filename for the time series chart saved from this section has the name ts_Figure_2_programmatically_saved.png.
- The filename for the time series chart saved from the preceding section has the name ts_Figure_1_programmatically_saved.png.
- Recall that the chart image is saved by the plt.savefig statement. This statement invokes the savefig() function from the Matplotlib library.
- Like the plt.savefig statement, the plt.show() statement displays a chart image based on the preceding statements in a Python script, but the plt.show() statement does not need a name that is distinct from other chart image files previously saved to the hard drive. Therefore, the show() function output appears within a dialog box named Figure 1 in this section and the preceding section.
- Matplotlib has a capability to display multiple charts from a single Python script with a single invocation of the plt.show() function. See this link for a demonstration of this capability.
A Python script to contrast long-period length moving averages with price performance
When plotting time series data, it is sometimes useful to display multiple time series within a single chart. This kind of chart can contrast the efficacy of two or more technical indicators for projecting future price performance. For example, it is sometimes the case that when an exponential moving average with a relatively short period length drops below another exponential moving average with a longer period length, then the symbol's close price is likely to continue declining.
The ema_50 and ema_200 are two exponential moving averages that are sometimes contrasted with one another by stock trading analysts. The ema_50 has a 50-day period length, and the ema_200 has a 200-day period length. Both period lengths are relatively long in comparison to other exponential moving averages that stock trading professionals typically monitor.
- When the ema_50 for close prices rises from below to above the ema_200, there is a cross-over of the moving averages. This cross-over is commonly called the "golden cross" because it can signal that a symbol's price may continue rising in the future.
- Conversely, when the ema_50 for close prices drops from above to below the ema_200, this is a technical indicator that a symbol's close price is likely to continue declines. This kind of pattern is sometimes referred to as a "death cross".
By plotting close price, ema_50, and ema_200 in the same time series chart you can visually confirm if a golden cross or a death cross is yielding the expected outcome. The script in this section shows how to compare close prices over time with two exponential moving averages – namely, ema_50 and ema_200. The following script shows how to create a time series chart that compares movements over time for close prices versus ema_50 and ema_200. This section's code follows the same general pattern as the one displayed in the preceding section, but there are also some important additions.
- The filename for the Python script in this section has the name line_chart_3.py instead of ts_Figure_2.py. As you might expect, the name for a chart is arbitrary.
- The script for this file defines two new time series named ema_50 and ema_200. These two newly named time series are added to the price_close time series values from the ts_Figure_2.py file.
- The plt.plot_date statement for the price_close time series is augmented with additional settings to distinguish the price_close time series from the ema_50 and ema_200 time series. The new settings are for date markers, line labels, and line color.
- Just like the price_close time series, the two newly named time series (ema_50 and ema_200) each have plt.plot_date statements that specify the appearance of the lines for each of the new time series.
- The plt.legend() function adds a legend to the chart that helps to identify which line is for which time series.
- The plt.savefig() function specifies the filename and path for the saved chart image based on the settings in the script.
- The plt.show() at the end of the script displays in a dialog box the chart image specified by the preceding lines of code in the script.
Here is the chart image that appears from the plt.show() function statement in the preceding script.
- There are three lines in the chart.
- The line with black markers is for the close price time series values.
- The line made up of solid red circles is for the ema_50 time series values.
- The line with a dotted line style is for the ema_200 time series values.
- There are also two bullets with values of 1 and 2 that are added to the
chart from Matplotlib.
- The bullet with a value of 1 denotes a death cross because the ema_50 line crosses from above to below the ema_200 line.
- The bullet with a value of 2 denotes a golden cross because the ema_50 line crosses from below to above the ema_200 line.
- The golden cross seems to generally support its role of pointing to higher future values. This is because the line of black square markers generally rises after the bullet with a value of 2.
- The death cross seems to fail in fulfilling its role of pointing to lower future values. This is because all the major pandemic-related decline in close price, which occurs from late February 2020 through early to mid-March 2020, is completed by the time the death cross occurs in late March to early April 2020.
Individual technical indicators for stock market performance are widely known to be imprecise. Therefore, it is often helpful to examine multiple indicators when trying to anticipate future price performance. For example, another popular technical indicator is price falling below a support line, such as the ema_50 line.
- Selling TQQQ shares after their close price falls below their ema_50 line would have avoided much of the TQQQ close price decline from late February 2020.
- Then, re-purchasing TQQQ shares after the golden cross would have successfully captured a new high for the year by the end of 2020.
- It would be prudent to verify the robustness of these guidelines before trading based on them. For example, do the guidelines project valid outcomes for different time periods with the TQQQ symbol as well as other symbols?
A Python script to contrast short-period length moving averages with price performance
Instead of comparing the ema_50 and ema_200 time series to the close price time series, you can compare two exponential moving averages with much shorter period lengths. It is well known that moving averages with shorter period lengths have time series that more closely match on average the trend of the underlying time series. Because moving averages with shorter time periods are closer to the underlying time series than moving averages with longer period lengths, it is possible to detect trend changes faster when using moving averages with shorter period lengths.
This section performs the same analysis as in the previous section, except that it replaces
- the 200-day period length with a 10-day period length moving average and
- the 50-day period length with a 3-day period length moving average
The Python script for this section is nearly identical to the script in the previous section. There are just four changes. The following script shows the script for this section in an IDLE window. The changes are identified by numbered bullets.
- The first bullet is for the script filename, which is line_chart_4.py in this section versus line_chart_3.py in the previous section.
- The second bullet is for the assignment of time series values from the filtered dataframe (df2) at the top of the second segment.
- The third bullet is for the assignment of linestyle, marker, label, and color properties to the lines for ema_3 and ema_10.
- The fourth bullet is for the filename (ts_Figure_4_programmatically_saved.png) and path (C:\\python_programs_output\\ts_plots\\) for the file saved by the savefig method for the plt object in the script file.
Here is the chart generated by the preceding script.
- The close time series is represented by a solid line with black square markers to identify each trading day.
- The ema_3 time series is represented by a line with solid red circle markers for trading days.
- The ema_10 time series is represented by a dotted line with small plus markers for trading days.
Numbered bullets on the chart denote key transition points in trend for one or more of the time series.
- Bullet 1 identifies the peak close price before the start of the close price drop commencing in middle to late February.
- Bullet 2 depicts the trading day on which the ema_3 time series value initially drops below close price time series.
- Bullet 3 is around a date when the ema_3 time series rises above the ema_10 time series.
- The uptrend starting around bullet 3 extends to bullet 4 when the ema_3 time series is slightly below the close price time series but above the ema_10 time series.
- There are a lot of quickly alternating short uptrend and downtrend periods until around the fifth bullet where the ema_3 time series clearly rises above and stays mostly above the ema_10 time series until the end of 2020 in the chart.
Next Steps
This tip's download file contains six files to help you get a hands-on feel for implementing the Python solutions in this tip.
- A key file is a csv file with the sample data referenced by the Python script files. Full coverage of how to compute and save exponential moving averages for time series data with SQL code in SQL Server is provided in this previous tip.
- There are also four Python script files – one for each of the time series charts presented in this tip. Each Python script file has a .py extension.
- There are also four .png files with the chart images that were saved by the scripts in this tip. You can store chart images from Matplotlib in bitmap format (for example, .png or .jpg) or in vector-based format (for example, .pdf or .svg). A good tutorial on how to use either of these output formats from Matplotlib is available from this link.
After verifying the code works as described with the sample data file, you are encouraged to use other date ranges and/or symbols than those reported on in this tip. Also, you can gather your own time series data and compute your own exponential moving averages for plotting purposes.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2021-08-30