Finding Data in the NHL API

So I decided to re-record my NHL API video to try to improve things (now that I know how to get my audio right) and maybe make it a little easier to digest. It turns out even a three-and-a-half-minute video takes hours to get just right, and somehow I am still not totally satisfied with it. This may be something I spend more time on in the near future to provide people with a bit of educational material. In the meantime, feel free to go watch Finding Data in the NHL API over on Odysee.

PaaS Frustrations

After several days of working with the support folks at Digital Ocean, they finally nailed down why my deploys were never getting new code. I am still not clear on why it was an issue to begin with, but I figured it needs to be documented to maybe save someone else the trouble.

The Details

My code is a mix of Python (and some HTML/JS) using Flask; the Python version in this specific situation was 3.8.2 (at least locally). I am using the Digital Ocean App Platform, with a domain hosted through them as well (hockey-info.online). The local Docker version was 20.10.2, build 2291f61, on a Fedora 32 based system.

The Problem

No code changes I made after Jan 15th seemed to be pulled when deploying; deploys would trigger properly, but they never got the correct code, just the correct commit hash. I tried manual deploys, I tried automatic ones, and I searched the internet high, low, and in between, but couldn't figure it out.

Solution (Eventually)

I finally broke down and opened a ticket with DO on a Wednesday; after going back and forth with their support people and trying a lot of things, they finally informed me of a solution the following Tuesday. It turns out that in the Dockerfile I was doing a RUN git clone https://gitlab.com/dword4/hockey-info.git ., which pulled the code down outside of the methods used by the App Platform. The fix was as simple as replacing that line with COPY . /hockey-info and then pushing the code up.
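To make the change concrete, here is roughly what it looks like in a Dockerfile; aside from the git clone and COPY lines quoted above, the base image, requirements install, and entrypoint are assumptions, not my actual file.

# Everything except the clone/COPY lines is an assumed skeleton, not the real hockey-info Dockerfile
FROM python:3.8-slim
WORKDIR /hockey-info

# Old line, which pulled code itself and bypassed whatever the App Platform checked out:
# RUN git clone https://gitlab.com/dword4/hockey-info.git .

# Replacement, which copies in the code the platform already has for the commit being deployed:
COPY . /hockey-info

RUN pip install -r requirements.txt
CMD ["python", "hockey_info.py"]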

I am still not entirely sure why it works this way; there appears to be some kind of git caching going on, but I have no real insight into why, probably due to how the App Platform is built.

Monitoring on AWS with CloudWatch Agent and Procstat

Objective: Install CloudWatch Agent with procstat on an EC2 instance and configure a metric alarm in CloudWatch

One of the first issues I ran into was with IAM policies, or lack thereof. Specifically, it was the managed policy CloudWatchAgentServerPolicy that needed to be added to the instance role. The telltale sign that you forgot to attach this policy is an error message in the agent logs, seen below:

2020-08-17T22:46:18Z E! refresh EC2 Instance Tags failed: NoCredentialProviders: no valid providers in chain
caused by: EnvAccessKeyNotFound: failed to find credentials in the environment.
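If the instance role is managed in Terraform, attaching the managed policy is a one-liner; a quick sketch, where lab_instance_role is a placeholder for whatever role your instance profile actually uses:

# Attach the AWS-managed CloudWatchAgentServerPolicy; the role name here is a placeholder
resource "aws_iam_role_policy_attachment" "cloudwatch_agent" {
  role       = aws_iam_role.lab_instance_role.name
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}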

Fortunately, the procstat plugin is already part of the agent at install time, but it still needs to be configured. To do that you add a configuration file specific to your monitoring needs. For old-school admins, the easiest way to think of procstat is that it basically ties into the ps tool; it's like doing a `ps -ef | grep` to find something about a running process.

[root@lab-master amazon-cloudwatch-agent.d]# pwd
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d
[root@lab-master amazon-cloudwatch-agent.d]# cat processes
{
    "metrics": {
        "metrics_collected": {
            "procstat": [
                {
                    "pattern": "nginx: master process /usr/sbin/nginx",
                    "measurement": [
                        "pid_count"
                    ]
                }
            ]
        }
    }
}
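Once the file is sitting in amazon-cloudwatch-agent.d, the agent still has to re-read its configuration before anything shows up; restarting it through the control script should do the trick (paths assume the standard package install, and exact reload behavior may vary by agent version).

# Assumes the standard /opt/aws package install
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a stop
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a start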

This gets us far enough that we can now see values in the Metrics view of CloudWatch. Once we have data there, it's time to construct a metric alarm. My goal was to use Terraform, even though it's less painful to do in the AWS console.

resource "aws_cloudwatch_metric_alarm" "nginx-master" {
  alarm_name = "nginx master alarm"
  comparison_operator = "LessThanThreshold"
  evaluation_periods = 1
  datapoints_to_alarm = 1
  metric_name = "procstat_lookup_pid_count"
  namespace = "CWAgent"
  period = "300"
  statistic = "Average"
  threshold = "1"
  alarm_description = "Checks for the presence of an nginx-master process"
  alarm_actions = [aws_sns_topic.pagerduty_standard_alarms.arn]
  insufficient_data_actions = []
  treat_missing_data = "missing"
  dimensions = {
    "AutoScalingGroupName" = "some-ASG-YXI8VDT6MBE3"
    "ImageId"       = "some-ami"
    "InstanceId"    = "some-instance-id"
    "InstanceType"  = "t3a.large"
    "pattern"       = "nginx: master process /usr/sbin/nginx"
    "pid_finder"    = "native"
  }
}

The alarm creation proved to be a lot harder than I had expected, taking up several hours. I had to re-create things in my lab setup twice and do a Terraform import. The problem turned out to be that the dimensions {} block is not optional, despite what the Terraform docs say. Had they said the fields were all required, I probably would have saved days of time.
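For anyone in the same spot: the import ID for aws_cloudwatch_metric_alarm is just the alarm name, so the import looks something like this (names matching the block above):

terraform import aws_cloudwatch_metric_alarm.nginx-master "nginx master alarm"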

Polish Work

In the process of working things out I hard-coded a lot of values in the dimensions {} block. Naturally that is not good practice, especially with IaC, so I will need to rework it to use variables instead. The alarm names should also utilize the Terraform workspace value for better naming.
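A rough sketch of the direction I have in mind; the variable names are placeholders, and the remaining dimensions would get the same treatment:

# Placeholder variables; names and defaults are assumptions, not final values
variable "asg_name" {
  type = string
}

variable "nginx_pattern" {
  type    = string
  default = "nginx: master process /usr/sbin/nginx"
}

resource "aws_cloudwatch_metric_alarm" "nginx-master" {
  # Workspace-aware name instead of a hard-coded string
  alarm_name          = "nginx-master-${terraform.workspace}"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 1
  metric_name         = "procstat_lookup_pid_count"
  namespace           = "CWAgent"
  period              = "300"
  statistic           = "Average"
  threshold           = "1"

  # Remaining dimensions (InstanceId, ImageId, etc.) would be wired up the same way
  dimensions = {
    "AutoScalingGroupName" = var.asg_name
    "pattern"              = var.nginx_pattern
    "pid_finder"           = "native"
  }
}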

Terraform – Reference parent resources

Sometimes things get complicated in Terraform, like when I touch it and make a proper mess of the code. Here is a fairly straightforward example of how to reference parent resources from a child module.

├── Child
│   └── main.tf
└── main.tf

1 directory, 2 files
$ pwd
/Users/dword4/Terraform

First, let's look at what should be in the top-level main.tf file; the substance of it is not super important beyond giving a rough idea of what you want/need:

provider "aws" {
  region = "us-east-2"
  profile = "lab-profile"
}

terraform {
  backend "s3" {}
}

# lets create an ECS cluster

resource "aws_ecs_cluster" "goats" {
  name = "goat-herd"
}

output "ecs_cluster_id" {
  value = aws_ecs_cluster.goats.id
}

What this does is simply create an ECS cluster named “goat-herd” in us-east-2 and then output ecs_cluster_id, which contains the ID of the cluster. While we don't necessarily need the value printed for us, we do need the output because it makes the data available to other modules, including children. Now let's take a look at what should be in Child/main.tf:

provider "aws" {
  region = "us-east-2"
  profile = "lab-profile"
}

terraform {
  backend "s3" {}
}
module "res" {
  source = "../../Terraform"
}
output "our_cluster_id" {
  value = "${module.res.ecs_cluster_id}"
}

What is going on in this file is that it defines a module called res and sources it from the parent directory where the other main.tf resides. This lets us reference the module and the outputs it exposes, so we can access the ecs_cluster_id value and use it within other resources as necessary.
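For example, a service defined in the child could attach to the parent's cluster without hard-coding the ID; a minimal sketch, where the task definition is purely a placeholder so the service has something to run:

# Hypothetical task definition just so the service has something to run
resource "aws_ecs_task_definition" "app" {
  family                = "app"
  container_definitions = jsonencode([
    {
      name      = "app"
      image     = "nginx:latest"
      memory    = 256
      essential = true
    }
  ])
}

resource "aws_ecs_service" "app" {
  name            = "app-service"
  cluster         = module.res.ecs_cluster_id   # the parent cluster, via the module output
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 1
}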
