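Managing resources in the cloud and routing alerts to Dev has always been a pain (basic alerting, admittedly, hardly needs any work anymore). AWS Tagging Strategies inspired us (really, one of the senior engineers at our company pointed us to it), and we started working out how to use it in production. Ops can't keep receiving alerts and forwarding them by hand forever (we had turned into human relay agents).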
What are tags?
Amazon Web Services (AWS) allows customers to assign metadata to their AWS resources in the form of tags. Each tag is a simple label consisting of a customer-defined key and an optional value that can make it easier to manage, search for, and filter resources. Although there are no inherent types of tags, they enable customers to categorize resources by purpose, owner, environment, or other criteria. This webpage describes commonly used tagging categories and strategies to help AWS customers implement a consistent and effective tagging strategy. The following sections assume basic knowledge of AWS resources, tagging, detailed billing, and AWS Identity and Access Management (IAM).
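Advantages
Clear ownership of resources and clear relationships between them
Clearer cost accounting later on (how much each App and each Team spends)
Precisely targeted alerts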
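To make the cost point concrete, here is a minimal Python sketch that totals per-resource costs by a tag key. The line-item shape (`resource`, `cost`, `tags`) and the numbers are hypothetical, not the AWS Cost and Usage Report schema; what matters is that spend on resources missing the tag lands in its own visible bucket instead of disappearing.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key, untagged="(untagged)"):
    """Sum cost per value of tag_key; items without the tag go to the `untagged` bucket."""
    totals = defaultdict(float)
    for item in line_items:
        value = item.get("tags", {}).get(tag_key, untagged)
        totals[value] += item["cost"]
    return dict(totals)


# Hypothetical line items, for illustration only.
items = [
    {"resource": "db-orders", "cost": 120.0, "tags": {"Team": "payments", "App": "orders"}},
    {"resource": "db-users", "cost": 80.0, "tags": {"Team": "identity", "App": "users"}},
    {"resource": "i-0abc", "cost": 30.0, "tags": {"Team": "payments"}},
    {"resource": "vol-0def", "cost": 10.0, "tags": {}},
]
print(cost_by_tag(items, "Team"))  # {'payments': 150.0, 'identity': 80.0, '(untagged)': 10.0}
```

The size of the untagged bucket doubles as a progress metric for the tag clean-up described below.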
When creating a tagging strategy for AWS resources, make sure that it accurately represents organizationally relevant dimensions and adheres to the following tagging best practices:
Always use a standardized, case-sensitive format for tags, and implement it consistently across all resource types.
Consider tag dimensions that support the ability to manage resource access control, cost tracking, automation, and organization.
Implement automated tools to help manage resource tags. The Resource Groups Tagging API enables programmatic control of tags, making it easier to automatically manage, search, and filter tags and resources. It also simplifies backups of tag data across all supported services with a single API call per AWS Region.
Err on the side of using too many tags rather than too few tags.
Remember that it is easy to modify tags to accommodate changing business requirements, however consider the ramifications of future changes, especially in relation to tag-based access control, automation, or upstream billing reports.
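One way to apply the first practice is a small audit that checks each resource's tags against a fixed, case-sensitive set of keys. This is a sketch under assumptions: `REQUIRED_KEYS` and `ALLOWED_ENVIRONMENTS` are a hypothetical house standard, and the input is the `[{'Key': ..., 'Value': ...}]` list shape most AWS APIs return. Because tag keys are case-sensitive, it reports keys that exist only in the wrong case (`team` instead of `Team`) separately from keys that are missing entirely.

```python
REQUIRED_KEYS = ("App", "Team", "Owner", "Environment")  # hypothetical house standard
ALLOWED_ENVIRONMENTS = {"prod", "staging", "dev"}        # hypothetical allowed values


def to_dict(tag_list):
    """Convert AWS-style [{'Key': ..., 'Value': ...}] into a plain dict."""
    return {t["Key"]: t.get("Value", "") for t in tag_list}


def audit_tags(tags):
    """Return a list of human-readable problems for one resource's tag dict."""
    problems = []
    by_lower = {k.lower(): k for k in tags}  # detect keys that differ only in case
    for key in REQUIRED_KEYS:
        if key in tags:
            continue
        variant = by_lower.get(key.lower())
        if variant is not None:
            problems.append(f"wrong case: {variant!r} should be {key!r}")
        else:
            problems.append(f"missing: {key!r}")
    env = tags.get("Environment")
    if env is not None and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unexpected Environment value: {env!r}")
    return problems


tags = to_dict([
    {"Key": "App", "Value": "orders"},
    {"Key": "team", "Value": "payments"},
    {"Key": "Environment", "Value": "production"},
])
print(audit_tags(tags))
```

Run over every resource, the non-empty results form the to-do list for filling in missing tags.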
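Challenges
Auditing resources and filling in missing tags (pure grunt work)
API limits: scripts must handle the exceptions raised when calls hit rate limits (see the service Limits)
Defining the tag keys (App, Team, Owner, Environment; adapt to your own needs)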
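For the API-limit problem, a common pattern is to retry throttled calls with capped exponential backoff and full jitter. The sketch assumes errors look like botocore's `ClientError`, which carries the code in `exc.response['Error']['Code']`; the set of throttling codes is a best-effort list, and `call_with_backoff` is a hypothetical helper, not part of boto3.

```python
import random
import time

# Error codes AWS services commonly use for throttling (best-effort, not exhaustive).
THROTTLE_CODES = {"Throttling", "ThrottlingException", "RequestLimitExceeded", "TooManyRequestsException"}


def is_throttle(exc):
    """True if exc carries a botocore-style response with a throttling error code."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") in THROTTLE_CODES


def call_with_backoff(fn, *args, max_attempts=6, base=0.5, cap=20.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying throttled calls with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            # Re-raise non-throttling errors immediately, and give up after the last attempt.
            if not is_throttle(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random time between 0 and the current backoff ceiling.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Usage would be along the lines of `call_with_backoff(rds.describe_db_instances, DBInstanceIdentifier="orders-db")`. botocore can also retry on its own via `botocore.config.Config(retries={"max_attempts": 10, "mode": "adaptive"})`; a wrapper like this is mainly useful when you want the retry policy and its logging to live in your own script.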
Implementation
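After tagging the resources (with aws-tagger), a Python script automatically creates CloudWatch alarms (CloudWatch being the clumsiest AWS service) from the RDS information and copies the tags onto those alarms. An alarm exporter (slightly modified to carry the tag information) feeds the alarms into Prometheus, and from there they are routed to the backend alerting system according to their tags.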
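A minimal sketch of the alarm step, assuming boto3-style `rds` and `cloudwatch` clients are passed in. It walks `describe_db_instances` with a paginator and calls `put_metric_alarm` with one CPU alarm per instance, copying the instance's `TagList` onto the alarm. The alarm name pattern, the 80% threshold and the SNS topic are illustrative assumptions; tags under the reserved `aws:` prefix are dropped because users cannot set them.

```python
def build_cpu_alarm(instance, sns_topic_arn, threshold=80.0):
    """put_metric_alarm kwargs for one RDS instance, with the instance's tags copied over."""
    db_id = instance["DBInstanceIdentifier"]
    # Tags under the reserved aws: prefix cannot be set by users, so skip them.
    tags = [t for t in instance.get("TagList", []) if not t["Key"].startswith("aws:")]
    kwargs = {
        "AlarmName": f"rds-{db_id}-cpu-high",  # hypothetical naming scheme
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
    if tags:
        kwargs["Tags"] = tags
    return kwargs


def sync_alarms(rds, cloudwatch, sns_topic_arn):
    """Create one CPU alarm per RDS instance; returns the alarm names it wrote."""
    names = []
    for page in rds.get_paginator("describe_db_instances").paginate():
        for instance in page["DBInstances"]:
            kwargs = build_cpu_alarm(instance, sns_topic_arn)
            cloudwatch.put_metric_alarm(**kwargs)
            names.append(kwargs["AlarmName"])
    return names
```

CloudWatch applies `Tags` only when the alarm is created; when `put_metric_alarm` updates an existing alarm the tags are ignored, so changing them later goes through `tag_resource`. If an instance entry lacks `TagList`, `list_tags_for_resource` on its ARN returns the same list.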
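Tags Export pulls the tags of all resources (a lightly modified aws_tags_exporter, which did not handle pagination; it covers elb, elbv2, efs, route53, elasticache, rds, autoscaling, ec2, and dynamodb). This module is meant for later use, once the tags on all resources have been sorted out and completed.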
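The pagination fix could look like the sketch below: it follows `PaginationToken` from the Resource Groups Tagging API's `get_resources` until the token comes back empty, then renders one Prometheus text-format sample per resource with its tags as labels. The metric name `aws_resource_tags` and the `tag_` label prefix are assumptions, not what aws_tags_exporter actually emits; tag keys are sanitised because classic Prometheus label names only allow `[a-zA-Z0-9_]`.

```python
import re

_INVALID_LABEL_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def iter_tag_mappings(client, **filters):
    """Yield every ResourceTagMappingList entry, following PaginationToken to the last page."""
    kwargs = dict(filters)
    while True:
        page = client.get_resources(**kwargs)
        yield from page.get("ResourceTagMappingList", [])
        token = page.get("PaginationToken")
        if not token:  # an empty token means there are no more pages
            return
        kwargs["PaginationToken"] = token


def label_name(key):
    """Map a tag key to a valid Prometheus label name (hypothetical tag_ prefix)."""
    return "tag_" + _INVALID_LABEL_CHARS.sub("_", key)


def escape(value):
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(mappings, metric="aws_resource_tags"):
    """One sample per resource: metric{arn="...",tag_Team="..."} 1."""
    lines = [f"# TYPE {metric} gauge"]
    for m in mappings:
        labels = {"arn": m["ResourceARN"]}
        for t in m.get("Tags", []):
            labels[label_name(t["Key"])] = t.get("Value", "")
        body = ",".join(f'{k}="{escape(v)}"' for k, v in sorted(labels.items()))
        lines.append(f"{metric}{{{body}}} 1")
    return "\n".join(lines) + "\n"
```

A call such as `render(iter_tag_mappings(boto3.client("resourcegroupstaggingapi"), ResourceTypeFilters=["rds:db"]))` would limit the export to RDS instances. boto3 also ships a paginator for this call (`client.get_paginator("get_resources")`) that handles the token the same way.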
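A Python script fetches RDS information and stores it in an RDS database, laying the groundwork for a basic database inventory (use your imagination: this information will be enormously useful down the road). An Info Export then exposes it to Prometheus (AWS APIs are rate-limited and also cost money, so this guards against failures and saves money).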
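A sketch of the cache-then-export idea: instance details from `describe_db_instances` are upserted into a table, and the info metric is rendered from that table, so Prometheus scrapes never hit the AWS API. `sqlite3` stands in here for the RDS database the author used; the table layout and the `aws_rds_info` metric name are hypothetical. Label values are not escaped because identifiers, engine names, versions and instance classes are limited to safe characters.

```python
import sqlite3

# Fields taken from each DBInstances entry returned by describe_db_instances.
FIELDS = ("DBInstanceIdentifier", "Engine", "EngineVersion", "DBInstanceClass", "MultiAZ")


def save_instances(conn, instances):
    """Upsert instance details so exports can be served without calling AWS."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rds_instance ("
        "id TEXT PRIMARY KEY, engine TEXT, engine_version TEXT,"
        " instance_class TEXT, multi_az INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO rds_instance VALUES (?, ?, ?, ?, ?)",
        [tuple(inst[f] for f in FIELDS) for inst in instances],
    )
    conn.commit()


def render_info(conn):
    """Render an info-style metric (value always 1) from the stored rows."""
    lines = ["# TYPE aws_rds_info gauge"]
    rows = conn.execute(
        "SELECT id, engine, engine_version, instance_class, multi_az FROM rds_instance ORDER BY id"
    )
    for ident, engine, version, klass, multi_az in rows:
        lines.append(
            f'aws_rds_info{{dbinstance_identifier="{ident}",engine="{engine}",'
            f'engine_version="{version}",instance_class="{klass}",'
            f'multi_az="{str(bool(multi_az)).lower()}"}} 1'
        )
    return "\n".join(lines) + "\n"
```

The sync job (calling AWS) and the exporter (reading the table) can then run on independent schedules, which is where the savings on API calls come from.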
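The alerting system pushes messages straight to Dev, with a link to a Grafana image for quick lookup. From this point on, alerts are received jointly by Dev and the DBA, with Dev as the first responder (the DBA wants to hide). (Two Python tools and one Go tool written from scratch, two Go tools customized; still learning Go...)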
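The routing and message step might look like this sketch: pick the primary receiver from the `Team` tag label, always copy the DBA, and attach a Grafana panel link built on Grafana's `/render/d-solo/...` image endpoint (which requires the image renderer plugin). The receiver names, the label names, the dashboard UID and slug `rds-overview`, panel `2` and the `var-instance` template variable are all hypothetical.

```python
from urllib.parse import urlencode

ROUTES = {"payments": "payments-oncall", "identity": "identity-oncall"}  # hypothetical receivers
DEFAULT_RECEIVER = "ops-oncall"  # fallback when the Team tag is missing or unknown
ALWAYS_CC = ("dba",)             # DBA receives every RDS alert as second line


def grafana_panel_url(base, uid, slug, panel_id, db_id, window="1h"):
    """Link to a rendered PNG of a single dashboard panel for one instance."""
    query = urlencode({
        "orgId": 1,
        "panelId": panel_id,
        "from": f"now-{window}",
        "to": "now",
        "var-instance": db_id,
    })
    return f"{base}/render/d-solo/{uid}/{slug}?{query}"


def route_alert(labels, grafana_base):
    """Choose receivers from tag labels and build the message text."""
    team = labels.get("tag_Team")
    primary = ROUTES.get(team, DEFAULT_RECEIVER)
    db_id = labels["dbinstance_identifier"]
    url = grafana_panel_url(grafana_base, "rds-overview", "rds-overview", 2, db_id)
    text = f"[{labels['alertname']}] {db_id} (team={team or 'unknown'})\n{url}"
    return {"to": [primary], "cc": list(ALWAYS_CC), "text": text}
```

Keeping the fallback receiver explicit means an alert from an untagged resource still reaches a human, and the unknown team in the message points at the missing tag.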
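Takeaways
Whether you run in the cloud or in your own IDC (where you have to encode this information in hostnames instead), tags are essential information for Ops (otherwise why would we bother building a CMDB?): they make resource ownership, cost allocation, utilization statistics, and all kinds of automation straightforward.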
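Without cloud tags, the same information has to be recovered from hostnames. A sketch under a hypothetical `<environment>-<team>-<app>-<NN>` naming convention: the hostname is parsed into the same keys used as tags in the cloud, and anything that does not match returns `None` so it can be reported as untagged. The convention only works if no field itself contains a hyphen.

```python
import re

# Hypothetical convention: <environment>-<team>-<app>-<two-digit index>, e.g. prod-payments-orders-01
HOSTNAME_RE = re.compile(
    r"^(?P<Environment>prod|staging|dev)-(?P<Team>[a-z0-9]+)-(?P<App>[a-z0-9]+)-(?P<Index>\d{2})$"
)


def tags_from_hostname(hostname):
    """Parse tags out of a hostname; None means it breaks the convention (treat as untagged)."""
    short = hostname.split(".", 1)[0].lower()  # drop the domain part
    match = HOSTNAME_RE.match(short)
    return match.groupdict() if match else None
```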