2019. 6. 3. 21:53
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How do you determine what your priorities are?
Design for workload insights
How do you design your workload so that you can understand its state?
Development and Integration
How do you reduce defects, ease remediation, and improve flow into production?
Mitigation of deployment risks
How do you mitigate deployment risks?
How do you know that you are ready to support a workload?
Effective preparation is required to drive operational excellence. Business success is enabled by shared goals and understanding across the business, development, and operations. Common standards simplify workload design and management, enabling operational success. Design workloads with mechanisms to monitor and gain insight into application, platform, and infrastructure components, as well as customer experience and behavior …
How do you understand the health of your workload?
How do you understand the health of your operations?
How do you manage workload and operations events?
Successful operation of a workload is measured by the achievement of business and customer outcomes. Define expected outcomes, determine how success will be measured, and identify the workload and operations metrics that will be used in those calculations to determine if operations are successful. Consider that operational health includes both the health of the workload and the health and success of the operations acting upon the workload (for example, deployment and incident response). Establish baselines from which improvement or degradation of operations will be identified, collect and analyze your metrics, and then validate your understanding of operations success and how it changes over time. Use collected metrics to determine if you are satisfying customer and business needs, and identify areas for improvement …
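The idea of establishing a baseline and then spotting degradation against it can be sketched roughly as follows. This is only an illustration: the metric (deployment duration in minutes), the sample values, and the "mean plus three standard deviations" rule are all assumptions, not something the framework prescribes.

```python
from statistics import mean, stdev

def baseline(samples):
    """Summarize historical samples as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)

def is_degraded(value, base_mean, base_stdev, tolerance=3.0):
    """Flag a value that exceeds the baseline mean by more than
    `tolerance` standard deviations (an assumed rule of thumb)."""
    return value > base_mean + tolerance * base_stdev

# Hypothetical history: deployment durations in minutes.
history = [10, 12, 11, 9, 13]
base_mean, base_stdev = baseline(history)

slow_deploy_flagged = is_degraded(20, base_mean, base_stdev)    # far above baseline
normal_deploy_flagged = is_degraded(12, base_mean, base_stdev)  # within normal range
```

The same comparison works for any operations metric where "higher is worse"; for metrics where lower is worse, the inequality would be reversed.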
How do you evolve operations?
Evolution of operations is required to sustain operational excellence. Dedicate work cycles to making continuous incremental improvements. Regularly evaluate and prioritize opportunities for improvement (for example, feature requests, issue remediation, and compliance requirements), including both the workload and operations procedures. Include feedback loops within your procedures to rapidly identify areas for improvement and capture learnings from the execution of operations …
The ability to monitor and run systems to deliver business value. The ability to continually improve the processes and procedures that support operations.
How do you manage credentials and authentication?
How do you control human access?
How do you control programmatic access?
Identity and access management are key parts of an information security program, ensuring that only authorized and authenticated users are able to access your resources, and only in a manner that you intend. For example, you should define principals (that is, users, groups, services, and roles that take action in your account), build out policies aligned with these principals, and implement strong credential management. These privilege-management elements form the core of authentication and authorization …
Identity & Access Management
How do you detect and investigate security events?
How do you defend against emerging security threats?
You can use detective controls to identify a potential security threat or incident. They are an essential part of governance frameworks and can be used to support a quality process, a legal or compliance obligation, and for threat identification and response efforts. There are different types of detective controls. For example, conducting an inventory of assets and their detailed attributes promotes more effective decision making (and lifecycle controls) to help establish operational baselines. You can also use internal auditing, an examination of controls related to information systems, to ensure that practices meet policies and requirements and that you have set the correct automated alerting notifications based on defined conditions. These controls are important reactive factors that can help your organization identify and understand the scope of anomalous activity …
How do you protect your networks?
How do you protect your compute resources?
Infrastructure protection encompasses control methodologies, such as defense in depth, necessary to meet best practices and organizational or regulatory obligations. Use of these methodologies is critical for successful, ongoing operations in either the cloud or on-premises …
How do you classify your data?
Data protection at rest
How do you protect your data at rest?
Data protection in transit
How do you protect your data in transit?
Before architecting any system, foundational practices that influence security should be in place. For example, data classification provides a way to categorize organizational data based on levels of sensitivity, and encryption protects data by way of rendering it unintelligible to unauthorized access. These tools and techniques are important because they support objectives such as preventing financial loss or complying with regulatory obligations …
How do you respond to an incident?
Even with extremely mature preventive and detective controls, your organization should still put processes in place to respond to and mitigate the potential impact of security incidents. The architecture of your workload strongly affects the ability of your teams to operate effectively during an incident, to isolate or contain systems, and to restore operations to a known good state. Putting in place the tools and access ahead of a security incident, then routinely practicing incident response through game days, will help you ensure that your architecture can accommodate timely investigation and recovery …
The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
How do you manage service limits?
How do you manage your network topology?
Before architecting any system, foundational requirements that influence reliability should be in place. For example, you must have sufficient network bandwidth to your data center. These requirements are sometimes neglected because they are beyond a single project's scope. This neglect can have a significant impact on the ability to deliver a reliable system. In an on-premises environment, these requirements can cause long lead times due to dependencies and therefore must be incorporated during initial planning …
How does your system adapt to changes in demand?
How do you monitor your resources?
How do you implement change?
Being aware of how change affects a system allows you to plan proactively, and monitoring allows you to quickly identify trends that could lead to capacity issues or SLA breaches. In traditional environments, change control processes are often manual and must be carefully coordinated with auditing to effectively control who makes changes and when they are made …
How do you back up data?
How does your system withstand component failures?
How do you test resilience?
How do you plan for disaster recovery?
In any system of reasonable complexity, it is expected that failures will occur. It is generally of interest to know how to become aware of these failures, respond to them, and prevent them from happening again …
The ability to recover a system from infrastructure or service disruptions. The ability to dynamically acquire computing resources to meet demand. The ability to mitigate disruptions such as misconfigurations or transient network issues.
How do you select the best performing architecture?
How do you select your compute solution?
How do you select your storage solution?
How do you select your database solution?
How do you configure your networking solution?
The optimal solution for a particular system will vary based on the kind of workload you have, often with multiple approaches combined. Well-architected systems use multiple solutions and enable different features to improve performance …
How do you evolve your workload to take advantage of new releases?
When architecting solutions, there is a finite set of options that you can choose from. However, over time new technologies and approaches become available that could improve the performance of your architecture …
How do you monitor your resources to ensure they are performing as expected?
After you have implemented your architecture, you will need to monitor its performance so that you can remediate any issues before your customers are aware. Monitoring metrics should be used to raise alarms when thresholds are breached. The alarm can trigger automated action to work around any badly performing components …
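The pattern described here (a metric crosses a threshold, an alarm fires, and the alarm triggers an automated action) could be sketched as below. This is a toy in-memory model, not a real monitoring API: the `Alarm` class, the metric name, the 500 ms threshold, and the `take_out_of_rotation` action are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Alarm:
    """Fires `action` whenever an evaluated value exceeds `threshold`."""
    name: str
    threshold: float
    action: Callable[[str, float], str]
    log: List[str] = field(default_factory=list)

    def evaluate(self, value: float) -> bool:
        breached = value > self.threshold
        if breached:
            # Record what the automated action reported doing.
            self.log.append(self.action(self.name, value))
        return breached

def take_out_of_rotation(alarm_name: str, value: float) -> str:
    """Hypothetical remediation: stop routing traffic to the slow component."""
    return f"{alarm_name}: removed slow component (value={value})"

latency_alarm = Alarm("p99_latency_ms", threshold=500.0, action=take_out_of_rotation)
for sample in [120.0, 480.0, 730.0]:   # assumed latency samples in milliseconds
    latency_alarm.evaluate(sample)
```

Only the last sample breaches the threshold, so the action runs once. In a managed environment the alarm and the action would normally be provided by the monitoring service rather than hand-written.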
How do you use tradeoffs to improve performance?
When you architect solutions, think about tradeoffs so you can select an optimal approach. Depending on your situation, you could trade consistency, durability, and space versus time or latency to deliver higher performance …
The ability to use computing resources efficiently to meet system requirements. The ability to maintain that efficiency as demand changes and technologies evolve.
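One familiar instance of trading space for time is caching: memory is spent on stored results so that repeated requests return faster. The sketch below uses Python's standard `functools.lru_cache`; the `expensive_lookup` function and its inputs are made up for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=128)          # spend memory on up to 128 cached results
def expensive_lookup(key: int) -> int:
    """Stand-in for a slow computation or remote call (hypothetical)."""
    return key * key

results = [expensive_lookup(k) for k in (3, 4, 3, 3, 4)]
info = expensive_lookup.cache_info()   # hits are calls answered from memory
```

Here two of the five calls do the real work and three are served from the cache. The price is memory, and for data that can change, possibly stale results, which is the consistency side of the same tradeoff.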
How do you govern usage?
Usage and cost monitoring
How do you monitor usage and cost?
How do you decommission resources?
The increased flexibility and agility that the cloud enables encourages innovation and fast-paced development and deployment. It eliminates the manual processes and time associated with provisioning on-premises infrastructure, including identifying hardware specifications, negotiating price quotations, managing purchase orders, scheduling shipments, and then deploying the resources. However, the ease of use and virtually unlimited on-demand capacity requires a new way of thinking about expenditures …
How do you evaluate cost when you select services?
Resource type and size selection
How do you meet cost targets when you select resource type and size?
Pricing model selection
How do you use pricing models to reduce cost?
Data transfer planning
How do you plan for data transfer charges?
Using the appropriate instances and resources for your workload is key to cost savings. For example, a reporting process might take five hours to run on a smaller server but one hour to run on a larger server that is twice as expensive. Both servers give you the same outcome, but the smaller server incurs more cost over time …
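The arithmetic behind the reporting example can be worked through in a few lines. The hourly rate of 1.0 is an assumed unit price; the text only states that the larger server costs twice as much per hour.

```python
def run_cost(hourly_rate: float, hours: float) -> float:
    """Total cost of one run: hourly rate multiplied by run time in hours."""
    return hourly_rate * hours

SMALL_RATE = 1.0              # assumed cost units per hour (not from the source)
LARGE_RATE = 2 * SMALL_RATE   # "twice as expensive"

small_cost = run_cost(SMALL_RATE, hours=5)   # five-hour run on the smaller server
large_cost = run_cost(LARGE_RATE, hours=1)   # one-hour run on the larger server

print(f"small: {small_cost}, large: {large_cost}")   # small: 5.0, large: 2.0
```

Under these assumptions each run costs 5 units on the smaller server and 2 units on the larger one, so the cheaper hourly rate ends up costing 2.5 times more per report.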
Matching supply with demand
How do you match supply of resources with demand?
Optimally matching supply to demand delivers the lowest cost for a workload, but there also needs to be sufficient extra supply to allow for provisioning time and individual resource failures. Demand can be fixed or variable, requiring metrics and automation to ensure that management does not become a significant cost …
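A rough way to express "match demand, plus extra supply for provisioning time and failures" is a small sizing function. The parameter names, the 20% headroom, and the single spare instance are illustrative assumptions, not recommended values.

```python
def required_instances(demand: int, capacity_per_instance: int,
                       headroom_percent: int = 20, spare_instances: int = 1) -> int:
    """Instances needed to serve `demand` (e.g. requests per second).

    headroom_percent: assumed buffer for load that arrives while new
                      capacity is still being provisioned.
    spare_instances:  assumed extra instances to absorb individual failures.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Integer ceiling division of demand * (1 + headroom) by per-instance capacity.
    numerator = demand * (100 + headroom_percent)
    denominator = capacity_per_instance * 100
    return -(-numerator // denominator) + spare_instances

steady = required_instances(1000, 100)   # 12 for load and headroom, plus 1 spare
partial = required_instances(950, 100)   # 11.4 rounds up to 12, plus 1 spare
```

With variable demand, a function like this would be re-evaluated from current metrics by automation, so that keeping supply matched does not itself become manual work.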
supply & demand
New service evaluation
How do you evaluate new services?
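As AWS releases new services and features, it is a best practice to review your existing architectural decisions to ensure that they continue to be the most cost effective. As your requirements change, be aggressive in decommissioning resources, entire services, and systems that you no longer require.
The ability to run systems that deliver business value at the lowest price point.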