AWS Certified Advanced Networking Prep – VPC
This post is part of a multi-part blog series to help folks prepare for the AWS Certified Advanced Networking exam. This section is dedicated to VPC and the components you'll need to know for the test. The previous posts in the series cover Direct Connect and VPN.
When you create a new account in AWS, AWS establishes a default VPC for you in each region. This default VPC is mostly meant to help beginners start consuming services quickly without having to know a lot about networking. Each default VPC uses the CIDR block 172.31.0.0/16, and AWS also creates a default subnet in each Availability Zone.
When a VPC is created, it includes a VPC router that routes traffic between all subnets created within that VPC. If you recall, AWS reserves five IP addresses in each subnet. One of them, the .1 address, is used by the VPC router. Although you can't "see" the VPC router in the console, it appears in route tables as the "local" target. It's also important to note that, by default, all subnets within a VPC can communicate with each other through this local route.
To enable communication between multiple VPCs, you can use a native service called VPC peering. Peering works between VPCs in the same account or across accounts. Until the announcement at re:Invent 2017, VPC peering worked only within a single region. When peering, one VPC is the "local" VPC and the peered VPC is referred to as the "remote" VPC. The user submits the peering request, and the owner of the remote VPC must accept it, even if that VPC is in the same account. The peering connection ID takes the form "pcx-xxxxxx." If you are peering to a different account, you must specify both the VPC and the account number to request the peer correctly. Once the VPCs are peered, you can reference security groups from the other VPC in rules for resources in the local VPC, which makes it easier to write rules for cross-VPC traffic.

DNS needs special attention. Within a single VPC, if an instance's public DNS name is also used internally, a DNS query for that name resolves to the instance's private IP. A query from a peered VPC, however, resolves to the public IP. The traffic then does NOT traverse the VPC peer, which can cause unneeded outbound data charges (these are higher than cross-VPC charges). To fix this, allow DNS resolution from, and to, the peered VPC in the peering connection's DNS settings. Public DNS hostnames queried from the peered VPC will then resolve to private IPs. Both VPCs in the peer also need DNS resolution and DNS hostnames enabled, and both settings are found in the VPC settings.
For proper routing, VPC peering doesn't support overlapping IP address ranges. Each VPC should have a unique, non-overlapping CIDR block that can be routed between VPCs and your on-premises networks. Sometimes you may need to peer one VPC to multiple VPCs that use the same IP range. In that case, use multiple subnets and assign each subnet a distinct route table. The first subnet's route table points the overlapping range at the peering connection to the first remote VPC. The second subnet's route table points the same range at the peering connection to the second remote VPC. The diagram below provides an example of this.
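The per-subnet routing trick can be sketched in Python with the standard `ipaddress` module. This is a toy model and not part of any AWS API. The CIDR blocks, subnet names, and `pcx-` IDs are hypothetical, and each route table is just a dict resolved with a longest-prefix match.

```python
import ipaddress
from typing import Optional

# Hypothetical layout: VPC A (10.0.0.0/16) peers with VPC B and VPC C,
# and both B and C use the overlapping range 10.1.0.0/16.
ROUTE_TABLES = {
    # Route table for subnet A1: the overlapping range goes to VPC B.
    "subnet-a1": {"10.0.0.0/16": "local", "10.1.0.0/16": "pcx-a-to-b"},
    # Route table for subnet A2: the same range goes to VPC C.
    "subnet-a2": {"10.0.0.0/16": "local", "10.1.0.0/16": "pcx-a-to-c"},
}


def lookup(subnet: str, destination: str) -> Optional[str]:
    """Return the target of the most specific route matching destination."""
    addr = ipaddress.ip_address(destination)
    table = ROUTE_TABLES[subnet]
    matches = [cidr for cidr in table if addr in ipaddress.ip_network(cidr)]
    if not matches:
        return None  # in this model, no matching route means the packet is dropped
    best = max(matches, key=lambda cidr: ipaddress.ip_network(cidr).prefixlen)
    return table[best]


# Same destination IP, different peering connection, depending on the source subnet.
print(lookup("subnet-a1", "10.1.5.20"))  # pcx-a-to-b
print(lookup("subnet-a2", "10.1.5.20"))  # pcx-a-to-c
```

The model also exposes the trade-off. Instances in subnet A1 can never reach VPC C's copy of 10.1.0.0/16, and instances in subnet A2 can never reach VPC B's.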
If you've logged in to the VPC console, you know that a VPC contains a wide variety of components. They enable communication within the VPC, outbound from the VPC to the Internet, and to remote networks. The sections below cover the components that are important to know for the test.
Elastic Network Interface (ENI)
The ENI is essentially a virtual network card in a VPC, usually attached to an EC2 instance. Each instance comes with a primary ENI, which cannot be detached. You can, however, add more ENIs to an instance, and these can be detached and reassigned if needed.
At a minimum, an ENI comes with a MAC address and a private IP address, which can be auto-allocated or user-specified. The ENI's IP address comes from the subnet it is created in, and a subnet lives in a single AZ. The ENI should therefore be considered an AZ construct, and it cannot be moved to a different AZ. When launching an EC2 instance, and only at launch, you can also have AWS auto-assign a public IP address. This public IP is associated with the ENI. Even more important to know, it is released when you stop the instance, and a new one is assigned when you start it again. If you need an IP address that will never change, use an Elastic IP instead. An Elastic IP is a static public IP allocated to your account, and you can remap it to instances in any AZ within the region. Note that if you associate an Elastic IP with an instance's primary ENI, the auto-assigned public IP is released.
ENIs have a security feature, enabled by default, that drops traffic unless the instance is the source or destination of that traffic. Sometimes, however, you need an instance to pass traffic on behalf of other instances. Examples include a NAT, firewall, or router appliance. For that traffic to traverse the ENI, you'll need to disable the source/destination check for the instance. This is done under the EC2 Console –> Actions –> Networking –> Change Source/Dest. Check.
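To make the behavior concrete, here is a toy Python model of the source/destination check. The `Eni` class and `eni_passes` function are hypothetical illustrations, not an AWS API, and the addresses are examples.

```python
from dataclasses import dataclass


@dataclass
class Eni:
    """Toy ENI: the private IPs it owns and its source/destination check flag."""
    private_ips: set[str]
    source_dest_check: bool = True  # enabled by default


def eni_passes(eni: Eni, src_ip: str, dst_ip: str) -> bool:
    """Return True if the ENI would pass a packet with these addresses."""
    if not eni.source_dest_check:
        # Check disabled: the ENI forwards traffic on behalf of other hosts,
        # which is what NAT, firewall, and router appliances need.
        return True
    # Check enabled: the ENI must be the packet's source or destination.
    return src_ip in eni.private_ips or dst_ip in eni.private_ips


nat_eni = Eni(private_ips={"10.0.0.10"})
# A private instance's packet to the Internet is neither from nor to the NAT ENI.
print(eni_passes(nat_eni, "10.0.1.25", "198.51.100.7"))  # False
nat_eni.source_dest_check = False
print(eni_passes(nat_eni, "10.0.1.25", "198.51.100.7"))  # True
```

Outside the console, the same flag can be turned off with the AWS CLI, for example `aws ec2 modify-instance-attribute --instance-id <instance-id> --no-source-dest-check`.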
Fun fact: Security groups, as discussed below, are attached to ENIs and not the instances themselves. If you move an ENI, the MAC address, the IP addresses, and the security group come along for the ride.
Internet Gateway (IGW)
An Internet gateway does exactly what its name implies: it acts as a gateway that enables Internet access to and from your VPC. An IGW can be attached to only one VPC, and a VPC can have only a single IGW. Having just one IGW shouldn't raise availability concerns, because the IGW is a horizontally scaled, highly redundant managed service. The IGW is also where network address translation (NAT) takes place for instances with public or Elastic IPs. It maps each instance's private IP to its public IP for egress and ingress traffic. To use the IGW, a subnet needs a route to it, which is usually configured in the subnet's route table as 0.0.0.0/0 –> IGW. In short, you need an IGW attached to your VPC if you host publicly available services in the VPC or if your workloads need to reach Internet resources.
Important: Any subnet that has a default route to the IGW is considered a public subnet.
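The public-subnet rule can be expressed as a one-line check over a route table. In this sketch the route tables are plain dicts with hypothetical subnet names and IDs, and only the IPv4 default route is considered.

```python
# Hypothetical route tables keyed by subnet; targets use AWS-style ID prefixes.
ROUTE_TABLES = {
    "subnet-web": {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-0example"},
    "subnet-app": {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-0example"},
    "subnet-db": {"10.0.0.0/16": "local"},
}


def is_public(routes: dict[str, str]) -> bool:
    """A subnet is public if its default route points at an Internet gateway."""
    return routes.get("0.0.0.0/0", "").startswith("igw-")


for subnet, routes in ROUTE_TABLES.items():
    print(subnet, "public" if is_public(routes) else "private")
```

Note that `subnet-app` still counts as private. It reaches the Internet, but its default route points at a NAT device rather than the IGW.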
Security Groups (SG)
Security groups (SGs) act as a stateful firewall applied to the ENIs of EC2 instances (think of them as micro-segmentation). Stateful means that if you allow traffic inbound, such as a client connecting to a web server over port 80, the security group also allows the return traffic back to the requesting client. This return traffic is allowed regardless of what the security group's outbound rules say. Keep in mind that a NACL could still block the return traffic if configured to do so. Security groups are an ENI construct, while NACLs are applied at the subnet level (more on those later). Security groups require you to explicitly allow traffic. Anything a rule doesn't allow is blocked by a hidden implicit deny, although new security groups do include a default outbound Allow All rule. In other words, if you don't explicitly allow traffic, it won't be allowed. An instance can have multiple SGs applied to it, and its effective rules are the union of the rules in all of the attached SGs.
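The toy model below shows both points from the paragraph above: rules from several SGs are combined, and replies to admitted flows are allowed without any outbound rule. The class and method names are hypothetical, not an AWS API. Outbound rules are left out on purpose, because the model never needs them for replies.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InboundRule:
    protocol: str     # e.g. "tcp"
    port: int         # destination port on the instance
    source_cidr: str  # e.g. "0.0.0.0/0"


@dataclass
class Instance:
    # One list of inbound rules per attached security group.
    security_groups: list[list[InboundRule]]
    # Flows admitted inbound, keyed as (protocol, client_ip, client_port, server_port).
    tracked: set[tuple[str, str, int, int]] = field(default_factory=set)

    def inbound(self, protocol: str, client_ip: str, client_port: int, server_port: int) -> bool:
        """Allowed if ANY rule in ANY attached SG matches; otherwise implicit deny."""
        client = ipaddress.ip_address(client_ip)
        for sg in self.security_groups:
            for rule in sg:
                if (rule.protocol == protocol and rule.port == server_port
                        and client in ipaddress.ip_network(rule.source_cidr)):
                    # Stateful: remember the flow so its reply can leave later.
                    self.tracked.add((protocol, client_ip, client_port, server_port))
                    return True
        return False

    def reply(self, protocol: str, client_ip: str, client_port: int, server_port: int) -> bool:
        """Replies to tracked flows are allowed; outbound rules are never consulted."""
        return (protocol, client_ip, client_port, server_port) in self.tracked


web_sg = [InboundRule("tcp", 80, "0.0.0.0/0")]
admin_sg = [InboundRule("tcp", 22, "10.0.0.0/16")]
server = Instance([web_sg, admin_sg])  # effective rules: union of both SGs
```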
One of the cool features of SGs is the self-referencing SG, which is a security group that references itself in its own rules. Say you have a group of Active Directory domain controllers that need to communicate with each other. You can create a security group that allows all the required ports (like 135, 137, 139, 445, etc.). For the source of these rules, you specify the SG itself instead of an IP address or CIDR range. Any instance assigned this security group can then communicate with any other instance that has this security group over the specified ports. Pretty cool!
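A self-referencing rule matches on group membership instead of addresses. This sketch models that idea with hypothetical SG IDs and instance names, and it omits protocols for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SgRule:
    port: int
    source_sg: str  # a security group ID instead of an IP or CIDR


# Hypothetical "sg-dc" whose rules use sg-dc itself as the source.
SECURITY_GROUPS = {
    "sg-dc": [SgRule(port, "sg-dc") for port in (135, 137, 139, 445)],
}

# Which SGs are attached to each (hypothetical) instance.
INSTANCE_SGS = {
    "dc1": {"sg-dc"},
    "dc2": {"sg-dc"},
    "web1": {"sg-web"},
}


def allowed(src: str, dst: str, port: int) -> bool:
    """True if an SG on dst has a rule for port whose source SG is attached to src."""
    for sg_id in INSTANCE_SGS[dst]:
        for rule in SECURITY_GROUPS.get(sg_id, []):
            if rule.port == port and rule.source_sg in INSTANCE_SGS[src]:
                return True
    return False
```

Adding a new domain controller needs no rule changes. Attaching `sg-dc` to it (for example, `INSTANCE_SGS["dc3"] = {"sg-dc"}`) is enough for it to talk to the others.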
Did I mention Security Groups are stateful?
Network Access Control Lists (NACL)
NACLs are a different beast than SGs, although both allow or block traffic. NACLs are stateless, meaning you must explicitly allow the traffic in both directions. They allow or block traffic at the subnet level, so they affect every instance in the subnet where the NACL is applied. NACL rules are processed in order, similar to a traditional firewall. Evaluation starts with the lowest-numbered rule, and once traffic matches a rule, that rule is applied and processing stops. NACLs evaluate traffic as it enters or leaves a subnet, independently of whether a security group allows it at the instance level. You might therefore see traffic accepted and then rejected when looking at VPC Flow Logs (more on that later). For example, the SG can allow outbound traffic as it leaves the instance, and the subnet-level NACL can then block it.
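First-match evaluation by rule number can be sketched as below. This is a simplified model that ignores protocols. The `NaclRule` type is hypothetical, and the final fallback stands in for the catch-all deny rule.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int     # lower numbers are evaluated first
    action: str     # "allow" or "deny"
    cidr: str
    port_from: int
    port_to: int


def evaluate(rules: list[NaclRule], ip: str, port: int) -> str:
    """Return the action of the first matching rule, in rule-number order."""
    addr = ipaddress.ip_address(ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if addr in ipaddress.ip_network(rule.cidr) and rule.port_from <= port <= rule.port_to:
            return rule.action  # first match wins; later rules are never checked
    return "deny"  # nothing matched: the final catch-all rule denies


# Listed out of order on purpose: rule numbers, not list position, decide.
INBOUND = [
    NaclRule(200, "allow", "0.0.0.0/0", 80, 80),
    NaclRule(100, "deny", "203.0.113.0/24", 0, 65535),
]
```

A client in 203.0.113.0/24 is denied on port 80 even though rule 200 would allow it, because rule 100 matches first.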
Stateless rules require some knowledge of how client requests work so you can properly allow the return traffic. When a client makes an outbound request to a service, it uses an ephemeral source port. The range depends on the client's operating system. Many Linux kernels use 32768–61000, and Windows Server 2008 and later use 49152–65535, so NACL rules commonly open 1024–65535 to cover them all. When traffic is destined for a web server (assuming HTTP here), its destination is the web server's IP address on port 80, something like 184.108.40.206:80. Its source address and port, however, look like this: 220.127.116.11:45968. The web server responds to the requester's IP address and the port the request came from, which is port 45968 in our example. Therefore, the NACL must allow INBOUND traffic from the Internet (or, at a minimum, from 220.127.116.11) to port 80. It must also allow OUTBOUND traffic to the Internet (or, at a minimum, to 220.127.116.11) on port 45968, which in practice means the whole ephemeral range. Otherwise, the NACL will drop the traffic.
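Because a NACL keeps no connection state, both legs of the HTTP exchange must pass on their own. This sketch reuses the client address 220.127.116.11 and port 45968 from the example above. The rule sets and helper names are hypothetical, protocols are ignored, and 1024–65535 stands in as a broad ephemeral range.

```python
import ipaddress

# Each rule: (rule_number, action, cidr, port_from, port_to); protocol omitted.
Rule = tuple[int, str, str, int, int]


def nacl_allows(rules: list[Rule], remote_ip: str, port: int) -> bool:
    """First matching rule (lowest number) decides; no match means deny."""
    addr = ipaddress.ip_address(remote_ip)
    for _number, action, cidr, low, high in sorted(rules):
        if addr in ipaddress.ip_network(cidr) and low <= port <= high:
            return action == "allow"
    return False


def http_round_trip(inbound: list[Rule], outbound: list[Rule],
                    client_ip: str, client_port: int) -> bool:
    """Both legs must pass because the NACL keeps no connection state."""
    request_ok = nacl_allows(inbound, client_ip, 80)             # client -> server:80
    response_ok = nacl_allows(outbound, client_ip, client_port)  # server -> client:ephemeral
    return request_ok and response_ok


INBOUND = [(100, "allow", "0.0.0.0/0", 80, 80)]
OUTBOUND_TOO_NARROW = [(100, "allow", "0.0.0.0/0", 80, 80)]
OUTBOUND_EPHEMERAL = [(100, "allow", "0.0.0.0/0", 1024, 65535)]
```

With `OUTBOUND_TOO_NARROW`, the request gets in but the reply to port 45968 is dropped. `OUTBOUND_EPHEMERAL` lets the round trip complete.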
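The default NACL that comes with each VPC has an explicit Allow All rule (rule 100) just before the final rule, and it allows ALL traffic. If you use the default NACL to restrict traffic between subnets, you'll need to remove this rule. Otherwise, any traffic that a prior rule doesn't explicitly deny will be permitted. The last rule is an explicit Deny All (the "*" rule, which can't be modified or removed). It blocks all traffic that doesn't match a prior rule. Custom NACLs that you create start with only this Deny All rule, so they block everything until you add Allow rules.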
NAT Gateway
NAT Gateway is a managed service provided by AWS that translates the private addresses of internal clients so they can communicate with the Internet or other AWS services. NAT Gateway handles outbound traffic only. The traffic must originate from your VPC and be destined for somewhere else (usually the Internet). Before NAT Gateway was introduced, customers had to spin up a customized EC2 NAT instance to handle NAT traffic. This was a pain because an EC2 instance could only handle so much traffic, depending on its size. To handle more traffic, you had to scale up to a larger instance. NAT Gateway doesn't have those pains, because it scales automatically to provide up to 10Gbps of throughput. If you need more than 10Gbps, you can spin up another NAT Gateway and spread your traffic across them using custom route tables.
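Spreading traffic with custom route tables works at subnet granularity. This sketch plans how many gateways a demand figure needs, using the 10Gbps figure above, and gives each private subnet a default route to one of them. The gateway IDs and subnet names are hypothetical.

```python
import math

NAT_GATEWAY_GBPS = 10  # per-gateway throughput figure quoted above


def plan_nat_routes(private_subnets: list[str], demand_gbps: float) -> dict[str, dict[str, str]]:
    """Give each private subnet a route table whose default route targets one NAT gateway."""
    count = max(1, math.ceil(demand_gbps / NAT_GATEWAY_GBPS))
    gateways = [f"nat-gw-{i}" for i in range(count)]  # hypothetical gateway IDs
    # Round-robin assignment: traffic is split per subnet, not per flow.
    return {
        subnet: {"0.0.0.0/0": gateways[i % count]}
        for i, subnet in enumerate(private_subnets)
    }
```

The design limit is visible in the code. All traffic from one subnet follows its single default route, so a single very busy subnet can't be spread across gateways this way.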
NAT Gateway was a welcome change, but it has limitations you should be aware of for the test. When NAT Gateway launched, there were no traffic metrics available, so you couldn't determine how much traffic was passing through the service. It has since gained Amazon CloudWatch metrics. Because NAT Gateway is a managed service and not an instance, you can't access the underlying OS or configure advanced settings like port forwarding. You also can't assign a security group to a NAT Gateway to restrict traffic flowing through it. You can, however, put the NAT Gateway in its own subnet and use NACLs, but as you learned above, NACLs are less flexible.
Virtual Private Gateway
As discussed in the VPN and Direct Connect posts, the Virtual Private Gateway (VGW) is essentially a VPN concentrator hosted by AWS. The VGW is the AWS side of a VPN or Direct Connect connection to your VPC. For redundancy, the VGW provides two endpoints, in different AZs, for each VPN connection. When you create a VPN connection, make sure you connect to BOTH endpoints, preferably from more than one customer gateway. The VGW routes traffic only for networks in static route entries or in routes learned dynamically via BGP.
A VPC can only be assigned a single VGW. A single VGW can communicate with multiple customer gateways, regardless of where those gateways live. For instance, a single VGW can be configured to connect to a primary data center and a DR data center. The VGW can be (and should be) connected to multiple customer gateways originating in the same location to provide complete redundancy for the VPN tunnels.
Route priority for routes propagated by the VGW is as follows:
- The most specific route (longest prefix match) is preferred first
- If routes learned by BGP overlap with local VPC routes, the local routes are preferred
- If routes learned by BGP overlap with static routes, the static routes are preferred
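The route-preference rules listed above can be modeled as a longest-prefix match with a tie-breaker on route origin. This is a simplified sketch of those rules, not AWS's implementation. The route entries and target names are hypothetical.

```python
import ipaddress
from typing import Optional

# Tie-breaker for routes with the same prefix length (lower wins).
ORIGIN_RANK = {"local": 0, "static": 1, "bgp": 2}

# Each route: (cidr, origin, target). Targets are hypothetical.
ROUTES = [
    ("10.0.0.0/16", "local", "local"),
    ("10.0.0.0/16", "bgp", "vgw-example"),      # overlaps the VPC range
    ("192.168.0.0/16", "static", "vgw-example"),
    ("192.168.0.0/16", "bgp", "vgw-example"),
    ("192.168.10.0/24", "bgp", "vgw-example"),  # more specific BGP route
]


def select_route(routes, destination: str) -> Optional[tuple[str, str, str]]:
    """Pick the winning route for destination, or None if nothing matches."""
    addr = ipaddress.ip_address(destination)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r[0])]
    if not candidates:
        return None
    # Longest prefix first; among equal prefixes, local beats static beats BGP.
    return min(candidates,
               key=lambda r: (-ipaddress.ip_network(r[0]).prefixlen, ORIGIN_RANK[r[1]]))
```

Each rule shows up in one lookup. 10.0.5.5 uses the local route. 192.168.1.1 picks the static /16 over the BGP /16. 192.168.10.7 picks the BGP /24 because it is more specific than the static /16.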
VPC Endpoints
VPC endpoints allow access to other AWS services without leaving the VPC boundary. Traditionally, many AWS services had only a public endpoint, so any resources in a VPC that needed to interact with those services required Internet access. An example is an EC2 instance reading data from an S3 bucket. As of today, there are eight endpoints available, including S3, DynamoDB, and EC2. Endpoints can restrict or permit access using an endpoint policy (the default is unrestricted), and you can have multiple endpoints for the same service within the same VPC. For gateway endpoints (S3 and DynamoDB), you also need to configure routing in a subnet's route table so that traffic flows across the VPC endpoint instead of the public Internet. By combining policies, routing, and multiple endpoints, you can restrict certain subnets to specific endpoints and allow or limit access as needed. You can also restrict access to a resource, such as an S3 bucket, to traffic originating from a particular endpoint.
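As a sketch of the last point, the S3 bucket policy below denies every request that doesn't arrive through one specific endpoint. It uses the `aws:SourceVpce` condition key. The bucket name and endpoint ID are hypothetical placeholders, and the Python only builds and prints the JSON. Be careful with a blanket Deny like this, because it also blocks console and API access to the bucket from outside that endpoint.

```python
import json

BUCKET = "example-bucket"          # hypothetical bucket name
ENDPOINT_ID = "vpce-0example1234"  # hypothetical VPC endpoint ID


def endpoint_only_policy(bucket: str, endpoint_id: str) -> dict:
    """Bucket policy that denies all S3 actions unless they come through endpoint_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessViaVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",  # every object in it
                ],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": endpoint_id}},
            }
        ],
    }


print(json.dumps(endpoint_only_policy(BUCKET, ENDPOINT_ID), indent=2))
```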
Gateway endpoints are regional and can't be extended across VPC boundaries (including VPC peers and VPN connections). DNS resolution must also be enabled within the VPC to resolve the S3 endpoints.
DHCP Options Set
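For every subnet created, AWS reserves five addresses, the first four and the last one (shown here for a /24 subnet):
- .0 – Network Address
- .1 – VPC Router
- .2 – Reserved for DNS
- .3 – Reserved for Future Use
- .255 – Broadcast Address (the last address in the subnet; AWS doesn't support broadcast in a VPC)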
DHCP Options Sets are configured and assigned at the VPC level. Once you create an Options Set, it cannot be modified. You can, however, create a new DHCP Options Set and assign it to the VPC. A VPC can have only one DHCP Options Set assigned at a time. When you assign a new DHCP Options Set to a VPC, the change takes effect immediately for new resources, which receive values from the new set. Existing resources pick up the new values when they renew their DHCP lease. When DHCP hands out an IP address and its associated options, the Amazon-provided DNS server it offers is the VPC's base network address plus two, not the subnet's. For example, an instance in subnet 10.0.1.0/24 of a VPC using 10.0.0.0/16 is offered 10.0.0.2 as its DNS server.
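The address arithmetic above can be checked with Python's `ipaddress` module. The helper names are hypothetical, and the CIDRs are the examples from the text.

```python
import ipaddress


def reserved_addresses(subnet_cidr: str) -> dict[str, str]:
    """The five addresses AWS reserves in an IPv4 subnet: the first four and the last."""
    net = ipaddress.ip_network(subnet_cidr)
    base = net.network_address
    return {
        "network": str(base),
        "vpc_router": str(base + 1),
        "dns": str(base + 2),
        "future_use": str(base + 3),
        "broadcast": str(net.broadcast_address),  # last address in the subnet
    }


def usable_hosts(subnet_cidr: str) -> int:
    """Addresses left for your resources after the five reservations."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - 5


def amazon_dns_server(vpc_cidr: str) -> str:
    """Amazon-provided DNS is the VPC's (not the subnet's) base address plus two."""
    return str(ipaddress.ip_network(vpc_cidr).network_address + 2)
```

For a /24 this leaves 251 usable addresses. For a /28, the smallest subnet AWS allows, only 11 remain.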
DHCP Options Set values that can be configured include:
- Name Tag
- Domain Name (example.com)
- Domain Name Servers (up to 4 servers, comma separated)
- NTP Servers (up to 4 servers, comma separated – or use the new Amazon Time Sync Service)
- NetBIOS Name Servers
- NetBIOS Node Type
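The values above map onto the Key/Values pairs that the EC2 API expects for a DHCP Options Set, for example the `DhcpConfigurations` argument of boto3's `create_dhcp_options`. This sketch only builds that structure and enforces the four-server limits, and it doesn't call AWS. The Name tag isn't part of the structure, because it's applied as an ordinary resource tag. The helper name is hypothetical.

```python
from typing import Optional, Sequence

MAX_SERVERS = 4  # per the limits listed above


def build_dhcp_configurations(
    domain_name: Optional[str] = None,
    dns_servers: Sequence[str] = (),
    ntp_servers: Sequence[str] = (),
    netbios_name_servers: Sequence[str] = (),
    netbios_node_type: Optional[int] = None,
) -> list[dict]:
    """Build Key/Values entries for a DHCP Options Set, enforcing the server limits."""
    entries = []
    if domain_name:
        entries.append({"Key": "domain-name", "Values": [domain_name]})
    for key, servers in (
        ("domain-name-servers", dns_servers),
        ("ntp-servers", ntp_servers),
        ("netbios-name-servers", netbios_name_servers),
    ):
        if len(servers) > MAX_SERVERS:
            raise ValueError(f"{key}: at most {MAX_SERVERS} servers allowed")
        if servers:
            entries.append({"Key": key, "Values": list(servers)})
    if netbios_node_type is not None:
        entries.append({"Key": "netbios-node-type", "Values": [str(netbios_node_type)]})
    return entries
```

For example, `build_dhcp_configurations(domain_name="example.com", dns_servers=["AmazonProvidedDNS"], ntp_servers=["169.254.169.123"])` points NTP at the Amazon Time Sync Service address.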
VPC Flow Logs
VPC Flow Logs capture metadata about the IP traffic flowing through a single ENI, a subnet, or an entire VPC. It's important to note that this is not packet data, only metadata about the traffic. It's not a packet capture, so you cannot inspect packet contents or payloads. If you need deep packet inspection, you'll need a third-party solution like Wireshark or a firewall appliance. Flow logs send data to a CloudWatch Logs log group. From there, you can export the data to S3 or stream it to Lambda or the Elasticsearch service. Once a flow log has been configured, it cannot be changed, so you'll have to create a new one.
VPC Flow Logs are not real-time. There is a delay of several minutes between the time the traffic traverses the ENI/subnet/VPC and the time the data is recorded in the CloudWatch log stream.
VPC Flow Logs will capture the following information:
- Source Address
- Destination Address
- Source Port
- Destination Port
- Protocol – IANA protocol number (TCP = 6, UDP = 17, ICMP = 1, etc.)
- Start/Stop time – displayed in Unix time
- Result – Accepted or Rejected
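The fields above are a subset of the default flow log record. The full default format also carries the version, account ID, interface ID, packet and byte counts, and a log status. This sketch parses one space-separated default-format record. The sample record is illustrative, with made-up IDs, and the helper name is hypothetical.

```python
from datetime import datetime, timezone

# Field order of the default flow log record format (version 2).
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}  # IANA protocol numbers


def parse_flow_record(line: str) -> dict[str, str]:
    """Split one space-separated record into named fields plus a few decoded ones."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    if record["protocol"] != "-":  # NODATA/SKIPDATA records use "-" for missing fields
        proto = int(record["protocol"])
        record["protocol_name"] = PROTOCOL_NAMES.get(proto, str(proto))
    if record["start"] != "-":
        # start/end are Unix timestamps (seconds since the epoch).
        start = datetime.fromtimestamp(int(record["start"]), tz=timezone.utc)
        record["start_utc"] = start.isoformat()
    return record


sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
```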
Flow logs always show the private (internal) IP addresses of your instances, even if they have a public or Elastic IP. They also show the primary private IP address of the network interface, even if the traffic is destined for a secondary private IP address.
Last but not least, the list below covers the traffic that VPC Flow Logs do not capture. Most of these exclusions involve communication between your VPC resources and AWS-managed services.
- Traffic between instances and the Amazon DNS server isn't captured
- Windows license activation traffic between instances and AWS isn't captured
- Instance metadata traffic (EC2 <-> 169.254.169.254) isn't captured
- DHCP traffic between EC2 instances and the AWS DHCP service isn't captured
- Traffic to the reserved IP address of the VPC router isn't captured