The Road to an Open Source DevOps Future: A Case for Software-Defined Networking


Executive Director Neela Jacques makes his opening remarks at the ODL Summit

It is no secret that the IT industry is evolving, and the stereotype of the basement-dwelling ‘IT guy’ is quickly fading away. IT departments that only know systems and command-line interfaces (CLIs) are no longer enough to support modern businesses. In fact, the main concept of DevOps is bringing software engineering and IT together to automate infrastructure management and service delivery, and to drive day-to-day operations.

At the end of September 2016, I had the opportunity to attend the OpenDaylight Summit in beautiful Seattle. I saw many presentations covering a wide variety of software-defined networking (SDN) applications and use cases. But wait, hold up, what the heck is SDN!? By definition, SDN is an architectural approach to network design that uses open protocols and provides programmability to network elements, allowing them to be more dynamic, manageable, and adaptable. OpenDaylight, in particular, is the Linux Foundation’s realization of SDN.
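
To make “programmability” a bit more concrete, here is a minimal, hypothetical sketch of talking to an OpenDaylight controller from Python over its RESTCONF northbound API using the requests library. The controller address and credentials are placeholders, and the exact URLs and payload schemas vary by OpenDaylight release, so treat this as an illustration rather than a recipe:

import requests

# Illustrative values: point these at your own controller.
CONTROLLER = "http://192.0.2.10:8181"   # RESTCONF typically listens on 8181
AUTH = ("admin", "admin")               # default credentials; change them in real life

# Ask the controller for every network element it currently knows about.
url = CONTROLLER + "/restconf/operational/opendaylight-inventory:nodes"
resp = requests.get(url, auth=AUTH, headers={"Accept": "application/json"})
resp.raise_for_status()

for node in resp.json().get("nodes", {}).get("node", []):
    print(node["id"])   # e.g. "openflow:1" for an OpenFlow-managed switch

# Flow rules are pushed the same way: a PUT of a flow document to the config
# datastore under the node's table/flow path. See the OpenDaylight OpenFlow
# plugin documentation for the exact payload schema for your release.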

Throughout the history of networking, network elements have typically been magic black boxes created by vendors to play certain roles in the network. Network engineers are often known as ‘CLI Jockeys,’ which speaks volumes about the tedious, vendor-specific tasks they face. SDN aims to open source these network elements and allow users to tailor them to their individual functionality needs.

Let’s take a step back and talk about the various ‘planes’ in a network to help us understand the true value of SDN:

Imagine you’re the engineer of a large metropolitan transit system. Buses have designated routes developed by city planners, and at their core they are designed to move people efficiently from place to place throughout the day. Now imagine that the buses are data packets that are part of a data flow in a network, and the people inside the buses represent the actual data. Visually, we can represent the roads in the city as the network links, such as an Ethernet cable, and the bus stations as other network elements that have routes to different locations.

  • Control Plane: The control plane is the method by which these routes are learned. In the case of the transit system, the people riding the buses have to review all available bus routes to figure out the best way to reach their destination. In computer networking, this is done by routing protocols.
  • Data Plane: The data plane is the actual movement of the packets in the network. In the case of the transit system, this is the bus driver stepping on the gas and following the route. In computer networking, this is the router receiving incoming data packets and forwarding them over a network link based on what was learned from the control plane.
  • Management Plane: The management plane is the handling of packets that are intended for the network device itself. In the transit system analogy, this could be maintenance workers making repairs at a bus station. In computer networking, this is used for configuration changes, monitoring, troubleshooting, etc.

Network elements often muddle these three planes together, making them more static in their deployments. SDN is essentially a programmatic decomposition of these three planes, which:

  1. Leads to better separation of the planes, resulting in much higher uptime.
  2. Allows for independent evolution along these paths.
  3. Through the distributed software control plane, creates a clear separation of duties and manages complexity.

Network function virtualization (NFV), a concept that is complementary to (but not dependent on) SDN, aims at decoupling the various functions of a network device and defining them in software. Take a look at this article for a more detailed description. NFV allows for new network functions to be rolled out in days instead of months. It also enables businesses to scale quickly and effectively. SDN and NFV are typically most effective when combined.

The interesting points that stuck with me from this conference can best be summarized as follows:

  1. OpenDaylight is only one particular realization/implementation of SDN, and there are many others.
  2. No single company’s product or solution is completely reliant on OpenDaylight on its own, or even SDN on its own.
  3. Thinking in terms of SDN helps, but it is far from a be-all, end-all networking solution.
  4. IT departments and providers can no longer think of the network as providing support to the product: the network IS the product.

What am I getting at here? The future of IT is not wed to the products or offerings of any single vendor. Rather, it’s dependent on the ability of every company to integrate different open source products together to fit their individual needs and drive innovation in the areas of service delivery, operations, infrastructure, etc.

Open source communities all around the world are leading the way to this DevOps future, and we at AppliedTrust are always dedicated to solving your hard IT problems so you can make the world awesome. If you’re interested in integrating SDN or any open source project into your IT portfolio, please do not hesitate to give us a call at 303.245.4545.

For more information about OpenDaylight and detailed use cases, please visit https://www.opendaylight.org/

Five Steps to Becoming an Effective Pen Tester

There’s been a huge jump in the buzz over Information Security lately, with all of the high-profile disclosures and hacks going on. Anyone working in the field will tell you that this is not all that new, and actually it’s expected to some degree. Some organizations just do not learn until they are affected, and they end up paying far more than whatever they saved by not having their environment pen tested. However, that means fun work ahead for those who have a passion for Information Security and love to play with all the latest tools to help break into various networks with the goal of making them more secure.

Those who are looking to get started may become overwhelmed by all of the tools and technologies pen testers must carry in their arsenal, but it’s also important to note the project management side of testing. It may not be very effective to just go full “black hat” on a client’s network. One must consider other variables, such as client needs, project cost, deadlines, and effective communication of findings so that they may be remediated. That is why I believe the five steps below are a good start to becoming an effective pen tester without letting yourself get overwhelmed by pen test requests from the latest victimized retailer or health insurer.

  1. Be consistent with your testing methodology.

There are many excellent penetration testing methodologies that you should read before conducting any actual projects, including the OWASP Testing Guide and the Penetration Testing Execution Standard. They contain a wealth of information that can help get you on your way. However, for new folks, I would start at an even higher level and make sure you’re doing at least the following for *every* single pen testing effort you’re involved with:

  • Identify testing boundaries. This means identifying any rabbit holes you *shouldn’t* go down with your client’s system(s).
  • Perform host, service, and protocol identification. Utilize well-known tools such as Nmap and Metasploit. Vulnerability scanners are nice too; I personally like to use Nessus. Identify any technology-specific vulnerabilities from this data (easy mode). A small scripted example of this step follows this list.
  • Attempt to bypass implemented network, system, or application access controls, including verifying implemented network segmentation controls to ensure that network scope is reduced to its absolute minimum.
  • Enumerate and then inventory all of your points of entry in a test plan. This will help you track what you’ve done, as well as your findings.
  • There’s so much more that could be added here.
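
Here is the small example promised above: a hedged Python sketch of the host and service identification step that shells out to Nmap and turns its XML output into a simple inventory. It assumes Nmap is installed, the target range is explicitly in scope for your engagement, and the file path is a placeholder.

import subprocess
import xml.etree.ElementTree as ET

TARGETS = "192.0.2.0/28"          # placeholder range: only scan what is in scope!
XML_OUT = "/tmp/discovery.xml"    # placeholder path for the raw scan data

# -sV enables service/version detection; -oX writes machine-readable XML.
subprocess.run(["nmap", "-sV", "-oX", XML_OUT, TARGETS], check=True)

# Walk the XML and build a simple host -> services inventory for the test plan.
inventory = {}
for host in ET.parse(XML_OUT).getroot().findall("host"):
    addr = host.find("address").get("addr")
    services = []
    for port in host.findall("./ports/port"):
        if port.find("state").get("state") != "open":
            continue
        svc = port.find("service")
        services.append({
            "port": port.get("portid"),
            "proto": port.get("protocol"),
            "name": svc.get("name") if svc is not None else "unknown",
            "product": svc.get("product") if svc is not None else None,
            "version": svc.get("version") if svc is not None else None,
        })
    inventory[addr] = services

for addr, services in inventory.items():
    print(addr, services)
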
  2. Create a test plan.

Performing the steps above can give you an overwhelming amount of data to track. Unlike hackers, we are not just looking for the first successful exploit and forgetting the rest. We must track all that was completed in order to relay that information effectively to the client. It may be awesome that you got that authentication bypass exploit to work with Metasploit, but what does that mean for the client? Why was the client running an outdated version of VNC? What other services that *weren’t* exploited are also out of date and may be exploited in the future? WHY WAS IT RUNNING ON A DOMAIN CONTROLLER?! These are all important questions that should be asked and backed up with our test plan.

A test plan can then be reviewed with an experienced pen tester to help identify any gaps. This can be incorporated into your organization’s peer review process (which should exist if it does not). Shoot me a message if you’d like to discuss creating effective test plans.
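
There is no one right format for a test plan, but even a simple structured log of entry points, actions, and outcomes beats memory. The sketch below is purely illustrative (a CSV-backed tracker with made-up field names), not a prescribed template:

import csv
from datetime import datetime

PLAN_FILE = "testplan.csv"   # placeholder filename

FIELDS = ["timestamp", "target", "entry_point", "action", "result", "risk", "notes"]

def log_step(target, entry_point, action, result, risk="info", notes=""):
    """Append one tested item to the plan so nothing gets lost before reporting."""
    with open(PLAN_FILE, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if f.tell() == 0:          # write the header the first time the file is used
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.utcnow().isoformat(),
            "target": target,
            "entry_point": entry_point,
            "action": action,
            "result": result,
            "risk": risk,
            "notes": notes,
        })

# Hypothetical example entries
log_step("192.0.2.5", "5900/tcp (VNC)", "auth bypass module", "successful", "critical",
         "Outdated VNC build on a domain controller; follow up on patching and placement.")
log_step("192.0.2.7", "443/tcp (HTTPS)", "TLS configuration review", "no findings")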

  3. Save all of your data.

Please do not complete a test and save *none* of your data. Again, this isn’t a race to just find the first exploit in a client’s network. It’s critical that information be saved so that it can be properly referenced if needed in the future. Nobody should want to memorize all their findings from a test completed months ago. I recommend making sure you have a *secure* place to store the following information:

  • Your test plan!
  • Screenshots of data or findings identified during the test. The more the merrier.
  • Exported session data from key tools such as OWASP’s ZAP.
  • Saved scan data that may come from tools such as Nmap or Nessus.
  • Scripts or exploits you may have written for testing.
  4. Communicate with your client.

After all, your testing effort is because of your client’s desire to secure its network. Your methodologies and findings should not be a “closed book,” and it is wise to be as open as possible. There are a few things that should be regularly communicated before and during testing:

  • Scope, boundaries, and timeline expectations. Set the scope of the testing effort, establish testing boundaries, and set timeline expectations. Do not just wing it and end up missing a block of IP address space that should have been included.
  • Any findings you deem to be a critical or high risk to the client. Do not wait to provide this type of information until a month down the road when testing is completed and a report is delivered.
  • Any shifts in the testing deadline. Issues arise for all projects. They may even be caused by the client! If there is something that will cause the project to not be delivered on time, communicate it with the client and adjust as necessary. This is an easy way to avoid ticking anyone off.
  5. Acquire historical data.

This will not always be possible, as it depends on the type of testing you’ll be conducting (white box, gray box, black box, etc.). However, if possible, collect as much historical data on the environment as you can from the client before starting any testing. Data such as the items listed below can help you gain an edge on a network. Yes, you may discover the same information while following your testing methodology, but you never know what you may find!

  • Historical vulnerability scans, pen tests, security assessments, or even a PCI Report on Compliance
  • Change orders (if the client has a working change management program)
  • Asset inventory
  • Network diagrams
  • Service configuration files

Remember to read up on all the latest blogs, vulnerabilities, and exploits. Don’t worry, you won’t catch everything. Learn something new every day. Keep all this in mind while you’re starting out and it will give you a head start in becoming an effective tester.

Already an expert pen tester, or interested in making it part of your career? Check out our open Infrastructure InfoSec Engineer position!

Follow me on Twitter at @neiltylerbell.

Upcoming Drupal 8 Release and Beyond

Drupal 8 is coming! No really, it is this time. As of today there are four critical tasks, 21 critical bugs, and five plans remaining in 8.x. Once all of those numbers go to zero, assuming no more criticals are filed or reclassified, a Drupal 8 release candidate will become available. At that time security issues with Drupal 8 will no longer be public in the issue queues, and site builders and developers alike can be confident that Drupal 8 is ready to be implemented in production.

So, what does this mean for the Drupal project you’re working on now (or getting ready to work on soon)? If your release date is 3 – 6 months out and you can get by without a bunch of contrib modules, now’s a good time to start looking at whether the site or web application could be implemented using Drupal 8. If you need to release before then, I’d recommend holding off on Drupal 8.

Drupal 8 brings a number of new advantages with it. If you haven’t heard yet, Drupal is now object oriented, built on top of the Symfony framework. This means a number of procedural hooks have been moved to object-oriented classes. Also, CMI (the Configuration Management Initiative) brings the ability to define module default configurations through simplified YAML syntax.

One of my favorite new features, which elevates Drupal from your average content management system (CMS) to a fully mobile-ready service handler, is the Web Services and Context Core Initiative (WSCCI). Built on top of Symfony responses, Drupal 8 can now natively handle non-HTML responses, such as JSON. Everything returned to the user in Drupal 8 is now a response. In addition to the ones provided by Symfony and the Zend Framework, Drupal adds some really cool response types you can use in your code, such as AjaxResponse and ViewAjaxResponse (used for Ajax responses specific to Views).

Speaking of Views, it’s now part of core! That means a lot of simple Drupal sites whose only contrib module was Views can now run out of the box. Historical data from the Drupal 7 release shows that most adoption trailed behind Views becoming available, which will not be an issue with Drupal 8.

Finally, Entities with full CRUD (Create, Read, Update and Delete) are not only available in core, but also are used as the basic building block of every piece of content in Drupal. Everything from a Node to a taxonomy term is now an Entity, and you have the tools with Drupal 8 to build your own custom Entities.

If you’re more interested in creating themes for Drupal, there’s much to rejoice about in Drupal 8 as well. Drupal 8 templates now use Twig, which means themers for Drupal 8 can do simple logic statements without having to write templates and hooks in PHP.

This just scratches the surface of all the new things being added to Drupal 8. If you want to know more, check out http://www.drupal.org/8. This was years in the making and involved a lot of people in the community who volunteered their time and effort. My thanks goes out to all those who helped, whether it was writing documentation, reviewing patches, or contributing as part of one of the major Drupal 8 initiatives. If you’re interested in helping to make Drupal 8 better, we’re always looking for new people to join us. The community helps new folks get started with contribution on Wednesdays at 16:00 UTC. You probably won’t want to jump in on fixing one of those criticals I mentioned above, but working on major and minor bugs is valuable to the community as well.

Finally, I want to wrap up with what future releases for Drupal are going to look like. The Drupal Community is looking at switching to a 6-month minor release cycle with only releases that break backwards compatibility being classified as major. Prior to a major release, a Long Term Support (LTS) release will be made available to give developers plenty of time to update their code to support the new features being added. Even major releases are planned to take less than a year from code freeze. This is very exciting and means that things that don’t break compatibility can be added to improve Drupal 8 much faster than they have in the past.

If you haven’t looked at Drupal 8 yet, it’s getting close enough to release that you might want to consider doing so. Drupal 8 is a huge advancement from Drupal 7, and in my opinion it’s way ahead of where competing CMSs and platforms are right now. Having the power of Symfony at its core with all the Entities we’ve come to love with Drupal 7 makes Drupal 8 the best choice for building your next website or application.

Fabric Deployments for Fun and Profit – Environments and a Web Application

If you haven’t read the first blog post, I highly recommend you start there. In this post we’ll be digging deeper into some more intermediate tasks with Fabric. We’re going to start out talking about roles. From there we’re going to move environment-specific configuration into a YAML config file. We’re then going to delve into building a deployment script for a simple Python application, with a pip install of its requirements into a virtualenv and a deployment strategy that simplifies rollbacks.

Server Roles

The basis of any Fabric deployment script is defining which servers get which tasks. We do this using the @roles decorator on a task, which tells Fabric to run the commands in that task on every server in the group. The mapping of roles to servers lives in the env.roledefs variable.

Here’s a simple example:

from fabric.api import env, roles, sudo

env.roledefs = {
    'application': [ 'web.example.com' ]
}

@roles('application')
def deploy():
    sudo('echo deploying application to webserver')

To run a deploy to the webservers, all we do now is run fab deploy.

Environment Configuration Files – Using YAML With Fabric

You very well could specify all your roledefs in your fabfile.py for all your environments, but a trick I like to do is load this from a YAML file. In addition to roledefs, this pattern also allows you to have environment-specific variables, such as environment name, some credentials, etc.

To do this, we create a task for loading our environment. This task then parses the YAML file with the configuration and then sets that configuration in a new variable, env.config. This config variable is then accessible in any other tasks. Finally, we set env.roledefs to env.config['roledefs'].

Here’s the code:

import yaml
from fabric.api import env

config_dir = 'config/'   # directory holding the per-environment YAML files

def loadenv(environment = ''):
    """Loads an environment config file for role definitions"""
    with open(config_dir + environment + '.yaml', 'r') as f:
        env.config = yaml.safe_load(f)
        env.roledefs = env.config['roledefs']

And the associated configuration file staging.yaml:

roledefs:
    application:
      - 'web.example.com'

Context Managers

Context managers are a useful concept. They run a command within a certain context on the remote server. A simple example is the cd() context manager. This changes the directory before running a specific command. It’s used as follows:

with cd('/opt/myapp'):
    run('echo running from `pwd`')

Other helpers we’ll be using for this example are lcd(), a context manager that changes the local directory on the system from which we’re running Fabric, and exists() (from fabric.contrib.files), which checks whether a file or directory exists on the remote host before we run a command.

Using Prefix for Python virtualenv

With Fabric, we can prefix any command with the prefix() context manager. We can also create our own context managers by decorating a function as @_contextmanager. We aren’t going to go into huge details on these commands right now (they’re much more advanced usage), but we are going to use them to create a context manager for loading a Python virtualenv using the following code:

from contextlib import contextmanager as _contextmanager
from fabric.api import env, prefix

env.activate = 'source /opt/myapp/python/bin/activate'

@_contextmanager
def virtualenv():
    with prefix(env.activate):
        yield

This context manager can then be used in your tasks similar to the built-in cd() context manager as follows:

def deploy():
    with virtualenv():
        run('pip install -r requirements.txt')

Running Privileged Commands

Sometimes you need to run a command as root, for example, to create an initial directory and chown it to the user. This can be done by replacing run() with sudo(). Just remember, always follow the least privilege security pattern. It’s always better to not use sudo() if you don’t have to! In this example, it is only used to create the initial directory for the application and the Python virtualenv.

Let’s Deploy an Application!

Ok, so now that we have the basics, let’s work on deploying an application from a git repository! We’ll start with the code and staging/production config files and then explain what they’re doing. You can find the Fabric file at https://github.com/disassembler/fabric-example/fabfile.py and configuration for staging at https://github.com/disassembler/fabric-example/config/staging.yml.

To break down the deploy process, here are the steps we are trying to accomplish with the deploy task:

  1. If this is the first run on this server, run the setup() process.
  2. Remove previous local builds and use git to clone the application locally.
  3. Create a binary release tarball for the application.
  4. Copy the tarball to the application server.
  5. On the application server, extract to /opt/application/builds/.
  6. Symlink above directory to /opt/application/current.
  7. Run pip install to get any requirements that have changed for the app.

And our initial setup is the following (a rough sketch of both tasks appears after this list):

  1. If virtualenv for application doesn’t exist, create it.
  2. If /opt/application/builds doesn’t exist, create it.
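
For reference, here is a rough, condensed sketch of how those two lists might translate into Fabric tasks. It is not the actual fabfile from the linked repository (that remains the authoritative version); the paths, role name, and repository URL simply mirror the ones used in this post, and it reuses the virtualenv() context manager defined earlier:

from datetime import datetime

from fabric.api import cd, env, lcd, local, put, roles, run, sudo
from fabric.contrib.files import exists

def setup():
    # One-time preparation: virtualenv and build directory owned by the deploy user.
    sudo('mkdir -p /opt/virtualenvs/application')
    sudo('chown -R %s /opt/virtualenvs/application' % env.user)
    run('virtualenv /opt/virtualenvs/application')
    sudo('mkdir -p /opt/application/builds')
    sudo('chown -R %s /opt/application' % env.user)

@roles('application')
def deploy():
    if not exists('/opt/application/builds'):
        setup()

    stamp = datetime.now().strftime('%Y%m%d%H%M%S')
    tarball = 'application-%s.tar.gz' % stamp

    # Build a release tarball locally from a fresh clone.
    local('mkdir -p /tmp/work')
    with lcd('/tmp/work'):
        local('rm -rf *.tar.gz fabric-example')
        local('git clone https://github.com/disassembler/fabric-example.git fabric-example')
        with lcd('fabric-example'):
            local('git checkout master')
            local('git archive --format=tar master | gzip > ../%s' % tarball)
        put(tarball, '/tmp/%s' % tarball)

    # Unpack the build on the application server and flip the "current" symlink.
    build_dir = '/opt/application/builds/%s' % stamp
    run('mkdir -p %s' % build_dir)
    with cd(build_dir):
        run('tar -zxf /tmp/%s' % tarball)
    run('rm -f /opt/application/current')
    run('ln -sf %s /opt/application/current' % build_dir)

    # Install any changed requirements inside the virtualenv.
    with virtualenv():
        with cd('/opt/application/current'):
            run('pip install -q -U -r requirements.txt')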

Here is the output of our deployment:

fab loadenv:environment=staging deploy
[10.211.55.17] Executing task 'deploy'
[10.211.55.17] sudo: mkdir -p /opt/virtualenvs/application
[10.211.55.17] sudo: chown -R vagrant /opt/virtualenvs/application
[10.211.55.17] run: virtualenv /opt/virtualenvs/application
[10.211.55.17] out: New python executable in /opt/virtualenvs/application/bin/python
[10.211.55.17] out: Installing distribute........done.
[10.211.55.17] out: Installing pip...............done.
[10.211.55.17] out:

[10.211.55.17] sudo: mkdir -p /opt/application/builds
[10.211.55.17] sudo: chown -R vagrant /opt/application
[localhost] local: mkdir -p /tmp/work
[localhost] local: rm -rf *.tar.gz fabric-example
[localhost] local: /usr/bin/git clone https://github.com/disassembler/fabric-example.git fabric-example
Cloning into 'fabric-example'...
remote: Counting objects: 21, done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 21 (delta 7), reused 18 (delta 4), pack-reused 0
Unpacking objects: 100% (21/21), done.
Checking connectivity... done.
[localhost] local: git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
[localhost] local: git archive --format=tar master | gzip > ../application-20150416080436.tar.gz
[10.211.55.17] put: /tmp/work/application-20150416080436.tar.gz -> /tmp/application-20150416080436.tar.gz
[10.211.55.17] run: mkdir -p /opt/application/builds/20150416080436
[10.211.55.17] run: tar -zxf /tmp/application-20150416080436.tar.gz
[10.211.55.17] run: rm -f /opt/application/current
[10.211.55.17] run: ln -sf /opt/application/builds/20150416080436 /opt/application/current
[10.211.55.17] run: pip install -q -U -r requirements.txt

Done.
Disconnecting from 10.211.55.17... done.

I hope this blog post will help you get started with doing your own deployments with Fabric. One thing we didn’t do in this case is create a production environment, but that is as simple as creating a new production.yml file containing the roledefs for production servers, and specifying environment=production in the loadenv task. In a future post we’ll discuss adding new roles, using execute for ordering tasks across multiple servers, and hiding the implementation details inside a class so our Fabric file can be nice and clean. I’ll also be doing a separate blog post not related to Fabric on how we can take a Flask Python application and use supervisord to launch it with a proxy behind nginx. Keep an eye on the OpsBot Blog for these upcoming posts!

Detecting Bad Apple Instances and Other Performance Issues in AWS

Amazon Web Services (AWS) is known for providing near-instant access to large amounts of computational power. Between the API and the web interface, you can quickly scale your application with demand. As with any technology service, though, it is built on hardware, and hardware fails. I have personally seen AWS resources become almost completely unreliable. These bad apple instances exist across the entire AWS infrastructure.

Most of the performance issues I have seen within AWS appear to originate from I/O. You can start a bunch of instances and receive widely differing performance levels. This can be caused by a failing disk in the infrastructure or, more likely, a less-than-optimal routing path to storage. The graph below shows a recent example of severely degraded read speeds on an instance, which caused my system load to float around 2.5 on the mid-term (five-minute) average. After stopping and restarting the instance, AWS automatically rerouted the storage path and my load dropped to around 0.15. That is a huge performance improvement just from stopping and restarting the instance, and it is why it is so critical to monitor and check for these performance issues.


[Graphite graph: system load before and after stopping and restarting the instance]

We were able to catch this performance issue with the aid of a couple of tools. Icinga, along with our custom check_sshperf script (available at https://github.com/AppliedTrust/nagios-plugins/blob/master/check_sshperf), gives us agentless threshold checks for system load, network performance, memory usage, and much more. collectd and Graphite can be used to collect and graph information about the system across a wide range of metrics. Icinga initially alerted us to a high load on the system; then, using Graphite, we were able to see the impact of the load over time and show that performance had been severely impacted.
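
If you have never written one, a Nagios/Icinga threshold check is just a small program that prints a status line and exits 0, 1, or 2 for OK, WARNING, or CRITICAL. The sketch below is a deliberately minimal, illustrative load check (it is not check_sshperf, which does far more), and the thresholds are placeholders:

#!/usr/bin/env python
import sys

WARN = 1.5    # placeholder thresholds; tune for your instance size
CRIT = 2.5

def main():
    # /proc/loadavg looks like "0.15 0.20 0.18 1/123 4567"; the first three
    # fields are the 1-, 5-, and 15-minute load averages.
    with open('/proc/loadavg') as f:
        one, five, fifteen = (float(x) for x in f.read().split()[:3])

    status, code = 'OK', 0
    if five >= CRIT:
        status, code = 'CRITICAL', 2
    elif five >= WARN:
        status, code = 'WARNING', 1

    print('%s - load average: %.2f, %.2f, %.2f|load5=%.2f;%s;%s'
          % (status, one, five, fifteen, five, WARN, CRIT))
    sys.exit(code)

if __name__ == '__main__':
    main()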

These tools can help not just with monitoring hardware issues, but also with software misconfigurations that could be impacting memory, storage, or compute resources. Below are just a few of the things we can monitor with Graphite:

[Graphite graph: a sampling of the system metrics collected with collectd and graphed in Graphite]

Icinga can also be configured in much more depth to check everything that makes your application run. So take the time to set up some in-depth monitoring across your application; it will allow you to detect performance problems before the end user notices.


MFA for Cloud Hosting and Web Applications

If you are using an infrastructure as a service (IaaS) provider and do not have multi-factor authentication (MFA) enabled, stop reading now and come back when you are done. It’s OK… we’ll wait.

The terms data breach, unauthorized access, and hacked are no strangers to headlines these days. Common terms that follow those seem to be credentials, user account, and weak passwords. These reports illustrate the need to ensure that strong password policies are in place, but we are also seeing that passwords alone are no longer sufficient to guarantee security. This is especially true for cloud providers and hosted services: because their purpose is to be well known and easily located, there is no obscurity to hide behind, making these sites a perfect target. So, if you didn’t stop reading before and you are still using a cloud provider without MFA enabled, now would be a good time for a heart-to-heart.


You have implemented an exceptional password policy, you say?

Passwords have been the primary authentication method since the 60s, and for decades experts have warned that they are one of the weakest security points. Two out of three network intrusions involve weak or stolen credentials, with the highest percentage of attacks against web applications. The number of intrusions involving compromised credentials is almost three times that of software vulnerabilities or SQL injections, yet organizations often seem to have a more robust patch management strategy than access management strategy, and rarely do we see authentication methods beyond username and password in use.

In today’s environment it is common to see good complexity requirements for passwords to enforce factors such as minimum length, special characters, non-dictionary words, and password history. However, even with strong passwords, phishing attacks and malware can be used to obtain credentials and compromise accounts. Approximately half of all hacking breaches involve stolen passwords gained through malware, phishing attacks, or theft of password lists. We already know that organizations such as Target, eBay, and Code Spaces (to name a few) had password policies implemented, but complexity requirements did not help once credentials were compromised and malicious users were able to access the environment.


But why would someone want to compromise my site?

With more than a million dollars in e-commerce sales every 30 seconds, and one-third of transactions coming from the United States alone, it is clear that an incredible amount of business is conducted over the Internet today. However, without customers or users being able to easily find your site or service, you are not going to see much business. So you work hard and devote resources to marketing, SEO, and any other method of getting people to your site. But remember, as you advertise to customers you are also advertising to hackers and malicious users. Also keep in mind that attackers are not always seeking gain; sometimes a site is attacked simply “because.” If you want to do some research yourself on which sites are attacked, why, and how, go to Google, pick an organization, and search for that organization followed by the phrase data breach. You will find no shortage of results, and you will see that large enterprises as well as smaller businesses are susceptible to data security problems, and that all are targets of intrusion attempts.


Say hello to my little friend… MFA!

MFA offers additional protection against malware, attacks against web applications, point-of-sale attacks, insider abuse, and loss of devices. Approximately four out of five data breaches would have been stopped, or would have forced the intruder to change tactics, if an alternative to password authentication such as MFA had been in use. Many organizations are concerned by the cost of implementing an MFA solution and the change to workflow, but we must remember that these costs directly reduce risk. We must also keep in mind the enormous and often incalculable cost of a data breach in the form of lost records, lost customer trust, and a damaged public image. For instance, the breach at Code Spaces destroyed such a large portion of the company’s data, including backups, that the company decided restoring all services was not feasible and subsequently closed its doors. Target not only experienced a financial loss because of its data breach, but also took a big hit to customer loyalty, satisfaction, and trust, which surely turns into a financial cost in the long term.


Now what?

Let’s take a look at a few steps we can take to increase security using MFA within Amazon Web Services (AWS), the current leader in IaaS.

  • First and foremost, it is always a good idea to review compliance requirements to verify whether PCI, HIPAA, or other standards you are subject to require MFA, and to what extent.
  • Create a strong password and enable MFA for root accounts. Use this root account to create an Identity and Access Management (IAM) account, which will be used for administrative tasks going forward. Do not use the root account for everyday administration.
  • Enable MFA for all other accounts as well, especially any account with administrative rights.
  • Explore linking MFA to API access to services. This allows you to protect certain tasks, such as TerminateInstances, by requiring MFA before that task can be initiated (a sketch of one such policy follows this list).
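
As a hedged illustration of that last bullet, the snippet below uses boto3 to create an IAM policy that denies ec2:TerminateInstances for any request not authenticated with MFA. The aws:MultiFactorAuthPresent condition key and the create_policy call are standard AWS features, but the policy name and statement are an example to adapt, not a drop-in recommendation:

import json
import boto3

# Deny instance termination for any request that was not authenticated with MFA.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyTerminateWithoutMFA",
        "Effect": "Deny",
        "Action": "ec2:TerminateInstances",
        "Resource": "*",
        "Condition": {
            "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
        }
    }]
}

iam = boto3.client("iam")
response = iam.create_policy(
    PolicyName="RequireMFAForTerminateInstances",   # illustrative name
    PolicyDocument=json.dumps(policy_document),
)
print(response["Policy"]["Arn"])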


If you would like to find out more on using MFA, contact your cloud service provider or contact AppliedTrust. We can help you secure your site and your business.

systemd: A new init daemon for Linux

If you use Linux and you haven’t tried systemd yet, it won’t be long. Most of the major distributions, including Red Hat, Debian, Ubuntu, and Arch Linux, have either adopted it already or are planning to adopt it within the next year. So, because you can’t ignore or avoid it any longer, this blog post is going to discuss what it is, why it’s an improvement over System V init, and how to interact with it using systemctl.

To begin, systemd is a replacement init daemon for Linux. The init daemon is the process the kernel launches on startup that manages the starting of everything else. It is primarily used for stopping and starting services and getting the current status of a service.

systemd takes advantage of some awesome newer kernel features, such as cgroups: every process started by systemd gets its own cgroup, so it’s easy to identify all the processes associated with a service. cgroups also allow systemd to give a service a maximum amount of memory, a higher CPU priority, limits on block I/O read/write bandwidth, and even some really nitty-gritty values such as swappiness.

One of the things systemd excels at over System V init scripts is service dependencies. If you aren’t familiar with how /sbin/init works: when it starts, it sets the runlevel and first executes all the K scripts in /etc/rcN.d/ (where N is the runlevel) with the stop argument, followed by all the S scripts with the start argument. Everything is done numerically, so if process foo needs to start before process bar, the scripts would be S02foo and S03bar. You can see how this could get unruly when you need to insert process baz between foo and bar and there isn’t a number available. Now you have to go back and renumber foo and bar, which means altering the RPMs of foo and bar just to get baz to start at the right time. With systemd, a unit file can declare that a service requires, or merely wants (like requires, except that a missing service is simply ignored), another service. So when baz and bar 2.0 are released, baz can require foo, and bar 2.0 can want baz.

Starting and stopping services with systemd is pretty simple. We use the systemctl command to interact with services. A service unit ends in .service to distinguish among different types of systemd units. For this example, we’ll stop/start/restart the httpd.service unit. To start our service, we run systemctl start httpd.service. Similarly, to stop the service we run systemctl stop httpd.service. To restart, the command is: systemctl restart httpd.service.

Another benefit of systemd is the monitoring of services. With System V, getting the status of a service required the init script to be written to support it; with systemd, we get status for all services out of the box. Here’s an example output of a stopped service:

sam@myvm:~$ sudo systemctl status -n 50 apache2.service
apache2.service – LSB: Start/stop apache2 web server
Loaded: loaded (/etc/init.d/apache2)
Active: inactive (dead)
CGroup: name=systemd:/system/apache2.service

And of a started service:

sam@myvm:~$ sudo systemctl status ssh.service
ssh.service – LSB: OpenBSD Secure Shell server
Loaded: loaded (/etc/init.d/ssh)
Active: active (running) since Fri, 13 Mar 2015 14:23:33 -0400; 3 days ago
CGroup: name=systemd:/system/ssh.service
└ 1183 /usr/sbin/sshd

Mar 16 22:04:52 myvm sshd[12496]: Accepted publickey for sam from 10.211.55.2 port 53061 ssh2
Mar 16 22:04:52 myvm sshd[12496]: pam_unix(sshd:session): session opened for user sam by (uid=0)
Mar 16 22:55:27 myvm sshd[12577]: Accepted publickey for sam from 10.211.55.2 port 53362 ssh2
Mar 16 22:55:27 myvm sshd[12577]: pam_unix(sshd:session): session opened for user sam by (uid=0)
Mar 16 23:18:48 myvm sshd[12593]: Accepted publickey for sam from 10.211.55.2 port 53766 ssh2
Mar 16 23:18:48 myvm sshd[12593]: pam_unix(sshd:session): session opened for user sam by (uid=0)
Mar 17 10:17:07 myvm sshd[18117]: Accepted publickey for sam from 10.211.55.2 port 52378 ssh2
Mar 17 10:17:07 myvm sshd[18117]: pam_unix(sshd:session): session opened for user sam by (uid=0)
Mar 17 10:45:42 myvm sshd[30694]: Accepted publickey for sam from 10.211.55.2 port 52716 ssh2
Mar 17 10:45:42 myvm sshd[30694]: pam_unix(sshd:session): session opened for user sam by (uid=0)

There’s a hidden gem in the status command above. Because systemd by default sends all stdout/stderr output to the journal, we can see the most recent log lines (the same ones journalctl shows) right in the status output. If we want more, we can use the -n parameter to specify the number of lines of logs we want to see. In this case, we haven’t even created a systemd unit file; systemd is starting the old LSB init script without any new systemd features being set up in unit files.

This is the basic usage of systemd. In future blog posts we’ll look at some more advanced features, such as writing your own unit script (to see an example of how easy it is, see my blog post on Advanced Docker Networking with Pipework), integrating with dbus, using other unit types like timers, and taking advantage of cgroups to control your process resource usage. Until next time, enjoy playing with systemd.

IAM Now Using Managed Policies for Everything

If you work with AWS IAM policies on a regular basis, chances are you have lots of groups defined with inline policies attached. If you just have a lot of users with inline policies attached, go back and do what I said in the last sentence. Just like in Active Directory, applying IAM policies (read “file/machine permissions”) directly to user accounts is guaranteed to fill your authentication structure with snowflakes fit for a ski hill. Creating consistent access structures is paramount to maintaining the ongoing security of your AWS presence.

Anyway, back to those inline policies. The workflow so far has been to create a group, create a policy for that group, and then add users to the group so they get to assume the permissions of the policy.  Need to change what a group can do? Go edit the inline policy for the group as needed. Need to find out why a user can’t (or can) access an S3 bucket? Open the user page, scroll down to inline group policies, see what the user has inherited, and move your way up the chain. Want to see a comprehensive list of the IAM entities that have access to create new buckets? That’s where things get tricky. Finding this information would require someone to sit down and open every user, group, and role created to examine their inline policies and look for the s3:CreateBucket action.  This can make auditing access and maintaining adequate security a chore, which likely means it won’t happen.

Enter managed policies. This feature was quietly rolled out in February and, like most AWS feature upgrades, was conveniently placed in front of your face for you to ignore while you go about your normal business. Here’s the thing: don’t. Aside from having 106 canned policies available for public use at the time of this writing, managed policies solve a lot of issues we have been working around inconveniently for years. I wanted to share a few key thoughts about the new features and how they can be useful.


  1. Staying Current

Just like managed policies themselves, new minor AWS features seem to roll out all the time. So you want to give a user full access to RDS? That requires a lot of specific permissions. Amazon has done a good job of providing examples to use for inline policies, but what happens when suddenly you can store backups in Glacier? You have to go back and copy Amazon’s new example. Managed policies provide a way to attach Amazon-managed policies to your own entities, so that as Amazon rolls out new features that require additional access, your users automatically get the rights they need.
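
Attaching one of those Amazon-managed policies can also be scripted. The boto3 calls below are real SDK methods; the group name is a placeholder, and AmazonRDSFullAccess is one of Amazon’s published managed policies (verify it grants exactly what you intend before attaching it):

import boto3

iam = boto3.client("iam")

# Amazon-managed policies live under the "aws" account namespace.
policy_arn = "arn:aws:iam::aws:policy/AmazonRDSFullAccess"

# Attach the managed policy to a group (placeholder name) instead of to users,
# so that group membership, not per-user snowflake policies, controls access.
iam.attach_group_policy(GroupName="rds-admins", PolicyArn=policy_arn)

# List what is attached, which is handy for auditing.
for p in iam.list_attached_group_policies(GroupName="rds-admins")["AttachedPolicies"]:
    print(p["PolicyName"], p["PolicyArn"])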


I will mention though that you should be VERY careful about using access policies managed by any third party in a production environment or any environment containing sensitive data. Even though Amazon is a trustworthy bunch, you should always be maintaining tight control over your data. These managed policies make it easy for your developers to try things out, but I will always recommend making a copy in your own managed policy and manually syncing changes after conducting a security review.


  2. Policy Versioning

We’ve all done it. You say to yourself on a Friday afternoon, “I’ll just make this one permissions tweak, grab a beer, and these backups will be able to run over the weekend.” Unfortunately, you ignored your organization’s change management policies and even forgot to copy and paste the old policy into your favorite text editor for posterity while saving the live change in the console. In the process you overwrote an obscure ARN that took you two weeks to figure out in the first place, and now you’ve broken something. Managed policies automatically save up to five versions of the policy, and rolling back is as simple as clicking “Set as Default.” In addition, there’s a handy timestamp you can use to search your CloudTrail logs and figure out who made previous changes.


Alternatively, you could always just look back at what you missed and save a new working version as well. There are options.
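
The same rollback can be done from code. The boto3 methods shown (list_policy_versions, set_default_policy_version) are standard IAM API calls; the policy ARN and version ID are placeholders for one of your own customer-managed policies:

import boto3

iam = boto3.client("iam")
policy_arn = "arn:aws:iam::123456789012:policy/backup-operator"   # placeholder ARN

# Show the saved versions (IAM keeps up to five) and which one is the default.
for v in iam.list_policy_versions(PolicyArn=policy_arn)["Versions"]:
    print(v["VersionId"], "default" if v["IsDefaultVersion"] else "")

# Roll back by promoting an earlier version to default ("v2" is a placeholder).
iam.set_default_policy_version(PolicyArn=policy_arn, VersionId="v2")

# Alternatively, create_policy_version with SetAsDefault=True publishes a
# corrected document and makes it the default in one call.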


  3. Access Auditing

As I said before, poring through a large number of IAM entities looking for a specific action or resource is a pain. Now all you have to do is look at the policy and see what it’s attached to. Keep in mind, however, that this only works if you have truly committed to using only managed policies, because inline policies will still work side by side.



Here’s a hint: Create an administrator policy that only allows the creation of managed policies to avoid people making the mistake of adding inline policies in a pinch. A simple ‘deny’ in your admin policy takes care of it.
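
As a hedged sketch of what such a deny could look like, here it is expressed with boto3. The iam:Put*Policy actions listed are the real API actions that create inline policies on users, groups, and roles; the statement itself is illustrative and should be reviewed against your own admin policy:

import json
import boto3

# Deny the three API actions that attach *inline* policies, while leaving the
# managed-policy actions (CreatePolicy, AttachGroupPolicy, etc.) alone.
deny_inline = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "NoInlinePolicies",
        "Effect": "Deny",
        "Action": [
            "iam:PutUserPolicy",
            "iam:PutGroupPolicy",
            "iam:PutRolePolicy"
        ],
        "Resource": "*"
    }]
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="deny-inline-policies",          # illustrative name
    PolicyDocument=json.dumps(deny_inline),
)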


Still Using FTP? You Should Be Ashamed!

If you call yourself a sysadmin or IT person and are still using FTP, you should be ashamed of yourself. The original specification for the cleartext File Transfer Protocol, or FTP, was written by Abhay Bhushan and published as RFC 114 on April 16, 1971. Although completely groundbreaking at the time, not to mention a trailblazer for future file transfer protocols, FTP has outlived its usefulness in a time of billion-dollar digital bank heists and hijackings of millions of credit card numbers and protected health information records.

The reason FTP is so insecure is that it’s a completely cleartext protocol, which means that all data–including usernames and passwords–are transmitted across any network in cleartext. This is the equivalent of a postcard being sent via snail mail. Anyone and everyone that touches this information as it travels from origin to destination can plainly see the information written on it. It’s almost as silly as printing everything you need to make a credit card transaction on the physical credit card itself… but that’s a topic for another rant.

So why in an age of cyber heists, boundless information gathering, and profiling are we still using such an outdated and insecure technology? Simple: Because our trusted application vendors, software creators, and service providers are often lazy. As consumers, we tend to assume that when we sign up for a new product or service and provide any accompanying data–be it personal, financial, or medical–that the company behind that product or service does everything in its power to protect our information. In reality, this is often not the case.

As an IT auditor, I’m surprised at how many companies, hospitals, insurance companies, financial institutions, and city and federal government agencies are using antiquated and insecure methods for storing and transferring our data. I’m shocked when I find that their supposedly cutting-edge, high-tech vendors creating products for regulated sectors such as the healthcare and financial industries typically recommend and implement outdated protocols such as FTP.

Medical vendors, in my opinion, can be some of the worst offenders. These vendors–whose clients are always required to comply with HIPAA regulations in order to act as custodians of our electronic protected health information (ePHI)–regularly configure their products in a manner that is not compliant. It’s not that their products are incapable of being HIPAA compliant; it’s that they support outdated, insecure protocols such as FTP. To make it worse, these same protocols are typically the ones that get enabled because it’s easy, and it’s not the vendor’s responsibility to be HIPAA compliant: It’s the customer’s.

Often when confronted about the reasons such non-compliant protocols are available and supported within their products, vendors say that they don’t get paid to make it secure; they get paid to make it work. Unfortunately, most customers using these vendors will default to trusting the expertise and professionalism of said vendors, which results in non-compliance and, in many cases, security incidents. As an auditor and system administrator, I call out any vendor that supports and implements protocols that are clearly prohibited for the industry that vendor is supposedly advancing.

So as a system administrator or even a consumer, the next time you hear someone at your company, or a software vendor, or even a trusted connected partner mention FTP, be bold and tell the person that although you respect how FTP was a modern marvel for its time, and undoubtedly paved the way for file transfers of the future, you would really prefer to stop living in the 70s and implement a more modern and secure encrypted alternative. Maybe try SCP or SFTP, which are based on the SSH protocol, to bring you into the 90s!

Monitoring EC2

My dirty little secret is that I’m a bit of a monitoring junkie. I like to monitor all the things. I’ve been known to monitor my stereo receiver’s network connection so that I can make sure AirPlay is always available. That’s why I was surprised one afternoon a few years ago when several of my Amazon Web Services (AWS) EC2 instances started dropping like flies, with no warning. Typically I’d have seen some indication that the server was having problems, right? Wrong.

That was my first foray into AWS servers, and it turns out that they require you to reboot them every once in a while, because Amazon retires old hardware. If you don’t, they’ll do it for you — on their schedule. I should have known about this, you say? Maybe. I could have logged into the console and reviewed each host (see below) to see whether there were any notices attached to them. But why would I, if they weren’t showing problems in my monitoring system of choice (Nagios/Icinga)? There had to be a better way.

[Screenshot: EC2 console showing event notices attached to an instance]

Enter Amazon’s EC2 API. Using this super handy interface (thanks, Amazon!), we were able to set up a check through our monitoring server to see whether our EC2 instances had any status events that we needed to be concerned with. You can be sure that we ran around and added this check to ALL of our instances as soon as it was ready. Using the access and secret keys as well as the AWS EC2 API Tools (https://aws.amazon.com/developertools/351), we were able to query our AWS environments for systems that had alerts attached to them — no more manually logging in and clicking through each instance to locate potential problems. This is just one of many things that can be monitored using the API. We also have checks to run and report on backups, something else that historically we had been handling through the console. All of our AWS Nagios checks (including check_ec2_status) are stored and available in our github repository here: https://github.com/AppliedTrust/nagios-plugins
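
The check_ec2_status plugin in that repository is the real implementation; as a much smaller illustration of the same idea, the sketch below uses boto3 (the current Python SDK, rather than the EC2 API Tools mentioned above) to list any scheduled events, such as instance retirement or a system reboot, attached to your instances. Credentials and region are assumed to come from your environment:

import boto3

ec2 = boto3.client("ec2")   # credentials/region come from your environment or profile

# IncludeAllInstances picks up stopped instances too, not just running ones.
# (For very large fleets, paginate over the results.)
response = ec2.describe_instance_status(IncludeAllInstances=True)

for status in response["InstanceStatuses"]:
    for event in status.get("Events", []):
        # Typical codes: instance-retirement, system-reboot, system-maintenance.
        print("%s: %s (%s) not before %s" % (
            status["InstanceId"],
            event["Code"],
            event.get("Description", ""),
            event.get("NotBefore"),
        ))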

Because our monitoring server is tightly integrated with our ticketing system, tickets get opened automatically whenever the status check fails. This ensures that we have a ticket open for events such as scheduled retirement so that we can reboot or move EC2 instances on our own schedule. This helps us to ensure that we’re meeting SLAs, notifying appropriate stakeholders, and ensuring that we’re following proper change control procedures. No more unexpected terminations and/or reboots!


Monitoring AWS notification events is a great way to help ensure the best possible availability for your EC2 environment. Using Amazon’s API and your organization’s default monitoring or ticketing system will make sure your admins are tuned in and taking action to fix issues within your Amazon environment.

So, now that you know how we monitor our EC2 instances, we want to know how you’re doing it! Are you? If you want help implementing monitoring for your EC2 instances (or any other servers), please reach out and let us help find the solution that’s right for you.