6 Reasons INGEST_EVAL Can Help (Or Hurt) Your Splunk Environment

As a Splunker, you’re constantly faced with questions about what can help or hurt you in Splunk. And if you attended some of this year’s .conf20 sessions, you’re probably asking yourself this question:

“Should I use INGEST_EVAL?”

The answer to this is a solid maybe.

At Splunk’s .conf20 this year, Richard Morgan and Vladimir Skoryk presented a fantastic session on different capabilities for INGEST_EVAL. When you get a chance, take a look at their presentation recording!

In this review, we’ll go through Richard and Vladimir’s presentation and discuss inspiration derived from it. These guys know what they’re talking about; now I’m giving my two cents.

This is part one of two: in the second part, we’ll look at code samples to test some of these use cases.

Background

Splunk added the ability to perform index-time eval-style extractions in the 7.2 release. It was in the release notes, but otherwise wasn’t much discussed. It generated more buzz in the 8.1 release, as these index-time eval-style extractions (say that three times quickly) now support the long-awaited index-time lookups.

The purpose of INGEST_EVAL is to allow EVAL logic on indexed fields. Traditionally in Splunk, we’d hold off on transformations until search time — old-timers may remember Splunk branding using the term “Schema on the Fly.” Waiting for search time is in our DNA. Yet, perhaps the ingest-time adjustments are worth investing in.

Let’s look through the key takeaways on what ingest-time eval provides. Then you can decide whether it’s worth the hassle to do the prep work to take advantage of it.

1. Selective Routing

Before you try to yank my Splunk certs away: yes, we already have a version of this capability. This is slightly different from the typical method of sending data to separate indexers, say, Splunk internal logs going to a management indexer instead of the common-use one, or security logs going to a parent organization’s Splunk instance.

The INGEST_EVAL version allows for selective routing based on whatever you can come up with in an eval statement. The example from the presentation uses the match function with a regex to send data from select hosts to different indexers. Ideally, this would happen on a heavy forwarder, or any other Splunk Enterprise box, before the data reaches the indexers. Perhaps those security logs are staying on-prem, and the rest of the logs go to Splunk Cloud.

What else could we come up with for this? If data contains a particular string, we can route it to different indexes or indexers. We already have that with transforms, but transforms rely on regex, whereas this can use eval functions. Route high-value transactions to a separate set of indexers? If any of a list of special codewords appears, send the event to a different indexer?
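As a minimal sketch, the routing might look like the following. The stanza names, hostname pattern, and output group names are all invented for illustration; the output groups would be defined in outputs.conf on the same heavy forwarder, and the := operator forces the value to be (over)written:

# props.conf (on the heavy forwarder)
[my_sourcetype]
TRANSFORMS-route_by_host = route_by_host

# transforms.conf
[route_by_host]
# Hosts matching "secsrv-*" stay on-prem; everything else goes to the cloud group.
INGEST_EVAL = _TCP_ROUTING:=if(match(host, "^secsrv-"), "onprem_indexers", "cloud_indexers")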

Let your imagination run on this, and you’ll find lots of possibilities.

2. Ingest log files with multiple timestamp formats

In the past, we had to dive into the depths of a custom datetime.xml (wired up via DATETIME_CONFIG) and roll our own solution. INGEST_EVAL, along with if/case statements, can handle multiple timestamp formats in the same log. Brilliant. If you have ever had to deal with logs that have multiple timestamp formats (and the owners of those logs who won’t fix their rotten logs), then you’ll be thrilled to see an easy solution.

INGEST_EVAL can look at the data and search for different formats until it finds a match.
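A sketch of the idea, wired up from props.conf like the routing example above (the regexes and strptime formats here are just examples to adapt to your log):

# transforms.conf
[multi_format_time]
# Try each known format in order; fall back to the current time.
INGEST_EVAL = _time:=case(match(_raw, "^\d{4}-\d{2}-\d{2}"), strptime(substr(_raw, 1, 19), "%Y-%m-%d %H:%M:%S"), match(_raw, "^\d{2}/\d{2}/\d{4}"), strptime(substr(_raw, 1, 19), "%m/%d/%Y %H:%M:%S"), true(), time())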

3. Synthesizing dates from raw data mixed with directory names

Sometimes we find data, often IoT or custom syslog data, where the log file only has a timestamp. In these cases, we normally see the syslog server write the file into a directory with a date name. Example: /data/poutinehuntingcyborg/2020-10-31.log 

Using INGEST_EVAL, it’s possible to create an _time value that combines part of the source and part of the raw data into a timestamp that matches what Splunk expects. A lovely solution to a problem that wasn’t so easy to solve otherwise.

This simple trick could replace having to use ETL. 
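For the example path above, a sketch might look like this (the regex, and the assumption that each event begins with an HH:MM:SS time of day, are illustrative):

# transforms.conf
[time_from_path]
# Date comes from the file name; time of day comes from the first eight characters of the event.
INGEST_EVAL = _time:=strptime(replace(source, ".*/(\d{4}-\d{2}-\d{2})\.log$", "\1")." ".substr(_raw, 1, 8), "%Y-%m-%d %H:%M:%S")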

4. Event Sampling

Using eval’s random function and an if/case statement, it is possible to send along only a percentage of events. Combine it with other eval logic to do things like passing along one in ten login errors or one in one thousand successful purchases.

By combining multiple eval statements, you could create a sample data set that includes data from multiple countries, different products, and different results. 
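A sketch of both flavors (the stanza names and the “login error” string are made up; indexQueue and nullQueue are the standard keep and drop destinations):

# transforms.conf
[sample_ten_percent]
# Keep roughly 1 in 10 events; drop the rest.
INGEST_EVAL = queue=if(random() % 10 == 0, "indexQueue", "nullQueue")

[sample_login_errors]
# Keep all other events, but only roughly 1 in 10 login errors.
INGEST_EVAL = queue=if(NOT match(_raw, "login error") OR random() % 10 == 0, "indexQueue", "nullQueue")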

5. Event Sampling combined with Selective Routing

Whoa.

Sample the data, and then send the sample to test, dev, or over to your machine learning environment. This is big.
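Combining the two earlier sketches (group names are again invented, and it’s worth verifying on your version that _TCP_ROUTING accepts a comma-separated list of output groups):

# transforms.conf
[clone_sample_to_dev]
# Everything goes to prod; roughly 5% of events also go to the dev group.
INGEST_EVAL = _TCP_ROUTING:=if(random() % 100 < 5, "prod_indexers,dev_indexers", "prod_indexers")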

6. Dropping unwanted data from structured data

Using INGEST_EVAL, we can drop fields that we otherwise don’t need. With indexed extractions for CSV and JSON, each column or element becomes a field. Sometimes we don’t want those extra fields.

Let’s look at an example: an Excel spreadsheet exported as CSV, where a user has been adding notes that are unneeded in Splunk.

In standard Splunk ingest, those notes become fields in Splunk, and we have to use SPL to remove them from our searches. How often does a CSV dump contain hundreds of fields when we only care about four? (Answer: often.)

Using INGEST_EVAL, we can onboard only the columns or elements that we want, and the rest poof away. Not only does this save disk space, but it makes for cleaner searching.
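A minimal sketch for the spreadsheet example (the field names notes and user_comments are hypothetical; setting an indexed field to null() removes it at ingest):

# transforms.conf
[drop_spreadsheet_notes]
# Null out the indexed fields we never want to keep.
INGEST_EVAL = notes:=null(), user_comments:=null()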

My Final Thoughts

Back to our question… “Should I use INGEST_EVAL?” Again, it’s a solid maybe.

If you need to keep license usage down by ingesting only what you need, then sure. If you need to modify data beyond what sed or a regex can perform, then give it a try. INGEST_EVAL isn’t for every Splunk admin, but not every admin hunts down blogs like this.

Stay tuned for more takeaways on INGEST_EVAL in part two.

Michael Simko’s Top Five Recommended Sessions at Splunk’s .conf20


One of my favorite times of the fall is the annual Splunk user conference. The pandemic has thrown lots of conferences into disarray. The Las Vegas .conf may be off, but virtual .conf is on — and is free. And yes, free as in free, not free like someone tried to give you a dog.

The virtual conference is 20-21 October for AMER, and 21-22 for EMEA and APAC. 

Here are the top five sessions at Splunk .conf20 that I recommend my customers, colleagues, and students attend. There are many more interesting sessions across the Splunk product line and beyond (temperature scanning crowds to find the infected?). 

 

1) PLA1454C – Splunk Connect for Syslog: Extending the Platform 

Splunk Connect for Syslog is an outstanding system for onboarding syslog data into Splunk. Traditionally, Splunk uses a third-party syslog server to write data to disk and a Universal Forwarder to read that data and send it to Splunk. This has worked well but requires building the syslog server and understanding enough of the syslog rules to configure the data correctly.

Enter Splunk Connect for Syslog, which handles the syslog configuration, sends the data to Splunk, and for many known sourcetypes makes the onboarding process a snap. 

 

What I like best: This came from engineers looking at a problem and making things better.

 

2) PLA1154C – Advanced pipeline configurations with INGEST_EVAL and CLONE_SOURCETYPE

Eval is a powerful way to create, modify, and mask data within Splunk. Traditionally it is performed at search time. This session shows methods for using INGEST_EVAL to perform eval logic as the data is being onboarded. This helps with event enrichment, removing unwanted fields, event sampling, and many more uses.

 

What I like best: INGEST_EVAL opens a world of more control in Core Splunk.

 

3) SEC1392C – Simulated Adversary Techniques Datasets for Splunk

The Splunk Security Research Team has developed test data for simulating attacks and testing defenses in Splunk. In this session, they are going to share this data and explain how to use it to improve attack detection.

 

What I like best: Great test data is hard to come by, much less security test data.

 

4) PLA1129A – What’s new in Splunk Cloud & Enterprise

This session shows off the newest additions to Splunk Cloud and Splunk Enterprise. Each year these sessions show the new features that have arrived either in the last year or in new versions that often coincide with Splunk .conf.

What I like best: New toys to play with.

 

5) SEC1391C – Full Speed Ahead with Risk-Based Alerting (RBA)

I’ve talked to several customers who wanted to use a risk-based alerting (RBA) system for their primary defenses. Traditional methods require lots of tuning to avoid flooding the security staff with too many alerts. RBA is a method to aggregate risk elements and then present the findings in an easier-to-consume form.

 

What I like best: Another option on how to approach security response.

 

Bonus Sessions: You didn’t think I could really stop at five, did you?

TRU1537C – Hardened Splunk: A Crash Course in Making Splunk Environments More Secure

TRU1276C – Splunk Dashboard Journey: Past Present and Future

TRU1761C – Master joining your datasets without using join. How to build amazing reports across multiple datasets without sacrificing performance

TRU1143C – Splunk > Clara-fication: Job Inspector

 

Join us!

Our KGI team will be on board for .conf20 and we’re more excited than ever to attend with you. With over 200 virtual sessions at Splunk’s .conf20 event, this year is going to be BIG. With exciting updates to Splunk and grand reveals of new product features, Kinney Group is ready to help Splunkers along the way.

Keep your ears perked for some big, Splunk-related announcements coming your way from Team KGI this month…

The Needle in the Haystack: Missing Data in Splunk

Splunk has wonderful charts, graphs, and even d3.js visualizations to impart data in an easily understandable fashion. Often, these graphical representations of the data are what users focus on. Decisions are made and budgets determined due to how the data appears in these visualizations. It’s safe to say, the accuracy of the data that supports these visuals needs to be spot on.

Visualize Your Data

Splunk brings value through its visualization features. However, for the visuals to be meaningful, the data has to be accurate and complete. This highlights a challenge: focusing on visualizations often masks incomplete data. A pie chart appears to have all the data, representing it as a “full” circle, even if we are missing data in Splunk. Yet that pie chart of external connections to our network is inaccurate if it’s missing one of our firewalls. Likewise, a security control for “3 failures within 60 minutes per user” is compromised when a third of the data isn’t arriving in Splunk. Let’s take a look at some steps to find that missing data…

Figure 1 – Pie chart missing data in Splunk

Find Your Missing Data

Step 1: Create a list of all the data coming into Splunk. Using an account that can search all the indexes, run the following:

| metadata type=sourcetypes index=* | fields sourcetype, totalCount | sort - totalCount

 

 

Figure 2 – Metadata in Splunk

Step 2: Export the table that resulted from the previous step. (Good thing there’s an export button in the Splunk UI!)

Step 3: Send the results to your major teams and ask them, “What’s missing from this list?” When you’re thinking about teams to send this to, think Networking Team, Security Team, Windows Operations, Unix Operations, Applications Teams, etc.

Step 4: Gather a list of which systems and system types are missing and investigate. Is this data that you can onboard?
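As a bonus check, the same metadata trick can flag hosts that have gone quiet. This sketch assumes a host is suspect if it hasn’t sent events in 24 hours; tune the window to your environment:

| metadata type=hosts index=* | where recentTime < relative_time(now(), "-24h") | eval lastSeen=strftime(recentTime, "%F %T") | table host, lastSeen, totalCount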

Example: Networking looks at your list of sources and realizes it is missing the Juniper VPN. The Networking team sends the firewall logs to a syslog server, while the Splunk team loads the configs that will handle parsing and search.

Figure 3 – Pie chart showing all sources in Splunk

There’s Your Needle

Collecting and maintaining the correct data sets can be a difficult task. From collaborating with many teams to finding the needle in the haystack of missing data, you’ve got your work cut out for you.

At Kinney Group, we’ve spent years finding the proverbial Splunk needle amongst a ton of data. Ensuring that you are ingesting the right data in the right way is one of our Splunk specialties. If you have trouble finding missing data or spinning up the right Splunk visualizations, we’re here to help!

Empower IT Operations with Splunk MLTK for Automated Insights

Automated pattern discovery against large data sets is now commonly called AIOPS. Read on as we explore the ways AIOPS can be facilitated by Splunk to uncover meaningful insights for Operations.

IT Operations History

IT Operations has traditionally been the domain of silo tools that specialize in one area of operations and are lacking or non-existent in others. Operations personnel had to open each of these tools and understand the data contained within. The work was time-consuming and often relied on instinct and gut feelings instead of evidence.

With the rise of big data, we saw platforms like Hadoop and tools like Splunk come along and greatly reduce the need for separate silos. By making data available to operations as a whole, each team member was empowered to gain insights faster. Operations personnel codify their expertise into alerts and searches, which then share that experience with others in their organization.

The next progression is to step into machine learning. That is, initial algorithms are set, and then the programs running the algorithms use actual data to gain an increased understanding of the data. In short, the machine learns how best to understand the data, and it then uses that understanding to produce actionable insights. This last part sounds like something available only in the distant future, but it is actually available today within Splunk using the Machine Learning Toolkit (MLTK). Using the Splunk MLTK, operations personnel are able to reap the rewards that come from AIOPS.

AIOPS

There are multiple ways to define what AIOPS is. The original acronym would define AIOPS as Artificial Intelligence Operations, but the term has deviated enough in industry that we’re going to back off the artificial intelligence side and focus on the machine learning side. After all, we’re less hunting for Sarah Connor, and more wanting to know when our hardware is going to crash. And since we are focusing on Splunk, we can look at the MLTK.

Machine Learning & Data Science

The gist of machine learning is to provide systems with the ability to learn. That is, we give the systems algorithms to start with, and they can adapt based upon data, make classifications, and make decisions with little to no human intervention.

The Splunk Machine Learning Toolkit

The MLTK is a Splunk app (free, by the way) that helps to create, validate, manage, and, most importantly, operationalize machine learning models. The MLTK includes a variety of algorithms, including several hundred from the Python for Scientific Computing library, giving you the power to try different algorithms to find the right insights for your data.

Two Example Scenarios
  • Resource Management — predict when we’ll need more capacity (see the sketch after this list)
  • Systems breaking — identify the items that are indicative of forthcoming system failures
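As a sketch of the first scenario: the index, sourcetype, and field names below are assumptions, and core Splunk’s predict command is shown for brevity; the MLTK’s forecasting algorithms and its Forecast Time Series assistant offer richer options.

index=os sourcetype=df mount=/data | timechart span=1d avg(UsePct) AS used_pct | predict used_pct future_timespan=30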

 

Looking Forward with Splunk MLTK

We are in a new day and age of IT Operations, where many manual processes can start to be automated with the help of these tools. Putting the power of Splunk’s MLTK into the hands of your IT Operations personnel can empower them to begin a transition to a more automated approach to their everyday work, such as investigating and troubleshooting a problem before you even see its effects. This approach is not mainstream—and may be daunting to some—but now is the time to get a grasp on the next generation of IT Operations.

Want to know what Splunk MLTK can do for you and your organization? You can get access to Kinney Group’s deep bench of Splunk experts, on demand. Check out our Expertise on Demand for Splunk service offering for more information on our various packages and let us know how we can help unleash the power of Splunk.

About Kinney Group’s Splunk Practice:

The Kinney Group team has the deepest bench of Splunk expertise in North America. Our team provides a comprehensive Splunk customer experience across multiple disciplines including Splunk Enterprise, Splunk Enterprise Security (ES), IT Service Intelligence (ITSI), and custom use cases in the areas of compliance, IoT, and machine learning. Kinney Group highlights include:

  • A Top Global Splunk Professional Services Practice
  • Splunk Elite Partner
  • Splunk Public Sector Services Partner of the Year
  • Experience with 300+ projects delivered nationwide and overseas
  • Application development expertise for the Splunk platform

Visit www.kinneygroup.com/contact-us or call us at (317) 721-0500.

Leverage Splunk at Scale, Fast with Pure Storage

We live in the age of the operational intelligence platform. This technology is undeniable for organizations because it makes machine data accessible, usable, and valuable to everyone. By harnessing platforms like Splunk, organizations can tear down departmental silos enterprise-wide and creatively ask questions of all their data to maximize opportunities. To make this a reality, hardware and infrastructure are critical considerations.

Traditional storage approaches for Splunk at scale are under siege. One of the greatest risks to enterprise-wide adoption of Splunk is inadequate, under-sized, or non-performant hardware. Organizations frequently want to repurpose existing and aging hardware, leaving Splunk customers dissatisfied with the implementation, and possibly the entire platform.

Good news. A superior hardware approach is here: running Splunk on Pure Storage FlashStack.


Imagine an all-flash reference architecture that enables true harnessing of a Splunk deployment (technically, and for the bottom-line):

  • Smarter Computing. 5x greater efficiency at the compute layer
  • Operationally Efficient. 10x greater efficiency in rackspace, power, heating, and cooling compared to an equivalent disk-based solution
  • Uniquely Virtualized. On a 100% virtualized environment with native Pure + VMware High Availability features
  • Smarter Spending. Higher ROI on hardware, so the same budget can be reallocated to further harnessing the power of Splunk at scale

The result is true competitive advantage as an organization achieves improved simplicity and performance while lowering the Total Cost of Ownership (TCO) of enterprise Splunk deployments. Splunk on Pure Storage FlashStack empowers organizations to manage large Splunk instances as they journey toward the analytics-driven, software-defined enterprise.

Two Analytics Platforms Synergize for Holistic Application Monitoring

Pairing Splunk and AppDynamics

We now live in the era of the “software-defined enterprise”. Software applications are no longer just enablers for back-office processes; they represent the key enablers for commercial businesses and public sector organizations. Today, software applications are the “face of the organization” to customers, partners, and internal co-workers.

The era when customers would tolerate application failures being fixed in hours, days, or weeks is long gone. Today’s constituencies expect applications to be “always on”, with problems identified and resolved in minutes (if not quicker).

The ability to leverage analytics to support critical applications within the software-defined enterprise will define the winners and losers in the market. The power of IT operations analytics holds promise as the enabler for dramatically reducing Mean Time to Repair metrics for critical applications, regardless of where a problem exists.

The paragraphs that follow provide insights into a proven approach for leveraging the power of analytics to identify and solve application problems quickly and to win in the market as a software-defined enterprise.

A One-Two Approach: Winning Against Problematic Application Stacks

Pinpointing problems with large, distributed, and often legacy application stacks is difficult. Troubleshooting and identifying the underlying cause of internal and external customer-facing problems can often take weeks or months. The result for organizations unable to solve application problems is negative: end-user satisfaction goes down, and precious customers can be lost forever. Organizations feel the pressure of hectic customer support war rooms, missed goals, and upset leadership and investors. Time is money; inefficiency and downtime for mission-critical systems mean lost revenue and angry customers.

But, there is hope. It’s a new day in analytics, and several solutions have entered the market recently that attempt to reduce Mean Time to Identify (MTTI) and Mean Time to Repair (MTTR) metrics for application troubleshooting with varying levels of success.

The bottom line: in order for organizations to get the full picture and achieve holistic application stack monitoring, they need to use Splunk and AppDynamics together for a cohesive view of their entire application stack. Splunk can natively see across the application stack to point to an issue. Then, AppDynamics can drill down and see into the proverbial “black box” (as illustrated in Figure 1): the database layer, the application layer, and the UI/web layer.


Figure 1: Splunk can see around the “black box”, and AppDynamics can see into it.

Where Splunk Ends, AppDynamics Begins, and Vice Versa

Splunk and AppDynamics can artistically be woven together to build a cohesive analytics solution for end-to-end application visibility. Here’s how.

Splunk Pros and Cons

Arguably, the most flexible tool to address application stack monitoring is a platform called Splunk. Entering the market in 2005, initially as a kind of “Google for monitoring,” Splunk software quickly evolved into a flexible and scalable platform for solving application problems. It also emerged as a platform with a robust and configurable user interface, touting sleek data visualization capabilities. Those qualities have allowed it to become a standardized platform among application stack monitoring teams. How is Splunk better than the rest? There are two main reasons.

First, Splunk’s ability to correlate disparate log sources allows it to identify and find issues in tiered applications. Applications are commonly written in very different languages and thus have few logging similarities in structure, content, or methodology. For most traditional monitoring tools, configuring data sources is labor-intensive, and the setup must be aggressively maintained if the application or its environment changes. Splunk, on the other hand, is elite at dealing with these differences “on the fly,” monitoring disparate log sources in real time as the data is consumed. Splunk’s advantage is that it can provide very flexible, reporting-driven schemas as the data is searched. This is important with legacy applications due to limited standardization, especially in the application layer, where most of the business logic and “glue” code resides.

Second, Splunk is easy to use for monitoring around an application, particularly in the networking, infrastructure, and Operating System (OS) layers. It has standard configurations that are fast to implement, so one can start deriving technical and business insights quickly. The areas where Splunk is the straightforward solution in IT Operations Analytics include networking, operating systems, storage, hypervisors, compute, load balancers, and firewalls.

Where does Splunk need assistance? Deep application performance monitoring in complex, highly distributed environments. Many mission-critical applications cannot be easily updated, and it is often too labor-intensive (or impossible) to derive insights into problems from the application logs alone. While legacy approaches to solving these monitoring problems are under siege, their existence is a reality as organizations transform. Splunk’s answer to this issue is in Splunkbase, the community for certified apps and add-ons. The Splunk App for Stream can monitor ingress and egress communication points between the layers of the application stack: database to application, and application to UI/web. Still, the Splunk App for Stream is deficient when compared to AppDynamics, because monitoring “around” a problem only describes the downstream impacts; it cannot pinpoint the actual problem quickly.


Figure 2: Pairing Splunk and AppDynamics achieves unparalleled visibility into the entire infrastructure (Splunk) while providing unified monitoring of business transactions to pinpoint issues (AppDynamics).

AppDynamics Pros and Cons

AppDynamics entered the marketplace in 2009 with a simple purpose: to be the best at addressing deficiencies in application stack monitoring, particularly for large, distributed, and often legacy application stacks. It monitors business transactions, which are the backbone of any application. In doing so, AppDynamics found a common auditing language that transcends the database, application, and UI/web layers, including full support for legacy applications, provided the application language is one that AppDynamics supports. You can access a list of languages and system requirements here.

A primary AppDynamics differentiator is its innate ability to understand what “normal” looks like in an environment. The platform automatically accounts for all of the discrete calls that run through an application. Then, it can find bottlenecks by identifying application segments, machines, application calls, and even lines of code that are problematic. Unlike other Application Performance Monitoring (APM) tools, AppDynamics can monitor the application from the end user’s point of view.

Regarding business value, what does AppDynamics bring that Splunk cannot? As the application is updated as part of a normal software development cadence, AppDynamics agents autodiscover the application again, saving time on professional services and money on re-customizing monitoring. Conversely, the Splunk App for Stream can require re-customization as application code and topology are updated.

AppDynamics does need some augmentation from its counterpart, Splunk, in looking outside of an application at the full stack. If the underlying problem is not with the code but with the functionality of the environment, such as storage, networking, compute, or the operating system, AppDynamics cannot do in-depth problem diagnosis on broader infrastructure components. Instead, the traditional approach is for APM teams to use several narrowly focused “point tools” to monitor each layer, which causes silos within teams. To skip the silos, cue Splunk. Its sweet spot is as a “Single Pane of Glass” where it can tie together its own visibility and the visibility provided by AppDynamics to identify where in the massive environment the problem lies.

So, where Splunk ends, AppDynamics begins, and vice versa.

Skip the Silos: Splunk and AppDynamics Synergize for a Holistic Approach

Splunk and AppDynamics both interact with the application infrastructure in a way that is straightforward to set up, easy to maintain, and delivers fast time-to-value. By visualizing the output of these two platforms in Splunk, teams achieve a “single pane of glass” monitoring approach that gives the business a real-time, holistic view into distributed, complex application stacks.

Figure 3: Visualizing the output of these two platforms together in Splunk, teams achieve a “single pane of glass” for applications and the infrastructure.

Pairing the analytics platform synergies of Splunk and AppDynamics to achieve holistic application stack monitoring will reduce MTTI and MTTR. The organization will see reliable, sustainable ROI as applications and the environment evolve with inevitable business transformation. Leveraging machine data in real time is the cutting edge in analytics and empowers organizations to creatively scrutinize all their data in an automated, continuous, and contextual way to maximize insights and opportunities.

About Kinney Group

Kinney Group is a cloud solutions integrator harnessing the power of IT in the cloud to improve lives. Automation is in Kinney Group’s DNA, enabling the company to integrate the most advanced security, analytics, and infrastructure technologies. We deliver an optimized solution powering IT-driven business processes in the cloud for federal agencies and Fortune 1000 companies.

Splunk License Estimations

Over the years I have searched for a tool that would allow me to size a customer’s Splunk license quickly and accurately. I have even attempted to complete this task manually and accurately several times in the past, but I have failed.

Estimating the Splunk data volume within an environment is not an easy task due to several factors: number of devices, logging level set on devices, data types collected per device, user levels on devices, load volumes on devices, volatility of all data sources, not knowing what the end logging level will be, not knowing which events can be discarded, and many more.

As you begin the process of planning and implementing the Splunk environment, understand that the license size can be increased and the Splunk environment can be expanded quickly and easily if Splunk best practices are followed.

Here is my tested and approved 7-step process for determining what size Splunk license you need:

  1. Identify and prioritize the data types within the environment.
  2. Install the free license version of Splunk.
  3. Take the highest priority data type and start ingesting its data into Splunk, making sure to start adding servers/devices slowly so the data volume does not exceed the license.  If data volumes are too high, pick a couple of servers/devices from the different types, areas, or locations to get a good representation of the servers/devices.
  4. Review the data to ensure that the correct data is coming in. If there is unnecessary data being ingested, that data can be dropped to further optimize the Splunk implementation.
  5. Make any needed adjustments to the Splunk configurations, and then watch the data volume over the next week to see the high, low, and average size of the data per server/device (the license usage search sketched after this list can help).
  6. Take these numbers and calculate them against the total number of servers/devices to find the total data volume for this data type.
  7. Repeat this process for the other data types until you have covered them all.
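For step 5, a sketch of a license usage search, assuming the internal license_usage.log is available in your environment (in that log, b is bytes indexed, st is the sourcetype, and h is the host):

index=_internal source=*license_usage.log* type=Usage | eval GB=b/1024/1024/1024 | stats sum(GB) AS total_GB by st, h | sort - total_GB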

 

If you would like to accelerate this process, you can work with Splunk or a Splunk partner to get a larger, temporary license to do your testing.

Good luck with your Splunk implementation, and continue Splunking.

Pure Storage: the Full Promise of Flash Storage

Buying storage is like watching car ads on television. Oh, a celebrity likes that model, that one has models telling us we call sports by the wrong names, and that one is so testosterone-filled that my beard grew just by watching the commercial.

That’s exactly what buying storage is about. Say you want (nay, need) a new truck. You look at the main suppliers: Ford, Chevy/GMC, and Dodge. Then you check out “the new kids on the block”: Toyota and Nissan. You might even look at other options like the ultra-high International, or something so ugly it’s cool, like an old Unimog. The vendors will bring out stats upon stats about what they can do. “We have best-in-class towing”, “We have the best fuel mileage”, and “We are so cool that after riding in our truck you can wrestle bears.”

So now we come to buying storage. “We are the most sold” – great, but I only want one. “We do 1.2 million IOPS” – okay, that does sound cool. “We are the fastest per BTU” – Gee, that’s something to brag about.

Then every so often a disruptor shows up. Maybe the new trucks run on hydrogen or all-electric. The first disruptors are sweet and expensive. Then affordable versions arrive — that’s when it gets interesting.

In storage, that disruptor is flash. All-flash arrays have shown up in strength and are as awesome to have as you might imagine — assuming you spend your time thinking about storage.

At first, the old crowd taunts the new trucks and tries to discredit them (all while secretly working on their own versions). That’s what happened in storage. Now that the big storage boys are all entering the all-flash array space, they have stopped calling it a gimmick.

gimmick, noun — Anything you do better than we do.

Now that the market has become mainstream and accepted, where do you go for a device? Right to the people who do it best at a price that isn’t premium — Pure Storage.

Why Pure Storage instead of Vendor X, Y, Z, and the traditional vendors that are making their appearances? Our investigation led us to Pure Storage as the best all-flash array: proven track record (who wants to use beta-worthy gear?), huge feature set, simplicity (the user guide is on a business card), awesome colors (people buy vehicles for less valid reasons), and performance in the real world.

Pure delivers this flash power at costs comparable to spinning disk. Flash (the super great new thing) at the cost of the traditional disk. I love my trucks, but if someone wants to upgrade my vehicle, who am I to argue?

Pure Storage handles the toughest workloads, such as database and VDI boot/patch storms, like traditional storage handles file requests. We have customers that put their Pure Storage boxes to the test on heavy workloads all day, every day, and their bottleneck is no longer storage. The bottleneck is now how fast they can throw data at their storage.


When Pure Storage built their array to handle flash from the ground up, they did more than just slap in flash disks. They looked at what is painful and obscure in storage and made it easy.

Remember when dealers would sap you for routine maintenance to keep all the parts of your vehicle working? With traditional storage, your admins need the same level of knowledge as those dealers’ mechanics. With Pure Storage, all the sensors, parts to maintain, and software have built-in know-how. Those trips to the dealer aren’t needed. Pure Storage does the hard work internally — picture your truck giving itself oil changes.

Pure Storage has brought the full promise of flash storage, and we can get the luxury model for the cost of traditional storage. And as others have said, “It is built Pure tough.”

Note: No trucks were harmed in the authoring of this blog post.

Splunk ONTAP – Not Just a Tongue Twister

The Splunk App for NetApp Data ONTAP isn’t just a great way to integrate NetApp systems into your enterprise monitoring and logging solutions; it’s also a great tongue twister. Go ahead, try: “Splunk App for NetApp Data ONTAP”. The Splunk App brings in data from all the NetApp FAS systems to give real-time insights into what is happening in the enterprise. It’s a great tool to watch NetApps and proactively identify problems.

Doesn’t NetApp already have a tool like this? Of course! NetApp makes great software. Yet, the big advantage of using the Splunk App is the ability to reach into other tiers of the infrastructure. The NetApp administrator isn’t about to let the JBOSS developer reach into his or her storage systems. The network admin sure isn’t letting anyone else log on to their servers.


This app combines NetApp ONTAP (7 mode or cluster mode) system information, performance, and configuration with the rest of the systems in an enterprise to give a complete view of your infrastructure.

Now, what does that mean in non-sales talk? Simple: it lets every admin see where problems are hiding. The VMware admin can see if “that storage stuff” is where their problems exist. It lets the network team see that the JBOSS server is blowing up without spending too much time chasing down issues. The app lets the storage admin find which of his or her 24 FAS systems is running slow. When Splunk brings in data from the hypervisors, servers, storage, networking systems, and environmental systems (hey, why not get told when you are on battery power or the air conditioning stops working?), it gives a real view of the entire stack.

Key capabilities of the Splunk App for NetApp Data ONTAP:

  1. Overview of all 7-mode Filers
  2. Overview of all Cluster-mode Filers
  3. Details on NetApp FAS entities: Aggregates, Disks, Volumes, qTree, LUN.
  4. 36 built-in reports covering everything from failed disks to aggregates that are too close to capacity.

 

Inside the Splunk App for NetApp Data ONTAP (Version 2), we start with overview dashboards for 7-mode filers and for cluster-mode filer details. The overview page also has overview portions for each NetApp entity, such as the aggregates main page. Drill-ins are available for each NetApp entity, such as the volume and aggregate detail views.

 

Wildcards and Automation Naming Conventions

Automation and configuration management tools are wonderful creatures. They come in many varieties including BMC BladeLogic, Puppet, Salt, Chef, Ansible, Urban Code, etc. Implemented correctly, these tools can take days of manual effort down to minutes with a simple, wizard-like setup.

Splunk is delivered with the optional, and free, Deployment Server (see: Splunk Documentation). The Splunk deployment server is a limited-use configuration management system that distributes application configuration across Splunk distributed architectures. Among other uses, we implement deployment servers to deploy input/output configurations to forwarders, props and transforms to Splunk indexers, and applications to Splunk search heads.

Automation and configuration management tools create the most wondrous of problems. These tools are neither beneficial nor malevolent. Automation implements instructions regardless of the quality of those mandates. (Go ahead and ask how I know that…)

One day I was deploying a Splunk environment in our lab, and I did what any good computer guy does — I borrowed working configurations. (Don’t judge; how many of us thought to make the wheel round on our own?) I built a new Splunk index server and Splunk search head and named them <prefix>SplunkIndex and <prefix>SplunkSearch. I installed Splunk, hooked up the indexer, and then enabled the deployment server. I copied applications to the deployment-apps directory on the deployment server, and then reloaded the deploy-server.

My forwarders, indexer, and search head all received their application configurations and data started flowing into my new Splunk Instance. It was great — until it wasn’t.

After a few minutes, my Splunk indexes stopped reporting any new events. The Splunk indexer was still online. The services were running on the indexer and on the forwarders. New apps were still being deployed successfully to the forwarders. I checked outputs.conf on the forwarders, and even cycled those services, to no avail. On the indexer, “netstat -na | grep 8089” showed connections from the forwarders — for a while. Then the connections would go stale and the ephemeral ports froze. In splunkd.log I found references to frozen connections. The forwarders ceased transferring data to the indexer and declared the indexer frozen.

You win a brownie** if you know what was going on by this point in the story.

The key to this story is that the deployment server managed a base config application. In the name of automation, this base config deployed an outputs.conf to every server. However, the person I copied my configs from had the foresight to blacklist the Splunk index server so it wouldn’t try to send outputs to itself (which can result in a really ugly loop). The configurations were fine until someone (OK, me) changed the name of the Splunk index server by putting the distinguishing text in front of “splunkindex” instead of after it (in my defense, it looked better in vCenter). The blacklist controlling which servers get the outputs.conf listed splunkindex*. If I had used a suffix, the indexer’s name would have matched the blacklist, it wouldn’t have received the outputs.conf, and hence it wouldn’t have entered the computer version of an endless self-hug.
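For illustration, here’s a sketch of the serverclass.conf involved (the server class and app names are hypothetical):

# serverclass.conf on the deployment server
[serverClass:base_config]
whitelist.0 = *
# Excludes hosts whose names BEGIN with "splunkindex" --
# "splunkindex01" matches, but "<prefix>SplunkIndex" does not.
blacklist.0 = splunkindex*

[serverClass:base_config:app:base_outputs]
restartSplunkd = true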

But I decided to get cute with my naming convention and was rewarded with a very nice learning opportunity.

The takeaway: be like Santa and check your (white and black) lists twice before deploying applications to your environment.

** A figurative brownie in case you wondered.