Chaos Sumo is a cloud service that transforms S3 storage into an analytics platform for historical log and event data. The Chaos Sumo service uniquely automates the discovery, organization, and indexing of log and event data and provides both Elasticsearch and Kibana interfaces for analysis. With the Chaos Sumo service you can extend the functionality of your Elasticsearch cluster onto S3 for easy, inexpensive access to warm and long-term data. Chaos Sumo is a serverless service on AWS that scales with your data and automates the complexities of log and event analytics at an unmatched price point.
The Chaos Sumo service is designed to work alongside an existing Elastic Stack architecture. When storing log and event data in an Elasticsearch cluster for longer-term analysis becomes cost-prohibitive, simply move these logs and events into your S3 bucket and let the Chaos Sumo service manage and index the data while providing Elasticsearch and Kibana access. Conceptually, Chaos Sumo turns S3 into an Elasticsearch cluster for historical (a.k.a. warm) log and event analytics, allowing your actual Elasticsearch cluster to focus on real-time analysis.
Elasticsearch and the ELK stack are used for real-time search and fault detection of log and event data, typically covering the most recent 5 to 10 days. Chaos Sumo is designed to provide long-term log and event management and analysis over weeks, months, and years, all at a significantly reduced cost. Storing gigabytes or terabytes of log and event data in Elasticsearch quickly becomes expensive, forcing users to either archive the data to S3 or delete it completely. Unlike ELK, the Chaos Sumo service is the first of its kind to offer cost-effective long-term log and event management and analytics, enabling trend, predictive, and machine learning analysis.
Chaos Sumo is an elastic cloud service running on AWS that requires no provisioning or configuration in order to scale: a true SaaS offering. The Chaos Sumo service transforms S3 storage into a log and event analytics platform. The service was built from the ground up using leading technologies such as the Scala/Akka distributed framework, Docker Swarm cluster deployment, and our own Data Edge indexing technology. Each aspect of our solution is designed for performance and scale. For more information about how it works, visit https://chaossumo.io/technology
The primary focus of Chaos Sumo is to provide simple and cost-efficient historical log and event data analysis. However, the service goes well beyond being a data analytics platform with Elasticsearch and Kibana interfaces. Chaos Sumo also provides data management capabilities such as discovery, cataloging, organization, grouping, normalization, and indexing. With Chaos Sumo, users can be confident their S3 data lake will not become a data swamp. For more information about use cases, visit https://chaossumo.io and click on Use Cases.
The Chaos Sumo service has been optimized for historical data analytics. We have identified several use cases, all centered on the analysis of long-term, time-based logs and events:
- Live Long-Term Log & Event Storage
- Historical Log & Event Analytics
- Log & Event Data Management
- Searchable Data Retention for Compliance
- Machine Learning
For more information about use cases, visit https://chaossumo.io and click on Use Cases.
It’s easy to get started! First, request early access to the Chaos Sumo service at http://info.chaossumo.io/request-early-access. Then make sure you have:
- An existing AWS account with S3 bucket privileges
- AWS account access with read / write IAM privileges
Chaos Sumo will provide you with a customer ID for IAM configuration. See Prerequisites for more information about AWS S3 configuration.
Pricing for Chaos Sumo isn’t official yet, but it will be based on S3-like data plans with annual tiered options ranging from 5TB to 250TB. Entry-level plans will be priced at around $0.075/GB/year. Official pricing for the Chaos Sumo service will be available in late Q2.
With one click, Chaos Sumo discovers and catalogs just about any type of data found in S3, including CSV, XML, JSON, LOG, TXT, and more. Chaos Sumo's indexing can automatically model CSV, JSON, and log files, and the service understands many of the most common logging formats.
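To illustrate what "modeling" a log format into columns involves, here is a minimal Python sketch that parses one widely used format, the Apache/NCSA Common Log Format, into a typed row. It is not Chaos Sumo's parser; the function name `parse_clf` and the column names are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Apache/NCSA Common Log Format:
#   host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_clf(line: str) -> Optional[dict]:
    """Turn one Common Log Format line into a typed row, or None if it doesn't match."""
    m = CLF_PATTERN.match(line.strip())
    if m is None:
        return None
    row = m.groupdict()
    # Typed columns are what make range queries and aggregations possible later.
    row["time"] = datetime.strptime(row["time"], "%d/%b/%Y:%H:%M:%S %z")
    row["status"] = int(row["status"])
    row["bytes"] = None if row["bytes"] == "-" else int(row["bytes"])  # "-" means no body
    return row


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
row = parse_clf(line)
print(row["host"], row["status"], row["bytes"], row["time"].isoformat())
```

Each parsed line becomes one row of a table whose columns (`host`, `time`, `status`, `bytes`, ...) can then be indexed and queried.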
There are no imposed limits on the amount of data you can store or use with Chaos Sumo; Amazon S3 is the only backing store used within the service.
Chaos Sumo specifically chose Amazon S3 as its first go-to-market storage layer. The reasons are many, including cost, scale, and simplicity. A major benefit, however, is that S3 has become the de facto standard for storing log and event data, either as an archive or as a temporary store before the data is loaded into an Elasticsearch cluster, and cloud services often already store their data in S3. As a result, data may not need to be exported from an Elasticsearch cluster at all, since it already resides in S3. If the data exists only within Elasticsearch, there are several easy techniques and tools for exporting it to S3 in either JSON or CSV format. In a future release, the Chaos Sumo service will discover and index archived Elasticsearch indices backed up to S3.
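As one example of the export path, the sketch below converts the hits of an Elasticsearch `_search` response into newline-delimited JSON or CSV text, ready to be uploaded to S3 (for instance with `aws s3 cp`). It is a minimal sketch: the function names and sample documents are hypothetical, and paging through large indices (e.g. with the scroll API) is left out.

```python
import csv
import io
import json


def hits_to_ndjson(response: dict) -> str:
    """One JSON document per line, taken from each hit's _source."""
    return "".join(json.dumps(hit["_source"]) + "\n" for hit in response["hits"]["hits"])


def hits_to_csv(response: dict, fields: list) -> str:
    """CSV with a header row; fields missing from a document are written as empty cells."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for hit in response["hits"]["hits"]:
        writer.writerow(hit["_source"])
    return out.getvalue()


# A trimmed-down _search response with hypothetical documents.
response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"ts": "2018-01-01T00:00:00Z", "level": "ERROR", "msg": "disk full"}},
            {"_id": "2", "_source": {"ts": "2018-01-01T00:00:05Z", "level": "INFO"}},
        ]
    }
}
print(hits_to_ndjson(response))
print(hits_to_csv(response, ["ts", "level", "msg"]))
```

Newline-delimited JSON keeps one event per line, which suits log and event data; CSV requires choosing a fixed set of columns up front.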
Aside from the Chaos Sumo API itself, Chaos Sumo allows you to access your data through two main interfaces: Amazon S3 REST API and Elasticsearch APIs. For raw data stored in Amazon S3, Chaos Sumo can act as a passthrough to S3 for most regular Bucket / Object operations. For logical views of your data created using the Chaos Sumo API, the service allows read-only access to your data via the following interfaces:
Amazon S3 REST API
- GET Service (ListAllMyBuckets)
- GET Bucket (List Objects) Version 2
- GET Object
Elasticsearch API
- Multi Search
- Field Capabilities
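The Multi Search API takes a newline-delimited JSON body in which each search is a header line (naming the index) followed by a query line. The minimal Python sketch below builds such a body; `build_msearch_body` and the index names are hypothetical, and the body would be sent as `POST /_msearch` with `Content-Type: application/x-ndjson`. Which options Chaos Sumo honors inside each search is not specified here.

```python
import json


def build_msearch_body(searches):
    """Build an Elasticsearch _msearch body from (index, query_body) pairs.

    Each search contributes a header line and a body line; the whole payload
    must end with a trailing newline.
    """
    lines = []
    for index, body in searches:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"


body = build_msearch_body([
    ("weblogs", {"query": {"term": {"status": 500}}, "size": 10}),
    ("applogs", {"query": {"match_all": {}}, "size": 0}),
])
print(body)
```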
In addition to supporting the standard S3 interface, the Chaos Sumo service extends this API with relational operations expressed in an S3-style request syntax. See the question “What kind of relational queries can I do?” for more information.
The Chaos Sumo service can be accessed via our secure endpoint https://service.chaossumo.io using one of our supported REST APIs (Chaos Sumo, Amazon S3, and Elasticsearch). All incoming requests must be signed with your Chaos Sumo API access key (key ID and secret) using the Amazon Signature Version 4 signing process.
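For readers curious about what a V4-signing client does under the hood, here is a minimal Python sketch of the standard AWS Signature Version 4 steps: canonical request, string to sign, derived signing key, and HMAC-SHA256 signature. It is not a Chaos Sumo SDK. The region (`us-east-1`) and service name (`s3`) used in the credential scope are assumptions, and the sketch expects the path and query string to be URI-encoded and sorted already. In practice, a client with built-in V4 signing is the safer choice.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret: str, date: str, region: str, service: str) -> bytes:
    """SigV4 key derivation chain: secret -> date -> region -> service -> 'aws4_request'."""
    k_date = _hmac_sha256(("AWS4" + secret).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_request(method, host, path, canonical_query, payload, access_key, secret,
                 amz_date, region="us-east-1", service="s3"):
    """Return the headers to send, including Authorization.

    Assumes `path` is already URI-encoded and `canonical_query` is already sorted
    and encoded ("" if none). The region/service defaults are hypothetical.
    """
    date = amz_date[:8]  # YYYYMMDD part of e.g. 20180101T000000Z
    payload_hash = hashlib.sha256(payload).hexdigest()
    headers = {"host": host, "x-amz-content-sha256": payload_hash, "x-amz-date": amz_date}

    # 1. Canonical request: each canonical header line ends with "\n".
    names = sorted(headers)
    canonical_headers = "".join(f"{n}:{headers[n]}\n" for n in names)
    signed_headers = ";".join(names)
    canonical_request = "\n".join(
        [method, path, canonical_query, canonical_headers, signed_headers, payload_hash])

    # 2. String to sign, bound to a date/region/service credential scope.
    scope = f"{date}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])

    # 3. Signature computed with the derived (scoped) signing key.
    key = derive_signing_key(secret, date, region, service)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}")
    return headers


headers = sign_request("GET", "service.chaossumo.io", "/V1/", "", b"",
                       "EXAMPLEKEYID", "examplesecret", "20180101T000000Z")
print(headers["authorization"])
```

The resulting `host`, `x-amz-date`, `x-amz-content-sha256`, and `authorization` headers travel with the HTTPS request; any header added after signing is not covered by the signature.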
It is possible to use any standard HTTP(S) client to access Chaos Sumo. However, we generally recommend using a client that supports AWS Signature Version 4 request signing, so that request signatures are generated automatically. Below is a sample configuration profile for the AWS CLI to connect to the Chaos Sumo service (replace the X's and Y's with your actual Chaos Sumo access key ID and secret, respectively):
[chaos_sumo]
aws_access_key_id=XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key=YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
s3 =
    addressing_style = path
    signature_version = s3v4
And here is a sample GetObject request using the AWS CLI configured to use the profile above:
aws --profile chaos_sumo --endpoint-url https://service.chaossumo.io/V1/ s3api get-object --bucket [bucket] --key [key] [outfile]
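With `addressing_style = path`, the bucket name goes into the URL path rather than the host name. The minimal sketch below shows the request URL this produces; it assumes the `/V1/` prefix of the endpoint is kept in front of the bucket, and `path_style_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import quote


def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    """Path-style S3 addressing: <endpoint>/<bucket>/<key>, with the key percent-encoded."""
    base = endpoint.rstrip("/")
    # Keep "/" in keys so "folder" separators stay literal; encode spaces and the like.
    return f"{base}/{quote(bucket, safe='')}/{quote(key, safe='/~')}"


print(path_style_url("https://service.chaossumo.io/V1/", "my-logs", "2018/01/app log.json"))
```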
Yes. Chaos Sumo is focused on extending the functionality of the Elastic Stack (ELK) onto S3. The Chaos Sumo platform is independent of the Elastic Stack and will work with any Elasticsearch-based ELK-as-a-service provider, such as Logz.io.
As part of its Elasticsearch support, the Chaos Sumo service initially supports the following text search functionality, available both through the Elasticsearch REST API and through Kibana / Lucene query syntax. The list will continue to grow as the service is built out:
- Exact (i.e. term) match
- Wildcard (i.e. phrase) match
- +/- operators (i.e. must, must_not) match
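The three kinds of match above correspond to standard Elasticsearch query DSL clauses (`term`, `wildcard`, and `bool` with `must` / `must_not`). The Python sketch below builds such a request body, plus a `query_string` body carrying roughly the same search in Lucene syntax. Field names are hypothetical, and whether Chaos Sumo accepts every option of these clauses is an assumption.

```python
import json


def text_query(must=None, must_not=None, wildcards=None):
    """Elasticsearch bool query combining exact (term), wildcard, and +/- clauses.

    must / must_not: {field: exact_value}; wildcards: {field: pattern}, added as must clauses.
    """
    must_clauses = [{"term": {f: v}} for f, v in (must or {}).items()]
    must_clauses += [{"wildcard": {f: {"value": p}}} for f, p in (wildcards or {}).items()]
    must_not_clauses = [{"term": {f: v}} for f, v in (must_not or {}).items()]
    return {"query": {"bool": {"must": must_clauses, "must_not": must_not_clauses}}}


def lucene_query(q: str):
    """Roughly the same intent written in Lucene / Kibana search-bar syntax."""
    return {"query": {"query_string": {"query": q}}}


dsl = text_query(must={"status": 500}, must_not={"host": "web-01"}, wildcards={"message": "time*"})
lucene = lucene_query('+status:500 -host:"web-01" +message:time*')
print(json.dumps(dsl, indent=2))
```

The two forms are close but not identical: `query_string` runs the text through the field's analyzer, while `term` matches the stored value exactly.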
Chaos Sumo indexes your data and exposes it in a tabular format similar to that of other relational systems. This tabular data can be queried via two interfaces: the Chaos Sumo extensions to S3 GetObject / ListObjectsV2 operations and the Elasticsearch API. In both cases, Chaos Sumo supports:
- Point and range queries for numeric, date, and string data types (=, <, >)
- Common aggregations (all datatypes): COUNT, MIN, MAX
- Numeric aggregations: SUM, AVG, STD
- Logical operators (AND, OR, NOT)
- Order By
- Group By
Any column exposed by Chaos Sumo indexing may be referenced by the query predicates and/or aggregations.
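On the Elasticsearch side, the operations above map naturally onto a `bool` filter with `range` clauses (point/range predicates, ANDed), a `terms` aggregation (group by, with ordering), and an `extended_stats` sub-aggregation (count, min, max, avg, sum, and standard deviation). The sketch below builds such a body in standard Elasticsearch DSL; the field names, the function name, and the assumption that Chaos Sumo accepts these particular aggregation types are all hypothetical.

```python
import json


def grouped_stats_query(time_field, start, end, group_field, metric_field,
                        min_metric=None, top=10):
    """Rough SQL analogue:
    SELECT group_field, COUNT, MIN(m), MAX(m), AVG(m), SUM(m), STD(m)
    WHERE start <= time_field < end [AND m > min_metric]
    GROUP BY group_field ORDER BY AVG(m) DESC LIMIT top
    """
    filters = [{"range": {time_field: {"gte": start, "lt": end}}}]
    if min_metric is not None:
        filters.append({"range": {metric_field: {"gt": min_metric}}})
    return {
        "size": 0,  # return aggregations only, no individual hits
        "query": {"bool": {"filter": filters}},  # filter clauses are ANDed together
        "aggs": {
            "by_group": {
                "terms": {"field": group_field, "size": top,
                          "order": {"metric_stats.avg": "desc"}},
                # extended_stats reports count, min, max, avg, sum, std_deviation of the field
                "aggs": {"metric_stats": {"extended_stats": {"field": metric_field}}},
            }
        },
    }


q = grouped_stats_query("@timestamp", "2017-01-01", "2018-01-01", "host", "response_ms", min_metric=0)
print(json.dumps(q, indent=2))
```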
The Chaos Sumo service has been designed for large scale, historical log and event analytics. Based on Chaos Sumo’s Data Edge technology, it has been shown that:
- Text-based queries are up to 10x faster to index and up to 2x faster to search when compared to Lucene.
- Analytic queries are up to 5x faster to index and up to 2x faster to query when compared to column stores.
The Chaos Sumo service has an elastic data fabric that scales up or down based on performance and cost metrics.
All data is 100% owned by the customer. Chaos Sumo is a data fabric and abstraction layer on top of S3. When configuring AWS for Chaos Sumo, simply create an AWS IAM role that gives the service read-only access to your data. As part of this role, specify the location where Chaos Sumo can write its analytic metadata. You always own your data and any related information about your data.
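Below is a minimal Python sketch of what such a role could look like: a trust policy that lets another AWS account assume the role, and a permissions policy that is read-only on the data bucket and read/write only under a metadata prefix. Every value is a placeholder; in particular, the Chaos Sumo account ID and the use of the customer ID as an `sts:ExternalId` are assumptions, so the Prerequisites remain the authority on the actual settings.

```python
import json


def chaos_sumo_role_policies(data_bucket, metadata_bucket, metadata_prefix,
                             trusted_account_id, external_id):
    """Return (trust_policy, permissions_policy) as dicts. All inputs are hypothetical."""
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read-only: list the source bucket.
                "Sid": "ListSourceData",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{data_bucket}",
            },
            {   # Read-only: fetch the source log/event objects.
                "Sid": "ReadSourceObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{data_bucket}/*",
            },
            {   # Writes are confined to the prefix reserved for analytic metadata.
                "Sid": "WriteAnalyticMetadata",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{metadata_bucket}/{metadata_prefix}*",
            },
        ],
    }
    return trust_policy, permissions_policy


trust, perms = chaos_sumo_role_policies("my-logs", "my-logs", "chaossumo-metadata/",
                                        "111111111111", "example-customer-id")
print(json.dumps(perms, indent=2))
```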
Chaos Sumo is built on top of AWS, making your data highly available, scalable, durable, and secure. Amazon S3 provides an infrastructure to store important data and is designed for 99.999999999% durability of objects. Your data is redundantly stored across multiple facilities and on multiple devices in each facility. Chaos Sumo uses S3 as its backing store, providing you with the same security and reliability that AWS provides its customers. And because Chaos Sumo never moves your data, its security posture does not change.
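To put eleven nines in perspective: AWS describes this durability as an annual, per-object figure, so the expected number of objects lost per year is roughly the object count times 10^-11. The short calculation below (function name hypothetical) reproduces AWS's own illustration that storing 10,000,000 objects means losing, on average, a single object about once every 10,000 years.

```python
def expected_years_per_object_loss(num_objects: int, nines: int = 11) -> float:
    """Average years between single-object losses, treating 'nines' durability
    as an annual per-object loss probability of 10**-nines (99.999999999% -> 1e-11)."""
    annual_loss_probability = 10.0 ** -nines
    expected_losses_per_year = num_objects * annual_loss_probability
    return 1.0 / expected_losses_per_year


print(expected_years_per_object_loss(10_000_000))  # roughly 10,000 years
```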
Chaos Sumo is a cloud analytics service running on AWS, built for multi-tenancy and with a host of other critical differentiators that make it superior to single-tenant cloud-hosted services or on-premises deployments running old-school enterprise software.
The Chaos Sumo service is deployed as a collection of Docker containers to a pool of shared compute resources. These containers are customer-specific and are isolated from one another using encrypted Docker Swarm overlay networks. All external API access to the service is done via authenticated HTTPS. Requests are signed client-side and routed by the service to the correct containers via a unique customer identifier (and are rejected if the signature cannot be verified against the customer's secret key).
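The signature check described above can be sketched as follows: look up the secret by customer identifier, recompute the HMAC-SHA256 signature, compare in constant time, and only then return that customer's container. This is a simplified, hypothetical illustration (the `Tenant` type, `route_request`, and the container address are invented), and it skips the scoped signing-key derivation that full Signature Version 4 performs.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class Tenant:
    secret_key: str  # hypothetical: the customer's secret key
    container: str   # hypothetical: address of that customer's isolated container


class AuthError(Exception):
    pass


def route_request(tenants: dict, customer_id: str, string_to_sign: str, signature: str) -> str:
    """Return the container for customer_id if the signature verifies, else raise AuthError."""
    tenant = tenants.get(customer_id)
    if tenant is None:
        raise AuthError("unknown customer")
    # Recompute the signature server-side with the stored secret (simplified, not full SigV4).
    expected = hmac.new(tenant.secret_key.encode("utf-8"),
                        string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison so response timing doesn't reveal partial matches.
    if not hmac.compare_digest(expected, signature):
        raise AuthError("signature mismatch")
    return tenant.container


tenants = {"cust-a": Tenant("secret-a", "overlay-a/api")}
sts = "example string to sign"
good = hmac.new(b"secret-a", sts.encode("utf-8"), hashlib.sha256).hexdigest()
print(route_request(tenants, "cust-a", sts, good))
```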
The Chaos Sumo service is an elastic, serverless solution that supports all AWS regions. Wherever your S3 buckets have been provisioned, the Chaos Sumo service allocates EC2 compute resources to provide our unique discover, refine, and query functionality. No configuration or provisioning is required. The Chaos Sumo service ensures that all S3 data access stays within the same AWS region, so there is no additional network/data access cost.
The Chaos Sumo service is backed by a new and powerful indexing technology called “Data Edge”. Data Edge is an index file format that supports both relational queries and text search in a single representation. This format significantly compresses data compared to existing index technologies. Written in Scala over the Akka distributed framework, Data Edge is uniquely designed to exploit the cost efficiency of object storage such as S3 while still providing high performance and elastic scale. For example, 10TB of raw source data indexed by Chaos Sumo would typically result in a compressed data footprint of around 2TB. Combined with S3 pricing, this makes Chaos Sumo a cost-disruptive option for historical log and event analysis.
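The example figures work out as follows. The sketch below computes the compression ratio, the size reduction, and the monthly S3 storage cost of the indexed data; the $0.023/GB-month price is an illustrative assumption (S3 prices vary by region and storage class), 1TB is taken as 1,000GB, and the function name is hypothetical.

```python
def index_footprint(raw_gb: float, indexed_gb: float, price_per_gb_month: float):
    """Compression ratio, fractional size reduction, and monthly storage cost of the index."""
    ratio = raw_gb / indexed_gb
    reduction = 1 - indexed_gb / raw_gb
    monthly_cost = indexed_gb * price_per_gb_month
    return ratio, reduction, monthly_cost


# 10TB raw -> ~2TB indexed (figures from the text); the price is a hypothetical input.
ratio, reduction, cost = index_footprint(10_000, 2_000, 0.023)
print(f"{ratio:.0f}x smaller, {reduction:.0%} reduction, about ${cost:.2f}/month")
```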
Today Chaos Sumo is only available on AWS. However, Chaos Sumo is architected to be cloud agnostic and will soon be available on Google Cloud Platform and Azure. Chaos Sumo is also integrated with Minio.io. Minio, Inc. is the primary developer of the Minio cloud storage stack. Minio is an Amazon S3-compatible cloud storage server released under the Apache License v2.