Physics Speed
  • Physics Speed

    OPTIMAL PERFORMANCE · FOR BIG DATA ANALYTICS
  • SPEED LIMITED ONLY BY PHYSICS

    NOT JUST FAST. AS FAST AS THEORETICALLY POSSIBLE
  • FASTER INSIGHTS · FOR LESS

    IMPROVE POSTGRES 1000X · GREENPLUM 400X · VERTICA 200X · SPARK 100X
  • WORKS WITH YOU

    Unlock the hidden potential of your existing infrastructure. Plugs into your database.

It started with baseball

In the late 1990s, the new owners of the Oakland Athletics had to cut payroll. So GM Billy Beane used statistics to discover undervalued players.

It turned out that certain players were good at getting walks or stealing bases. While that wasn't flashy, it correlated with winning games. From 2000 to 2003, Oakland reached the playoffs, and had the most wins per payroll dollar of any team.



Which led to a tectonic shift in decision making

Just a few years ago, critical decisions were made by experts, whose wisdom was hard won over years of experience. But now, decisions informed by analyzing vast amounts of data are proving more effective. Increasingly, the depth, breadth, agility, and speed of data analytics are a strategic differentiator and a critical success factor.

So where's the party?

We have the data. We have the computing resources. Yet too few of us are breaking out the champagne. What's wrong?

Pain in the Analytics

Our analytic engines are letting us down. They make it too hard to load new streams of data. They impose a high cost for changing modeling decisions. They are inflexible, costly, imperialistic, sluggish, and frankly, they have too much attitude.

Not that they deserve it. Analytic optimizers were designed in the 1980s. They're not aware of the capabilities of your modern infrastructure. They don't leverage SIMD vector instructions. They don't treat SSD any differently than disk. They couldn't care less about CPU cache latencies or capacities.

So to conform to the limitations our engines impose on response time and resource provisioning, we're forced into hacks and shortcuts. Often we have to subset our data, running the risk of losing valuable cross-subset correlations. What a mess!

Breakthrough in analytic performance

Physics Speed is an analysis engine that unlocks the power of your existing hardware to achieve performance that is as fast as the laws of physics permit. It takes a new approach to query processing that focuses on streaming data flows.

Where other analytic engines start with a formalism like relational algebra and optimize, Physics Speed starts with the actual hardware characteristics and systematically eliminates processing bottlenecks. So analysis runs as fast as the hardware allows.

How fast is that? Across several industry standard benchmarks, Physics Speed was consistently more than ten times faster, on hardware that was much less expensive. The table at the top shows some example results on 100GB of data. That's tiny of course, but we got tired of waiting for the unaccelerated engines to answer questions on as little as 1TB of data.

The summary table on the bottom compares price and performance across simple queries ("scan and aggregate") and complex queries. The important point is that you can dial the knob between high performance and lower cost to fit your fancy. Run on equivalent hardware and go 50 times faster. Or dial your costs down while maintaining the same performance. Or strike a balance in between.

Example Queries      Physics Speed   Greenplum   Redshift   Spark 2.0
TPC-DS 28 (sf100)    3.2 sec         141         56.5
TPC-DS 34 (sf100)    1.3 sec         31.7        12.2       30
TPC-DS 53 (sf100)    1.0 sec         29.4        26.1       21
TPC-DS 63 (sf100)    1.0 sec         28.6        17.2       15
TPC-DS 90 (sf100)    0.3 sec         14.2        10.8
(all times in seconds)


Physics Speed vs.   Greenplum     Redshift     Vertica      Netezza        Spark 2.0
hardware cost       9x cheaper    2x cheaper   3x cheaper   100x cheaper   8x cheaper
speed summary
  Scan & Agg        40x faster    30x faster   80x faster   3x faster
  Complex Joins     16x faster    10x faster                               20x faster
total advantage     250x better   40x better   240x better  300x better    160x better

About Us

Physics Speed was created to deliver the "AHA!" feeling that comes with sudden insight, to give flight to your curiosity, to rise above the dust and clouds, so you can see all that can be seen.


We want to sweep away the drudgery and clear the runway, because nothing should come between your ideas and your data.


So we built an ANALYTIC ENGINE that executes your experiments as fast as physics, and as easy as pie.


Team

Foster Hinshaw

Foster founded Netezza, introducing data warehouse appliances to the market. He has a B.S. and a Master's in Engineering from Cornell University, an MBA from Harvard Business School, and a long track record of success.


Craig Harris

Craig co-founded Ontologic, the first object database company to use C++ as its DML. Then he created PulseTrak, which delivered a sentiment extraction service. Craig teamed with Foster at Netezza and was previously an architect at Oracle. He has a B.A. from Dartmouth College.


Richard Cownie

Richard has long experience in algorithm and data structure optimization, and deep hardware knowledge. He worked with Craig at Oracle. Richard has a B.A. in Math from Cambridge University and an M.Sc. in Computer Science from Edinburgh University.

Physics Speed Benefits

Deeper insights

What could you learn mashing up ten times the amount of data, with five times more analytic functions?

In less time

Projects finish faster. Not just because the queries go faster. Spend less time loading and tuning and rewriting.

At lower cost

Unleash the power of your existing hardware, with multi-threading, AVX instructions, storage tiering and more.

With less complexity

Native performance means less time futzing with indexes, tablespaces, and other arcana.

and your own magic

Plug your own proprietary algorithms right into the data flow. We bring the data to you.

that fits right in

Connect with your existing tools. Access your existing data.

Use Cases

Amp up the Awesome!!!

You've seen it all, and you still need more. More speed, more power, more "right now" and less "get back to you tomorrow". Ninety-nine percent of the world's analytic needs are served by systems that are good enough. Lucky you. Welcome to the 1% of people who need something 10 times faster. Not that you wouldn't take more.

A large semiconductor fabricator reaps millions of dollars in profit for every 0.1% improvement in yield per month. So figuring out what's going wrong when spinning up a new design is part of their business DNA. But with hundreds of steps and machines and environmental factors, the retrospective cohort analysis operates over huge amounts of data and requires days of processing time for each experiment. By accelerating their pattern analysis, Physics Speed shortens their time to market.

A major telecommunications company carries most of the telephone traffic across Europe and North Africa. But they don't do it alone. Each call flows through a series of carriers across a number of switches with a number of segments. Physics Speed helps them stitch together the segments of a phone call to assure that each carrier gets revenue for the portion of the call that they carry.

Keepin' it real

Your organization wants the flexibility and economics that come with operating in the cloud. You just want to avoid the fee fie foe fum. Existing applications should run the way they do now. You would even sacrifice some speed in exchange for low cost, reliability, and easy dev ops.

A young customer marketing business had a whale of a client that they served on some expensive hardware. As they grew, they needed a lower cost way to support smaller clients without bifurcating their core product.

A mobile marketing business was already in the cloud, designed that way from day one. But as their revenues grew, their operating costs grew even faster. They had chosen a popular MPP column store for their analytics but keeping it all balanced and online was becoming prohibitively expensive. They could shut it down, but starting it up was taking longer. They needed a solution that scaled, but wanted sub-linear costs.

You like your existing ride just fine. But the data has grown faster than anyone guessed, and you're flat outta gas. You need to find a home for your older data, your newer unvetted experimental data, and the data of your smaller customers. The solution has to fit within your budget, and everything has to keep working more or less the way it does now, preferably without providing lifetime employment to a team in India.

A cable operator happily ran all their analytics on a Netezza TwinFin™ appliance. Until they ran out of space. Their budget didn't have room for another rack, and they couldn't throw away their data. They tried a Hadoop-based solution, but six months later, they were still trying to get their applications to run on the new platform. And the performance? It felt like their transmission was shot: a jerk forward, then a stall.

An investment bank had a team of quants happily using a pricey MPP appliance. Then their CIO ramped it up a notch, but they were two years away from another MPP buy. They had plenty of servers available. But no plug-compatible way to scale out, while incorporating their special investment juju.

Product Offerings

Performance Engine™

An analytic query engine accelerator that plugs into an existing DBMS, to unleash the performance potential of your hardware, so your experiments run as fast as the laws of physics permit. Of course your existing applications don't change at all, except for running faster.



Plugs into your DBMS

The core component of Physics Speed is a shared library that plugs into an existing DBMS, including open source databases like Postgres (single-node) or Greenplum (multi-node). Physics Speed accelerates analytic queries by generating C++ code to execute portions of the query. This code is compiled and dynamically linked into the host DBMS as a "user-defined function".

Physics Speed understands how to accelerate basic operations like scans, joins, sorts, aggregations and set operations. If more complex constructs are found, like text processing or windowed aggregations, Physics Speed falls back to the host DBMS for processing. In most cases, Physics Speed handles the whole query.

The diagram to the right shows where Physics Speed's shared library plugs into the host DBMS query stack. The generated code can be compiled with either clang or g++. The resulting shared libraries are cached so that similar queries can reuse previously compiled code.
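
To make the generate-compile-link-cache cycle concrete, here is a minimal C++ sketch of how such a pipeline can work in general. The entry-point name run_query_fragment, the cache layout, and the compiler flags are our own illustrative assumptions, not Physics Speed internals.

    // Hypothetical sketch: compile generated C++ for a query fragment and
    // dynamically link it into the host process, caching by source hash.
    #include <dlfcn.h>      // dlopen / dlsym (POSIX)
    #include <cstdlib>      // std::system
    #include <fstream>
    #include <functional>   // std::hash
    #include <map>
    #include <string>

    using QueryKernel = void (*)(const void* input, void* output);

    static std::map<size_t, void*> kernel_cache;   // source hash -> dlopen handle

    QueryKernel compile_or_reuse(const std::string& generated_cpp) {
        size_t key = std::hash<std::string>{}(generated_cpp);
        if (!kernel_cache.count(key)) {
            std::string src = "/tmp/kernel_" + std::to_string(key) + ".cpp";
            std::string lib = "/tmp/kernel_" + std::to_string(key) + ".so";
            std::ofstream(src) << generated_cpp;                // write generated source
            std::string cmd = "g++ -O3 -march=native -shared -fPIC " + src + " -o " + lib;
            if (std::system(cmd.c_str()) != 0) return nullptr;  // fall back to host DBMS
            kernel_cache[key] = dlopen(lib.c_str(), RTLD_NOW);  // link into the process
        }
        void* handle = kernel_cache[key];
        return handle ? reinterpret_cast<QueryKernel>(dlsym(handle, "run_query_fragment"))
                      : nullptr;
    }

Similar queries hash to the same generated source, so the second time around the cached shared library is reused and the compilation cost disappears.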

Sometimes the compilation process can take a second or two, so Physics Speed is not a good choice for applications that need sub-second response times for "find the needle in the haystack" queries, or for very small databases.

Via the Federation Layer

Most database vendors are able to access other vendors' databases through a "federation" layer. For example, this layer allows an IBM DB2 system to query a Hadoop/Spark system through the DB2 SQL API. In a federated system, the remote DBMS is "wrapped" in a way that tells the host DBMS which pieces of data it contains. The host DBMS then divides up its plan for answering a question, pushing a portion of the plan to the remote system.

Physics Speed plugs into a DBMS as a virtual remote database. It claims to contain data that directly answers a portion (or the whole) of the original query. It doesn't actually contain that data. Instead, it quickly generates a C++ program that produces the query's results and returns them as if the data had been stored on disk all along.

Where does it get the actual data? In some cases, from the host DBMS itself. For systems with slower, row-oriented storage, Physics Speed also offers a highly compressed column store. Because Physics Speed operates directly against its compressed data, this typically results in an additional 10x performance boost.
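
As a rough illustration of why operating directly on compressed data helps (this is a generic run-length-encoding example, not Physics Speed's actual storage format), an aggregate over an RLE column only has to touch one entry per run instead of one entry per row:

    // Generic sketch: summing a run-length-encoded column without decompressing it.
    // Each run stores (value, repeat_count); the sum adds value * count per run,
    // so the work is proportional to the number of runs, not the number of rows.
    #include <cstdint>
    #include <vector>

    struct Run { int64_t value; uint32_t count; };

    int64_t sum_rle_column(const std::vector<Run>& column) {
        int64_t total = 0;
        for (const Run& r : column)
            total += r.value * static_cast<int64_t>(r.count);
        return total;
    }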

Supports Distributed Computation

The Performance Engine parallelizes your analytics over the cores in a processor, and across servers. As your data and processing needs expand and contract, you can add or remove servers to maintain acceptable cost and performance. The engine takes care of moving code and data across the network to balance resource usage. So you get the linear scalability you expect from an MPP appliance with the flexible elasticity you have with a map-reduce architecture.
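
The split-and-merge shape behind this kind of scaling can be sketched at the thread level as follows; this is a generic illustration under our own assumptions, not the engine's code. The same pattern extends across servers by shipping the partial results over the network instead of returning them from threads.

    // Simplified sketch of intra-query parallelism: each worker aggregates its
    // slice of a column, then the partial sums are merged into a final result.
    #include <algorithm>
    #include <cstdint>
    #include <numeric>
    #include <thread>
    #include <vector>

    int64_t parallel_sum(const std::vector<int64_t>& column, unsigned workers) {
        if (workers == 0) workers = 1;
        std::vector<int64_t> partial(workers, 0);
        std::vector<std::thread> pool;
        size_t chunk = (column.size() + workers - 1) / workers;
        for (unsigned w = 0; w < workers; ++w) {
            pool.emplace_back([&, w] {
                size_t begin = w * chunk;
                size_t end = std::min(begin + chunk, column.size());
                for (size_t i = begin; i < end; ++i) partial[w] += column[i];
            });
        }
        for (auto& t : pool) t.join();
        return std::accumulate(partial.begin(), partial.end(), int64_t{0});  // merge step
    }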

If you already have a system, such as Greenplum, which provides parallel query processing, then the Physics Speed Performance Engine plugs right in and uses the existing system's data shuffling mechanisms.

With the Performance Engine, you don't have to guess your peak load years in advance, paying for more resources than you need most of the time, or risk falling short just when the going gets tough. Instead, you can pay for what you're using right now. And then ramp up or down as workloads change.

PathScope™

Some types of analytics don't fit easily into an existing DBMS. They might incorporate an element of chance, or have inputs from a complex event stream. An example might be a Monte Carlo simulation of a stochastic process involving geometric Brownian Motion, where a state at a given time involves both deterministic and probabilistic factors. These kinds of stochastic processes arise when assessing risk, and have applications in project management, banking, option value assessment, insurance and warranty claims.
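
For the concrete mechanics, a single geometric Brownian Motion timestep looks roughly like the following generic C++ sketch (our own illustration, not PathScope code): the next state is the current state scaled by a deterministic drift term and a random shock.

    // Generic sketch of one GBM timestep, the kind of per-path state update a
    // Monte Carlo simulation performs over and over.
    #include <cmath>
    #include <random>

    double gbm_step(double s, double mu, double sigma, double dt, std::mt19937_64& rng) {
        std::normal_distribution<double> z(0.0, 1.0);
        return s * std::exp((mu - 0.5 * sigma * sigma) * dt     // deterministic drift
                            + sigma * std::sqrt(dt) * z(rng));  // probabilistic shock
    }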

PathScope uses the underlying capabilities of the Physics Speed Performance Engine to operate on streaming data flows, with:

  • user-defined aggregators in C++
      • a method for per-row update
      • a method for combining partial results from tasks/nodes
      • serialization of partial results for distributed computation
  • user-defined data sources in C++
      • seek to a substream
      • a method to generate the next row from the data source

It combines these capabilities with stochastic facilities, including efficient, parallelized generation of random numbers drawn from a chosen probability distribution.
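
As a hypothetical sketch of what such user-defined aggregators and data sources could look like in C++ (the class and method names below are ours, not the published PathScope API):

    // Illustrative streaming aggregator / data-source pair in the spirit of the
    // capabilities listed above; names and signatures are assumptions.
    #include <cstdint>
    #include <limits>
    #include <string>

    struct Row { double value; };

    // User-defined aggregator: per-row update, merging of partial results from
    // other tasks/nodes, and serialization for distributed execution.
    class MaxAggregator {
    public:
        void update(const Row& row) { if (row.value > max_) max_ = row.value; }
        void combine(const MaxAggregator& other) { if (other.max_ > max_) max_ = other.max_; }
        std::string serialize() const { return std::to_string(max_); }
        double result() const { return max_; }
    private:
        double max_ = std::numeric_limits<double>::lowest();
    };

    // User-defined data source: seek to a substream, then generate rows one by one.
    class PathSource {
    public:
        virtual bool seek(uint64_t substream_id) = 0;  // position within the stream
        virtual bool next(Row& out) = 0;               // produce the next row, false at end
        virtual ~PathSource() = default;
    };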

PathScope can generate Monte Carlo paths in parallel and evaluate multiple grouped aggregators in parallel at 2100 million timesteps per second on a 16-core processor. That's about 10 trillion time/risk points in 1.3 hours on a single processor.

HOW IT'S DONE

Most database optimizers are written without regard for modern hardware. With few exceptions, they ignore SIMD vector processing, and the 200X difference between L1 cache and main memory. Only a few DBs use multiple cores for a single query. These techniques, along with compressed columnar storage and code generation, are "table stakes" for analytic processing.

Physics Speed goes beyond these basics with new techniques for primitive query operations, and dynamically changing execution plans as processing occurs. But the main thing is a different way of thinking. Instead of asking how to do things faster, we start with the theoretically optimal, speed-of-light approach.

  • Vectorized SIMD: 85%
  • Compressed Columnar: 97%
  • Code Generation and Compilation: 70%
  • Stream through L1 Cache: 82%
  • Multicore Intra-query Parallelism: 94%
  • SSD: 75%
  • Advanced Techniques: 95%
  • 2000X faster than Postgres alone (for large scan and aggregation)
  • 500X better than native Greenplum (price × performance for TPC-H-like queries)
  • 100X less costly than Netezza (based on TwinFin list pricing)
  • 0 changes for you (plugs right in)

Vector Processing

Modern CPUs support SIMD processing, which executes a single instruction simultaneously across all elements in a vector register. The latest Intel CPUs allow as many as 8 (AVX2/Haswell) or 16 (AVX-512/Skylake) 32-bit elements in a vector. This means arithmetic can be performed on many numeric values simultaneously providing as much as a 16x performance boost ... BUT ONLY when processing vectors of attribute values.

In terms of analytic database processing, this means there's a huge penalty to operating on whole records, and a huge advantage to operating on column attributes across multiple records.

Most analytic systems support columnar storage these days. But too often, they recombine columns into records too early, as data flows from storage into RAM. This fills a cache line with information that might be relevant in the very next instruction, but is irrelevant for the current instruction, and so loses vector parallelism.

Even systems that are careful about staying columnar can lose the potential for vector processing. Most analytic engines interpret a tree of analysis operators. The top level operator asks its children for one row of input, whereupon the children ask their children for one row of input, and on down the line, until some kind of scan operator reads one record from a file system buffer. Then each operator transforms its input into one row of output, resulting in the next query result. This is called "demand pull."

The advantage of demand pull systems is that they can avoid needless work. But as you can imagine, with all the context switching between operators, demand-pull systems lose the opportunity for processing multiple vector elements at the same time.
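
A generic pair of loops makes the contrast concrete (our illustration, not the engine's internals): the columnar loop is something a compiler can auto-vectorize into wide SIMD adds, while the record-at-a-time version wastes cache lines and vector lanes.

    // Generic illustration: column-at-a-time processing that a compiler can map
    // onto SIMD registers, versus row-at-a-time access that it cannot.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct WideRow { int32_t price; char payload[60]; };   // the rest of the record

    // Columnar: contiguous int32 values; g++/clang at -O3 with AVX2 enabled will
    // typically vectorize this into 8-wide (AVX2) or 16-wide (AVX-512) operations.
    int64_t sum_prices_columnar(const std::vector<int32_t>& price) {
        int64_t total = 0;
        for (size_t i = 0; i < price.size(); ++i) total += price[i];
        return total;
    }

    // Row-oriented: the same values are strided across whole records, so each
    // cache line carries mostly irrelevant bytes and vectorization is lost.
    int64_t sum_prices_rowwise(const std::vector<WideRow>& rows) {
        int64_t total = 0;
        for (size_t i = 0; i < rows.size(); ++i) total += rows[i].price;
        return total;
    }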

Storage Tiering

Up until 2000 or so, analytic query processing was relatively simple. Computers consisted of CPUs, RAM and spinning disk. RAM was expensive and spinning disk was about 10,000 times slower than anything else, so optimizing analytic performance was almost entirely a matter of getting information off of disk efficiently.

But things are different now! SSDs are larger, cheaper, and capable of using the faster PCIe bus at over 2GB/sec bandwidth. Soon, non-volatile 3D XPoint memory in NVDIMMs will allow even higher bandwidth with sub-microsecond latency. Cloud object storage offers enhanced durability and availability at low cost, but with high access latency. With these different tiers of storage with widely varying cost, latency, and throughput, the analytic query engine has many new opportunities to balance query performance and system cost by managing the flow of data between storage tiers.

And it doesn't stop there: CPU caches are growing in size, with lower latencies, but there is still more than a 10x difference between L2 cache and RAM, and a 5x difference between L3 cache and RAM.

All of this means that the location of data matters, up and down the stack. Hash tables and bucket sorts that fit in L2 or L3 cache are much faster than those in DRAM. Within a record, some columns may be accessed frequently and others rarely. Even within a column, values for some records (e.g. recent records) may be accessed more frequently than for other records. Physics Speed supports locating data at the optimum location in the storage hierarchy to provide the best balance between performance and cost.
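
One generic way to act on those cache numbers (a sketch of the general technique, not Physics Speed's implementation) is to partition keys first, so that each partition's hash table stays cache-resident instead of thrashing DRAM:

    // Generic sketch: partition keys by hash so each partition's aggregation
    // table fits in L2/L3 cache, rather than building one DRAM-sized table.
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    std::vector<std::unordered_map<uint64_t, int64_t>>
    partitioned_count(const std::vector<uint64_t>& keys, size_t partitions) {
        // Phase 1: scatter keys into partitions.
        std::vector<std::vector<uint64_t>> buckets(partitions);
        for (uint64_t k : keys) buckets[k % partitions].push_back(k);

        // Phase 2: aggregate each partition with a small, cache-friendly table.
        std::vector<std::unordered_map<uint64_t, int64_t>> tables(partitions);
        for (size_t p = 0; p < partitions; ++p)
            for (uint64_t k : buckets[p]) ++tables[p][k];
        return tables;
    }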

Services

With long experience helping deliver analytic solutions, we can help in many ways.


Analysis

It all starts with you. Your goals, your data, and your infrastructure. We can provide tools, advice on resource sizing, and a path to success.

Provisioning

Physics Speed can be deployed on premise or in the cloud. Either way, it plugs into your database and works with your data. We can help you hook it all together.

Schematics

Specify which data you want to accelerate. We'll analyze a sample and suggest how it could be optimally compressed and partitioned.

Loading

Is cleansing, extracting and loading data really the best use of your time? You can give us the grunt work so that you can focus on creating value.

Your Magic

With Physics Speed, you can plug your existing algorithms directly into the data flow, processing it as it streams by. We're happy to connect it up for you.

Monitoring

We can help you track query performance, adjust compression and storage tiering, and administer resources.

WANT A LIVE DEMO? MORE INFORMATION?


Tell us how to get in touch with you, and a little about your current environment.