QU Service Isolation and Search Engine Refactoring

John Reiser, Dmitri Lapanik, Gašper Ažman

Query Understanding Service Isolation

Why: hardware

QU Service:

- Large static RAM requirements

- Small dynamic memory and CPU requirements per query

- Massive QPS on a single machine

-> Optimize for a small fleet of machines with lots of memory

 

Collators and Blenders:

- Moderate static memory requirements

- Large dynamic memory requirements

- load balancing, connection juggling

-> More available memory per collator -> more concurrent requests -> cheaper fleet
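
The chain above (more free memory per collator, more concurrent requests, fewer hosts) can be made concrete with a small back-of-the-envelope sketch. Every number below is hypothetical and chosen only to show the shape of the arithmetic; none of them come from measurements of the real fleet.

```python
import math


def concurrent_requests(ram_gb, static_gb, per_request_gb):
    """Requests that fit in memory once the static footprint is paid for."""
    return int((ram_gb - static_gb) // per_request_gb)


def hosts_needed(target_concurrency, ram_gb, static_gb, per_request_gb):
    """Hosts required to serve target_concurrency requests at once."""
    per_host = concurrent_requests(ram_gb, static_gb, per_request_gb)
    if per_host <= 0:
        raise ValueError("static data does not fit on this host")
    return math.ceil(target_concurrency / per_host)


# Hypothetical figures, for illustration only.
RAM_GB = 64
PER_REQUEST_GB = 0.5
TARGET = 10_000

# Collator that also carries QU static data vs. collator on its own.
combined = hosts_needed(TARGET, RAM_GB, static_gb=40, per_request_gb=PER_REQUEST_GB)
isolated = hosts_needed(TARGET, RAM_GB, static_gb=16, per_request_gb=PER_REQUEST_GB)

print(combined, isolated)
```

Under these assumed figures, moving the QU static data off the collator doubles the requests each host can hold, which roughly halves the host count.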

Why: Change rate and types

QU Service:

- software changes all the time

- data changes all the time

- changes are generally very safe, because they are mostly data updates from builds that are already vetted, or easily testable business logic

- can move to continuous deployment

Collators:

- software changes all the time

- data changes every day (relevance data deployments)

- will be hard to move to continuous deployment, even if we get continuous integration.

How: remote xformer (step 1)

[Diagram: Collator, QU Service, Partitions, RSAS & other clients, over time]
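
One way to picture step 1 is a transformer that keeps the same interface but forwards the work to the QU service. The sketch below is an assumption about the shape of that change, not the actual implementation: `Transformer`, `LocalTransformer`, `RemoteTransformer`, `collate`, and `fake_qu_service` are all hypothetical names, and the transport is an injected function rather than a real RPC client.

```python
from typing import Callable, Protocol


class Transformer(Protocol):
    def transform(self, query: str) -> dict: ...


class LocalTransformer:
    """Runs query understanding in-process (hypothetical stand-in logic)."""

    def transform(self, query: str) -> dict:
        return {"query": query, "tokens": query.lower().split()}


class RemoteTransformer:
    """Same interface, but the work happens behind an injected transport."""

    def __init__(self, send: Callable[[dict], dict]) -> None:
        self._send = send

    def transform(self, query: str) -> dict:
        return self._send({"query": query})


def fake_qu_service(request: dict) -> dict:
    """Hypothetical QU service endpoint; a real one would sit behind RPC."""
    return LocalTransformer().transform(request["query"])


def collate(query: str, transformer: Transformer) -> dict:
    """Stand-in for the collator: it does not care where QU runs."""
    return transformer.transform(query)


local = collate("Red Shoes", LocalTransformer())
remote = collate("Red Shoes", RemoteTransformer(fake_qu_service))
```

Because the collator depends only on the interface, switching between the two paths is a configuration choice rather than a code change.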

How: RSAS Direct (step 2)

The rest of the clients may continue using previous path

[Diagram: Collator, QU Service, Partitions, RSAS, over time; the previous path is also shown]

Search Engine Refactoring

Whitepapers still pending. Specifying what exactly needs to be done is TBD, and can be part of this discussion.

- metadata

- attribute store type madness uncovered by Russ Brown

- we're mixing business logic and core services left and right (transformers should be a library, for instance)

- query context is just too complex

- pro(p)ffers

 

Metadata

 

 Uses too much memory:

  1. approach: stop allocating individual metadata objects separately, and instead represent metadata with one data structure that is a proper regular type and manages its own explicitly shared memory
  2. approach: normalize metadata and compute index descriptors on the fly. This would reduce metadata to roughly the size of the .xqm file.
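
A minimal sketch of the second approach, assuming that an index descriptor is essentially a join of an index with the fields it uses. The real descriptor layout is not described here, so `Field`, `IndexDescriptor`, and `NormalizedMetadata` are hypothetical. Each field is stored once, and descriptors are built on request rather than kept in memory.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Field:
    name: str
    type: str


@dataclass(frozen=True)
class IndexDescriptor:
    index: str
    fields: Tuple[Field, ...]


class NormalizedMetadata:
    """Stores each field once; descriptors are derived, never stored."""

    def __init__(self, fields: Dict[str, Field],
                 index_fields: Dict[str, Tuple[str, ...]]) -> None:
        self._fields = fields              # field name -> Field
        self._index_fields = index_fields  # index name -> field names

    def descriptor(self, index: str) -> IndexDescriptor:
        names = self._index_fields[index]
        return IndexDescriptor(index, tuple(self._fields[n] for n in names))


# Hypothetical data: two indexes that share the "title" field.
metadata = NormalizedMetadata(
    fields={"title": Field("title", "string"), "price": Field("price", "int32")},
    index_fields={"books": ("title", "price"), "music": ("title",)},
)
books = metadata.descriptor("books")
music = metadata.descriptor("music")
```

The two descriptors refer to the same `Field` object for `title`, so the resident data stays at the size of the normalized tables.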

 

Metadata

Is too complicated:

  • 4 types of inheritance
  • Python/C++ implementation sync problems
  • 30k lines of XML just for query config?
  • Document->index routing in query config?!?!? (fixed constraints)

Proposed solutions:

  1. Continue as is, after all it's OK to have a 30k+ line file, right?
  2. Metadata flattening
    1. good for complexity, bad for generated xqm file size
  3. only deploy metadata that is needed on a particular component
  4. ???
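
To illustrate the trade-off in option 2, here is a sketch of flattening under a strong simplifying assumption: a single kind of inheritance, where each entry has at most one `parent` and its own `settings`. The real metadata has four kinds, and the entry layout and names below are hypothetical.

```python
def flatten(configs):
    """Resolve every 'parent' link so each entry is self-contained."""
    resolved = {}

    def resolve(name, seen=()):
        if name in resolved:
            return resolved[name]
        if name in seen:
            raise ValueError(f"inheritance cycle at {name!r}")
        entry = configs[name]
        parent = entry.get("parent")
        # Start from a copy of the parent's resolved settings, then override.
        merged = dict(resolve(parent, seen + (name,))) if parent else {}
        merged.update(entry.get("settings", {}))
        resolved[name] = merged
        return merged

    for name in configs:
        resolve(name)
    return resolved


# Hypothetical config: 'child' overrides one inherited setting.
configs = {
    "base": {"settings": {"analyzer": "default", "boost": 1}},
    "child": {"parent": "base", "settings": {"boost": 2}},
}
flat = flatten(configs)

settings_before = sum(len(e.get("settings", {})) for e in configs.values())
settings_after = sum(len(s) for s in flat.values())
```

After flattening, every entry can be read without following any links, but inherited settings are repeated in each descendant, which is where the file-size growth comes from.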

Attribute store madness

  • Actual types of attributes:
    • None
    • "Native" (8, 16, 32, 64 bits) (scalar only, dense)
    • LSA (sparse multi-map)
      • varint (8, 16, 32, 64 bit bucket widths)
      • unencoded (8, 16, 32, 64 bit values)
      • string
  • The above is basically an enum with 1 + 3*4 + 1 = 14 values. Not too horrible. Also, not every combination has to make sense. We might also want packed native bools in there, which are currently unsupported, and maybe unsigneds, etc.
  • Current specification requires about two screenfuls of code to figure out what the hell the format actually is from two and a half attributes.
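
The list of actual types above can be written down as one explicit value type. This is only a sketch of what such an enumeration might look like; `Store`, `Encoding`, and `AttributeFormat` are hypothetical names, not the existing specification.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import Optional


class Store(Enum):
    NONE = "none"
    NATIVE = "native"  # scalar only, dense
    LSA = "lsa"        # sparse multi-map


class Encoding(Enum):
    VARINT = "varint"
    UNENCODED = "unencoded"
    STRING = "string"


WIDTHS = (8, 16, 32, 64)


@dataclass(frozen=True)
class AttributeFormat:
    store: Store
    encoding: Optional[Encoding] = None
    bits: Optional[int] = None


def all_formats():
    """Enumerate every format named in the list: 1 + 3*4 + 1 of them."""
    formats = [AttributeFormat(Store.NONE)]
    formats += [AttributeFormat(Store.NATIVE, bits=b) for b in WIDTHS]
    formats += [
        AttributeFormat(Store.LSA, enc, b)
        for enc, b in product((Encoding.VARINT, Encoding.UNENCODED), WIDTHS)
    ]
    formats.append(AttributeFormat(Store.LSA, Encoding.STRING))
    return formats


print(len(all_formats()))
```

With the format stated as a single value, adding packed bools or unsigned variants would mean adding members rather than another layer of inference.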

Business logic mixing

  • Infra is in the business of vending a stable, fast and featureful platform for relevance, QU, and other teams that produce "business logic"
  • Currently, our code is full of business logic mixed in with what those teams are doing
  • Tearing apart A9SearchEngine started with making a stable interface for libIbisModels.so and the xbm/xqm. We should continue with teasing it apart so that we can have all the stuff that other teams are working on as plugins vended separately. I propose we start with figuring out an interface for transformers (right after we figure out metadata).

Query Context is just too complex

  • Query Context contains everything
  • it's not a regular type
  • can't really be serialized
  • no isolation anywhere
  • ... and yet implicit assumptions abound.

 

We need to tease it apart at least far enough that we know what the inputs and outputs of each phase are, and can enforce that. Only then will we be able to move towards a sane dataflow representation of our computation.
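
A rough sketch of what enforced inputs and outputs could look like, assuming the context can be modelled as a plain mapping. The `Phase` class, the phase names, and the keys are all hypothetical. Each phase sees only the keys it declared and must produce exactly the keys it promised.

```python
class Phase:
    """A pipeline step that declares what it reads and what it writes."""

    def __init__(self, name, inputs, outputs, fn):
        self.name = name
        self.inputs = frozenset(inputs)
        self.outputs = frozenset(outputs)
        self.fn = fn

    def run(self, context):
        missing = self.inputs - set(context)
        if missing:
            raise KeyError(f"{self.name}: missing inputs {sorted(missing)}")
        # The phase only ever sees the keys it declared as inputs.
        view = {key: context[key] for key in self.inputs}
        produced = self.fn(view)
        if set(produced) != self.outputs:
            raise ValueError(
                f"{self.name}: produced {sorted(produced)}, "
                f"declared {sorted(self.outputs)}"
            )
        return {**context, **produced}


def run_pipeline(phases, context):
    for phase in phases:
        context = phase.run(context)
    return context


tokenize = Phase("tokenize", {"raw_query"}, {"tokens"},
                 lambda ctx: {"tokens": ctx["raw_query"].lower().split()})
count = Phase("count", {"tokens"}, {"token_count"},
              lambda ctx: {"token_count": len(ctx["tokens"])})

result = run_pipeline([tokenize, count], {"raw_query": "Red Shoes"})
```

A phase that reaches for an undeclared key fails immediately, because that key is simply not in its view. The declared sets also give the edges of a dataflow graph for free.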

Pro(p)ffers: proper proffers

  • We have a new document type: proffer (everyone look at Valera)
  • We should probably take a good look at our query execution path and see what that means for our architecture. Is it still optimal? How can we make it cleaner?

 

