elastipy

Search

The Search class is the main entry point for all queries and aggregation requests against elasticsearch.

supported queries

  • compound

    • bool

  • full-text

    • match

    • query_string

  • match

    • match_all

    • match_none

  • term-level

    • range

    • term

    • terms

search interface

The Search class combines the query and the aggregation interfaces.

class elastipy.Search(index: Optional[str] = None, client: Optional[Union[str, Callable, elasticsearch.client.Elasticsearch, Any]] = None, timestamp_field: str = 'timestamp')

Bases: elastipy.query.generated_interface.QueryInterface, elastipy.aggregation.generated_interface.AggregationInterface

Interface to the elasticsearch /_search endpoint.

All changes to a search object create and return a copy, except for aggregations, which are attached to the search instance itself.
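To make the copy-on-change rule concrete, here is a minimal sketch built on a hypothetical MiniSearch class (its term() and agg_terms() methods are illustrative stand-ins, not elastipy's real signatures). It only models the documented behaviour: query changes return a new object, while aggregations are added to the instance they are called on. Whether existing aggregations are carried over into a copy is an assumption of the sketch.

```python
from typing import Dict, List


class MiniSearch:
    """Hypothetical toy model of the documented copy-on-change behaviour.

    This is not elastipy's implementation; it only mirrors the rule that
    query changes return a copy while aggregations attach to the instance.
    """

    def __init__(self) -> None:
        self.queries: List[Dict] = []
        self.aggregations: List[Dict] = []

    def _copy(self) -> "MiniSearch":
        new = MiniSearch()
        new.queries = list(self.queries)
        # Assumption: a copy gets its own (copied) aggregation list.
        new.aggregations = list(self.aggregations)
        return new

    def term(self, field: str, value: str) -> "MiniSearch":
        # Changing the query returns a modified copy; `self` is untouched.
        new = self._copy()
        new.queries.append({"term": {field: value}})
        return new

    def agg_terms(self, field: str) -> Dict:
        # Aggregations are attached to this very instance.
        agg = {"terms": {"field": field}}
        self.aggregations.append(agg)
        return agg


base = MiniSearch()
filtered = base.term("user", "alice")  # base.queries is still empty
filtered.agg_terms("project")          # attached to `filtered` only
```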

agg(*aggregation_name_type, **params) → elastipy.aggregation.aggregation.Aggregation

Creates an aggregation.

Either call aggregation("sum", field=…) to create an automatically named aggregation, or call aggregation("my_name", "sum", field=…) to set the aggregation name explicitly.

Parameters
  • aggregation_name_type – one or two strings: either the aggregation “type” alone, or its “name” followed by its “type”

  • params – all parameters of the aggregation function

Returns

Aggregation instance
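The one-or-two-strings convention can be sketched as a small argument parser. The helper split_name_type() is hypothetical and not part of elastipy; a None name stands in for "generate a name automatically", since the actual naming scheme is not described here.

```python
from typing import Optional, Tuple


def split_name_type(*aggregation_name_type: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: interpret the positional strings passed to agg().

    One string means (type,); two strings mean (name, type).
    A None name means "auto-generate a name".
    """
    if len(aggregation_name_type) == 1:
        return None, aggregation_name_type[0]
    if len(aggregation_name_type) == 2:
        name, agg_type = aggregation_name_type
        return name, agg_type
    raise TypeError(
        f"expected one or two strings, got {len(aggregation_name_type)}"
    )


print(split_name_type("sum"))             # (None, 'sum')
print(split_name_type("my_name", "sum"))  # ('my_name', 'sum')
```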

agg_adjacency_matrix(*aggregation_name: Optional[str], filters: Mapping[str, Union[Mapping, QueryInterface]], separator: Optional[str] = None)

A bucket aggregation returning a form of adjacency matrix. The request provides a collection of named filter expressions, similar to the filters aggregation request. Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.

The matrix is said to be symmetric, so we only return half of it. To do this, we sort the filter name strings and always use the lowest of a pair as the value to the left of the "&" separator.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • filters – Mapping[str, Union[Mapping, 'QueryInterface']]

  • separator – Optional[str] An alternative separator parameter can be passed in the request if clients wish to use a separator string other than the default of the ampersand.

Returns

'AggregationInterface' A new instance is created and returned
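The half-matrix keying described above can be modelled in plain Python. This sketch assumes filters are given as Python predicates over documents (in elastipy they are query mappings or QueryInterface objects evaluated by elasticsearch) and only reproduces how cells are keyed and counted; it is not the server-side implementation.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, Mapping


def adjacency_matrix(
    docs: Iterable[dict],
    filters: Mapping[str, Callable[[dict], bool]],
    separator: str = "&",
) -> Dict[str, int]:
    """Count documents per non-empty cell of the filter intersection matrix.

    Only half of the symmetric matrix is produced: pair keys are built from
    the sorted filter names, so "A&B" can appear but "B&A" never does.
    """
    names = sorted(filters)
    counts: Dict[str, int] = {}
    for doc in docs:
        matched = [name for name in names if filters[name](doc)]
        # Diagonal cells: documents matching a single filter.
        keys = list(matched)
        # Off-diagonal cells: every pair of matched filters, lowest name first.
        keys += [f"{a}{separator}{b}" for a, b in combinations(matched, 2)]
        for key in keys:
            counts[key] = counts.get(key, 0) + 1
    return counts  # empty cells never get a key


docs = [{"tags": ["x", "y"]}, {"tags": ["y"]}, {"tags": ["z"]}]
filters = {
    "grpB": lambda d: "y" in d["tags"],
    "grpA": lambda d: "x" in d["tags"],
    "grpC": lambda d: "z" in d["tags"],
}
print(adjacency_matrix(docs, filters))
# {'grpA': 1, 'grpB': 2, 'grpA&grpB': 1, 'grpC': 1}
```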

agg_auto_date_histogram(*aggregation_name: Optional[str], field: Optional[str] = None, buckets: int = 10, minimum_interval: Optional[str] = None, time_zone: Optional[str] = None, format: Optional[str] = None, keyed: bool = False, missing: Optional[Any] = None, script: Optional[dict] = None)

A multi-bucket aggregation similar to the Date histogram except instead of providing an interval to use as the width of each bucket, a target number of buckets is provided indicating the number of buckets needed and the interval of the buckets is automatically chosen to best achieve that target. The number of buckets returned will always be less than or equal to this target number.

The buckets field is optional, and will default to 10 buckets if not specified.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – Optional[str] If no field is specified it will default to the ‘timestamp_field’ of the Search class.

  • buckets – int The number of buckets that are to be returned.

  • minimum_interval –

    Optional[str] The minimum_interval allows the caller to specify the minimum rounding interval that should be used. This can make the collection process more efficient, as the aggregation will not attempt to round at any interval lower than minimum_interval.

    The accepted units for minimum_interval are: year, month, day, hour, minute, second

  • time_zone –

    Optional[str] Date-times are stored in Elasticsearch in UTC. By default, all bucketing and rounding is also done in UTC. The time_zone parameter can be used to indicate that bucketing should use a different time zone.

    Time zones may either be specified as an ISO 8601 UTC offset (e.g. +01:00 or -08:00) or as a timezone id, an identifier used in the TZ database like America/Los_Angeles.

    Warning

    When using time zones that follow DST (daylight saving time) changes, buckets close to the moment when those changes happen can have slightly different sizes than neighbouring buckets. For example, consider a DST start in the CET time zone: on 27 March 2016 at 2am, clocks were turned forward 1 hour to 3am local time. If the result of the aggregation was daily buckets, the bucket covering that day will only hold data for 23 hours instead of the usual 24 hours for other buckets. The same is true for shorter intervals like 12h. Here, we will have only an 11h bucket on the morning of 27 March when the DST shift happens.

  • format – Optional[str] Specifies the format of the ‘key_as_string’ response. See: mapping date format

  • keyed – bool Setting the keyed flag to true associates a unique string key with each bucket and returns the ranges as a hash rather than an array.

  • missing – Optional[Any] The missing parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value.

  • script – Optional[dict] Generating the terms using a script

Returns

'AggregationInterface' A new instance is created and returned
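The 23-hour bucket from the DST warning above can be checked with Python's zoneinfo module. This is a sketch assuming Python 3.9+ with IANA time zone data available (on some platforms this requires the tzdata package); Europe/Berlin is used as a zone that switches between CET and CEST.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs IANA time zone data


def local_day_length(day: datetime, tz_name: str) -> timedelta:
    """Elapsed time between local midnight of `day` and the next local midnight."""
    tz = ZoneInfo(tz_name)
    start = datetime(day.year, day.month, day.day, tzinfo=tz)
    # Adding a day to an aware datetime is wall-clock arithmetic in Python,
    # so this lands on the next local midnight.
    end = start + timedelta(days=1)
    # Subtracting two datetimes that share a tzinfo ignores their offsets,
    # so convert both to UTC first to measure the real elapsed time.
    return end.astimezone(timezone.utc) - start.astimezone(timezone.utc)


# Clocks in Europe/Berlin jumped from 2am to 3am on 27 March 2016.
print(local_day_length(datetime(2016, 3, 27), "Europe/Berlin"))  # 23:00:00
print(local_day_length(datetime(2016, 3, 26), "Europe/Berlin"))  # 1 day, 0:00:00
```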

agg_children(*aggregation_name: Optional[str], type: str)

A special single bucket aggregation that selects child documents that have the specified type, as defined in a join field.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • type – str The child type that should be selected.

Returns

'AggregationInterface' A new instance is created and returned

agg_composite(*aggregation_name: Optional[str], sources: Sequence[Mapping], size: int = 10, after: Optional[Union[str, int, float, datetime.datetime]] = None)

A multi-bucket aggregation that creates composite buckets from different sources.

Unlike the other multi-bucket aggregations, you can use the composite aggregation to paginate all buckets from a multi-level aggregation efficiently. This aggregation provides a way to stream all buckets of a specific aggregation, similar to what scroll does for documents.

The composite buckets are built from the combinations of the values extracted/created for each document and each combination is considered as a composite bucket.

For optimal performance, the index sort should be set on the index so that it matches the source order of the composite aggregation, either partially or fully.

Sub-buckets: Like any multi-bucket aggregations the composite aggregation can hold sub-aggregations. These sub-aggregations can be used to compute other buckets or statistics on each composite bucket created by this parent aggregation.

Pipeline aggregations: The composite agg is not currently compatible with pipeline aggregations, nor does it make sense in most cases. E.g. due to the paging nature of composite aggs, a single logical partition (one day for example) might be spread over multiple pages. Since pipeline aggregations are purely post-processing on the final list of buckets, running something like a derivative on a composite page could lead to inaccurate results as it is only taking into account a “partial” result on that page.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • sources –

    Sequence[Mapping] The sources parameter defines the source fields to use when building composite buckets. The order that the sources are defined controls the order that the keys are returned.

    The sources parameter can be any of the following types:

    • Terms

    • Histogram

    • Date histogram

    • GeoTile grid

    Note

    You must use a unique name when defining sources.

  • size –

    int The size parameter can be set to define how many composite buckets should be returned. Each composite bucket is considered as a single bucket, so setting a size of 10 will return the first 10 composite buckets created from the value sources. The response contains the values for each composite bucket in an array containing the values extracted from each value source.

    Pagination: If the number of composite buckets is too high (or unknown) to be returned in a single response, it is possible to split the retrieval into multiple requests. Since the composite buckets are flat by nature, the requested size is exactly the number of composite buckets that will be returned in the response (assuming that there are at least size composite buckets to return). If all composite buckets should be retrieved, it is preferable to use a small size (100 or 1000 for instance) and then use the after parameter to retrieve the next results.

  • after –

    Optional[Union[str, int, float, datetime]] To get the next set of buckets, resend the same aggregation with the after parameter set to the after_key value returned in the response.

    Note

    The after_key is usually the key to the last bucket returned in the response, but that isn’t guaranteed. Always use the returned after_key instead of deriving it from the buckets.

    In order to optimize the early termination it is advised to set track_total_hits in the request to false. The number of total hits that match the request can be retrieved on the first request and it would be costly to compute this number on every page.

Returns

'AggregationInterface' A new instance is created and returned
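The after / after_key pagination loop can be sketched against an in-memory stand-in for elasticsearch. fetch_page() and ALL_BUCKETS are hypothetical; they mimic a composite response with tuple keys, whereas real responses use dictionaries keyed by source name. The loop always resumes from the returned after_key, as the note on after_key recommends.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Key = Tuple[str, str]

# Hypothetical server-side data: composite keys (user, month) with doc counts,
# sorted in source order. Real keys look like {"user": "alice", "month": "2021-01"}.
ALL_BUCKETS: List[Tuple[Key, int]] = sorted({
    ("alice", "2021-01"): 3,
    ("alice", "2021-02"): 1,
    ("bob", "2021-01"): 2,
    ("carol", "2021-03"): 5,
    ("dave", "2021-02"): 4,
}.items())


def fetch_page(size: int, after: Optional[Key] = None) -> Dict:
    """Stand-in for one composite request: up to `size` buckets after `after`."""
    remaining = [(k, n) for k, n in ALL_BUCKETS if after is None or k > after]
    page = remaining[:size]
    response: Dict = {"buckets": [{"key": k, "doc_count": n} for k, n in page]}
    if page:
        response["after_key"] = page[-1][0]
    return response


def iter_all_buckets(size: int = 100) -> Iterator[Dict]:
    """Stream every composite bucket by resending the request with after_key."""
    after: Optional[Key] = None
    while True:
        response = fetch_page(size, after)
        if not response["buckets"]:
            break
        yield from response["buckets"]
        # Always resume from the returned after_key; never derive it yourself.
        after = response["after_key"]


for bucket in iter_all_buckets(size=2):
    print(bucket["key"], bucket["doc_count"])
```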

agg_date_histogram(*aggregation_name: Optional[str], field: Optional[str] = None, calendar_interval: Optional[str] = None, fixed_interval: Optional[str] = None, min_doc_count: int = 1, offset: Optional[str] = None, time_zone: Optional[str] = None, format: Optional[str] = None, keyed: bool = False, missing: Optional[Any] = None, script: Optional[dict] = None)

This multi-bucket aggregation is similar to the normal histogram, but it can only be used with date or date range values. Because dates are represented internally in Elasticsearch as long values, it is possible, but not as accurate, to use the normal histogram on dates as well. The main difference in the two APIs is that here the interval can be specified using date/time expressions. Time-based data requires special support because time-based intervals are not always a fixed length.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – Optional[str] If no field is specified it will default to the ‘timestamp_field’ of the Search class.

  • calendar_interval – Optional[str] Calendar-aware intervals are configured with the calendar_interval parameter. You can specify calendar intervals using the unit name, such as month, or as a single unit quantity, such as 1M. For example, day and 1d are equivalent. Multiple quantities, such as 2d, are not supported.

  • fixed_interval –

    Optional[str] In contrast to calendar-aware intervals, fixed intervals are a fixed number of SI units and never deviate, regardless of where they fall on the calendar. One second is always composed of 1000ms. This allows fixed intervals to be specified in any multiple of the supported units.

    However, it means fixed intervals cannot express other units such as months, since the duration of a month is not a fixed quantity. Attempting to specify a calendar interval like month or quarter will throw an exception.

    The accepted units for fixed intervals are:

    • milliseconds (ms): A single millisecond. This is a very, very small interval.

    • seconds (s): Defined as 1000 milliseconds each.

    • minutes (m): Defined as 60 seconds each (60,000 milliseconds). All minutes begin at 00 seconds.

    • hours (h): Defined as 60 minutes each (3,600,000 milliseconds). All hours begin at 00 minutes and 00 seconds.

    • days (d): Defined as 24 hours (86,400,000 milliseconds). All days begin at the earliest possible time, which is usually 00:00:00 (midnight).

  • min_doc_count – int Minimum documents required for a bucket. Set to 0 to allow creating empty buckets.

  • offset –

    Optional[str] Use the offset parameter to change the start value of each bucket by the specified positive (+) or negative offset (-) duration, such as 1h for an hour, or 1d for a day. See Time units for more possible time duration options.

    For example, when using an interval of day, each bucket runs from midnight to midnight. Setting the offset parameter to +6h changes each bucket to run from 6am to 6am.

  • time_zone –

    Optional[str] Elasticsearch stores date-times in Coordinated Universal Time (UTC). By default, all bucketing and rounding is also done in UTC. Use the time_zone parameter to indicate that bucketing should use a different time zone.

    For example, if the interval is a calendar day and the time zone is America/New_York then 2020-01-03T01:00:01Z is

    • converted to 2020-01-02T20:00:01

    • rounded down to 2020-01-02T00:00:00

    • then converted back to UTC to produce 2020-01-02T05:00:00Z

    • finally, when the bucket is turned into a string key it is printed in America/New_York so it’ll display as "2020-01-02T00:00:00"

    It looks like:

    bucket_key = localToUtc(Math.floor(utcToLocal(value) / interval) * interval)

    You can specify time zones as an ISO 8601 UTC offset (e.g. +01:00 or -08:00) or as an IANA time zone ID, such as America/Los_Angeles.

  • format – Optional[str] Specifies the format of the ‘key_as_string’ response. See: mapping date format

  • keyed – bool Setting the keyed flag to true associates a unique string key with each bucket and returns the ranges as a hash rather than an array.

  • missing – Optional[Any] The missing parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value.

  • script – Optional[dict] Generating the terms using a script

Returns

'AggregationInterface' A new instance is created and returned
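The time zone rounding walkthrough for date_histogram can be reproduced with the standard datetime module. This sketch models only calendar-day buckets, and assumes a fixed UTC-05:00 offset because New York observes EST in January; for dates near DST changes a real time zone database would be needed.

```python
from datetime import datetime, timedelta, timezone

# New York observes EST (UTC-05:00) in January, so a fixed offset is enough
# for this example and avoids depending on a time zone database.
NEW_YORK_JANUARY = timezone(timedelta(hours=-5))


def day_bucket_key(value_utc: datetime, tz: timezone) -> datetime:
    """Round a UTC timestamp down to the start of its local calendar day."""
    local = value_utc.astimezone(tz)  # utcToLocal(value)
    local_midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)  # floor
    return local_midnight.astimezone(timezone.utc)  # localToUtc(...)


value = datetime(2020, 1, 3, 1, 0, 1, tzinfo=timezone.utc)
key = day_bucket_key(value, NEW_YORK_JANUARY)
print(value.astimezone(NEW_YORK_JANUARY).isoformat())  # 2020-01-02T20:00:01-05:00
print(key.isoformat())                                 # 2020-01-02T05:00:00+00:00
print(key.astimezone(NEW_YORK_JANUARY).isoformat())    # 2020-01-02T00:00:00-05:00
```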

agg_date_range(*aggregation_name: Optional[str], ranges: Sequence[Union[Mapping[str, str], str]], field: Optional[str] = None, format: Optional[str] = None, time_zone: Optional[str] = None, keyed: bool = False, missing: Optional[Any] = None, script: Optional[dict] = None)

A range aggregation that is dedicated for date values. The main difference between this aggregation and the normal range aggregation is that the from and to values can be expressed in Date Math expressions, and it is also possible to specify a date format by which the from and to response fields will be returned.

Note

This aggregation includes the from value and excludes the to value for each range.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • ranges –

    Sequence[Union[Mapping[str, str], str]] List of ranges to define the buckets

    Example:

    [
        {"to": "1970-01-01"},
        {"from": "1970-01-01", "to": "1980-01-01"},
        {"from": "1980-01-01"},
    ]
    

    Instead of date values any Date Math expression can be used as well.

    Alternatively this parameter can be a list of strings. The above example can be rewritten as: ["1970-01-01", "1980-01-01"]

    Note

    This aggregation includes the from value and excludes the to value for each range.

  • field –

    Optional[str] The date field

    If no field is specified it will default to the ‘timestamp_field’ of the Search class.

  • format – Optional[str] The format of the response bucket keys as available for the DateTimeFormatter

  • time_zone –

    Optional[str] Dates can be converted from another time zone to UTC by specifying the time_zone parameter.

    Time zones may either be specified as an ISO 8601 UTC offset (e.g. +01:00 or -08:00) or as one of the time zone ids from the TZ database.

    The time_zone parameter is also applied to rounding in date math expressions.

  • keyed – bool Setting the keyed flag to true associates a unique string key with each bucket and returns the ranges as a hash rather than an array.

  • missing – Optional[Any] The missing parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value.

  • script – Optional[dict] Generating the terms using a script

Returns

'AggregationInterface' A new instance is created and returned
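The list-of-strings shorthand for ranges can be expanded into the explicit from/to form as follows. expand_ranges() is a hypothetical helper that reflects the equivalence stated for the ranges parameter; it is not elastipy's internal code. The same pattern applies to the numeric shorthand accepted by agg_geo_distance.

```python
import collections.abc
from typing import Dict, List, Mapping, Sequence, Union


def expand_ranges(ranges: Sequence[Union[Mapping[str, str], str]]) -> List[Dict[str, str]]:
    """Expand the list-of-strings shorthand into explicit from/to ranges.

    ["a", "b"] becomes [{"to": "a"}, {"from": "a", "to": "b"}, {"from": "b"}].
    A list of mappings is passed through unchanged.
    """
    if all(isinstance(r, collections.abc.Mapping) for r in ranges):
        return [dict(r) for r in ranges]
    if not all(isinstance(r, str) for r in ranges):
        raise TypeError("use either all mappings or all strings")
    bounds = list(ranges)
    expanded: List[Dict[str, str]] = [{"to": bounds[0]}]  # open lower end
    for lower, upper in zip(bounds, bounds[1:]):
        expanded.append({"from": lower, "to": upper})  # from inclusive, to exclusive
    expanded.append({"from": bounds[-1]})  # open upper end
    return expanded


print(expand_ranges(["1970-01-01", "1980-01-01"]))
```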

agg_diversified_sampler(*aggregation_name: Optional[str], field: Optional[str] = None, script: Optional[Mapping] = None, shard_size: int = 100, max_docs_per_value: int = 1)

Like the sampler aggregation this is a filtering aggregation used to limit any sub aggregations’ processing to a sample of the top-scoring documents. The diversified_sampler aggregation adds the ability to limit the number of matches that share a common value such as an “author”.

Note

Any good market researcher will tell you that when working with samples of data it is important that the sample represents a healthy variety of opinions rather than being skewed by any single voice. The same is true with aggregations and sampling with these diversify settings can offer a way to remove the bias in your content (an over-populated geography, a large spike in a timeline or an over-active forum spammer).

Example use cases:

  • Tightening the focus of analytics to high-relevance matches rather than the potentially very long tail of low-quality matches

  • Removing bias from analytics by ensuring fair representation of content from different sources

  • Reducing the running cost of aggregations that can produce useful results using only samples e.g. significant_terms

A choice of field or script setting is used to provide values used for de-duplication and the max_docs_per_value setting controls the maximum number of documents collected on any one shard which share a common value. The default setting for max_docs_per_value is 1.

Note

The aggregation will throw an error if the choice of field or script produces multiple values for a single document (de-duplication using multi-valued fields is not supported due to efficiency concerns).

Limitations:

Cannot be nested under breadth_first aggregations. Being a quality-based filter, the diversified_sampler aggregation needs access to the relevance score produced for each document. It therefore cannot be nested under a terms aggregation which has the collect_mode switched from the default depth_first mode to breadth_first, as this discards scores. In this situation an error will be thrown.

Limited de-dup logic. The de-duplication logic applies only at a shard level, so it will not apply across shards.

No specialized syntax for geo/date fields. Currently the syntax for defining the diversifying values is defined by a choice of field or script; there is no added syntactical sugar for expressing geo or date units such as "7d" (7 days). This support may be added in a later release, and users will currently have to create these sorts of values using a script.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – Optional[str] The field to search on. Can alternatively be a script

  • script – Optional[Mapping] The script that specifies the aggregation. Can alternatively be a ‘field’

  • shard_size – int The shard_size parameter limits how many top-scoring documents are collected in the sample processed on each shard. The default value is 100.

  • max_docs_per_value – int The max_docs_per_value is an optional parameter and limits how many documents are permitted per choice of de-duplicating value. The default setting is 1.

Returns

'AggregationInterface' A new instance is created and returned
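A simplified, single-shard model of diversified sampling: documents are visited in descending score order and a document is skipped once its de-duplication value has reached max_docs_per_value. The function and the _score key are assumptions for illustration; elasticsearch performs this collection inside each shard and does not de-duplicate across shards.

```python
from collections import Counter
from typing import Dict, List


def diversified_sample(
    docs: List[Dict],
    field: str,
    shard_size: int = 100,
    max_docs_per_value: int = 1,
) -> List[Dict]:
    """Keep top-scoring docs, at most `max_docs_per_value` per `field` value."""
    per_value: Counter = Counter()
    sample: List[Dict] = []
    for doc in sorted(docs, key=lambda d: d["_score"], reverse=True):
        if len(sample) >= shard_size:
            break  # the sample for this shard is full
        value = doc[field]  # must be single-valued, as in elasticsearch
        if per_value[value] >= max_docs_per_value:
            continue  # this value is already represented often enough
        per_value[value] += 1
        sample.append(doc)
    return sample


docs = [
    {"author": "a", "_score": 3.0},
    {"author": "a", "_score": 2.5},
    {"author": "b", "_score": 2.0},
    {"author": "c", "_score": 1.0},
]
print([d["author"] for d in diversified_sample(docs, "author", shard_size=2)])  # ['a', 'b']
```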

agg_filter(*aggregation_name: Optional[str], filter: Union[Mapping, QueryInterface])

Defines a single bucket of all the documents in the current document set context that match a specified filter. Often this will be used to narrow down the current aggregation context to a specific set of documents.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • filter – Union[Mapping, 'QueryInterface']

Returns

'AggregationInterface' A new instance is created and returned

agg_filters(*aggregation_name: Optional[str], filters: Mapping[str, Union[Mapping, QueryInterface]])

Defines a multi bucket aggregation where each bucket is associated with a filter. Each bucket will collect all documents that match its associated filter.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • filters – Mapping[str, Union[Mapping, 'QueryInterface']]

Returns

'AggregationInterface' A new instance is created and returned

agg_geo_distance(*aggregation_name: Optional[str], field: str, ranges: Sequence[Union[Mapping[str, float], float]], origin: Union[str, Mapping[str, float], Sequence[float]], unit: str = 'm', distance_type: str = 'arc', keyed: bool = False)

A multi-bucket aggregation that works on geo_point fields and conceptually works very similarly to the range aggregation. The user can define a point of origin and a set of distance range buckets. The aggregation evaluates the distance of each document value from the origin point and determines the buckets it belongs to based on the ranges (a document belongs to a bucket if the distance between the document and the origin falls within the distance range of the bucket).

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str The specified field must be of type geo_point (which can only be set explicitly in the mappings). And it can also hold an array of geo_point fields, in which case all will be taken into account during aggregation.

  • ranges –

    Sequence[Union[Mapping[str, float], float]] A list of ranges that define the separate buckets, e.g:

    [
        {"to": 100000},
        {"from": 100000, "to": 300000},
        {"from": 300000},
    ]
    

    Alternatively this parameter can be a list of numbers. The above example can be rewritten as [100000, 300000]

  • origin –

    Union[str, Mapping[str, float], Sequence[float]] The origin point can accept all formats supported by the geo_point type:

    • Object format: { "lat" : 52.3760, "lon" : 4.894 } - this is the safest format as it is the most explicit about the lat & lon values

    • String format: "52.3760, 4.894" - where the first number is the lat and the second is the lon

    • Array format: [4.894, 52.3760] - which is based on the GeoJson standard and where the first number is the lon and the second one is the lat

  • unit – str By default, the distance unit is m (meters) but it can also accept: mi (miles), in (inches), yd (yards), km (kilometers), cm (centimeters), mm (millimeters).

  • distance_type – str There are two distance calculation modes: arc (the default), and plane. The arc calculation is the most accurate. The plane is the fastest but least accurate. Consider using plane when your search context is “narrow”, and spans smaller geographical areas (~5km). plane will return higher error margins for searches across very large areas (e.g. cross continent search).

  • keyed – bool Setting the keyed flag to true will associate a unique string key with each bucket and return the ranges as a hash rather than an array.

Returns

'AggregationInterface' A new instance is created and returned
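The following sketch shows how a document could be assigned to distance buckets. It assumes a spherical Earth with the haversine formula as an approximation of the arc mode, and treats from as inclusive and to as exclusive, as in the range aggregation. The function names and the sample document coordinates are hypothetical. Note that the helpers take (lat, lon), unlike the GeoJSON-style array format [lon, lat].

```python
import math
from typing import Dict, List, Sequence

# Mean Earth radius in metres; treating the Earth as a sphere is an approximation.
EARTH_RADIUS_M = 6_371_008.8


def arc_distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def matching_buckets(distance_m: float, ranges: Sequence[Dict[str, float]]) -> List[int]:
    """Indices of all ranges with from <= distance < to (both bounds optional)."""
    return [
        i
        for i, r in enumerate(ranges)
        if r.get("from", -math.inf) <= distance_m < r.get("to", math.inf)
    ]


origin = {"lat": 52.3760, "lon": 4.894}
ranges = [{"to": 100000}, {"from": 100000, "to": 300000}, {"from": 300000}]
d = arc_distance_m(origin["lat"], origin["lon"], 51.9225, 4.4792)  # hypothetical document
print(round(d), matching_buckets(d, ranges))
```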

agg_geohash_grid(*aggregation_name: Optional[str], field: str, precision: Union[int, str] = 5, bounds: Optional[Mapping] = None, size: int = 10000, shard_size: Optional[int] = None)

A multi-bucket aggregation that works on geo_point fields and groups points into buckets that represent cells in a grid. The resulting grid can be sparse and only contains cells that have matching data. Each cell is labeled using a geohash which is of user-definable precision.

  • High precision geohashes have a long string length and represent cells that cover only a small area.

  • Low precision geohashes have a short string length and represent cells that each cover a large area.

Geohashes used in this aggregation can have a choice of precision between 1 and 12.

The highest-precision geohash of length 12 produces cells that cover less than a square metre of land and so high-precision requests can be very costly in terms of RAM and result sizes.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field –

    str The specified field must be of type geo_point or geo_shape (which can only be set explicitly in the mappings). And it can also hold an array of geo_point fields, in which case all will be taken into account during aggregation.

    Aggregating on Geo-shape fields works just as it does for points, except that a single shape can be counted in multiple tiles. A shape will contribute to the count of matching values if any part of its shape intersects with that tile.

  • precision –

    Union[int, str] The required precision of the grid in the range [1, 12]. Higher means more precise.

    Alternatively, the precision level can be approximated from a distance measure like "1km", "10m". The precision level is calculated such that cells will not exceed the specified size (diagonal) of the required precision. When this would lead to precision levels higher than the supported 12 levels (e.g. for distances <5.6cm), the value is rejected.

    Note

    When requesting detailed buckets (typically for displaying a “zoomed in” map) a filter like geo_bounding_box should be applied to narrow the subject area otherwise potentially millions of buckets will be created and returned.

  • bounds – Optional[Mapping] The geohash_grid aggregation supports an optional bounds parameter that restricts the points considered to those that fall within the bounds provided. The bounds parameter accepts the bounding box in all the same accepted formats of the bounds specified in the Geo Bounding Box Query. This bounding box can be used with or without an additional geo_bounding_box query filtering the points prior to aggregating. It is an independent bounding box that can intersect with, be equal to, or be disjoint to any additional geo_bounding_box queries defined in the context of the aggregation.

  • size – int The maximum number of geohash buckets to return (defaults to 10,000). When results are trimmed, buckets are prioritised based on the volumes of documents they contain.

  • shard_size – Optional[int] To allow for more accurate counting of the top cells returned in the final result, the aggregation defaults to returning max(10, (size x number-of-shards)) buckets from each shard. If this heuristic is undesirable, the number considered from each shard can be overridden using this parameter.

Returns

'AggregationInterface' A new instance is created and returned
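To see what the precision parameter means for the bucket keys, here is the standard geohash encoding algorithm in plain Python (a sketch, not elasticsearch's implementation). Longitude and latitude intervals are halved alternately, and every 5 bits become one base-32 character, so a lower-precision hash is a prefix of a higher-precision one and names a larger cell containing it.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a point as a geohash of `precision` characters (1..12)."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be in [1, 12]")
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0     # bits collected for the current character
    value = 0    # 5-bit value of the current character
    even = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        even = not even
        bits += 1
        if bits == 5:
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 11))  # u4pruydqqvj
print(geohash_encode(57.64911, 10.40744, 5))   # u4pru (a larger cell, same prefix)
```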

agg_geotile_grid(*aggregation_name: Optional[str], field: str, precision: Union[int, str] = 7, bounds: Optional[Mapping] = None, size: int = 10000, shard_size: Optional[int] = None)

A multi-bucket aggregation that works on geo_point fields and groups points into buckets that represent cells in a grid. The resulting grid can be sparse and only contains cells that have matching data. Each cell corresponds to a map tile as used by many online map sites. Each cell is labeled using a “{zoom}/{x}/{y}” format, where zoom is equal to the user-specified precision.

  • High precision keys have a larger range for x and y, and represent tiles that cover only a small area.

  • Low precision keys have a smaller range for x and y, and represent tiles that each cover a large area.

Warning

The highest-precision geotile, of precision 29, produces cells that cover less than 10cm by 10cm of land, so high-precision requests can be very costly in terms of RAM and result sizes. Please first filter the aggregation to a smaller geographic area before requesting high levels of detail.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str The specified field must be of type geo_point (which can only be set explicitly in the mappings). And it can also hold an array of geo_point fields, in which case all will be taken into account during aggregation.

  • precision –

    Union[int, str] The required precision of the grid in the range [1, 29]. Higher means more precise.

    Note

    When requesting detailed buckets (typically for displaying a “zoomed in” map) a filter like geo_bounding_box should be applied to narrow the subject area otherwise potentially millions of buckets will be created and returned.

  • bounds – Optional[Mapping] The geotile_grid aggregation supports an optional bounds parameter that restricts the points considered to those that fall within the bounds provided. The bounds parameter accepts the bounding box in all the same accepted formats of the bounds specified in the Geo Bounding Box Query. This bounding box can be used with or without an additional geo_bounding_box query filtering the points prior to aggregating. It is an independent bounding box that can intersect with, be equal to, or be disjoint to any additional geo_bounding_box queries defined in the context of the aggregation.

  • size – int The maximum number of geotile buckets to return (defaults to 10,000). When results are trimmed, buckets are prioritised based on the volumes of documents they contain.

  • shard_size – Optional[int] To allow for more accurate counting of the top cells returned in the final result, the aggregation defaults to returning max(10, (size x number-of-shards)) buckets from each shard. If this heuristic is undesirable, the number considered from each shard can be overridden using this parameter.

Returns

'AggregationInterface' A new instance is created and returned
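The "{zoom}/{x}/{y}" keys follow the usual web-map tiling scheme. This sketch assumes the standard Web Mercator (slippy map) tile formulas; elasticsearch computes the keys server-side, so treat it as an illustration of the key format rather than a reference implementation. geotile_key() is a hypothetical helper name.

```python
import math


def geotile_key(lat: float, lon: float, zoom: int) -> str:
    """Return the "{zoom}/{x}/{y}" key of the web-map tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    # Web Mercator cannot represent the poles; clamp to its latitude limit.
    lat = max(min(lat, 85.0511287798066), -85.0511287798066)
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Points exactly on the east or south edge (or rounding at the clamped
    # poles) would fall one tile outside the grid, so clamp to [0, n - 1].
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return f"{zoom}/{x}/{y}"


print(geotile_key(52.3760, 4.894, 7))
```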

agg_global(*aggregation_name: Optional[str])

Defines a single bucket of all the documents within the search execution context. This context is defined by the indices and the document types you’re searching on, but is not influenced by the search query itself.

Note

Global aggregators can only be placed as top level aggregators because it doesn’t make sense to embed a global aggregator within another bucket aggregator.

elasticsearch documentation

Parameters

aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

Returns

'AggregationInterface' A new instance is created and returned

agg_histogram(*aggregation_name: Optional[str], field: str, interval: int, min_doc_count: int = 0, offset: Optional[int] = None, extended_bounds: Optional[Mapping[str, int]] = None, hard_bounds: Optional[Mapping[str, int]] = None, format: Optional[str] = None, order: Optional[Union[Mapping, str]] = None, keyed: bool = False, missing: Optional[Any] = None)

A multi-bucket values source based aggregation that can be applied on numeric values or numeric range values extracted from the documents. It dynamically builds fixed size (a.k.a. interval) buckets over the values. For example, if the documents have a field that holds a price (numeric), we can configure this aggregation to dynamically build buckets with interval 5 (in case of price it may represent $5). When the aggregation executes, the price field of every document will be evaluated and will be rounded down to its closest bucket - for example, if the price is 32 and the bucket size is 5 then the rounding will yield 30 and thus the document will “fall” into the bucket that is associated with the key 30. To make this more formal, here is the rounding function that is used:

bucket_key = Math.floor((value - offset) / interval) * interval + offset

For range values, a document can fall into multiple buckets. The first bucket is computed from the lower bound of the range in the same way as a bucket for a single value is computed. The final bucket is computed in the same way from the upper bound of the range, and the range is counted in all buckets in between and including those two.

The interval must be a positive decimal, while the offset must be a decimal in [0, interval), that is, greater than or equal to 0 and less than interval.
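The rounding rule, the offset constraint and the range-value behaviour described above can be checked with a few lines of plain Python. This is a minimal sketch for intuition only: it is not part of elastipy, `bucket_key` and `range_bucket_keys` are hypothetical helper names, and it models nothing but the bucket arithmetic.

```python
import math
from collections import Counter


def bucket_key(value, interval, offset=0):
    """Round a value down to the key of its histogram bucket."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not 0 <= offset < interval:
        raise ValueError("offset must be in [0, interval)")
    return math.floor((value - offset) / interval) * interval + offset


def range_bucket_keys(low, high, interval, offset=0):
    """Keys of all buckets that a range value [low, high] is counted in."""
    first = bucket_key(low, interval, offset)
    last = bucket_key(high, interval, offset)
    steps = round((last - first) / interval)
    return [first + i * interval for i in range(steps + 1)]


# A price of 32 with interval 5 falls into the bucket with key 30.
print(bucket_key(32, 5))

# Ten documents with values 5..14: interval 10 gives two buckets of five...
print(Counter(bucket_key(v, 10) for v in range(5, 15)))
# ...while offset 5 shifts the boundaries so all ten land in [5, 15).
print(Counter(bucket_key(v, 10, offset=5) for v in range(5, 15)))

# A range value covering 50..150 with interval 50 is counted in three buckets.
print(range_bucket_keys(50, 150, 50))
```

The same function covers the offset example given for the offset parameter: the offset only moves the bucket boundaries, it never changes the interval width.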

Histogram fields: Running a histogram aggregation over histogram fields computes the total number of counts for each interval. See example

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str The numeric field to build the histogram on.

  • interval – int A positive decimal defining the interval between buckets.

  • min_doc_count –

    int By default the response will fill gaps in the histogram with empty buckets. It is possible to change that and request buckets with a higher minimum count thanks to the min_doc_count setting.

    By default the histogram returns all the buckets within the range of the data itself, that is, the documents with the smallest values (in the histogram field) will determine the min bucket (the bucket with the smallest key) and the documents with the highest values will determine the max bucket (the bucket with the highest key). When requesting empty buckets, this often causes confusion, especially when the data is also filtered.

    To understand why, let’s look at an example:

    Let’s say you’re filtering your request to get all docs with values between 0 and 500, and in addition you’d like to slice the data per price using a histogram with an interval of 50. You also specify “min_doc_count” : 0 as you’d like to get all buckets, even the empty ones. If it happens that all products (documents) have prices higher than 100, the first bucket you’ll get will be the one with 100 as its key. This is confusing, as you’d often also like to get the buckets between 0 and 100. The extended_bounds parameter addresses this case.

  • offset –

    Optional[int] By default the bucket keys start with 0 and then continue in evenly spaced steps of interval, e.g. if the interval is 10, the first three buckets (assuming there is data inside them) will be [0, 10), [10, 20), [20, 30). The bucket boundaries can be shifted by using the offset option.

    This is best illustrated with an example. If there are 10 documents with values ranging from 5 to 14, using interval 10 will result in two buckets with 5 documents each. If an additional offset of 5 is used, there will be only a single bucket [5, 15) containing all 10 documents.

  • extended_bounds –

    Optional[Mapping[str, int]] With the extended_bounds setting, you can “force” the histogram aggregation to start building buckets from a specific min value and also keep on building buckets up to a max value (even if there are no documents anymore). Using extended_bounds only makes sense when min_doc_count is 0 (the empty buckets will never be returned if min_doc_count is greater than 0).

    Note that (as the name suggests) extended_bounds does not filter buckets. That is, if extended_bounds.min is higher than the values extracted from the documents, the documents will still dictate what the first bucket will be (and the same goes for extended_bounds.max and the last bucket). For filtering buckets, one should nest the histogram aggregation under a range filter aggregation with the appropriate from/to settings.

    When aggregating ranges, buckets are based on the values of the returned documents. This means the response may include buckets outside of a query’s range. For example, if your query looks for values greater than 100 and a document has a range covering 50 to 150, with an interval of 50 that document will land in 3 buckets: 50, 100, and 150. In general, it’s best to think of the query and aggregation steps as independent: the query selects a set of documents, and then the aggregation buckets those documents without regard to how they were selected. See note on bucketing range fields for more information and an example.

  • hard_bounds – Optional[Mapping[str, int]] The hard_bounds is a counterpart of extended_bounds and can limit the range of buckets in the histogram. It is particularly useful in the case of open data ranges that can result in a very large number of buckets.

  • format – Optional[str] Specifies the format of the ‘key_as_string’ response. See: mapping date format

  • order – Optional[Union[Mapping, str]] By default the returned buckets are sorted by their key ascending, though the order behaviour can be controlled using the order setting. Supports the same order functionality as the Terms Aggregation.

  • keyed – bool Setting the keyed flag to true associates a unique string key with each bucket and returns the buckets as a hash rather than an array.

  • missing – Optional[Any] The missing parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value.

Returns

'AggregationInterface' A new instance is created and returned
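The interplay of min_doc_count=0 and extended_bounds in the price example above can be modelled in plain Python. This is a simplified, single-node sketch restricted to integer values and no offset; `histogram_buckets` is a hypothetical helper, and the request body at the end assumes a numeric field named `price` purely for illustration.

```python
def histogram_buckets(values, interval, extended_bounds=None):
    """Simplified histogram with min_doc_count=0 over integer values.

    Returns {bucket_key: doc_count} for every key between the lowest and the
    highest bucket, filling gaps with empty buckets. extended_bounds can only
    widen that span; it never removes buckets that contain documents.
    """
    counts = {}
    for value in values:
        key = value // interval * interval
        counts[key] = counts.get(key, 0) + 1
    endpoints = list(counts)
    if extended_bounds:
        # the bounds are rounded to bucket keys just like document values
        endpoints += [bound // interval * interval for bound in extended_bounds.values()]
    if not endpoints:
        return {}
    low, high = min(endpoints), max(endpoints)
    return {key: counts.get(key, 0) for key in range(low, high + interval, interval)}


prices = [120, 260, 480]  # all documents are priced above 100

# Without extended_bounds the first bucket is dictated by the data: key 100.
print(list(histogram_buckets(prices, 50))[0])

# With extended_bounds the empty buckets from 0 up to 500 are returned as well.
print(list(histogram_buckets(prices, 50, {"min": 0, "max": 500})))

# The corresponding Elasticsearch aggregation (the field name is hypothetical).
body = {
    "aggs": {
        "prices": {
            "histogram": {
                "field": "price",
                "interval": 50,
                "min_doc_count": 0,
                "extended_bounds": {"min": 0, "max": 500},
            }
        }
    }
}
```

Calling `histogram_buckets([120, 260], 50, {"min": 200, "max": 250})` still starts at key 100, which is the "not filtering" behaviour described for extended_bounds; cutting buckets off is the job of hard_bounds or a filtering parent aggregation.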

agg_ip_range(*aggregation_name: Optional[str], field: str, ranges: Sequence[Union[Mapping[str, str], str]], keyed: bool = False)¶

Just like the dedicated date range aggregation, there is also a dedicated range aggregation for IP typed fields.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str The IPv4 field

  • ranges –

    Sequence[Union[Mapping[str, str], str]] List of ranges to define the buckets, either as straight IPv4 or as CIDR masks.

    Example:

    [
        {"to": "10.0.0.5"},
        {"from": "10.0.0.5", "to": "10.0.0.127"},
        {"from": "10.0.0.127"},
    ]
    

    Alternatively this parameter can be a list of strings. The above example can be rewritten as: ["10.0.0.5", "10.0.0.127"]

  • keyed – bool Setting the keyed flag to true associates a unique string key with each bucket and returns the ranges as a hash rather than an array.

Returns

'AggregationInterface' A new instance is created and returned
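IP ranges must be compared as addresses, not as strings ("10.0.0.100" sorts before "10.0.0.5" as text), and a CIDR mask is shorthand for a from/to pair. The sketch below uses Python's standard `ipaddress` module to show both. It assumes the same convention as the range aggregation (`from` included, `to` excluded), and `mask_to_range` and `ip_in_range` are hypothetical helpers, not elastipy functions.

```python
import ipaddress


def mask_to_range(mask):
    """Convert a CIDR mask such as "10.0.0.0/25" to a {"from", "to"} range.

    "to" is the first address after the network, so it is excluded.
    Networks ending at the very top of the address space (e.g. 0.0.0.0/0)
    overflow here and are not handled by this sketch.
    """
    network = ipaddress.ip_network(mask)
    end = network.network_address + network.num_addresses
    return {"from": str(network.network_address), "to": str(end)}


def ip_in_range(ip, rng):
    """True if ip lies in rng; "from" is inclusive, "to" is exclusive."""
    address = ipaddress.ip_address(ip)
    if "from" in rng and address < ipaddress.ip_address(rng["from"]):
        return False
    if "to" in rng and address >= ipaddress.ip_address(rng["to"]):
        return False
    return True


ranges = [
    {"to": "10.0.0.5"},
    {"from": "10.0.0.5", "to": "10.0.0.127"},
    {"from": "10.0.0.127"},
]

# As text "10.0.0.100" < "10.0.0.5", but as an address it is in the middle range.
print([ip_in_range("10.0.0.100", r) for r in ranges])
print(mask_to_range("10.0.0.0/25"))
```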

agg_missing(*aggregation_name: Optional[str], field: str)¶

A field data based single bucket aggregation, that creates a bucket of all documents in the current document set context that are missing a field value (effectively, missing a field or having the configured NULL value set). This aggregator will often be used in conjunction with other field data bucket aggregators (such as ranges) to return information for all the documents that could not be placed in any of the other buckets due to missing field data values.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str The field we wish to investigate for missing values

Returns

'AggregationInterface' A new instance is created and returned

agg_nested(*aggregation_name: Optional[str], path: str)¶

A special single bucket aggregation that enables aggregating nested documents.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • path – str The field of the nested document(s)

Returns

'AggregationInterface' A new instance is created and returned

agg_range(*aggregation_name: Optional[str], ranges: Sequence[Union[Mapping[str, Any], Any]], field: Optional[str] = None, keyed: bool = False, script: Optional[dict] = None)¶

A multi-bucket value source based aggregation that enables the user to define a set of ranges - each representing a bucket. During the aggregation process, the values extracted from each document will be checked against each bucket range and “bucket” the relevant/matching document.

Note

Note that this aggregation includes the from value and excludes the to value for each range.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • ranges –

    Sequence[Union[Mapping[str, Any], Any]] List of ranges to define the buckets

    Example:

    [
        {"to": 10},
        {"from": 10, "to": 20},
        {"from": 20},
    ]
    

    Alternatively this parameter can be a list of boundary values. The above example can be rewritten as: [10, 20]


  • field – Optional[str] The field whose values are checked against the ranges

  • keyed – bool Setting the keyed flag to true associates a unique string key with each bucket and returns the ranges as a hash rather than an array.

  • script – Optional[dict] Generate the values using a script

Returns

'AggregationInterface' A new instance is created and returned
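The list-of-values shorthand and the from-inclusive, to-exclusive rule can be illustrated with a small pure-Python model. `expand_ranges` and `range_counts` are hypothetical helpers written for this example; they are not elastipy's internal implementation.

```python
from collections.abc import Mapping


def expand_ranges(ranges):
    """Expand [a, b, ...] into [{"to": a}, {"from": a, "to": b}, ..., {"from": last}].

    A list that already consists of mappings is returned unchanged.
    """
    ranges = list(ranges)
    if not ranges or all(isinstance(r, Mapping) for r in ranges):
        return ranges
    expanded = [{"to": ranges[0]}]
    expanded += [{"from": low, "to": high} for low, high in zip(ranges, ranges[1:])]
    expanded.append({"from": ranges[-1]})
    return expanded


def range_counts(values, ranges):
    """Doc count per range. A value can be counted in several overlapping ranges."""
    counts = [0] * len(ranges)
    for value in values:
        for i, rng in enumerate(ranges):
            # "from" is included, "to" is excluded
            if rng.get("from", float("-inf")) <= value < rng.get("to", float("inf")):
                counts[i] += 1
    return counts


ranges = expand_ranges([10, 20])
print(ranges)
# 10 lands in the middle range and 20 in the last one.
print(range_counts([5, 10, 19, 20, 35], ranges))
```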

agg_rare_terms(*aggregation_name: Optional[str], field: str, max_doc_count: int = 1, include: Optional[Union[str, Sequence[str], Mapping[str, int]]] = None, exclude: Optional[Union[str, Sequence[str]]] = None, missing: Optional[Any] = None)¶

A multi-bucket value source based aggregation which finds “rare” terms: terms that are in the long tail of the distribution and are not frequent. Conceptually, this is like a terms aggregation that is sorted by _count ascending. As noted in the terms aggregation docs, actually ordering a terms agg by count ascending has unbounded error, which is why the rare_terms aggregation should be used instead.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str The field we wish to find rare terms in

  • max_doc_count –

    int The maximum number of documents a term should appear in.

    The max_doc_count parameter is used to control the upper bound of document counts that a term can have. Unlike the terms aggregation, rare_terms has no size limit. This means that all terms which match the max_doc_count criterion will be returned. The aggregation functions in this manner to avoid the order-by-ascending issues that afflict the terms aggregation.

    This does, however, mean that a large number of results can be returned if max_doc_count is chosen incorrectly. To limit the danger of this setting, the maximum max_doc_count is 100.

  • include –

    Optional[Union[str, Sequence[str], Mapping[str, int]]] A regexp pattern that filters the terms for which buckets will be created.

    Alternatively, this can be a list of strings (exact terms).

    Partition expressions are also possible.

  • exclude –

    Optional[Union[str, Sequence[str]]] A regexp pattern matching terms for which no buckets will be created.

    Alternatively, this can be a list of strings (exact terms).

  • missing – Optional[Any] The missing parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value.

Returns

'AggregationInterface' A new instance is created and returned
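Conceptually, max_doc_count keeps every term whose document frequency is at or below the threshold, with no size cut-off. The following exact, single-node model shows that selection. It ignores the distributed and approximate bookkeeping of the real aggregation, and `rare_terms` is a hypothetical helper name.

```python
from collections import Counter


def rare_terms(docs, max_doc_count=1):
    """Return (term, doc_count) pairs for terms found in at most max_doc_count docs.

    docs is a list of documents, each given as an iterable of its terms.
    Results are sorted rarest first, then alphabetically.
    """
    if not 1 <= max_doc_count <= 100:
        raise ValueError("max_doc_count must be between 1 and 100")
    # set() so a term repeated inside one document is counted once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    rare = [(term, count) for term, count in doc_freq.items() if count <= max_doc_count]
    return sorted(rare, key=lambda pair: (pair[1], pair[0]))


docs = [["a", "b"], ["a", "c", "c"], ["a"], ["b"]]
print(rare_terms(docs))                   # only "c", found in a single document
print(rare_terms(docs, max_doc_count=2))  # "c" and "b"
```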

agg_sampler(*aggregation_name: Optional[str], shard_size: int = 100)¶

A filtering aggregation used to limit any sub aggregations’ processing to a sample of the top-scoring documents.

Example use cases:

  • Tightening the focus of analytics to high-relevance matches rather than the potentially very long tail of low-quality matches

  • Reducing the running cost of aggregations that can produce useful results using only samples e.g. significant_terms

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • shard_size – int The shard_size parameter limits how many top-scoring documents are collected in the sample processed on each shard. The default value is 100.

Returns

'AggregationInterface' A new instance is created and returned
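The effect of shard_size can be pictured as "keep the best-scoring documents per shard, then run the sub-aggregations on that sample only". The sketch below models a sampler wrapping a simple per-value count; the data layout (a list of `(score, document)` pairs per shard) and the `sampled_counts` helper are assumptions made for illustration.

```python
from collections import Counter


def sampled_counts(shards, field, shard_size=100):
    """Count values of `field` over the top-scoring documents of each shard.

    shards: one list per shard of (score, document) pairs, document being a dict.
    Only the shard_size highest-scoring documents per shard enter the sample.
    """
    counts = Counter()
    for shard in shards:
        sample = sorted(shard, key=lambda pair: pair[0], reverse=True)[:shard_size]
        counts.update(doc[field] for _, doc in sample if field in doc)
    return counts


shards = [
    [(3.0, {"tag": "a"}), (2.0, {"tag": "b"}), (0.1, {"tag": "c"})],
    [(1.5, {"tag": "a"}), (0.2, {"tag": "c"})],
]
print(sampled_counts(shards, "tag", shard_size=1))  # only the best hit per shard
print(sampled_counts(shards, "tag", shard_size=2))
```

The low-scoring "c" documents only show up once shard_size is large enough to include them, which is exactly how the sampler trims the long tail of weak matches.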

agg_significant_terms(*aggregation_name: Optional[str], field: str, size: int = 10, shard_size: Optional[int] = None, min_doc_count: int = 1, shard_min_doc_count: Optional[int] = None, execution_hint: str = 'global_ordinals', include: Optional[Union[str, Sequence[str], Mapping[str, int]]] = None, exclude: Optional[Union[str, Sequence[str]]] = None, script: Optional[dict] = None)¶

An aggregation that returns interesting or unusual occurrences of terms in a set.

Example use cases:

  • Suggesting “H5N1” when users search for “bird flu” in text

  • Identifying the merchant that is the “common point of compromise” from the transaction history of credit card owners reporting loss

  • Suggesting keywords relating to stock symbol $ATI for an automated news classifier

  • Spotting the fraudulent doctor who is diagnosing more than their fair share of whiplash injuries

  • Spotting the tire manufacturer who has a disproportionate number of blow-outs

In all these cases the terms being selected are not simply the most popular terms in a set. They are the terms that have undergone a significant change in popularity measured between a foreground and background set. If the term “H5N1” only exists in 5 documents in a 10 million document index and yet is found in 4 of the 100 documents that make up a user’s search results that is significant and probably very relevant to their search. 5/10,000,000 vs 4/100 is a big swing in frequency.
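The "5/10,000,000 vs 4/100" comparison can be made concrete. Elasticsearch's default significance heuristic, JLH, multiplies the absolute change in a term's popularity by its relative change and gives 0 to terms that did not become more popular. The function below follows that description; it is a sketch for intuition, not a reimplementation of the server-side scoring, and `jlh_score` is a hypothetical name.

```python
def jlh_score(fg_count, fg_size, bg_count, bg_size):
    """Absolute change times relative change of a term's document frequency.

    fg_*: docs containing the term / docs in the foreground (search results).
    bg_*: docs containing the term / docs in the background (whole index).
    """
    fg_rate = fg_count / fg_size
    bg_rate = bg_count / bg_size
    # no background occurrences (nothing to compare against) or no increase
    if bg_rate == 0 or fg_rate <= bg_rate:
        return 0.0
    return (fg_rate - bg_rate) * (fg_rate / bg_rate)


# "H5N1": in 4 of 100 result documents, but only 5 of 10,000,000 indexed documents.
relative_change = (4 / 100) / (5 / 10_000_000)
print(relative_change)  # roughly 80,000 times more frequent
print(jlh_score(4, 100, 5, 10_000_000))
```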

Warning

Picking a free-text field as the subject of a significant terms analysis can be expensive! It will attempt to load every unique word into RAM. It is recommended to only use this on smaller indices.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • size – int The size parameter can be set to define how many term buckets should be returned out of the overall terms list. By default, the node coordinating the search process will request each shard to provide its own top size term buckets and once all shards respond, it will reduce the results to the final list that will then be returned to the client. This means that if the number of unique terms is greater than size, the returned list may be slightly inaccurate (the term counts may be slightly off, and a term that should have been in the top size buckets may not be returned at all).

  • shard_size –

    Optional[int] The higher the requested size is, the more accurate the results will be, but also, the more expensive it will be to compute the final results (both due to bigger priority queues that are managed on a shard level and due to bigger data transfers between the nodes and the client).

    The shard_size parameter can be used to minimize the extra work that comes with a bigger requested size. When defined, it will determine how many terms the coordinating node will request from each shard. Once all the shards have responded, the coordinating node will then reduce them to a final result which will be based on the size parameter. This way, one can increase the accuracy of the returned terms and avoid the overhead of streaming a big list of buckets back to the client.

  • min_doc_count –

    int It is possible to only return terms that match at least a configured number of hits using the min_doc_count option. Default value is 1.

    Terms are collected and ordered on a shard level and merged with the terms collected from other shards in a second step. However, the shard does not have the information about the global document count available. The decision whether a term is added to a candidate list depends only on the order computed on the shard using local shard frequencies. The min_doc_count criterion is only applied after merging local terms statistics of all shards. In a way, the decision to add the term as a candidate is made without being certain whether the term will actually reach the required min_doc_count. This might cause many (globally) frequent terms to be missing in the final result if infrequent terms populated the candidate lists. To avoid this, the shard_size parameter can be increased to allow more candidate terms on the shards. However, this increases memory consumption and network traffic.

  • shard_min_doc_count –

    Optional[int] The parameter shard_min_doc_count regulates the certainty a shard has about whether the term should actually be added to the candidate list with respect to the min_doc_count. Terms will only be considered if their local shard frequency within the set is higher than the shard_min_doc_count. If your dictionary contains many infrequent terms and you are not interested in those (for example misspellings), then you can set the shard_min_doc_count parameter to filter out candidate terms on a shard level that will, with reasonable certainty, not reach the required min_doc_count even after merging the local counts. shard_min_doc_count is set to 0 by default and has no effect unless you explicitly set it.

    Note

    Setting min_doc_count=0 will also return buckets for terms that didn’t match any hit. However, some of the returned terms which have a document count of zero might only belong to deleted documents or documents from other types, so there is no guarantee that a match_all query would find a positive document count for those terms.

    Warning

    When NOT sorting on doc_count descending, high values of min_doc_count may return a number of buckets which is less than size because not enough data was gathered from the shards. Missing buckets can be brought back by increasing shard_size. Setting shard_min_doc_count too high will cause terms to be filtered out on a shard level. This value should be set much lower than min_doc_count/#shards.

  • execution_hint –

    str There are different mechanisms by which terms aggregations can be executed:

    • by using field values directly in order to aggregate data per-bucket (map)

    • by using global ordinals of the field and allocating one bucket per global ordinal (global_ordinals)

    Elasticsearch tries to have sensible defaults so this is something that generally doesn’t need to be configured.

    global_ordinals is the default option for keyword fields; it uses global ordinals to allocate buckets dynamically, so memory usage is linear in the number of values of the documents that are part of the aggregation scope.

    map should only be considered when very few documents match a query. Otherwise the ordinals-based execution mode is significantly faster. By default, map is only used when running an aggregation on scripts, since they don’t have ordinals.

  • include –

    Optional[Union[str, Sequence[str], Mapping[str, int]]] A regexp pattern that filters the terms for which buckets will be created.

    Alternatively, this can be a list of strings (exact terms).

    Partition expressions are also possible.

  • exclude –

    Optional[Union[str, Sequence[str]]] A regexp pattern matching terms for which no buckets will be created.

    Alternatively, this can be a list of strings (exact terms).

  • script – Optional[dict] Generating the terms using a script

Returns

'AggregationInterface' A new instance is created and returned

agg_terms(*aggregation_name: Optional[str], field: str, size: int = 10, shard_size: Optional[int] = None, show_term_doc_count_error: Optional[bool] = None, order: Optional[Union[Mapping, str]] = None, min_doc_count: int = 1, shard_min_doc_count: Optional[int] = None, include: Optional[Union[str, Sequence[str], Mapping[str, int]]] = None, exclude: Optional[Union[str, Sequence[str]]] = None, missing: Optional[Any] = None, script: Optional[dict] = None)¶

A multi-bucket value source based aggregation where buckets are dynamically built - one per unique value.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • size – int The size parameter can be set to define how many term buckets should be returned out of the overall terms list. By default, the node coordinating the search process will request each shard to provide its own top size term buckets and once all shards respond, it will reduce the results to the final list that will then be returned to the client. This means that if the number of unique terms is greater than size, the returned list may be slightly inaccurate (the term counts may be slightly off, and a term that should have been in the top size buckets may not be returned at all).

  • shard_size –

    Optional[int] The higher the requested size is, the more accurate the results will be, but also, the more expensive it will be to compute the final results (both due to bigger priority queues that are managed on a shard level and due to bigger data transfers between the nodes and the client).

    The shard_size parameter can be used to minimize the extra work that comes with a bigger requested size. When defined, it will determine how many terms the coordinating node will request from each shard. Once all the shards have responded, the coordinating node will then reduce them to a final result which will be based on the size parameter. This way, one can increase the accuracy of the returned terms and avoid the overhead of streaming a big list of buckets back to the client.

  • show_term_doc_count_error –

    Optional[bool] This shows an error value for each term returned by the aggregation which represents the worst case error in the document count and can be useful when deciding on a value for the shard_size parameter. This is calculated by summing the document counts for the last term returned by all shards which did not return the term.

    These errors can only be calculated in this way when the terms are ordered by descending document count. When the aggregation is ordered by the terms values themselves (either ascending or descending) there is no error in the document count since if a shard does not return a particular term which appears in the results from another shard, it must not have that term in its index. When the aggregation is either sorted by a sub aggregation or in order of ascending document count, the error in the document counts cannot be determined and is given a value of -1 to indicate this.

  • order –

    Optional[Union[Mapping, str]] The order of the buckets can be customized by setting the order parameter. By default, the buckets are ordered by their doc_count descending.

    Warning

    Sorting by ascending _count or by sub aggregation is discouraged as it increases the error on document counts. It is fine when a single shard is queried, or when the field that is being aggregated was used as a routing key at index time: in these cases results will be accurate since shards have disjoint values. Otherwise, errors are unbounded. One particular case that could still be useful is sorting by min or max aggregation: counts will not be accurate but at least the top buckets will be correctly picked.

  • min_doc_count –

    int It is possible to only return terms that match at least a configured number of hits using the min_doc_count option. Default value is 1.

    Terms are collected and ordered on a shard level and merged with the terms collected from other shards in a second step. However, the shard does not have the information about the global document count available. The decision whether a term is added to a candidate list depends only on the order computed on the shard using local shard frequencies. The min_doc_count criterion is only applied after merging local terms statistics of all shards. In a way, the decision to add the term as a candidate is made without being certain whether the term will actually reach the required min_doc_count. This might cause many (globally) frequent terms to be missing in the final result if infrequent terms populated the candidate lists. To avoid this, the shard_size parameter can be increased to allow more candidate terms on the shards. However, this increases memory consumption and network traffic.

  • shard_min_doc_count –

    Optional[int] The parameter shard_min_doc_count regulates the certainty a shard has about whether the term should actually be added to the candidate list with respect to the min_doc_count. Terms will only be considered if their local shard frequency within the set is higher than the shard_min_doc_count. If your dictionary contains many infrequent terms and you are not interested in those (for example misspellings), then you can set the shard_min_doc_count parameter to filter out candidate terms on a shard level that will, with reasonable certainty, not reach the required min_doc_count even after merging the local counts. shard_min_doc_count is set to 0 by default and has no effect unless you explicitly set it.

    Note

    Setting min_doc_count=0 will also return buckets for terms that didn’t match any hit. However, some of the returned terms which have a document count of zero might only belong to deleted documents or documents from other types, so there is no guarantee that a match_all query would find a positive document count for those terms.

    Warning

    When NOT sorting on doc_count descending, high values of min_doc_count may return a number of buckets which is less than size because not enough data was gathered from the shards. Missing buckets can be brought back by increasing shard_size. Setting shard_min_doc_count too high will cause terms to be filtered out on a shard level. This value should be set much lower than min_doc_count/#shards.

  • include –

    Optional[Union[str, Sequence[str], Mapping[str, int]]] A regexp pattern that filters the terms for which buckets will be created.

    Alternatively, this can be a list of strings (exact terms).

    Partition expressions are also possible.

  • exclude –

    Optional[Union[str, Sequence[str]]] A regexp pattern matching terms for which no buckets will be created.

    Alternatively, this can be a list of strings (exact terms).

  • missing – Optional[Any] The missing parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value.

  • script – Optional[dict] Generating the terms using a script

Returns

'AggregationInterface' A new instance is created and returned
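The shard-level behaviour behind size, shard_size and show_term_doc_count_error can be reproduced with a small model: each shard reports its own top terms, the coordinating node merges them, and a term's worst-case error is the sum of the last reported counts of the shards that did not report it. This is a simplified sketch for descending doc-count order only; for simplicity it defaults shard_size to size (Elasticsearch's own default is larger), and the shard data and the `distributed_terms` helper are hypothetical.

```python
def top_terms(counts, n):
    """Top n (term, count) pairs by count descending, ties broken by term."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


def distributed_terms(shards, size, shard_size=None):
    """Model of a terms aggregation ordered by doc_count descending.

    shards: one {term: doc_count} dict per shard.
    Returns (buckets, errors): the merged top `size` (term, count) pairs and the
    worst-case count error of each returned term.
    """
    shard_size = size if shard_size is None else shard_size
    merged = {}
    reports = []
    for shard in shards:
        top = top_terms(shard, shard_size)
        for term, count in top:
            merged[term] = merged.get(term, 0) + count
        # A shard that returned fewer than shard_size terms reported everything it
        # has; otherwise an unreported term has at most the last reported count.
        last = top[-1][1] if top and len(top) == shard_size else 0
        reports.append(({term for term, _ in top}, last))
    buckets = top_terms(merged, size)
    errors = {
        term: sum(last for reported, last in reports if term not in reported)
        for term, _ in buckets
    }
    return buckets, errors


shards = [{"a": 10, "b": 8, "c": 1}, {"a": 6, "c": 5, "b": 1}]
# "b" is reported as 8 although its true total is 9; its error bound is 5.
print(distributed_terms(shards, size=2))
# Asking every shard for more terms makes the counts exact.
print(distributed_terms(shards, size=2, shard_size=3))
```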

aggregation(*aggregation_name_type, **params) → elastipy.aggregation.aggregation.Aggregation[source]¶

Creates an aggregation.

Either call

aggregation(“sum”, field=…) to create an automatic name

or call

aggregation(“my_name”, “sum”, field=…) to set aggregation name explicitly

Parameters
  • aggregation_name_type – one or two strings, meaning either “type” or “name”, “type”

  • params – all parameters of the aggregation function

Returns

Aggregation instance

bool(must: Optional[Union[elastipy.query.generated_interface.QueryInterface, Mapping, Sequence[Union[elastipy.query.generated_interface.QueryInterface, Mapping]]]] = None, must_not: Optional[Union[elastipy.query.generated_interface.QueryInterface, Mapping, Sequence[Union[elastipy.query.generated_interface.QueryInterface, Mapping]]]] = None, should: Optional[Union[elastipy.query.generated_interface.QueryInterface, Mapping, Sequence[Union[elastipy.query.generated_interface.QueryInterface, Mapping]]]] = None, filter: Optional[Union[elastipy.query.generated_interface.QueryInterface, Mapping, Sequence[Union[elastipy.query.generated_interface.QueryInterface, Mapping]]]] = None) → elastipy.query.generated_interface.QueryInterface¶

A query that matches documents matching boolean combinations of other queries. The bool query maps to Lucene BooleanQuery. It is built using one or more boolean clauses, each clause with a typed occurrence.

The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.

elasticsearch documentation

Parameters
  • must – Optional[Union['QueryInterface', Mapping, Sequence[Union['QueryInterface', Mapping]]]] The clause (query) must appear in matching documents and will contribute to the score.

  • must_not – Optional[Union['QueryInterface', Mapping, Sequence[Union['QueryInterface', Mapping]]]] The clause (query) must not appear in the matching documents. Clauses are executed in filter context meaning that scoring is ignored and clauses are considered for caching. Because scoring is ignored, a score of 0 for all documents is returned.

  • should – Optional[Union['QueryInterface', Mapping, Sequence[Union['QueryInterface', Mapping]]]] The clause (query) should appear in the matching document.

  • filter – Optional[Union['QueryInterface', Mapping, Sequence[Union['QueryInterface', Mapping]]]] The clause (query) must appear in matching documents. However unlike must the score of the query will be ignored. Filter clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching.

Returns

'QueryInterface' A new instance is created
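Each clause parameter accepts a single query or a sequence of queries. The sketch below shows how such arguments map onto the Elasticsearch bool query JSON, using plain dicts in place of elastipy query objects; `bool_query` is a hypothetical helper and the field names in the example are made up.

```python
from collections.abc import Mapping


def _as_list(clauses):
    """Normalize None, a single query mapping, or a sequence of them to a list."""
    if clauses is None:
        return []
    if isinstance(clauses, Mapping):
        return [clauses]
    return list(clauses)


def bool_query(must=None, must_not=None, should=None, filter=None):
    """Build a bool query body; clause types without queries are left out.

    `filter` shadows the builtin on purpose, to mirror the parameter name above.
    """
    body = {}
    for occur, clauses in (("must", must), ("must_not", must_not),
                           ("should", should), ("filter", filter)):
        clauses = _as_list(clauses)
        if clauses:
            body[occur] = clauses
    return {"bool": body}


query = bool_query(
    must={"match": {"title": "elasticsearch"}},    # must match, adds to the score
    filter=[{"term": {"status": "published"}}],    # must match, not scored
    must_not={"range": {"year": {"lt": 2015}}},    # excluded, not scored
)
print({"query": query})
```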

client(client)[source]¶

Replace the client that will be used for requests.

Parameters

client – an elasticsearch.Elasticsearch client or compatible

Returns

new Search instance

copy()[source]¶

Make a copy of this instance and its queries.

Warning

Copying of Aggregations is currently not supported so aggregations must be added at the last step, after all queries are applied.

Returns

a new Search instance

property dump¶

Access the print interface

execute() → elastipy.search.Response[source]¶

Sends the search against the current client and returns the response. If no client is specified, elastipy.connections.get(“default”) will be used.

Returns

Response, a dict wrapper with some convenience methods

get_client()[source]¶

Return current client

get_index() → str[source]¶

Return current index

get_query()[source]¶

Return current query

index(index: str)[source]¶

Replace the index.

Parameters

index – str

Returns

new Search instance

match(field: str, query: Union[str, int, float, bool], auto_generate_synonyms_phrase_query: bool = True, fuzziness: Optional[str] = None, max_expansions: int = 50, prefix_length: int = 0, fuzzy_transpositions: bool = True, fuzzy_rewrite: Optional[str] = None, lenient: bool = False, operator: Optional[str] = None, minimum_should_match: Optional[str] = None, zero_terms_query: str = 'none') → elastipy.query.generated_interface.QueryInterface¶

Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.

The match query is the standard query for performing a full-text search, including options for fuzzy matching.

elasticsearch documentation

Parameters
  • field – str Field you wish to search.

  • query –

    Union[str, int, float, bool] Text, number, boolean value or date you wish to find in the provided <field>.

    The match query analyzes any provided text before performing a search. This means the match query can search text fields for analyzed tokens rather than an exact term.

  • auto_generate_synonyms_phrase_query – bool If true, match phrase queries are automatically created for multi-term synonyms. Defaults to true.

  • fuzziness – Optional[str] Maximum edit distance allowed for matching. See Fuzziness for valid values and more information. See Fuzziness in the match query for an example.

  • max_expansions – int Maximum number of terms to which the query will expand. Defaults to 50.

  • prefix_length – int Number of beginning characters left unchanged for fuzzy matching. Defaults to 0.

  • fuzzy_transpositions – bool If true, edits for fuzzy matching include transpositions of two adjacent characters (ab → ba). Defaults to true.

  • fuzzy_rewrite –

    Optional[str] Method used to rewrite the query. See the rewrite parameter for valid values and more information.

    If the fuzziness parameter is not 0, the match query uses a fuzzy_rewrite method of top_terms_blended_freqs_${max_expansions} by default.

  • lenient – bool If true, format-based errors, such as providing a text query value for a numeric field, are ignored. Defaults to false.

  • operator –

    Optional[str] Boolean logic used to interpret text in the query value. Valid values are:

    • OR (Default) For example, a query value of capital of Hungary is interpreted as capital OR of OR Hungary.

    • AND For example, a query value of capital of Hungary is interpreted as capital AND of AND Hungary.

  • minimum_should_match – Optional[str] Minimum number of clauses that must match for a document to be returned. See the minimum_should_match parameter for valid values and more information.

  • zero_terms_query –

    str Indicates whether no documents are returned if the analyzer removes all tokens, such as when using a stop filter. Valid values are:

    • none (Default) No documents are returned if the analyzer removes all tokens.

    • all Returns all documents, similar to a match_all query.

Returns

'QueryInterface' A new instance is created
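The operator parameter decides whether any term or every term of the analyzed query must occur in a document. The toy model below shows the difference. It assumes a trivial "analyzer" (lowercase and split on whitespace) instead of a real Elasticsearch analyzer, and the field name `message` in the reference request body is hypothetical.

```python
# Elasticsearch request fragment (for reference only).
MATCH_AND = {"match": {"message": {"query": "capital of Hungary", "operator": "AND"}}}


def matches(query_text: str, doc_text: str, operator: str = "OR") -> bool:
    # Toy analyzer: lowercase and split on whitespace.
    query_terms = set(query_text.lower().split())
    doc_terms = set(doc_text.lower().split())
    if operator.upper() == "AND":
        return query_terms <= doc_terms      # every query term must occur
    return bool(query_terms & doc_terms)     # one shared term is enough


print(matches("capital of Hungary", "Budapest is the capital"))         # True
print(matches("capital of Hungary", "Budapest is the capital", "AND"))  # False
```

minimum_should_match sits between these two extremes by requiring a given number or percentage of the terms.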

match_all(boost: Optional[float] = None) → elastipy.query.generated_interface.QueryInterface¶

The simplest query, which matches all documents, giving them all a _score of 1.0.

The _score can be changed with the boost parameter.

elasticsearch documentation

Parameters

boost – Optional[float] The _score can be changed with the boost parameter

Returns

'QueryInterface' A new instance is created

match_none() → elastipy.query.generated_interface.QueryInterface¶

This is the inverse of the match_all query, which matches no documents.

elasticsearch documentation

Returns

'QueryInterface' A new instance is created

metric(*aggregation_name_type, **params)¶

Alias for aggregation()

metric_avg(*aggregation_name: Optional[str], field: str, missing: Optional[Any] = None, script: Optional[dict] = None, return_self: bool = False)¶

A single-value metrics aggregation that computes the average of numeric values that are extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents, or be generated by a provided script.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • missing – Optional[Any]

  • script – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.
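The return_self flag only changes what the call returns; the new metric is attached to the parent either way. The sketch below models that convention with a hypothetical `MiniAgg` class. The class and its method are illustrations, not elastipy's API.

```python
class MiniAgg:
    """Hypothetical aggregation node; not elastipy's Aggregation class."""

    def __init__(self, type_, name=None, **params):
        self.type = type_
        self.name = name
        self.params = params
        self.children = []

    def metric(self, type_, name=None, return_self=False, **params):
        child = MiniAgg(type_, name, **params)
        self.children.append(child)  # always attached to the parent
        # Return the new metric only when asked; otherwise allow chaining on the parent.
        return child if return_self else self


by_user = MiniAgg("terms", "by_user", field="user")
by_user.metric("avg", "avg_size", field="size").metric("max", "max_size", field="size")
total = by_user.metric("sum", "total_size", field="size", return_self=True)
print(len(by_user.children), total.type)  # 3 sum
```

Returning the parent by default makes it easy to hang several metrics off one bucket aggregation in a single chained expression.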

metric_boxplot(*aggregation_name: Optional[str], field: str, compression: int = 100, missing: Optional[Any] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • compression – int

  • missing – Optional[Any]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_cardinality(*aggregation_name: Optional[str], field: str, precision_threshold: int = 3000, missing: Optional[Any] = None, script: Optional[dict] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • precision_threshold – int

  • missing – Optional[Any]

  • script – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_extended_stats(*aggregation_name: Optional[str], field: str, sigma: float = 3.0, missing: Optional[Any] = None, script: Optional[dict] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • sigma – float

  • missing – Optional[Any]

  • script – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_geo_bounds(*aggregation_name: Optional[str], field: str, wrap_longitude: bool = True, return_self: bool = False)¶

A metric aggregation that computes the bounding box containing all geo values for a field.

The Geo Bounds Aggregation is also supported on geo_shape fields.

If wrap_longitude is set to true (the default), the bounding box can overlap the international date line and return bounds where the top_left longitude is larger than the top_right longitude.

For example, the upper right longitude will typically be greater than the lower left longitude of a geographic bounding box. However, when the area crosses the 180° meridian, the value of the lower left longitude will be greater than the value of the upper right longitude. See Geographic bounding box on the Open Geospatial Consortium website for more information.
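The effect of wrap_longitude can be sketched in one dimension: compare the width of the ordinary west-to-east extent with the width of the extent that crosses the 180° meridian, and keep the narrower one. The function below is a hypothetical illustration. It assumes longitudes in [-180, 180] and is not Elasticsearch's implementation.

```python
from typing import Iterable, Tuple


def longitude_bounds(lons: Iterable[float], wrap_longitude: bool = True) -> Tuple[float, float]:
    """Return (left, right) longitudes of the narrowest extent covering lons."""
    lons = list(lons)
    lo, hi = min(lons), max(lons)
    if not wrap_longitude:
        return lo, hi
    # Shift negative longitudes by +360 so points on both sides of the date line become contiguous.
    shifted = [lon + 360 if lon < 0 else lon for lon in lons]
    s_lo, s_hi = min(shifted), max(shifted)
    if s_hi - s_lo < hi - lo:
        # Crossing the date line is narrower: map the right edge back into [-180, 180].
        right = s_hi - 360 if s_hi > 180 else s_hi
        return s_lo, right
    return lo, hi


print(longitude_bounds([170, -170, 175]))                        # (170, -170)
print(longitude_bounds([170, -170, 175], wrap_longitude=False))  # (-170, 175)
```

In the wrapped result the left longitude (170) is greater than the right longitude (-170), which is exactly the situation described above; without wrapping, the box spans 345° instead of 20°.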

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str The field defining the geo_point or geo_shape

  • wrap_longitude – bool An optional parameter which specifies whether the bounding box should be allowed to overlap the international date line. The default value is true.

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_geo_centroid(*aggregation_name: Optional[str], field: str, return_self: bool = False)¶

A metric aggregation that computes the weighted centroid from all coordinate values for geo fields.

The centroid metric for geo-shapes is more nuanced than for points. The centroid of a specific aggregation bucket containing shapes is the centroid of the highest-dimensionality shape type in the bucket. For example, if a bucket contains shapes consisting of polygons and lines, then the lines do not contribute to the centroid metric. Each type of shape’s centroid is calculated differently. Envelopes and circles ingested via the Circle processor are treated as polygons.

Warning

Using geo_centroid as a sub-aggregation of geohash_grid:

The geohash_grid aggregation places documents, not individual geo-points, into buckets. If a document’s geo_point field contains multiple values, the document could be assigned to multiple buckets, even if one or more of its geo-points are outside the bucket boundaries.

If a geo_centroid sub-aggregation is also used, each centroid is calculated using all geo-points in a bucket, including those outside the bucket boundaries. This can result in centroids outside of bucket boundaries.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str The field defining the geo_point or geo_shape

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_matrix_stats(*aggregation_name: Optional[str], fields: list, mode: str = 'avg', missing: Optional[Any] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • fields – list

  • mode – str

  • missing – Optional[Any]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_max(*aggregation_name: Optional[str], field: str, missing: Optional[Any] = None, script: Optional[dict] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • missing – Optional[Any]

  • script – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_median_absolute_deviation(*aggregation_name: Optional[str], field: str, compression: int = 1000, missing: Optional[Any] = None, script: Optional[dict] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • compression – int

  • missing – Optional[Any]

  • script – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_min(*aggregation_name: Optional[str], field: str, missing: Optional[Any] = None, script: Optional[dict] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • missing – Optional[Any]

  • script – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_percentile_ranks(*aggregation_name: Optional[str], field: str, values: list, keyed: bool = True, hdr__number_of_significant_value_digits: Optional[int] = None, missing: Optional[Any] = None, script: Optional[dict] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • values – list

  • keyed – bool

  • hdr__number_of_significant_value_digits – Optional[int]

  • missing – Optional[Any]

  • script – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_percentiles(*aggregation_name: Optional[str], field: str, percents: list = '(1, 5, 25, 50, 75, 95, 99)', keyed: bool = True, tdigest__compression: int = 100, hdr__number_of_significant_value_digits: Optional[int] = None, missing: Optional[Any] = None, script: Optional[dict] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • percents – list

  • keyed – bool

  • tdigest__compression – int

  • hdr__number_of_significant_value_digits – Optional[int]

  • missing – Optional[Any]

  • script – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_rate(*aggregation_name: Optional[str], unit: str, field: Optional[str] = None, script: Optional[dict] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • unit – str

  • field – Optional[str]

  • script – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_scripted_metric(*aggregation_name: Optional[str], map_script: str, combine_script: str, reduce_script: str, init_script: Optional[str] = None, params: Optional[dict] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • map_script – str

  • combine_script – str

  • reduce_script – str

  • init_script – Optional[str]

  • params – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_stats(*aggregation_name: Optional[str], field: str, missing: Optional[Any] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • missing – Optional[Any]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_string_stats(*aggregation_name: Optional[str], field: str, show_distribution: bool = False, missing: Optional[Any] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • show_distribution – bool

  • missing – Optional[Any]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_sum(*aggregation_name: Optional[str], field: str, missing: Optional[Any] = None, script: Optional[dict] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – str

  • missing – Optional[Any]

  • script – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_t_test(*aggregation_name: Optional[str], a__field: str, b__field: str, type: str, a__filter: Optional[dict] = None, b__filter: Optional[dict] = None, script: Optional[dict] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • a__field – str

  • b__field – str

  • type – str

  • a__filter – Optional[dict]

  • b__filter – Optional[dict]

  • script – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_top_hits(*aggregation_name: Optional[str], size: int, sort: Optional[dict] = None, _source: Optional[dict] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • size – int

  • sort – Optional[dict]

  • _source – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_top_metrics(*aggregation_name: Optional[str], metrics: dict, sort: Optional[dict] = None, return_self: bool = False)¶

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • metrics – dict

  • sort – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_value_count(*aggregation_name: Optional[str], field: Optional[str] = None, script: Optional[dict] = None, return_self: bool = False)¶

A single-value metrics aggregation that counts the number of values that are extracted from the aggregated documents. These values can be extracted either from specific fields in the documents, or be generated by a provided script. Typically, this aggregator will be used in conjunction with other single-value aggregations. For example, when computing the avg one might be interested in the number of values the average is computed over.

value_count does not de-duplicate values, so even if a field has duplicates (or a script generates multiple identical values for a single document), each value will be counted individually.
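The difference between counting values and counting distinct values can be shown on a few in-memory documents. The documents and helper names below are hypothetical. The distinct count is exact here, whereas Elasticsearch's cardinality aggregation only approximates it.

```python
docs = [
    {"tags": ["red", "red", "blue"]},  # duplicates within one document
    {"tags": ["blue"]},
    {},                                # no value: contributes nothing
]


def value_count(docs, field):
    # Every value is counted, duplicates included.
    return sum(len(doc.get(field, [])) for doc in docs)


def distinct_count(docs, field):
    # Exact number of distinct values, for comparison.
    return len({value for doc in docs for value in doc.get(field, [])})


print(value_count(docs, "tags"))     # 4
print(distinct_count(docs, "tags"))  # 2
```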

Note

Because value_count is designed to work with any field, it internally treats all values as simple bytes. Due to this implementation, if the _value script variable is used to fetch a value instead of accessing the field directly (e.g. a “value script”), the field value will be returned as a string instead of its native format.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • field – Optional[str] The field whose values should be counted

  • script – Optional[dict] Alternatively, count the values generated by a script

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

metric_weighted_avg(*aggregation_name: Optional[str], value__field: str, weight__field: str, value__missing: Optional[Any] = None, weight__missing: Optional[Any] = None, format: Optional[str] = None, value_type: Optional[str] = None, script: Optional[dict] = None, return_self: bool = False)¶

A single-value metrics aggregation that computes the weighted average of numeric values that are extracted from the aggregated documents. These values are extracted from specific numeric fields in the documents.

When calculating a regular average, each datapoint has an equal “weight”: it contributes equally to the final value. Weighted averages, on the other hand, weight each datapoint differently. The amount that each datapoint contributes to the final value is extracted from the document, or provided by a script.

As a formula, a weighted average is ∑(value * weight) / ∑(weight).

A regular average can be thought of as a weighted average where every value has an implicit weight of 1.
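The formula ∑(value * weight) / ∑(weight) can be checked by hand with a short function. The data and field names are hypothetical. The reference request body uses the Elasticsearch value/weight objects, which the value__field and weight__field parameters appear to mirror.

```python
# Elasticsearch request fragment (for reference only; field names are hypothetical).
WEIGHTED_GRADE = {"weighted_avg": {"value": {"field": "grade"}, "weight": {"field": "weight"}}}


def weighted_avg(values, weights):
    # sum(value * weight) / sum(weight)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


grades = [2, 4]
print(weighted_avg(grades, [3, 1]))  # 2.5
print(weighted_avg(grades, [1, 1]))  # 3.0, the regular average
```

With all weights set to 1 the result equals the plain mean, matching the remark above.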

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • value__field – str The field that values should be extracted from

  • weight__field – str The field that weights should be extracted from

  • value__missing – Optional[Any] A value to use if the field is missing entirely

  • weight__missing – Optional[Any] A weight to use if the field is missing entirely

  • format – Optional[str]

  • value_type – Optional[str]

  • script – Optional[dict]

  • return_self – bool If True, this call returns the created metric, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.

property param¶

Access to the search parameters

pipeline(*aggregation_name_type, **params)¶

Alias for aggregation()

pipeline_avg_bucket(*aggregation_name: Optional[str], buckets_path: str, gap_policy: str = 'skip', format: Optional[str] = None, return_self: bool = False)¶

A sibling pipeline aggregation which calculates the (mean) average value of a specified metric in a sibling aggregation. The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • buckets_path –

    str The path to the buckets we wish to find the average for.

    See: bucket path syntax

  • gap_policy –

    str The policy to apply when gaps are found in the data.

    See: gap policy

  • format – Optional[str] Format to apply to the output value of this aggregation

  • return_self – bool If True, this call returns the created pipeline, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.
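A sibling avg_bucket averages one metric across the buckets of another aggregation, and gap_policy decides what happens to buckets where that metric is missing. The sketch below covers the skip and insert_zeros policies. The helper and the aggregation names in the reference path are hypothetical, and a missing metric is modeled as None.

```python
from typing import List, Optional

# Elasticsearch request fragment (for reference only; aggregation names are hypothetical).
AVG_MONTHLY_SALES = {"avg_bucket": {"buckets_path": "sales_per_month>sales", "gap_policy": "skip"}}


def avg_bucket(values: List[Optional[float]], gap_policy: str = "skip") -> Optional[float]:
    if gap_policy == "skip":
        present = [v for v in values if v is not None]          # ignore empty buckets
    elif gap_policy == "insert_zeros":
        present = [0.0 if v is None else v for v in values]     # treat gaps as zero
    else:
        raise ValueError(f"unsupported gap_policy: {gap_policy}")
    return sum(present) / len(present) if present else None


monthly_sales = [10.0, None, 20.0]
print(avg_bucket(monthly_sales))                  # 15.0
print(avg_bucket(monthly_sales, "insert_zeros"))  # 10.0
```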

pipeline_bucket_script(*aggregation_name: Optional[str], script: str, buckets_path: Mapping[str, str], gap_policy: str = 'skip', format: Optional[str] = None, return_self: bool = False)¶

A parent pipeline aggregation which executes a script that can perform per-bucket computations on specified metrics in the parent multi-bucket aggregation. The specified metric must be numeric and the script must return a numeric value.

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • script – str The script to run for this aggregation. The script can be inline, file or indexed. (see Scripting for more details)

  • buckets_path – Mapping[str, str] A map of script variables and their associated path to the buckets we wish to use for the variable (see buckets_path Syntax for more details)

  • gap_policy – str The policy to apply when gaps are found in the data (see Dealing with gaps in the data for more details)

  • format – Optional[str] Format to apply to the output value of this aggregation

  • return_self – bool If True, this call returns the created pipeline, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.
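bucket_script binds each variable in buckets_path to a metric of the current bucket and then evaluates the script once per bucket. The emulation below makes that loop explicit. A Python lambda stands in for the Painless script, the bucket data is hypothetical, and the reference fragment is written in the style of the Elasticsearch documentation.

```python
from typing import Callable, Dict, List

# Elasticsearch request fragment (for reference only): nested under a multi-bucket
# aggregation, it divides two metrics in every bucket.
T_SHIRT_PERCENTAGE = {
    "bucket_script": {
        "buckets_path": {"tShirtSales": "t-shirts>sales", "totalSales": "total_sales"},
        "script": "params.tShirtSales / params.totalSales * 100",
    }
}


def bucket_script(buckets: List[Dict[str, float]],
                  buckets_path: Dict[str, str],
                  script: Callable[[Dict[str, float]], float]) -> List[float]:
    results = []
    for bucket in buckets:
        # Bind each script variable to the metric its path points at in this bucket.
        params = {var: bucket[path] for var, path in buckets_path.items()}
        results.append(script(params))
    return results


monthly = [
    {"t_shirt_sales": 100.0, "total_sales": 550.0},
    {"t_shirt_sales": 50.0, "total_sales": 200.0},
]
percentages = bucket_script(
    monthly,
    {"tShirtSales": "t_shirt_sales", "totalSales": "total_sales"},
    lambda p: p["tShirtSales"] / p["totalSales"] * 100,  # stands in for the Painless script
)
print(percentages[1])  # 25.0
```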

pipeline_derivative(*aggregation_name: Optional[str], buckets_path: str, gap_policy: str = 'skip', format: Optional[str] = None, units: Optional[str] = None, return_self: bool = False)¶

A parent pipeline aggregation which calculates the derivative of a specified metric in a parent histogram (or date_histogram) aggregation. The specified metric must be numeric and the enclosing histogram must have min_doc_count set to 0 (default for histogram aggregations).

elasticsearch documentation

Parameters
  • aggregation_name – Optional[str] Optional name of the aggregation. Otherwise it will be auto-generated.

  • buckets_path –

    str The path to the buckets we wish to calculate the derivative for.

    See: bucket path syntax

  • gap_policy –

    str The policy to apply when gaps are found in the data.

    See: gap policy

  • format – Optional[str] Format to apply to the output value of this aggregation

  • units – Optional[str] The derivative aggregation allows the units of the derivative values to be specified. This returns an extra field in the response normalized_value which reports the derivative value in the desired x-axis units.

  • return_self – bool If True, this call returns the created pipeline, otherwise the parent is returned.

Returns

'AggregationInterface' A new instance is created and attached to the parent and the parent is returned, unless ‘return_self’ is True, in which case the new instance is returned.
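The derivative of a histogram metric is the difference between consecutive buckets, so the first bucket has no value. When units is set, the difference is rescaled to the change per unit of x-axis distance and reported as normalized_value. The sketch below assumes numeric bucket keys (here, days) and a numeric unit in the same scale; the helper name is hypothetical.

```python
from typing import Dict, List, Optional


def derivative(keys: List[float], values: List[float],
               unit: Optional[float] = None) -> List[Dict[str, Optional[float]]]:
    """First differences between consecutive buckets; the first bucket has none."""
    result = [{"value": None, "normalized_value": None}]
    for i in range(1, len(values)):
        diff = values[i] - values[i - 1]
        entry = {"value": diff, "normalized_value": None}
        if unit is not None:
            # Rescale to the change per `unit` of distance between the bucket keys.
            entry["normalized_value"] = diff / ((keys[i] - keys[i - 1]) / unit)
        result.append(entry)
    return result


# Roughly monthly buckets keyed in days, normalized to a change per day.
for bucket in derivative([0, 30, 60], [100, 160, 130], unit=1):
    print(bucket)
```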

query(query: elastipy.query.generated_interface.QueryInterface)[source]¶

Replace the query.

Parameters

query – a QueryInterface sub-class

Returns

new Search instance

query_string(query: str, default_field: Optional[str] = None, allow_leading_wildcard: bool = True, analyze_wildcard: bool = False, analyzer: Optional[str] = None, auto_generate_synonyms_phrase_query: Optional[bool] = None, boost: float = 1.0, default_operator: Optional[str] = None, enable_position_increments: bool = True, fields: Optional[Sequence[str]] = None, fuzziness: Optional[str] = None, fuzzy_max_expansions: int = 50, fuzzy_prefix_length: int = 0, fuzzy_transpositions: bool = True, lenient: bool = False, max_determinized_states: int = 10000, minimum_should_match: Optional[str] = None, quote_analyzer: Optional[str] = None, phrase_slop: int = 0, quote_field_suffix: Optional[str] = None, rewrite: Optional[str] = None, time_zone: Optional[str] = None) → elastipy.query.generated_interface.QueryInterface¶

Returns documents based on a provided query string, using a parser with a strict syntax.

This query uses a syntax to parse and split the provided query string based on operators, such as AND or NOT. The query then analyzes each split text independently before returning matching documents.

You can use the query_string query to create a complex search that includes wildcard characters, searches across multiple fields, and more. While versatile, the query is strict and returns an error if the query string includes any invalid syntax.

Warning

Because it returns an error for any invalid syntax, we don’t recommend using the query_string query for search boxes.

If you don’t need to support a query syntax, consider using the match query. If you need the features of a query syntax, use the simple_query_string query, which is less strict.

elasticsearch documentation

Parameters
  • query – str Query string you wish to parse and use for search. See Query string syntax.

  • default_field –

    Optional[str] Default field you wish to search if no field is provided in the query string.

    Defaults to the index.query.default_field index setting, which has a default value of *. The * value extracts all fields that are eligible for term queries and filters the metadata fields. All extracted fields are then combined to build a query if no prefix is specified.

    Searching across all eligible fields does not include nested documents. Use a nested query to search those documents.

    For mappings with a large number of fields, searching across all eligible fields could be expensive.

    There is a limit on the number of fields that can be queried at once. It is defined by the indices.query.bool.max_clause_count search setting, which defaults to 1024.

  • allow_leading_wildcard – bool If true, the wildcard characters * and ? are allowed as the first character of the query string. Defaults to true.

  • analyze_wildcard – bool If true, the query attempts to analyze wildcard terms in the query string. Defaults to false.

  • analyzer – Optional[str] Analyzer used to convert text in the query string into tokens. Defaults to the index-time analyzer mapped for the default_field. If no analyzer is mapped, the index’s default analyzer is used.

  • auto_generate_synonyms_phrase_query – Optional[bool] If true, match phrase queries are automatically created for multi-term synonyms. Defaults to true. See Synonyms and the query_string query for an example.

  • boost –

    float Floating point number used to decrease or increase the relevance scores of the query. Defaults to 1.0.

    Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

  • default_operator –

    Optional[str] Default boolean logic used to interpret text in the query string if no operators are specified. Valid values are:

    • OR (Default) For example, a query string of capital of Hungary is interpreted as capital OR of OR Hungary.

    • AND For example, a query string of capital of Hungary is interpreted as capital AND of AND Hungary.

  • enable_position_increments – bool If true, enable position increments in queries constructed from a query_string search. Defaults to true.

  • fields –

    Optional[Sequence[str]] Array of fields you wish to search.

    You can use this parameter to search across multiple fields. See Search multiple fields.

  • fuzziness – Optional[str] Maximum edit distance allowed for matching. See Fuzziness for valid values and more information.

  • fuzzy_max_expansions – int Maximum number of terms to which the query will expand. Defaults to 50.

  • fuzzy_prefix_length – int Number of beginning characters left unchanged for fuzzy matching. Defaults to 0.

  • fuzzy_transpositions – bool If true, edits for fuzzy matching include transpositions of two adjacent characters (ab → ba). Defaults to true.

  • lenient – bool If true, format-based errors, such as providing a text query value for a numeric field, are ignored. Defaults to false.

  • max_determinized_states –

    int Maximum number of automaton states required for the query. Default is 10000.

    Elasticsearch uses Apache Lucene internally to parse regular expressions. Lucene converts each regular expression to a finite automaton containing a number of determinized states.

    You can use this parameter to prevent that conversion from unintentionally consuming too many resources. You may need to increase this limit to run complex regular expressions.

  • minimum_should_match –

    Optional[str] Minimum number of clauses that must match for a document to be returned. See the minimum_should_match parameter for valid values and more information.

    See How minimum_should_match works for an example.

  • quote_analyzer –

    Optional[str] Analyzer used to convert quoted text in the query string into tokens. Defaults to the search_quote_analyzer mapped for the default_field.

    For quoted text, this parameter overrides the analyzer specified in the analyzer parameter.

  • phrase_slop – int Maximum number of positions allowed between matching tokens for phrases. Defaults to 0. If 0, exact phrase matches are required. Transposed terms have a slop of 2.

  • quote_field_suffix –

    Optional[str] Suffix appended to quoted text in the query string.

    Use this suffix to apply a different analysis method to exact matches. See Mixing exact search with stemming.

  • rewrite – Optional[str] Method used to rewrite the query. For valid values and more information, see the rewrite parameter.

  • time_zone –

    Optional[str] Coordinated Universal Time (UTC) offset or IANA time zone used to convert date values in the query string to UTC.

    Valid values are ISO 8601 UTC offsets, such as +01:00 or -08:00, and IANA time zone IDs, such as America/Los_Angeles.

    Note

    The time_zone parameter does not affect the date math value of now. now is always the current system time in UTC. However, the time_zone parameter does convert dates calculated using now and date math rounding. For example, the time_zone parameter will convert a value of now/d.

Returns

'QueryInterface' A new instance is created
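
The sketch below shows the kind of Query DSL clause these options end up in: a single query_string object whose keys are the parameters that were actually set. The helper query_string_clause is hypothetical and not part of elastipy. It assumes that unset (None) parameters are simply left out, so that Elasticsearch applies its own defaults, and it covers only a few of the parameters listed above.

```python
from typing import Any, Dict, Optional, Sequence


def query_string_clause(
    query: str,
    fields: Optional[Sequence[str]] = None,
    default_operator: Optional[str] = None,
    phrase_slop: Optional[int] = None,
    time_zone: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a Query DSL query_string clause."""
    if default_operator is not None:
        default_operator = default_operator.upper()
        if default_operator not in ("OR", "AND"):
            raise ValueError(f"default_operator must be OR or AND, got {default_operator!r}")
    params: Dict[str, Any] = {
        "query": query,
        "fields": list(fields) if fields is not None else None,
        "default_operator": default_operator,
        "phrase_slop": phrase_slop,
        "time_zone": time_zone,
    }
    # Omit unset parameters so that Elasticsearch falls back to its defaults.
    return {"query_string": {k: v for k, v in params.items() if v is not None}}


# With AND, all three terms of "capital of Hungary" must match.
body = {
    "query": query_string_clause(
        "capital of Hungary", fields=["title", "text"], default_operator="and"
    )
}
print(body)
```

Leaving default_operator unset keeps the documented OR behavior, so the same query string would also match documents that contain only one of the three words.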

range(field: str, gt: Optional[Union[str, int, float, datetime.date, datetime.datetime]] = None, gte: Optional[Union[str, int, float, datetime.date, datetime.datetime]] = None, lt: Optional[Union[str, int, float, datetime.date, datetime.datetime]] = None, lte: Optional[Union[str, int, float, datetime.date, datetime.datetime]] = None, format: Optional[str] = None, relation: str = 'INTERSECTS', time_zone: Optional[str] = None, boost: Optional[float] = None) → elastipy.query.generated_interface.QueryInterface¶

Returns documents that contain terms within a provided range.

When the <field> parameter is a date field data type, you can use date math with the gt, gte, lt, and lte parameters. See date math.

elasticsearch documentation

Parameters
  • field – str Field you wish to search.

  • gt – Optional[Union[str, int, float, date, datetime]] Greater than.

  • gte – Optional[Union[str, int, float, date, datetime]] Greater than or equal to.

  • lt – Optional[Union[str, int, float, date, datetime]] Less than.

  • lte – Optional[Union[str, int, float, date, datetime]] Less than or equal to.

  • format –

    Optional[str] Date format used to convert date values in the query.

    By default, Elasticsearch uses the date format provided in the <field>’s mapping. This value overrides that mapping format.

    For valid syntax, see mapping date format.

  • relation –

    str Indicates how the range query matches values for range fields. Valid values are:

    • INTERSECTS (Default) Matches documents with a range field value that intersects the query’s range.

    • CONTAINS Matches documents with a range field value that entirely contains the query’s range.

    • WITHIN Matches documents with a range field value entirely within the query’s range.

  • time_zone –

    Optional[str] Coordinated Universal Time (UTC) offset or IANA time zone used to convert date values in the query to UTC.

    Valid values are ISO 8601 UTC offsets, such as +01:00 or -08:00, and IANA time zone IDs, such as America/Los_Angeles.

  • boost –

    Optional[float] Floating point number used to decrease or increase the relevance scores of a query. Defaults to 1.0.

    You can use the boost parameter to adjust relevance scores for searches containing two or more queries.

    Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

Returns

'QueryInterface' A new instance is created
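
To show how the bounds combine, here is a minimal sketch that builds the range clause from whichever of gt, gte, lt and lte are set, using date math for a relative time window. The helper range_clause and its date_format argument (standing in for format) are hypothetical, and serializing date and datetime values with isoformat() is an assumption of this sketch rather than documented elastipy behavior.

```python
import datetime
from typing import Any, Dict, Optional, Union

Bound = Optional[Union[str, int, float, datetime.date, datetime.datetime]]


def _encode(value: Bound) -> Any:
    # Assumption: date and datetime values are sent as ISO 8601 strings.
    if isinstance(value, datetime.date):  # also true for datetime.datetime
        return value.isoformat()
    return value


def range_clause(
    field: str,
    gt: Bound = None,
    gte: Bound = None,
    lt: Bound = None,
    lte: Bound = None,
    date_format: Optional[str] = None,
    time_zone: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a Query DSL range clause from the bounds that are set."""
    params = {
        "gt": gt, "gte": gte, "lt": lt, "lte": lte,
        "format": date_format, "time_zone": time_zone,
    }
    return {"range": {field: {k: _encode(v) for k, v in params.items() if v is not None}}}


# The seven full days before today; the /d rounding is done in Berlin time.
print(range_clause("timestamp", gte="now-7d/d", lt="now/d", time_zone="Europe/Berlin"))
# In this sketch a date object is sent as "2021-01-01".
print(range_clause("timestamp", gte=datetime.date(2021, 1, 1)))
```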

property response¶

Access to the response of the search. Raises an exception if accessed before the search has been executed.

Returns

Response, a dict wrapper with some convenience methods

set_response(response: Mapping)[source]¶

Sets the elasticsearch API response.

Use this if you need other means of passing the API response to the Search instance.

Parameters

response – Mapping, the complete response from the /_search endpoint.

Returns

self

size(size)[source]¶

Replace the maximum document count.

Parameters

size – int Number of document hits to return.

Returns

new Search instance

sort(*sort) → elastipy.search.Search[source]¶

Change the order of the returned documents. See sort search results.

The parameter can be:

  • "field" or "-field" to sort a field ascending or descending

  • {"field": "asc"} or {"field": "desc"} to sort a field ascending or descending

  • a list of strings or objects as above to sort by several fields

  • None to turn off sorting

Returns

Search A new Search instance is created
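
The accepted forms map onto the list form of the Elasticsearch sort parameter. The sketch below spells out that mapping; normalize_sort is a hypothetical helper written for illustration, not elastipy's own code, and it assumes that a leading "-" is the only shorthand to expand.

```python
import collections.abc
from typing import Dict, List, Mapping, Optional, Sequence, Union

SortItem = Union[str, Mapping[str, str]]


def normalize_sort(
    sort: Optional[Union[SortItem, Sequence[SortItem]]],
) -> Optional[List[Dict[str, str]]]:
    """Hypothetical helper: rewrite the accepted sort forms as an Elasticsearch sort list."""
    if sort is None:
        return None  # omit "sort"; Elasticsearch then orders hits by _score
    if isinstance(sort, (str, collections.abc.Mapping)):
        sort = [sort]  # a single field name or object
    result: List[Dict[str, str]] = []
    for item in sort:
        if isinstance(item, str):
            # A leading "-" selects descending order, otherwise ascending.
            if item.startswith("-"):
                result.append({item[1:]: "desc"})
            else:
                result.append({item: "asc"})
        else:
            result.append(dict(item))
    return result


print(normalize_sort(["-timestamp", "user", {"price": "desc"}]))
# [{'timestamp': 'desc'}, {'user': 'asc'}, {'price': 'desc'}]
```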

term(field: str, value: Union[str, int, float, bool, datetime.datetime], boost: Optional[float] = None, case_insensitive: Optional[bool] = None) → elastipy.query.generated_interface.QueryInterface¶

Returns documents that contain an exact term in a provided field.

You can use the term query to find documents based on a precise value such as a price, a product ID, or a username.

elasticsearch documentation

Parameters
  • field – str Field you wish to search.

  • value – Union[str, int, float, bool, datetime] Term you wish to find in the provided <field>. To return a document, the term must exactly match the field value, including whitespace and capitalization.

  • boost –

    Optional[float] Floating point number used to decrease or increase the relevance scores of a query. Defaults to 1.0.

    You can use the boost parameter to adjust relevance scores for searches containing two or more queries.

    Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

  • case_insensitive – Optional[bool] Allows ASCII case-insensitive matching of the value with the indexed field values when set to true. Defaults to false, which means the case sensitivity of matching depends on the underlying field’s mapping.

Returns

'QueryInterface' A new instance is created
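
The sketch below shows the term clause in Query DSL form, plus a small local function that mimics the matching rules described above for an unanalyzed (keyword) field without a normalizer. Both term_clause and term_matches are hypothetical. The ASCII-only case folding follows the parameter description; everything else about real Lucene matching is left out.

```python
import string
from typing import Any, Dict, Optional

# Translation table that lowercases only the ASCII letters A-Z.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def term_clause(field: str, value: Any, boost: Optional[float] = None,
                case_insensitive: Optional[bool] = None) -> Dict[str, Any]:
    """Hypothetical helper: build a Query DSL term clause."""
    params = {"value": value, "boost": boost, "case_insensitive": case_insensitive}
    return {"term": {field: {k: v for k, v in params.items() if v is not None}}}


def term_matches(indexed: str, value: str, case_insensitive: bool = False) -> bool:
    """Local illustration of exact term matching against a keyword value.

    Whitespace always matters; with case_insensitive only ASCII letters are folded.
    """
    if case_insensitive:
        return indexed.translate(_ASCII_LOWER) == value.translate(_ASCII_LOWER)
    return indexed == value


print(term_clause("user.id", "kimchy", case_insensitive=True))
print(term_matches("Kimchy", "kimchy"))                          # case differs
print(term_matches("Kimchy", "kimchy", case_insensitive=True))   # ASCII folding
print(term_matches("ÄPFEL", "äpfel", case_insensitive=True))     # Ä is not folded
```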

terms(field: str, value: Sequence[Union[str, int, float, bool, datetime.datetime]], boost: Optional[float] = None) → elastipy.query.generated_interface.QueryInterface¶

Returns documents that contain one or more exact terms in a provided field.

The terms query is the same as the term query, except you can search for multiple values.

elasticsearch documentation

Parameters
  • field – str Field you wish to search.

  • value –

    Sequence[Union[str, int, float, bool, datetime]] The value of this parameter is an array of terms you wish to find in the provided field. To return a document, one or more terms must exactly match a field value, including whitespace and capitalization.

    By default, Elasticsearch limits the terms query to a maximum of 65,536 terms. You can change this limit using the index.max_terms_count setting.

  • boost –

    Optional[float] Floating point number used to decrease or increase the relevance scores of a query. Defaults to 1.0.

    You can use the boost parameter to adjust relevance scores for searches containing two or more queries.

    Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

Returns

'QueryInterface' A new instance is created
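
In the Query DSL, the terms clause holds the field name with its list of values, and boost sits next to it rather than inside it. The hypothetical terms_clause helper below builds that shape and checks the documented default limit of 65,536 terms on the client side. It assumes the target index keeps the default index.max_terms_count; pass a different limit if the setting was changed.

```python
from typing import Any, Dict, Optional, Sequence

# Default value of the index.max_terms_count index setting.
DEFAULT_MAX_TERMS_COUNT = 65_536


def terms_clause(field: str, values: Sequence[Any], boost: Optional[float] = None,
                 max_terms_count: int = DEFAULT_MAX_TERMS_COUNT) -> Dict[str, Any]:
    """Hypothetical helper: build a Query DSL terms clause and check the term limit locally."""
    if isinstance(values, (str, bytes)):
        raise TypeError("values must be a sequence of terms, not a single string")
    if len(values) > max_terms_count:
        raise ValueError(f"{len(values)} terms exceed index.max_terms_count={max_terms_count}")
    clause: Dict[str, Any] = {field: list(values)}
    if boost is not None:
        clause["boost"] = boost  # sibling of the field, not part of the value list
    return {"terms": clause}


print(terms_clause("user.id", ["kimchy", "elkbee"], boost=1.5))
```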

to_body() → dict[source]¶

Returns the complete body of the search request.

Returns

dict

to_request() → dict[source]¶

Returns the complete request parameters as would be accepted by elasticsearch.Elasticsearch.search().

Returns

dict

search parameters¶

class elastipy.generated_search_param.SearchParameters(search)[source]¶

This class is accessed through Search.param.

Each method returns a new Search instance.

Example:

    from elastipy import Search

    s = Search()
    s = s.param.explain(True).param.size(100)

allow_no_indices(value: bool = True) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – bool If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

Returns

Search A new Search instance is created

allow_partial_search_results(value: bool = True) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value –

bool If true, returns partial results if there are request timeouts or shard failures. If false, returns an error with no partial results. Defaults to true.

To override the default for this field, set the search.default_allow_partial_results cluster setting to false.

Returns

Search A new Search instance is created

batched_reduce_size(value: int = 512) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – int The number of shard results that should be reduced at once on the coordinating node. This value should be used as a protection mechanism to reduce the memory overhead per search request if the potential number of shards in the request can be large. Defaults to 512.

Returns

Search A new Search instance is created

ccs_minimize_roundtrips(value: bool = True) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – bool If true, network round-trips between the coordinating node and the remote clusters are minimized when executing cross-cluster search (CCS) requests. See How cross-cluster search handles network delays. Defaults to true.

Returns

Search A new Search instance is created

docvalue_fields(value: Optional[Sequence[Union[Mapping[str, str], str]]] = None) → elastipy.search.Search[source]¶

A search body parameter.

Parameters

value –

Optional[Sequence[Union[str, Mapping[str, str]]]] Array of wildcard (*) patterns. The request returns doc values for field names matching these patterns in the hits.fields property of the response.

You can specify items in the array as a string or object. See Doc value fields.

Properties of docvalue_fields objects:

  • field (Required, string) Wildcard pattern. The request returns doc values for field names matching this pattern.

  • format (Optional, string) Format in which the doc values are returned.

For date fields, you can specify a date format (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html). For numeric fields, you can specify a DecimalFormat pattern.

For other field data types, this parameter is not supported.

Returns

Search A new Search instance is created
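
A plain string entry is shorthand for an object that only has a field property. The sketch below writes every entry in its object form and rejects objects with a missing field or with properties other than field and format. normalize_field_specs is a hypothetical helper, and the patterns and formats in the usage part are illustrative values only.

```python
from typing import Dict, List, Mapping, Sequence, Union

FieldSpec = Union[str, Mapping[str, str]]


def normalize_field_specs(specs: Sequence[FieldSpec]) -> List[Dict[str, str]]:
    """Hypothetical helper: write every docvalue_fields entry in its object form."""
    result = []
    for spec in specs:
        # "my_ip*" is shorthand for {"field": "my_ip*"}.
        obj = {"field": spec} if isinstance(spec, str) else dict(spec)
        if "field" not in obj:
            raise ValueError(f"docvalue_fields entry without 'field': {spec!r}")
        unknown = set(obj) - {"field", "format"}
        if unknown:
            raise ValueError(f"unsupported properties {sorted(unknown)} in {spec!r}")
        result.append(obj)
    return result


body = {
    "query": {"match_all": {}},
    "docvalue_fields": normalize_field_specs([
        "my_ip*",                                             # wildcard, default format
        {"field": "*_date_field", "format": "epoch_millis"},  # date format
        {"field": "price", "format": "#.00"},                 # DecimalFormat pattern
    ]),
}
print(body["docvalue_fields"])
```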

expand_wildcards(value: str = 'open') → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value –

str Controls which kinds of indices wildcard expressions can expand to. Multiple values are accepted when separated by commas, as in open,hidden. Valid values are:

  • all Expand to open and closed indices, including hidden indices.

  • open Expand only to open indices.

  • closed Expand only to closed indices.

  • hidden Expansion of wildcards will include hidden indices. Must be combined with open, closed, or both.

  • none Wildcard expressions are not accepted.

Defaults to open.

Returns

Search A new Search instance is created
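
Because the value is a comma-separated list with one combination rule, a client-side check can catch mistakes before the request is sent. The function parse_expand_wildcards is hypothetical. It encodes only the rules listed above, and accepting all,hidden (which is redundant, since all already includes hidden indices) is a choice of this sketch.

```python
from typing import FrozenSet

VALID_EXPAND_WILDCARDS = frozenset({"all", "open", "closed", "hidden", "none"})


def parse_expand_wildcards(value: str) -> FrozenSet[str]:
    """Hypothetical client-side check of an expand_wildcards value such as "open,hidden"."""
    parts = frozenset(p.strip() for p in value.split(",") if p.strip())
    if not parts:
        raise ValueError("expand_wildcards must not be empty")
    unknown = parts - VALID_EXPAND_WILDCARDS
    if unknown:
        raise ValueError(f"unknown expand_wildcards values: {sorted(unknown)}")
    # "hidden" only widens an expansion, so it needs open and/or closed
    # ("all" already includes hidden indices).
    if "hidden" in parts and not parts & {"open", "closed", "all"}:
        raise ValueError('"hidden" must be combined with "open", "closed", or both')
    return parts


print(sorted(parse_expand_wildcards("open, hidden")))  # ['hidden', 'open']
```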

explain(value: bool = False) → elastipy.search.Search[source]¶

A search body parameter.

Parameters

value – bool If true, returns detailed information about score computation as part of a hit. Defaults to false.

Returns

Search A new Search instance is created

fields(value: Optional[Sequence[Union[Mapping[str, str], str]]] = None) → elastipy.search.Search[source]¶

A search body parameter.

Parameters

value –

Optional[Sequence[Union[str, Mapping[str, str]]]] Array of wildcard (*) patterns. The request returns values for field names matching these patterns in the hits.fields property of the response.

You can specify items in the array as a string or object. See Fields for more details.

Properties of fields objects:

  • field (Required, string) Wildcard pattern. The request returns values for field names matching this pattern.

  • format (Optional, string) Format in which the values are returned.

The date fields date and date_nanos accept a date format. Spatial fields accept either geojson for GeoJSON (the default) or wkt for Well Known Text.

For other field data types, this parameter is not supported.

Returns

Search A new Search instance is created

from_(value: int = 0) → elastipy.search.Search[source]¶

A search body parameter.

Parameters

value –

int Starting document offset. Defaults to 0.

By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after parameter.

Returns

Search A new Search instance is created
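
Page-based navigation is a common reason to set from. The sketch below turns a zero-based page number into from and size, and refuses pages that would cross the documented 10,000-hit window (from + size must stay within it). page_params is hypothetical and assumes the index keeps the default index.max_result_window of 10,000.

```python
from typing import Dict

# Default of the index.max_result_window index setting.
MAX_RESULT_WINDOW = 10_000


def page_params(page: int, per_page: int = 10) -> Dict[str, int]:
    """Hypothetical helper: translate a zero-based page number into from/size."""
    if page < 0 or per_page <= 0:
        raise ValueError("page must be >= 0 and per_page > 0")
    offset = page * per_page
    if offset + per_page > MAX_RESULT_WINDOW:
        # Deeper pages need search_after instead of from/size.
        raise ValueError(f"from + size = {offset + per_page} exceeds {MAX_RESULT_WINDOW}")
    return {"from": offset, "size": per_page}


print(page_params(3, per_page=25))  # {'from': 75, 'size': 25}
```

With elastipy, the two values would go to Search.param.from_() and Search.param.size(), both documented on this page.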

ignore_throttled(value: bool = True) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – bool If true, concrete, expanded or aliased indices will be ignored when frozen. Defaults to true.

Returns

Search A new Search instance is created

ignore_unavailable(value: bool = False) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – bool If true, missing or closed indices are not included in the response. Defaults to false.

Returns

Search A new Search instance is created

indices_boost(value: Optional[Sequence[Mapping[str, float]]] = None) → elastipy.search.Search[source]¶

A search body parameter.

Parameters

value –

Optional[Sequence[Mapping[str, float]]] Boosts the _score of documents from specified indices.

Properties of indices_boost objects:

<index>: <boost-value>

  • <index> is the name of the index or index alias. Wildcard (*) expressions are supported.

  • <boost-value> is the float factor by which scores are multiplied.

A boost value greater than 1.0 increases the score. A boost value between 0 and 1.0 decreases the score.

Returns

Search A new Search instance is created
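
To see the effect on scores, the sketch below looks up the factor for an index and multiplies a score by it. index_boost is a hypothetical helper. It assumes that entries are checked in order and the first matching pattern wins, ignores index aliases, and uses fnmatch, which also understands ? and [...] while Elasticsearch patterns are only described with *.

```python
import fnmatch
from typing import Mapping, Sequence


def index_boost(index: str, indices_boost: Sequence[Mapping[str, float]]) -> float:
    """Hypothetical helper: return the boost factor that applies to one index.

    Assumes the first matching entry wins; unmatched indices keep a factor of 1.0.
    """
    for entry in indices_boost:
        for pattern, boost in entry.items():
            if fnmatch.fnmatchcase(index, pattern):
                return boost
    return 1.0


boosts = [{"logs-2021*": 1.4}, {"logs-*": 0.8}]
for index in ("logs-2021-03", "logs-2020-12", "metrics"):
    # A raw score of 2.0 is multiplied by the factor of its index.
    print(index, 2.0 * index_boost(index, boosts))
```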

max_concurrent_shard_requests(value: int = 5) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – int Number of concurrent shard requests per node that this search executes. Use it to limit the impact of the search on the cluster. Defaults to 5.

Returns

Search A new Search instance is created

min_score(value: Optional[float] = None) → elastipy.search.Search[source]¶

A search body parameter.

Parameters

value – Optional[float] Minimum _score for matching documents. Documents with a lower _score are not included in the search results.

Returns

Search A new Search instance is created

pre_filter_shard_size(value: Optional[int] = None) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value –

Optional[int] Defines a threshold that enforces a pre-filter round-trip to prefilter search shards based on query rewriting if the number of shards the search request expands to exceeds the threshold. This round-trip can limit the number of shards significantly if, for instance, a shard cannot match any documents based on its rewrite method, i.e., if date filters are mandatory to match but the shard bounds and the query are disjoint. When unspecified, the pre-filter phase is executed if any of these conditions is met:

  • The request targets more than 128 shards.

  • The request targets one or more read-only indices.

  • The primary sort of the query targets an indexed field.

Returns

Search A new Search instance is created

preference(value: Optional[str] = None) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value –

Optional[str] Nodes and shards used for the search. By default, Elasticsearch selects from eligible nodes and shards using adaptive replica selection, accounting for allocation awareness.

Valid values:

  • _only_local Run the search only on shards on the local node.

  • _local If possible, run the search on shards on the local node. If not, select shards using the default method.

  • _only_nodes:<node-id>,<node-id> Run the search only on the specified node IDs. If suitable shards exist on more than one selected node, use shards on those nodes using the default method. If none of the specified nodes are available, select shards from any available node using the default method.

  • _prefer_nodes:<node-id>,<node-id> If possible, run the search on the specified node IDs. If not, select shards using the default method.

  • _shards:<shard>,<shard> Run the search only on the specified shards. This value can be combined with other preference values, but this value must come first. For example: _shards:2,3|_local

  • <custom-string> Any string that does not start with _. If the cluster state and selected shards do not change, searches using the same <custom-string> value are routed to the same shards in the same order.

Returns

Search A new Search instance is created
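
The _shards value is the only one described as combinable, and it must come first. The sketch below composes a preference string under that rule. preference_value is a hypothetical helper, and "user-42" is an illustrative custom string.

```python
from typing import Optional, Sequence


def preference_value(
    shards: Optional[Sequence[int]] = None, other: Optional[str] = None
) -> Optional[str]:
    """Hypothetical helper: compose a preference string.

    _shards:<shard>,<shard> can be combined with another preference value
    using "|", but it has to come first, e.g. "_shards:2,3|_local".
    """
    if other is not None and other.startswith("_shards:"):
        raise ValueError("pass shard numbers via 'shards', not 'other'")
    parts = []
    if shards:
        parts.append("_shards:" + ",".join(str(s) for s in shards))
    if other:
        parts.append(other)
    # None means no preference: Elasticsearch uses adaptive replica selection.
    return "|".join(parts) or None


print(preference_value([2, 3], "_local"))  # _shards:2,3|_local
# A custom string (not starting with "_") routes repeated searches to the same
# shards as long as the cluster state and selected shards do not change.
print(preference_value(other="user-42"))
```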

q(value: Optional[str] = None) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value –

Optional[str] Query in the Lucene query string syntax.

You can use the q parameter to run a query parameter search. Query parameter searches do not support the full Elasticsearch Query DSL but are handy for testing.

Important

The q parameter overrides the query parameter in the request body. If both parameters are specified, documents matching the query request body parameter are not returned.

Returns

Search A new Search instance is created

request_cache(value: Optional[bool] = None) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – Optional[bool] If true, the caching of search results is enabled for requests where size is 0. See Shard request cache settings. Defaults to index level settings.

Returns

Search A new Search instance is created

rest_total_hits_as_int(value: bool = False) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – bool Indicates whether hits.total should be rendered as an integer or an object in the REST search response. Defaults to false.

Returns

Search A new Search instance is created

routing(value: Optional[str] = None) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – Optional[str] Target the specified primary shard.

Returns

Search A new Search instance is created

scroll(value: Optional[str] = None) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value –

Optional[str] Period to retain the search context for scrolling, given in time units. See Scroll search results.

By default, this value cannot exceed 1d (24 hours). You can change this limit using the search.max_keep_alive cluster-level setting.

Returns

Search A new Search instance is created
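
Time unit values such as 5m or 1d combine an integer with a unit suffix. The sketch below parses such a value into seconds and checks it against the default 1d keep-alive limit. keep_alive_seconds is hypothetical, accepts only integer amounts, and assumes search.max_keep_alive was not changed from its default.

```python
import re
from decimal import Decimal

# Elasticsearch time units and their length in seconds.
TIME_UNITS = {
    "d": Decimal(86400),
    "h": Decimal(3600),
    "m": Decimal(60),
    "s": Decimal(1),
    "ms": Decimal("0.001"),
    "micros": Decimal("0.000001"),
    "nanos": Decimal("0.000000001"),
}
_TIME_VALUE = re.compile(r"(\d+)(d|h|m|s|ms|micros|nanos)")

# Default of the search.max_keep_alive cluster setting: 1d.
DEFAULT_MAX_KEEP_ALIVE = TIME_UNITS["d"]


def keep_alive_seconds(value: str, limit: Decimal = DEFAULT_MAX_KEEP_ALIVE) -> Decimal:
    """Hypothetical helper: parse a keep-alive such as "5m" and check it against the limit."""
    match = _TIME_VALUE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a time unit value: {value!r}")
    seconds = int(match.group(1)) * TIME_UNITS[match.group(2)]
    if seconds > limit:
        raise ValueError(f"{value} is longer than the allowed keep-alive of {limit} seconds")
    return seconds


print(keep_alive_seconds("5m"))
```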

search_type(value: str = 'query_then_fetch') → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value –

str How distributed term frequencies are calculated for relevance scoring.

Valid values:

  • query_then_fetch (Default) Distributed term frequencies are calculated locally for each shard running the search. We recommend this option for faster searches with potentially less accurate scoring.

  • dfs_query_then_fetch Distributed term frequencies are calculated globally, using information gathered from all shards running the search. While this option increases the accuracy of scoring, it adds a round-trip to each shard, which can result in slower searches.

Returns

Search A new Search instance is created

seq_no_primary_term(value: bool = False) → elastipy.search.Search[source]¶

A search body parameter.

Parameters

value – bool If true, returns sequence number and primary term of the last modification of each hit. See Optimistic concurrency control.

Returns

Search A new Search instance is created

size(value: int = 10) → elastipy.search.Search[source]¶

A search body parameter.

Parameters

value –

int Defines the number of hits to return. Defaults to 10.

By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after parameter.

Returns

Search A new Search instance is created

sort(value: Optional[Union[str, Sequence[Union[Mapping[str, str], str]], Mapping[str, str]]] = None) → elastipy.search.Search[source]¶

A search body parameter.

Parameters

value –

Optional[Union[str, Sequence[Union[str, Mapping[str, str]]], Mapping[str, str]]] Change the order of the returned documents. See sort search results.

The parameter can be:

  • "field" or "-field" to sort a field ascending or descending

  • {"field": "asc"} or {"field": "desc"} to sort a field ascending or descending

  • a list of strings or objects as above to sort by several fields

  • None to turn off sorting

Returns

Search A new Search instance is created

source(value: Union[bool, str, Sequence] = True) → elastipy.search.Search[source]¶

A search body parameter.

Parameters

value –

Union[bool, str, Sequence] Indicates which source fields are returned for matching documents. These fields are returned in the hits._source property of the search response. Defaults to true.

Valid values:

  • true (Boolean) The entire document source is returned.

  • false (Boolean) The document source is not returned.

  • <wildcard_pattern> (string or array of strings) Wildcard (*) pattern or array of patterns containing source fields to return.

  • <object> Object containing a list of source fields to include or exclude. Properties for <object>:

    • excludes (string or array of strings) Wildcard (*) pattern or array of patterns containing source fields to exclude from the response. You can also use this property to exclude fields from the subset specified in includes property.

    • includes (string or array of strings) Wildcard (*) pattern or array of patterns containing source fields to return. If this property is specified, only these source fields are returned. You can exclude fields from this subset using the excludes property.

Returns

Search A new Search instance is created
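
The interplay of includes and excludes is easiest to see on a small document. The function below is a local approximation for a flat document with dotted keys, not how Elasticsearch implements source filtering (which also walks nested objects). filter_source and the sample document are hypothetical; the rule it encodes is the one described above: includes selects a subset, and excludes removes fields from it.

```python
import fnmatch
from typing import Any, Dict, List, Sequence, Union

Patterns = Union[str, Sequence[str]]


def _as_list(patterns: Patterns) -> List[str]:
    return [patterns] if isinstance(patterns, str) else list(patterns)


def filter_source(doc: Dict[str, Any], includes: Patterns = (), excludes: Patterns = ()) -> Dict[str, Any]:
    """Local approximation of _source includes/excludes on a flat document."""
    inc, exc = _as_list(includes), _as_list(excludes)
    return {
        key: value
        for key, value in doc.items()
        # Keep a field if it matches an include pattern (or no includes are given)
        # and it matches no exclude pattern.
        if (not inc or any(fnmatch.fnmatchcase(key, p) for p in inc))
        and not any(fnmatch.fnmatchcase(key, p) for p in exc)
    }


doc = {"user.id": "kimchy", "user.name": "Kim", "message": "hello", "timestamp": "2021-01-01"}
# Body equivalent: "_source": {"includes": ["user.*", "message"], "excludes": "user.name"}
print(filter_source(doc, includes=["user.*", "message"], excludes="user.name"))
```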

source_excludes(value: Optional[str] = None) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value –

Optional[str] A comma-separated list of source fields to exclude from the response.

You can also use this parameter to exclude fields from the subset specified in _source_includes query parameter.

If the _source parameter is false, this parameter is ignored.

Returns

Search A new Search instance is created

source_includes(value: Optional[str] = None) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value –

Optional[str] A comma-separated list of source fields to include in the response.

If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the _source_excludes query parameter.

If the _source parameter is false, this parameter is ignored.

Returns

Search A new Search instance is created

stats(value: Optional[Sequence[str]] = None) → elastipy.search.Search[source]¶

A search body parameter.

Parameters

value – Optional[Sequence[str]] Stats groups to associate with the search. Each group maintains a statistics aggregation for its associated searches. You can retrieve these stats using the indices stats API.

Returns

Search A new Search instance is created

stored_fields(value: Optional[str] = None) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value –

Optional[str] A comma-separated list of stored fields to return as part of a hit. If no fields are specified, no stored fields are included in the response.

If this field is specified, the _source parameter defaults to false. You can pass _source: true to return both source fields and stored fields in the search response.

Returns

Search A new Search instance is created

suggest_field(value: Optional[str] = None) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – Optional[str] Specifies which field to use for suggestions.

Returns

Search A new Search instance is created

suggest_text(value: Optional[str] = None) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – Optional[str] The source text for which the suggestions should be returned.

Returns

Search A new Search instance is created

terminate_after(value: int = 0) → elastipy.search.Search[source]¶

A search body parameter.

Parameters

value –

int The maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early.

Defaults to 0, which does not terminate query execution early.

Returns

Search A new Search instance is created

timeout(value: Optional[str] = None) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – Optional[str] Specifies the period of time to wait for a response in time units. If no response is received before the timeout expires, the request fails and returns an error. Defaults to no timeout.

Returns

Search A new Search instance is created

to_body() → dict¶

Convert all parameters to their representation in the search request body.

Returns

dict

to_query_params() → dict¶

Convert all parameters to their representation as search request query parameters.

Returns

dict

track_scores(value: bool = False) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – bool If true, calculate and return document scores, even if the scores are not used for sorting. Defaults to false.

Returns

Search A new Search instance is created

track_total_hits(value: Union[int, bool] = 10000) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value –

Union[int, bool] Number of hits matching the query to count accurately. Defaults to 10000.

If true, the exact number of hits is returned at the cost of some performance.

If false, the response does not include the total number of hits matching the query.

Returns

Search A new Search instance is created
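
Because of this parameter, hits.total in a response can be exact or only a lower bound. In the object form, relation is "eq" for an exact count and "gte" for a lower bound. The hypothetical total_hits helper below reads it as a (count, is_exact) pair. It treats the plain integer produced by rest_total_hits_as_int=true as exact, which is an assumption of this sketch, and returns None when the total is absent, as with track_total_hits=false.

```python
from typing import Any, Mapping, Optional, Tuple


def total_hits(response: Mapping[str, Any]) -> Optional[Tuple[int, bool]]:
    """Hypothetical helper: read hits.total as (count, is_exact)."""
    total = response.get("hits", {}).get("total")
    if total is None:
        return None  # total hits were not tracked
    if isinstance(total, int):
        return total, True  # integer form; assumed to be an exact count
    return total["value"], total.get("relation") == "eq"


# With the default limit of 10000, a large result set reports a lower bound.
print(total_hits({"hits": {"total": {"value": 10000, "relation": "gte"}}}))
```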

typed_keys(value: bool = True) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – bool If true, aggregation and suggester names are prefixed by their respective types in the response. Defaults to true.

Returns

Search A new Search instance is created
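
With typed keys, an aggregation named lines of type sum comes back under the key sum#lines. The sketch below maps such keys back to the plain names. split_typed_keys is hypothetical, and the sample dict is a hand-written fragment, not real Elasticsearch output.

```python
from typing import Any, Dict, Mapping, Tuple


def split_typed_keys(aggregations: Mapping[str, Any]) -> Dict[str, Tuple[str, Any]]:
    """Hypothetical helper: map "type#name" response keys to name -> (type, result)."""
    result: Dict[str, Tuple[str, Any]] = {}
    for key, value in aggregations.items():
        # The type prefix contains no "#", so the first "#" separates type and name.
        agg_type, sep, name = key.partition("#")
        if not sep:
            raise ValueError(f"key {key!r} has no type prefix; was typed_keys enabled?")
        result[name] = (agg_type, value)
    return result


# Hand-written response fragment for illustration.
sample = {"sterms#authors": {"buckets": []}, "sum#lines": {"value": 1234.0}}
print(split_typed_keys(sample))
```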

version(value: bool = False) → elastipy.search.Search[source]¶

A search query parameter.

Parameters

value – bool If true, returns document version as part of a hit. Defaults to false.

Returns

Search A new Search instance is created

printing utilities¶

class elastipy.search_dump.SearchDump(search: elastipy.search.Search)[source]¶
body(indent: Optional[Union[int, str]] = 2, file: Optional[TextIO] = None)[source]¶

Print the complete request body.

Parameters
  • indent – The json indentation, defaults to 2.

  • file – Optional output stream.

query(indent: Optional[Union[int, str]] = 2, file: Optional[TextIO] = None)[source]¶

Print the query json.

Parameters
  • indent – The json indentation, defaults to 2.

  • file – Optional output stream.

request(indent: Optional[Union[int, str]] = 2, file: Optional[TextIO] = None)[source]¶

Print the complete request parameters as would be accepted by elasticsearch.Elasticsearch.search().

Parameters
  • indent – The json indentation, defaults to 2.

  • file – Optional output stream.

response(indent: Optional[Union[int, str]] = 2, file: Optional[TextIO] = None)[source]¶

Print the response of the search.

Warning

The search must be executed first, otherwise a ValueError is raised.

Parameters
  • indent – The json indentation, defaults to 2.

  • file – Optional output stream.

class elastipy.response_dump.ResponseDump(response: elastipy.search.Response)[source]¶
aggregations(indent: Optional[Union[int, str]] = 2, file: Optional[TextIO] = None)[source]¶

Print the aggregations part of the response.

Parameters
  • indent – The json indentation, defaults to 2.

  • file – Optional output stream.

documents(indent: Optional[Union[int, str]] = 2, file: Optional[TextIO] = None)[source]¶

Print the list of documents inside the hits.

Parameters
  • indent – The json indentation, defaults to 2.

  • file – Optional output stream.

table(score: bool = True, sort: Optional[str] = None, digits: Optional[int] = None, header: bool = True, bars: bool = True, zero: Union[bool, float] = True, colors: bool = True, ascii: bool = False, max_width: Optional[int] = None, max_bar_width: int = 40, file=None)[source]¶

Print the hit documents as a table.

Parameters
  • score – bool Include the score for each hit.

  • sort – str Optional sort column name, which must match a ‘header’ key. Can be prefixed with - (minus) to reverse the order.

  • digits – int Optional number of digits for rounding.

  • header – bool If True, include the column names in the first row.

  • bars – bool Enable display of horizontal bars in each number column. The table width grows to fit the bars, limited by ‘max_width’ and ‘max_bar_width’.

  • zero –

    • If True: the bar axis starts at zero (or at a negative value if appropriate).

    • If False: the bar starts at the minimum of all values in the column.

    • If a number is provided, the bar starts there, regardless of the minimum of all values.

  • colors – bool Enable console colors.

  • ascii – bool If True, fall back to ASCII characters.

  • max_width – int Limits the width of the table when bars are enabled. If left None, the terminal width is used.

  • max_bar_width – int The maximum width of a bar.

  • file – Optional text stream to print to.
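
The three modes of the zero parameter are easier to picture with numbers. The function below is a hypothetical re-creation written for illustration, not elastipy's implementation. It assumes bars grow linearly from the chosen origin, so that the column maximum gets max_bar_width characters, and that values below a numeric origin get an empty bar.

```python
from typing import List, Sequence, Union


def bar_lengths(values: Sequence[float], zero: Union[bool, float] = True,
                max_bar_width: int = 40) -> List[int]:
    """Hypothetical sketch of the documented `zero` option for one number column."""
    if zero is True:
        origin = min(0.0, min(values))  # axis at zero, or lower for negative values
    elif zero is False:
        origin = min(values)            # axis at the column minimum
    else:
        origin = float(zero)            # axis at the given number
    span = max(values) - origin
    if span <= 0:
        return [0 for _ in values]
    # Scale linearly; values below the origin are clamped to an empty bar.
    return [max(0, round((v - origin) / span * max_bar_width)) for v in values]


print(bar_lengths([10, 15, 20], zero=True, max_bar_width=20))
print(bar_lengths([10, 15, 20], zero=False, max_bar_width=20))
```

With zero=False the smallest value gets no bar at all, which emphasizes differences; with zero=True the bars stay proportional to the values themselves.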


© Copyright 2021, netzkolchose.de. Revision c1144ab3.

Built with Sphinx using a theme provided by Read the Docs.