Exporter

A base class to help export documents to elasticsearch.

reference

class elastipy.Exporter(client=None, index_prefix: Optional[str] = None, index_postfix: Optional[str] = None, update_index: bool = True)

Bases: object

Helper base class for exporting documents to elasticsearch.

Derive from this class and define the class attributes:

INDEX_NAME: str Name of index, might contain a wildcard *

MAPPINGS: dict The mapping definition for the index.

And optionally override methods:

transform_document() Convert a source object into an elasticsearch document.

get_document_id() Return a unique id for the elasticsearch document.

get_document_index() Return an alternative index name for the document.
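
As a minimal sketch of this pattern, a subclass might look like the following. The class name, index name, mapping fields and input keys are made up for illustration. A small stand-in base class replaces elastipy.Exporter so that the snippet runs without the library or a cluster; in real code, derive from elastipy.Exporter instead.

```python
from typing import Any, Dict, Mapping


class Exporter:
    """Stand-in for elastipy.Exporter (assumption: only here so the sketch runs offline)."""


class ArticleExporter(Exporter):
    # Required class attributes (hypothetical values)
    INDEX_NAME = "articles"
    MAPPINGS = {
        "properties": {
            "article_id": {"type": "integer"},
            "title": {"type": "text"},
        }
    }

    def transform_document(self, data: Mapping[str, Any]) -> Dict[str, Any]:
        # Convert one source object into the document that gets indexed.
        return {
            "article_id": data["id"],
            "title": data["title"].strip(),
        }

    def get_document_id(self, es_data: Mapping[str, Any]) -> Any:
        # Receives the transformed document and returns its unique id.
        return es_data["article_id"]
```

Note that get_document_id() works on the output of transform_document(), not on the original object.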

property client: Access to the elasticsearch client. If none was defined in constructor then elastipy.connections.get("default") is returned.

delete_index() → bool

Try to delete the index. Ignore if not found.

Returns

bool True if deleted, False otherwise.

If the index name contains a wildcard *, True is always returned.

export_list(object_list: Iterable[Any], chunk_size: int = 500, refresh: bool = False, verbose: bool = False, verbose_total: Optional[int] = None, file=None, **kwargs)

Export a list of objects.

Parameters

object_list – sequence of dict A list or generator of dictionaries containing the objects that should be exported.
chunk_size – int Number of objects per bulk request.
refresh – bool If True, request an immediate refresh of the index when the export has finished.
verbose – bool If True, print progress to stderr (using tqdm if present).
verbose_total – int Number of objects, used for the progress output if object_list is a generator.
file – Optional text stream for the verbose output; the default is stderr.

All other parameters are passed to elasticsearch.helpers.bulk.

Returns: dict Response of elasticsearch bulk call.
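
A call might look like the sketch below. The generator iter_rows() and the helper run_export() are hypothetical names; the keyword arguments are the ones listed above. Because the input is a generator and has no length, verbose_total supplies the count for the progress output.

```python
import sys
from typing import Any, Dict, Iterator


def iter_rows(count: int) -> Iterator[Dict[str, Any]]:
    """Hypothetical data source: yields one dict per object to export."""
    for i in range(count):
        yield {"id": i, "title": f"row {i}"}


def run_export(exporter, count: int = 1000):
    """Export `count` generated rows through an Exporter-like object."""
    return exporter.export_list(
        iter_rows(count),
        chunk_size=250,       # objects per bulk request
        refresh=True,         # refresh the index once the export is done
        verbose=True,         # print progress
        verbose_total=count,  # total for the progress output
        file=sys.stderr,      # stream that receives the progress output
    )
```

Here exporter is expected to be an instance of an Exporter subclass; the function returns whatever export_list() returns.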

get_document_id(es_data: Mapping)

Override this to return a single elasticsearch object’s id.

Parameters: es_data – dict Single object as returned by transform_document()
Returns: str, int, etc.

get_document_index(es_data: Mapping) → str

Override to define an index per document.

The default implementation returns the result of index_name(), but it’s possible to put objects into separate indices.

For example, you might define INDEX_NAME = "documents-*"

and get_document_index() might return

self.index_name().replace("*", es_data["type"])

Parameters: es_data – dict Single document as returned by transform_document()
Returns: str
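
Put together, the wildcard example could be sketched as follows. The stand-in base class is an assumption that keeps the snippet runnable without elastipy; it returns INDEX_NAME unchanged from index_name(), as if no prefix or postfix were configured. The "type" key is taken from the example above.

```python
from typing import Any, Mapping


class Exporter:
    """Stand-in for elastipy.Exporter so the sketch runs offline (assumption)."""

    INDEX_NAME = ""

    def index_name(self) -> str:
        # Assumption: no prefix or postfix configured.
        return self.INDEX_NAME


class DocumentExporter(Exporter):
    INDEX_NAME = "documents-*"

    def get_document_index(self, es_data: Mapping[str, Any]) -> str:
        # Replace the wildcard with the document's type,
        # so a document of type "article" goes to "documents-article".
        return self.index_name().replace("*", es_data["type"])
```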

get_index_params() → dict

Returns the complete index parameters.

Override if you need to specialize things.

Returns: dict

index_name() → str

Returns the index name built from the configured index_prefix, INDEX_NAME and index_postfix, in that order.

Returns: str
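
The composition can be illustrated with a small stand-alone function. This is only a sketch: compose_index_name() is a hypothetical helper, and it assumes the three parts are joined by plain concatenation without any separator, which this page does not spell out.

```python
from typing import Optional


def compose_index_name(
    index_name: str,
    index_prefix: Optional[str] = None,
    index_postfix: Optional[str] = None,
) -> str:
    """Hypothetical helper: join prefix, name and postfix by plain concatenation."""
    return f"{index_prefix or ''}{index_name}{index_postfix or ''}"
```

Under that assumption, a prefix such as "test-" would keep test data in indices separate from the production ones while reusing the same exporter class.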

search(**kwargs) → Search

Return a new Search object for this index and client.

Returns: Search instance

transform_document(data: Mapping) → Union[Mapping, Iterable[Mapping]]

Override this to transform each object’s data into an elasticsearch document.

It’s possible to return a list or yield multiple elasticsearch documents.

Parameters: data – dict
Returns: dict or iterable of dict
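
The multi-document case might be sketched like this. The input layout (a post with a list of comments) and all field names are invented for illustration, and the function is written without a class for brevity; in an Exporter subclass it would be a method with self as its first argument.

```python
from typing import Any, Dict, Iterator, Mapping


def transform_document(data: Mapping[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one elasticsearch document per comment of a post."""
    for position, text in enumerate(data["comments"]):
        yield {
            "post_id": data["id"],
            "position": position,
            "text": text,
        }
```

A single input object with several comments thus produces several elasticsearch documents, and one with an empty comment list produces none.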

update_index() → None

Create the index or update changes to the mapping.

Can only be called if INDEX_NAME does not contain a '*'.

Returns: None
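
Code that handles several exporters may want to check for the wildcard before calling this method. A minimal sketch, assuming a hypothetical helper named update_index_if_possible() that only looks at the INDEX_NAME class attribute:

```python
def update_index_if_possible(exporter) -> bool:
    """Call exporter.update_index() unless INDEX_NAME contains a wildcard.

    Returns True if update_index() was called, False if it was skipped.
    """
    if "*" in exporter.INDEX_NAME:
        # Wildcard names stand for several indices, so skip the call.
        return False
    exporter.update_index()
    return True
```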