Abstract

The present disclosure describes an embedding explorer that allows a user to interactively explore the properties of an embedding space and how that space relates to the features of the entities being embedded. The embedding explorer translates high-dimensional embedding vectors into low-dimensional vectors, which form the space the user explores. In particular, the embedding explorer allows the user to load a table of embeddings, together with any feature providers the user needs during the exploration. The table includes at least one column containing an entity ID and another column containing an array of floats that represents the embedding for that entity. The user may also add a new embedding. To do so, the user provides Hive tables or CSV files as input to a preprocessing workflow of the embedding explorer. The preprocessing workflow uses t-distributed stochastic neighbor embedding (t-SNE) to project tens of millions of points representing the entities down to low dimensions. After the preprocessing workflow completes, the user must manually register the embedding by adding an element to the “EmbeddingExplorationSources” function of the embedding explorer. The “allDataRootFilepath” field of “EmbeddingExplorationSources” accepts the output folder returned by the preprocessing workflow.
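
The input table format and the registration step can be illustrated with a minimal Python sketch. It assumes a CSV export in which the entity ID column is named `entity_id` and the float array is stored as a JSON list in a column named `embedding`; these column names, the helper functions `load_embedding_table` and `register_source`, and the list-of-dicts stand-in for “EmbeddingExplorationSources” are all hypothetical. Only the “allDataRootFilepath” key comes from the description above. The t-SNE preprocessing step itself is not shown.

```python
import csv
import io
import json
from typing import Dict, List, Optional


def load_embedding_table(csv_text: str,
                         id_column: str = "entity_id",
                         embedding_column: str = "embedding") -> Dict[str, List[float]]:
    """Parse a table with one entity-ID column and one float-array column.

    Hypothetical layout: the embedding column holds a JSON list of floats.
    Every row must have the same embedding dimension.
    """
    embeddings: Dict[str, List[float]] = {}
    dim: Optional[int] = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        vector = [float(x) for x in json.loads(row[embedding_column])]
        if dim is None:
            dim = len(vector)  # first row fixes the expected dimension
        elif len(vector) != dim:
            raise ValueError(
                f"entity {row[id_column]!r} has dimension {len(vector)}, expected {dim}")
        embeddings[row[id_column]] = vector
    return embeddings


def register_source(sources: List[dict], name: str, output_folder: str) -> List[dict]:
    """Return a new source list with one added registration element.

    Stand-in for adding an element to EmbeddingExplorationSources; apart from
    the allDataRootFilepath key, the dict layout is hypothetical.
    """
    if any(source["name"] == name for source in sources):
        raise ValueError(f"embedding {name!r} is already registered")
    return sources + [{"name": name, "allDataRootFilepath": output_folder}]


if __name__ == "__main__":
    table = 'entity_id,embedding\nu1,"[0.1, 0.2, 0.3]"\nu2,"[0.4, 0.5, 0.6]"\n'
    print(load_embedding_table(table))
    print(register_source([], "user_embeddings", "/path/to/preprocessing/output"))
```

Checking that every row has the same dimension catches malformed rows before they reach the preprocessing workflow, and returning a new list from `register_source` keeps existing registrations unchanged if a duplicate name is rejected.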

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.