Investigate vector-based text search #608

The goal is to build an alternative to SQLite's FTS5 full-text search.

* Ideally you point it at an FTS5 content table
* Words are extracted and a vector for each word is determined automatically, i.e. with no online lookups, and the vectors are learned unsupervised
* The vector for each word is stored in that database or another one. The vectors are needed for both ingestion and querying
* Existing FTS5 tokenizers such as html, json, and unicodewords can be used
* Content can also be broken down into sentences and paragraphs: we have the Unicode sentence algorithm, and paragraphs can be guessed
* It looks like the average of the vectors of the words in a sentence is used as that sentence's vector
* A search should find not only the matching documents but also the best-matching sentences within each document
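
The pipeline in the list above can be sketched end to end in Python. This is a minimal, self-contained sketch under several assumptions, not the planned implementation. Random indexing (summing fixed sparse random vectors of neighboring words) stands in for whatever unsupervised word-vector method is eventually chosen. A regex stands in for the FTS5 tokenizers, and a naive split on `.`, `!` and `?` stands in for Unicode sentence segmentation. The `word_vector` table and all function names are hypothetical. Ranking each document by its best-matching sentence is one possible way to return both documents and sentences.

```python
import functools
import math
import random
import re
import sqlite3
from array import array

# Illustrative parameters (assumptions, not tuned values)
DIM = 64      # dimensions per word vector
NONZERO = 4   # non-zero entries in each random index vector
WINDOW = 2    # neighboring words counted on each side

def words(text):
    # Hypothetical stand-in for an FTS5 tokenizer such as unicodewords
    return re.findall(r"\w+", text.lower())

def sentences(text):
    # Hypothetical stand-in for Unicode sentence segmentation
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

@functools.lru_cache(maxsize=None)
def index_vector(word):
    # Fixed sparse random +1/-1 vector; a str seed is reproducible across runs
    rng = random.Random(word)
    vec = [0.0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((-1.0, 1.0))
    return tuple(vec)

def train_word_vectors(db, docs):
    # Random indexing (unsupervised, offline): a word's vector is its own index
    # vector plus the index vectors of every word within WINDOW positions of it
    ctx = {}
    for doc in docs:
        for sent in sentences(doc):
            toks = words(sent)
            for i, w in enumerate(toks):
                acc = ctx.setdefault(w, list(index_vector(w)))
                for j in range(max(0, i - WINDOW), min(len(toks), i + WINDOW + 1)):
                    if j != i:
                        for k, v in enumerate(index_vector(toks[j])):
                            acc[k] += v
    db.execute("CREATE TABLE IF NOT EXISTS word_vector(word TEXT PRIMARY KEY, vec BLOB)")
    db.executemany("INSERT OR REPLACE INTO word_vector VALUES (?, ?)",
                   ((w, array("d", v).tobytes()) for w, v in ctx.items()))

def load_vector(db, word):
    # Word vectors are read back from the database at both ingest and query time
    row = db.execute("SELECT vec FROM word_vector WHERE word = ?", (word,)).fetchone()
    if row is None:
        return None
    vec = array("d")
    vec.frombytes(row[0])
    return vec

def sentence_vector(db, text):
    # Average of the stored vectors of known words; unknown words are skipped
    vecs = [v for v in (load_vector(db, w) for w in words(text)) if v is not None]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_index(db, docs):
    # (doc_id, sentence, vector) for every sentence that has a vector
    index = []
    for doc_id, doc in enumerate(docs):
        for sent in sentences(doc):
            vec = sentence_vector(db, sent)
            if vec is not None:
                index.append((doc_id, sent, vec))
    return index

def search(db, index, query):
    # Score every sentence, keep each document's best sentence, rank documents
    qvec = sentence_vector(db, query)
    if qvec is None:
        return []
    best = {}
    for doc_id, sent, vec in index:
        score = cosine(qvec, vec)
        if doc_id not in best or score > best[doc_id][0]:
            best[doc_id] = (score, sent)
    return sorted(((s, d, sent) for d, (s, sent) in best.items()), reverse=True)

docs = [
    "Whisk the eggs with sugar. Bake the cake for thirty minutes.",
    "The meeting moved to Friday. Please send the quarterly report.",
]
db = sqlite3.connect(":memory:")
train_word_vectors(db, docs)
index = build_index(db, docs)
results = search(db, index, "Bake the cake for thirty minutes.")
for score, doc_id, sent in results:
    print(f"{score:.3f} doc={doc_id} {sent}")
```

Because the query is embedded with the same stored word vectors as the sentences, words never seen at ingest time are simply ignored. A fuller version would presumably store the sentence vectors in the database as well, rather than rebuilding them in a Python list.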

Testing should be done with the SQLite HTML documentation, the recipe database, and the Enron emails.
