Enhancing Chinese Message Search Capability in Telegram

2025-06-24

Conclusion

Chinese message search in Telegram can be improved either by manually inserting invisible delimiters between characters or by developing a custom tokenizer. Semantic search based on AI embeddings can improve search accuracy further.

Key Points

  • Telegram Database: Telegram uses SQLite as its database.
  • Full-Text Search Mechanism: Telegram's full-text search uses a tokenizer to split each string into tokens, hashes those tokens, and looks up the hashes of the query's tokens in a hash table at search time.
  • Tokenizer: The tokenizer splits strings wherever it finds a separator or delimiter character.
  • Token Definition: Any run of characters between separators and delimiters is a "token". Token characters fall into three Unicode general categories: letters (L*), numbers (N*), and private-use characters (Co).
  • CJK Character Handling: Most Unicode CJK characters are recognised as token characters, not as separators.

As there are no delimiters between Chinese characters, Telegram treats an unbroken run of Chinese characters as a single token and hashes it as a whole. A query for only part of that run therefore finds nothing, which is why Chinese search works poorly. This post looks at that limitation from a code perspective.
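The behaviour described above can be reproduced with a small sketch. It is a simplified model, not Telegram's or SQLite's actual code: it assumes token characters are those in Unicode general categories L*, N* and Co, treats everything else as a separator, and uses a Python dict in place of the hash table. The function names are illustrative only.

```python
import unicodedata


def is_token_char(ch: str) -> bool:
    """Assumed rule: categories L*, N* and Co are token characters."""
    category = unicodedata.category(ch)
    return category[0] in ("L", "N") or category == "Co"


def tokenize(text: str) -> list[str]:
    """Split text into runs of token characters, lower-cased."""
    tokens, current = [], []
    for ch in text:
        if is_token_char(ch):
            current.append(ch.lower())
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def build_index(messages: list[str]) -> dict[str, set[int]]:
    """Map each token to the ids of the messages containing it."""
    index: dict[str, set[int]] = {}
    for msg_id, text in enumerate(messages):
        for token in tokenize(text):
            index.setdefault(token, set()).add(msg_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of messages that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


messages = ["hello world", "今天天气很好"]
index = build_index(messages)

print(tokenize("今天天气很好"))  # the whole sentence is one token
print(search(index, "hello"))   # matches message 0
print(search(index, "天气"))     # no match: "天气" was never indexed on its own
```

In this model the English message is found by any of its words, while the Chinese message is found only by a query for the entire sentence.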

Improvement Suggestions

  1. Manually Inserting Delimiters: Add invisible delimiters between Chinese characters so that each character is tokenised and indexed separately.
  2. Custom Tokenizer: Develop a custom tokenizer that splits Chinese text, and modify the Telegram client to use it.

AI Semantic Search

Beyond keyword matching, AI makes semantic search possible. The telegram-search project uses an embedding model, which lets users find content even without an exact keyword match. For example, searching for "the person who ate dinner last night" could retrieve "the man who ate with us last night".
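The principle behind embedding-based search can be shown without any particular model: each message and the query are represented as vectors, and the messages whose vectors point in the most similar direction are returned. The sketch below does not reflect telegram-search's actual code, and the three-dimensional vectors are hand-made placeholders, not output from a real embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(
    query_vec: list[float],
    corpus: dict[str, list[float]],
    top_k: int = 1,
) -> list[str]:
    """Return the top_k messages ranked by similarity to the query vector."""
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Placeholder vectors: a real system would obtain these from an embedding model.
corpus = {
    "the man who ate with us last night": [0.9, 0.8, 0.1],
    "the build failed on the release branch": [0.1, 0.0, 0.9],
}
# Placeholder vector standing in for "the person who ate dinner last night".
query_vec = [0.8, 0.9, 0.2]

print(semantic_search(query_vec, corpus))
```

Because ranking depends on vector similarity and not on shared tokens, this approach bypasses the tokenization problem altogether, at the cost of having to compute and store an embedding for every message.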

Used together, these methods can make searching Chinese messages in Telegram considerably more effective.