# ZY SmartBI System Configuration Guide

This guide describes the configuration options for the ZY SmartBI system: environment variables, database connections, LLM service settings, and other system-level parameters. Configuring these parameters correctly is essential for stable operation and good performance.

## 1. Configuration Priority

ZY SmartBI uses a layered configuration system to ensure flexibility and security:

1.  **Database Configuration (Highest Priority)**: Settings saved via the "System Settings" page in the admin interface. These settings are stored in the database and are **loaded dynamically at runtime**, taking **immediate effect** without needing to restart the service.

2.  **Environment Variables (`.env` file)**: The `.env` file located in the project's root directory. These configurations are loaded at **application startup** and serve as the system's **default** or **fallback** values.

**Core Principle**: If a setting exists in the database, it will **override** the same setting in the `.env` file. This means that **changes made through the admin interface always take precedence**.
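
The lookup order can be illustrated with a minimal sketch. The function name `resolve_setting`, the dictionary standing in for the settings table, and the choice to treat an empty string as "not set" are all assumptions made for illustration; they are not taken from the ZY SmartBI code base.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    db_settings: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the effective value: database first, then environment, then default."""
    env = os.environ if env is None else env
    # A value saved through the admin interface wins whenever it is present.
    # Assumption: an empty string counts as "not set".
    if db_settings.get(name) not in (None, ""):
        return db_settings[name]
    # Otherwise fall back to the value loaded from .env at startup.
    if env.get(name) not in (None, ""):
        return env[name]
    return default


# Hypothetical data: one row saved in the database, two entries in .env.
db_settings = {"LLM_TEMPERATURE": "0.2"}
env = {"LLM_TEMPERATURE": "0.7", "LOG_LEVEL": "INFO"}

print(resolve_setting("LLM_TEMPERATURE", db_settings, env))  # 0.2 (database overrides .env)
print(resolve_setting("LOG_LEVEL", db_settings, env))        # INFO (falls back to .env)
```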

## 2. System Configuration via the Admin Interface (Recommended)

We strongly recommend that administrators configure the system through the built-in admin interface: it is more secure, and changes take effect immediately.

- **Access Path**: After logging in, navigate to `Admin` -> `System Settings`.
- **Configurable Items**:
    - **AI Provider Settings**: Manage API keys and API Base URLs for various large model service providers (e.g., OpenAI, Gemini).
        - **API Keys**: For security, the system does not display saved keys. It only indicates "Set" or "Not Set". You can enter a new key at any time to update it.
        - **API Base URL**: If your network environment requires a proxy to access LLM services, you can configure the proxy address here.
    - **System Prompt Templates**: Customize the prompt templates the system uses when interacting with LLMs in different scenarios.
        - **Knowledge Q&A Prompt Template**: Used when a user's question is identified as a "Knowledge Q&A" intent. This template guides the LLM to answer questions about data definitions, business terms, etc., using its knowledge base and known business context.
        - **Fallback Prompt Template**: Used when the system cannot understand the user's intent or execute a request. This template tells the user what the system can do and offers examples of effective questions.
        - **Clarification Prompt Template**: Used when a user's question is not clear enough and requires more information to be executed. This template guides the LLM to generate a specific question to help the user provide necessary context (like time range, dimensions, etc.).
        - **System General Prompt Template**: Serves as a general system-level instruction, providing the LLM with its basic role and behavioral guidelines in a conversation.

## 3. Configuration via `.env` File (Fallback)

The `.env` file is primarily used for configuring the system's initial default values or providing fallback configurations when the database is inaccessible. After modifying this file, you **must restart the ZY SmartBI service** for the changes to take effect.

### 3.1 Database Configuration

*   **`DB_SERVER_TYPE`**
    *   **Description**: Specifies the type of ZY SmartBI's internal metadata database. The system uses this type and the other `DB_*` variables below to construct the final `DATABASE_URL`. This setting is ignored if `DATABASE_URL` is set directly.
    *   **Options**: `SQLite`, `PostgreSQL`
    *   **Example**: `DB_SERVER_TYPE=PostgreSQL`

*   **`DATABASE_URL`**
    *   **Description**: Directly specifies the connection string for ZY SmartBI's internal metadata database. If this is set, `DB_SERVER_TYPE` and other `DB_*` variables are ignored.
    *   **Format**:
        *   **SQLite (default)**: `sqlite+aiosqlite:///./insightlink_dev.db`
        *   **PostgreSQL**: `postgresql+asyncpg://user:password@host:port/database_name`

*   **`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`**
    *   **Description**: Used to build the connection string when `DB_SERVER_TYPE` is set to `PostgreSQL`.

*   **`VECTOR_STORE_PATH`**
    *   **Description**: The storage path for the ChromaDB vector database, which is used to store embeddings for the RAG service.
    *   **Format**: A file system path, e.g., `./chroma_db` (creates a `chroma_db` folder in the project root).
    *   **Important**: Ensure this path is writable and has sufficient disk space.
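
The relationship between `DATABASE_URL`, `DB_SERVER_TYPE`, and the other `DB_*` variables can be sketched as follows. This is an illustration of the rules described above, not the system's actual code: the function name `build_database_url`, the fallback values for `DB_HOST` and `DB_PORT`, the SQLite default when `DB_SERVER_TYPE` is unset, and the URL-encoding of credentials are all assumptions.

```python
from typing import Mapping
from urllib.parse import quote_plus

# Default SQLite connection string, as listed in the format section above.
DEFAULT_SQLITE_URL = "sqlite+aiosqlite:///./insightlink_dev.db"


def build_database_url(env: Mapping[str, str]) -> str:
    """Derive the metadata database URL from environment-style settings."""
    # An explicit DATABASE_URL takes precedence over every DB_* variable.
    explicit = env.get("DATABASE_URL", "").strip()
    if explicit:
        return explicit

    # Assumption: SQLite is used when DB_SERVER_TYPE is not set.
    server_type = env.get("DB_SERVER_TYPE", "SQLite").strip().lower()
    if server_type == "sqlite":
        return DEFAULT_SQLITE_URL
    if server_type == "postgresql":
        # Encode credentials so characters such as "@" or "/" do not break the URL.
        user = quote_plus(env["DB_USER"])
        password = quote_plus(env["DB_PASSWORD"])
        host = env.get("DB_HOST", "localhost")  # assumed fallback
        port = env.get("DB_PORT", "5432")       # assumed fallback
        return f"postgresql+asyncpg://{user}:{password}@{host}:{port}/{env['DB_NAME']}"
    raise ValueError(f"Unsupported DB_SERVER_TYPE: {env.get('DB_SERVER_TYPE')!r}")


example_env = {
    "DB_SERVER_TYPE": "PostgreSQL",
    "DB_USER": "bi",
    "DB_PASSWORD": "p@ss",
    "DB_HOST": "db",
    "DB_PORT": "5432",
    "DB_NAME": "smartbi",
}
print(build_database_url(example_env))
# postgresql+asyncpg://bi:p%40ss@db:5432/smartbi
```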

### 3.2 LLM Service Configuration

*   **`OPENAI_API_KEY`**, **`ANTHROPIC_API_KEY`**, **`GEMINI_API_KEY`**, **`DEEPSEEK_API_KEY`**, **`DASHSCOPE_API_KEY`**
    *   **Description**: API keys for the corresponding LLM services. Fill in the key for the provider you choose.

*   **`LLM_LOAD_BALANCING_ENABLED`**
    *   **Description**: Enables or disables LLM load balancing. When enabled, the system distributes requests among multiple LLM providers based on configured weights.
    *   **Options**: `True`, `False`
    *   **Example**: `LLM_LOAD_BALANCING_ENABLED=True`

*   **`LLM_PARALLEL_CALLING_ENABLED`**
    *   **Description**: Enables or disables parallel LLM calls. When enabled, the system sends requests to multiple LLM providers simultaneously and selects the fastest or best response based on `LLM_SELECTION_STRATEGY`.
    *   **Options**: `True`, `False`
    *   **Example**: `LLM_PARALLEL_CALLING_ENABLED=True`

*   **`LLM_SELECTION_STRATEGY`**
    *   **Description**: The strategy for selecting an LLM response when parallel calling is enabled.
    *   **Options**: `FIRST_COMPLETED`, `CHEAPEST`, `HIGHEST_QUALITY`
    *   **Example**: `LLM_SELECTION_STRATEGY=FIRST_COMPLETED`

*   **`LLM_PROVIDERS`**
    *   **Description**: A JSON string to configure the weight, cost, and quality for each LLM provider, used for load balancing and selection strategies.
    *   **Format**: `[{"name": "openai", "weight": 1, "cost": 0.001, "quality": 0.8}, {"name": "gemini", "weight": 0, "cost": 0.002, "quality": 0.9}]`

*   **`OPENAI_TEXT_MODEL`**, **`ANTHROPIC_TEXT_MODEL`**, **`GEMINI_TEXT_MODEL`**, **`DEEPSEEK_TEXT_MODEL`**
    *   **Description**: The model name each provider uses for text generation tasks.

*   **`OPENAI_EMBEDDING_MODEL`**, **`GEMINI_EMBEDDING_MODEL`**, **`DASHSCOPE_EMBEDDING_MODEL`**
    *   **Description**: The model name each provider uses for embedding generation tasks.

*   **`LLM_TEMPERATURE`**
    *   **Description**: The temperature parameter for the LLM, controlling the randomness of the generated text. Higher values result in more random output; lower values result in more deterministic output.
    *   **Range**: 0.0 to 1.0
    *   **Example**: `LLM_TEMPERATURE=0.7`

*   **`GEMINI_THINK_BUDGET`**
    *   **Description**: The thinking budget (in seconds) for the Gemini model, which limits how long it may reason before generating a response.
    *   **Example**: `GEMINI_THINK_BUDGET=300`
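
To make the `LLM_PROVIDERS` fields more concrete, the sketch below parses the JSON string and shows one plausible way the `weight`, `cost`, and `quality` values could drive selection. The function names are hypothetical, and two behaviors are assumptions rather than documented facts: that a provider with `weight` 0 is excluded from load balancing, and that `CHEAPEST` and `HIGHEST_QUALITY` rank providers by the configured `cost` and `quality`. `FIRST_COMPLETED` depends on response timing at runtime and is not covered here.

```python
import json
import random
from typing import Any, Dict, List

# Same value as the format example above.
LLM_PROVIDERS = (
    '[{"name": "openai", "weight": 1, "cost": 0.001, "quality": 0.8}, '
    '{"name": "gemini", "weight": 0, "cost": 0.002, "quality": 0.9}]'
)

REQUIRED_FIELDS = {"name", "weight", "cost", "quality"}


def parse_providers(raw: str) -> List[Dict[str, Any]]:
    """Parse the JSON string and check that every entry has the expected fields."""
    providers = json.loads(raw)
    for entry in providers:
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            raise ValueError(f"provider entry {entry!r} is missing {sorted(missing)}")
    return providers


def pick_by_weight(providers: List[Dict[str, Any]], rng=random) -> str:
    """Load balancing: pick a provider with probability proportional to its weight."""
    # Assumption: weight 0 removes a provider from rotation.
    candidates = [p for p in providers if p["weight"] > 0]
    if not candidates:
        raise ValueError("no provider has a positive weight")
    weights = [p["weight"] for p in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]["name"]


def pick_by_strategy(providers: List[Dict[str, Any]], strategy: str) -> str:
    """Rank providers by configured cost or quality (assumed interpretation)."""
    if strategy == "CHEAPEST":
        return min(providers, key=lambda p: p["cost"])["name"]
    if strategy == "HIGHEST_QUALITY":
        return max(providers, key=lambda p: p["quality"])["name"]
    raise ValueError(f"strategy {strategy!r} is not handled by this sketch")


providers = parse_providers(LLM_PROVIDERS)
print(pick_by_weight(providers))                       # openai (the only positive weight)
print(pick_by_strategy(providers, "CHEAPEST"))         # openai
print(pick_by_strategy(providers, "HIGHEST_QUALITY"))  # gemini
```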

### 3.3 Security and Privacy Configuration

*   **`SECRET_KEY`**
    *   **Description**: A secret key used for encryption and secure signing. **In a production environment, you must set this to a long, random, and complex string.**
    *   **Example**: `SECRET_KEY=your_super_secret_and_random_key_here`

*   **`LOGIN_DEBUG`**
    *   **Description**: Enables or disables debug login mode. When enabled, the system allows logging in with any password for a given username.
    *   **Options**: `True`, `False`
    *   **Warning**: This is a serious security risk. Ensure it is set to `False` in a production environment.

*   **`PII_SCRUBBING_ENABLED`**
    *   **Description**: Enables or disables PII (Personally Identifiable Information) scrubbing. When enabled, the system automatically identifies and replaces sensitive information before sending data to external LLM services. PII patterns can be dynamically loaded from the database.
    *   **Options**: `True`, `False`
    *   **Example**: `PII_SCRUBBING_ENABLED=True`
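
Two of the settings above lend themselves to a short illustration: generating a suitably random `SECRET_KEY`, and what PII scrubbing might look like in principle. The helper names and the two regular expressions are assumptions chosen for the example; the patterns ZY SmartBI actually uses are loaded from its database and are not shown in this guide.

```python
import re
import secrets


def generate_secret_key(num_bytes: int = 48) -> str:
    """Return a URL-safe random string suitable as a SECRET_KEY value."""
    return secrets.token_urlsafe(num_bytes)


# Hypothetical patterns for illustration only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3,4}[-.\s]\d{4}\b"),
}


def scrub_pii(text: str, enabled: bool = True) -> str:
    """Replace each match with a placeholder before the text leaves the system."""
    if not enabled:
        return text
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(scrub_pii("Contact alice@example.com or 555-123-4567 about the Q3 report."))
# Contact [EMAIL] or [PHONE] about the Q3 report.
```

A key produced by `generate_secret_key()` can be pasted into the `.env` file as the `SECRET_KEY` value.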

### 3.4 Logging and Monitoring Configuration

*   **`LOG_LEVEL`**
    *   **Description**: The output level for system logs. Controls which levels of log messages are recorded. ZY SmartBI uses `structlog` for structured logging.
    *   **Options**: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`
    *   **Example**: `LOG_LEVEL=INFO`

*   **`LOG_FILE_PATH`**
    *   **Description**: The storage path for the system log file. If not set, logs may be output to the console.
    *   **Example**: `LOG_FILE_PATH="./logs/ZY SmartBI.log"` (quoted because the path contains a space)
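
As a rough sketch of how these two variables could be interpreted, the example below maps `LOG_LEVEL` to a numeric level and chooses between a file and the console based on `LOG_FILE_PATH`. It uses only Python's standard `logging` module; the `structlog` wiring is omitted, and the helper names and the `INFO` default are assumptions.

```python
import logging
import os
import sys
from typing import Optional

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def resolve_log_level(name: Optional[str], default: str = "INFO") -> int:
    """Translate a LOG_LEVEL string into the numeric level used by logging."""
    candidate = (name or default).strip().upper()
    if candidate not in VALID_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {VALID_LEVELS}, got {name!r}")
    return getattr(logging, candidate)


def build_handler(log_file_path: Optional[str]) -> logging.Handler:
    """Write to the given file when a path is set, otherwise to the console."""
    if log_file_path:
        # Create the parent directory so the file handler can open the file.
        os.makedirs(os.path.dirname(log_file_path) or ".", exist_ok=True)
        return logging.FileHandler(log_file_path, encoding="utf-8")
    return logging.StreamHandler(sys.stderr)


logger = logging.getLogger("smartbi_example")  # hypothetical logger name
logger.setLevel(resolve_log_level(os.environ.get("LOG_LEVEL")))
logger.addHandler(build_handler(os.environ.get("LOG_FILE_PATH")))
logger.info("logging configured")
```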

### 3.5 Other Configurations

*   **`MAX_CONCURRENT_QUERIES`**
    *   **Description**: The maximum number of concurrent queries the system can handle. Adjust based on server resources and expected load.
    *   **Example**: `MAX_CONCURRENT_QUERIES=10`

*   **`METADATA_SCAN_INTERVAL_HOURS`**
    *   **Description**: The interval (in hours) for automatically scanning data source metadata. Set to 0 to disable automatic scanning.
    *   **Example**: `METADATA_SCAN_INTERVAL_HOURS=24`

*   **`MAX_CONVERSATION_HISTORY_MESSAGES`**
    *   **Description**: The maximum number of historical messages to retain in a conversation for context.
    *   **Example**: `MAX_CONVERSATION_HISTORY_MESSAGES=20`

*   **`MAX_SAMPLE_ROWS_IN_HISTORY`**
    *   **Description**: The maximum number of sample rows to store in the conversation history for data results, to avoid storing excessive sensitive data.
    *   **Example**: `MAX_SAMPLE_ROWS_IN_HISTORY=5`
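
The limits above can be pictured with a small sketch: a semaphore that caps concurrent work at `MAX_CONCURRENT_QUERIES`, and two helpers that trim history and result samples. All function names are hypothetical, and keeping the most recent messages and the first rows is an assumed interpretation of "retain" and "sample".

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


def trim_history(messages: Sequence[Any], max_messages: int) -> List[Any]:
    """Keep only the most recent messages (MAX_CONVERSATION_HISTORY_MESSAGES)."""
    if max_messages <= 0:
        return []
    return list(messages[-max_messages:])


def sample_rows(rows: Sequence[Any], max_rows: int) -> List[Any]:
    """Keep only the first rows of a result set (MAX_SAMPLE_ROWS_IN_HISTORY)."""
    return list(rows[: max(max_rows, 0)])


async def run_with_limit(
    jobs: Sequence[Callable[[], Awaitable[Any]]], max_concurrent: int
) -> List[Any]:
    """Run all jobs, but never more than max_concurrent at the same time."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job: Callable[[], Awaitable[Any]]) -> Any:
        # Each job waits here until one of the slots is free.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_query() -> str:
    await asyncio.sleep(0.01)  # stands in for a real database query
    return "done"


print(trim_history(["m1", "m2", "m3", "m4"], 2))           # ['m3', 'm4']
print(sample_rows([{"id": i} for i in range(8)], 5)[-1])   # {'id': 4}
print(asyncio.run(run_with_limit([fake_query] * 3, 2)))    # ['done', 'done', 'done']
```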

## 4. Next Steps

*   **Data Source Management**: To learn how to connect and manage your business data sources, see the [Data Source Management Guide](04_datasource_management.md).
*   **Security and User Management**: To learn how to configure user permissions and security policies, see the [Security and User Management Guide](05_security_user_management.md).
